Title
The evaluation of data sources using multivariate entropy tools
Abstract
We provide crucial insights into a recently proposed Shannon-type entropy balance equation for multivariate joint distributions. The decomposition can be plotted in an entropy ternary diagram. Each axis of the ternary diagram provides specific information about the distributions. We use both tools in the exploratory analysis of machine learning datasets. These tools are applicable to supervised and unsupervised tasks. We introduce from first principles an analysis of the information content of multivariate distributions as information sources. Specifically, we generalize a balance equation and a visualization device, the Entropy Triangle, for multivariate distributions and find notable differences with similar analyses done on joint distributions as models of information channels. As an example application, we extend a framework for the analysis of classifiers to also encompass the analysis of datasets. With such tools we analyze a handful of UCI machine learning tasks to start addressing the question of how well datasets convey the information they are supposed to capture about the phenomena they stand for.
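To make the balance equation and the triangle coordinates concrete, here is a minimal sketch for discrete data. It is not the authors' code, and we only assume it matches the decomposition described above. It relies on a standard identity: for a random vector X = (X_1, ..., X_k), the sum of log2|X_i| splits into three non-negative terms. These are the divergence from uniform marginals, ΔH = Σ log2|X_i| - Σ H(X_i); the redundancy Σ I(X_i; X_-i); and the residual Σ H(X_i | X_-i). Dividing by Σ log2|X_i| gives coordinates that sum to 1 and can be placed in a ternary diagram. With two variables, the middle term equals 2·MI(X_1; X_2) and the residual equals H(X_1|X_2) + H(X_2|X_1). The function names are hypothetical. Taking |X_i| as the number of distinct observed values is also our assumption.

```python
from collections import Counter
from math import log2


def entropy(counts, n):
    """Plug-in Shannon entropy (bits) of the empirical distribution in counts."""
    return -sum((c / n) * log2(c / n) for c in counts.values() if c > 0)


def source_entropy_balance(samples):
    """Hypothetical helper: return (delta_h, mutual, residual, h_u).

    samples: non-empty list of equal-length tuples, one tuple per observation.
    Assumption: |X_i| is the number of distinct values observed in column i.
    """
    n = len(samples)
    k = len(samples[0])
    h_joint = entropy(Counter(samples), n)            # H(X)
    h_u = h_marg = residual = 0.0
    for i in range(k):
        column = Counter(s[i] for s in samples)
        h_u += log2(len(column))                     # log2 |X_i|
        h_marg += entropy(column, n)                 # H(X_i)
        rest = Counter(s[:i] + s[i + 1:] for s in samples)
        residual += h_joint - entropy(rest, n)       # H(X_i | X_-i) = H(X) - H(X_-i)
    delta_h = h_u - h_marg                           # distance from uniform marginals
    mutual = h_marg - residual                       # sum_i I(X_i ; X_-i)
    return delta_h, mutual, residual, h_u


def entropy_triangle_coords(samples):
    """Hypothetical helper: normalized (delta_h, mutual, residual), summing to 1."""
    delta_h, mutual, residual, h_u = source_entropy_balance(samples)
    if h_u == 0:  # every column is constant: nothing to normalize
        return (0.0, 0.0, 0.0)
    return (delta_h / h_u, mutual / h_u, residual / h_u)


if __name__ == "__main__":
    copies = [(0, 0), (1, 1), (0, 0), (1, 1)]          # X_2 duplicates X_1
    independent = [(0, 0), (0, 1), (1, 0), (1, 1)]     # uniform and independent
    xor = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]  # X_3 = X_1 XOR X_2
    print(entropy_triangle_coords(copies))       # (0.0, 1.0, 0.0)
    print(entropy_triangle_coords(independent))  # (0.0, 0.0, 1.0)
    print(entropy_triangle_coords(xor))          # (0.0, 1.0, 0.0)
```

The XOR case shows why a multivariate view matters. Every pair of its variables is independent, yet each variable is fully determined by the other two, so all of the information lands on the redundancy side. Computing H(X_i | X_-i) as H(X) - H(X_-i) avoids building conditional tables. However, plug-in entropy estimates are biased downward on small samples, so the sketch fits exploration better than formal inference.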
Year: 2017
DOI: 10.1016/j.eswa.2017.02.010
Venue: Expert Syst. Appl.
Keywords: Machine learning evaluation, Dataset entropy, Multivariate entropy, Entropic measures, Exploratory analysis, Entropy ternary diagram, Entropy balance equation
Field: Cross entropy, Data mining, Transfer entropy, Data analysis, Computer science, Multivariate statistics, Information diagram, Joint entropy, Artificial intelligence, Principle of maximum entropy, Conditional entropy, Machine learning
DocType: Journal
Volume: 78
Issue: C
ISSN: 0957-4174
Citations: 1
PageRank: 0.36
References: 9
Authors: 2
Name | Order | Citations | PageRank
Francisco J. Valverde-Albacete | 1 | 116 | 20.84
Carmen Peláez-Moreno | 2 | 130 | 22.07