Abstract
---
We provide crucial insights into a recently proposed Shannon-type entropy balance equation for multivariate joint distributions. The decomposition can be plotted in an entropy ternary diagram, each axis of which provides specific information about the distributions. We use both tools in the exploratory analysis of machine learning datasets, for supervised and unsupervised tasks alike. We introduce from first principles an analysis of the information content of multivariate distributions as information sources. Specifically, we generalize a balance equation and a visualization device, the Entropy Triangle, to multivariate distributions, and find notable differences with similar analyses done on joint distributions as models of information channels. As an example application, we extend a framework for the analysis of classifiers to also encompass the analysis of datasets. With such tools we analyze a handful of UCI machine learning tasks to start addressing the question of how well datasets convey the information they are supposed to capture about the phenomena they stand for.
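The balance equation and ternary diagram can be illustrated in the bivariate case (the paper's contribution is the multivariate generalization, which is not reproduced here): the uniform joint entropy splits into three non-negative terms — a divergence from uniformity of the marginals, twice the mutual information, and the variation of information — whose normalized values place a distribution in a ternary diagram. A minimal sketch, assuming this standard bivariate decomposition; function names are illustrative, not from the paper:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability array (zero entries ignored)."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

def entropy_balance(pxy):
    """Split H(U_X) + H(U_Y) (entropies of uniform marginals) into three
    non-negative parts for a 2-D joint distribution `pxy`:
      dh  -- entropy deficit of the marginals w.r.t. uniformity
      mi2 -- twice the mutual information I(X;Y)
      vi  -- variation of information H(X|Y) + H(Y|X)
    Normalized by the total, the triple is a point in a ternary diagram.
    """
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    hx, hy, hxy = entropy(px), entropy(py), entropy(pxy.ravel())
    hux, huy = np.log2(pxy.shape[0]), np.log2(pxy.shape[1])  # uniform entropies
    mi = hx + hy - hxy            # I(X;Y)
    vi = hxy - mi                 # H(X|Y) + H(Y|X)
    dh = (hux - hx) + (huy - hy)  # divergence from uniform marginals
    total = hux + huy             # balance: dh + 2*mi + vi == total
    return dh / total, 2 * mi / total, vi / total
```

For example, a perfectly correlated joint distribution `[[0.5, 0], [0, 0.5]]` maps to the mutual-information vertex `(0, 1, 0)`, while a uniform independent one maps to the variation-of-information vertex `(0, 0, 1)`.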
Year | DOI | Venue
---|---|---
2017 | 10.1016/j.eswa.2017.02.010 | Expert Syst. Appl.
Keywords | Field | DocType
---|---|---
Machine learning evaluation, Dataset entropy, Multivariate entropy, Entropic measures, Exploratory analysis, Entropy ternary diagram, Entropy balance equation | Cross entropy, Data mining, Transfer entropy, Data analysis, Computer science, Multivariate statistics, Information diagram, Joint entropy, Artificial intelligence, Principle of maximum entropy, Conditional entropy, Machine learning | Journal
Volume | Issue | ISSN
---|---|---
78 | C | 0957-4174
Citations | PageRank | References
---|---|---
1 | 0.36 | 9
Authors (2)
---
Name | Order | Citations | PageRank
---|---|---|---
Francisco J. Valverde-Albacete | 1 | 116 | 20.84
Carmen Peláez-Moreno | 2 | 130 | 22.07