Title
Efficient Visualization Of Large-Scale Data Tables Through Reordering And Entropy Minimization
Abstract
Visualizing a data table with n examples and m columns as a heatmap provides a holistic view of the original data. Because there are n! ways to order rows and m! ways to order columns, and data tables are typically ordered without regard to visual inspection, heatmaps of the original tables often appear as noisy images. However, if rows and columns are ordered so that similar rows and similar columns are grouped together, a heatmap may provide deep insight into the underlying data distribution. We propose an information-theoretic approach to producing a well-ordered data table. In particular, we search for an ordering that minimizes the entropy of the residuals of predictive coding applied to the ordered data table. This formalization leads to a novel ordering procedure, EM-ordering, that can be applied separately to rows and columns. To order rows, EM-ordering repeats two steps until convergence: (1) rescaling the columns and (2) solving a Traveling Salesman Problem (TSP) in which rows are treated as cities. To allow fast ordering of large data tables, we propose an efficient TSP heuristic with modest O(n log(n)) time complexity. Compared to existing state-of-the-art reordering approaches, the method often provides heatmaps of higher visual quality while being significantly more scalable. Moreover, applying the proposed method to real-world traffic and financial data sets readily yielded deeper insights about the data, further confirming that EM-ordering can be a valuable tool for visual exploration of large-scale data sets.
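The alternating row-ordering loop described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: each row is predicted by the row before it, columns are rescaled by the standard deviation of those residuals, and a plain nearest-neighbour path (O(n^2)) stands in for the TSP step rather than the O(n log(n)) heuristic the authors propose. The function names `residual_scale`, `greedy_path`, and `order_rows` are hypothetical.

```python
import math


def residual_scale(table, order):
    """Per-column std of differences between consecutive rows (assumed predictor)."""
    scales = []
    for j in range(len(table[0])):
        diffs = [table[order[i]][j] - table[order[i - 1]][j]
                 for i in range(1, len(order))]
        var = sum(d * d for d in diffs) / len(diffs)
        scales.append(math.sqrt(var) or 1.0)  # avoid dividing by zero later
    return scales


def greedy_path(table, scales):
    """Nearest-neighbour path over rows; a simple stand-in for a TSP solver."""
    def dist(a, b):
        return sum(((x - y) / s) ** 2
                   for x, y, s in zip(table[a], table[b], scales))

    order = [0]
    remaining = set(range(1, len(table)))
    while remaining:
        last = order[-1]
        # Ties are broken by row index so the result is deterministic.
        nxt = min(remaining, key=lambda r: (dist(last, r), r))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def order_rows(table, max_iter=20):
    """Alternate (1) column rescaling and (2) path search until the order is stable."""
    order = list(range(len(table)))
    if len(table) < 2:
        return order
    for _ in range(max_iter):
        scales = residual_scale(table, order)
        new_order = greedy_path(table, scales)
        if new_order == order:
            break
        order = new_order
    return order


rows = [[0, 0], [10, 10], [1, 1], [11, 11]]
print(order_rows(rows))  # [0, 2, 1, 3]: similar rows end up adjacent
```

Applying the same routine to the transposed table would order columns, in line with the statement that the procedure can be applied separately to rows and columns.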
Year: 2013
DOI: 10.1109/ICDM.2013.63
Venue: 2013 IEEE 13TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM)
Keywords: data visualisation, computational complexity, entropy
Field: Data mining, Data set, Computer science, Theoretical computer science, Artificial intelligence, Time complexity, Cluster analysis, Row, Data visualization, Heuristic, Visualization, Table (information), Machine learning
DocType: Conference
ISSN: 1550-4786
Citations: 0
PageRank: 0.34
References: 22
Authors: 2
Name | Order | Citations | PageRank
Nemanja Djuric | 1 | 352 | 25.83
Slobodan Vucetic | 2 | 637 | 56.38