Title
Classifying high dimensional data by interactive visual analysis
Abstract
Data mining techniques such as classification algorithms are applied to data that are usually high dimensional and very large. To assist the user in performing a classification task, visual techniques can be employed to represent high dimensional data in a more comprehensible 2D or 3D space. However, such a representation of high dimensional data in a 2D or 3D space may unavoidably cause overlapping data and information loss. This issue can be addressed by interactive visualization. With expert domain knowledge, the user can interactively build classifiers that are competitive with automated ones using a 2D or 3D visual interface. Several visual techniques have been proposed for classifying high dimensional data. However, the user's interaction with those techniques depends heavily on the user's experience in visually identifying classes in the data; as a result, the classification results of those techniques may vary and may not be repeatable. To address this deficiency, this article presents an interactive visual approach to the classification of high dimensional data. Our approach employs the enhanced separation feature of a visual technique called HOV3, by which the user plots the training dataset in a 2D space by applying statistical measurements, in order to separate data points into groups with the same class labels. A data group, together with the statistical measurement that separated it from the others, is taken as a visual classifier. The user then mixes the data points of a classifier with the unlabeled dataset and plots them in HOV3 using the measurement of that classifier. Unlabeled data points that overlap the labeled ones in the 2D space are assigned the corresponding label. Our approach avoids the randomness of existing interactive visual classification techniques, since the visual classifier in this approach depends only on the training dataset and its statistical measurement.
As a result, this work provides an intuitive and effective approach to classify high dimensional data by interactive visualization.
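The labeling step described in the abstract (mix a classifier's labeled points with the unlabeled data, plot both with the classifier's measurement, and give overlapping points that classifier's label) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' HOV3 implementation. It assumes a Star-Coordinates-style projection in which dimension k is min-max normalized, scaled by a weight, and placed along the complex direction z0^k with z0 = e^(2πi/n). The weight vector stands in for the "statistical measurement", and "overlap" is approximated by a distance threshold eps. The names VisualClassifier, min_max_normalize, project, classify and eps are hypothetical.

```python
import cmath
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class VisualClassifier:
    # Hypothetical container: a class label, the labeled training rows that
    # formed a separated group, and the per-dimension weight vector that
    # stands in for the statistical measurement which separated that group.
    label: str
    rows: List[List[float]]
    weights: List[float]


def min_max_normalize(data: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every dimension to [0, 1]; constant dimensions become 0."""
    dims = len(data[0])
    lows = [min(r[k] for r in data) for k in range(dims)]
    highs = [max(r[k] for r in data) for k in range(dims)]
    return [
        [(r[k] - lows[k]) / (highs[k] - lows[k]) if highs[k] > lows[k] else 0.0
         for k in range(dims)]
        for r in data
    ]


def project(row: Sequence[float], weights: Sequence[float]) -> complex:
    """Assumed Star-Coordinates-style mapping: axis k points along z0**k."""
    n = len(row)
    z0 = cmath.exp(2j * math.pi / n)
    return sum(x * w * z0 ** k for k, (x, w) in enumerate(zip(row, weights)))


def classify(classifiers: List[VisualClassifier],
             unlabeled: List[List[float]],
             eps: float) -> List[Optional[str]]:
    """Label each unlabeled row by the first classifier whose points it overlaps."""
    labels: List[Optional[str]] = [None] * len(unlabeled)
    for clf in classifiers:
        # Mix the classifier's labeled rows with the unlabeled rows and
        # normalize them together so both share one coordinate frame.
        mixed = min_max_normalize(clf.rows + [list(u) for u in unlabeled])
        points = [project(r, clf.weights) for r in mixed]
        labeled_pts = points[:len(clf.rows)]
        unlabeled_pts = points[len(clf.rows):]
        for i, p in enumerate(unlabeled_pts):
            # "Overlap" is approximated as lying within eps of any labeled point.
            if labels[i] is None and any(abs(p - q) <= eps for q in labeled_pts):
                labels[i] = clf.label
    return labels


low = VisualClassifier("low", [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]], [1.0, 1.0, 1.0])
high = VisualClassifier("high", [[10.0, 0.0, 5.0]], [1.0, 1.0, 1.0])
print(classify([low, high], [[0.05, 0.0, 0.0], [10.0, 0.0, 5.0]], eps=0.02))
# ['low', 'high']
```

Because each classifier carries its own weights and is plotted separately against the unlabeled data, the result depends only on the training rows and the chosen measurement, which mirrors the repeatability argument in the abstract. Points that overlap no classifier stay unlabeled (None) in this sketch.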
Year: 2016
DOI: 10.1016/j.jvlc.2015.11.003
Venue: Journal of Visual Languages and Computing
Keywords: classification
Field: Data point, Data mining, Clustering high-dimensional data, Pattern recognition, Domain knowledge, Computer science, Interactive visual analysis, Interactive visualization, Visual approach, Artificial intelligence, Classifier (linguistics), Statistical classification
DocType: Journal
Volume: 33
Issue: C
ISSN: 1045-926X
Citations: 1
PageRank: 0.35
References: 14
Authors: 4
1. Ke-Bing Zhang (citations: 35; PageRank: 6.06)
2. Mehmet A. Orgun (citations: 1366; PageRank: 155.15)
3. Rajan Shankaran (citations: 113; PageRank: 15.86)
4. Du Zhang (citations: 285; PageRank: 42.16)