Data Understanding using Semi-Supervised Clustering - Citegraph

Paper Info

Title
Data Understanding using Semi-Supervised Clustering

Abstract
In the era of e-science, most scientific endeavors depend on intensive data analysis to understand the underlying physical phenomena. Predictive modeling is one of the popular machine learning tasks undertaken in such endeavors. The labeled data used to train a predictive model reflects the current understanding of the domain. In this paper we introduce data understanding as a computational problem and propose a solution for enhancing domain understanding based on semi-supervised clustering. The proposed DU-SSC (Data Understanding using Semi-Supervised Clustering) algorithm is incremental, parameterless, and performs a single scan of the data. The given labeled (training) data is discretized at a user-specified resolution, and finer (micro) data distributions are identified within classes, along with outliers. The discovery process is based on grouping similar instances in data space, while taking into account the degree of influence each attribute exercises on the class label. The Maximal Information Coefficient measure is used during similarity computations for this purpose. The study is supported by experiments, and a detailed account of the understanding gained is presented for two selected UCI data sets. General observations on nine other UCI data sets are presented, along with experiments that demonstrate the use of the discovered knowledge for improved classification.
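The discretization and grouping step described in the abstract can be illustrated with a minimal sketch. This is not the DU-SSC algorithm itself: it assumes equal-width binning, treats every attribute as equally influential (the paper weights attributes using the Maximal Information Coefficient, which is omitted here), and flags small groups as outlier candidates using a hypothetical `min_size` threshold. The function names `discretize`, `micro_groups`, and `outlier_candidates` are illustrative and do not come from the paper.

```python
from collections import defaultdict


def discretize(value, lo, hi, resolution):
    """Map a value in [lo, hi] to an equal-width bin index in 0..resolution-1."""
    if hi == lo:
        return 0  # constant attribute: everything falls in one bin
    idx = int((value - lo) / (hi - lo) * resolution)
    return min(idx, resolution - 1)  # clamp so value == hi lands in the last bin


def micro_groups(rows, labels, resolution):
    """Group instance indices by (class label, grid cell).

    Attribute ranges are computed first; the instances are then visited
    once to assign each one to its cell.
    """
    n_attrs = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_attrs)]
    highs = [max(r[j] for r in rows) for j in range(n_attrs)]
    groups = defaultdict(list)
    for i, (row, label) in enumerate(zip(rows, labels)):
        cell = tuple(
            discretize(row[j], lows[j], highs[j], resolution)
            for j in range(n_attrs)
        )
        groups[(label, cell)].append(i)
    return dict(groups)


def outlier_candidates(groups, min_size=2):
    """Return indices of instances in groups smaller than min_size (assumed rule)."""
    return sorted(i for members in groups.values()
                  if len(members) < min_size for i in members)


# Toy data: two attributes, two classes, resolution of 2 bins per attribute.
rows = [(0.0, 0.0), (0.1, 0.1), (1.0, 1.0), (0.9, 1.0), (0.5, 0.0)]
labels = ["a", "a", "b", "b", "a"]
groups = micro_groups(rows, labels, resolution=2)
candidates = outlier_candidates(groups)
```

In this toy run, class "a" splits into two groups in different cells, which is the kind of finer within-class structure the abstract refers to; the lone instance in its own cell is only a candidate outlier under the assumed size rule.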

Year: 2012
DOI: 10.1109/CIDU.2012.6382192
Venue: Intelligent Data Understanding
Keywords: data analysis, data mining, learning (artificial intelligence), pattern classification, pattern clustering, set theory, DU-SSC algorithm, UCI data sets, computational problem, data space, data understanding using semi-supervised clustering algorithm, e-science, finer data distribution, labeled data discretization, machine learning, maximal information coefficient measure, microdata distribution, outliers, physical phenomenon, predictive model training
Field: Set theory, Data mining, Computational problem, Algorithm design, Pattern clustering, Computer science, Artificial intelligence, Labeled data, Merge (version control), Cluster analysis, Machine learning
DocType: Conference
ISBN: 978-1-4673-4625-2
Citations: 2
PageRank: 0.37
References: 3
Authors: 4

Authors

Name	Order	Citations	PageRank
Vasudha Bhatnagar	1	181	17.69
Rashmi Dobariyal	2	2	0.37
Priya Jain	3	2	0.71
Ashish Mahabal	4	77	10.22