Title
eDARA: Ensembles DARA
Abstract
The ever-growing amount of digital data stored in relational databases has created a need for new approaches to extracting useful information from these databases. One such approach, the DARA algorithm, is designed to transform data stored in relational databases into a vector space representation using information retrieval theory. The DARA algorithm has been shown to produce improvements over other state-of-the-art approaches. However, DARA suffers from a major drawback when the cardinality of attributes in relations is very high, because the size of the vector space representation depends on the number of unique values of all attributes in the dataset. This issue can be addressed by reducing the number of features generated by the DARA transformation process, selecting only a subset of the relevant features for processing. Since relational data is transformed into a vector space representation in the form of TF-IDF, only numerical values are used to represent each record. As a result, discretizing these numerical attributes may also reduce the dimensionality of the transformed dataset. When clustering is applied to these datasets, clustering results of various dimensions may be produced as the number of bins used to discretize these numerical attributes is varied. A final consensus clustering can then be applied to these results to produce a single clustering that is a better fit, in some sense, than the existing clusterings. In this study, an ensemble DARA clustering approach is proposed that provides a mechanism to represent the consensus across multiple runs of a clustering algorithm on relational datasets.
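The consensus step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which consensus function eDARA uses, so this example substitutes a common generic choice: a co-association matrix with a majority threshold (evidence accumulation). The function names, the threshold value, and the toy label lists are all assumptions made for illustration. Each input partition stands in for the cluster labels from one clustering run, for example one run per bin count.

```python
from itertools import combinations


def co_association(partitions):
    """Fraction of runs in which each pair of records shares a cluster."""
    n = len(partitions[0])
    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0
    runs = len(partitions)
    return [[c / runs for c in row] for row in counts]


def consensus_labels(partitions, threshold=0.5):
    """Link records co-clustered in more than `threshold` of the runs,
    then label each connected component as one consensus cluster.

    Assumption: this majority-vote rule is an illustrative stand-in,
    not the consensus function used in the paper.
    """
    n = len(partitions[0])
    co = co_association(partitions)
    parent = list(range(n))  # union-find forest over record indices

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if co[i][j] > threshold:
            parent[find(i)] = find(j)

    # Renumber component roots as 0, 1, 2, ... in order of first appearance.
    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Hypothetical labels from three clustering runs over five records.
partitions = [
    [0, 0, 1, 1, 2],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 0, 0],
]
print(consensus_labels(partitions))
```

Label values need not match across runs, since only co-membership of record pairs is counted. Records that land in the same cluster in most runs end up in the same consensus cluster.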
Year: 2013
DOI: 10.1007/978-3-642-53917-6_7
Venue: ADMA (2)
Keywords: relational databases, vector space model, data mining
Field: Fuzzy clustering, Data mining, CURE data clustering algorithm, Computer science, Artificial intelligence, Cluster analysis, Canopy clustering algorithm, Clustering high-dimensional data, Correlation clustering, Pattern recognition, Dara, Constrained clustering, Machine learning
DocType: Conference
Citations: 0
PageRank: 0.34
References: 26
Authors
3
Name
Order
Citations
PageRank
Chung Seng Kheau110.69
Rayner Alfred27115.97
HuiKeng Lau3235.43