Title
Classification and outlier detection based on topic based pattern synthesis
Abstract
In several pattern classification problems, training datasets have an imbalanced class distribution and contain outliers, both of which can hinder the performance of classifiers. In this paper, we propose classification schemes based on pre-processing the data with Novel Pattern Synthesis (NPS), with the aim of improving performance on such datasets. We provide a formal framework for characterizing class imbalance and outlier elimination. Specifically, we examine the role of NPS in two tasks: outlier elimination and handling the class imbalance problem. In NPS, the k nearest neighbours of every pattern are found, and a weighted average of these neighbours is taken to form a synthesized pattern. We find that the classification accuracy on the minority class increases in the presence of synthesized patterns. However, finding nearest neighbours in high-dimensional datasets is challenging; hence, we use Latent Dirichlet Allocation to reduce the dimensionality of the dataset. An extensive experimental evaluation on 25 real-world imbalanced datasets shows that pre-processing the data with NPS is effective and has a greater impact on minority-class classification accuracy in imbalanced learning. We also observe that NPS outperforms state-of-the-art methods for imbalanced classification. Experiments on 9 real-world datasets with outliers demonstrate that the NPS approach not only substantially improves detection performance but also scales relatively well to large datasets compared with state-of-the-art outlier detection methods.
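The synthesis step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract says only that a weighted average of the k nearest neighbours is taken, so the inverse-distance weights used here are an assumption, as is searching for neighbours only among patterns of the same class. The input is assumed to be per-pattern topic proportions produced by an LDA model fitted elsewhere. The function `synthesize_patterns` and its parameters `k` and `eps` are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = List[float]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def synthesize_patterns(
    patterns: Sequence[Sequence[float]], k: int = 3, eps: float = 1e-9
) -> List[Vector]:
    """Hypothetical NPS-style synthesis: one new pattern per input pattern.

    Assumptions (not from the paper): `patterns` all belong to one class
    (e.g. LDA topic vectors of the minority class), and each neighbour is
    weighted by the inverse of its distance to the query pattern.
    """
    if not 1 <= k < len(patterns):
        raise ValueError("k must satisfy 1 <= k < number of patterns")
    synthesized: List[Vector] = []
    for i, p in enumerate(patterns):
        # k nearest neighbours of pattern i, excluding pattern i itself
        neighbours = sorted(
            (euclidean(p, q), j) for j, q in enumerate(patterns) if j != i
        )[:k]
        # assumed weighting: closer neighbours contribute more
        weights = [1.0 / (d + eps) for d, _ in neighbours]
        total = sum(weights)
        # weighted average of the neighbours, computed component by component
        new = [
            sum(w * patterns[j][c] for w, (_, j) in zip(weights, neighbours)) / total
            for c in range(len(p))
        ]
        synthesized.append(new)
    return synthesized


if __name__ == "__main__":
    # illustrative 3-topic proportions for four minority-class patterns
    minority = [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1], [0.1, 0.1, 0.8], [0.2, 0.2, 0.6]]
    augmented = minority + synthesize_patterns(minority, k=2)
    print(len(augmented), augmented[-1])
```

Because each synthesized vector is a convex combination of topic-proportion vectors, it remains a valid topic distribution (non-negative and summing to 1). Appending the synthesized minority-class patterns to the training set, as in the usage lines above, is one plausible way to use them to rebalance the classes.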
Year: 2013
DOI: 10.1007/978-3-642-39712-7_8
Venue: MLDM
Keywords: outlier detection, real-world datasets, training datasets, real-world imbalanced datasets shows, high-dimensional datasets, synthesized pattern, outlier elimination, classification accuracy, large datasets, class imbalance, nps approach, pattern synthesis, dimensionality reduction, latent dirichlet allocation, classification
Field: Data mining, Anomaly detection, Latent Dirichlet allocation, One-class classification, Dimensionality reduction, Computer science, Artificial intelligence, Pattern recognition, Outlier, Curse of dimensionality, Pattern synthesis, Machine learning, Scalability
DocType: Conference
Citations: 1
PageRank: 0.35
References: 32
Authors: 2
Samrat Kokkula (order 1; citations 1; PageRank 0.35)
M. Narasimha Murty (order 2; citations 824; PageRank 86.07)