Title
Categorization using semi-supervised clustering
Abstract
Many applications require matching objects to a predefined, yet highly dynamic set of categories accompanied by category descriptions. We present a novel approach to solving this class of categorization problems by formulating it in a semi-supervised clustering framework. Text-based matching is performed to generate "soft" seeds, which are then used to guide clustering in the basic feature space. We introduce a new variation of the k-means algorithm, called Soft Seeded k-means, which can effectively incorporate seeds that are of varying degrees of confidence, while allowing for incomplete coverage of the pre-defined categories. The algorithm is applied to real-world data from a business analytics application, and we demonstrate that it leads to superior performance compared to previous approaches.
Year
2008
DOI
10.1109/ICPR.2008.4761253
Venue
ICPR
Keywords
pattern clustering,objects matching,semisupervised clustering,business analytics application,learning (artificial intelligence),text-based matching,soft seeded k-means,category descriptions,k-means algorithm,categorization,feature space,text analysis,k means,noise,k means algorithm,clustering algorithms,accuracy,algorithm design and analysis,servers,business,labeling,learning artificial intelligence
Field
Canopy clustering algorithm,Fuzzy clustering,Categorization,Feature vector,Algorithm design,Correlation clustering,Pattern recognition,Computer science,Artificial intelligence,Conceptual clustering,Cluster analysis,Machine learning
DocType
Conference
ISSN
1051-4651
E-ISBN
978-1-4244-2175-6
ISBN
978-1-4244-2175-6
Citations
3
PageRank
0.54
References
7
Authors
3
Name | Order | Citations | PageRank
Jianying Hu | 1 | 478 | 35.52
Moninder Singh | 2 | 381 | 105.12
Aleksandra Mojsilovic | 3 | 288 | 39.15