Abstract |
---|
Many applications require matching objects to a predefined yet highly dynamic set of categories accompanied by category descriptions. We present a novel approach to solving this class of categorization problems by formulating it in a semi-supervised clustering framework. Text-based matching is performed to generate “soft” seeds, which are then used to guide clustering in the basic feature space. We introduce a new variation of the k-means algorithm, called Soft Seeded k-means, which can effectively incorporate seeds of varying degrees of confidence while allowing for incomplete coverage of the predefined categories. The algorithm is applied to real-world data from a business analytics application, and we demonstrate that it leads to superior performance compared to previous approaches. |
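The abstract does not spell out the update rules of Soft Seeded k-means, so the sketch below is only a hypothetical illustration of how confidence-weighted seeds and incomplete category coverage *could* enter a k-means loop. It is not the authors' algorithm. The function name `soft_seeded_kmeans`, the `penalty` parameter, and the confidence-proportional override cost are all assumptions. Seeds initialize their category's centroid as a confidence-weighted mean. Categories without seeds fall back to a random data point. During assignment, moving a seeded point away from its seed category costs `penalty * confidence`, so low-confidence seeds can be overridden by the data while high-confidence seeds tend to hold.

```python
import numpy as np


def soft_seeded_kmeans(X, k, seed_labels, seed_conf, penalty=1.0, n_iter=100, rng_seed=0):
    """Hypothetical sketch of k-means with confidence-weighted ("soft") seeds.

    X           : (n, d) feature matrix.
    k           : number of predefined categories / clusters.
    seed_labels : (n,) ints; category index for seeded points, -1 if unseeded.
    seed_conf   : (n,) floats in [0, 1]; seed confidence (use 0 for unseeded).
    penalty     : assumed weight on the cost of overriding a seed.
    """
    X = np.asarray(X, dtype=float)
    seed_labels = np.asarray(seed_labels)
    seed_conf = np.asarray(seed_conf, dtype=float)
    rng = np.random.default_rng(rng_seed)
    n = X.shape[0]

    # Initialize each centroid as the confidence-weighted mean of its seeds;
    # categories with no seeds (incomplete coverage) start at a random point.
    centroids = np.empty((k, X.shape[1]))
    for c in range(k):
        mask = seed_labels == c
        if mask.any() and seed_conf[mask].sum() > 0:
            centroids[c] = np.average(X[mask], axis=0, weights=seed_conf[mask])
        else:
            centroids[c] = X[rng.integers(n)]

    seeded = seed_labels >= 0
    seeded_idx = np.flatnonzero(seeded)
    # override[i, c] is True when point i is seeded and c is not its seed category.
    override = np.repeat(seeded[:, None], k, axis=1)
    override[seeded_idx, seed_labels[seeded_idx]] = False

    labels = np.full(n, -1)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centroid: (n, k).
        cost = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Soft seeds: leaving the seed category adds a confidence-scaled penalty.
        cost += penalty * seed_conf[:, None] * override
        new_labels = cost.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
        # Standard k-means centroid update; an empty cluster keeps its centroid.
        for c in range(k):
            if (labels == c).any():
                centroids[c] = X[labels == c].mean(axis=0)
    return labels, centroids


# Two well-separated groups; point 4 carries a low-confidence (0.1) seed
# that wrongly assigns it to category 0.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
seed_labels = np.array([0, -1, -1, 1, 0, -1])
seed_conf = np.array([1.0, 0.0, 0.0, 0.9, 0.1, 0.0])

labels, _ = soft_seeded_kmeans(X, 2, seed_labels, seed_conf)
print(labels.tolist())  # [0, 0, 0, 1, 1, 1]
```

With the default `penalty=1.0`, the weak seed on point 4 is overridden by the geometry. Raising `penalty` (for example to `1e4`) makes the same seed binding, so point 4 stays in category 0. The single `penalty` knob is one simple way to trade seed trust against data fit. A real system would likely calibrate it against the confidence scores produced by the text-matching step.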
Year | DOI | Venue |
---|---|---|
2008 | 10.1109/ICPR.2008.4761253 | ICPR |
Keywords | Field | DocType |
---|---|---|
pattern clustering,objects matching,semisupervised clustering,business analytics application,learning (artificial intelligence),text-based matching,soft seeded k-means,category descriptions,k-means algorithm,categorization,feature space,text analysis,k means,noise,k means algorithm,clustering algorithms,accuracy,algorithm design and analysis,servers,business,labeling,learning artificial intelligence | Canopy clustering algorithm,Fuzzy clustering,Categorization,Feature vector,Algorithm design,Correlation clustering,Pattern recognition,Computer science,Artificial intelligence,Conceptual clustering,Cluster analysis,Machine learning | Conference |
ISSN | ISBN | Citations |
---|---|---|
1051-4651 | 978-1-4244-2175-6 (E-ISBN) | 3 |
PageRank | References | Authors |
---|---|---|
0.54 | 7 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jianying Hu | 1 | 478 | 35.52 |
Moninder Singh | 2 | 381 | 105.12 |
Aleksandra Mojsilovic | 3 | 288 | 39.15 |