Wikipedia-based smoothing for enhancing text clustering - Citegraph

Paper Info

Title
Wikipedia-based smoothing for enhancing text clustering

Abstract
Conventional text clustering algorithms based on the bag-of-words model fail to fully capture the semantic relations between words. As a result, documents describing the same topic may not be placed in the same cluster if they use different sets of words. A generic solution to this issue is to use background knowledge to enrich the document contents. In this research, we adopt a language modeling approach to text clustering and propose smoothing the document language models with Wikipedia articles to improve clustering performance. The contents of Wikipedia articles and their assigned categories are used in three different ways to smooth the document language models, with the goal of enriching the document contents. Clustering is then performed on a document similarity graph constructed from the enhanced document collection. Experimental results confirm the effectiveness of the proposed methods.
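
The general idea of smoothing a document language model with background text can be illustrated with a minimal sketch. The abstract does not state which smoothing formula or similarity measure the authors use, nor what the three Wikipedia-based variants are, so everything below is an assumption made for illustration: unigram models, linear interpolation with a hypothetical weight `lam`, cosine similarity as a stand-in for the edge weights of the similarity graph, and a toy string standing in for a Wikipedia article.

```python
from collections import Counter
from math import sqrt


def unigram_model(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def smooth(doc_model, background_model, lam=0.5):
    """Linearly interpolate a document model with a background model.

    `lam` is a hypothetical mixing weight, not a value from the paper.
    """
    vocab = set(doc_model) | set(background_model)
    return {
        word: (1 - lam) * doc_model.get(word, 0.0)
        + lam * background_model.get(word, 0.0)
        for word in vocab
    }


def cosine(p, q):
    """Cosine similarity between two sparse word -> weight mappings."""
    dot = sum(weight * q.get(word, 0.0) for word, weight in p.items())
    norm_p = sqrt(sum(weight * weight for weight in p.values()))
    norm_q = sqrt(sum(weight * weight for weight in q.values()))
    if norm_p == 0.0 or norm_q == 0.0:
        return 0.0
    return dot / (norm_p * norm_q)


# Two toy documents on the same topic that share no words.
doc_a = unigram_model("car engine repair".split())
doc_b = unigram_model("automobile motor maintenance".split())

# Toy stand-in for the text of a related Wikipedia article (assumption).
wiki = unigram_model("car automobile engine motor vehicle".split())

raw_similarity = cosine(doc_a, doc_b)
smoothed_a = smooth(doc_a, wiki)
smoothed_b = smooth(doc_b, wiki)
smoothed_similarity = cosine(smoothed_a, smoothed_b)
```

In this toy setting the raw documents have zero similarity because their vocabularies are disjoint, while the smoothed models overlap through the shared background words. Pairwise similarities of this kind could then serve as edge weights in a document similarity graph.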

Year	2011
DOI	10.1007/978-3-642-25631-8_30
Venue	AIRS
Volume	7097
ISSN	0302-9743
DocType	Conference
Keywords	different way, words model, document language model, language modeling approach, document similarity graph, different set, text clustering, wikipedia article, wikipedia-based smoothing, enhanced document collection, document content, wikipedia, smoothing, language models
Field	Bag-of-words model, Data mining, Fuzzy clustering, Computer science, Document clustering, Explicit semantic analysis, Artificial intelligence, Natural language processing, Cluster analysis, Language model, Information retrieval, Smoothing, Brown clustering
Citations	0
PageRank	0.34
References	13
Authors	2

Authors (2 rows)

Name	Order	Citations	PageRank
Elahe Rahimtoroghi	1	12	2.64
Azadeh Shakery	2	342	36.50
