Title
Classifying Extremely Short Texts by Exploiting Semantic Centroids in Word Mover's Distance Space
Abstract
Automatically classifying extremely short texts, such as social media posts and web page titles, plays an important role in a wide range of content analysis applications. However, traditional classifiers based on bag-of-words (BoW) representations often fail in this task. The underlying reason is that document similarity cannot be accurately measured under BoW representations due to the extreme sparseness of short texts. This makes it difficult to capture the generality of short texts. To address this problem, we use a better regularized word mover's distance (RWMD), which can measure distances among short texts at the semantic level. We then propose an RWMD-based centroid classifier for short texts, named RWMD-CC. RWMD-CC computes a representative semantic centroid for each category under the RWMD measure and classifies a test document by finding its closest semantic centroid. Testing is much more efficient than with the prior art, a k-nearest-neighbor classifier based on WMD. Experimental results indicate that RWMD-CC can achieve very competitive classification performance on extremely short texts.
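The centroid idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It makes several assumptions that do not come from the paper: a hypothetical five-word vocabulary with made-up 2-D embeddings, an entropy-regularized (Sinkhorn) transport cost standing in for RWMD, and a plain average of word histograms standing in for the learned semantic centroid. The helper names `sinkhorn_distance`, `fit_centroids`, and `predict` are illustrative only.

```python
import numpy as np

# Hypothetical toy vocabulary and 2-D word embeddings (assumed, not from the paper).
vocab = ["goal", "match", "stock", "market", "score"]
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [0.95, 0.05]])

# Ground cost: Euclidean distance between every pair of word embeddings.
cost = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)


def sinkhorn_distance(a, b, cost, reg=0.1, n_iter=200):
    """Entropy-regularized transport cost between word histograms a and b."""
    K = np.exp(-cost / reg)          # Gibbs kernel, strictly positive
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)            # scale columns toward marginal b
        u = a / (K @ v)              # scale rows toward marginal a
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost))


def fit_centroids(histograms, labels):
    """One centroid per category: the mean histogram (a simple stand-in)."""
    labels = np.array(labels)
    return {c: histograms[labels == c].mean(axis=0) for c in sorted(set(labels))}


def predict(doc, centroids, cost):
    """Assign the category whose centroid is closest in transport cost."""
    return min(centroids, key=lambda c: sinkhorn_distance(doc, centroids[c], cost))


# Four one-word training texts, two per category (normalized histograms).
train = np.array([
    [1.0, 0.0, 0.0, 0.0, 0.0],   # "goal"
    [0.0, 1.0, 0.0, 0.0, 0.0],   # "match"
    [0.0, 0.0, 1.0, 0.0, 0.0],   # "stock"
    [0.0, 0.0, 0.0, 1.0, 0.0],   # "market"
])
train_labels = ["sports", "sports", "finance", "finance"]
centroids = fit_centroids(train, train_labels)

# "score" shares no word with any training text, so BoW overlap is zero,
# yet its embedding lies near the sports words.
test_doc = np.array([0.0, 0.0, 0.0, 0.0, 1.0])
prediction = predict(test_doc, centroids, cost)
print(prediction)
```

At test time this sketch computes one distance per category rather than one per training document, which is the efficiency contrast with a k-nearest-neighbor classifier that the abstract draws.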
Year
2019
DOI
10.1145/3308558.3313397
Venue
WWW '19: The Web Conference on The World Wide Web Conference WWW 2019
Keywords
Extremely Short Texts, Hypothesis Margin, Regularized Word Mover's Distance, Semantic Centroid
Field
k-nearest neighbors algorithm, Content analysis, Social media, Web page, Computer science, Artificial intelligence, Natural language processing, Classifier (linguistics), Document similarity, Generality, Machine learning, Centroid
DocType
Conference
ISBN
978-1-4503-6674-8
Citations
0
PageRank
0.34
References
0
Authors
3
Name, Order, Citations, PageRank
Changchun Li, 1, 10, 2.66
Jihong OuYang, 2, 94, 15.66
Ximing Li, 3, 44, 13.97