Title
Strategies for Short Text Representation in the Word Vector Space
Abstract
Short texts are present in many computer systems. Examples include social media messages, advertisements, Q&A websites, and an increasing number of other applications. They are characterized by few context words and a large vocabulary. As a consequence, traditional short text representations, such as TF and TF-IDF, have high dimensionality and are very sparse. Research on word vectors has produced word representations that are discriminative with respect to semantics and that can be algebraically composed to create vector representations for paragraphs and documents. The literature reports limitations of this approach, which motivated the alternative Paragraph Vector method. First, we investigate whether these limitations of word vector operations also hold for short texts. Then, we propose a novel representation method based on the Particle Swarm Optimization (PSO) meta-heuristic. Results in a document classification task are competitive with TF-IDF and show significant improvement over Paragraph Vector, with the advantage of a dense and compact document vector representation.
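The abstract contrasts sparse TF-IDF vectors, whose dimensionality equals the vocabulary size, with dense document vectors built by algebraically composing word vectors. The sketch below illustrates that contrast under stated assumptions: averaging is used as the composition operator, the embeddings are toy 3-dimensional vectors, and the names `tfidf_vectors`, `mean_word_vector`, `DOCS` and `EMBEDDINGS` are hypothetical. It is not the PSO-based method proposed in the paper, whose details the abstract does not give.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors: one dimension per vocabulary term (raw TF, log IDF)."""
    vocab = sorted({w for d in docs for w in d})
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))  # document frequency per term
    idf = {w: math.log(n / df[w]) for w in vocab}
    vecs = []
    for d in docs:
        tf = Counter(d)
        vecs.append([tf[w] * idf[w] for w in vocab])
    return vocab, vecs

def mean_word_vector(doc, embeddings):
    """Dense document vector: average of the vectors of tokens that have one."""
    known = [embeddings[w] for w in doc if w in embeddings]
    if not known:
        raise ValueError("no token of the document has a word vector")
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

# Hypothetical toy corpus of short texts (already tokenized).
DOCS = [
    ["cheap", "flights", "today"],
    ["cheap", "hotels"],
    ["flights", "delayed", "today"],
]
# Hypothetical 3-dimensional word vectors; "delayed" is deliberately missing.
EMBEDDINGS = {
    "cheap": [1.0, 0.0, 0.0],
    "flights": [0.0, 1.0, 0.0],
    "hotels": [0.0, 0.0, 1.0],
    "today": [0.5, 0.5, 0.0],
}

if __name__ == "__main__":
    vocab, vecs = tfidf_vectors(DOCS)
    print(len(vocab), vecs[1])                    # 5 dimensions, mostly zeros
    print(mean_word_vector(DOCS[0], EMBEDDINGS))  # 3 dimensions, dense
```

The TF-IDF vectors grow with every new term in the corpus, while the averaged vectors keep the fixed embedding dimension regardless of vocabulary size, which is the "dense and compact" property the abstract refers to.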
Year: 2018
DOI: 10.1109/BRACIS.2018.00053
Venue: 2018 7th Brazilian Conference on Intelligent Systems (BRACIS)
Keywords: Short Text, Text Representation, Word Vectors, Document Vectors, Natural Language Processing, Machine Learning
Field: Document classification, Vector space, Task analysis, Computer science, Paragraph, Natural language processing, Artificial intelligence, Artificial neural network, Discriminative model, Vocabulary, Semantics
DocType: Conference
ISBN: 978-1-5386-8024-7
Citations: 0
PageRank: 0.34
References: 6
Authors: 2
Name | Order | Citations | PageRank
Marcelo Pita | 1 | 17 | 1.82
Gisele L. Pappa | 2 | 347 | 36.97