Lexical Comparison Between Wikipedia And Twitter Corpora By Using Word Embeddings - Citegraph

Paper Info

Title
Lexical Comparison Between Wikipedia And Twitter Corpora By Using Word Embeddings

Abstract
Compared with carefully edited prose, the language of social media is informal in the extreme. The application of NLP techniques in this context may require a better understanding of word usage within social media. In this paper, we compute a word embedding for a corpus of tweets, comparing it to a word embedding for Wikipedia. After learning a transformation of one vector space to the other, and adjusting similarity values according to term frequency, we identify words whose usage differs greatly between the two corpora. For any given word, the set of words closest to it in a particular embedding provides a characterization for that word's usage within the corresponding corpora.
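
The abstract's last point, that a word's nearest neighbours in an embedding characterize its usage in that corpus, can be illustrated with a minimal sketch. The two-dimensional vectors below are invented toy values, not data from the paper, and `cosine` and `nearest_words` are hypothetical helper names. The sketch assumes cosine similarity as the closeness measure. It does not cover the learned transformation between vector spaces or the term-frequency adjustment that the abstract mentions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_words(embedding, word, k=2):
    """Return the k words whose vectors are most similar to `word`."""
    target = embedding[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in embedding.items()
        if other != word
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]


# Toy embeddings (assumed values): "apple" sits near fruit words in one
# corpus and near device words in the other.
wiki = {
    "apple": [1.0, 0.1],
    "fruit": [0.9, 0.2],
    "pear": [0.8, 0.3],
    "phone": [0.1, 1.0],
    "tablet": [0.2, 0.9],
}
tweets = {
    "apple": [0.1, 1.0],
    "fruit": [1.0, 0.1],
    "pear": [0.9, 0.2],
    "phone": [0.2, 0.9],
    "tablet": [0.3, 0.8],
}

print(nearest_words(wiki, "apple"))    # neighbours in the toy Wikipedia space
print(nearest_words(tweets, "apple"))  # neighbours in the toy Twitter space
```

Comparing the two neighbour lists for the same word gives a rough, qualitative picture of how its usage differs between the corpora.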

Year	Venue	DocType
2015	PROCEEDINGS OF THE 53RD ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL) AND THE 7TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (IJCNLP), VOL 2	Conference
Volume	Citations	PageRank
P15-2	6	0.47
References	Authors
13	4

Authors (4 rows)

Name	Order	Citations	PageRank
Luchen Tan	1	55	9.04
Haotian Zhang	2	294	23.41
Charles L.A. Clarke	3	3289	286.78
Mark D. Smucker	4	948	60.04
