Lexical co-occurrence, statistical significance, and word association - Citegraph

Paper Info

Title
Lexical co-occurrence, statistical significance, and word association

Abstract
Lexical co-occurrence is an important cue for detecting word associations. We propose a new measure of word association based on a new notion of statistical significance for lexical co-occurrences. Existing measures typically rely on global unigram frequencies to determine expected co-occurrence counts. Instead, we focus only on documents that contain both terms of a candidate word pair and ask whether the distribution of the observed spans of the pair resembles the distribution under a random null model. If it does, this suggests that the words in the pair are not related strongly enough for one word to influence the placement of the other. If, however, the words occur closer together than the null model can explain, we hypothesize a more direct association between them. Through extensive empirical evaluation on most of the publicly available benchmark data sets, we show the advantages of our measure over existing co-occurrence measures.
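The core test in the abstract can be illustrated with a small Python sketch. It is not the paper's measure: the choice of the minimal span as the test statistic, the Monte Carlo null model that re-places each word's occurrences at uniformly random positions within the same document, the span threshold `t`, and the helper names (`positions`, `min_span`, `null_prob_within`, `span_ratio`) are all hypothetical choices made for illustration. The sketch keeps only documents that contain both words, counts how many of them have the pair within `t` tokens, and divides that count by the number the null model predicts. A ratio well above 1 suggests that the words occur closer together than chance alone explains.

```python
import random
from typing import Iterable, List, Sequence


def positions(tokens: Sequence[str], word: str) -> List[int]:
    """Token indices at which `word` occurs in one tokenized document."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def min_span(xs: Sequence[int], ys: Sequence[int]) -> int:
    """Smallest distance between any occurrence of x and any occurrence of y."""
    return min(abs(i - j) for i in xs for j in ys)


def null_prob_within(doc_len: int, m: int, n: int, t: int,
                     trials: int, rng: random.Random) -> float:
    """Monte Carlo estimate of P(min span <= t) under an ASSUMED null model:
    the m occurrences of x and n occurrences of y are placed at uniformly
    random distinct positions in a document of doc_len tokens."""
    hits = 0
    for _ in range(trials):
        slots = rng.sample(range(doc_len), m + n)
        if min_span(slots[:m], slots[m:]) <= t:
            hits += 1
    return hits / trials


def span_ratio(docs: Iterable[Sequence[str]], x: str, y: str, t: int,
               trials: int = 2000, seed: int = 0) -> float:
    """Hypothetical observed/expected ratio of documents whose x-y span is <= t.
    Assumes x != y (repeated words are not handled by this sketch)."""
    rng = random.Random(seed)
    observed = 0
    expected = 0.0
    for tokens in docs:
        xs, ys = positions(tokens, x), positions(tokens, y)
        if not xs or not ys:
            continue  # only documents containing both words are considered
        if min_span(xs, ys) <= t:
            observed += 1
        # Expected contribution of this document under the random null model.
        expected += null_prob_within(len(tokens), len(xs), len(ys), t, trials, rng)
    return observed / expected if expected > 0 else float("nan")
```

As a sanity check on the null model: when each word occurs once in a document of N tokens, the probability of a span of at most t has the closed form sum over d = 1..t of 2(N - d) / (N(N - 1)). For N = 50 and t = 1 this is 2/50 = 0.04, which the Monte Carlo estimate should approach as `trials` grows.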

Year: 2010
Venue: Empirical Methods in Natural Language Processing (EMNLP)
Keywords: lexical co-occurrences, new measure, candidate word-pair, co-occurrence count, co-occurrence measure, direct association, new notion, statistical significance, null model, word association, lexical co-occurrence
Doc type: Journal
Volume: abs/1008.5287
Citations: 9
PageRank: 0.52
References: 15
Authors: 3

Authors

Name	Author order	Citations (author total)	PageRank (author)
Dipak L. Chaudhari	1	15	2.39
Om P. Damani	2	213	25.79
Srivatsan Laxman	3	421	21.65