Title
Mining Significant Associations in Large Scale Text Corpora
Abstract
Mining large-scale text corpora is an essential step in extracting the key themes in a corpus. We motivate a quantitative measure for significant associations through the distributions of pairs and triplets of co-occurring words. We consider the algorithmic problem of efficiently enumerating such significant associations and present pruning algorithms for these problems, with theoretical as well as empirical analyses. Our algorithms make use of two novel mining methods: (1) matrix mining, and (2) shortened documents. We present evidence from a diverse set of documents that our measure does in fact elicit interesting co-occurrences.
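The abstract does not define the significance measure or the matrix-mining and shortened-documents algorithms. As a rough illustration of the general setting only, here is a minimal Python sketch of counting document-level word-pair co-occurrences with a simple support-based pruning step. It rests on several assumptions. Documents are pre-tokenized lists. A pair "co-occurs" when both words appear in the same document. Significance is approximated by a generic observed-to-expected ratio under independence. The function `significant_pairs`, its parameters `min_df` and `min_ratio`, and the ratio itself are hypothetical stand-ins, not the measure or algorithms of the paper.

```python
from collections import Counter
from itertools import combinations


def significant_pairs(docs, min_df=2, min_ratio=1.5):
    """Hypothetical stand-in: score word pairs by observed / expected
    document co-occurrence. This is NOT the paper's significance measure."""
    n = len(docs)
    # Treat each document as a set of distinct words.
    doc_sets = [set(d) for d in docs]
    # Document frequency of each word.
    df = Counter(w for s in doc_sets for w in s)
    # Pruning: a pair co-occurs in at most min(df[a], df[b]) documents,
    # so a word rarer than min_df cannot belong to a pair with support >= min_df.
    frequent = {w for w, c in df.items() if c >= min_df}
    pair_df = Counter()
    for s in doc_sets:
        # Sorting keeps each pair in a canonical (a, b) order with a < b.
        pair_df.update(combinations(sorted(s & frequent), 2))
    result = {}
    for (a, b), c in pair_df.items():
        if c < min_df:
            continue  # not enough support to be reported
        expected = df[a] * df[b] / n  # expected co-occurrences if independent
        ratio = c / expected
        if ratio >= min_ratio:
            result[(a, b)] = ratio
    return result


docs = [
    ["data", "mining"],
    ["data", "mining"],
    ["text", "corpus"],
    ["text", "corpus"],
    ["data", "text"],
]
print(significant_pairs(docs))
```

In this toy corpus, the pairs ("data", "mining") and ("corpus", "text") each co-occur in two of five documents, against an expected 1.2 under independence, so both are reported. The pair ("data", "text") co-occurs only once and is dropped by the support threshold.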
Year
2002
DOI
10.1109/ICDM.2002.1183933
Venue
ICDM
Keywords
present evidence,distributions of pairs,significant association,co-occurring word,large scale text corpora,enumerating such significant associations,essential step,present pruning algorithms for,mining significant associations,algorithmic problem,matrix mining,diverse set,statistical distributions,databases,algorithm design and analysis,computer science,text analysis,text mining,association rules,data mining
Field
Data mining,Data stream mining,Text mining,Concept mining,Computer science,Molecule mining,Text corpus
DocType
Conference
ISBN
0-7695-1754-4
Citations
1
PageRank
0.35
References
16
Authors
2
Name | Order | Citations | PageRank
Prabhakar Raghavan | 1 | 13351 | 2776.61
Panayiotis Tsaparas | 2 | 1286 | 72.59