Abstract | ||
---|---|---|
We propose a novel graph-based approach for constructing a concept hierarchy from a large text corpus. Our algorithm incorporates both statistical co-occurrence and lexical similarity in optimizing the structure of the taxonomy. To automatically generate topic-dependent taxonomies from a large text corpus, we first extract topical terms and their relationships from the corpus. The algorithm then constructs a weighted graph representing topics and their associations. A graph partitioning algorithm is then used to recursively partition the topic graph into a taxonomy. For evaluation, we apply our approach to articles, primarily in computer science, from the CiteSeerX digital library and search engine. |
Year | DOI | Venue |
---|---|---|
2015 | 10.1145/2756406.2756967 | ACM/IEEE Joint Conference on Digital Libraries |
Field | DocType | Citations |
Lexical similarity, Search engine, Computer science, Text corpus, Theoretical computer science, Artificial intelligence, Natural language processing, Digital library, Partition (number theory), Graph partition, Recursion, Graph (abstract data type) | Conference | 0 |
PageRank | References | Authors |
0.34 | 3 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Pucktada Treeratpituk | 1 | 177 | 11.12 |
Madian Khabsa | 2 | 237 | 18.81 |
C. Lee Giles | 3 | 11154 | 1549.48 |
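
The pipeline outlined in the abstract (extract topical terms, build a weighted topic graph, recursively partition it) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the weighting scheme, or the partitioning algorithm, so everything here is an assumption: `lexical_similarity` uses Jaccard word overlap, `alpha` is a hypothetical mixing weight, `min_size` is a hypothetical stopping size, and the partitioner is a simple stand-in that removes the lightest edges until the graph disconnects. It is not the authors' method.

```python
from itertools import combinations


def lexical_similarity(a, b):
    """Jaccard overlap of the word sets of two terms (an assumed measure)."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)


def build_topic_graph(docs, alpha=0.5):
    """Weight term pairs by co-occurrence mixed with lexical similarity.

    docs: list of sets of topical terms, one set per document.
    alpha: hypothetical mixing weight, not a value from the paper.
    """
    cooc = {}
    for terms in docs:
        for pair in combinations(sorted(terms), 2):
            cooc[pair] = cooc.get(pair, 0) + 1
    max_cooc = max(cooc.values(), default=1)
    vocab = sorted(set().union(*docs))
    graph = {}
    for a, b in combinations(vocab, 2):
        w = (alpha * cooc.get((a, b), 0) / max_cooc
             + (1 - alpha) * lexical_similarity(a, b))
        if w > 0:
            graph[(a, b)] = w
    return graph


def components(nodes, edges):
    """Connected components of `nodes` under the given edge list."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, parts = set(), []
    for n in sorted(nodes):
        if n in seen:
            continue
        stack, part = [n], set()
        while stack:
            x = stack.pop()
            if x not in part:
                part.add(x)
                stack.extend(adj[x] - part)
        seen |= part
        parts.append(part)
    return parts


def partition(nodes, graph):
    """Stand-in partitioner: drop the lightest edges until the graph splits."""
    weighted = sorted((w, e) for e, w in graph.items()
                      if e[0] in nodes and e[1] in nodes)
    kept = [e for _, e in weighted]
    while True:
        parts = components(nodes, kept)
        if len(parts) > 1 or not kept:
            return parts
        kept.pop(0)  # remove the lightest remaining edge


def build_taxonomy(nodes, graph, min_size=2):
    """Recursively partition the topic graph into a nested-list taxonomy."""
    if len(nodes) <= min_size:
        return sorted(nodes)
    parts = partition(nodes, graph)
    if len(parts) == 1:
        return sorted(nodes)
    return [build_taxonomy(p, graph, min_size) for p in parts]


# Toy corpus: each document is reduced to its set of topical terms.
docs = [
    {"graph partitioning", "graph clustering"},
    {"graph partitioning", "graph clustering"},
    {"digital library", "search engine"},
    {"digital library", "search engine"},
    {"graph clustering", "search engine"},
]
graph = build_topic_graph(docs)
taxonomy = build_taxonomy(set().union(*docs), graph)
print(taxonomy)
```

On this toy input the weakest link (the single co-occurrence of "graph clustering" with "search engine") is cut first, so the two graph terms and the two library terms end up in separate branches. A production system would likely use a balanced partitioning criterion such as normalized cut in place of the lightest-edge heuristic.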