Abstract |
---|
In this paper we present an approach for the effective construction of domain-specific thesauri. We assume that the collection is partitioned into document categories. By taking advantage of these pre-defined categories, we are able to conceptualize a new topical language model to weight term topicality more accurately. With the help of information theory, interesting relationships among thesaurus elements are discovered deductively. Based on the "Layer-Seeds" clustering algorithm, topical terms from documents in a certain category are organized according to their relationships in a tree-like hierarchical structure: a thesaurus. Experimental results show that the thesaurus contains satisfactory structures, although it differs to some extent from a manually created thesaurus. A first evaluation of the thesaurus in a query expansion task yields evidence that an increase in recall can be achieved without loss of precision. |
Year | DOI | Venue |
---|---|---|
2004 | 10.1007/978-3-540-27779-8_21 | Lecture Notes in Computer Science |
Keywords | Field | DocType |
information theory, language model, query expansion | Information system, Information theory, Query expansion, Computer science, Natural language, Natural language processing, Artificial intelligence, Cluster analysis, Pointwise mutual information, Language model | Conference |
Volume | ISSN | Citations |
3136 | 0302-9743 | 3 |
PageRank | References | Authors |
0.52 | 6 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Libo Chen | 1 | 12 | 3.12 |
Ulrich Thiel | 2 | 230 | 30.13 |