Title
Language Modeling for Effective Construction of Domain Specific Thesauri
Abstract
In this paper we present an approach for the effective construction of domain-specific thesauri. We assume that the document collection is partitioned into categories. By taking advantage of these predefined categories, we devise a new topical language model that weights term topicality more accurately. With the help of information theory, relationships among thesaurus elements are discovered deductively. Using the "Layer-Seeds" clustering algorithm, topical terms from the documents in a given category are organized according to their relationships into a tree-like hierarchical structure, that is, a thesaurus. Experimental results show that the thesaurus contains satisfactory structures, although it differs to some extent from a manually created thesaurus. A first evaluation of the thesaurus in a query expansion task provides evidence that recall can be increased without loss of precision.
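The abstract does not say which information-theoretic measure relates thesaurus terms, but the record's field list names pointwise mutual information. As an illustration only, the sketch below scores term pairs by document-level PMI within one category. It is not the authors' topical language model or the "Layer-Seeds" algorithm; the function name `pmi_table` and the toy documents are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(docs):
    """Document-level PMI for every term pair that co-occurs at least once.

    docs: list of token lists. Probabilities are estimated as the fraction
    of documents that contain a term (or both terms of a pair).
    """
    n = len(docs)
    term_df = Counter()  # number of documents containing each term
    pair_df = Counter()  # number of documents containing both terms
    for tokens in docs:
        terms = sorted(set(tokens))  # sorted so each pair has one canonical order
        term_df.update(terms)
        pair_df.update(combinations(terms, 2))
    return {
        (a, b): math.log2((c / n) / ((term_df[a] / n) * (term_df[b] / n)))
        for (a, b), c in pair_df.items()
    }


# Toy documents standing in for one category (hypothetical data).
docs = [
    ["query", "expansion", "recall"],
    ["query", "expansion", "precision"],
    ["language", "model", "recall"],
    ["language", "model", "query"],
]
scores = pmi_table(docs)
```

Pairs with positive PMI co-occur more often than independence would predict and are candidates for related thesaurus entries. Pairs that never co-occur are simply absent from the result.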
Year: 2004
DOI: 10.1007/978-3-540-27779-8_21
Venue: Lecture Notes in Computer Science
Keywords: information theory, language model, query expansion
Field: Information system, Information theory, Query expansion, Computer science, Natural language, Natural language processing, Artificial intelligence, Cluster analysis, Pointwise mutual information, Language model
DocType: Conference
Volume: 3136
ISSN: 0302-9743
Citations: 3
PageRank: 0.52
References: 6
Authors: 2
Name | Order | Citations | PageRank
Libo Chen | 1 | 12 | 3.12
Ulrich Thiel | 2 | 230 | 30.13