Title
Using Taxonomy, Discriminants, and Signatures for Navigating in Text Databases
Abstract
We explore how to organize a text database hierarchically to aid searching and browsing. We propose to exploit the natural hierarchy of topics, or taxonomy, that many corpora, such as Internet directories, digital libraries, and patent databases, enjoy. In our system, the user navigates through the query response not as a flat unstructured list, but embedded in the familiar taxonomy, and annotated with document signatures computed dynamically with respect to where the user is located at any time. We show how to update such databases with new documents with high speed and accuracy. We use techniques from statistical pattern recognition to efficiently separate the feature words, or discriminants, from the noise words at each node of the taxonomy. Using these, we build a multi-level classifier. At each node, this classifier can ignore the large number of noise words in a document. Thus the classifier has a small model size and is very fast. However, owing to the use of context-sensitive features, the classifier is also very accurate. We report on experiences with the Reuters newswire benchmark, the US Patent database, and web document samples from Yahoo!.
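The idea of choosing discriminants separately at each taxonomy node and then classifying top-down can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract only says "techniques from statistical pattern recognition", so the Fisher-style variance ratio, the naive Bayes scorer, the toy taxonomy, and all names (`fisher_scores`, `NodeClassifier`, `train`, `classify`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def term_freqs(doc):
    """Relative term frequencies of a whitespace-tokenized document."""
    tokens = doc.lower().split()
    return {t: c / max(len(tokens), 1) for t, c in Counter(tokens).items()}

def fisher_scores(docs_by_class):
    """Assumed score: between-class over within-class variance of term frequency."""
    freqs = {c: [term_freqs(d) for d in docs] for c, docs in docs_by_class.items()}
    vocab = {t for fs in freqs.values() for f in fs for t in f}
    classes = sorted(freqs)
    scores = {}
    for t in vocab:
        mean = {c: sum(f.get(t, 0.0) for f in freqs[c]) / len(freqs[c]) for c in classes}
        between = sum((mean[a] - mean[b]) ** 2
                      for i, a in enumerate(classes) for b in classes[i + 1:])
        within = sum(sum((f.get(t, 0.0) - mean[c]) ** 2 for f in freqs[c]) / len(freqs[c])
                     for c in classes)
        scores[t] = between / (within + 1e-9)
    return scores

def leaf_docs(node):
    """All documents stored below a taxonomy node (leaves are lists of documents)."""
    if isinstance(node, list):
        return list(node)
    return [d for child in node.values() for d in leaf_docs(child)]

class NodeClassifier:
    """Naive Bayes over the top-k discriminants of one internal node."""
    def __init__(self, children, k):
        docs = {name: leaf_docs(sub) for name, sub in children.items()}
        scores = fisher_scores(docs)
        ranked = sorted((t for t in scores if scores[t] > 0), key=lambda t: (-scores[t], t))
        self.features = set(ranked[:k])  # every other word is treated as noise here
        total = sum(len(ds) for ds in docs.values())
        self.prior, self.loglik = {}, {}
        for c, ds in docs.items():
            counts = Counter(t for d in ds for t in d.lower().split() if t in self.features)
            n = sum(counts.values())
            self.prior[c] = math.log(len(ds) / total)
            # Laplace smoothing over the node's small feature set.
            self.loglik[c] = {t: math.log((counts[t] + 1) / (n + len(self.features)))
                              for t in self.features}

    def predict(self, doc):
        tokens = [t for t in doc.lower().split() if t in self.features]
        return max(sorted(self.prior),
                   key=lambda c: self.prior[c] + sum(self.loglik[c][t] for t in tokens))

def train(taxonomy, k=8):
    """Build one classifier per internal node, keyed by its path from the root."""
    models = {}
    def walk(path, node):
        if isinstance(node, dict):
            models[path] = NodeClassifier(node, k)
            for name, sub in node.items():
                walk(path + (name,), sub)
    walk((), taxonomy)
    return models

def classify(taxonomy, models, doc):
    """Route a document from the root down to a leaf topic."""
    path, node = (), taxonomy
    while isinstance(node, dict):
        child = models[path].predict(doc)
        path, node = path + (child,), node[child]
    return path

# Toy taxonomy, invented for this sketch.
TAXONOMY = {
    "science": {
        "physics": ["the quantum particle has energy", "the energy of the particle field"],
        "biology": ["the cell has a gene", "the gene of the protein cell"],
    },
    "sports": {
        "soccer": ["the team scored a goal", "the goal of the team match"],
        "tennis": ["the player won a serve", "the serve of the player match"],
    },
}
models = train(TAXONOMY)
print(classify(TAXONOMY, models, "the particle has quantum energy"))
```

In this toy data the word "has" separates science from sports at the root but carries no signal between physics and biology, so it is a discriminant at one node and a noise word at the next. That is the sense in which the features are context-sensitive, and it is also why each node's model can stay small.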
Year
1997
Venue
VLDB
Keywords
text databases, digital library
Field
Data mining, Web document, Information retrieval, Computer science, Exploit, Digital library, Classifier (linguistics), Hierarchy, Database
DocType
Conference
ISBN
1-55860-470-7
Citations
84
PageRank
116.05
References
19
Authors
4
Name                Order  Citations  PageRank
S. Chakrabarti      1      4703       999.55
Byron Dom           2      2600       825.93
Rakesh Agrawal      3      29751      5959.33
Prabhakar Raghavan  4      13351      2776.61