Title
Predicate-based indexing for desktop search
Abstract
Google and other products have revolutionized the way we search for information. There are, however, still a number of research challenges. One challenge that arises specifically in desktop search is to exploit the structure and semantics of documents, as defined by the application program that generated the data (e.g., Word, Excel, or Outlook). The current generation of search products does not understand these structures and therefore often returns wrong results. This paper shows how today's search technology can be extended to take the specific semantics of certain structures into account. The key idea is to extend inverted file index structures with predicates that encode the circumstances under which certain keywords of a document become visible to a user. This paper provides a framework for expressing the semantics of structures in documents, along with algorithms for constructing enhanced, predicate-based indexes. Furthermore, this paper shows how keyword and phrase queries can be processed efficiently on such enhanced indexes. It is shown that the proposed approach has superior retrieval performance with regard to both recall and precision, and that its space and query running time overheads are tolerable.
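The key idea in the abstract, postings that carry a predicate describing when a keyword is visible, can be illustrated with a minimal sketch. The abstract does not specify the paper's predicate language or index layout, so everything below is an assumption made for illustration: the class `PredicateIndex`, the helper `consistent`, the encoding of a predicate as a set of `(variable, value)` pairs, and the example document with a tracked-changes "view" variable are all hypothetical.

```python
from collections import defaultdict
from itertools import product

ALWAYS = frozenset()  # empty predicate: the keyword is visible in every view


def consistent(*predicates):
    """Merge predicates; return None if two bind the same variable differently."""
    merged = {}
    for pred in predicates:
        for var, value in pred:
            if merged.setdefault(var, value) != value:
                return None
    return frozenset(merged.items())


class PredicateIndex:
    """Toy inverted index whose postings carry visibility predicates (hypothetical)."""

    def __init__(self):
        # term -> doc_id -> list of predicates under which the term is visible
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, term, predicate=ALWAYS):
        self.postings[term.lower()][doc_id].append(frozenset(predicate))

    def search(self, *terms):
        """Return doc ids in which all terms are visible under one consistent predicate."""
        terms = [t.lower() for t in terms]
        if not terms or any(t not in self.postings for t in terms):
            return []
        candidates = set.intersection(*(set(self.postings[t]) for t in terms))
        hits = []
        for doc_id in sorted(candidates):
            choices = (self.postings[t][doc_id] for t in terms)
            if any(consistent(*combo) is not None for combo in product(*choices)):
                hits.append(doc_id)
        return hits


# Assumed example: a document with a tracked change, where "cheap" was
# replaced by "budget", so the two words are never visible at the same time.
idx = PredicateIndex()
idx.add("report.doc", "cheap", {("view", "original")})
idx.add("report.doc", "budget", {("view", "final")})
idx.add("report.doc", "flights")  # visible in both views

print(idx.search("cheap", "flights"))
print(idx.search("cheap", "budget"))
```

In this sketch a plain inverted index would report `report.doc` for the query "cheap budget", because both words occur somewhere in the file. The predicate check rejects it, since no single view shows both words. That is the kind of precision gain the abstract describes, although the paper's actual algorithms may differ substantially.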
Year: 2010
DOI: 10.1007/s00778-010-0187-5
Venue: The VLDB Journal
Keywords: Search, Application data, Desktop search, Information retrieval, Databases
Field: Inverted index, Desktop search, File format, Data mining, Information retrieval, Phrase search, Computer science, Precision and recall, Document Structure Description, Search engine indexing, Semantics, Database
DocType: Journal
Volume: 19
Issue: 5
ISSN: 1066-8888
Citations: 2
PageRank: 0.39
References: 31
Authors (3)
Name: Cristian Duda; Order: 1; Citations: 24; PageRank: 3.27
Name: Donald Kossmann; Order: 2; Citations: 6220; PageRank: 603.55
Name: Chong Zhou; Order: 3; Citations: 59; PageRank: 8.57