Title
Supporting sub-document updates and queries in an inverted index
Abstract
Inverted indexes have become the standard indexing method for supporting search queries in a variety of content-based applications. Examples of such applications include enterprise document management, e-mail, web search, and social networks. One shortcoming in current inverted index designs is that they support only document-level updates, forcing a full document to be reindexed even if just part of it changes. This paper describes a new inverted index design that enables applications to break a document into semantically meaningful sub-documents or "sections". Each section of a document can be updated separately, but search queries can still work seamlessly across sections. Our index design is motivated by applications where there is metadata associated with each document that tends to be smaller and more frequently updated than the document's content, but at the same time, it is desirable to search the metadata and content with the same index structure. A novel self-optimizing query execution algorithm is described to efficiently join the sections of a document in the inverted index. Experimental results on TREC and patent data are provided, showing that sections can dramatically improve overall system throughput on a mixed workload of updates and queries.
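The idea of updating one section while querying across all of them can be illustrated with a minimal sketch. This is not the paper's index layout or its self-optimizing join algorithm. It is a toy in-memory index, and the class name `SectionedIndex`, its methods, and the section names "content" and "metadata" are all hypothetical choices made for illustration.

```python
from collections import defaultdict


class SectionedIndex:
    """Toy inverted index keyed by (section, term); an illustrative assumption,
    not the design described in the paper."""

    def __init__(self):
        # postings[(section, term)] -> ids of docs containing term in that section
        self.postings = defaultdict(set)
        # terms[(doc_id, section)] -> terms currently indexed for that section
        self.terms = {}

    def update_section(self, doc_id, section, text):
        """Reindex one section of a document; its other sections are untouched."""
        key = (doc_id, section)
        # Remove the postings left by the previous version of this section.
        for term in self.terms.get(key, set()):
            self.postings[(section, term)].discard(doc_id)
        new_terms = set(text.lower().split())
        for term in new_terms:
            self.postings[(section, term)].add(doc_id)
        self.terms[key] = new_terms

    def search(self, query, sections):
        """Return sorted ids of docs where every query term occurs in at
        least one of the given sections (a conjunctive query across sections)."""
        result = None
        for term in query.lower().split():
            docs = set()
            for section in sections:
                docs |= self.postings.get((section, term), set())
            result = docs if result is None else result & docs
        return sorted(result or [])


index = SectionedIndex()
index.update_section(1, "content", "inverted index design")
index.update_section(1, "metadata", "status draft")
index.update_section(2, "content", "inverted index survey")
index.update_section(2, "metadata", "status final")

both = ["content", "metadata"]
before = index.search("index draft", both)

# Only the small metadata section of document 1 is reindexed here.
index.update_section(1, "metadata", "status final")
after_draft = index.search("index draft", both)
after_final = index.search("index final", both)
```

In this sketch a metadata change costs work proportional to the size of the metadata section only, while a query such as "index final" still matches terms drawn from different sections of the same document.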
Year
2008
DOI
10.1145/1458082.1458171
Venue
CIKM
Keywords
document-level updates, full document, index design, current inverted index design, search query, new inverted index design, index structure, sub-document updates, web search, enterprise document management, inverted index, section, zig zag, indexation, document management, social network
Field
Inverted index, Data mining, Metadata, Information retrieval, Document management system, Workload, Computer science, Search engine indexing, Throughput, Document retrieval, Design Document Listing, Database
DocType
Conference
Citations
1
PageRank
0.36
References
19
Authors
5
Name
Order
Citations
PageRank
Vuk Ercegovac159234.79
Vanja Josifovski22265148.84
Ning Li310.36
Mauricio R. Mediano410.36
Eugene J. Shekita53630574.21