Entry Pairing in Inverted File - Citegraph

Paper Info

Title
Entry Pairing in Inverted File

Abstract
This paper proposes to exploit content and usage information to rearrange an inverted index for a full-text IR system. The idea is to merge the entries of two terms that frequently co-occur, either in the collection or in the answered queries, to form a single paired entry. Since postings common to paired terms are not replicated, the resulting index is more compact. In addition, queries containing terms that have been paired are answered faster since we can exploit the pre-computed posting intersection. To choose which terms to pair, we formulate the term pairing problem as a Maximum-Weight Matching Graph problem, and we evaluate the efficiency and efficacy of both an exact and a heuristic solution in our scenario. We apply our technique (i) to compact a compressed inverted file built on an actual Web collection of documents, and (ii) to increase the capacity of an in-memory posting list. Experiments showed that in the first case our approach can improve the compression ratio by up to 7.7%, while in the second we measured a saving of 12% to 18% in the size of the posting cache.
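
The pairing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (pair_entry, postings_for, greedy_pairing), the toy index, and the choice of "number of shared postings" as the edge weight are all assumptions made for illustration, and the greedy selection is only one simple stand-in for a heuristic maximum-weight matching.

```python
from itertools import combinations


def pair_entry(postings_a, postings_b):
    """Merge two posting lists into one paired entry with three disjoint parts.

    Postings shared by both terms are stored once, in "both".
    """
    a, b = set(postings_a), set(postings_b)
    return {
        "both": sorted(a & b),
        "only_a": sorted(a - b),
        "only_b": sorted(b - a),
    }


def postings_for(entry, which):
    """Rebuild the full posting list of one term ('a' or 'b') from a paired entry."""
    return sorted(entry["both"] + entry["only_" + which])


def greedy_pairing(index):
    """Hypothetical greedy matching: take term pairs by decreasing shared postings.

    Each term ends up in at most one pair, as in a graph matching.
    """
    weights = []
    for t, u in combinations(sorted(index), 2):
        shared = len(set(index[t]) & set(index[u]))
        if shared > 0:
            weights.append((shared, t, u))
    # Heaviest edges first; ties are broken alphabetically for determinism.
    weights.sort(key=lambda w: (-w[0], w[1], w[2]))

    matched, pairs = set(), []
    for _, t, u in weights:
        if t not in matched and u not in matched:
            pairs.append((t, u))
            matched.update((t, u))
    return pairs


# Toy index (assumed data): term -> sorted list of document ids.
index = {
    "new": [1, 2, 3, 5, 8],
    "york": [2, 3, 5, 9],
    "city": [3, 4, 9],
    "hall": [4, 7],
}

pairs = greedy_pairing(index)
entry = pair_entry(index["new"], index["york"])
stored = sum(len(part) for part in entry.values())
original = len(index["new"]) + len(index["york"])
```

In this toy case the paired entry for "new" and "york" stores 6 postings instead of 9, and a conjunctive query on both terms can read the "both" part directly rather than intersecting two lists at query time.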

Year	DOI	Venue
2009	10.1007/978-3-642-04409-0_50	WISE
Keywords	Field	DocType
scenario efficiency,full-text ir system,inverted file,resulting index,compression ratio,maximum-weight matching graph problem,heuristic solution,co-occurring term,actual web collection,entry pairing,inverted index,indexation	Inverted index,Graph problem,Data mining,Heuristic,Computer science,Cache,Theoretical computer science,Pairing,Exploit,Compression ratio,Merge (version control),Database	Conference
Volume	ISSN	Citations
5802	0302-9743	6
PageRank	References	Authors
0.44	17	4

Authors (4 rows)

Name	Order	Citations	PageRank
Hoang Thanh Lam	1	108	8.49
Raffaele Perego	2	1471	108.91
Nguyen Thoi Quan	3	6	0.44
Fabrizio Silvestri	4	1819	107.29
