iVA-File: Efficiently Indexing Sparse Wide Tables in Community Systems - Citegraph

Paper Info

Title
iVA-File: Efficiently Indexing Sparse Wide Tables in Community Systems

Abstract
In community web management systems (CWMS), storage structures inspired by universal tables are being used increasingly to manage sparse datasets. Such a sparse wide table (SWT) typically embodies thousands of attributes, with many of them being undefined in each tuple, and low-dimensional structured similarity search on a combination of numerical and text attributes is a common operation. However, many properties of such wide tables and their associated Web 2.0 services render most multi-dimensional indexing structures irrelevant. Recent studies in this area have mainly focused on improving the storage efficiency and efficient deployment of inverted indices; so far no new index has been proposed for indexing SWTs. The inverted index is fast for scanning but not efficient in reducing random accesses to the data file as it captures little information about the content of attribute values. In this paper, we propose the iVA-file that works on the basis of approximate contents and keeps scanning efficiency within a bounded range. We introduce the nG-signature to approximately represent data strings and improve the existing approximate vectors for numerical values. We also propose an efficient query processing strategy for the iVA-file, which is different from strategies used for existing scan-based indices. To enable the use of different metrics of distance between a query and a tuple that may vary from application to application, the iVA-file has been designed to be metric-oblivious and to provide efficient filter-and-refine search based on any rational metric. Extensive experiments on real datasets show that the iVA-file significantly outperforms existing proposals in query efficiency while keeping a good update speed.
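The filter-and-refine idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the iVA-file itself: it is a generic approximation-vector scan over numeric attributes only, and the value range, bit width, L1 metric, missing-attribute penalty, and all function names below are assumptions made for illustration.

```python
import heapq

# All constants are illustrative assumptions, not values from the paper.
LO, HI, BITS = 0.0, 100.0, 3            # assumed value range and bits per value
CELLS = 1 << BITS
WIDTH = (HI - LO) / CELLS
MISSING_PENALTY = HI - LO               # assumed cost of an undefined attribute


def quantize(value):
    """Map an exact value in [LO, HI] to the index of its cell."""
    return min(int((value - LO) / WIDTH), CELLS - 1)


def build_index(table):
    """Keep only the cell index of every defined attribute of every tuple."""
    return {tid: {a: quantize(v) for a, v in row.items()}
            for tid, row in table.items()}


def lower_bound(query, approx):
    """Smallest L1 distance the tuple could have, given only its cells."""
    total = 0.0
    for attr, q in query.items():
        if attr not in approx:
            total += MISSING_PENALTY
            continue
        cell_lo = LO + approx[attr] * WIDTH
        cell_hi = cell_lo + WIDTH
        if q < cell_lo:
            total += cell_lo - q
        elif q > cell_hi:
            total += q - cell_hi
    return total


def exact_distance(query, row):
    """L1 distance on the query attributes, using the exact stored values."""
    return sum(abs(q - row[a]) if a in row else MISSING_PENALTY
               for a, q in query.items())


def knn(query, table, index, k):
    """Return (k nearest as (distance, tid), number of exact tuples fetched)."""
    # Filter: scan the small approximations and order tuples by lower bound.
    bounds = sorted((lower_bound(query, ap), tid) for tid, ap in index.items())
    best, fetched = [], 0               # best is a max-heap via negated distance
    # Refine: fetch exact tuples until no remaining bound can beat the k-th best.
    for bound, tid in bounds:
        if len(best) == k and bound >= -best[0][0]:
            break
        fetched += 1
        heapq.heappush(best, (-exact_distance(query, table[tid]), tid))
        if len(best) > k:
            heapq.heappop(best)
    return sorted((-nd, tid) for nd, tid in best), fetched


# Hypothetical sparse tuples: each one defines only some attributes.
table = {1: {"price": 10.0, "year": 50.0}, 2: {"price": 90.0},
         3: {"price": 12.0, "year": 55.0}, 4: {"year": 52.0}}
index = build_index(table)
query = {"price": 11.0, "year": 52.0}
print(knn(query, table, index, k=1))
```

In this toy run only two of the four tuples are read in exact form, because the lower bounds of the other two already exceed the best exact distance found. Any metric for which a valid lower bound can be computed from the approximations could replace the L1 distance assumed here.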

Year	DOI	Venue
2009	10.1109/ICDE.2009.99	ICDE
Keywords	Field	DocType
existing approximate vector,efficient deployment,efficiently indexing,efficient query processing strategy,data string,data file,storage efficiency,community systems,query efficiency,sparse wide tables,approximate content,efficient filter-and-refine search,inverted index,data structures,communication system,indexation,indexes,random access,database indexing,indexing,management system,encoding,structural similarity,index	Inverted index,Data structure,Data mining,Computer science,Tuple,Search engine indexing,Storage efficiency,Database index,Data file,Database,Nearest neighbor search	Conference
ISSN	Citations	PageRank
1084-4627	2	0.36
References	Authors
30	4

Authors (4 rows)

Name	Order	Citations	PageRank
Boduo Li	1	202	8.65
Mei Hui	2	2	0.36
Jianzhong Li	3	3196	304.46
Hong Gao	4	1086	120.07
