Title
Bit-sliced index arithmetic
Abstract
The bit-sliced index (BSI) was originally defined in [ONQ97]. The current paper introduces the concept of BSI arithmetic. For any two BSI's X and Y on a table T, we show how to efficiently generate new BSI's Z, V, and W, such that Z = X + Y, V = X - Y, and W = MIN(X, Y); this means that if a row r in T has a value x represented in BSI X and a value y in BSI Y, the value for r in BSI Z will be x + y, the value in V will be x - y and the value in W will be MIN(x, y). Since a bitmap representing a set of rows is the simplest bit-sliced index, BSI arithmetic is the most straightforward way to determine multisets of rows (with duplicates) resulting from the SQL clauses UNION ALL (addition), EXCEPT ALL (subtraction), and INTERSECT ALL (min) (see [OO00, DB2SQL] for definitions of these clauses). Another contribution of the current paper is to generalize BSI range restrictions from [ONQ97] to a new non-Boolean form: to determine the top k BSI-valued rows, for ally meaningful value k between one and the total number of rows in T. Together with bit-sliced addition, this permits us to solve a common basic problem of text retrieval: given an object-relational table T of rows representing documents, with a collection type column K representing keyword terms, we demonstrate an efficient algorithm to find k documents that share the largest number of terms with some query list Q of terms. A great deal of published work on such problems exists in the Information Retrieval (IR) field. The algorithm we introduce, which we call Bit-Sliced Term-Matching, or BSTM, uses an approach comparable in performance to the most efficient known IR algorithm, a major improvement on current DBMS text searching algorithms, with the advantage that it uses only indexing we propose for native database operations.
Year
DOI
Venue
2001
10.1145/375663.375669
SIGMOD Conference
Keywords
Field
DocType
parallel computation,olap,search algorithm,information retrieval,indexation
Row,Data mining,Text searching,Computer science,Search engine indexing,Arithmetic,Bitmap,Subtraction,Text retrieval,Database,The Intersect
Conference
Volume
Issue
ISSN
30
2
0163-5808
ISBN
Citations 
PageRank 
1-58113-332-4
16
0.98
References 
Authors
13
3
Name
Order
Citations
PageRank
Denis Rinfret1191.74
Patrick E. O'Neil21966237.34
Elizabeth J. O'Neil398773.91