Abstract |
---|
In large IR systems, information about word occurrence may be stored in the form of a bit matrix, with rows corresponding to different words and columns to documents. Such a matrix is generally very large and very sparse. New methods for compressing such matrices are presented, which exploit possible correlations between rows and between columns. The methods are based on partitioning the matrix into small blocks and predicting the 1-bit distribution within a block by means of various bit generation models. Each block is then encoded using Huffman or arithmetic coding. The methods also use a new way of enumerating subsets of fixed size from a given superset. Preliminary experimental results indicate improvements over previous methods. |
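
The abstract mentions enumerating subsets of fixed size from a given superset. The record does not describe the paper's new enumeration scheme, so the sketch below substitutes the standard combinatorial number system, purely as an illustration of the general idea: a block of n bits containing k ones can be identified by a single index in the range 0 to C(n, k) - 1. The function names `rank_subset` and `unrank_subset` and the 8-bit block size are assumptions made for this example, not taken from the paper.

```python
from math import comb


def rank_subset(positions):
    """Map the positions of the 1-bits in a block to a single index.

    Uses the standard combinatorial number system (an assumption here,
    not the enumeration proposed in the paper). With positions sorted
    as p_0 < p_1 < ... < p_{k-1}, the index is sum(C(p_i, i + 1)).
    """
    return sum(comb(p, i + 1) for i, p in enumerate(sorted(positions)))


def unrank_subset(rank, k):
    """Inverse of rank_subset: recover the k 1-bit positions from the index."""
    positions = []
    for i in range(k, 0, -1):
        # Find the largest p with C(p, i) <= rank.
        p = i - 1
        while comb(p + 1, i) <= rank:
            p += 1
        rank -= comb(p, i)
        positions.append(p)
    return sorted(positions)


# Hypothetical 8-bit block 01001001: 1-bits at positions 1, 4 and 7.
block_positions = [1, 4, 7]
index = rank_subset(block_positions)
restored = unrank_subset(index, len(block_positions))
```

Under this scheme a block with k ones among n positions is described by k together with an index below C(n, k), which is what makes a count-based model of the 1-bit distribution usable as input to Huffman or arithmetic coding.
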
Year | DOI | Venue |
---|---|---|
1992 | 10.1016/0306-4573(92)90065-8 | Inf. Process. Manage. |
Keywords | Field | DocType |
systematic approach, bitmap generation, mathematical formulas, matrices, correlation, comparative analysis, information processing, information retrieval, coding | Row, Data mining, Subset and superset, Computer science, Matrix (mathematics), Algorithm, Coding (social sciences), Huffman coding, Bitmap, Data compression, Arithmetic coding | Journal |
Volume | Issue | Journal |
28 | 6 | Information Processing and Management |
Citations | PageRank | References |
3 | 0.66 | 10 |
Authors |
2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Abraham Bookstein | 1 | 710 | 480.57 |
Shmuel T. Klein | 2 | 434 | 77.80 |