Run-Length Compressed Indexes Are Superior for Highly Repetitive Sequence Collections - Citegraph

Paper Info

Title
Run-Length Compressed Indexes Are Superior for Highly Repetitive Sequence Collections

Abstract
A repetitive sequence collection is one where portions of a base sequence of length n are repeated many times with small variations, forming a collection of total length N . Examples of such collections are version control data and genome sequences of individuals, where the differences can be expressed by lists of basic edit operations. This paper is devoted to studying ways to store massive sets of highly repetitive sequence collections in space-efficient manner so that retrieval of the content as well as queries on the content of the sequences can be provided time-efficiently. We show that the state-of-the-art entropy-bound full-text self-indexes do not yet provide satisfactory space bounds for this specific task. We engineer some new structures that use run-length encoding and give empirical evidence that these structures are superior to the current structures.

Year	DOI	Venue
2008	10.1007/978-3-540-89097-3_17	SPIRE
Keywords	Field	DocType
run-length compressed indexes,genome sequence,total length,current structure,massive set,length n,base sequence,repetitive sequence collection,run-length encoding,highly repetitive sequence collections,new structure,empirical evidence,run length encoding,indexation,preprint,version control	Genome,Information retrieval,Computer science,Wavelet Tree,Revision control,Preprint,Encoding (memory)	Conference
Volume	ISSN	Citations
5280	0302-9743	33
PageRank	References	Authors
1.37	19	4

Authors (4 rows)

Cited by (33 rows)

References (19 rows)

Name	Order	Citations	PageRank
Jouni Sirén	1	222	14.85
Niko Välimäki	2	276	14.53
Veli Mäkinen	3	1583	85.29
Gonzalo Navarro	4	6088	345.16

1