An efficient normalized maximum likelihood algorithm for DNA sequence compression - Citegraph

Paper Info

Title
An efficient normalized maximum likelihood algorithm for DNA sequence compression

Abstract
This article presents an efficient algorithm for DNA sequence compression, which achieves the best compression ratios reported over a test set commonly used for evaluating DNA compression programs. The algorithm introduces many refinements to a compression method that combines: (1) encoding by a simple normalized maximum likelihood (NML) model for discrete regression, through reference to preceding approximate matching blocks, (2) encoding by a first order context coding and (3) representing strings in clear, to make efficient use of the redundancy sources in DNA data, under fast execution times. One of the main algorithmic features is the constraint on the matching blocks to include reasonably long contiguous matches, which not only reduces significantly the search time, but also can be used to modify the NML model to exploit the constraint for getting smaller code lengths. The algorithm handles the changing statistics of DNA data in an adaptive way and by predictively encoding the matching pointers it is successful in compressing long approximate matches. Apart from comparison with previous DNA encoding methods, we present compression results for the recently published human genome data.

Year	DOI	Venue
2005	10.1145/1055709.1055711	ACM Trans. Inf. Syst.
Keywords	Field	DocType
dna compression program,efficient normalized maximum likelihood,previous dna,approximate matching block,approximate sequence matching,compression result,human genome data,dna compression,efficient algorithm,compression ratio,compression method,normalized maximum likelihood model,dna data,dna sequence compression,human genome,first order,dna sequence	Incremental encoding,Pattern recognition,Computer science,Algorithm,Coding (social sciences),Compression ratio,Redundancy (engineering),Artificial intelligence,Data compression,Test set,Lossless compression,Encoding (memory)	Journal
Volume	Issue	ISSN
23	1	1046-8188
Citations	PageRank	References
41	1.97	15
Authors
2

Authors (2 rows)

Cited by (41 rows)

References (15 rows)

Name	Order	Citations	PageRank
Gergely Korodi	1	78	5.57
Ioan Tabus	2	276	38.23

1