Title | ||
---|---|---|
DNA sequence compression using the normalized maximum likelihood model for discrete regression |
Abstract | ||
---|---|---|
We discuss how to use the normalized maximum likelihood (NML) model for encoding sequences known to have regularities in the form of approximate repetitions. We present a particular version of the NML model for discrete regression, which is shown to provide a very powerful yet simple model for encoding the approximate repeats in DNA sequences. Combining the model of repeats with a simple first-order Markov model, we obtain a fast lossless compression method, which compares favorably with the existing DNA compression programs. It is remarkable that a simple model, which recursively updates a small number of parameters, is able to reach the state-of-the-art compression ratio for DNA sequences obtained with much more complex models. Being a minimum description length (MDL) model, the NML model may later prove to be useful in studying global and local features of DNA or possibly of other biological sequences. |
Year | DOI | Venue |
---|---|---|
2003 | 10.1109/DCC.2003.1194016 | DCC |
Keywords | Field | DocType |
art compression ratio,complex model,order markov model,dna sequence,simple model,normalized maximum likelihood model,existing dna compression programs,nml model,approximate repetition,local features of dna,discrete regression,dna sequence compression,approximate repeat,history,maximum likelihood estimation,lossless compression,sequences,markov model,first order,data compression,entropy,compression ratio,minimum description length,dictionaries,dna,encoding,markov processes | Markov process,Regression,Markov model,Minimum description length,Theoretical computer science,Compression ratio,Data compression,Mathematics,Lossless compression,Encoding (memory) | Conference |
ISSN | ISBN | Citations |
1068-0314 | 0-7695-1896-6 | 19 |
PageRank | References | Authors |
1.10 | 10 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ioan Tabus | 1 | 276 | 38.23 |
Gergely Korodi | 2 | 78 | 5.57 |
Jorma Rissanen | 3 | 1665 | 798.14 |