Title
Genome compression using normalized maximum likelihood models for constrained Markov sources
Abstract
The paper presents exact, implementable solutions to the problem of universal coding of approximate repeats, using the normalized maximum likelihood (NML) model for the class of first-order Markov sources and incorporating constraints that are standard in fast similarity search over full genomes. A coding scheme combining universal codes for memoryless sources and for sources with memory is then presented. Results on compressing the full human genome show that the combined scheme provides slight improvements over the existing state of the art. As a side result, interesting pairs of sequences may be found that are highly similar under the new NML model for Markov sources but have a lower similarity score when evaluated with the NML model for memoryless sources.
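For readers unfamiliar with the normalized maximum likelihood idea mentioned in the abstract, the following is a minimal sketch of the textbook NML code length for a memoryless source over the DNA alphabet. It is not the paper's constrained Markov model or its approximate-repeat coder; the function names (`nml_normalizer`, `nml_code_length`) are illustrative assumptions, and the normalizer is computed by brute-force enumeration, which is practical only for short sequences.

```python
from collections import Counter
from math import comb, log2


def max_likelihood(counts, n):
    """Maximized likelihood prod (c/n)^c of a sequence with these symbol counts."""
    p = 1.0
    for c in counts:
        if c > 0:  # convention: 0^0 = 1
            p *= (c / n) ** c
    return p


def compositions(n, k):
    """Yield every tuple of k non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest


def multinomial_coeff(counts):
    """Number of distinct sequences that share the given symbol counts."""
    total, coeff = 0, 1
    for c in counts:
        total += c
        coeff *= comb(total, c)
    return coeff


def nml_normalizer(n, k):
    """Sum of maximized likelihoods over all length-n sequences on k symbols."""
    return sum(
        multinomial_coeff(c) * max_likelihood(c, n) for c in compositions(n, k)
    )


def nml_code_length(seq, alphabet="ACGT"):
    """NML code length in bits: -log2 of (maximized likelihood / normalizer)."""
    n = len(seq)
    tally = Counter(seq)
    counts = [tally[a] for a in alphabet]
    return -log2(max_likelihood(counts, n)) + log2(nml_normalizer(n, len(alphabet)))
```

Under this sketch, a sequence with a skewed symbol distribution receives a shorter code length than an equally long sequence with uniform symbol usage, since both share the same normalizer and differ only in their maximized likelihood.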
Year
2008
DOI
10.1109/ITW.2008.4578663
Venue
Porto
Keywords
markov processes, genetic engineering, genetics, maximum likelihood estimation, markov sources, coding scheme, constrained markov sources, genome compression, memoryless sources, normalized maximum likelihood models, universal coding, human genome, first order
DocType
Conference
ISBN
978-1-4244-2271-5
Citations
0
PageRank
0.34
References
9
Authors
2
Name: Ioan Tabus; Order: 1; Citations: 276; PageRank: 38.23
Name: Gergely Korodi; Order: 2; Citations: 78; PageRank: 5.57