Title
Finding recurrent sources in sequences
Abstract
Many genomic sequences and, more generally, (multivariate) time series display tremendous variability. However, often it is reasonable to assume that the sequence is actually generated by or assembled from a small number of sources, each of which might contribute several segments to the sequence. That is, there are h hidden sources such that the sequence can be written as a concatenation of k h pieces, each of which stems from one of the h sources. We define this (k,h)-segmentation problem and show that it is NP-hard in the general case. We give approximation algorithms achieving approximation ratios of 3 for the L1 error measure and √5 for the L2 error measure, and generalize the results to higher dimensions. We give empirical results on real (chromosome 22) and artificial data showing that the methods work well in practice.
Year
DOI
Venue
2003
10.1145/640075.640091
RECOMB
Keywords
Field
DocType
recurrent source,l2 error measure,empirical result,approximation ratio,artificial data,h hidden source,k h piece,h source,genomic sequence,approximation algorithm,l1 error measure,treewidth,probabilistic algorithms,genome sequence,combinatorial optimization,bayesian networks,greedy algorithms
Small number,Approximation algorithm,Combinatorics,Greedy algorithm,Probabilistic analysis of algorithms,Combinatorial optimization,Bayesian network,Concatenation,Bioinformatics,Treewidth,Mathematics
Conference
ISBN
Citations 
PageRank 
1-58113-635-8
26
3.45
References 
Authors
11
2
Name
Order
Citations
PageRank
Aristides Gionis16808386.81
Heikki Mannila265951495.69