Finding recurrent sources in sequences - Citegraph

Paper Info

Title
Finding recurrent sources in sequences

Abstract
Many genomic sequences and, more generally, (multivariate) time series display tremendous variability. However, often it is reasonable to assume that the sequence is actually generated by or assembled from a small number of sources, each of which might contribute several segments to the sequence. That is, there are h hidden sources such that the sequence can be written as a concatenation of k h pieces, each of which stems from one of the h sources. We define this (k,h)-segmentation problem and show that it is NP-hard in the general case. We give approximation algorithms achieving approximation ratios of 3 for the L1 error measure and √5 for the L2 error measure, and generalize the results to higher dimensions. We give empirical results on real (chromosome 22) and artificial data showing that the methods work well in practice.

Year	DOI	Venue
2003	10.1145/640075.640091	RECOMB
Keywords	Field	DocType
recurrent source,l2 error measure,empirical result,approximation ratio,artificial data,h hidden source,k h piece,h source,genomic sequence,approximation algorithm,l1 error measure,treewidth,probabilistic algorithms,genome sequence,combinatorial optimization,bayesian networks,greedy algorithms	Small number,Approximation algorithm,Combinatorics,Greedy algorithm,Probabilistic analysis of algorithms,Combinatorial optimization,Bayesian network,Concatenation,Bioinformatics,Treewidth,Mathematics	Conference
ISBN	Citations	PageRank
1-58113-635-8	26	3.45
References	Authors
11	2

Authors (2 rows)

Cited by (26 rows)

References (11 rows)

Name	Order	Citations	PageRank
Aristides Gionis	1	6808	386.81
Heikki Mannila	2	6595	1495.69

1