Monotone Scoring of Patterns with Mismatches: (Extended Abstract) - Citegraph

Paper Info

Title
Monotone Scoring of Patterns with Mismatches: (Extended Abstract)

Abstract
We study the problem of extracting, from given source x and error threshold k, substrings of x that occur unusually often in x within k substitutions or mismatches. Speciflcally, we assume that the input textstring x of n characters is produced by an i.i.d. source, and design e-cient methods for computing the probability and expected number of occurrences for substrings of x with (either exactly or up to) k mis- matches. Two related schemes are presented. In the flrst one, an O(nk) time preprocessing of x is developed that supports the following subse- quent queries: for any substring w of x arbitrarily specifled as input, the probability of occurrence of w in x within (either exactly or up to) k mismatches is reported in O(k2) time. In the second scheme, a length or length range is arbitrarily specifled, and the above probabilities are computed for all substrings of x having length in that range, in overall O(nk) time. Further, monotonicity conditions are introduced and stud- ied for probabilities and expected occurrences of a substring under unit increases in its length, allowed number of errors, or both. Over intervals of constant frequency count, these monotonicities translate to some of the scores in use, thereby reducing the size of tables at the outset and enhancing the process of discovery. These latter derivations extend to patterns with mismatches an analysis previously devoted to exact pat- terns.

Year	Venue	Keywords
2004	WABI	error threshold
Field	DocType	Citations
Monotonic function,Discrete mathematics,Substring,Combinatorics,Error threshold,Database query,Pattern analysis,Preprocessor,Expected value,Mathematics,Monotone polygon	Conference	0
PageRank	References	Authors
0.34	8	2

Authors (2 rows)

Cited by (0 rows)

References (8 rows)

Name	Order	Citations	PageRank
Alberto Apostolico	1	1441	182.20
Cinzia Pizzi	2	139	15.73

1