Title
Better spaced seeds using Quadratic Residues.
Abstract
Spaced seeds are used in approximate pattern matching algorithms to quickly discard regions where a match is not likely to occur. We propose a family of lossless spaced seeds based on Quadratic Residues modulo a prime number. Our seeds work with a threshold t>1 in the sense that two regions are considered similar only if the seed hits t times within the regions. We prove that, for any number of errors, our seeds have an exponentially smaller probability of producing false positive matches than any traditional seed using a threshold t=1. To establish our result we introduce a formal notion of selectivity that generalizes the concept of seed weight, and we relate it to the minimum coverage and to a new structural property defined in terms on seed rotations. This groundwork will be useful for further analysis on seeds with threshold and we use it to provide improved bounds for approximate matching with 2 or 3 errors. Our results show that the use of a single seed with a threshold t>1 should be considered as a possible alternative to single or multiple seeds with t=1.
Year
DOI
Venue
2013
10.1016/j.jcss.2013.03.002
Journal of Computer and System Sciences
Keywords
Field
DocType
Sequence comparison,Homology search,Approximate matching,Hamming distance,Spaced seeds,Lossless filtration
Discrete mathematics,Combinatorics,Quadratic residue,Prime number,Modulo,Structural property,Approximate matching,Pattern matching,Mathematics,Lossless compression,Exponential growth
Journal
Volume
Issue
ISSN
79
7
0022-0000
Citations 
PageRank 
References 
8
0.49
12
Authors
2
Name
Order
Citations
PageRank
Lavinia Egidi19110.21
Giovanni Manzini21584111.42