Title
Recognition of characteristic patterns in sets of functionally equivalent DNA sequences.
Abstract
An algorithm has been developed for the identification of unknown patterns which are distinctive of a set of short DNA sequences believed to be functionally equivalent. A pattern is defined as a string in which each position holds a fully or partially specified nucleotide. The advantage of this 'vague' definition is that it imposes minimal constraints on how patterns are characterized. A new feature of the approach developed here is that it allows a 'fair' simultaneous testing of patterns of all degrees of degeneracy. This analysis is based on an evaluation of inhomogeneity in the empirical occurrence distribution of any such pattern within a set of sequences. The use of the nonparametric kernel density estimation of Parzen allows one to assess small disturbances among the sequence alignments. The method also makes it possible to identify sequence subsets with different characteristic patterns. The algorithm was applied to the analysis of patterns characteristic of sets of promoter, terminator and splice junction sequences. The results are compared with those obtained by other methods.
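The two ingredients named in the abstract, patterns with partially specified positions and a Parzen kernel estimate of where a pattern occurs, can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give their inhomogeneity statistic or kernel, so the IUPAC ambiguity codes, the Gaussian kernel, the `bandwidth` parameter, all function names and the toy sequences are assumptions made here for illustration.

```python
import math

# Assumption: partially specified positions are written with IUPAC codes,
# each symbol standing for the set of bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def matches_at(pattern, seq, start):
    """True if the degenerate pattern matches seq beginning at index start."""
    if start < 0 or start + len(pattern) > len(seq):
        return False
    return all(seq[start + i] in IUPAC[p] for i, p in enumerate(pattern))

def occurrence_positions(pattern, sequences):
    """Start positions of every match, pooled over all sequences."""
    return [
        start
        for seq in sequences
        for start in range(len(seq) - len(pattern) + 1)
        if matches_at(pattern, seq, start)
    ]

def parzen_density(positions, x, bandwidth=1.0):
    """Gaussian-kernel Parzen estimate of the occurrence density at x."""
    if not positions:
        return 0.0
    norm = len(positions) * bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in positions)
    return total / norm

# Hypothetical toy alignment, not data from the paper.
seqs = ["GGTATAATGC", "CCTATGATGG", "ATTACAATCC"]
hits = occurrence_positions("TANNAT", seqs)
# Position along the alignment where the estimated density is highest.
peak = max(range(10), key=lambda x: parzen_density(hits, x))
```

A density that is sharply peaked at one alignment position, rather than spread evenly, is the kind of inhomogeneity the abstract refers to. Smoothing with a kernel means that occurrences shifted by a base or two between sequences still reinforce one another.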
Year: 1987
DOI: 10.1093/bioinformatics/3.3.223
Venue: Computer Applications in the Biosciences
Keywords: DNA sequence
Field: Data mining, Discrete mathematics, Computer science, DNA sequencing, Computational biology
DocType: Journal
Volume: 3
Issue: 3
ISSN: 0266-7061
Citations: 10
PageRank: 6.51
References: 0
Authors
2
Name
Order
Citations
PageRank
G Mengeritsky1116.90
Temple F. Smith213973.26