Title
Inferring relatedness of a macromolecule to a sequence database without sequencing.
Abstract
Derivation of biological information of a macromolecule isolate based on sequence similarity is playing a significant role in numerous areas of biological research. However, it is often the case that a researcher obtains more macromolecule isolates than can be sequenced practically, due either to the high cost of sequencing or to a lack of specialized equipment and personnel. To overcome this difficulty, we study the problem of obtaining biological information (such as sequence information) about a macromolecule isolate using only (i) the fragmentation pattern of that isolate obtained from digestion with enzymes and (ii) a database D of sequences. We investigate a three-phase approach to solving this problem. In the first phase, we obtain a restriction pattern of the isolate while analytically deriving the corresponding restriction maps of the sequences in the database. In the second phase, we identify a set S ⊆ D of sequences which have restriction maps that are most similar to the unknown isolate's restriction pattern. This task is complicated by the fact that we have only approximate fragment lengths for the unknown isolate and that we do not know the actual ordering of the unknown isolate's fragments. Despite these difficulties, we derive experimental results which indicate that maximum matching techniques are effective in identifying the correct set most of the time. In the third phase, we use the set S to infer biological information (such as sequence information or hierarchical classification information) about the unknown isolate. We demonstrate experimentally that the closeness of the sequences in the set S to each other can be used to infer the relatedness of the unknown isolate to the sequences of the set S. Furthermore, the confidence of this inferred information is strongly correlated with the minimum pairwise relatedness of any two elements in S.
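The first two phases described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`digest`, `max_matching`, `rank_database`), the choice to cut at the start of each recognition site, the relative tolerance `tol`, and the normalized score are all assumptions made for illustration. The sketch digests each database sequence in silico, then scores an unordered list of approximate observed fragment lengths against each sequence using a maximum bipartite matching found by augmenting paths.

```python
import re


def digest(sequence, site):
    """Fragment lengths of a linear sequence cut at every occurrence of `site`.

    Assumption: the cut falls at the start of the recognition site.
    """
    # Lookahead so that overlapping occurrences of the site are all found.
    cuts = [m.start() for m in re.finditer(f"(?={re.escape(site)})", sequence)]
    bounds = [0] + [c for c in cuts if c > 0] + [len(sequence)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def max_matching(observed, reference, tol=0.05):
    """Size of a maximum bipartite matching between two fragment lists.

    An observed length o may pair with a reference length r
    when |o - r| <= tol * r (hypothetical tolerance model).
    """
    match_of_ref = {}  # reference index -> observed index

    def try_assign(i, seen):
        # Try to place observed fragment i, re-routing earlier choices if needed.
        for j, r in enumerate(reference):
            if j in seen or abs(observed[i] - r) > tol * r:
                continue
            seen.add(j)
            if j not in match_of_ref or try_assign(match_of_ref[j], seen):
                match_of_ref[j] = i
                return True
        return False

    return sum(try_assign(i, set()) for i in range(len(observed)))


def rank_database(observed, database, site, tol=0.05):
    """Rank database sequences by the fraction of fragments matched."""
    scores = {}
    for name, seq in database.items():
        frags = digest(seq, site)
        matched = max_matching(observed, frags, tol)
        scores[name] = matched / max(len(observed), len(frags))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Because the matching ignores fragment order, it fits the setting in which only an unordered, approximate fragmentation pattern is available for the isolate. The top-ranked entries would play the role of the set S; the third phase, which judges confidence from the pairwise relatedness of the sequences in S, is not sketched here.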
Year
1996
Venue
ISMB
Keywords
inferring relatedness, algorithms, sequence databases, sequence similarity, phylogenies, restriction mapping, sequence database, search
Field
Sequence alignment, Pairwise comparison, Phylogenetic tree, Sequence database, Protein sequencing, Computer science, Database search engine, Bioinformatics, Restriction enzyme, Restriction map
DocType
Conference
Volume
4
ISSN
1553-0833
ISBN
1-57735-002-2
Citations
0
PageRank
0.34
References
9
Authors
4
Name, Order, Citations, PageRank
J Kim, 1, 59, 6.96
J R Cole, 2, 331, 69.86
E Torng, 3, 0, 0.34
Sakti Pramanik, 4, 770, 204.19