Title
Confidence measures for protein fold recognition.
Abstract
Motivation: We present an extensive evaluation of different methods and criteria to detect remote homologs of a given protein sequence. We investigate two associated problems: first, to develop a sensitive searching method to identify possible candidates and, second, to assign a confidence to the putative candidates in order to select the best one. For searching methods where the score distributions are known, p-values are used as a confidence measure with great success. For the cases where such theoretical backing is absent, we propose empirical approximations to p-values for searching procedures.

Results: As a baseline, we review the performance of different methods for detecting remote protein folds (sequence alignment and threading, with and without sequence profiles, global and local). The analysis is performed on a large representative set of protein structures. For fold recognition, we find that methods using sequence profiles generally perform better than methods using plain sequences, and that threading methods perform better than sequence alignment methods. In order to assess the quality of the predictions made, we establish and compare several confidence measures, including raw scores, Z-scores, raw score gaps, Z-score gaps, and different methods of p-value estimation. We work our way from the theoretically well-backed local scores towards more exploratory global and threading scores. The methods for assessing the statistical significance of predictions are compared using specificity-sensitivity plots. For local alignment techniques, we find that p-value methods work best, although computationally cheaper methods such as those based on score gaps achieve similar performance. For global methods, where no theory is available, methods based on score gaps work best. By using the score gap functions as the measure of confidence, we improve the more powerful fold recognition methods for which p-values are unavailable.
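The confidence measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's exact formulas, so every definition here is an assumption: the Z-score is taken relative to the mean and population standard deviation of all scores of one query against the fold library, the score gap is the difference between the best and second-best score, and the empirical p-value is the fraction of background scores at least as high as the observed one, with a pseudocount so it is never zero. The function names are hypothetical and not taken from the paper.

```python
from statistics import mean, pstdev


def z_scores(scores):
    """Standardize one query's scores against the whole score list.

    Assumption: mean and population standard deviation are computed
    over all candidates, including the top hit.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores are identical, so no candidate stands out.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def score_gap(scores):
    """Difference between the best and the second-best score."""
    top, second = sorted(scores, reverse=True)[:2]
    return top - second


def z_score_gap(scores):
    """Score gap measured on Z-scores instead of raw scores."""
    return score_gap(z_scores(scores))


def empirical_p_value(score, background):
    """Fraction of background scores >= score, with a +1 pseudocount.

    Assumption: `background` holds scores of the same method against
    unrelated (non-homologous) folds.
    """
    exceed = sum(1 for b in background if b >= score)
    return (exceed + 1) / (len(background) + 1)


if __name__ == "__main__":
    # Made-up scores of one query against six fold templates.
    scores = [52.0, 31.0, 30.0, 28.0, 27.0, 25.0]
    print("raw score gap:", score_gap(scores))
    print("Z-score gap:", z_score_gap(scores))
    print("empirical p-value of top hit:",
          empirical_p_value(max(scores), scores[1:]))
```

A large gap between the top hit and the runner-up suggests a confident prediction even when no theoretical score distribution, and hence no analytic p-value, is available for the scoring method.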
Year: 2002
DOI: 10.1093/bioinformatics/18.6.802
Venue: BIOINFORMATICS
Keywords: fold recognition, protein folding, protein structure, statistical significance, sequence alignment, protein sequence, local alignment
Field: Data mining, Confidence measures, Raw score, Computer science, Threading (protein sequence), Threading (manufacturing), Smith–Waterman algorithm, Bioinformatics
DocType: Journal
Volume: 18
Issue: 6
ISSN: 1367-4803
Citations: 7
PageRank: 1.09
References: 6
Authors: 5
Name               Order  Citations  PageRank
Ingolf Sommer      1      221        18.10
Alexander Zien     2      1255       146.93
Niklas Von Öhsen   3      51         7.93
Ralf Zimmer        4      269        28.70
Thomas Lengauer    5      3155       605.03