Detection of Underrepresented Biological Sequences using Class-Conditional Distribution Models - Citegraph

Paper Info

Title
Detection of Underrepresented Biological Sequences using Class-Conditional Distribution Models

Abstract
A labeled sequence data set related to a certain biological property is often biased and, therefore, does not completely capture its diversity in nature. To reduce this sampling bias problem, a data mining procedure is proposed for detecting underrepresented relevant sequences. The procedure is aimed at helping domain experts achieve a cost-effective qualitative enlargement of knowledge through an in-depth study of a small number of statistically underrepresented and functionally interesting sequences. Our procedure consists of: (i) learning a class-conditional distribution model on each class of labeled data; (ii) applying the models to select statistically underrepresented unlabeled sequences; and (iii) automatically evaluating their interestingness. An application of the proposed approach is illustrated on the important problem of enlarging the data set of confirmed disordered proteins. The obtained results demonstrate the promise of the proposed approach for an efficient reduction of sampling bias in biological databases.
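
Steps (i) and (ii) of the procedure can be illustrated with a minimal sketch. The abstract does not say which distribution model or selection rule the authors use, so everything below is an assumption made for illustration: a Laplace-smoothed residue-frequency model per class, a length-normalized log-likelihood score, and a rule that flags the unlabeled sequences that every class model explains poorly. The function names (`fit_unigram`, `avg_log_likelihood`, `select_underrepresented`) and the toy sequences are hypothetical. Step (iii), the interestingness evaluation, is not sketched because the abstract gives no detail about it.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fit_unigram(sequences, alpha=1.0):
    """Step (i), assumed form: smoothed log-frequencies of residues in one class."""
    counts = Counter(ch for seq in sequences for ch in seq if ch in AMINO_ACIDS)
    total = sum(counts.values()) + alpha * len(AMINO_ACIDS)
    return {aa: math.log((counts[aa] + alpha) / total) for aa in AMINO_ACIDS}


def avg_log_likelihood(model, seq):
    """Per-residue log-likelihood of a sequence under one class model."""
    residues = [ch for ch in seq if ch in AMINO_ACIDS]
    if not residues:
        return float("-inf")
    return sum(model[ch] for ch in residues) / len(residues)


def select_underrepresented(models, unlabeled, k):
    """Step (ii), assumed rule: keep the k sequences whose best class score is lowest."""
    scored = [
        (max(avg_log_likelihood(m, seq) for m in models.values()), seq)
        for seq in unlabeled
    ]
    scored.sort(key=lambda pair: pair[0])
    return [seq for _, seq in scored[:k]]


# Toy labeled data (hypothetical, not from the paper).
labeled = {
    "ordered": ["AAAALLLL", "ALALALAL"],
    "disordered": ["SSSSPPPP", "SPSPSPSP"],
}
models = {label: fit_unigram(seqs) for label, seqs in labeled.items()}
candidates = select_underrepresented(models, ["ALLA", "SPPS", "WWWW"], k=1)
print(candidates)
```

In this toy setup the tryptophan-only sequence is selected, since neither class model has seen that residue. A realistic model would need richer features than residue frequencies, and the selected candidates would still have to pass the interestingness check before being handed to domain experts.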

Year	2003
Venue	SIAM Proceedings Series
Keywords	conditional distribution, data mining
Field	Small number, Data mining, Distribution model, Conditional probability distribution, Pattern recognition, Computer science, Sampling bias, Biological database, Artificial intelligence, Data sequences, Labeled data
DocType	Conference
Citations	2
PageRank	0.49
References	5
Authors	4

Authors (4 rows)

Name	Order	Citations	PageRank
Slobodan Vucetic	1	637	56.38
Dragoljub Pokrajac	2	264	19.89
Hongbo M Xie	3	137	12.22
Zoran Obradovic	4	1110	137.41
