Title
Training set expansion: an approach to improving the reconstruction of biological networks from limited and uneven reliable interactions.
Abstract
Motivation: An important problem in systems biology is reconstructing complete networks of interactions between biological objects by extrapolating from a few known interactions as examples. While there are many computational techniques proposed for this network reconstruction task, their accuracy is consistently limited by the small number of high-confidence examples, and the uneven distribution of these examples across the potential interaction space, with some objects having many known interactions and others few. Results: To address this issue, we propose two computational methods based on the concept of training set expansion. They work particularly effectively in conjunction with kernel approaches, which are a popular class of approaches for fusing together many disparate types of features. Both our methods are based on semi-supervised learning and involve augmenting the limited number of gold-standard training instances with carefully chosen and highly confident auxiliary examples. The first method, prediction propagation, propagates highly confident predictions of one local model to another as the auxiliary examples, thus learning from information-rich regions of the training network to help predict the information-poor regions. The second method, kernel initialization, takes the most similar and most dissimilar objects of each object in a global kernel as the auxiliary examples. Using several sets of experimentally verified protein-protein interactions from yeast, we show that training set expansion gives a measurable performance gain over a number of representative, state-of-the-art network reconstruction methods, and it can correctly identify some interactions that are ranked low by other methods due to the lack of training examples of the involved proteins.
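As a rough illustration of the kernel initialization idea described in the abstract, the sketch below picks, for each object, the most similar and most dissimilar other objects in a kernel matrix as auxiliary positive and negative examples. It is a minimal sketch and not the authors' implementation: the function name `kernel_init_examples`, the parameter `k`, and the toy matrix are assumptions made for illustration, and the abstract does not say how many auxiliary examples are taken per object.

```python
import numpy as np

def kernel_init_examples(K, k=1):
    """Hypothetical helper: for each object i, return (positives, negatives),
    the indices of the k most similar and k most dissimilar other objects
    according to the kernel (similarity) matrix K."""
    K = np.asarray(K, dtype=float)
    n = K.shape[0]
    if K.shape != (n, n):
        raise ValueError("K must be a square matrix")
    if 2 * k > n - 1:
        raise ValueError("k is too large for the number of objects")
    aux = {}
    for i in range(n):
        # Exclude the object itself, then sort the rest by ascending similarity.
        others = np.array([j for j in range(n) if j != i])
        order = others[np.argsort(K[i, others], kind="stable")]
        negatives = order[:k].tolist()        # least similar objects
        positives = order[::-1][:k].tolist()  # most similar objects
        aux[i] = (positives, negatives)
    return aux

# Toy 4x4 similarity matrix (assumed values, for illustration only).
K_toy = [
    [1.0, 0.9, 0.1, 0.5],
    [0.9, 1.0, 0.2, 0.4],
    [0.1, 0.2, 1.0, 0.8],
    [0.5, 0.4, 0.8, 1.0],
]
aux_examples = kernel_init_examples(K_toy, k=1)
```

In this toy matrix, object 0 would receive object 1 as its auxiliary positive example and object 2 as its auxiliary negative example; these auxiliary examples would then be added to the gold-standard training set of the local model for object 0.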
Year: 2009
DOI: 10.1093/bioinformatics/btn602
Venue: BIOINFORMATICS
Keywords: computational biology, gold standard, protein protein interaction, algorithms, semi supervised learning, gene regulatory networks, biological network, systems biology, system biology
Field: Small number, Data mining, Computer science, Artificial intelligence, Kernel (linear algebra), Training set, Ranking, Measure (mathematics), Systems biology, Initialization, Bioinformatics, Gene regulatory network, Machine learning
DocType: Journal
Volume: 25
Issue: 2
ISSN: 1367-4803
Citations: 5
PageRank: 0.55
References: 12
Authors: 2
Name | Order | Citations | PageRank
Kevin Y. Yip | 1 | 600 | 38.39
Mark Gerstein | 2 | 354 | 45.41