Title
Multistage Gene Normalization and SVM-Based Ranking for Protein Interactor Extraction in Full-Text Articles
Abstract
The interactor normalization task (INT) is to identify genes that play the interactor role in protein-protein interactions (PPIs), map these genes to unique IDs, and rank them by their normalized confidence. INT has two subtasks: gene normalization (GN) and interactor ranking. The main difficulties of GN in INT are identifying genes across species and processing full-text papers rather than abstracts. To tackle these problems, we developed a multistage GN algorithm and an SVM-based ranking method, both of which exploit information from different parts of a paper. Our system achieved a promising AUC of 0.43471. The multistage GN algorithm improved system performance (AUC) by 1.719 percent over a one-stage GN algorithm. Our experimental results also show that using the full text rather than the abstract alone raised INT AUC by 22.6 percent.
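The ranking half of INT can be illustrated with a small sketch: normalized gene IDs are sorted by a confidence score (for example, an SVM decision value), and the resulting ranked list is scored against a gold set of interactor IDs. The record does not state which AUC variant INT uses, so this sketch assumes one common definition, the area under the interpolated precision/recall curve; the function names, the gene IDs, and the scores are hypothetical, and no SVM is trained here.

```python
from collections.abc import Mapping, Sequence, Set


def rank_by_confidence(scores: Mapping[str, float]) -> list[str]:
    """Order normalized gene IDs from most to least confident.

    `scores` is a hypothetical mapping from gene ID to a confidence value,
    such as an SVM decision value; higher means more likely an interactor.
    """
    return sorted(scores, key=lambda gene_id: scores[gene_id], reverse=True)


def interpolated_pr_auc(ranked_ids: Sequence[str], gold_ids: Set[str]) -> float:
    """Area under the interpolated precision/recall curve of a ranked list.

    Assumed definition (not taken from the record): each gold ID found in the
    list adds 1/|gold| of recall, weighted by the interpolated precision at
    that point, i.e. the best precision at any equal or higher recall.
    Gold IDs never retrieved contribute nothing.
    """
    if not gold_ids:
        return 0.0
    hit_precisions = []  # precision measured at each rank where a gold ID appears
    true_positives = 0
    for rank, gene_id in enumerate(ranked_ids, start=1):
        if gene_id in gold_ids:
            true_positives += 1
            hit_precisions.append(true_positives / rank)
    # Interpolate from the right: precision at recall r is the maximum
    # precision observed at any recall >= r.
    for i in range(len(hit_precisions) - 2, -1, -1):
        hit_precisions[i] = max(hit_precisions[i], hit_precisions[i + 1])
    return sum(hit_precisions) / len(gold_ids)


if __name__ == "__main__":
    # Hypothetical confidence scores for four candidate interactor IDs.
    ranked = rank_by_confidence({"P1": 0.9, "P2": 0.2, "P3": 0.7, "P4": -0.3})
    print(ranked)  # ['P1', 'P3', 'P2', 'P4']
    print(round(interpolated_pr_auc(ranked, {"P1", "P2"}), 4))  # 0.8333
```

Under this definition the metric rewards placing true interactors near the top of the list, which is why better GN (fewer missed or wrongly mapped IDs) and better confidence scores can both raise AUC.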
Year
2010
DOI
10.1109/TCBB.2010.45
Venue
IEEE/ACM Trans. Comput. Biology Bioinform.
Keywords
bioinformatics,data mining,genetics,signal processing,protein protein interactions,support vector machines,text analysis,proteins,protein protein interaction,intrusion detection,text mining,system performance
Field
Data mining,Gene normalization,Text mining,Normalization (statistics),Ranking,Computer science,Support vector machine,Artificial intelligence,Interactor,Bioinformatics,Intrusion detection system,Machine learning
DocType
Journal
Volume
7
Issue
3
ISSN
1545-5963
Citations
21
PageRank
1.02
References
18
Authors
3
Name | Order | Citations | PageRank
Hong-Jie Dai | 1 | 288 | 21.58
Po-Ting Lai | 2 | 130 | 9.32
Richard Tzong-Han Tsai | 3 | 714 | 54.89