Title
Improving Document Prioritization for Protein-Protein Interaction Extraction Using Shallow Linguistics and Word Embeddings
Abstract
Understanding biological processes, for example those associated with disease or pharmacological action, requires the analysis of large amounts of interconnected information. Protein interaction networks form part of this puzzle, and extracting this information from the scientific literature is an important but challenging task. In this work, we present a supervised classification approach for identifying and ranking literature documents that contain information regarding protein interactions. We studied the use of word embeddings together with simple chunking features, and show that combining these features with baseline bag-of-words features can lead to similar or even improved results compared to the use of features based on deep linguistic parsing. When applied to the BioCreative III Article Classification Task dataset, our approach achieves an area under the precision-recall curve of 0.70 and a Matthews correlation coefficient of 0.56.
Year
2017
DOI
10.1007/978-3-319-60816-7_6
Venue
11TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS
Keywords
Protein-protein interactions, Literature retrieval, Machine learning, Word embeddings
DocType
Conference
Volume
616
ISSN
2194-5357
Citations
0
PageRank
0.34
References
0
Authors
1
Name
Order
Citations
PageRank
Sérgio Matos141529.51