Title | ||
---|---|---|
Improving Document Prioritization for Protein-Protein Interaction Extraction Using Shallow Linguistics and Word Embeddings. |
Abstract | ||
---|---|---|
Understanding biological processes, such as those associated with disease or pharmacological action, requires the analysis of large amounts of interconnected information. Protein interaction networks form part of this puzzle, and extracting this information from the scientific literature is an important but challenging task. In this work, we present a supervised classification approach for identifying and ranking literature documents that contain information regarding protein interactions. We study the use of word embeddings together with simple chunking features, and show that combining these features with a baseline bag-of-words representation can lead to similar or even improved results compared with features based on deep linguistic parsing. When applied to the BioCreative III Article Classification Task dataset, our approach achieves an area under the precision-recall curve of 0.70 and a Matthews correlation coefficient of 0.56. |
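
The two evaluation measures quoted in the abstract can be illustrated with a minimal sketch. This is not the author's evaluation code: the function names and the toy labels and scores are hypothetical, and average precision is used here as one common estimate of the area under the precision-recall curve (the official task evaluation may interpolate the curve differently).

```python
import math


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def average_precision(labels, scores):
    """Average precision of a ranked list of documents.

    labels: 1 for a relevant (interaction-containing) document, else 0.
    scores: classifier confidence used to rank the documents.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        return 0.0
    # Rank documents from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this recall point
    return precision_sum / total_pos


if __name__ == "__main__":
    # Hypothetical toy ranking of four documents.
    toy_labels = [1, 0, 1, 0]
    toy_scores = [0.9, 0.8, 0.7, 0.1]
    print(average_precision(toy_labels, toy_scores))
    print(matthews_corrcoef(tp=1, fp=1, tn=1, fn=1))
```

Both measures suit this task because relevant articles are a minority class: the ranking measure rewards placing them near the top of the list, while the Matthews correlation coefficient uses all four cells of the confusion matrix and is therefore less inflated by the majority class than plain accuracy.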
Year | DOI | Venue |
---|---|---|
2017 | 10.1007/978-3-319-60816-7_6 | 11TH INTERNATIONAL CONFERENCE ON PRACTICAL APPLICATIONS OF COMPUTATIONAL BIOLOGY & BIOINFORMATICS |
Keywords | DocType | Volume |
---|---|---|
Protein-protein interactions, Literature retrieval, Machine learning, Word embeddings | Conference | 616 |
ISSN | Citations | PageRank |
---|---|---|
2194-5357 | 0 | 0.34 |
References | Authors | |
---|---|---|
0 | 1 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sérgio Matos | 1 | 415 | 29.51 |