Title
Algorithm for grounding mutation mentions from text to protein sequences
Abstract
Protein mutations derived from in vitro experimental analysis are described in detail in scientific papers. Reuse of mutation impact annotations is an important subfield of bioinformatics for which mutation grounding is a critical step. Presented here is a method for grounding textual mentions from papers describing mutational changes to proteins. We distinguish between grounding of mutation entities to protein database identifiers and to the correct positions on sequences extracted from protein databases. The grounding workflow coordinates the extraction of mutation, protein and organism mentions from texts and uses these to identify target sequences. Mutation mentions are sequentially mapped onto candidate proteins to facilitate their correct grounding to a protein sequence, independent of a protein-mutation tuple extraction task. Using a gold standard corpus of full-text articles and corresponding protein sequences, we show high precision and recall and discuss novel aspects of the algorithm in the context of previous work.
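The idea of mapping mutation mentions onto candidate protein sequences can be illustrated with a minimal sketch. This is not the authors' implementation: the mention format (wild-type residue, 1-based position, mutant residue, e.g. "K2A"), the function names, and the scoring rule (count the mentions whose wild-type residue matches the candidate sequence at the stated position) are all assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple

# Assumed mention format: wild-type residue, 1-based position, mutant residue.
MUTATION_RE = re.compile(
    r"^([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])$"
)


def parse_mutation(mention: str) -> Tuple[str, int, str]:
    """Split a mention such as 'K2A' into (wild type, position, mutant)."""
    match = MUTATION_RE.match(mention)
    if match is None:
        raise ValueError(f"unrecognised mutation mention: {mention!r}")
    wild, pos, mutant = match.groups()
    return wild, int(pos), mutant


def grounds_to(mention: str, sequence: str) -> bool:
    """True if the sequence has the wild-type residue at the stated position."""
    wild, pos, _ = parse_mutation(mention)
    return 1 <= pos <= len(sequence) and sequence[pos - 1] == wild


def rank_candidates(
    mentions: List[str], candidates: Dict[str, str]
) -> List[Tuple[str, int]]:
    """Score each candidate by how many mentions it can ground (hypothetical rule)."""
    scores = [
        (protein_id, sum(grounds_to(m, seq) for m in mentions))
        for protein_id, seq in candidates.items()
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

With made-up candidates `{"P1": "MKTAYIAK", "P2": "MATAYGAK"}` and mentions `["K2A", "Y5F"]`, the candidate `P1` grounds both mentions and ranks first. A real workflow would also have to handle numbering offsets between the paper and the database sequence, which this sketch ignores.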
Year
2010
DOI
10.1007/978-3-642-15120-0_10
Venue
DILS
Keywords
protein sequence, sequence analysis, experimental analysis, gold standard, natural language processing
Field
Data mining, Identifier, Protein sequencing, Tuple, Computer science, Precision and recall, Algorithm, Protein Databases, Mutation, Sequence analysis
DocType
Conference
Volume
6254
ISSN
0302-9743
ISBN
3-642-15119-1
Citations
4
PageRank
0.42
References
19
Authors
3
Name, Order, Citations, PageRank
Jonas Bergman Laurila, 1, 18, 2.11
K Rajaraman, 2, 380, 31.94
Christopher J O Baker, 3, 329, 30.96