Abstract | ||
---|---|---|
Protein mutations derived from in vitro experimental analysis are described in detail in scientific papers. Reuse of mutation impact annotations is an important subfield of bioinformatics for which mutation grounding is a critical step. Presented here is a method for grounding textual mentions, taken from papers, that describe mutational changes to proteins. We distinguish between grounding mutation entities to protein database identifiers and grounding them to the correct positions on sequences extracted from protein databases. The grounding workflow coordinates the extraction of mutation, protein and organism mentions from texts and uses these to identify target sequences. Mutation mentions are sequentially mapped onto candidate proteins to facilitate their correct grounding to a protein sequence, independent of a protein-mutation tuple extraction task. Using a gold standard corpus of full-text articles and corresponding protein sequences, we show high precision and recall and discuss novel aspects of the algorithm in the context of previous work. |
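
The idea of mapping mutation mentions onto candidate protein sequences can be illustrated with a minimal sketch. This is not the authors' implementation: the regular expression, the `Mutation` type, the function names and the scoring rule (count how many mentions have their wild-type residue at the stated position) are all assumptions made for illustration, and the sketch ignores numbering offsets, three-letter residue codes and organism filtering.

```python
import re
from typing import Dict, List, NamedTuple, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


class Mutation(NamedTuple):
    """A point mutation mention such as A2T (hypothetical helper type)."""
    wild_type: str
    position: int  # 1-based, as written in the text
    mutant: str


# Assumed pattern: one-letter wild-type residue, position, one-letter mutant.
_MUTATION_RE = re.compile(rf"\b([{AMINO_ACIDS}])(\d+)([{AMINO_ACIDS}])\b")


def extract_mutations(text: str) -> List[Mutation]:
    """Collect single-letter point mutation mentions from free text."""
    return [Mutation(w, int(p), m) for w, p, m in _MUTATION_RE.findall(text)]


def count_matches(mutations: List[Mutation], sequence: str) -> int:
    """Count mentions whose wild-type residue sits at the stated position."""
    return sum(
        1
        for m in mutations
        if 1 <= m.position <= len(sequence)
        and sequence[m.position - 1] == m.wild_type
    )


def ground(mutations: List[Mutation],
           candidates: Dict[str, str]) -> Optional[str]:
    """Return the candidate identifier with the most matching mentions.

    Returns None when no candidate matches any mention.
    """
    best_id, best_score = None, 0
    for protein_id, sequence in candidates.items():
        score = count_matches(mutations, sequence)
        if score > best_score:
            best_id, best_score = protein_id, score
    return best_id


# Toy data; the identifiers and sequences are made up.
text = "The variants A2T and K4R reduced activity."
candidates = {"P1": "MATKL", "P2": "MGTRL"}
grounded_id = ground(extract_mutations(text), candidates)
```

With the toy input, both mentions agree with the sequence labelled `P1` and neither agrees with `P2`, so `P1` is selected. A realistic system would also need to handle sequence numbering that differs from the paper's numbering.
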
Year | DOI | Venue |
---|---|---|
2010 | 10.1007/978-3-642-15120-0_10 | DILS |
Keywords | Field | DocType |
protein sequence,sequence analysis,experimental analysis,gold standard,natural language processing | Data mining,Identifier,Protein sequencing,Tuple,Computer science,Precision and recall,Algorithm,Protein Databases,Mutation,Sequence analysis | Conference |
Volume | ISSN | ISBN |
6254 | 0302-9743 | 3-642-15119-1 |
Citations | PageRank | References |
4 | 0.42 | 19 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Jonas Bergman Laurila | 1 | 18 | 2.11 |
K Rajaraman | 2 | 380 | 31.94 |
Christopher J O Baker | 3 | 329 | 30.96 |