Abstract |
---|
Motivation: Named entity recognition (NER) is an elementary task in biomedical text mining. A number of NER solutions have been proposed in recent years, taking advantage of available annotated corpora, terminological resources and machine-learning techniques. Currently, the best-performing solutions combine the outputs of selected annotation solutions and are measured against a single corpus. However, little effort has been spent on a systematic analysis of methods that harmonize the annotation results and measure them against a combination of Gold Standard Corpora (GSCs). Results: We present Totum, a machine-learning solution that harmonizes gene/protein annotations provided by heterogeneous NER solutions. It has been optimized and measured against a combination of manually curated GSCs. Our experiments show that the approach improves the F-measure of state-of-the-art solutions by up to 10% (achieving approximately 70%) in exact alignment and 22% (achieving approximately 82%) in nested alignment. We demonstrate that our solution delivers reliable annotation results across the GSCs and is an important contribution towards a homogeneous annotation of MEDLINE abstracts. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1093/bioinformatics/bts125 | BIOINFORMATICS |
Field | DocType | Volume |
---|---|---|
Data mining, Annotation, Information retrieval, Harmonization, Homogeneous, Computer science, Biomedical text mining, Protein Annotation, Bioinformatics, MEDLINE, Java | Journal | 28 |
Issue | ISSN | Citations |
---|---|---|
9 | 1367-4803 | 4 |
PageRank | References | Authors |
---|---|---|
0.45 | 20 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
David Campos | 1 | 219 | 10.69 |
Sérgio Matos | 2 | 415 | 29.51 |
Ian Lewin | 3 | 246 | 25.58 |
José Luis Oliveira | 4 | 760 | 84.03 |
Dietrich Rebholz-Schuhmann | 5 | 1023 | 75.06 |