Harmonization of gene/protein annotations: towards a gold standard MEDLINE. - Citegraph

Paper Info

Title
Harmonization of gene/protein annotations: towards a gold standard MEDLINE.

Abstract
Motivation: Named entity recognition (NER) is an elementary task in biomedical text mining. A number of NER solutions have been proposed in recent years, taking advantage of available annotated corpora, terminological resources and machine-learning techniques. Currently, the best-performing solutions combine the outputs of selected annotation solutions and are measured against a single corpus. However, little effort has been spent on a systematic analysis of methods that harmonize the annotation results and measure them against a combination of Gold Standard Corpora (GSCs).

Results: We present Totum, a machine-learning solution that harmonizes gene/protein annotations provided by heterogeneous NER solutions. It has been optimized and measured against a combination of manually curated GSCs. The experiments show that our approach improves the F-measure of state-of-the-art solutions by up to 10% (reaching approximately 70%) in exact alignment and by up to 22% (reaching approximately 82%) in nested alignment. We demonstrate that our solution delivers reliable annotation results across the GSCs and that it is an important contribution towards a homogeneous annotation of MEDLINE abstracts.
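
The difference between the exact and nested alignment figures quoted in the abstract can be illustrated with a minimal sketch. It is not the evaluation code used for Totum. It assumes that annotations are (start, end) character offsets, that "exact" means identical boundaries, and that "nested" means one span fully contains the other. The function names and the sample spans are hypothetical.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def exact_match(pred: Span, gold: Span) -> bool:
    # Exact alignment: both boundaries must be identical.
    return pred == gold


def nested_match(pred: Span, gold: Span) -> bool:
    # Assumed definition: one span lies entirely inside the other.
    pred_in_gold = gold[0] <= pred[0] and pred[1] <= gold[1]
    gold_in_pred = pred[0] <= gold[0] and gold[1] <= pred[1]
    return pred_in_gold or gold_in_pred


def f_measure(preds: List[Span], golds: List[Span],
              match: Callable[[Span, Span], bool]) -> float:
    # Precision: share of predicted spans that match some gold span.
    hit_preds = sum(any(match(p, g) for g in golds) for p in preds)
    # Recall: share of gold spans matched by some predicted span.
    hit_golds = sum(any(match(p, g) for p in preds) for g in golds)
    precision = hit_preds / len(preds) if preds else 0.0
    recall = hit_golds / len(golds) if golds else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical sample: three gold mentions and three predicted mentions.
gold_spans = [(0, 5), (10, 20), (30, 38)]
pred_spans = [(0, 5), (12, 20), (40, 45)]

f_exact = f_measure(pred_spans, gold_spans, exact_match)
f_nested = f_measure(pred_spans, gold_spans, nested_match)
```

In this sample the prediction (12, 20) lies inside the gold span (10, 20), so it counts under the nested criterion but not under the exact one. This is why a nested score is never lower than the exact score for the same output.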

Year	2012
DOI	10.1093/bioinformatics/bts125
Venue	BIOINFORMATICS
Field	Data mining, Annotation, Information retrieval, Harmonization, Homogeneous, Computer science, Biomedical text mining, Protein Annotation, Bioinformatics, MEDLINE, Java
DocType	Journal
Volume	28
Issue	9
ISSN	1367-4803
Citations	4
PageRank	0.45
References	20
Authors	5

Authors (5 rows)

Name	Order	Citations	PageRank
David Campos	1	219	10.69
Sérgio Matos	2	415	29.51
Ian Lewin	3	246	25.58
José Luis Oliveira	4	760	84.03
Dietrich Rebholz-Schuhmann	5	1023	75.06
