Title
Identification of Chemical Entities in Patent Documents
Abstract
Biomedical literature is an important source of information on chemical compounds. However, chemical entities have several different representations and nomenclatures, which makes references to them ambiguous. Many systems already exist for gene and protein entity recognition, but very few exist for chemical entities. The main reason is the lack of corpora for training named entity recognition systems and evaluating their performance. In this paper we present a chemical entity recognizer that uses a machine learning approach based on conditional random fields (CRF). We compare its performance with dictionary-based approaches that use several terminological resources. For training and evaluation, we used a gold standard of manually curated patent documents. The dictionary-based systems perform well at partially identifying chemical entities. When identifying complete entities, however, the machine learning approach performs better, with a 10% increase in F-score over the best dictionary-based system.
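The abstract contrasts partial and complete identification of chemical entities. The following minimal sketch is not the authors' code. It illustrates how a dictionary tagger can score well under partial matching and still score poorly under exact matching. It rests on several assumptions:

- entities are token-level spans;
- the lexicon is a hypothetical set of lowercased token tuples;
- lookup is greedy longest match;
- a partial match means any overlap with a gold span.

These are common conventions, not necessarily the ones used in the paper. The CRF recognizer itself is not reproduced here.

```python
import operator
from typing import Callable, List, Set, Tuple

# A span is a (start, end) pair of token offsets; end is exclusive.
Span = Tuple[int, int]


def dictionary_tag(tokens: List[str], lexicon: Set[Tuple[str, ...]],
                   max_len: int = 6) -> List[Span]:
    """Greedy longest-match lookup of token n-grams in a (hypothetical) lexicon."""
    spans: List[Span] = []
    i = 0
    while i < len(tokens):
        found = None
        # Try the longest candidate n-gram first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            if tuple(t.lower() for t in tokens[i:i + n]) in lexicon:
                found = (i, i + n)
                break
        if found is not None:
            spans.append(found)
            i = found[1]  # skip past the matched entity
        else:
            i += 1
    return spans


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open spans share at least one token."""
    return a[0] < b[1] and b[0] < a[1]


def prf(predicted: List[Span], gold: List[Span],
        exact: bool) -> Tuple[float, float, float]:
    """Precision, recall and F-score under exact or overlap (partial) matching."""
    match: Callable[[Span, Span], bool] = operator.eq if exact else overlaps
    # A prediction counts as correct if it matches some gold span, and vice versa.
    hits_pred = sum(1 for p in predicted if any(match(p, g) for g in gold))
    hits_gold = sum(1 for g in gold if any(match(p, g) for p in predicted))
    precision = hits_pred / len(predicted) if predicted else 0.0
    recall = hits_gold / len(gold) if gold else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score


if __name__ == "__main__":
    tokens = ["treated", "with", "sodium", "benzoate", "in", "ethanol"]
    lexicon = {("benzoate",), ("ethanol",)}  # hypothetical terminology
    gold = [(2, 4), (5, 6)]                  # "sodium benzoate", "ethanol"
    predicted = dictionary_tag(tokens, lexicon)
    print(predicted)                          # [(3, 4), (5, 6)]
    print(prf(predicted, gold, exact=True))   # (0.5, 0.5, 0.5)
    print(prf(predicted, gold, exact=False))  # (1.0, 1.0, 1.0)
```

The hypothetical lexicon contains "benzoate" but not "sodium benzoate", so the tagger recovers only part of the first gold entity. Under overlap matching, that span counts as a hit. Under exact matching, it counts as both a false positive and a false negative.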
Year
2009
DOI
10.1007/978-3-642-02481-8_144
Venue
IWANN (2)
Keywords
conditional random field, patent documents, chemical entities, biomedical literature, protein entity recognition, entity recognition system, chemical entity recognizer, chemical entity, chemical compound, dictionary-based approach, complete entity, dictionary-based system, computer science, text mining, conditional random fields
Field
Conditional random field, Entity linking, Data mining, Text mining, Information retrieval, Computer science, Weak entity, Natural language processing, Artificial intelligence, Named-entity recognition
DocType
Conference
Volume
5518
ISSN
0302-9743
Citations
10
PageRank
0.72
References
18
Authors
4
Name, Order, Citations, PageRank
Tiago Grego, 1, 34, 3.91
P Pezik, 2, 141, 9.07
Francisco M. Couto, 3, 982, 72.63
Dietrich Rebholz-Schuhmann, 4, 1023, 75.06