Title
Extraction of Chemical and Drug Named Entities by Ensemble Learning Using Chemical NER Tools Based on Different Extraction Guidelines.
Abstract
Chemical named-entity recognition (chemical NER) is the task of extracting chemical information and chemical-related entities, such as drug names and source materials, from text in domains such as bioinformatics and nanoinformatics. Several corpora have been constructed for handling such chemical-related information, each based on its own corpus-construction guideline. Although these guidelines cover common types of chemical information, they differ in several ways. As a result, a chemical NER tool developed for one guideline may extract the common chemical named entities correctly but have problems extracting other chemical-related entities. If the differences between the guidelines are consistent, the pattern of success and failure of each chemical NER tool should also be consistent. In this paper, we present an ensemble-learning approach that uses a conditional random field (CRF) as the machine-learning technique to fuse the outputs of chemical NER tools with different characteristics, each based on a different guideline, in order to construct a chemical NER system for a particular guideline. To achieve consistent tokenization across these tools, we applied a post-tokenization mechanism. We evaluated the system using the BioCreative IV CHEMDNER task datasets. We confirmed that the ensemble-learning approach using a combination of chemical NER tools performs better than a simple domain-adaptation approach using just one chemical NER tool. We also confirmed that the ensemble-learning approach can improve the performance of a well-tuned rule-based chemical NER tool on certain tasks.
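The fusion step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the tool names `tool_a` and `tool_b`, the regular-expression tokenizer, and the feature names are all assumptions made for illustration. The sketch projects each tool's character-offset spans onto one shared token sequence (a simple stand-in for the post-tokenization mechanism) and emits one feature dictionary per token, a shape that CRF toolkits such as sklearn-crfsuite can consume.

```python
import re

def tokenize(text):
    """Split text into (token, start, end) triples; the regex is an assumption."""
    return [(m.group(), m.start(), m.end())
            for m in re.finditer(r"\w+|[^\w\s]", text)]

def bio_tags(tokens, spans):
    """Project character-offset entity spans onto shared tokens as B/I/O tags."""
    tags = []
    for _, start, end in tokens:
        tag = "O"
        for s, e in spans:
            if start < e and end > s:  # token overlaps this span
                # First overlapping token starts at or before the span start.
                tag = "B" if start <= s else "I"
                break
        tags.append(tag)
    return tags

def token_features(text, tool_outputs):
    """Build one feature dict per token, with one tag feature per NER tool."""
    tokens = tokenize(text)
    per_tool = {name: bio_tags(tokens, spans)
                for name, spans in tool_outputs.items()}
    features = []
    for i, (tok, _, _) in enumerate(tokens):
        feats = {"word.lower": tok.lower()}
        for name, tags in per_tool.items():
            feats[f"{name}.tag"] = tags[i]
        features.append(feats)
    return tokens, features

# Hypothetical outputs of two tools that follow different guidelines:
# tool_a marks only the drug name, tool_b also marks "COX-2".
text = "Aspirin inhibits COX-2 activity."
tool_outputs = {
    "tool_a": [(0, 7)],
    "tool_b": [(0, 7), (17, 22)],
}
tokens, features = token_features(text, tool_outputs)
```

A CRF trained on such features with gold labels from the target guideline could then learn which tool to trust in which context, for example that a span marked only by `tool_b` falls outside the target guideline.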
Year
2015
Venue
Transactions on Machine Learning and Data Mining
Field
Conditional random field, Tokenization (data security), Data mining, Computer science, Ensemble learning
DocType
Journal
Volume
8
Issue
2
Citations
0
PageRank
0.34
References
0
Authors
2
Name | Order | Citations | PageRank
Thaer M. Dieb | 1 | 3 | 1.51
Masaharu Yoshioka | 2 | 368 | 41.40