Title
CHEMDNER system with mixed conditional random fields and multi-scale word clustering
Abstract
Chemical compound and drug name recognition plays an important role in chemical text mining, and it is the basis for automatic relation extraction and event identification in chemical information processing. A high-performance named entity recognition system for chemical compound and drug names is therefore necessary.

We developed a CHEMDNER system based on mixed conditional random fields (CRF) with word clustering for chemical compound and drug name recognition. For word clustering, we used Brown's hierarchical clustering algorithm and the deep-learning-based Skip-gram model, both trained on a large collection of PubMed titles and abstracts.

The system achieved the highest F-score (88.20%) in the chemical document indexing (CDI) task and the second-highest F-score (87.11%) in the chemical entity mention recognition (CEM) task of the BioCreative IV CHEMDNER track. Multi-scale clustering based on deep learning further improved performance, raising the F-scores to 88.71% for CDI and 88.06% for CEM.

The mixed CRF model represents both the internal complexity and the external contexts of the entities, and it is integrated with word clustering to capture domain knowledge from PubMed titles and abstracts. This domain knowledge helps maintain recognition performance even without fine-grained linguistic features or manually designed rules.
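The abstract does not spell out how cluster output becomes CRF features. A common approach, shown here only as an illustrative assumption and not as the authors' implementation, is to emit Brown bit-string prefixes of several lengths (short prefixes are coarse clusters, long prefixes are fine ones) and, as one plausible reading of "multi-scale clustering", one cluster id per granularity k from clustering Skip-gram vectors. The dictionaries `brown_paths` and `embedding_clusters`, all function names, and the prefix lengths 4, 6, 10 and 20 (a conventional choice in Brown-cluster NER work) are hypothetical; the toy data is invented for the demo.

```python
from typing import Dict, List, Sequence

# Hypothetical inputs (assumptions for illustration, not taken from the paper):
#   brown_paths:        word -> Brown cluster bit-string (its path in the merge tree)
#   embedding_clusters: number of clusters k -> {word -> cluster id}, e.g. from
#                       k-means run at several k over Skip-gram word vectors
BrownPaths = Dict[str, str]
EmbeddingClusters = Dict[int, Dict[str, int]]


def brown_prefix_features(word: str, brown_paths: BrownPaths,
                          prefix_lengths: Sequence[int] = (4, 6, 10, 20)) -> List[str]:
    """Bit-string prefixes of several lengths; paths shorter than a length are used whole."""
    path = brown_paths.get(word.lower())
    if path is None:
        return ["brown=UNK"]
    return [f"brown{n}={path[:n]}" for n in prefix_lengths]


def embedding_cluster_features(word: str, embedding_clusters: EmbeddingClusters) -> List[str]:
    """One cluster-id feature per clustering granularity k (unknown words map to UNK)."""
    feats = []
    for k in sorted(embedding_clusters):
        cid = embedding_clusters[k].get(word.lower())
        feats.append(f"w2v{k}={cid if cid is not None else 'UNK'}")
    return feats


def token_features(tokens: Sequence[str], i: int, brown_paths: BrownPaths,
                   embedding_clusters: EmbeddingClusters, window: int = 1) -> List[str]:
    """String features for token i, plus cluster features of neighbours within
    +/- window, each prefixed with its relative offset."""
    feats = [f"w={tokens[i].lower()}"]
    for offset in range(-window, window + 1):
        j = i + offset
        if 0 <= j < len(tokens):
            neighbour = tokens[j]
            for f in (brown_prefix_features(neighbour, brown_paths)
                      + embedding_cluster_features(neighbour, embedding_clusters)):
                feats.append(f"{offset}:{f}")
    return feats


if __name__ == "__main__":
    # Toy resources, invented purely for the demo.
    brown = {"aspirin": "110100101", "ibuprofen": "110100111", "inhibits": "0110"}
    clusters = {50: {"aspirin": 3, "ibuprofen": 3}, 500: {"aspirin": 41, "ibuprofen": 47}}
    print(token_features(["Aspirin", "inhibits", "COX"], 0, brown, clusters))
```

In the toy data, "aspirin" and "ibuprofen" share the coarse Brown prefixes and the 50-cluster id but differ at the finer scales. This graded similarity is what a CRF could weight separately per scale.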
Year: 2015
DOI: 10.1186/1758-2946-7-S1-S4
Venue: J. Cheminformatics
Keywords: chemical named entity recognition, deep learning, mixed conditional random fields, word clustering
Field: Conditional random field, Data mining, Text mining, Information processing, Information retrieval, Computer science, Artificial intelligence, Bioinformatics, Deep learning, Cluster analysis, Named-entity recognition, Relationship extraction
DocType: Journal
Volume: 7
Issue: Suppl 1 (Text mining for chemistry and the CHEMDNER track)
ISSN: 1758-2946
Citations: 17
PageRank: 0.69
References: 15
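Authors: 5
1. Yanan Lu (citations: 97, PageRank: 4.02)
2. Donghong Ji (citations: 892, PageRank: 120.08)
3. Xiaoyuan Yao (citations: 17, PageRank: 1.03)
4. Xiaomei Wei (citations: 17, PageRank: 0.69)
5. Xiaohui Liang (citations: 17, PageRank: 1.03)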