Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations. - Citegraph

Paper Info

Title
Incorporating domain knowledge in chemical and biomedical named entity recognition with word representations.

Abstract
Chemical and biomedical Named Entity Recognition (NER) is an essential prerequisite task before effective text mining can begin for biochemical-text data. Exploiting unlabeled text data to leverage system performance has been an active and challenging research topic in text mining due to the recent growth in the amount of biomedical literature. We present a semi-supervised learning method that efficiently exploits unlabeled data in order to incorporate domain knowledge into a named entity recognition model and to leverage system performance. The proposed method includes Natural Language Processing (NLP) tasks for text preprocessing, learning word representation features from a large amount of text data for feature extraction, and conditional random fields for token classification. Other than the free text in the domain, the proposed method does not rely on any lexicon or dictionary, in order to keep the system applicable to other NER tasks in bio-text data. We extended BANNER, a biomedical NER system, with the proposed method. This yields an integrated system that can be applied to chemical and drug NER or biomedical NER. We call our branch of the BANNER system BANNER-CHEMDNER, which is scalable over millions of documents, processing about 530 documents per minute, is configurable via XML, and can be plugged into other systems by using the BANNER Unstructured Information Management Architecture (UIMA) interface. BANNER-CHEMDNER achieved an 85.68% and an 86.47% F-measure on the testing sets of CHEMDNER Chemical Entity Mention (CEM) and Chemical Document Indexing (CDI) subtasks, respectively, and achieved an 87.04% F-measure on the official testing set of the BioCreative II gene mention task, showing remarkable performance in both chemical and biomedical NER. The BANNER-CHEMDNER system is available at: https://bitbucket.org/tsendeemts/banner-chemdner.
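
As a rough illustration of the pipeline the abstract outlines (word-representation features feeding a token classifier), the sketch below builds per-token feature dictionaries that mix simple orthographic cues with word-cluster features. It is not the BANNER-CHEMDNER implementation: the abstract does not say which word representations are used, so a Brown-style cluster bit-string is assumed here, and `TOY_CLUSTERS`, `word_shape`, `token_features`, and the prefix lengths are all hypothetical names and values chosen for the example.

```python
from typing import Dict, List, Sequence, Union

Feature = Union[str, bool]

# Hypothetical toy mapping: lowercased word -> bit-string path in a
# Brown-style cluster hierarchy. A real map would be learned from
# unlabeled domain text; these entries are made up for illustration.
TOY_CLUSTERS: Dict[str, str] = {
    "aspirin": "110100",
    "ibuprofen": "110101",
    "inhibits": "0110",
}


def word_shape(token: str) -> str:
    """Collapse characters into a coarse shape, e.g. 'COX-2' -> 'AAA-0'."""
    out: List[str] = []
    for ch in token:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("0")
        else:
            out.append(ch)  # keep punctuation such as '-' unchanged
    return "".join(out)


def token_features(
    tokens: Sequence[str],
    i: int,
    clusters: Dict[str, str],
    prefix_lengths: Sequence[int] = (2, 4, 6),
) -> Dict[str, Feature]:
    """Build a feature dictionary for tokens[i].

    Orthographic and context features come from the labeled sentence;
    cluster-prefix features carry knowledge learned from unlabeled text.
    """
    tok = tokens[i]
    feats: Dict[str, Feature] = {
        "lower": tok.lower(),
        "shape": word_shape(tok),
        "has_digit": any(ch.isdigit() for ch in tok),
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
    path = clusters.get(tok.lower())
    if path is not None:
        # Prefixes of the path give coarse-to-fine cluster membership.
        for n in prefix_lengths:
            feats[f"cluster_{n}"] = path[:n]
    return feats


sentence = ["Aspirin", "inhibits", "COX-2"]
features = [token_features(sentence, i, TOY_CLUSTERS) for i in range(len(sentence))]
```

Feature dictionaries of this kind are the usual input to a conditional random field trainer. Because the cluster features come only from unlabeled text, the sketch stays consistent with the abstract's point that no lexicon or dictionary is required.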

Year	DOI	Venue
2015	10.1186/1758-2946-7-S1-S9	J. Cheminformatics
Keywords	Field	DocType
conditional random fields,feature representation learning,named entity recognition,semi-supervised learning	Data mining,Semi-supervised learning,Computer science,Artificial intelligence,Natural language processing,Conditional random field,Text mining,Domain knowledge,Feature extraction,Lexicon,Preprocessor,Bioinformatics,Named-entity recognition	Journal
Volume	Issue	ISSN
7	Suppl 1 Text mining for chemistry and the CHEMDNER track	1758-2946
Citations	PageRank	References
12	0.53	29
Authors
6

Authors (6 rows)

Name	Order	Citations	PageRank
Tsendsuren Munkhdalai	1	169	13.49
Meijing Li	2	50	7.60
Khuyagbaatar Batsuren	3	13	1.59
Hyeon Ah Park	4	12	0.53
Nak Hyeon Choi	5	12	0.53
Keun Ho Ryu	6	883	85.61
