Title
Biomedical Named Entity Recognition Based on the Combination of Regional and Global Text Features
Abstract
Biomedical information extraction, and Named Entity Recognition (NER) in particular, is a primary task in biomedical text mining because of the rapid growth of large-scale literature. Biomedical entity extraction aims to identify specific entities (words or phrases) in unstructured text. In this work, we introduce a novel biomedical NER system that combines regional and global text features: linguistic, lexical, contextual, and syntactic. Our system adopts Conditional Random Fields (CRFs) [1] as its machine learning algorithm and consists of two major pipelines (see Figure 1). We focus in particular on building the first pipeline, for text processing, in a modular manner and on discovering rich feature sets that cover a broad range of linguistic and contextual information. To implement the CRF framework in the second pipeline, our system uses a modified version of Mallet [2] to take advantage of feature induction. In 10-fold cross-validation on the GENETAG corpus [3], our system improves F-measure by 0.99% to 18.47% over existing open-source biomedical NER systems and achieves the highest precision among them. We find that several components, such as abundant key features, external resources, and feature induction, contribute to the performance of the proposed system.
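To make the idea of per-token lexical and contextual features concrete, the following is a minimal sketch of how such features might be assembled before being handed to a CRF. It is not the feature set of the system described above: the function names (`word_shape`, `token_features`, `sentence_features`), the feature keys, and the window size are all hypothetical choices made for illustration.

```python
import re


def word_shape(token):
    """Collapse a token into a coarse shape, e.g. 'IL-2' -> 'A-0'."""
    shape = re.sub(r"[A-Z]+", "A", token)
    shape = re.sub(r"[a-z]+", "a", shape)
    shape = re.sub(r"[0-9]+", "0", shape)
    return shape


def token_features(tokens, i, window=1):
    """Build a feature dict for tokens[i] (hypothetical feature names)."""
    tok = tokens[i]
    feats = {
        # Lexical / orthographic features of the token itself.
        "lower": tok.lower(),
        "shape": word_shape(tok),
        "prefix3": tok[:3],
        "suffix3": tok[-3:],
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
    }
    # Contextual features: lowercased neighbours inside the window,
    # with a padding marker beyond the sentence boundaries.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"ctx[{offset:+d}]"
        feats[key] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


def sentence_features(tokens, window=1):
    """Return one feature dict per token in the sentence."""
    return [token_features(tokens, i, window) for i in range(len(tokens))]
```

For example, `sentence_features(["IL-2", "gene", "expression"])` yields three dictionaries, one per token. These would still need to be converted into whatever input format the chosen CRF toolkit expects, and syntactic or external-resource features would have to be added separately.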
Year
2014
DOI
10.1145/2665970.2665990
Venue
DTMBIO@CIKM
Keywords
biomedical named entity recognition, information extraction, conditional random fields, machine learning, text analysis, text mining
Field
Conditional random field, Text mining, Computer science, Information extraction, Artificial intelligence, Natural language processing, Named-entity recognition, Syntax, CRFS, Text processing
DocType
Conference
Citations
0
PageRank
0.34
References
1
Authors
5
Name             Order  Citations  PageRank
Yoo Kyung Jeong  1      51         4.98
Dahee Lee        2      1          1.36
Namgi Han        3      1          1.03
Won Chul Kim     4      0          0.34
Min Song         5      17         3.46