Title
An Active Co-Training Algorithm For Biomedical Named-Entity Recognition
Abstract
Exploiting unlabeled text together with a relatively small labeled corpus is an active and challenging research topic in text mining, driven by the recent growth in the volume of biomedical literature. Biomedical named-entity recognition is an essential prerequisite for effective text mining of that literature. This paper proposes an Active Co-Training (ACT) algorithm for biomedical named-entity recognition. ACT is a semi-supervised learning method in which two classifiers, each based on a different feature set, iteratively learn from informative examples queried from the unlabeled data. To measure the informativeness of an unlabeled example, we design a new classification problem in which examples are classified, based on a joint view of the feature sets, as informative or non-informative to both classifiers. To form the training data for this classification problem, we adopt a query-by-committee method: the two classifiers are treated as one committee, which is applied to the labeled data to assign an informativeness label to each example. ACT outperforms the traditional co-training algorithm in terms of F-measure as well as the number of training iterations needed to build a good classification model. The proposed method tends to exploit a large amount of unlabeled data efficiently by selecting a small number of examples that carry not only useful information but also a comprehensive pattern.
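The loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions the abstract does not specify: a toy nearest-centroid model (the hypothetical `Centroid` class) stands in for the real classifiers, the committee marks a labeled example as informative when the two views disagree or either is wrong, and a selected example is pseudo-labeled by whichever view lies closer to one of its centroids. The one-dimensional feature values and the labels "GENE" and "O" are invented for illustration.

```python
from collections import defaultdict


class Centroid:
    """Toy nearest-centroid classifier (a stand-in, not the paper's model)."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for i, v in enumerate(x):
                acc[i] += v
            counts[label] += 1
        self.centroids = {c: [v / counts[c] for v in acc] for c, acc in sums.items()}
        return self

    def score(self, x):
        """Return (squared distance, label) for the nearest centroid."""
        return min(
            (sum((a - b) ** 2 for a, b in zip(x, cen)), c)
            for c, cen in self.centroids.items()
        )

    def predict(self, x):
        return self.score(x)[1]


def committee_labels(c1, c2, labeled):
    """Assumed rule: informative if the views disagree or either one is wrong."""
    flags = []
    for (v1, v2), y in labeled:
        p1, p2 = c1.predict(v1), c2.predict(v2)
        flags.append(p1 != p2 or p1 != y)
    return flags


def act(labeled, unlabeled, rounds=5, batch=1):
    """Sketch of an active co-training loop over two feature views."""
    labeled, pool = list(labeled), list(unlabeled)
    c1 = c2 = None
    for _ in range(rounds):
        y = [lab for _, lab in labeled]
        c1 = Centroid().fit([v1 for (v1, _), _ in labeled], y)
        c2 = Centroid().fit([v2 for (_, v2), _ in labeled], y)
        info = committee_labels(c1, c2, labeled)
        if not pool or len(set(info)) < 2:
            break  # nothing left to query, or no informative/non-informative contrast
        # Informativeness classifier trained on the joint view (v1 + v2).
        selector = Centroid().fit([v1 + v2 for (v1, v2), _ in labeled], info)
        picked = [ex for ex in pool if selector.predict(ex[0] + ex[1])][:batch]
        if not picked:
            break
        for v1, v2 in picked:
            pool.remove((v1, v2))
            # Assumed pseudo-labeling: trust the view nearest to a centroid.
            label = min(c1.score(v1), c2.score(v2))[1]
            labeled.append(((v1, v2), label))
    return c1, c2, labeled


# Invented toy data: each example is (view 1 features, view 2 features).
LABELED = [
    (((0.0,), (0.0,)), "O"),
    (((1.0,), (0.1,)), "O"),
    (((10.0,), (10.0,)), "GENE"),
    (((9.0,), (9.0,)), "GENE"),
    (((4.0,), (8.0,)), "GENE"),  # view 1 alone misclassifies this one at first
]
POOL = [((0.2,), (0.3,)), ((4.5,), (8.5,))]

c1, c2, grown = act(LABELED, POOL)
```

In this toy run the selector queries only the pool example that resembles the labeled example the committee struggled with, and the first-view classifier is then retrained with it. Real use would replace `Centroid` with sequence-labeling models and a confidence-based labeling rule.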
Year
2012
DOI
10.3745/JIPS.2012.8.4.575
Venue
JOURNAL OF INFORMATION PROCESSING SYSTEMS
Keywords
Biomedical Named-Entity Recognition, Co-Training, Semi-Supervised Learning, Feature Processing, Text Mining
Field
Small number, Data mining, Semi-supervised learning, Computer science, Co-training, Artificial intelligence, Labeled data, Training set, Text mining, Algorithm, Exploit, Named-entity recognition, Machine learning
DocType
Journal
Volume
8
Issue
4
ISSN
1976-913X
Citations
3
PageRank
0.40
References
1
Authors
5
Name, Order, Citations, PageRank
Tsendsuren Munkhdalai, 1, 169, 13.49
Meijing Li, 2, 50, 7.60
Unil Yun, 3, 969, 55.33
Oyun-erdene Namsrai, 4, 24, 4.32
Keun Ho Ryu, 5, 883, 85.61