Semi-Supervised Sequence Labeling with Self-Learned Features - Citegraph

Paper Info

Title
Semi-Supervised Sequence Labeling with Self-Learned Features

Abstract
Typical information extraction (IE) systems can be seen as tasks assigning labels to words in a natural language sequence. The performance is restricted by the availability of labeled words. To tackle this issue, we propose a semi-supervised approach to improve the sequence labeling procedure in IE through a class of algorithms with self-learned features (SLF). A supervised classifier can be trained with annotated text sequences and used to classify each word in a large set of unannotated sentences. By averaging predicted labels over all cases in the unlabeled corpus, SLF training builds class label distribution patterns for each word (or word attribute) in the dictionary and re-trains the current model iteratively adding these distributions as extra word features. Basic SLF models how likely a word could be assigned to target class types. Several extensions are proposed, such as learning words' class boundary distributions. SLF exhibits robust and scalable behaviour and is easy to tune. We applied this approach on four classical IE tasks: named entity recognition (German and English), part-of-speech tagging (English) and one gene name recognition corpus. Experimental results show effective improvements over the supervised baselines on all tasks. In addition, when compared with the closely related self-training idea, this approach shows favorable advantages.
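The SLF loop described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: it assumes two classes (0 = outside, 1 = entity), and `toy_train` is a hypothetical stand-in for a real sequence model that memorizes labeled words, falls back on the self-learned distribution when one exists, and otherwise guesses from capitalization. The names `slf_distributions` and `slf_train` and the example sentences are likewise invented for illustration.

```python
from collections import defaultdict


def slf_distributions(sentences, predict_proba, n_classes):
    """Average the predicted class probabilities over every occurrence of each word."""
    sums = defaultdict(lambda: [0.0] * n_classes)
    counts = defaultdict(int)
    for sentence in sentences:
        for i, word in enumerate(sentence):
            probs = predict_proba(sentence, i)
            for k, p in enumerate(probs):
                sums[word][k] += p
            counts[word] += 1
    return {w: [s / counts[w] for s in sums[w]] for w in sums}


def slf_train(labeled, unlabeled, train, n_classes, rounds=2):
    """Alternate between building word distributions and re-training with them."""
    word_features = {}                     # no self-learned features yet
    model = train(labeled, word_features)  # supervised baseline
    for _ in range(rounds):
        word_features = slf_distributions(unlabeled, model, n_classes)
        model = train(labeled, word_features)
    return model, word_features


def toy_train(labeled, word_features):
    """Hypothetical trainer (assumption): returns a predict_proba(sentence, i) closure."""
    known = {w: y for sent, tags in labeled for w, y in zip(sent, tags)}

    def predict_proba(sentence, i):
        word = sentence[i]
        if word in known:                  # memorized from the labeled data
            return [0.0, 1.0] if known[word] == 1 else [1.0, 0.0]
        if word in word_features:          # use the self-learned distribution
            return word_features[word]
        # Crude fallback: capitalized and not sentence-initial means entity.
        return [0.0, 1.0] if word[0].isupper() and i > 0 else [1.0, 0.0]

    return predict_proba


labeled = [(["she", "visited", "Berlin"], [0, 0, 1])]
unlabeled = [
    ["Paris", "is", "nice"],
    ["we", "visited", "Paris"],
    ["she", "loves", "Paris"],
]

baseline = toy_train(labeled, {})
model, features = slf_train(labeled, unlabeled, toy_train, n_classes=2)

print(features["Paris"])                   # averaged class distribution for "Paris"
print(baseline(["Paris", "rocks"], 0))     # baseline at sentence start
print(model(["Paris", "rocks"], 0))        # SLF-informed prediction
```

In this toy setting the baseline treats a sentence-initial "Paris" as a non-entity, while the re-trained model leans toward the entity class because two of the three unlabeled occurrences were predicted as entities. A real system would feed the distributions to the classifier as additional input features rather than returning them directly.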

Year	DOI	Venue
2009	10.1109/ICDM.2009.40	ICDM
Keywords	Field	DocType
extra word,basic slf model,word attribute,class type,class label distribution pattern,class boundary distribution,semi-supervised sequence,annotated text sequence,classical ie task,slf training,semi-supervised approach,self-learned features,hidden markov models,labeling,natural language,artificial neural networks,information extraction,feature extraction,semi supervised learning,data mining,natural language processing,learning artificial intelligence,sequence labeling	Data mining,Semi-supervised learning,Sequence labeling,Computer science,Artificial intelligence,Natural language processing,Classifier (linguistics),Pattern recognition,Feature extraction,Information extraction,Natural language,Hidden Markov model,Named-entity recognition,Machine learning	Conference
ISSN	Citations	PageRank
1550-4786	7	0.46
References	Authors
24	6

Authors (6 rows)

Name	Order	Citations	PageRank
Yanjun Qi	1	684	45.77
Pavel Kuksa	2	399	24.10
Ronan Collobert	3	4002	308.61
Kunihiko Sadamasa	4	91	4.63
Koray Kavukcuoglu	5	10189	504.11
Jason Weston	6	13068	805.30
