Title
Large Scale Sequential Learning from Partially Labeled Data
Abstract
The success of data-driven solutions to difficult problems, along with the dropping costs of storing and processing massive amounts of data, has led to growing interest in large scale machine learning. In many cases, statistical learning problems involve sequential data that exhibit significant sequential correlation. This makes training sequence classifiers time consuming and makes sequential learning from large scale data difficult, especially when the available training data are sparsely labeled. This paper proposes a novel learning approach for building sequence classifiers from a large amount of partially labeled training data. The semi-supervised learning mechanism for building classifiers from partially labeled data is embedded in an ensemble learning framework, which is adapted for distributed learning over large scale datasets. To evaluate the approach in practice, we conducted experiments using Conditional Random Fields (CRFs) as the base learner to detect concepts in a large scale document set. The results show that our approach significantly outperforms the best baselines, which demonstrates its effectiveness.
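The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea it names: self-training on partially labeled data, run independently on data shards, with the resulting models combined by voting. Everything here is an assumption made for illustration: the names `CentroidClassifier`, `self_train`, `train_ensemble`, and `ensemble_predict` are hypothetical, a toy nearest-centroid classifier on 1-D inputs stands in for the CRF base learner, and the shards are processed in one process rather than on distributed workers.

```python
from collections import Counter


class CentroidClassifier:
    """Toy base learner (a stand-in for a CRF): nearest class centroid on 1-D inputs."""

    def fit(self, xs, ys):
        sums, counts = {}, Counter(ys)
        for x, y in zip(xs, ys):
            sums[y] = sums.get(y, 0.0) + x
        self.centroids = {y: sums[y] / counts[y] for y in sums}
        return self

    def predict_with_margin(self, x):
        # Rank classes by distance to their centroid; the margin is the gap
        # between the closest and second-closest centroid distances.
        ranked = sorted(self.centroids, key=lambda y: abs(x - self.centroids[y]))
        best = ranked[0]
        if len(ranked) == 1:
            return best, float("inf")
        margin = abs(x - self.centroids[ranked[1]]) - abs(x - self.centroids[best])
        return best, margin


def self_train(labeled, unlabeled, rounds=3, min_margin=1.0):
    """Self-training: repeatedly pseudo-label confident unlabeled points and refit."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = CentroidClassifier().fit(*zip(*labeled))
    for _ in range(rounds):
        confident, rest = [], []
        for x in pool:
            y, margin = model.predict_with_margin(x)
            (confident if margin >= min_margin else rest).append((x, y))
        if not confident:
            break  # nothing new to learn from
        labeled.extend(confident)
        pool = [x for x, _ in rest]
        model = CentroidClassifier().fit(*zip(*labeled))
    return model


def train_ensemble(labeled, unlabeled, n_shards=3):
    """Train one self-trained model per shard (each shard mimics one worker)."""
    models = []
    for k in range(n_shards):
        shard_labeled = labeled[k::n_shards]
        shard_unlabeled = unlabeled[k::n_shards]
        if shard_labeled:
            models.append(self_train(shard_labeled, shard_unlabeled))
    return models


def ensemble_predict(models, x):
    """Combine the shard models by majority vote."""
    votes = Counter(m.predict_with_margin(x)[0] for m in models)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    labeled = [(0.0, "A"), (10.0, "B"), (1.0, "A"), (9.0, "B"), (0.5, "A"), (9.5, "B")]
    unlabeled = [2.0, 8.0, 3.0, 7.0, 1.5, 8.5, 4.0, 6.0, 5.2]
    models = train_ensemble(labeled, unlabeled)
    print(ensemble_predict(models, 2.5), ensemble_predict(models, 7.5))
```

In this sketch each shard only ever sees its own slice of the data, which is what would let the per-shard training run in parallel; the margin threshold `min_margin` is an arbitrary illustrative value, not one taken from the paper.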
Year
2013
DOI
10.1109/ICSC.2013.39
Venue
ICSC
Keywords
large scale machine learning, available training data, sequential learning, semisupervised learning, self-training, distributed learning, sequence classifiers, learning (artificial intelligence), large scale sequential learning, semi-supervised learning, sequential data, data processing, large scale data, partially labeled data, partially labeled training data, computing framework, conditional random fields, statistical learning problems, statistical learning problem, conditional random field, data driven solutions, large scale document set, ensemble learning, large amount, crf, dropping costs, document handling, co-training, sequential correlation, training data, concept detection, learning artificial intelligence
Field
Online machine learning, Data mining, Semi-supervised learning, Stability (learning theory), Instance-based learning, Computer science, Co-training, Unsupervised learning, Artificial intelligence, Ensemble learning, Machine learning, Learning classifier system
DocType
Conference
ISSN
2325-6516
Citations
1
PageRank
0.35
References
21
Authors
3
Name            Order  Citations  PageRank
Jianqiang Li    1      156        19.55
Chun-Chen Liu   2      144        12.91
Bo Liu          3      143        11.62