Title
Single-pass online learning: performance, voting schemes and online feature selection
Abstract
To learn concepts over massive data streams, it is essential to design inference and learning methods that operate in real time with limited memory. Online learning methods such as the perceptron or Winnow are naturally suited to stream processing; however, in practice multiple passes over the same training data are required to achieve accuracy comparable to state-of-the-art batch learners. In the current work we address the problem of training an online learner with a single pass over the data. We evaluate several existing methods, and also propose a new modification of Margin Balanced Winnow, which has performance comparable to linear SVM. We also explore the effect of averaging, a.k.a. voting, on online learning. Finally, we describe how the new Modified Margin Balanced Winnow algorithm can be naturally adapted to perform feature selection. This scheme performs comparably to widely used batch feature selection methods like information gain or chi-square, with the advantage of being able to select features on the fly. Taken together, these techniques allow single-pass online learning to be competitive with batch techniques while still maintaining the advantages of online learning.
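As an illustration of the single-pass setting and of averaging, the sketch below trains a plain perceptron with one sweep over a stream and keeps a running average of the weight vectors. It is a minimal sketch of the generic averaged perceptron, not the paper's Modified Margin Balanced Winnow; the function names, the data layout (dense feature lists with labels in {-1, +1}) and the toy stream are assumptions made for the example.

```python
from typing import Iterable, List, Sequence, Tuple

# Assumed data layout: (dense feature vector, label in {-1, +1}).
Example = Tuple[Sequence[float], int]


def averaged_perceptron_single_pass(
    stream: Iterable[Example], n_features: int
) -> Tuple[List[float], List[float]]:
    """Train a perceptron in one pass; return (final_weights, averaged_weights)."""
    w = [0.0] * n_features      # current hypothesis
    w_sum = [0.0] * n_features  # running sum of the hypothesis after each example
    seen = 0
    for x, y in stream:         # every example is visited exactly once
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= 0:      # mistake or zero margin: additive update
            w = [wi + y * xi for wi, xi in zip(w, x)]
        w_sum = [si + wi for si, wi in zip(w_sum, w)]
        seen += 1
    # Averaging weights each hypothesis by how many examples it survived.
    w_avg = [si / seen for si in w_sum] if seen else list(w)
    return w, w_avg


def predict(weights: Sequence[float], x: Sequence[float]) -> int:
    """Return +1 if the linear score is positive, otherwise -1."""
    return 1 if sum(wi * xi for wi, xi in zip(weights, x)) > 0 else -1


# Toy stream (hypothetical data); the first feature acts as a bias term.
toy_stream = iter([
    ([1.0, 1.0, 0.0], 1),
    ([1.0, 0.0, 1.0], -1),
    ([1.0, 1.0, 0.0], 1),
    ([1.0, 0.0, 1.0], -1),
])
final_w, avg_w = averaged_perceptron_single_pass(toy_stream, n_features=3)
```

Because the stream is consumed as an iterator, the learner cannot revisit an example, and memory use stays proportional to the number of features rather than the number of examples.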
Year
2006
DOI
10.1145/1150402.1150466
Venue
KDD
Keywords
feature selection, batch technique, training data, massive data stream, widely-used batch feature selection, online feature selection, online learning, state-of-the-art batch learner, new modified margin, winnow algorithm, on-line learning, information gain, real time, stream processing, winnow, voting
Field
Online machine learning, Data mining, Data stream mining, Semi-supervised learning, Active learning (machine learning), Feature selection, Computer science, Artificial intelligence, Winnow, Stream processing, Perceptron, Machine learning
DocType
Conference
ISBN
1-59593-339-5
Citations
35
PageRank
1.74
References
15
Authors
2
Name, Order, Citations, PageRank
Vitor R. Carvalho, 1, 672, 36.38
William W. Cohen, 2, 10178, 1243.74