Title
Combining Multiple Learning Strategies for Effective Cross Validation
Abstract
Parameter tuning through cross-validation becomes very difficult when the validation set contains no or only a few examples of the classes in the evaluation set. We address this open challenge by using a combination of classifiers with different performance characteristics to effectively reduce the average performance variance of the overall system across all classes, including those not seen before. This approach allows us to tune the combined system on available but less representative validation data and to obtain a smaller performance degradation on the evaluation data than a single-method classifier alone. We tested this approach by applying k-Nearest Neighbor, Rocchio, and Language Modeling classifiers and their combination to the event tracking problem in the Topic Detection and Tracking (TDT) domain, where new classes (events) are created constantly over time and representative validation sets for new classes are often difficult to obtain in time. When parameters tuned on an early benchmark TDT corpus were evaluated on a later TDT benchmark corpus with no overlapping events, we observed a 38-65% reduction in tracking cost (a weighted combination of errors) by the combined system over the individual methods evaluated under the same conditions, strongly suggesting that this approach is a robust solution for improving the cross-class performance consistency of statistical classifiers when standard cross-validation fails due to the lack of representative validation sets.
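The abstract describes tuning a combination of classifiers on less representative validation data so as to minimize tracking cost. The Python sketch below illustrates one plausible reading of that setup; it is not the authors' implementation, and the function names (combine_scores, tune_weights), the grid-search procedure, and the cost constants are all illustrative assumptions.

# Minimal sketch (not the authors' implementation) of combining classifier
# scores and tuning the combination weights on a validation set. The
# function names, the grid search, and the cost constants are assumptions.
import itertools
import numpy as np

def combine_scores(score_lists, weights):
    # Weighted sum of per-document confidence scores, one list per
    # classifier (e.g., kNN, Rocchio, language model).
    return sum(w * np.asarray(s) for w, s in zip(weights, score_lists))

def tracking_cost(pred, truth, c_miss=1.0, c_fa=0.1):
    # Simplified TDT-style cost: a weighted combination of the miss rate
    # and the false-alarm rate (these constants are illustrative only).
    pred, truth = np.asarray(pred), np.asarray(truth)
    miss = np.mean(pred[truth == 1] == 0) if (truth == 1).any() else 0.0
    fa = np.mean(pred[truth == 0] == 1) if (truth == 0).any() else 0.0
    return c_miss * miss + c_fa * fa

def tune_weights(score_lists, truth, grid=np.linspace(0.0, 1.0, 11), threshold=0.5):
    # Exhaustive grid search for the weight vector that minimizes tracking
    # cost on the (possibly less representative) validation data.
    best_w, best_cost = None, float("inf")
    for w in itertools.product(grid, repeat=len(score_lists)):
        if not any(w):
            continue  # skip the degenerate all-zero weighting
        combined = combine_scores(score_lists, w) / sum(w)
        cost = tracking_cost((combined >= threshold).astype(int), truth)
        if cost < best_cost:
            best_w, best_cost = w, cost
    return best_w, best_cost

Once tuned, the same weights would be applied unchanged to an evaluation corpus containing unseen events, which is the setting in which the paper reports the 38-65% reduction in tracking cost.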
Year: 2000
Venue: ICML
Keywords: combining multiple learning strategies, effective cross validation, language model, k nearest neighbor, cross validation
Field: Data mining, Pattern recognition, Computer science, Artificial intelligence, Cross-validation, Machine learning
DocType: Conference
ISBN: 1-55860-707-2
Citations: 39
PageRank: 4.32
References: 8
Authors: 3

Name           Order  Citations  PageRank
Yiming Yang    1      3299       344.91
Tom Ault       2      156        19.83
Thomas Pierce  3      39         4.32