Title
Optimizing term vectors for efficient and robust filtering
Abstract
We describe an efficient, robust method for selecting and optimizing terms for a classification or filtering task. Terms are extracted from positive examples in training data based on several alternative term-selection algorithms, then combined additively after a simple term-score normalization step to produce a merged and ranked master term vector. The score threshold for the master vector is set via beta-gamma regulation over all the available training data. The process avoids parameter calibrations and protracted training. It also results in compact profiles for run-time evaluation of test (new) documents. Results on TREC-2002 filtering-task datasets demonstrate substantial improvements over TREC-median results and rival both idealized IR-based results and optimized (and expensive) SVM-based classifiers in general effectiveness.
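The merging step described in the abstract (normalize each selector's term scores, then add them to form a ranked master term vector) can be illustrated with a minimal sketch. The abstract does not specify the normalization, so division by each selector's maximum score is an assumption here, and the function names `normalize` and `merge_term_vectors` as well as the toy scores are hypothetical, not taken from the paper. Threshold setting via beta-gamma regulation is not shown.

```python
from collections import defaultdict


def normalize(scores):
    """Scale one selector's term scores into [0, 1].

    Assumption: max-normalization. The abstract only says "a simple
    term-score normalization step" and does not name the scheme.
    """
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {term: 0.0 for term in scores}
    return {term: score / top for term, score in scores.items()}


def merge_term_vectors(vectors, top_k=None):
    """Additively combine normalized term scores into a ranked master vector."""
    master = defaultdict(float)
    for scores in vectors:
        for term, score in normalize(scores).items():
            master[term] += score
    # Highest combined score first; ties broken alphabetically for determinism.
    ranked = sorted(master.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked if top_k is None else ranked[:top_k]


# Hypothetical scores from two different term-selection algorithms.
selector_a = {"filter": 4.0, "vector": 2.0, "term": 1.0}
selector_b = {"vector": 10.0, "profile": 5.0}

master = merge_term_vectors([selector_a, selector_b])
for term, score in master:
    print(f"{term}\t{score:.2f}")
```

In this toy input, a term selected by both algorithms ("vector") accumulates score from each and rises to the top of the master vector, which is the intended effect of additive combination.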
Year
2003
DOI
10.1145/860435.860546
Venue
SIGIR
Keywords
SVM-based classifier, training data, master vector, master term vector, alternative term-selection algorithm, available training data, optimizing term vector, beta-gamma regulation, protracted training, TREC-2002 filtering-task datasets, TREC-median result, vector optimization
Field
Training set, Data mining, Normalization (statistics), Ranking, Pattern recognition, Computer science, Vector optimization, Support vector machine, Filter (signal processing), Robust filtering, Artificial intelligence, Calibration
DocType
Conference
ISBN
1-58113-646-3
Citations
0
PageRank
0.34
References
4
Authors
3
Name | Order | Citations | PageRank
David A. Evans | 1 | 841 | 147.89
Jeffrey Bennett | 2 | 75 | 9.82
David A. Hull | 3 | 1282 | 214.27