Abstract | ||
---|---|---|
We describe an efficient, robust method for selecting and optimizing terms for a classification or filtering task. Terms are extracted from positive examples in training data based on several alternative term-selection algorithms, then combined additively after a simple term-score normalization step to produce a merged and ranked master term vector. The score threshold for the master vector is set via beta-gamma regulation over all the available training data. The process avoids parameter calibrations and protracted training. It also results in compact profiles for run-time evaluation of test (new) documents. Results on TREC-2002 filtering-task datasets demonstrate substantial improvements over TREC-median results and rival both idealized IR-based results and optimized (and expensive) SVM-based classifiers in general effectiveness. |
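
The merging step described in the abstract (normalize each selector's term scores, then add them to rank a master term vector) can be illustrated with a minimal sketch. The abstract does not specify the normalization, so dividing by each selector's maximum score is an assumption here, and the function names and toy scores are hypothetical rather than taken from the paper.

```python
from collections import defaultdict


def normalize(scores):
    """Scale one selector's scores so its top term scores 1.0.

    Max-scaling is an assumed scheme; the abstract only says
    "a simple term-score normalization step".
    """
    top = max(scores.values(), default=0.0)
    if top <= 0:
        return {term: 0.0 for term in scores}
    return {term: score / top for term, score in scores.items()}


def merge_term_vectors(vectors):
    """Add normalized scores across selectors and rank the merged terms."""
    master = defaultdict(float)
    for scores in vectors:
        for term, score in normalize(scores).items():
            master[term] += score
    # Highest combined score first; ties broken alphabetically.
    return sorted(master.items(), key=lambda item: (-item[1], item[0]))


# Toy outputs of two hypothetical term-selection algorithms.
selector_a = {"filter": 4.0, "profile": 2.0, "term": 1.0}
selector_b = {"profile": 10.0, "threshold": 5.0}

master_vector = merge_term_vectors([selector_a, selector_b])
for term, score in master_vector:
    print(f"{term}\t{score:.2f}")
```

In this toy run a term favored by both selectors ("profile") rises above a term that only one selector ranks first, which is the intended effect of additive combination. Setting the score threshold by beta-gamma regulation is a separate step and is not sketched here.
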
Year | DOI | Venue |
---|---|---|
2003 | 10.1145/860435.860546 | SIGIR |
Keywords | Field | DocType |
svm-based classifier,training data,master vector,master term vector,alternative term-selection algorithm,available training data,optimizing term vector,beta-gamma regulation,protracted training,trec-2002 filtering-task datasets,trec-median result,vector optimization | Training set,Data mining,Normalization (statistics),Ranking,Pattern recognition,Computer science,Vector optimization,Support vector machine,Filter (signal processing),Robust filtering,Artificial intelligence,Calibration | Conference |
ISBN | Citations | PageRank |
1-58113-646-3 | 0 | 0.34 |
References | Authors |
4 | 3 |
Name | Order | Citations | PageRank |
---|---|---|---|
David A. Evans | 1 | 841 | 147.89 |
Jeffrey Bennett | 2 | 75 | 9.82 |
David A. Hull | 3 | 1282 | 214.27 |