Abstract |
---|
For many text classification tasks, sets of background text are easily available from the Web and other online sources. We show that such background text can greatly improve text classification performance by treating the background text as unlabeled data and using existing techniques based on EM for iteratively labeling this background text. Although results are most pronounced when the background text falls into categories that mirror those present in the training and test data, we show improved classification accuracy, especially when training data are limited, even though the use of background text violates many of the assumptions underlying the original approach. |
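The abstract names only "existing techniques based on EM". The sketch below shows what that loop can look like. It assumes a multinomial naive Bayes classifier with EM over unlabeled documents, in the style of Nigam et al. (2000). The classifier choice, Laplace smoothing (`alpha`), the fixed iteration count, and every function name (`train_nb`, `posterior`, `em_nb`, `classify`) are hypothetical illustrations, not the authors' code.

```python
import math
from collections import Counter

def train_nb(docs, weights, classes, vocab, alpha=1.0):
    """M-step (hypothetical helper): fit multinomial naive Bayes from
    documents whose class memberships are given as probability weights."""
    prior = {c: alpha for c in classes}          # Laplace-smoothed class counts
    word_counts = {c: Counter() for c in classes}
    totals = {c: 0.0 for c in classes}
    for doc, w in zip(docs, weights):
        counts = Counter(doc)
        for c, p in w.items():
            if p == 0.0:
                continue
            prior[c] += p
            for tok, n in counts.items():
                word_counts[c][tok] += p * n     # fractional word counts
                totals[c] += p * n
    z = sum(prior.values())
    log_prior = {c: math.log(prior[c] / z) for c in classes}
    v = len(vocab)
    log_like = {c: {t: math.log((word_counts[c][t] + alpha) / (totals[c] + alpha * v))
                    for t in vocab} for c in classes}
    return log_prior, log_like

def posterior(doc, log_prior, log_like, classes):
    """E-step (hypothetical helper): P(class | doc) under the current model."""
    scores = {c: log_prior[c] + sum(log_like[c][t] for t in doc if t in log_like[c])
              for c in classes}
    m = max(scores.values())                     # subtract max for numerical stability
    exp = {c: math.exp(s - m) for c, s in scores.items()}
    z = sum(exp.values())
    return {c: e / z for c, e in exp.items()}

def em_nb(labeled, labels, background, iterations=10):
    """Treat background text as unlabeled data and label it iteratively with EM."""
    classes = sorted(set(labels))
    vocab = {t for d in labeled + background for t in d}
    hard = [{c: 1.0 if c == y else 0.0 for c in classes} for y in labels]
    model = train_nb(labeled, hard, classes, vocab)  # initialize from labeled data only
    for _ in range(iterations):
        soft = [posterior(d, *model, classes) for d in background]          # E-step
        model = train_nb(labeled + background, hard + soft, classes, vocab)  # M-step
    return model, classes

def classify(doc, model, classes):
    post = posterior(doc, *model, classes)
    return max(post, key=post.get)

# Toy usage: "coach" and "season" never occur in the labeled data, but the
# background documents tie them to "team", so EM pulls them toward "sports".
labeled = [["goal", "match", "team"], ["stock", "market", "price"]]
labels = ["sports", "finance"]
background = [["team", "coach", "goal"], ["market", "shares", "price"],
              ["coach", "season", "team"]]
model, classes = em_nb(labeled, labels, background, iterations=5)
print(classify(["coach", "season"], model, classes))
```

Here the labeled documents keep fixed one-hot weights, while the background documents are re-weighted on every pass. This mirrors the idea of using background text as unlabeled data without trusting any single hard labeling of it.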
Year | Venue | Field |
---|---|---|
2005 | The Florida AI Research Society (FLAIRS) | Training set, Computer science, Natural language processing, Artificial intelligence, Test data |
DocType | Citations | PageRank |
---|---|---|
Conference | 1 | 0.35 |
References | Authors |
---|---|
7 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Sarah Zelikovitz | 1 | 181 | 16.42 |
Haym Hirsh | 2 | 1839 | 277.74 |