Abstract |
---|
The classification problem derived from information extraction (IE) has an imbalanced training set. This is particularly true when learning from smaller datasets, which often have few positive training examples and many negative ones. This paper takes two popular IE learning algorithms -- SVM and Perceptron -- and demonstrates how the introduction of an uneven margins parameter can improve the results on imbalanced training data in IE. Our experiments demonstrate that the uneven margin is indeed helpful, especially when learning from few examples. Essentially, the smaller the training set, the more beneficial the uneven margin can be. We also compare our systems to other state-of-the-art algorithms on several benchmarking corpora for IE. |
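The core idea described in the abstract -- requiring a larger margin on the scarce positive class than on the abundant negative class -- can be illustrated with a minimal perceptron variant. The sketch below is an assumption-laden illustration, not the paper's exact algorithm: the function name `uneven_margin_perceptron` and the margin parameters `tau_pos` and `tau_neg` are hypothetical names chosen for clarity.

```python
import numpy as np

def uneven_margin_perceptron(X, y, tau_pos=1.0, tau_neg=0.1,
                             eta=1.0, epochs=20):
    """Sketch of a perceptron with uneven margins.

    tau_pos / tau_neg are separate margin thresholds for the positive
    and negative classes (tau_pos > tau_neg pushes the decision
    boundary away from the rare positive examples, which is the
    intuition behind using uneven margins on imbalanced data).
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        for i in range(n):
            tau = tau_pos if y[i] == 1 else tau_neg
            # update whenever the functional margin falls below the
            # class-specific threshold, not just on misclassification
            if y[i] * (X[i] @ w + b) <= tau:
                w += eta * y[i] * X[i]
                b += eta * y[i]
    return w, b
```

Setting `tau_pos == tau_neg` recovers an ordinary margin perceptron; the uneven setting biases training toward keeping the few positive examples well clear of the boundary.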
Year | Venue | Keywords |
---|---|---|
2005 | CoNLL | positive training example, benchmarking corpus, imbalanced training data, smaller datasets, popular IE algorithm, classification problem, uneven margin, imbalanced training set, information extraction, uneven margins parameter, training set
Field | DocType | Citations
---|---|---|
Training set, Pattern recognition, Computer science, Support vector machine, Information extraction, Artificial intelligence, Perceptron, Machine learning, Benchmarking | Conference | 28
PageRank | References | Authors
---|---|---|
1.79 | 12 | 3
Name | Order | Citations | PageRank |
---|---|---|---|
Yaoyong Li | 1 | 393 | 26.55 |
Kalina Bontcheva | 2 | 2538 | 211.33 |
Hamish Cunningham | 3 | 2426 | 255.41 |