Title
Evaluating Models' Local Decision Boundaries via Contrast Sets
Abstract
Standard test sets for supervised learning evaluate in-distribution generalization. Unfortunately, when a dataset has systematic gaps (e.g., annotation artifacts), these evaluations are misleading: a model can learn simple decision rules that perform well on the test set but do not capture the abilities a dataset is intended to test. We propose a more rigorous annotation paradigm for NLP that helps to close systematic gaps in the test data. In particular, after a dataset is constructed, we recommend that the dataset authors manually perturb the test instances in small but meaningful ways that (typically) change the gold label, creating contrast sets. Contrast sets provide a local view of a model’s decision boundary, which can be used to more accurately evaluate a model’s true linguistic capabilities. We demonstrate the efficacy of contrast sets by creating them for 10 diverse NLP datasets (e.g., DROP reading comprehension, UD parsing, and IMDb sentiment analysis). Although our contrast sets are not explicitly adversarial, model performance is significantly lower on them than on the original test sets—up to 25% in some cases. We release our contrast sets as new evaluation benchmarks and encourage future dataset construction efforts to follow similar annotation processes.
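The evaluation the abstract describes pairs each original test instance with small manual perturbations that (typically) change the gold label, then checks the model on the whole bundle. A minimal sketch of that scoring loop, using hypothetical data structures (the released benchmarks use task-specific formats), might look like:

```python
def contrast_metrics(model, contrast_sets):
    """Score a model on contrast sets.

    contrast_sets: list of lists of (input, gold_label) pairs; the first
    pair in each inner list is the original test instance, the rest are
    its manual perturbations.

    Returns (perturbed-instance accuracy, consistency), where consistency
    is the fraction of sets on which the model answers *every* instance,
    original and perturbed, correctly.
    """
    n_perturbed = n_perturbed_correct = n_consistent = 0
    for cset in contrast_sets:
        all_correct = True
        for i, (x, gold) in enumerate(cset):
            correct = model(x) == gold
            all_correct &= correct
            if i > 0:  # only perturbed instances count toward this accuracy
                n_perturbed += 1
                n_perturbed_correct += correct
        n_consistent += all_correct
    return (n_perturbed_correct / n_perturbed,
            n_consistent / len(contrast_sets))

# Toy sentiment example: a brittle "model" that ignores negation and so
# fails on every perturbation that flips the label.
predictions = {"good": 1, "not good": 1, "bad": 0, "not bad": 0}
sets = [
    [("good", 1), ("not good", 0)],
    [("bad", 0), ("not bad", 1)],
]
acc, consistency = contrast_metrics(lambda x: predictions[x], sets)
# Both metrics are 0.0 here: the model gets every original right but
# every perturbation wrong, so no set is answered consistently.
```

The gap between ordinary test accuracy (perfect on the two originals above) and contrast consistency (zero) is exactly the failure mode the abstract attributes to systematic gaps in standard test sets.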
Year: 2020
DOI: 10.18653/V1/2020.FINDINGS-EMNLP.117
Venue: EMNLP
DocType: Conference
Volume: 2020.findings-emnlp
Citations: 0
PageRank: 0.34
References: 0
Authors: 26
Name                  Order  Citations  PageRank
Matthew Gardner       1      704        38.49
Yoav Artzi            2      483        26.99
Victoria Basmova      3      0          0.34
Jonathan Berant       4      982        53.86
Ben Bogin             5      20         4.06
Sihao Chen            6      0          0.34
Pradeep Dasigi        7      131        12.09
Dheeru Dua            8      38         4.95
Yanai Elazar          9      9          5.54
Ananth Gottumukkala   10     0          0.34
Nitish Gupta          11     17         4.70
Hannaneh Hajishirzi   12     417        46.10
Gabriel Ilharco       13     5          2.11
Daniel Khashabi       14     114        15.14
Kevin Lin             15     1          1.36
Jiangming Liu         16     19         6.12
Nelson Liu            17     42         4.59
Phoebe Mulcaire       18     3          1.40
Qiang Ning            19     18         9.48
Sameer Singh          20     1060       71.63
Noah A. Smith         21     5867       314.27
Sanjay Subramanian    22     1          3.78
Reut Tsarfaty         23     0          0.34
Eric Wallace          24     18         7.45
Ally Zhang            25     0          0.34
Ben Zhou              26     0          0.68