Title: Replicability Analysis for Natural Language Processing: Testing Significance with Multiple Datasets
Abstract:
With the ever-growing amount of textual data from a large variety of languages, domains, and genres, it has become standard to evaluate NLP algorithms on multiple datasets in order to ensure consistent performance across heterogeneous setups. However, such multiple comparisons pose significant challenges to traditional statistical analysis methods in NLP and can lead to erroneous conclusions. In this paper we propose a Replicability Analysis framework for a statistically sound analysis of multiple comparisons between algorithms for NLP tasks. We discuss the theoretical advantages of this framework over the current, statistically unjustified, practice in the NLP literature, and demonstrate its empirical value across four applications: multi-domain dependency parsing, multilingual POS tagging, cross-domain sentiment classification, and word similarity prediction.
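The replicability analysis the abstract refers to builds on partial conjunction testing (Benjamini and Heller, 2008): given per-dataset p-values for a comparison between two algorithms, one asks for how many datasets the effect can be claimed to hold. Below is a minimal illustrative sketch in Python of the Bonferroni- and Fisher-style partial conjunction p-values and the resulting count estimator; the function names and the example p-values are hypothetical, and this is a sketch of the underlying statistical idea, not the paper's exact procedure.

```python
import numpy as np
from scipy.stats import chi2

def partial_conjunction_pvalue(pvals, u, method="bonferroni"):
    """P-value for the null 'fewer than u of the n datasets show a real effect'."""
    p = np.sort(np.asarray(pvals, dtype=float))
    n = len(p)
    if method == "bonferroni":
        # Bonferroni-style combination: scale the u-th smallest p-value.
        return min(1.0, (n - u + 1) * p[u - 1])
    # Fisher-style combination of the n - u + 1 largest p-values.
    stat = -2.0 * np.log(p[u - 1:]).sum()
    return chi2.sf(stat, df=2 * (n - u + 1))

def count_significant_datasets(pvals, alpha=0.05, method="bonferroni"):
    """Largest u such that H0('fewer than v datasets have an effect') is
    rejected at level alpha for every v <= u."""
    k = 0
    for u in range(1, len(pvals) + 1):
        if partial_conjunction_pvalue(pvals, u, method) <= alpha:
            k = u
        else:
            break
    return k

# Hypothetical p-values from comparing two parsers on five datasets:
pvals = [0.001, 0.012, 0.03, 0.2, 0.6]
print(count_significant_datasets(pvals))  # -> 2: a lower bound on the
# number of datasets where the difference between the parsers replicates
```

Counting datasets with an individually significant (uncorrected) p-value, by contrast, inflates the chance of falsely declaring an effect on at least one dataset, which is the "statistically unjustified practice" the abstract contrasts against.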
Year: 2017
DOI: 10.1162/tacl_a_00074
Venue: Transactions of the Association for Computational Linguistics
Field: Computer science, Multiple comparisons problem, Dependency grammar, Natural language processing, Artificial intelligence, Sound analysis, Machine learning, Statistical analysis
DocType: Journal
Volume: 5
Issue: 1
Citations: 1
PageRank: 0.36
References: 27
Authors: 4
Name         | Order | Citations | PageRank
Rotem Dror   | 1     | 1         | 1.71
Gili Baumer  | 2     | 1         | 0.36
M. Bogomolov | 3     | 3         | 1.22
Roi Reichart | 4     | 760       | 53.53