Title |
---|
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding |
Abstract |
---|
For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems. |
Year | Venue | DocType
---|---|---
2018 | International Conference on Learning Representations | Conference

Volume | Citations | PageRank
---|---|---
abs/1804.07461 | 58 | 1.33

References | Authors
---|---
39 | 6
Name | Order | Citations | PageRank |
---|---|---|---|
Alex Wang | 1 | 71 | 5.27 |
Amanpreet Singh | 2 | 109 | 8.34 |
Julian Michael | 3 | 78 | 5.08 |
Felix Hill | 4 | 346 | 17.90 |
Omer Levy | 5 | 1387 | 56.96 |
Samuel R. Bowman | 6 | 906 | 44.99 |