Abstract |
---|
State-of-the-art LSTM language models trained on large corpora learn sequential contingencies in impressive detail and have been shown to acquire a number of non-local grammatical dependencies with some success. Here we investigate whether supervision with hierarchical structure enhances learning of a range of grammatical dependencies, a question that has previously been addressed only for subject-verb agreement. Using controlled experimental methods from psycholinguistics, we compare the performance of word-based LSTM models versus two models that represent hierarchical structure and deploy it in left-to-right processing: Recurrent Neural Network Grammars (RNNGs) (Dyer et al., 2016) and an incrementalized version of the Parsing-as-Language-Modeling configuration from Charniak et al. (2016). Models are tested on a diverse range of configurations for two classes of non-local grammatical dependencies in English---Negative Polarity licensing and Filler--Gap Dependencies. Using the same training data across models, we find that structurally-supervised models outperform the LSTM, with the RNNG demonstrating the best results on both types of grammatical dependencies and even learning many of the Island Constraints on the filler--gap dependency. Structural supervision thus provides data efficiency advantages over purely string-based training of neural language models in acquiring human-like generalizations about non-local grammatical dependencies. |
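As a rough illustration of the controlled minimal-pair methodology the abstract describes (not code from the paper), the sketch below compares a model's surprisal at a Negative Polarity Item in a licensed versus an unlicensed context. The `logprob(word, context)` interface, the example sentence pair, and the function names are hypothetical stand-ins for whatever trained language model (LSTM, RNNG, etc.) is under evaluation.

```python
import math

def surprisal(logprob, context, word):
    """Surprisal in bits of `word` after `context`: -log2 P(word | context).

    `logprob` is a hypothetical callable returning the model's natural-log
    conditional probability of `word` given the list of preceding words.
    """
    return -logprob(word, context) / math.log(2)

# Hypothetical Negative Polarity licensing contrast: the NPI "ever" is
# licensed by "No" in the first prefix but has no licensor in the second.
LICENSED = "No author that the critics praised has".split()
UNLICENSED = "The author that the critics praised has".split()
CRITICAL = "ever"

def npi_licensing_effect(logprob):
    """Positive values mean the model finds the unlicensed NPI more
    surprising than the licensed one, i.e. it shows the expected contrast."""
    return (surprisal(logprob, UNLICENSED, CRITICAL)
            - surprisal(logprob, LICENSED, CRITICAL))
```

Analogous surprisal contrasts at the critical region are what such evaluations typically report for filler--gap dependencies and island configurations as well.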
Year | Venue | Field
---|---|---|
2019 | arXiv: Computation and Language | Computer science, Natural language processing, Artificial intelligence

DocType | Volume | Citations
---|---|---|
Journal | abs/1903.00943 | 0

PageRank | References | Authors
---|---|---|
0.34 | 9 | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Ethan Wilcox | 1 | 4 | 3.79 |
Peng Qian | 2 | 1 | 2.04 |
Richard Futrell | 3 | 15 | 11.08
Miguel Ballesteros | 4 | 998 | 49.97 |
Roger Levy | 5 | 996 | 67.40 |