Paper Info

Title
Surface Statistics of an Unknown Language Indicate How to Parse It.

Abstract
We introduce a novel framework for delexicalized dependency parsing in a new language. We show that useful features of the target language can be extracted automatically from an unparsed corpus, which consists only of gold part-of-speech (POS) sequences. Providing these features to our neural parser enables it to parse sequences like those in the corpus. Strikingly, our system has no supervision in the target language. Rather, it is a multilingual system that is trained end-to-end on a variety of other languages, so it learns a feature extractor that works well. We show experimentally across multiple languages: (1) Features computed from the unparsed corpus improve parsing accuracy. (2) Including thousands of synthetic languages in the training achieves further improvement. (3) Despite being computed from unparsed corpora, our learned task-specific features beat previous work's interpretable typological features that require parsed corpora or expert categorization of the language. Our best method improved attachment scores on held-out test languages by an average of 5.6 percentage points over past work that does not inspect the unparsed data (McDonald et al., 2011), and by 20.65 points over past “grammar induction” work that does not use training languages (Naseem et al., 2010).

Year	DOI	Venue
2018	10.1162/tacl_a_00248	TACL
Field	DocType	Volume
Computer science,Dependency grammar,Natural language processing,Artificial intelligence,Constructed language,Parsing	Journal	6
Citations	PageRank	References
2	0.36	1
Authors
2

Authors (2 rows)

Name	Order	Citations	PageRank
Dingquan Wang	1	11	2.51
Jason Eisner	2	1825	173.00