Title
Fully parsing the Penn Treebank
Abstract
We present a two-stage parser that recovers Penn Treebank style syntactic analyses of new sentences, including skeletal syntactic structure and, for the first time, both function tags and empty categories. The accuracy of the first-stage parser on the standard Parseval metric matches that of the Collins (2003) parser on which it is based, despite the data fragmentation caused by the greatly enriched space of possible node labels. This first stage simultaneously achieves near state-of-the-art performance on recovering function tags with minimal modifications to the underlying parser: fewer than ten lines of code are changed. The second stage achieves state-of-the-art performance on the recovery of empty categories by combining a linguistically informed architecture and a rich feature set with the power of modern machine learning methods.
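To make the "enriched space of possible node labels" concrete, the following is a minimal illustrative sketch, not code from the paper. It assumes the usual Penn Treebank labeling conventions (function tags joined to the category with hyphens, a trailing numeric coindex, and the `-NONE-` tag for empty categories); the function names `split_label` and `is_empty_category` are hypothetical.

```python
def split_label(label):
    """Split a Penn Treebank style label into (category, function_tags, index).

    Hypothetical helper for illustration only; not taken from the paper.
    """
    # Labels such as -NONE-, -LRB-, -RRB- begin with a hyphen and carry no tags.
    if label.startswith("-"):
        return label, [], None
    # Drop any gapping index written after '=' (e.g. NP-SBJ=2).
    label = label.split("=")[0]
    parts = label.split("-")
    category, rest = parts[0], parts[1:]
    index = None
    # A trailing all-digit field is a coindex linking to an empty category.
    if rest and rest[-1].isdigit():
        index = rest.pop()
    return category, rest, index


def is_empty_category(pos_tag):
    """Empty categories (traces, null elements) are tagged -NONE- in the treebank."""
    return pos_tag == "-NONE-"


if __name__ == "__main__":
    for lab in ["NP-SBJ-1", "PP-LOC", "VP", "-NONE-"]:
        print(lab, split_label(lab))
```

A parser that predicts only the category (for example `NP`) works with the skeletal label set; one that predicts full labels such as `NP-SBJ` must distinguish many more symbols, which is the source of the data fragmentation the abstract mentions.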
Year
2006
DOI
10.3115/1220835.1220859
Venue
HLT-NAACL
Keywords
function tag, enriched space, stage parser, penn treebank, syntactic analysis, empty category, data fragmentation, skeletal syntactic structure, underlying parser, state-of-the-art performance, first-stage parser, machine learning, lines of code, information architecture
Field
Recursive descent parser, Computer science, Simple LR parser, Speech recognition, GLR parser, Natural language processing, Treebank, LALR parser, Artificial intelligence, Parsing, Parser combinator, Canonical LR parser
DocType
Conference
Citations
46
PageRank
2.85
References
14
Authors
3
Name | Order | Citations | PageRank
Ryan Gabbard | 1 | 83 | 7.45
Mitchell Marcus | 2 | 3651 | 56.84
Seth Kulick | 3 | 221 | 29.66