Title
HamleDT 2.0: Thirty Dependency Treebanks Stanfordized.
Abstract
We present HamleDT 2.0 (HArmonized Multi-LanguagE Dependency Treebank), a collection of 30 existing treebanks harmonized into a common annotation style, the Prague Dependencies, and further transformed into Stanford Dependencies, a treebank annotation style that has become popular in recent years. We use the newest basic Universal Stanford Dependencies, without added language-specific subtypes. We describe both annotation styles, including the adjustments that were necessary, and provide details about the conversion process. We also discuss the differences between the two styles, evaluate their advantages and disadvantages, and note how the differences affect the conversion. We regard the stanfordization as generally successful, although we acknowledge several shortcomings, especially in the distinction between direct and indirect objects, that have to be addressed in the future. We release part of HamleDT 2.0 freely; we are not allowed to redistribute the whole dataset, but we do provide the conversion pipeline.
Year: 2014
Venue: LREC 2014 - Ninth International Conference on Language Resources and Evaluation
Keywords: treebanks, Stanford dependencies, harmonization
Field: Annotation, Computer science, Treebank, Artificial intelligence, Natural language processing
DocType: Conference
Citations: 13
PageRank: 0.78
References: 13
Authors: 6
Name               Order  Citations  PageRank
Rudolf Rosa        1      109        14.44
Jan Masek          2      37         6.38
David Marecek      3      114        8.57
Martin Popel       4      269        21.27
Daniel Zeman       5      434        37.62
Zdenek Zabokrtský  6      193        22.23