A Universal Part-of-Speech Tagset - Citegraph

Paper Info

Title
A Universal Part-of-Speech Tagset

Abstract
To facilitate future research in unsupervised induction of syntactic structure and to standardize best practices, we propose a tagset that consists of twelve universal part-of-speech categories. In addition to the tagset, we develop a mapping from 25 different treebank tagsets to this universal set. As a result, when combined with the original treebank data, this universal tagset and mapping produce a dataset consisting of common parts-of-speech for 22 different languages. We highlight the use of this resource via three experiments that (1) compare tagging accuracies across languages, (2) present an unsupervised grammar induction approach that does not use gold standard part-of-speech tags, and (3) use the universal tags to transfer dependency parsers between languages, achieving state-of-the-art results.
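
The mapping idea in the abstract can be illustrated with a minimal sketch. It assumes Penn Treebank tags as the source tagset and lists only a small, illustrative excerpt of the mapping, not the full table released with the paper. The names `PTB_TO_UNIVERSAL` and `to_universal`, and the choice to fall back to `X` for unlisted tags, are hypothetical and introduced here for illustration only.

```python
# The twelve universal categories; "." covers punctuation and "X" is the catch-all.
UNIVERSAL_TAGS = {
    "NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
    "ADP", "NUM", "CONJ", "PRT", ".", "X",
}

# Illustrative excerpt of a Penn Treebank -> universal mapping (hypothetical
# name, deliberately incomplete; the real resource covers 25 tagsets).
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "RB": "ADV", "PRP": "PRON",
    "DT": "DET", "IN": "ADP", "CD": "NUM",
    "CC": "CONJ", "RP": "PRT", ".": ".", ",": ".",
}


def to_universal(tagged, mapping=PTB_TO_UNIVERSAL, fallback="X"):
    """Map (word, fine_tag) pairs to (word, universal_tag) pairs.

    Assumption: tags missing from the excerpt fall back to "X".
    """
    return [(word, mapping.get(tag, fallback)) for word, tag in tagged]


sentence = [("The", "DT"), ("dogs", "NNS"), ("barked", "VBD"), (".", ".")]
universal = to_universal(sentence)
print(universal)
```

Because every treebank-specific tag collapses to one of the same twelve labels, corpora annotated under different schemes become directly comparable, which is what the cross-lingual experiments in the abstract rely on.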

Field	Value
Year	2011
Venue	LREC 2012 - Eighth International Conference on Language Resources and Evaluation
Keywords	Part-of-Speech Tagging, Multilinguality, Annotation Guidelines
DocType	Journal
Citations	260
PageRank	10.62
References	24
Authors	3

Authors (3 rows)

Name	Order	Citations	PageRank
Slav Petrov	1	2405	107.56
Dipanjan Das	2	1619	75.14
Ryan McDonald	3	4653	245.25
