Title
The Icelandic Parsed Historical Corpus (IcePaHC)
Abstract
We describe the background for and building of IcePaHC, a one-million-word parsed historical corpus of Icelandic which has just been finished. This corpus, which is completely free and open, contains fragments of 60 texts ranging from the late 12th century to the present. We describe the text selection and text collection process and discuss the quality of the texts and their conversion to modern Icelandic spelling. We explain why we chose to use a Penn-style phrase structure annotation scheme and briefly describe the syntactic annotation process. We also describe a spin-off project which is only in its beginning stages: a parsed historical corpus of Faroese. Finally, we argue for the importance of an open source policy as regards language resources.
Year
2012
Venue
LREC 2012 - EIGHTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION
Keywords
Icelandic, Faroese, treebank, parsed corpus, annotation
Field
Faroese, Annotation, Computer science, Phrase structure rules, Artificial intelligence, Spelling, Natural language processing, Parsing, Syntax, Icelandic
DocType
Conference
Citations
10
PageRank
0.75
References
5
Authors
4
Name                    Order  Citations  PageRank
Eiríkur Rögnvaldsson    1      94         12.12
Anton Karl Ingason      2      29         3.26
Einar Freyr Sigurðsson  3      13         1.99
Joel Wallenberg         4      26         2.57