Title
The JOS Morphosyntactically Tagged Corpus of Slovene
Abstract
The JOS morphosyntactic resources for Slovene consist of the specifications, a lexicon, and two corpora: jos100k, a 100,000-word balanced monolingual sampled corpus annotated with hand-validated morphosyntactic descriptions (MSDs) and lemmas, and jos1M, a 1-million-word partially hand-validated corpus. The two corpora were sampled from the 600M-word Slovene reference corpus FidaPLUS. The JOS resources have a standardised encoding, with the MULTEXT-East-type morphosyntactic specifications and the corpora encoded according to the Text Encoding Initiative Guidelines P5. The JOS resources are available as a dataset for research under a Creative Commons licence and are meant to facilitate the development of HLT for Slovene.
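As a rough illustration of what a TEI-encoded token annotated with a lemma and an MSD might look like to a consumer of such a corpus, the following minimal Python sketch reads word form, lemma, and MSD from a small fragment. The fragment itself, the attribute names `lemma` and `msd`, and the helper `read_tokens` are assumptions made for illustration; they are not taken from the JOS distribution, and the actual corpus files may be structured differently.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace.
TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical sentence fragment; the attribute names and the MSD strings
# are illustrative assumptions, not copied from the JOS corpora.
SAMPLE = """<s xmlns="http://www.tei-c.org/ns/1.0">
  <w lemma="biti" msd="Va-r3s-n">je</w>
  <w lemma="lep" msd="Agpmsnn">lep</w>
  <w lemma="dan" msd="Ncmsn">dan</w>
</s>"""


def read_tokens(xml_text):
    """Return (word form, lemma, MSD) triples for every <w> element."""
    root = ET.fromstring(xml_text)
    return [
        (w.text, w.get("lemma"), w.get("msd"))
        for w in root.iter(f"{{{TEI_NS}}}w")
    ]


tokens = read_tokens(SAMPLE)
for form, lemma, msd in tokens:
    print(form, lemma, msd, sep="\t")
```

In an MSD of the MULTEXT-East type, the first character gives the word category and the remaining positions give attribute values, so a triple such as the ones above is usually enough to drive a tagger evaluation or a lemma lookup.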
Year: 2008
Venue: LREC
Keywords: standardisation
Field: Computer science, Lexicon, Artificial intelligence, Natural language processing, Lemma (mathematics), Creative commons
DocType: Conference
Citations: 7
PageRank: 1.16
References: 7
Authors: 2
Name            Order  Citations  PageRank
Tomaz Erjavec   1      537        60.89
Simon Krek      2      24         4.18