Abstract |
---|
We present a newly available on-line resource for Portuguese: a corpus of 310 million words, a new version of the Reference Corpus of Contemporary Portuguese, now searchable via a user-friendly web interface. We report on the work carried out on the corpus prior to its on-line publication, focusing on the processes and tools used to clean, prepare and annotate the corpus so that it is suitable for linguistic inquiry. |
Year | DOI | Venue |
---|---|---|
2012 | 10.1007/978-3-642-28885-2_13 | PROPOR |
Keywords | Field | DocType |
million word,contemporary portuguese,available on-line resource,user-friendly web interface,new version,reference corpus,linguistic inquiry,large portuguese corpus | Annotation,Computer science,Portuguese,Text corpus,Preprocessor,Artificial intelligence,Corpus linguistics,Natural language processing,User interface | Conference |
Citations | PageRank | References |
2 | 0.47 | 9 |
Authors |
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Michel Généreux | 1 | 29 | 3.95 |
Iris Hendrickx | 2 | 285 | 30.91 |
Amália Mendes | 3 | 19 | 8.15 |