Abstract | ||
---|---|---|
We introduce CODE ALLTAG, a text corpus composed of German-language e-mails. It is divided into two partitions: the first, CODE ALLTAG XL, is a bulk-size collection drawn from an openly accessible e-mail archive (roughly 1.5M e-mails), whereas the second, CODE ALLTAG S+d, is much smaller (fewer than a thousand e-mails) but includes demographic data for the author of each e-mail. CODE ALLTAG thus currently constitutes the largest e-mail corpus ever built. In this paper, we describe, for both parts, the solicitation process for gathering e-mails, present descriptive statistical properties of the corpus, and, for CODE ALLTAG S+d, report a compilation of demographic features of the e-mail donors. |
Year | Venue | Keywords |
---|---|---|
2016 | LREC 2016 - Tenth International Conference on Language Resources and Evaluation | text corpus, German language, e-mails |
Field | DocType | Citations |
Computer science, Natural language processing, Artificial intelligence, German | Conference | 0 |
PageRank | References | Authors |
0.34 | 0 | 5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Ulrike Krieg-Holz | 1 | 0 | 0.34 |
Christian Schuschnig | 2 | 0 | 0.34 |
Franz Matthies | 3 | 4 | 2.17 |
Benjamin Redling | 4 | 0 | 0.34 |
Udo Hahn | 5 | 88 | 11.14 |