Paper Info

Title
CodE Alltag: A German-Language E-Mail Corpus.

Abstract
We introduce CODE ALLTAG, a text corpus composed of German-language e-mails. It is divided into two partitions: the first, CODE ALLTAG XL, consists of a bulk-size collection drawn from an openly accessible e-mail archive (roughly 1.5M e-mails), whereas the second, CODE ALLTAG S+d, is much smaller in size (fewer than a thousand e-mails), yet excels with demographic data from each author of an e-mail. CODE ALLTAG, thus, currently constitutes the largest e-mail corpus ever built. In this paper, we describe, for both parts, the solicitation process for gathering e-mails, present descriptive statistical properties of the corpus, and, for CODE ALLTAG S+d, reveal a compilation of demographic features of the donors of e-mails.

Year	Venue	Keywords
2016	LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION	text corpus,German language,e-mails
Field	DocType	Citations
Computer science,Natural language processing,Artificial intelligence,German	Conference	0
PageRank	References	Authors
0.34	0	5

Authors (5 rows)

Name	Order	Citations	PageRank
Ulrike Krieg-Holz	1	0	0.34
Christian Schuschnig	2	0	0.34
Franz Matthies	3	4	2.17
Benjamin Redling	4	0	0.34
Udo Hahn	5	88	11.14