Title | ||
---|---|---|
How comparable are parallel corpora? Measuring the distribution of general vocabulary and connectives |
Abstract | ||
---|---|---|
In this paper, we question the homogeneity of a large parallel corpus by measuring the similarity between various sub-parts. We compare results obtained using a general measure of lexical similarity based on χ² and by counting the number of discourse connectives. We argue that discourse connectives provide a more sensitive measure, revealing differences that are not visible with the general measure. We also provide evidence for the existence of specific characteristics defining translated texts as opposed to non-translated ones, due to a universal tendency for explicitation. |
Year | Venue | Keywords |
---|---|---|
2011 | BUCC@ACL | lexical similarity, sensitive measure, general measure, general vocabulary, specific characteristic, revealing difference, discourse connective, universal tendency, large parallel corpus, various sub-parts, homogeneity, measures, similarity, corpora |
Field | DocType | Citations |
---|---|---|
Lexical similarity, Homogeneity (statistics), Computer science, Parallel corpora, Natural language processing, Artificial intelligence, Linguistics, Vocabulary | Conference | 12 |
PageRank | References | Authors |
---|---|---|
0.98 | 3 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Bruno Cartoni | 1 | 62 | 6.93 |
Sandrine Zufferey | 2 | 49 | 4.98 |
Thomas Meyer | 3 | 110 | 9.04 |
Andrei Popescu-Belis | 4 | 573 | 64.13 |