Title
Pairwise Comparative Classification for Translator Stylometric Analysis.
Abstract
In this article, we present a new type of classification problem, which we call the Comparative Classification Problem (CCP), where we use the term data record to refer to a block of instances. Given a single data record with n instances for n classes, the CCP is to map each instance to a unique class. This problem occurs in a wide range of applications where the independent and identically distributed (iid) assumption is violated. The primary difference between CCP and classical classification is that in the latter, the assignment of a translator to one record is independent of the assignment of a translator to a different record. In CCP, however, assigning a translator to one instance within a block excludes that translator from being assigned to any other instance in that block. This interdependency in the data poses challenges for techniques that rely on the iid assumption. In the Pairwise CCP (PWCCP), a pair of instances is grouped together. The key difference between PWCCP and classical binary classification problems is that hidden patterns can only be unmasked by comparing the instances as pairs. In this article, we introduce a new algorithm, PWC4.5, which is based on C4.5, to handle PWCCP. We first show that a simple transformation, which we call Gradient-Based Transformation (GBT), can address the violation of the iid assumption in C4.5. We then evaluate PWC4.5 on two real-world corpora to distinguish between translators of Arabic-English and French-English translations. While traditional C4.5 failed to distinguish between different translators, GBT performed better, and PWC4.5 consistently gave the best results of the three.
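The difference between classical classification and the CCP constraint described in the abstract can be illustrated with a minimal sketch. It is not the paper's PWC4.5 or GBT; the score matrix, the function names, and the brute-force search over permutations are all assumptions made for illustration only.

```python
from itertools import permutations


def independent_assignment(scores):
    """Classical view: pick the best-scoring class for each instance on its own."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def block_assignment(scores):
    """CCP view: pick the one-to-one instance-to-class mapping with the highest total score.

    Brute force over all permutations, so only suitable for small blocks.
    """
    n = len(scores)
    best = max(
        permutations(range(n)),
        key=lambda perm: sum(scores[i][perm[i]] for i in range(n)),
    )
    return list(best)


# Hypothetical scores: scores[i][c] is how strongly instance i resembles translator c.
scores = [
    [0.9, 0.1],
    [0.6, 0.4],
]

print(independent_assignment(scores))  # both instances go to translator 0
print(block_assignment(scores))        # each translator is used exactly once
```

With these made-up scores, the independent rule assigns both instances to the same translator, while the block rule is forced to compare the two instances and give each a different translator.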
Year: 2016
DOI: 10.1145/2898997
Venue: ACM Trans. Asian & Low-Resource Lang. Inf. Process.
Keywords: Arabic translation, classification, translator stylometry
Field: Pairwise comparison, Binary classification, Computer science, Artificial intelligence, Natural language processing, Independent and identically distributed random variables
DocType: Journal
Volume: 16
Issue: 1
ISSN: 2375-4699
Citations: 0
PageRank: 0.34
References: 26
Authors: 3
Name, Order, Citations, PageRank
Heba El-Fiqi, 1, 8, 2.14
Eleni Petraki, 2, 19, 6.20
Hussein A. Abbass, 3, 1503, 144.85