Investigation of biases in identity linkage DataSets - Citegraph

Paper Info

Title
Investigation of biases in identity linkage DataSets

Abstract
In social networks, the identity linkage problem is to determine whether a pair of user identities on two social networks belongs to the same individual. Prior works typically first collect ground-truth datasets of user identities across social networks that belong to the same individuals and then build a machine learning model driven by features extracted from those identities. User behaviors on different social networks drive the construction of these datasets, and as a consequence, behavioral biases are manifested in them. Our work performs a detailed investigation of these dataset biases, which have remained largely unexplored in identity linkage research. More specifically, we characterize, detect, and quantify behavioral biases in the datasets that manifest as lexical differences in user-generated content, particularly in the usernames and display names configured by users. We study these biases on more than 1 million user identity pairs obtained by leveraging two user behaviors, namely cross-posting and self-disclosure. We find that users who self-disclose their usernames and display names on different social networks show higher lexical similarity than users who cross-post. These behavioral biases lower the performance (precision and recall) of learning models by 5-20%. Inspired by discrimination measurement metrics, we propose and implement a framework to quantify the extent of these biases and find that 15-20% of the test data are affected.
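
The abstract does not state which lexical similarity measure is used to compare usernames and display names across networks. As an illustration only, the following is a minimal sketch assuming a case-insensitive, length-normalized Levenshtein similarity; the function names and the example name pairs are hypothetical, not taken from the paper. It shows how one could compare the average similarity of identity pairs gathered through self-disclosure against pairs gathered through cross-posting.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance via dynamic programming (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def lexical_similarity(a: str, b: str) -> float:
    """Assumed measure: 1 - distance / longer length, case-insensitive, in [0, 1]."""
    a, b = a.lower(), b.lower()
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def mean_similarity(pairs) -> float:
    """Average similarity over (name_on_network_1, name_on_network_2) pairs."""
    scores = [lexical_similarity(x, y) for x, y in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical identity pairs, one list per collection behavior.
self_disclosed = [("alice_w", "alice_w"), ("bob.smith", "bobsmith")]
cross_posted = [("alice_w", "wonderland_al"), ("bob.smith", "bsmith99")]

if __name__ == "__main__":
    print("self-disclosure:", mean_similarity(self_disclosed))
    print("cross-posting:  ", mean_similarity(cross_posted))
```

A gap between the two averages on real data would be one simple signal of the kind of behavioral bias the abstract describes; the paper's own detection and quantification framework may differ.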

Year	DOI	Venue
2020	10.1145/3341105.3374015	SAC '20: The 35th ACM/SIGAPP Symposium on Applied Computing, Brno, Czech Republic, March 2020
Keywords	DocType	ISBN
Bias Detection, Online Social Networks, Data Mining	Conference	978-1-4503-6866-7
Citations	PageRank	References
0	0.34	0
Authors
3

Authors (3 rows)

Name	Order	Citations	PageRank
Rishabh Kaushal	1	0	1.35
Shubham Gupta	2	0	0.34
Ponnurangam Kumaraguru	3	192	16.59