Title
Fairness in Data Wrangling
Abstract
At the core of many data analysis processes lies the challenge of properly gathering and transforming data. This problem is known as data wrangling, and it becomes even more challenging when the data sources to be transformed are heterogeneous and autonomous, i.e., have different origins, and when the output is meant to be used as a training dataset, making it paramount for that dataset to be fair. Given the rise in the use of artificial intelligence (AI) systems across a variety of domains, fairness issues must be taken into account while building these systems. In this paper, we aim to bridge the gap between gathering the data and making the resulting datasets fair by proposing a method that performs data wrangling with fairness in mind. To this end, our method comprises a data wrangling pipeline whose behaviour can be adjusted through a set of parameters. Based on fairness metrics computed over the output datasets, the system plans a set of data wrangling interventions with the aim of lowering the bias in the output dataset, using Tabu Search to explore the space of candidate interventions. We consider two potential sources of dataset bias: unequal representation of sensitive groups and hidden biases introduced through proxies for sensitive attributes. The approach is evaluated empirically.
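To make the search strategy described in the abstract more concrete, below is a minimal, hypothetical Python sketch of the general idea: a Tabu Search that repeatedly applies candidate data-wrangling interventions and keeps the dataset that minimises a simple representation-disparity score. The metric, the interventions, and all identifiers (representation_disparity, tabu_search, drop_overrepresented, duplicate_underrepresented) are illustrative assumptions, not the paper's actual pipeline, parameters, or fairness metrics.

```python
# Illustrative sketch only: greedy Tabu Search over candidate data-wrangling
# interventions, scored with a simple representation-disparity metric.
from typing import Callable, List

import pandas as pd


def representation_disparity(df: pd.DataFrame, sensitive: str) -> float:
    """Gap between the largest and smallest group shares of a sensitive attribute."""
    shares = df[sensitive].value_counts(normalize=True)
    return float(shares.max() - shares.min())


def tabu_search(
    df: pd.DataFrame,
    interventions: List[Callable[[pd.DataFrame], pd.DataFrame]],
    score: Callable[[pd.DataFrame], float],
    iterations: int = 50,
    tabu_size: int = 1,
) -> pd.DataFrame:
    """Greedily apply the best non-tabu intervention; keep the best dataset seen."""
    current = best_df = df
    best_score = score(df)
    tabu: List[int] = []  # indices of recently applied interventions
    for _ in range(iterations):
        candidates = [(i, op(current)) for i, op in enumerate(interventions)
                      if i not in tabu]
        if not candidates:
            break
        i, current = min(candidates, key=lambda c: score(c[1]))
        tabu = (tabu + [i])[-tabu_size:]
        if score(current) < best_score:
            best_df, best_score = current, score(current)
    return best_df


if __name__ == "__main__":
    # Toy dataset: group "f" is under-represented (20%) relative to group "m" (80%).
    data = pd.DataFrame({"gender": ["f"] * 20 + ["m"] * 80, "label": range(100)})

    def drop_overrepresented(df: pd.DataFrame) -> pd.DataFrame:
        # Remove one record from the currently over-represented group.
        major = df["gender"].value_counts().idxmax()
        return df.drop(df[df["gender"] == major].sample(1, random_state=0).index)

    def duplicate_underrepresented(df: pd.DataFrame) -> pd.DataFrame:
        # Duplicate one record from the currently under-represented group.
        minor = df["gender"].value_counts().idxmin()
        return pd.concat([df, df[df["gender"] == minor].sample(1, random_state=0)])

    result = tabu_search(
        data,
        [drop_overrepresented, duplicate_underrepresented],
        lambda d: representation_disparity(d, "gender"),
    )
    print(representation_disparity(data, "gender"), "->",
          representation_disparity(result, "gender"))
```

The sketch only targets the first bias source mentioned in the abstract (unequal group representation); handling proxy attributes would require additional interventions and scores.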
Year
2020
DOI
10.1109/IRI49571.2020.00056
Venue
2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI)
Keywords
data wrangling, fairness, bias, sample size disparity, proxy attribute, training dataset
DocType
Conference
ISBN
978-1-7281-1055-4
Citations
0
PageRank
0.34
References
11
Authors
4
Name                    Order  Citations  PageRank
Lacramioara Mazilu      1      4          2.78
Norman W. Paton         2      3059       359.26
Nikolaos Konstantinou   3      88         10.73
Alvaro A. A. Fernandes  4      904        77.71