Title
Fairness in Data Wrangling
Abstract
At the core of many data analysis processes lies the challenge of properly gathering and transforming data. This problem is known as data wrangling, and it becomes even more challenging when the data sources to be transformed are heterogeneous and autonomous, i.e., have different origins, and when the output is meant to be used as a training dataset, making it paramount for that dataset to be fair. Given the rise in the use of artificial intelligence (AI) systems across a variety of domains, fairness issues must be taken into account while building these systems. In this paper, we aim to bridge the gap between gathering the data and making the resulting datasets fair by proposing a method that performs data wrangling with fairness in mind. To this end, our method comprises a data wrangling pipeline whose behaviour can be adjusted through a set of parameters. Based on fairness metrics computed over the output datasets, the system plans a set of data wrangling interventions with the aim of lowering the bias in the output dataset, using Tabu Search to explore the space of candidate interventions. We consider two potential sources of dataset bias: unequal representation of sensitive groups and hidden biases introduced through proxies for sensitive attributes. The approach is evaluated empirically.
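To make the search strategy described in the abstract more concrete, below is a minimal, hypothetical Python sketch of the general idea: a Tabu Search that repeatedly applies candidate data-wrangling interventions and keeps the dataset that minimises a simple representation-disparity score. The metric, the interventions, and all identifiers (representation_disparity, tabu_search, drop_overrepresented, duplicate_underrepresented) are illustrative assumptions, not the paper's actual pipeline, parameters, or fairness metrics.

```python
# Illustrative sketch only: greedy Tabu Search over candidate data-wrangling
# interventions, scored with a simple representation-disparity metric.
from typing import Callable, List

import pandas as pd


def representation_disparity(df: pd.DataFrame, sensitive: str) -> float:
    """Gap between the largest and smallest group shares of a sensitive attribute."""
    shares = df[sensitive].value_counts(normalize=True)
    return float(shares.max() - shares.min())


def tabu_search(
    df: pd.DataFrame,
    interventions: List[Callable[[pd.DataFrame], pd.DataFrame]],
    score: Callable[[pd.DataFrame], float],
    iterations: int = 50,
    tabu_size: int = 1,
) -> pd.DataFrame:
    """Greedily apply the best non-tabu intervention; keep the best dataset seen."""
    current = best_df = df
    best_score = score(df)
    tabu: List[int] = []  # indices of recently applied interventions
    for _ in range(iterations):
        candidates = [(i, op(current)) for i, op in enumerate(interventions)
                      if i not in tabu]
        if not candidates:
            break
        i, current = min(candidates, key=lambda c: score(c[1]))
        tabu = (tabu + [i])[-tabu_size:]
        if score(current) < best_score:
            best_df, best_score = current, score(current)
    return best_df


if __name__ == "__main__":
    # Toy dataset: group "f" is under-represented (20%) relative to group "m" (80%).
    data = pd.DataFrame({"gender": ["f"] * 20 + ["m"] * 80, "label": range(100)})

    def drop_overrepresented(df: pd.DataFrame) -> pd.DataFrame:
        # Remove one record from the currently over-represented group.
        major = df["gender"].value_counts().idxmax()
        return df.drop(df[df["gender"] == major].sample(1, random_state=0).index)

    def duplicate_underrepresented(df: pd.DataFrame) -> pd.DataFrame:
        # Duplicate one record from the currently under-represented group.
        minor = df["gender"].value_counts().idxmin()
        return pd.concat([df, df[df["gender"] == minor].sample(1, random_state=0)])

    result = tabu_search(
        data,
        [drop_overrepresented, duplicate_underrepresented],
        lambda d: representation_disparity(d, "gender"),
    )
    print(representation_disparity(data, "gender"), "->",
          representation_disparity(result, "gender"))
```

The sketch only targets the first bias source mentioned in the abstract (unequal group representation); handling proxy attributes would require additional interventions and scores.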
Year
2020
DOI
10.1109/IRI49571.2020.00056
Venue
2020 IEEE 21st International Conference on Information Reuse and Integration for Data Science (IRI)
Keywords
data wrangling, fairness, bias, sample size disparity, proxy attribute, training dataset
DocType
Conference
ISBN
978-1-7281-1055-4
Citations
0
PageRank
0.34
References
11
Authors
4
Name                    Order  Citations  PageRank
Lacramioara Mazilu      1      4          2.78
Norman W. Paton         2      3059       359.26
Nikolaos Konstantinou   3      88         10.73
Alvaro A. A. Fernandes  4      904        77.71