Title
Advances in Exploratory Data Analysis, Visualisation and Quality for Data Centric AI Systems
Abstract
It is widely accepted that data preparation is one of the most time-consuming steps of the machine learning (ML) lifecycle. It is also one of the most important steps, as the quality of data directly influences the quality of a model. In this tutorial, we will discuss the importance and the role of exploratory data analysis (EDA) and data visualisation techniques to find data quality issues and for data preparation, relevant to building ML pipelines. We will also discuss the latest advances in these fields and bring out areas that need innovation. To make the tutorial actionable for practitioners, we will also discuss the most popular open-source packages that one can get started with along with their strengths and weaknesses. Finally, we will discuss on the challenges posed by industry workloads and the gaps to be addressed to make data-centric AI real in industry settings.
Year
DOI
Venue
2022
10.1145/3534678.3542604
KDD '22: Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining
DocType
Citations 
PageRank 
Conference
0
0.34
References 
Authors
0
6
Name
Order
Citations
PageRank
Hima Patel114.41
Shanmukha Guttula200.34
Ruhi Sharma Mittal312.38
Naresh Manwani400.34
Laure Berti-Equille558849.90
Abhijit Manatkar600.34