Title
Interactively discovering and ranking desired tuples by data exploration
Abstract
Data exploration, the problem of extracting knowledge from a database even when we do not know exactly what we are looking for, is important for data discovery and analysis. However, precisely specifying SQL queries is not always practical, for example “finding and ranking off-road cars based on a combination of Price, Make, Model, Age, Mileage, etc.” This is not only due to query complexity (e.g., the queries may involve many if-then-else, and, or, and not conditions), but also because the user typically does not know all data instances (and their variants). We propose DExPlorer, a system for interactive data exploration. From the user's perspective, we provide a simple and user-friendly interface that allows the user to: (1) confirm whether a tuple is desired or not, and (2) decide whether one tuple is preferred over another. Behind the scenes, we jointly use multiple ML models to learn from these two types of user feedback. Moreover, to effectively involve the human in the loop, we need to select a set of tuples for each user interaction so as to solicit feedback. Therefore, we devise question selection algorithms, which consider not only the estimated benefit of each tuple, but also the possible partial orders between any two suggested tuples. Experiments on real-world datasets show that DExPlorer outperforms existing approaches in effectiveness.
Year: 2022
DOI: 10.1007/s00778-021-00714-0
Venue: The VLDB Journal
Keywords: Data exploration, SQL query, Ranking, Decision, Human-in-the-loop
DocType: Journal
Volume: 31
Issue: 4
ISSN: 1066-8888
Citations: 0
PageRank: 0.34
References: 42
Authors: 9
Name             Order  Citations  PageRank
Xuedi Qin        1      67         5.59
Chengliang Chai  2      124        15.45
Yuyu Luo         3      76         10.07
Tianyu Zhao      4      17         3.06
Nan Tang         5      954        59.62
Guoliang Li      6      3077       154.70
Jianhua Feng     7      2713       121.30
Xiang Yu         8      1          1.03
Mourad Ouzzani   9      1213       120.36