Title
Authenticated Outlier Mining for Outsourced Databases
Abstract
The Data-Mining-as-a-Service (DMaS) paradigm is becoming the focus of research, as it allows the data owner (client) who lacks expertise and/or computational resources to outsource their data and mining needs to a third-party service provider (server). Outsourcing, however, raises some issues about <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">result integrity</italic> : how could the client verify the mining results returned by the server are both sound and complete? In this paper, we focus on outlier mining, an important mining task. Previous verification techniques use an authenticated data structure (ADS) for correctness authentication, which may incur much space and communication cost. In this paper, we propose a novel solution that returns a probabilistic result integrity guarantee with much cheaper verification cost. The key idea is to insert a set of artificial records ( <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$AR$</tex-math><alternatives><mml:math><mml:mrow><mml:mi>A</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:math><inline-graphic xlink:href="dong-ieq1-2754493.gif"/></alternatives></inline-formula> s) into the dataset, from which it constructs a set of artificial outliers ( <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$AO$</tex-math><alternatives><mml:math><mml:mrow><mml:mi>A</mml:mi><mml:mi>O</mml:mi></mml:mrow></mml:math><inline-graphic xlink:href="dong-ieq2-2754493.gif"/></alternatives></inline-formula> s) and artificial non-outliers ( <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$ANO$</tex-math><alternatives><mml:math><mml:mrow><mml:mi>A</mml:mi><mml:mi>N</mml:mi><mml:mi>O</mml:mi></mml:mrow></mml:math><inline-graphic xlink:href="dong-ieq3-2754493.gif"/></alternatives></inline-formula> s). The <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$AO$</tex-math><alternatives><mml:math><mml:mrow><mml:mi>A</mml:mi><mml:mi>O</mml:mi></mml:mrow></mml:math><inline-graphic xlink:href="dong-ieq4-2754493.gif"/></alternatives></inline-formula> s and <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$ANO$</tex-math><alternatives><mml:math><mml:mrow><mml:mi>A</mml:mi><mml:mi>N</mml:mi><mml:mi>O</mml:mi></mml:mrow></mml:math><inline-graphic xlink:href="dong-ieq5-2754493.gif"/></alternatives></inline-formula> s are used by the client to detect any incomplete and/or incorrect mining results with a probabilistic guarantee. The main challenge that we address is how to construct <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$AR$</tex-math><alternatives><mml:math><mml:mrow><mml:mi>A</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:math><inline-graphic xlink:href="dong-ieq6-2754493.gif"/></alternatives></inline-formula> s so that they do not change the (non-)outlierness of original records, while guaranteeing that the client can identify <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$ANO$</tex-math><alternatives><mml:math><mml:mrow><mml:mi>A</mml:mi><mml:mi>N</mml:mi><mml:mi>O</mml:mi></mml:mrow></mml:math><inline-graphic xlink:href="dong-ieq7-2754493.gif"/></alternatives></inline-formula> s and <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$AO$</tex-math><alternatives><mml:math><mml:mrow><mml:mi>A</mml:mi><mml:mi>O</mml:mi></mml:mrow></mml:math><inline-graphic xlink:href="dong-ieq8-2754493.gif"/></alternatives></inline-formula> s without executing mining. Furthermore, we build a strategic game and show that a Nash equilibrium exists only when the server returns correct outliers. Our implementation and experiments demonstrate that our verification solution is efficient and lightweight.
Year
DOI
Venue
2020
10.1109/TDSC.2017.2754493
IEEE Transactions on Dependable and Secure Computing
Keywords
Field
DocType
Servers,Authentication,Probabilistic logic,Outsourcing,Databases,Anomaly detection
Data mining,Anomaly detection,Authentication,Computer science,Server,Correctness,Outlier,Outsourcing,Service provider,Probabilistic logic,Database
Journal
Volume
Issue
ISSN
17
2
1545-5971
Citations 
PageRank 
References 
1
0.34
18
Authors
6
Name
Order
Citations
PageRank
Boxiang Dong1179.45
Hui Wendy Wang210.34
Anna Monreale358142.49
Dino Pedreschi43083244.47
Fosca Giannotti52948253.39
Wenge Guo6153.90