Title
Data sanitization against adversarial label contamination based on data complexity.
Abstract
Machine learning techniques may suffer from adversarial attacks in which an attacker misleads the learning process by manipulating training samples. Data sanitization is one of the countermeasures against poisoning attacks. It is a data pre-processing method that filters suspect samples before learning. Recently, a number of data sanitization methods have been devised for label flip attacks, but their flexibility is limited by specific assumptions. It is observed that abrupt label flips caused by an attack change the complexity of classification. This paper proposes a data sanitization method based on data complexity, a measure of the difficulty of classification on a dataset. Our method measures the data complexity of a training set after removing a sample and its nearest samples. Contaminated samples are then distinguished from untainted samples according to their data complexity values. Experimental results support the idea that data complexity can be used to identify attack samples. The proposed method achieves better detection accuracy than the current sanitization method on well-known security application problems.
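The scoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the complexity measure, the distance, the neighbourhood size, or the decision rule, so everything specific here is an assumption: leave-one-out 1-NN error stands in for the complexity measure, distances are Euclidean, and the function names `nn_error_rate` and `complexity_scores` and the parameter `k` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def nn_error_rate(X, y):
    """Leave-one-out 1-NN error rate.

    ASSUMPTION: used here only as a stand-in complexity measure;
    the abstract does not say which measure the paper uses.
    """
    # Pairwise Euclidean distances, shape (n, n).
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)      # a sample is not its own neighbour
    nearest = d.argmin(axis=1)       # index of each sample's nearest neighbour
    return float(np.mean(y[nearest] != y))


def complexity_scores(X, y, k=2):
    """Complexity of the training set after removing each sample
    together with its k nearest samples (k is a hypothetical parameter)."""
    n = len(X)
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    scores = np.empty(n)
    for i in range(n):
        order = np.argsort(d[i])
        neighbours = order[order != i][:k]   # k nearest samples other than i
        keep = np.ones(n, dtype=bool)
        keep[i] = False
        keep[neighbours] = False
        scores[i] = nn_error_rate(X[keep], y[keep])
    return scores
```

Under these assumptions, a low score means the remaining data became easier to classify once the sample and its neighbours were removed, which makes that sample a candidate for inspection. In this simplified version, clean samples adjacent to a flipped one can score equally low, because removing their neighbourhood also removes the flipped sample.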
Year
2018
DOI
10.1007/s13042-016-0629-5
Venue
Int. J. Machine Learning & Cybernetics
Keywords
Adversarial learning, Poisoning attack, Data sanitization, Data complexity
Field
Training set, Data mining, Computer science, Suspect, Adversarial system, Data complexity, Data sanitization
DocType
Journal
Volume
9
Issue
6
ISSN
1868-808X
Citations
2
PageRank
0.37
References
32
Authors
4
Name, Order, Citations, PageRank
Patrick P. K. Chan, 1, 271, 33.82
Zhimin He, 2, 536, 35.90
Hongjiang Li, 3, 4, 0.74
Chien-Chang Hsu, 4, 76, 11.68