Data sanitization against adversarial label contamination based on data complexity. - Citegraph

Paper Info

Title
Data sanitization against adversarial label contamination based on data complexity.

Abstract
Machine learning techniques may suffer from adversarial attacks in which an attacker misleads the learning process by manipulating training samples. Data sanitization is one of the countermeasures against poisoning attacks. It is a data pre-processing method that filters suspect samples before learning. Recently, a number of data sanitization methods have been devised for label flip attacks, but their flexibility is limited by specific assumptions. It is observed that an abrupt label flip caused by an attack changes the complexity of classification. This paper proposes a data sanitization method based on data complexity, a measure of the difficulty of classification on a dataset. Our method measures the data complexity of a training set after removing a sample and its nearest samples. Contaminated samples are then distinguished from untainted samples according to their data complexity values. Experimental results support the idea that data complexity can be used to identify attack samples. The proposed method achieves better detection accuracy than the current sanitization method on well-known security application problems.
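
The removal-and-measure idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the complexity measure, the neighbourhood size, or the decision rule, so the leave-one-out 1-NN error rate, the parameter `k`, the baseline threshold, and all function names below are assumptions chosen only for illustration.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distance between every pair of rows in X."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)


def nn_error_complexity(X, y):
    """Leave-one-out 1-NN error rate, an assumed stand-in complexity measure."""
    d = pairwise_distances(X)
    np.fill_diagonal(d, np.inf)  # a point may not be its own neighbour
    nearest = d.argmin(axis=1)
    return float(np.mean(y[nearest] != y))


def complexity_after_removal(X, y, k=1):
    """Complexity of the training set after dropping sample i and its k nearest samples."""
    n = len(X)
    d = pairwise_distances(X)
    scores = np.empty(n)
    for i in range(n):
        order = np.argsort(d[i])
        neighbours = order[order != i][:k]  # k nearest samples, excluding i itself
        drop = np.append(neighbours, i)
        keep = np.setdiff1d(np.arange(n), drop)
        scores[i] = nn_error_complexity(X[keep], y[keep])
    return scores


# Toy 1-D data: two well-separated clusters with one simulated label flip.
offsets = np.array([0.0, 1.0, 2.1, 3.3, 4.6, 6.0])
X = np.concatenate([offsets, offsets + 20.0]).reshape(-1, 1)
y = np.array([0] * 6 + [1] * 6)
y[2] = 1  # flipped label inside the class-0 cluster

baseline = nn_error_complexity(X, y)
scores = complexity_after_removal(X, y, k=1)

# Assumed decision rule: flag samples whose removal lowers complexity below the baseline.
suspects = np.flatnonzero(scores < baseline)
print(suspects)
```

On this toy set the flipped sample (index 2) is flagged, and so is its clean neighbour (index 3), because removing that neighbour also removes the flipped sample. The sketch therefore gives only a ranking signal; how the paper separates contaminated samples from such neighbours is not stated in the abstract.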

Year	DOI	Venue
2018	10.1007/s13042-016-0629-5	Int. J. Machine Learning & Cybernetics
Keywords	Field	DocType
Adversarial learning, Poisoning attack, Data sanitization, Data complexity	Training set, Data mining, Computer science, Suspect, Adversarial system, Data complexity, Data sanitization	Journal
Volume	Issue	ISSN
9	6	1868-808X
Citations	PageRank	References
2	0.37	32
Authors
4

Authors (4 rows)

Name	Order	Citations	PageRank
Patrick P. K. Chan	1	271	33.82
Zhimin He	2	536	35.90
Hongjiang Li	3	4	0.74
Chien-Chang Hsu	4	76	11.68
