Title
Customs fraud detection: Assessing the value of behavioural and high-cardinality data under the imbalanced learning issue
Abstract
In this customs fraud detection application, we analyse a unique data set of 9,624,124 records resulting from a collaboration with the Belgian customs administration. The administration faces increasing levels of international trade, which puts pressure on regulatory control. Governments therefore rely on data mining to focus their limited resources on the most likely fraud cases. The literature on data mining for customs fraud detection falls short in two main respects, both of which are addressed in this paper. (1) Behavioural and high-cardinality data types are neglected due to a lack of methodology to include them. We demonstrate that such fine-grained features (e.g. the specific entities involved in a declaration, such as the consignee, consignor and declarant, and the commodities) are very predictive. (2) Studies in the tax domain most often apply standard learning algorithms to their fraud detection applications. However, customs data are highly imbalanced, which poses challenges for many inducers. We present a new EasyEnsemble method that integrates a support vector machine base learner in a confidence-rated boosting algorithm. This results in a fast and scalable learner that drastically improves predictive performance over the base application of a support vector machine. The results of our proposed framework reveal high AUC and lift values that translate into an immediate impact on the customs fraud detection domain through improved retrieval of tax losses and enhanced deterrence.
Year
2020
DOI
10.1007/s10044-019-00852-w
Venue
Pattern Analysis and Applications
Keywords
Fraud detection, Behavioural data, High-cardinality attributes, Imbalanced learning, Support vector machines
DocType
Journal
Volume
23
Issue
3
ISSN
1433-7541
Citations
1
PageRank
0.36
References
0
Authors
3
Name                Order  Citations  PageRank
Jellis Vanhoeyveld  1      1          0.36
David Martens       2      66         9.52
Bruno Peeters       3      1          0.70