Title
Drift detection in data stream classification without fully labelled instances
Abstract
Drift detection is an important issue in classification-based stream mining, because operators need to be informed of unintended changes in the system. Current detection approaches usually assume that fully labelled streams are available, which is often unrealistic in on-line real-world applications. We propose two ways to improve the economy and applicability of drift detection: 1) a semi-supervised approach that employs single-pass active learning filters to select the most interesting samples for supervising the performance of classifiers, and 2) a fully unsupervised approach based on the degree of overlap between the classifier's output certainty distributions. Both variants rely on a modified version of the Page-Hinkley test, in which a fading factor down-weights older samples, making the test more flexible in detecting successive drift occurrences in a stream. The approaches are compared with the fully supervised variant (the state of the art, SoA) on two real-world on-line applications. The semi-supervised approach detects three real drifts in these streams with a delay that is lower than (about 200 versus 300 samples) or equal to (about 70 samples) that of the supervised variant, while requiring only 20% of the samples to be labelled.
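As a rough illustration of the faded Page-Hinkley idea mentioned in the abstract, the sketch below applies an exponential fading factor to the cumulative deviation of a monitored indicator (for example, a 0/1 classification error). The abstract does not give the exact update rule, so the formulation here is a common textbook variant and not necessarily the authors' method; the class name, the helper `first_alarm`, and all parameter names and default values are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class FadedPageHinkley:
    """Page-Hinkley test for an upward shift in the mean, with fading.

    Hypothetical sketch: parameter names and defaults are illustrative only.
    """
    delta: float = 0.005     # tolerated deviation from the running mean
    threshold: float = 5.0   # alarm threshold
    fading: float = 0.999    # fading factor in (0, 1]; 1.0 gives the classic test
    min_samples: int = 30    # warm-up period before alarms are allowed

    def __post_init__(self) -> None:
        self.reset()

    def reset(self) -> None:
        """Forget all statistics, e.g. after a drift has been signalled."""
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0       # faded cumulative deviation
        self.cum_min = 0.0   # smallest value of `cum` seen so far

    def update(self, x: float) -> bool:
        """Feed one indicator value; return True if a drift is signalled."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        # Older contributions are multiplied by `fading` at every step,
        # so their weight decays exponentially with their age.
        self.cum = self.fading * self.cum + (x - self.mean - self.delta)
        self.cum_min = min(self.cum_min, self.cum)
        if self.n >= self.min_samples and self.cum - self.cum_min > self.threshold:
            self.reset()
            return True
        return False


def first_alarm(stream: Iterable[float]) -> Optional[int]:
    """Return the 0-based index of the first alarm, or None if there is none."""
    detector = FadedPageHinkley()
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None
```

In a semi-supervised setting such as the one described above, `update` would be called only for the samples whose labels were actually requested, so the indicator stream is sparser than the data stream itself.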
Year: 2015
DOI: 10.1109/EAIS.2015.7368802
Venue: 2015 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS)
Keywords: data stream classification, drift detection, unsupervised and semi-supervised performance indicators, faded Page-Hinkley test, single-pass active learning filter, sample selection
DocType: Conference
ISSN: 2330-4863
Citations: 3
PageRank: 0.43
References: 13
Authors: 5
Name | Order | Citations | PageRank
Edwin Lughofer | 1 | 1940 | 99.72
Eva Weigl | 2 | 3 | 0.43
Wolfgang Heidl | 3 | 103 | 7.01
Christian Eitzinger | 4 | 164 | 15.33
Thomas Radauer | 5 | 66 | 4.94