Title
Towards the Overcome of Performance Pitfalls in Data Stream Mining Tools
Abstract
Data stream mining is an essential task in today's scientific community. It allows machine learning models to be updated over time as new data becomes available. Three pillars should be accounted for when selecting an appropriate algorithm for data stream mining: accuracy, processing time, and memory consumption. To develop and assess machine learning models in streaming scenarios, different tools have been developed, of which Massive Online Analysis, written in Java, and scikit-multiflow, written in Python, are the most prominent. Despite the ease of use of both tools, neither is focused on performance, which jeopardizes the efficient use of computational resources. In this paper, we show that, with the right tools, Python libraries reach performance comparable to C/C++. More specifically, we show how optimized implementations in scikit-multiflow using low-level languages (C++, C++ with Intel Intrinsics, and Rust) with bindings to Python vastly outperform existing tools in computational resource usage while keeping predictive performance intact.
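The three pillars named in the abstract (accuracy, processing time, and memory consumption) can be illustrated with a minimal test-then-train loop. The sketch below is not taken from the paper and does not use the scikit-multiflow or Massive Online Analysis APIs. The `MajorityClass` learner, its `predict_one`/`learn_one` methods, the `prequential` helper, and the synthetic stream are all hypothetical names introduced for illustration, and the memory figure is only the peak Python allocation reported by `tracemalloc` during the loop.

```python
import time
import tracemalloc
from collections import Counter


class MajorityClass:
    """Toy incremental learner (hypothetical): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict_one(self, x):
        # No data seen yet: return None, so the first prediction counts as an error.
        if not self.counts:
            return None
        return self.counts.most_common(1)[0][0]

    def learn_one(self, x, y):
        # Incremental update: one counter increment per instance.
        self.counts[y] += 1


def prequential(stream, model):
    """Test-then-train loop reporting accuracy, wall-clock time, and peak memory."""
    correct = total = 0
    tracemalloc.start()
    start = time.perf_counter()
    for x, y in stream:
        if model.predict_one(x) == y:  # test on the instance first
            correct += 1
        model.learn_one(x, y)          # then train on the same instance
        total += 1
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "accuracy": correct / total if total else 0.0,
        "seconds": elapsed,
        "peak_bytes": peak,
    }


# Synthetic stream (an assumption for illustration): label 1 for two of every three instances.
stream = [((i,), 1 if i % 3 else 0) for i in range(300)]
result = prequential(stream, MajorityClass())
print(result)
```

Swapping `MajorityClass` for another learner that exposes the same two methods would allow the three measurements to be compared across implementations under identical conditions.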
Year: 2021
DOI: 10.1109/IJCNN52387.2021.9533375
Venue: 2021 International Joint Conference on Neural Networks (IJCNN)
DocType: Conference
ISSN: 2161-4393
Citations: 0
PageRank: 0.34
References: 0
Authors: 3
Name                        Order  Citations  PageRank
Lucca Portes Cavalheiro     1      0          0.68
Marco Antonio Zanata Alves  2      0          0.34
Jean Paul Barddal           3      140        16.77