Towards the Overcome of Performance Pitfalls in Data Stream Mining Tools - Citegraph

Paper Info

Title
Towards the Overcome of Performance Pitfalls in Data Stream Mining Tools

Abstract
Data stream mining is an essential task in today's scientific community. It allows machine learning models to be updated over time as new data becomes available. Three pillars should be accounted for when selecting an appropriate algorithm for data stream mining: accuracy, processing time, and memory consumption. To develop and assess machine learning models in streaming scenarios, several tools have been developed, the most prominent being Massive Online Analysis, written in Java, and scikit-multiflow, written in Python. Despite the ease of use of both tools, neither is focused on performance, which jeopardizes the efficient use of computational resources. In this paper, we show that with the right tools, Python libraries reach performance comparable to C/C++. More specifically, we show how optimized implementations in scikit-multiflow using low-level languages (C++, C++ with Intel Intrinsics, and Rust) with bindings to Python vastly outperform existing tools in computational resource usage while keeping predictive performance intact.

Year	DOI	Venue
2021	10.1109/IJCNN52387.2021.9533375	2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)
DocType	ISSN	Citations
Conference	2161-4393	0
PageRank	References	Authors
0.34	0	3

Authors (3 rows)

Name	Order	Citations	PageRank
Lucca Portes Cavalheiro	1	0	0.68
Marco Antonio Zanata Alves	2	0	0.34
Jean Paul Barddal	3	140	16.77
