Abstract |
---|
The large-scale data stream problem refers to high-speed information flows that cannot be processed in a scalable manner on a traditional computing platform. This problem also imposes expensive labelling costs, making the deployment of fully supervised algorithms infeasible. At the same time, the problem of semi-supervised large-scale data streams is little explored in the literature, because most existing works are designed for traditional single-node computing environments and are fully supervised. This paper proposes the Weakly Supervised Scalable Teacher Forcing Network (WeScatterNet) to cope with the scarcity of labelled samples and large-scale data streams simultaneously. WeScatterNet is crafted on the distributed computing platform Apache Spark, with a data-free model fusion strategy for model compression after the parallel computing stage. It features an open network structure to address the global and local drift problems, while integrating a data augmentation, annotation and auto-correction (DA3) method to handle partially labelled data streams. The performance of WeScatterNet is numerically evaluated on six large-scale data stream problems with only 25% label proportions. It shows highly competitive performance even when compared with fully supervised learners trained with 100% label proportions. |
Year | DOI | Venue |
---|---|---|
2021 | 10.1016/j.ins.2021.06.075 | Information Sciences |
Keywords | DocType | Volume
---|---|---|
Evolving fuzzy systems, Concept drifts, Data streams, Fuzzy classifiers | Journal | 576
ISSN | Citations | PageRank
---|---|---|
0020-0255 | 2 | 0.37
References | Authors
---|---|
0 | 5
Name | Order | Citations | PageRank |
---|---|---|---|
Mahardhika Pratama | 1 | 702 | 50.02 |
Choiru Za'in | 2 | 7 | 1.79 |
Edwin Lughofer | 3 | 1940 | 99.72 |
Eric Pardede | 4 | 959 | 122.09 |
Dwi A. P. Rahayu | 5 | 2 | 0.37 |