Title
Improvement In The Efficiency Of A Distributed Multi-Label Text Classification Algorithm Using Infrastructure And Task-Related Data
Abstract
Distributed computing technologies make it possible to solve a wide variety of tasks that involve large amounts of data. Various paradigms and technologies are already widely used, but many of them fall short in optimizing resource usage. This paper presents optimization methods that increase the efficiency of distributed implementations of a text-mining model. The methods use information about the text-mining task, extracted from the data, together with information about the current state of the distributed environment, obtained from a computational node, to improve the distribution of the task across the distributed infrastructure. Two optimization solutions are developed and implemented, both based on predicting the expected task duration on the existing infrastructure. The solutions are experimentally evaluated in a scenario where a distributed tree-based multi-label classifier is built on two standard text data collections.
Year
2019
DOI
10.3390/informatics6010012
Venue
INFORMATICS-BASEL
Keywords
text classification, multi-label classification, distributed text-mining, task assignment, resource optimization, grid computing
DocType
Journal
Volume
6
Issue
1
Citations
0
PageRank
0.34
References
0
Authors
2
Name              Order  Citations  PageRank
Martin Sarnovsky  1      9          3.26
Marek Olejnik     2      0          0.34