Title
Unsupervised Active Learning Techniques For Labeling Training Sets: An Experimental Evaluation On Sequential Data
Abstract
Many real-world applications, such as those related to sensors, allow the collection of large amounts of inexpensive unlabeled sequential data. However, the use of supervised machine learning methods is frequently hindered by the high cost of obtaining labels for such data, since these methods assume that a considerable amount of labeled data is available to build an accurate classification model. To overcome this bottleneck, active learning methods selectively label the most informative examples instead of requesting the true labels of all examples. Although active learning has been widely used in many problems, most methods assume the presence of some labeled data or some prior knowledge about the problem, such as the number of classes. In contrast, this paper addresses the realistic scenario in which active learning is performed from scratch on a fully unlabeled dataset, with no classifier and no prior knowledge about the data. In general, methods that consider fully unlabeled data use random sampling to select the examples to label. The goal of this work is to present a broad experimental evaluation of different unsupervised active learning methods for selecting examples from fully unlabeled sequential data. We evaluated instance-selection methods based on clustering algorithms and on centrality measures from graphs, and we assessed the performance of supervised and semi-supervised learning algorithms in the classification task. Based on our evaluation on a benchmark of sequential data and on a case study of insect species classification, we recommend sampling based on hierarchical clustering or k-Means. These methods perform statistically significantly better than the popular random sampling. In addition, they are simple algorithms that are readily available in many software packages.
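As an illustration of the clustering-based selection the abstract recommends, the sketch below contrasts k-Means sampling with the random-sampling baseline. It is a minimal sketch under stated assumptions, not the authors' implementation: it assumes fixed-length sequences compared with Euclidean distance, uses a deterministic farthest-first initialization, and, as one common variant, sends to the oracle the member of each cluster that lies closest to its centroid. The function names (`select_by_kmeans`, `select_random`) are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences (an assumption;
    other sequence distances could be plugged in instead)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def farthest_first(data, k):
    """Deterministic initialization: start from the first example, then
    repeatedly add the example farthest from all centers chosen so far."""
    centers = [list(data[0])]
    while len(centers) < k:
        nxt = max(data, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(list(nxt))
    return centers


def kmeans(data, k, max_iter=100):
    """Plain Lloyd's k-means; returns (centroids, cluster assignment)."""
    centroids = farthest_first(data, k)
    assign = None
    for _ in range(max_iter):
        # Assignment step: each example joins its nearest centroid.
        new_assign = [min(range(k), key=lambda c: euclidean(p, centroids[c]))
                      for p in data]
        if new_assign == assign:  # converged: assignments no longer change
            break
        assign = new_assign
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(data, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign


def select_by_kmeans(data, budget):
    """Hypothetical selector: cluster the unlabeled data into `budget` groups
    and return, per cluster, the index of the example closest to its centroid.
    Empty clusters yield no pick, so fewer than `budget` indices may return."""
    centroids, assign = kmeans(data, budget)
    chosen = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            chosen.append(min(members, key=lambda i: euclidean(data[i], centroid)))
    return chosen


def select_random(data, budget, seed=0):
    """Random-sampling baseline: `budget` distinct indices chosen uniformly."""
    return random.Random(seed).sample(range(len(data)), budget)


# Toy unlabeled "sequences" forming three well-separated groups (made-up data).
series = [
    [0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.0, 0.2],
    [5.0, 5.1, 5.0], [5.1, 5.0, 4.9], [4.9, 5.0, 5.1],
    [9.0, 0.0, 9.0], [9.1, 0.1, 8.9], [8.9, 0.0, 9.1],
]

if __name__ == "__main__":
    print("k-Means picks:", sorted(select_by_kmeans(series, budget=3)))
    print("random picks:", sorted(select_random(series, budget=3)))
```

On this toy data the k-Means selector returns one representative per group, whereas random sampling can spend several labels on the same group; that coverage of the data's structure is the intuition behind preferring clustering-based sampling when no labels or classifier exist yet.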
Year: 2017
DOI: 10.3233/IDA-163075
Venue: Intelligent Data Analysis
Keywords: Unsupervised active learning, training set labeling, clustering, centrality measures, sequential data
Field: Sequential data, Semi-supervised learning, Active learning, Pattern recognition, Computer science, Unsupervised learning, Artificial intelligence, Machine learning
DocType: Journal
Volume: 21
Issue: 5
ISSN: 1088-467X
Citations: 1
PageRank: 0.36
References: 39
Authors: 4
1. Vinícius M. A. de Souza (Citations: 33, PageRank: 6.14)
2. Rafael Rossi (Citations: 58, PageRank: 8.20)
3. Gustavo E. Batista (Citations: 1928, PageRank: 92.83)
4. Solange Oliveira Rezende (Citations: 205, PageRank: 31.02)