Title
Feature selection and allocation to diverse subsets for multi-label learning problems with large datasets
Abstract
Feature selection is an important phase in machine learning, and in the case of multi-label classification it can be considerably challenging. Likewise, finding the best subset of good features is involved and difficult when the dataset has a very large number of features (more than a thousand). In this paper we address the problem of feature selection for multi-label classification with a large number of features. The proposed method is a hybrid of two phases: preliminary feature selection based on the information value, followed by additional correlation-based selection. We show how the first phase can reduce the feature set from tens of thousands to a couple of hundred features, and how the second phase can then perform fine-grained feature selection with more sophisticated but computationally intensive methods. Finally, we analyze ways of allocating the selected features to diverse subsets, which are suitable for training ensembles of classifiers.
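The two phases and the allocation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the quantile binning, the use of the maximum information value across labels as a feature score, the greedy pairwise-correlation filter (a simple stand-in for the paper's correlation-based selection), the round-robin allocation, and all function and parameter names (information_value, select_features, allocate, top_k, max_corr, n_subsets) are assumptions made for illustration.

```python
import numpy as np

def information_value(x, y, bins=5, eps=1e-6):
    """Information value of one numeric feature x against one binary label y (0/1)."""
    # Quantile bin edges; duplicates collapse for low-cardinality features.
    edges = np.unique(np.quantile(x, np.linspace(0, 1, bins + 1)))
    if edges.size < 3:
        return 0.0  # a single bin cannot separate the classes
    idx = np.digitize(x, edges[1:-1])  # bin index in 0 .. n_bins-1
    n_bins = edges.size - 1
    pos = np.bincount(idx, weights=y, minlength=n_bins)
    neg = np.bincount(idx, weights=1 - y, minlength=n_bins)
    # Per-bin share of positives and negatives, smoothed to avoid log(0).
    p = pos / max(pos.sum(), 1) + eps
    q = neg / max(neg.sum(), 1) + eps
    return float(np.sum((p - q) * np.log(p / q)))

def select_features(X, Y, top_k=200, max_corr=0.9):
    """Phase 1: keep the top_k features by IV. Phase 2: drop correlated ones."""
    n_features, n_labels = X.shape[1], Y.shape[1]
    # Assumption: score a feature by its best IV over all labels.
    iv = np.array([
        max(information_value(X[:, j], Y[:, l]) for l in range(n_labels))
        for j in range(n_features)
    ])
    ranked = np.argsort(-iv)[:top_k]
    # Greedy filter: walk features in IV order, keep a feature only if it is
    # not strongly correlated with any feature already kept.
    corr = np.abs(np.corrcoef(X[:, ranked], rowvar=False))
    kept = []
    for a in range(len(ranked)):
        if all(corr[a, b] <= max_corr for b in kept):
            kept.append(a)
    return [int(ranked[a]) for a in kept]

def allocate(features, n_subsets=3):
    """Assumption: deal the ranked features round-robin into disjoint subsets."""
    return [features[i::n_subsets] for i in range(n_subsets)]
```

Here X is assumed to be a samples-by-features array and Y a samples-by-labels array of 0/1 values. The cheap information-value pass runs over every feature, while the pairwise correlation matrix is computed only for the top_k survivors, which mirrors the cost split the abstract describes. Each subset returned by allocate could then be used to train one member of an ensemble.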
Year: 2014
DOI: 10.15439/2014F500
Venue: Computer Science and Information Systems
Keywords: feature selection, learning (artificial intelligence), pattern classification, set theory, computationally intensive methods, correlation-based selection, feature allocation, fine-grained feature selection, machine learning, multilabel classification, multilabel learning problems, two phase preliminary feature selection
Field: Resource management, Data mining, Ensembles of classifiers, Feature selection, Pattern recognition, Computer science, Multi label learning, Correlation, Feature (machine learning), Artificial intelligence, Information value, Machine learning
DocType: Conference
Volume: 2
ISSN: 2300-5963
Citations: 4
PageRank: 0.47
References: 25
Authors: 4
Name                Order  Citations  PageRank
Eftim Zdravevski    1      57         16.51
Petre Lameski       2      61         13.84
Andrea Kulakov      3      98         14.79
Dejan Gjorgjevikj   4      332        15.85