Title
Subgroup discover in large size data sets preprocessed using stratified instance selection for increasing the presence of minority classes
Abstract
Subgroup discovery is defined as: "given a population of individuals and a property of those individuals, we are interested in finding a population of subgroups as large as possible and in having the most unusual statistical characteristic with respect to the property of interest". Subgroup discovery algorithms face a scaling-up problem when they are evaluated on large data sets. In this paper we address the extraction of subgroups from large data sets. To avoid the scaling-up problem, we propose combining stratification with instance selection algorithms to scale down the data set before the subgroup discovery task. In addition, we propose two new stratification models that increase the presence of minority classes in the data sets, which affects the subgroup discovery process applied to them. The results show that subgroup discovery can be executed on the preprocessed large data sets regardless of the presence of minority classes, something that could not be done otherwise.
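The combination of stratification and instance selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the instance selection algorithms used, so Hart's condensed nearest neighbour stands in for them here, and the two new stratification models for boosting minority classes are not reproduced. The function names `stratify`, `condensed_nn`, and `stratified_instance_selection` are hypothetical.

```python
import random
from collections import defaultdict


def stratify(labels, n_strata, seed=0):
    """Split instance indices into n_strata disjoint strata, keeping the
    class proportions of the full data set roughly equal in each stratum."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    strata = [[] for _ in range(n_strata)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so every stratum gets its share.
        for pos, idx in enumerate(indices):
            strata[pos % n_strata].append(idx)
    return strata


def condensed_nn(points, labels, indices):
    """Stand-in instance selector (Hart's condensed nearest neighbour,
    an assumption, not taken from the paper): keep an instance only if
    the instances kept so far would misclassify it with 1-NN."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    kept = [indices[0]]
    changed = True
    while changed:
        changed = False
        for i in indices:
            if i in kept:
                continue
            nearest = min(kept, key=lambda k: dist2(points[i], points[k]))
            if labels[nearest] != labels[i]:
                kept.append(i)
                changed = True
    return kept


def stratified_instance_selection(points, labels, n_strata, seed=0):
    """Run the selector on each stratum independently and merge the
    selected indices into one reduced data set."""
    selected = []
    for stratum in stratify(labels, n_strata, seed):
        if stratum:
            selected.extend(condensed_nn(points, labels, stratum))
    return sorted(selected)
```

Because each stratum is small, the selector's cost, which grows quickly with the number of instances, is paid on a fraction of the data at a time. The merged indices form the reduced data set that a subgroup discovery algorithm would then process.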
Year: 2008
DOI: 10.1016/j.patrec.2008.08.001
Venue: Pattern Recognition Letters
Keywords: subgroup discovery task, subgroup discovery algorithm, subgroup discovery process, subgroup discovery, subgroup discovery extraction, minority classes, scaling up, large size data set, stratification, instance selection, new stratification model, minority class, large data, stratified instance selection, large size data sets
Field: Population, Data mining, Data set, Data processing, Instance selection, Business process discovery, Scaling, Mathematics
DocType: Journal
Volume: 29
Issue: 16
Citations: 11
PageRank: 0.54
References: 26
Authors: 3
Name | Order | Citations | PageRank
JOSÉ-RAMÓN CANO | 1 | 54 | 2.99
Salvador García | 2 | 1219 | 34.57
Francisco Herrera | 3 | 27391 | 1168.49