Title
Finding Essential Attributes from Binary Data
Abstract
We consider data sets that consist of n-dimensional binary vectors representing positive and negative examples for some (possibly unknown) phenomenon. A subset S of the attributes (or variables) of such a data set is called a support set if the positive and negative examples can be distinguished by using only the attributes in S. In this paper we study the problem of finding small support sets, a frequently arising task in various fields, including knowledge discovery, data mining, learning theory, logical analysis of data, etc. We study the distribution of support sets in randomly generated data, and discuss why finding small support sets is important. We propose several measures of separation (real valued set functions over the subsets of attributes), formulate optimization models for finding the smallest subsets maximizing these measures, and devise efficient heuristic algorithms to solve these (typically NP-hard) optimization problems. We prove that several of the proposed heuristics have a guaranteed constant approximation ratio, and we report on computational experience comparing these heuristics with some others from the literature both on randomly generated and on real world data sets.
Year
DOI
Venue
2003
10.1023/A:1024653703689
Ann. Math. Artif. Intell.
Keywords
Field
DocType
feature selection,essential attributes,support sets
Set function,Data mining,Heuristic,Data set,Association rule learning,Heuristics,Knowledge extraction,Artificial intelligence,Binary data,Optimization problem,Mathematics,Machine learning
Journal
Volume
Issue
ISSN
39
3
1573-7470
Citations 
PageRank 
References 
8
0.54
31
Authors
5
Name
Order
Citations
PageRank
Endre Boros11779155.63
Takashi Horiyama28119.76
Toshihide Ibaraki32593385.64
Kazuhisa Makino41088102.74
Mutsunori Yagiura561446.59