Abstract | ||
---|---|---|
We consider data sets that consist of n-dimensional binary vectors representing positive and negative examples for some (possibly unknown) phenomenon. A subset S of the attributes (or variables) of such a data set is called a support set if the positive and negative examples can be distinguished by using only the attributes in S. In this paper we study the problem of finding small support sets, a frequently arising task in various fields, including knowledge discovery, data mining, learning theory, logical analysis of data, etc. We study the distribution of support sets in randomly generated data, and discuss why finding small support sets is important. We propose several measures of separation (real valued set functions over the subsets of attributes), formulate optimization models for finding the smallest subsets maximizing these measures, and devise efficient heuristic algorithms to solve these (typically NP-hard) optimization problems. We prove that several of the proposed heuristics have a guaranteed constant approximation ratio, and we report on computational experience comparing these heuristics with some others from the literature both on randomly generated and on real world data sets. |
Year | DOI | Venue |
---|---|---|
2003 | 10.1023/A:1024653703689 | Ann. Math. Artif. Intell. |
Keywords | Field | DocType |
feature selection,essential attributes,support sets | Set function,Data mining,Heuristic,Data set,Association rule learning,Heuristics,Knowledge extraction,Artificial intelligence,Binary data,Optimization problem,Mathematics,Machine learning | Journal |
Volume | Issue | ISSN |
39 | 3 | 1573-7470 |
Citations | PageRank | References |
8 | 0.54 | 31 |
Authors | ||
5 |
Name | Order | Citations | PageRank |
---|---|---|---|
Endre Boros | 1 | 1779 | 155.63 |
Takashi Horiyama | 2 | 81 | 19.76 |
Toshihide Ibaraki | 3 | 2593 | 385.64 |
Kazuhisa Makino | 4 | 1088 | 102.74 |
Mutsunori Yagiura | 5 | 614 | 46.59 |