Title
Redundancy, Deduction Schemes, and Minimum-Size Bases for Association Rules
Abstract
Association rules are among the most widely employed data analysis methods in the field of Data Mining. An association rule is a form of partial implication between two sets of binary variables. In the most common approach, association rules are parametrized by a lower bound on their confidence, which is the empirical conditional probability of their consequent given the antecedent, and/or by some other parameter bounds such as "support" or deviation from independence. We study here notions of redundancy among association rules from a fundamental perspective. We see each transaction in a dataset as an interpretation (or model) in the propositional logic sense, and consider existing notions of redundancy, that is, of logical entailment, among association rules, of the form "any dataset in which this first rule holds must also obey that second rule, therefore the second is redundant". We discuss several existing alternative definitions of redundancy between association rules and provide new characterizations and relationships among them. We show that the main alternatives we discuss actually correspond to just two variants, which differ in the treatment of full-confidence implications. For each of these two notions of redundancy, we provide a sound and complete deduction calculus, and we show how to construct complete bases (that is, axiomatizations) of absolutely minimum size in terms of the number of rules. Finally, we explore an approach to redundancy with respect to several association rules, and fully characterize its simplest case of two partial premises.
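As a minimal sketch of the two parameters named in the abstract, the following computes support and confidence over a toy dataset. The function names, the `TOY` dataset, and the use of exact fractions are illustrative assumptions and do not come from the paper. The last lines show one simple, well-known instance of redundancy: a rule a → b can never have lower confidence than a → bc, because every transaction containing a, b, and c also contains a and b.

```python
from fractions import Fraction


def support(dataset, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in dataset if itemset <= t)
    return Fraction(hits, len(dataset))


def confidence(dataset, antecedent, consequent):
    """Empirical conditional probability of the consequent given the antecedent.

    Returns None when no transaction contains the antecedent.
    """
    a = frozenset(antecedent)
    both = a | frozenset(consequent)
    n_a = sum(1 for t in dataset if a <= t)
    if n_a == 0:
        return None
    n_both = sum(1 for t in dataset if both <= t)
    return Fraction(n_both, n_a)


# Hypothetical toy dataset: each transaction is a set of items.
TOY = [
    frozenset(t)
    for t in ({"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"})
]

if __name__ == "__main__":
    print(support(TOY, {"a", "b"}))
    print(confidence(TOY, {"a"}, {"b"}))
    print(confidence(TOY, {"a"}, {"b", "c"}))
```

Exact fractions are used here only to keep comparisons between confidences free of floating-point noise; any numeric type would serve.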
Year
2010
DOI
10.2168/LMCS-6(2:3)2010
Venue
LOGICAL METHODS IN COMPUTER SCIENCE
Keywords
Data mining, association rules, implications, redundancy, deductive calculus, optimum bases
DocType
Journal
Volume
6
Issue
2
ISSN
1860-5974
Citations
7
PageRank
0.63
References
22
Authors
1
Name
José L. Balcázar
Order
1
Citations
701
PageRank
62.06