Title
Solving inverse frequent itemset mining with infrequency constraints via large-scale linear programs
Abstract
Inverse frequent set mining (IFM) is the problem of computing a transaction database D satisfying given support constraints for some itemsets, which are typically the frequent ones. This article proposes a new formulation of IFM, called IFMI (IFM with infrequency constraints), where the itemsets that are not listed as frequent are constrained to be infrequent; that is, they must have a support less than or equal to a specified unique threshold. An instance of IFMI can be seen as an instance of the original IFM by making explicit the infrequency constraints for the minimal infrequent itemsets, corresponding to the so-called negative generator border defined in the literature. The complexity increase from PSPACE (complexity of IFM) to NEXP (complexity of IFMI) is caused by the cardinality of the negative generator border, which can be exponential in the original input size. Therefore, the article introduces a specific problem parameter κ that computes an upper bound to this cardinality using a hypergraph interpretation for which minimal infrequent itemsets correspond to minimal transversals. By fixing a constant k, the article formulates a k-bounded definition of the problem, called k-IFMI, that collects all instances for which the value of the parameter κ is less than or equal to k—its complexity is in PSPACE as for IFM. The bounded problem is encoded as an integer linear program with a large number of variables (actually exponential w.r.t. the number of constraints), which is thereafter approximated by relaxing integer constraints—the decision problem of solving the linear program is proven to be in NP. In order to solve the linear program, a column generation technique is used that is a variation of the simplex method designed to solve large-scale linear programs, in particular with a huge number of variables. The method at each step requires the solution of an auxiliary integer linear program, which is proven to be NP hard in this case and for which a greedy heuristic is presented. The resulting overall column generation solution algorithm enjoys very good scaling as evidenced by the intensive experimentation, thereby paving the way for its application in real-life scenarios.
Year
DOI
Venue
2013
10.1145/2541268.2541271
TKDD
Keywords
Field
DocType
integer linear program,decision problem,infrequency constraint,bounded problem,linear program,large-scale linear program,original ifm,minimal infrequent itemsets,inverse frequent itemset mining,specific problem parameter,auxiliary integer linear program,inverse problem
Data mining,Decision problem,Column generation,Simplex algorithm,Upper and lower bounds,Computer science,Algorithm,Cardinality,Greedy algorithm,PSPACE,Linear programming
Journal
Volume
Issue
ISSN
7
4
1556-4681
Citations 
PageRank 
References 
8
0.57
43
Authors
4
Name
Order
Citations
PageRank
Antonella Guzzo149739.90
Luigi Moccia219211.25
Domenico Sacca31936579.90
Edoardo Serra46610.23