Abstract | ||
---|---|---|
Biclustering techniques have been widely used to identify homogeneous subgroups within large data matrices, such as subsets of genes similarly expressed across subsets of patients. Mining a max-sum sub-matrix is a related but distinct problem for which one looks for a (non-necessarily contiguous) rectangular sub-matrix with a maximal sum of its entries. Le Van et al. [7] already illustrated its applicability to gene expression analysis and addressed it with a constraint programming (CP) approach combined with large neighborhood search (LNS). In this work, we exhibit some key properties of this NP-hard problem and define a bounding function such that larger problems can be solved in reasonable time. The use of these properties results in an improved CP-LNS implementation evaluated here. Two additional algorithms are also proposed in order to exploit the highlighted characteristics of the problem: a CP approach with a global constraint (CPGC) and a mixed integer linear programming (MILP). Practical experiments conducted both on synthetic and real gene expression data exhibit the characteristics of these approaches and their relative benefits over the CP-LNS method. Overall, the CPGC approach tends to be the fastest to produce a good solution. Yet, the MILP formulation is arguably the easiest to formulate and can also be competitive. |
Year | DOI | Venue |
---|---|---|
2017 | 10.1007/978-3-319-78680-3_5 | Lecture Notes in Artificial Intelligence |
DocType | Volume | ISSN |
Conference | 10785 | 0302-9743 |
Citations | PageRank | References |
0 | 0.34 | 0 |
Authors | ||
3 |
Name | Order | Citations | PageRank |
---|---|---|---|
Vincent Branders | 1 | 0 | 1.69 |
Pierre Schaus | 2 | 127 | 24.63 |
Pierre Dupont | 3 | 54 | 2.66 |