Abstract | ||
---|---|---|
Clustering by maximizing the dependency between two paired, continuous-valued multivariate data sets is studied. The new method,
associative clustering (AC), maximizes a Bayes factor between two clustering models differing only in one respect: whether the clusterings of the two
data sets are dependent or independent. The model both extends Information Bottleneck (IB)-type dependency modeling to continuous-valued
data and offers it a well-founded and asymptotically well-behaved criterion for small data sets: with suitable prior assumptions
the Bayes factor becomes equivalent to the hypergeometric probability of a contingency table, while for large data sets it
becomes the standard mutual information. An optimization algorithm is introduced, with empirical comparisons to a combination
of IB and K-means, and to plain K-means. Two case studies cluster genes 1) to find dependencies between gene expression and
transcription factor binding, and 2) to find dependencies between expression in different organisms.
|
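The two limiting quantities named in the abstract can be illustrated on a small contingency table of cluster co-occurrence counts. The following is a minimal sketch, not the authors' AC algorithm or their Bayes factor: it assumes the table is given as a list of rows of non-negative integer counts, and the function names `log_hypergeometric_prob` and `mutual_information` are hypothetical, introduced here only for illustration. The first function evaluates the standard hypergeometric probability of a table with fixed margins; the second evaluates the empirical mutual information of the same table.

```python
import math


def _log_factorial(n):
    # log(n!) via the log-gamma function, stable for large counts
    return math.lgamma(n + 1)


def log_hypergeometric_prob(table):
    """Log-probability of a contingency table given its fixed row and
    column sums (the multivariate hypergeometric probability)."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    return (
        sum(_log_factorial(r) for r in row_sums)
        + sum(_log_factorial(c) for c in col_sums)
        - _log_factorial(total)
        - sum(_log_factorial(n) for row in table for n in row)
    )


def mutual_information(table):
    """Empirical mutual information (in nats) between the row and
    column variables of a contingency table."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    mi = 0.0
    for i, row in enumerate(table):
        for j, n in enumerate(row):
            if n > 0:  # empty cells contribute nothing
                mi += (n / total) * math.log(n * total / (row_sums[i] * col_sums[j]))
    return mi


if __name__ == "__main__":
    # Hypothetical counts: rows are clusters of one data set,
    # columns are clusters of the paired data set.
    dependent = [[5, 0], [0, 5]]
    balanced = [[3, 2], [2, 3]]
    for name, t in (("dependent", dependent), ("balanced", balanced)):
        print(name, math.exp(log_hypergeometric_prob(t)), mutual_information(t))
```

A table whose cells deviate strongly from what the margins predict gets a low hypergeometric probability and a high mutual information, which is the sense in which both quantities measure dependency between the two clusterings.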
Year | DOI | Venue |
---|---|---|
2004 | 10.1007/978-3-540-30115-8_37 | ECML |
DocType | Citations | PageRank |
---|---|---|
Conference | 5 | 0.54 |
References | Authors |
---|---|
9 | 4 |
Name | Order | Citations | PageRank |
---|---|---|---|
Janne Sinkkonen | 1 | 231 | 21.36 |
Janne Nikkilä | 2 | 200 | 16.65 |
Leo Lahti | 3 | 25 | 6.53 |
Samuel Kaski | 4 | 2755 | 245.52 |