Title
Implicit Mixtures of Restricted Boltzmann Machines
Abstract
We present a mixture model whose components are Restricted Boltzmann Machines (RBMs). This possibility has not been considered before because computing the partition function of an RBM is intractable, which appears to make learning a mixture of RBMs intractable as well. Surprisingly, when formulated as a third-order Boltzmann machine, such a mixture model can be learned tractably using contrastive divergence. The energy function of the model captures three-way interactions among visible units, hidden units, and a single hidden discrete variable that represents the cluster label. The distinguishing feature of this model is that, unlike other mixture models, the mixing proportions are not explicitly parameterized. Instead, they are defined implicitly via the energy function and depend on all the parameters in the model. We present results for the MNIST and NORB datasets showing that the implicit mixture of RBMs learns clusters that reflect the class structure in the data.

A mixture of models is created by assigning a mixing proportion to each of the component models, and it is typically fitted using the EM algorithm, which alternates between two steps. The E-step uses property 1 to compute the posterior probability that each datapoint came from each of the component models. The posterior is also called the "responsibility" of each model for a datapoint. The M-step uses property 2 to update the parameters of each model to raise the responsibility-weighted sum of the log probabilities it assigns to the datapoints. The M-step also changes the mixing proportions of the component models to match the proportion of the training data that they are responsible for.

Restricted Boltzmann Machines (5) model binary data vectors using binary latent variables. They are considerably more powerful than mixtures of multivariate Bernoulli models because they allow many of the latent variables to be on simultaneously, so the number of alternative latent state vectors is exponential in the number of latent variables rather than linear in it, as it is for a mixture of Bernoullis. An RBM with N hidden units can be viewed as a mixture of 2^N Bernoulli models, one per latent state vector, with a great deal of parameter sharing between the 2^N component models and with the 2^N mixing proportions being implicitly determined by the same parameters.
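To make the EM recipe described above concrete, here is a minimal numpy sketch of EM for a mixture of multivariate Bernoullis, the baseline model the abstract compares against. The function name em_bernoulli_mixture, the initialization scheme, and the iteration count are illustrative choices, not details from the paper; the E-step computes the responsibilities and the M-step performs the responsibility-weighted parameter updates and rebalances the mixing proportions.

```python
import numpy as np

def em_bernoulli_mixture(X, K, n_iter=50, rng=None):
    """Illustrative EM for a mixture of multivariate Bernoullis.

    X: (N, D) binary data matrix; K: number of components.
    """
    rng = rng or np.random.default_rng(0)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                   # explicit mixing proportions
    mu = rng.uniform(0.25, 0.75, size=(K, D))  # per-component Bernoulli means
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] = p(component k | datapoint n),
        # computed in the log domain for numerical stability.
        log_lik = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T
        log_post = log_lik + np.log(pi)
        log_post -= log_post.max(axis=1, keepdims=True)
        r = np.exp(log_post)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum likelihood for each
        # component, then match mixing proportions to the data each
        # component is responsible for.
        Nk = r.sum(axis=0)
        mu = np.clip((r.T @ X) / Nk[:, None], 1e-6, 1 - 1e-6)
        pi = Nk / N
    return pi, mu
```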
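The implicit mixture itself replaces the explicit mixing proportions with energies: each component k is an ordinary RBM gated by the cluster label, and the posterior over the label follows from the components' free energies alone. Below is a rough numpy sketch of that idea with CD-1 training, under assumptions of our own: the names ImplicitMixtureRBM and cd1_update, the hard sampling of a single component per update, and the learning rate are illustrative, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ImplicitMixtureRBM:
    """Sketch of a K-component implicit mixture of RBMs.

    Energy: E(v, h, z=k) = -(v @ W[k] @ h + b[k] @ v + c[k] @ h),
    a three-way interaction between v, h, and the cluster label z.
    """

    def __init__(self, K, n_vis, n_hid):
        self.W = 0.01 * rng.standard_normal((K, n_vis, n_hid))
        self.b = np.zeros((K, n_vis))  # visible biases, one set per component
        self.c = np.zeros((K, n_hid))  # hidden biases, one set per component

    def free_energy(self, v):
        """Per-component free energy F_k(v), with hidden units summed out."""
        pre = np.einsum('kvh,v->kh', self.W, v) + self.c
        return -(self.b @ v) - np.logaddexp(0.0, pre).sum(axis=1)  # shape (K,)

    def responsibilities(self, v):
        """Posterior over the cluster label. There are no explicit mixing
        proportions; the free energies alone determine the posterior."""
        neg_F = -self.free_energy(v)
        neg_F -= neg_F.max()  # numerical stability
        p = np.exp(neg_F)
        return p / p.sum()

    def cd1_update(self, v, lr=0.05):
        """Sample a component from the posterior, then run one standard
        CD-1 step on that component's RBM (illustrative variant)."""
        k = rng.choice(len(self.W), p=self.responsibilities(v))
        ph = sigmoid(v @ self.W[k] + self.c[k])      # P(h=1 | v, z=k)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(self.W[k] @ h + self.b[k])      # mean-field reconstruction
        ph2 = sigmoid(pv @ self.W[k] + self.c[k])
        self.W[k] += lr * (np.outer(v, ph) - np.outer(pv, ph2))
        self.b[k] += lr * (v - pv)
        self.c[k] += lr * (ph - ph2)
```

Summing out the hidden units keeps each per-component free energy tractable, and the intractable partition function is shared by all components, so the responsibilities can be computed exactly even though no component's partition function can.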
Year
2008
Venue
NIPS
Keywords
em algorithm, posterior probability, component model, boltzmann machine, partition function, latent variable, mixture model
Field
Cluster (physics), Parameterized complexity, Boltzmann machine, MNIST database, Computer science, Partition function (statistical mechanics), Artificial intelligence, Contrastive divergence, Mixture model, Machine learning, Discrete variable
DocType
Conference
Citations
41
PageRank
12.72
References
9
Authors
2
Name, Order, Citations, PageRank
Vinod Nair, 1, 1658, 134.40
Geoffrey E. Hinton, 2, 404354, 751.69