Title
Implicit Mixtures of Restricted Boltzmann Machines
Abstract
We present a mixture model whose components are Restricted Boltzmann Machines (RBMs). This possibility has not been considered before because computing the partition function of an RBM is intractable, which appears to make learning a mixture of RBMs intractable as well. Surprisingly, when formulated as a third-order Boltzmann machine, such a mixture model can be learned tractably using contrastive divergence. The energy function of the model captures three-way interactions among visible units, hidden units, and a single hidden discrete variable that represents the cluster label. The distinguishing feature of this model is that, unlike other mixture models, the mixing proportions are not explicitly parameterized. Instead, they are defined implicitly via the energy function and depend on all the parameters in the model. We present results for the MNIST and NORB datasets showing that the implicit mixture of RBMs learns clusters that reflect the class structure in the data.

A mixture of models is created by assigning a mixing proportion to each of the component models, and it is typically fitted using the EM algorithm, which alternates between two steps. The E-step uses property 1 to compute the posterior probability that each datapoint came from each of the component models. The posterior is also called the "responsibility" of each model for a datapoint. The M-step uses property 2 to update the parameters of each model to raise the responsibility-weighted sum of the log probabilities it assigns to the datapoints. The M-step also changes the mixing proportions of the component models to match the proportion of the training data that they are responsible for.

Restricted Boltzmann Machines (5) model binary data vectors using binary latent variables. They are considerably more powerful than mixtures of multivariate Bernoulli models because they allow many of the latent variables to be on simultaneously, so the number of alternative latent state vectors is exponential in the number of latent variables rather than linear in it, as it is for a mixture of Bernoullis. An RBM with N hidden units can be viewed as a mixture of 2^N Bernoulli models, one per latent state vector, with a great deal of parameter sharing between the 2^N component models and with the 2^N mixing proportions being implicitly determined by the same parameters.
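To make the EM recipe described above concrete, here is a minimal numpy sketch of EM for a mixture of multivariate Bernoullis, the baseline model the abstract compares against. The function name em_bernoulli_mixture, the initialization scheme, and the iteration count are illustrative choices, not details from the paper; the E-step computes the responsibilities and the M-step performs the responsibility-weighted parameter updates and rebalances the mixing proportions.

```python
import numpy as np

def em_bernoulli_mixture(X, K, n_iter=50, rng=None):
    """Illustrative EM for a mixture of multivariate Bernoullis.

    X: (N, D) binary data matrix; K: number of components.
    """
    rng = rng or np.random.default_rng(0)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                   # explicit mixing proportions
    mu = rng.uniform(0.25, 0.75, size=(K, D))  # per-component Bernoulli means
    for _ in range(n_iter):
        # E-step: responsibilities r[n, k] = p(component k | datapoint n),
        # computed in the log domain for numerical stability.
        log_lik = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T
        log_post = log_lik + np.log(pi)
        log_post -= log_post.max(axis=1, keepdims=True)
        r = np.exp(log_post)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted maximum likelihood for each
        # component, then match mixing proportions to the data each
        # component is responsible for.
        Nk = r.sum(axis=0)
        mu = np.clip((r.T @ X) / Nk[:, None], 1e-6, 1 - 1e-6)
        pi = Nk / N
    return pi, mu
```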
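The implicit mixture itself replaces the explicit mixing proportions with energies: each component k is an ordinary RBM gated by the cluster label, and the posterior over the label follows from the components' free energies alone. Below is a rough numpy sketch of that idea with CD-1 training, under assumptions of our own: the names ImplicitMixtureRBM and cd1_update, the hard sampling of a single component per update, and the learning rate are illustrative, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ImplicitMixtureRBM:
    """Sketch of a K-component implicit mixture of RBMs.

    Energy: E(v, h, z=k) = -(v @ W[k] @ h + b[k] @ v + c[k] @ h),
    a three-way interaction between v, h, and the cluster label z.
    """

    def __init__(self, K, n_vis, n_hid):
        self.W = 0.01 * rng.standard_normal((K, n_vis, n_hid))
        self.b = np.zeros((K, n_vis))  # visible biases, one set per component
        self.c = np.zeros((K, n_hid))  # hidden biases, one set per component

    def free_energy(self, v):
        """Per-component free energy F_k(v), with hidden units summed out."""
        pre = np.einsum('kvh,v->kh', self.W, v) + self.c
        return -(self.b @ v) - np.logaddexp(0.0, pre).sum(axis=1)  # shape (K,)

    def responsibilities(self, v):
        """Posterior over the cluster label. There are no explicit mixing
        proportions; the free energies alone determine the posterior."""
        neg_F = -self.free_energy(v)
        neg_F -= neg_F.max()  # numerical stability
        p = np.exp(neg_F)
        return p / p.sum()

    def cd1_update(self, v, lr=0.05):
        """Sample a component from the posterior, then run one standard
        CD-1 step on that component's RBM (illustrative variant)."""
        k = rng.choice(len(self.W), p=self.responsibilities(v))
        ph = sigmoid(v @ self.W[k] + self.c[k])      # P(h=1 | v, z=k)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(self.W[k] @ h + self.b[k])      # mean-field reconstruction
        ph2 = sigmoid(pv @ self.W[k] + self.c[k])
        self.W[k] += lr * (np.outer(v, ph) - np.outer(pv, ph2))
        self.b[k] += lr * (v - pv)
        self.c[k] += lr * (ph - ph2)
```

Summing out the hidden units keeps each per-component free energy tractable, and the intractable partition function is shared by all components, so the responsibilities can be computed exactly even though no component's partition function can.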
Year
2008
Venue
NIPS
Keywords
em algorithm, posterior probability, component model, boltzmann machine, partition function, latent variable, mixture model
Field
Cluster (physics), Parameterized complexity, Boltzmann machine, MNIST database, Computer science, Partition function (statistical mechanics), Artificial intelligence, Contrastive divergence, Mixture model, Machine learning, Discrete variable
DocType
Conference
Citations
41
PageRank
12.72
References
9
Authors
2
Name, Order, Citations, PageRank
Vinod Nair, 1, 1658, 134.40
Geoffrey E. Hinton, 2, 404354, 751.69