Abstract | ||
---|---|---|
Semi-supervised learning, i.e., learning from both labeled and unlabeled data, has received significant attention in the machine learning literature in recent years. Still, our understanding of the theoretical foundations of the usefulness of unlabeled data remains somewhat limited. The simplest and best understood situation is when the data is described by an identifiable mixture model, and where each class comes from a pure component. This natural setup and its implications were analyzed in (11, 5). One important result was that in certain regimes, labeled data becomes exponentially more valuable than unlabeled data. However, in most realistic situations, one would not expect the data to come from a parametric mixture distribution with identifiable components. There have been recent efforts to analyze the non-parametric situation; for example, "cluster" and "manifold" assumptions have been suggested as a basis for analysis. Still, a satisfactory and fairly complete theoretical understanding of the nonparametric problem, similar to that in (11, 5), has not yet been developed. In this paper we investigate an intermediate situation, when the data comes from a probability distribution that can be modeled, but not perfectly, by an identifiable mixture distribution. This seems applicable to many situations, for example, when a mixture of Gaussians is used to model the data. The contribution of this paper is an analysis of the role of labeled and unlabeled data depending on the amount of imperfection in the model. |
Year | Venue | Keywords |
---|---|---|
2007 | NIPS | probability distribution,machine learning,mixture distribution,mixture of gaussians,semi supervised learning,mixture model |
Field | DocType | Citations |
Mixture distribution,Semi-supervised learning,Imperfect,Computer science,Nonparametric statistics,Parametric statistics,Probability distribution,Artificial intelligence,Mixture model,Machine learning,Manifold | Conference | 13 |
PageRank | References | Authors |
0.82 | 8 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
Kaushik Sinha | 1 | 244 | 17.81 |
Mikhail Belkin | 2 | 3341 | 196.65 |