Abstract |
---|
We address the problem of learning structured unsupervised models with moment sparsity, typical of many natural-language induction tasks. For example, in unsupervised part-of-speech (POS) induction with hidden Markov models, we introduce a bias for words to be labeled by a small number of tags. To express this bias toward posterior sparsity, as opposed to parametric sparsity, we extend the posterior regularization framework (7). We evaluate our methods on three languages (English, Bulgarian, and Portuguese), showing consistent and significant accuracy improvements over EM-trained HMMs and over HMMs with sparsity-inducing Dirichlet priors trained by variational EM. We increase accuracy with respect to EM by 2.3%-6.5%, both in a purely unsupervised setting and in a weakly supervised setting where the closed-class words are provided. Finally, we show that our method also yields improvements when the induced clusters are used as features of a discriminative model in a semi-supervised setting. |
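The posterior-sparsity bias in the abstract can be illustrated with a minimal sketch. One common instantiation in the posterior regularization literature is an L1/L∞-style penalty: for each (word, tag) pair, take the maximum posterior probability of that tag over all occurrences of the word, then sum. The penalty is small when each word concentrates its posterior mass on few tags. The function and data below are hypothetical illustrations of that idea, not code or an objective taken from this paper.

```python
def l1_linf_penalty(posteriors):
    """L1/L-infinity posterior-sparsity penalty (illustrative sketch).

    posteriors: list of (word, {tag: prob}) pairs, one entry per
    token occurrence. Returns the sum over (word, tag) pairs of the
    maximum posterior that tag receives at any occurrence of the word.
    """
    max_per_word_tag = {}
    for word, dist in posteriors:
        for tag, prob in dist.items():
            key = (word, tag)
            max_per_word_tag[key] = max(max_per_word_tag.get(key, 0.0), prob)
    return sum(max_per_word_tag.values())

# Two occurrences of "bank" with inconsistent tag posteriors:
ambiguous = [("bank", {"N": 0.9, "V": 0.1}),
             ("bank", {"N": 0.1, "V": 0.9})]

# The same two occurrences, but labeled consistently:
consistent = [("bank", {"N": 0.9, "V": 0.1}),
              ("bank", {"N": 0.9, "V": 0.1})]

print(l1_linf_penalty(ambiguous))   # 1.8 (mass spread over two tags)
print(l1_linf_penalty(consistent))  # 1.0 (mass concentrated on one tag)
```

A regularizer of this shape rewards the consistent labeling over the ambiguous one, which is the bias the paper encourages: each word type should be labeled by a small number of tags.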
Year | Venue | DocType |
---|---|---|
2009 | NIPS | Conference |
Citations | PageRank | References |
1 | 0.38 | 8 |
Authors |
---|
4 |
Name | Order | Citations | PageRank |
---|---|---|---|
João Graça | 1 | 295 | 11.19 |
Kuzman Ganchev | 2 | 737 | 35.21 |
Ben Taskar | 3 | 3175 | 209.33 |
Fernando Pereira | 4 | 17717 | 2124.79 |