| Abstract |
| --- |
| Regularization is a well-recognized, powerful strategy for improving the performance of a learning machine, and ℓq regularization schemes with 0 < q < ∞ are in widespread use. It is known that different values of q lead to different properties of the resulting estimators; for example, ℓ2 regularization yields a smooth estimator, while ℓ1 regularization yields a sparse one. How the generalization capability of ℓq regularization learning varies with q is therefore worth investigating. In this letter, we study this problem in the framework of statistical learning theory. Our main results show that implementing ℓq coefficient regularization schemes in the sample-dependent hypothesis space associated with a Gaussian kernel attains the same almost-optimal learning rates for all 0 < q < ∞; that is, the upper and lower bounds of the learning rates for ℓq regularization learning are asymptotically identical for all 0 < q < ∞. Our finding tentatively reveals that, in some modeling contexts, the choice of q might not have a strong impact on the generalization capability. From this perspective, q can be specified arbitrarily, or chosen by other, non-generalization criteria such as smoothness, computational complexity, or sparsity. |
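The scheme the abstract describes can be illustrated with a minimal sketch: ℓq coefficient regularization over the sample-dependent hypothesis space spanned by Gaussian kernels centered at the sample points. The toy data, hyperparameters (`sigma`, `lam`, `q`), and the plain gradient-descent solver below are illustrative assumptions, not the paper's construction.

```python
import numpy as np

# Hypothetical toy regression data on [0, 1] (not from the paper).
rng = np.random.default_rng(0)
n = 30
X = rng.uniform(0.0, 1.0, n)
y = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(n)

sigma, lam, q = 0.2, 1e-3, 1.5  # illustrative choices; q > 1 keeps the penalty differentiable

# Gaussian kernel matrix K[i, j] = exp(-(x_i - x_j)^2 / (2 * sigma^2)).
K = np.exp(-((X[:, None] - X[None, :]) ** 2) / (2 * sigma**2))

def objective(alpha):
    """Empirical least-squares risk plus the l^q penalty on the coefficients."""
    resid = K @ alpha - y
    return np.mean(resid**2) + lam * np.sum(np.abs(alpha) ** q)

# Plain gradient descent on the objective (smooth for q > 1).
alpha = np.zeros(n)
step = 0.05
for _ in range(2000):
    grad = (2.0 / n) * K.T @ (K @ alpha - y) \
        + lam * q * np.sign(alpha) * np.abs(alpha) ** (q - 1)
    alpha -= step * grad

def f_hat(t):
    """The learned estimator: a kernel expansion over the sample points."""
    return np.exp(-((t - X) ** 2) / (2 * sigma**2)) @ alpha
```

For 0 < q < 1 the penalty is nonsmooth and nonconvex, so this simple gradient loop no longer applies; specialized solvers (e.g., iterative thresholding) would be needed, which is one of the "nongeneralization" considerations the abstract mentions.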
| Year | DOI | Venue |
| --- | --- | --- |
| 2014 | 10.1162/NECO_a_00641 | Neural Computation |

| Field | DocType | Volume |
| --- | --- | --- |
| Statistical learning theory, Early stopping, Mathematical optimization, Backus–Gilbert method, Regularization (mathematics), Artificial intelligence, Proximal gradient methods for learning, Gaussian function, Machine learning, Mathematics, Regularization perspectives on support vector machines, Estimator | Journal | 26 |

| Issue | ISSN | Citations |
| --- | --- | --- |
| 10 | 1530-888X | 0 |

| PageRank | References | Authors |
| --- | --- | --- |
| 0.34 | 15 | 4 |
| Name | Order | Citations | PageRank |
| --- | --- | --- | --- |
| Shaobo Lin | 1 | 184 | 20.02 |
| Jinshan Zeng | 2 | 236 | 18.82 |
| Jian Fang | 3 | 0 | 0.68 |
| Zongben Xu | 4 | 3203 | 198.88 |