Abstract | ||
---|---|---|
By employing time-varying proximal functions, adaptive subgradient methods (ADAGRAD) have improved the regret bound and been widely used in online learning and optimization. However, ADAGRAD with full matrix proximal functions (ADA-FULL) cannot handle large-scale problems due to the impractical
<inline-formula><tex-math notation="LaTeX">$O(d^3)$</tex-math></inline-formula>
time and
<inline-formula><tex-math notation="LaTeX">$O(d^2)$</tex-math></inline-formula>
space complexities, though it has better performance when gradients are correlated. In this paper, we propose two efficient variants of ADA-FULL via a matrix sketching technique called frequent directions (FD). The first variant named as ADA-FD directly utilizes FD to maintain and manipulate low-rank matrices, which reduces the space and time complexities to
<inline-formula><tex-math notation="LaTeX">$O(\tau d)$</tex-math></inline-formula>
and
<inline-formula><tex-math notation="LaTeX">$O(\tau ^2d)$</tex-math></inline-formula>
respectively, where
<inline-formula><tex-math notation="LaTeX">$d$</tex-math></inline-formula>
is the dimensionality and
<inline-formula><tex-math notation="LaTeX">$\tau \ll d$</tex-math></inline-formula>
is the sketching size. The second variant named as ADA-FFD further adopts a doubling trick to accelerate FD used in ADA-FD, which reduces the average time complexity to
<inline-formula><tex-math notation="LaTeX">$O(\tau d)$</tex-math></inline-formula>
while only doubles the space complexity of ADA-FD. Theoretical analysis reveals that the regret of ADA-FD and ADA-FFD is close to that of ADA-FULL as long as the outer product matrix of gradients is approximately low-rank. Experimental results demonstrate the efficiency and effectiveness of our algorithms. |
Year | DOI | Venue |
---|---|---|
2022 | 10.1109/TPAMI.2021.3096880 | IEEE Transactions on Pattern Analysis and Machine Intelligence |
Keywords | DocType | Volume |
Algorithms,Education, Distance | Journal | 44 |
Issue | ISSN | Citations |
10 | 0162-8828 | 0 |
PageRank | References | Authors |
0.34 | 21 | 2 |
Name | Order | Citations | PageRank |
---|---|---|---|
yuanyu wan | 1 | 5 | 4.13 |
Lijun Zhang | 2 | 245 | 37.10 |