Tags

Stochastic Methods

Sep 27, 2022 · Optimization Theory · 20 min read

Optimization (10): Stochastic Optimization and Variance Reduction

Why does SGD work? We prove the O(1/sqrt(T)) convex rate and the O(1/(mu T)) strongly convex rate from the gradient noise budget. Then we turn to variance reduction: SVRG, SAGA, and Katyusha, methods that get to the linear rate of …