Stochastic Methods
Optimization (10): Stochastic Optimization and Variance Reduction
Why does SGD work? We prove the O(1/sqrt(T)) rate for convex objectives and the O(1/(mu T)) rate for strongly convex ones, both from the gradient noise budget. Then we turn to variance reduction: SVRG, SAGA, and Katyusha, methods that get to the linear rate of …
