Deep Learning Theory

Sep 29, 2022 · Optimization Theory · 22 min read

Optimization (11): Non-Convex Optimization and Saddle Escape

Why does SGD work for training neural networks despite the non-convex landscape? We prove perturbed GD escapes strict saddles in polynomial time, derive convergence under the Polyak-Łojasiewicz condition, and survey what …
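The saddle-escape claim can be illustrated with a toy sketch. It assumes a simplified perturbed-GD rule: when the gradient is small and no perturbation has happened recently, add a point drawn uniformly from a small ball, then keep doing plain gradient steps. The termination check of the full algorithm is omitted. The test function f(x, y) = x²/2 − y²/2 + y⁴/4 and the hyperparameters (`eta`, `g_thres`, `radius`, `t_thres`) are hypothetical choices made for illustration only. This f has a strict saddle at the origin, where the Hessian is diag(1, −1), and minima at (0, ±1).

```python
import numpy as np

# Hypothetical test function: strict saddle at (0, 0), minima at (0, +1) and (0, -1).
def f(z):
    x, y = z
    return 0.5 * x**2 - 0.5 * y**2 + 0.25 * y**4

def grad_f(z):
    x, y = z
    return np.array([x, -y + y**3])

def gradient_descent(z0, eta=0.1, steps=500):
    """Plain GD: a point with zero gradient (such as the saddle) never moves."""
    z = np.array(z0, dtype=float)
    for _ in range(steps):
        z = z - eta * grad_f(z)
    return z

def perturbed_gradient_descent(z0, eta=0.1, steps=500, g_thres=1e-3,
                               radius=1e-2, t_thres=50, seed=0):
    """Simplified perturbed GD (a sketch, not the full algorithm).

    If ||grad|| < g_thres and more than t_thres steps have passed since the
    last perturbation, add a point sampled uniformly from a ball of the given
    radius. The negative-curvature direction then amplifies it.
    """
    rng = np.random.default_rng(seed)
    z = np.array(z0, dtype=float)
    last_perturb = -t_thres - 1  # allow a perturbation at step 0
    for t in range(steps):
        g = grad_f(z)
        if np.linalg.norm(g) < g_thres and t - last_perturb > t_thres:
            # Uniform sample from the ball: random direction, radius ~ R * U^(1/d).
            direction = rng.normal(size=z.shape)
            direction /= np.linalg.norm(direction)
            r = radius * rng.uniform() ** (1.0 / z.size)
            z = z + r * direction
            last_perturb = t
            g = grad_f(z)
        z = z - eta * g
    return z

saddle = np.zeros(2)
print(gradient_descent(saddle))            # stays exactly at the saddle
print(perturbed_gradient_descent(saddle))  # ends near (0, +1) or (0, -1)
```

Near the origin the y-coordinate grows by roughly a factor of (1 + eta) per step, so even a tiny perturbation escapes along the negative-curvature direction. Plain GD started exactly at the saddle sees a zero gradient and stays put.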