Tagged

Optimization

Sep 22, 2025 Standalone 10 min read

Low-Rank Matrix Approximation and the Pseudoinverse: From SVD to Regularization

From the least-squares view to the Moore-Penrose pseudoinverse, the four Penrose conditions, computation via SVD, truncated SVD, Tikhonov regularization, and modern applications from PCA to LoRA.

Mar 12, 2025 Linear Algebra 16 min read

Matrix Calculus and Optimization -- The Engine Behind Machine Learning

Adjusting the shower temperature is a tiny version of training a neural network: you change a parameter based on an error signal. Matrix calculus is the language that scales this idea to millions of parameters, and …

Oct 15, 2023 Standalone 14 min read

Kernel Methods: From Theory to Practice (RKHS, Common Kernels, and Hyperparameter Tuning)

Understand the kernel trick, RKHS theory, and practical kernel selection. Covers RBF, polynomial, Matérn, and periodic kernels with scikit-learn code and a tuning flowchart.

Mar 13, 2023 Standalone 19 min read

Learning Rate: From Basics to Large-Scale Training

A practitioner's guide to the single most important hyperparameter: why a too-large LR makes training diverge, how warmup and schedules really work, the LR range test, the LR-batch-size-weight-decay coupling, and recent ideas like WSD, …

Dec 27, 2022 Standalone 13 min read

Lipschitz Continuity, Strong Convexity & Nesterov Acceleration

Three concepts that demystify most of optimization: Lipschitz smoothness bounds the maximum stable step size, strong convexity sets the convergence rate and guarantees a unique minimizer, and Nesterov acceleration replaces kappa …

Dec 9, 2022 Standalone 10 min read

Optimizer Evolution: From Gradient Descent to Adam (and Beyond, 2025)

One article that traces the full lineage GD -> SGD -> Momentum -> NAG -> AdaGrad -> RMSProp -> Adam -> AdamW, then onwards to Lion / Sophia / Schedule-Free. Each step is framed by the specific failure of the previous …

Jul 25, 2022 Standalone 17 min read

Proximal Operator: From Moreau Envelope to ISTA/FISTA and ADMM

A systematic walk through the proximal operator: convex-analysis basics, the Moreau envelope, closed-form proxes, and how they power ISTA, FISTA, ADMM, LASSO, and SVM in practice.