Convex Analysis on Chen Kai Blog

Optimization (9): Interior-Point Methods and Self-Concordant Barriers

Mon, 26 Sep 2022 09:00:00 +0000

In 1984 Karmarkar showed that LPs could be solved in polynomial time in practice, not just in theory (the ellipsoid method had already achieved this on paper). His interior-point method stayed inside the feasible polytope and converged in $O(nL)$ iterations, where $L$ is the bit length of the input, far better than the simplex method’s exponential worst case. Within a decade, Nesterov & Nemirovski generalized this to convex programming via the self-concordant barrier framework. The result, $O(\sqrt{n} \log(1/\epsilon))$ Newton iterations for a problem whose barrier has parameter $n$ (for example, an LP with $n$ inequality constraints), remains the gold standard for medium-scale convex optimization.

Optimization (8): Lagrangian Duality and KKT Conditions

Sat, 24 Sep 2022 09:00:00 +0000

The most consequential idea in constrained optimization is that constraints have prices. The Lagrangian transforms a constrained problem into an unconstrained one by attaching a non-negative multiplier to each inequality and a free multiplier to each equality. The resulting unconstrained problem may be easier (the SVM dual), or it may give a verifiable lower bound (the LP duality used to certify integer programs).

This article develops:

Weak duality: the dual is always a lower bound on the primal — no assumptions needed.
Strong duality: for convex problems satisfying Slater’s condition (or feasible convex problems with only affine constraints), the gap is zero.
KKT conditions: stationarity + primal feasibility + dual feasibility + complementary slackness, the practical optimality system.
Saddle-point characterization: the Lagrangian’s saddle point coincides with the optimal primal–dual pair.

Each result is proved or carefully cited. We close with the SVM example, where the dual changes the problem dimension from $d$ (number of features) to $n$ (number of training points), a reduction whenever $n < d$ and the original kernel-method magic.
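To make weak duality and the KKT system concrete, here is a minimal sketch on a hypothetical one-dimensional problem that is not taken from the article: minimise $x^2$ subject to $1 - x \le 0$. The Lagrangian is $x^2 + \lambda(1 - x)$, its inner minimiser is $x = \lambda/2$, and the names `dual_function`, `p_star`, and `kkt` are illustrative only.

```python
def dual_function(lam):
    """g(lam) = min_x x**2 + lam * (1 - x); the inner minimiser is x = lam / 2."""
    x = lam / 2.0
    return x * x + lam * (1.0 - x)


# Primal optimum of the toy problem: x = 1 with value 1 (the constraint is active).
p_star = 1.0
x_opt, lam_opt = 1.0, 2.0

# Weak duality: every lam >= 0 yields a lower bound on p_star.
lams = [0.1 * k for k in range(51)]
weak_duality_holds = all(dual_function(lam) <= p_star + 1e-12 for lam in lams)

# Strong duality: the problem is convex and x = 2 is strictly feasible (Slater),
# so the best dual bound matches the primal optimum.
duality_gap = p_star - dual_function(lam_opt)

# The usual four KKT conditions, checked at the candidate pair (x_opt, lam_opt).
kkt = {
    "stationarity": abs(2.0 * x_opt - lam_opt) < 1e-12,   # d/dx of the Lagrangian
    "primal_feasibility": 1.0 - x_opt <= 0.0,
    "dual_feasibility": lam_opt >= 0.0,
    "complementary_slackness": abs(lam_opt * (1.0 - x_opt)) < 1e-12,
}

print(weak_duality_holds, duality_gap, kkt)
```

The multiplier $\lambda = 2$ is the "price" of the constraint in this toy problem: relaxing it slightly to $x \ge 1 - \delta$ lowers the optimal value by roughly $2\delta$.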

Optimization (6): Composite Optimization and Proximal Methods

Wed, 21 Sep 2022 09:00:00 +0000

When your objective contains a non-smooth piece (sparse regularisation, total variation, an indicator of a constraint set) or a constraint that is hard to handle directly, “just do gradient descent” stalls — there is no gradient at the kink, or every step violates feasibility. The proximal operator is the engineered, beautiful workaround: think of each update as “take a step on the smooth part, then run a tiny penalised minimisation that pulls the iterate back toward a structured solution”.
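The "gradient step, then small penalised minimisation" pattern can be sketched for the $\ell_1$ case, where the proximal operator has the closed form known as soft-thresholding. This is a minimal illustration under assumptions: the toy objective $\tfrac12\|x - b\|^2 + \lambda\|x\|_1$, the step size, and the function names are hypothetical and not taken from the article.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| at the scalar v: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def proximal_gradient(grad_f, x0, step, lam, iters=200):
    """Minimise f(x) + lam * ||x||_1 by alternating a gradient step on the
    smooth part f with the soft-thresholding prox of the l1 term."""
    x = list(x0)
    for _ in range(iters):
        g = grad_f(x)
        x = [soft_threshold(xi - step * gi, step * lam) for xi, gi in zip(x, g)]
    return x


# Toy smooth part: f(x) = 0.5 * ||x - b||^2, so grad f(x) = x - b.
b = [3.0, -0.5, 1.2]
x_star = proximal_gradient(
    lambda x: [xi - bi for xi, bi in zip(x, b)],
    x0=[0.0, 0.0, 0.0],
    step=0.5,
    lam=1.0,
)
print(x_star)
```

For this separable toy problem the minimiser is simply `soft_threshold(b_i, lam)` per coordinate, so the middle entry, whose magnitude is below `lam`, lands at exactly zero. That exact zero is the sparsity effect a plain gradient step cannot produce.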

Optimization (2): Smoothness, Strong Convexity, and Nesterov Acceleration

Thu, 15 Sep 2022 09:00:00 +0000

A surprising amount of “optimizer folklore” collapses into three concepts:

How fast can the gradient change? Lipschitz smoothness ($L$-smoothness) caps the step size.
How sharp is the bottom? $\mu$-strong convexity sets the convergence rate and forces the minimizer to be unique.
Can we get there faster without losing stability? Nesterov acceleration and adaptive restart improve the iteration count’s dependence on the condition number from $\kappa$ to $\sqrt{\kappa}$.

This post lays them out on a single thread: nail the geometric intuition with the minimum number of inequalities, prove the key theorems, then close with a least-squares experiment that pits GD, Heavy Ball, and Nesterov against each other. The goal is not to stack formulas but to let you look at a new problem and instantly answer “what step size, what rate, is acceleration worth it?”
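As a rough illustration of the $\kappa$ versus $\sqrt{\kappa}$ gap, the sketch below compares plain gradient descent with the constant-momentum form of Nesterov's method on a diagonal quadratic. It is a stand-in for the article's least-squares experiment, not a reproduction of it: the eigenvalues, starting point, iteration count, and function names are all assumptions chosen so that $\mu = 1$, $L = 100$, and $\kappa = 100$.

```python
import math


def f(x, diag):
    """Quadratic objective 0.5 * sum(d_i * x_i^2) with curvatures d_i."""
    return 0.5 * sum(d * xi * xi for d, xi in zip(diag, x))


def grad(x, diag):
    return [d * xi for d, xi in zip(diag, x)]


def gradient_descent(x0, diag, L, iters):
    """Plain gradient descent with the standard step size 1 / L."""
    x = list(x0)
    for _ in range(iters):
        g = grad(x, diag)
        x = [xi - gi / L for xi, gi in zip(x, g)]
    return x


def nesterov(x0, diag, L, mu, iters):
    """Nesterov's method for mu-strongly convex, L-smooth f, using the
    constant momentum (sqrt(kappa) - 1) / (sqrt(kappa) + 1)."""
    kappa = L / mu
    beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)
    x_prev, x = list(x0), list(x0)
    for _ in range(iters):
        # Extrapolate along the previous displacement, then step from there.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        g = grad(y, diag)
        x_prev, x = x, [yi - gi / L for yi, gi in zip(y, g)]
    return x


diag = [1.0, 10.0, 100.0]   # smallest curvature mu = 1, largest L = 100
x0 = [1.0, 1.0, 1.0]
f_gd = f(gradient_descent(x0, diag, L=100.0, iters=100), diag)
f_nag = f(nesterov(x0, diag, L=100.0, mu=1.0, iters=100), diag)
print(f_gd, f_nag)
```

After the same 100 iterations the accelerated iterate reaches a much smaller objective value, because gradient descent contracts the slowest coordinate by $1 - 1/\kappa$ per step while the accelerated recurrence contracts it at roughly $1 - 1/\sqrt{\kappa}$.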

Optimization (1): Convex Analysis Foundations

Wed, 14 Sep 2022 09:00:00 +0000

This article is the foundation the rest of the series is built on. Almost every result we will prove later — convergence rates of gradient descent, Lagrangian duality, the proximal operator, even the analysis of stochastic methods — relies on a small set of facts about convex sets and convex functions. We will derive all of them from scratch.

If you only remember three things from this article, make it these: