<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Convex Analysis on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/convex-analysis/</link><description>Recent content in Convex Analysis on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 26 Sep 2022 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/convex-analysis/index.xml" rel="self" type="application/rss+xml"/><item><title>Optimization (9): Interior-Point Methods and Self-Concordant Barriers</title><link>https://www.chenk.top/en/optimization-theory/09-interior-point-barrier/</link><pubDate>Mon, 26 Sep 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/optimization-theory/09-interior-point-barrier/</guid><description>&lt;p>In 1984 Karmarkar showed that LPs could be solved in polynomial time &lt;em>practically&lt;/em> — not just theoretically (the ellipsoid method had achieved this on paper). His &lt;strong>interior-point method&lt;/strong> stayed inside the feasible polytope and converged in &lt;span class="math-inline">$O(n L)$&lt;/span>
 iterations, far better than the simplex method&amp;rsquo;s exponential worst case. Within a decade, Nesterov &amp;amp; Nemirovski generalized this to &lt;strong>all convex programming&lt;/strong> via the &lt;strong>self-concordant barrier&lt;/strong> framework. The result — &lt;span class="math-inline">$O(\sqrt{n} \log(1/\epsilon))$&lt;/span>
 Newton iterations for an &lt;span class="math-inline">$n$&lt;/span>
-dimensional problem — remains the gold standard for medium-scale convex optimization.&lt;/p></description></item><item><title>Optimization (8): Lagrangian Duality and KKT Conditions</title><link>https://www.chenk.top/en/optimization-theory/08-lagrangian-duality-kkt/</link><pubDate>Sat, 24 Sep 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/optimization-theory/08-lagrangian-duality-kkt/</guid><description>&lt;p>The most consequential idea in constrained optimization is that &lt;strong>constraints have prices&lt;/strong>. The Lagrangian transforms a constrained problem into an unconstrained one by attaching a non-negative multiplier to each inequality and a free multiplier to each equality. The resulting unconstrained problem may be easier (the SVM dual), or it may give a verifiable lower bound (the LP duality used to certify integer programs).&lt;/p>
&lt;p>This article develops:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Weak duality:&lt;/strong> the dual optimal value is always a lower bound on the primal optimal value (for a minimization problem), with no convexity assumptions needed.&lt;/li>
&lt;li>&lt;strong>Strong duality:&lt;/strong> for convex problems, under Slater&amp;rsquo;s condition (or when all constraints are affine), the duality gap is zero.&lt;/li>
&lt;li>&lt;strong>KKT conditions:&lt;/strong> stationarity + primal feasibility + dual feasibility + complementary slackness, the practical optimality system.&lt;/li>
&lt;li>&lt;strong>Saddle-point characterization:&lt;/strong> the Lagrangian&amp;rsquo;s saddle point coincides with the optimal primal&amp;ndash;dual pair.&lt;/li>
&lt;/ul>
&lt;p>Each result is proved or carefully cited. We close with the SVM example, where the dual changes the problem dimension from &lt;span class="math-inline">$d$&lt;/span>
 (number of features) to &lt;span class="math-inline">$n$&lt;/span>
 (number of training points), the original kernel-method magic.&lt;/p></description></item><item><title>Optimization (6): Composite Optimization and Proximal Methods</title><link>https://www.chenk.top/en/optimization-theory/06-composite-proximal-methods/</link><pubDate>Wed, 21 Sep 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/optimization-theory/06-composite-proximal-methods/</guid><description>&lt;p>When your objective contains a non-smooth piece (sparse regularisation, total variation, an indicator of a constraint set) or a constraint that is hard to handle directly, &amp;ldquo;just do gradient descent&amp;rdquo; stalls: there is no gradient at the kink, or every step violates feasibility. The &lt;strong>proximal operator&lt;/strong> is a principled workaround: think of each update as &amp;ldquo;take a step on the smooth part, then run a tiny penalised minimisation that pulls the iterate back toward a structured solution&amp;rdquo;.&lt;/p></description></item><item><title>Optimization (2): Smoothness, Strong Convexity, and Nesterov Acceleration</title><link>https://www.chenk.top/en/optimization-theory/02-smoothness-strong-convexity-nesterov/</link><pubDate>Thu, 15 Sep 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/optimization-theory/02-smoothness-strong-convexity-nesterov/</guid><description>&lt;p>A surprising amount of &amp;ldquo;optimizer folklore&amp;rdquo; collapses into three concepts:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>How fast can the gradient change?&lt;/strong> Lipschitz smoothness (&lt;span class="math-inline">$L$&lt;/span>
-smoothness) caps the step size.&lt;/li>
&lt;li>&lt;strong>How sharp is the bottom?&lt;/strong> &lt;span class="math-inline">$\mu$&lt;/span>
-strong convexity sets the convergence rate and forces the minimizer to be unique.&lt;/li>
&lt;li>&lt;strong>Can we get there faster without losing stability?&lt;/strong> Nesterov acceleration and adaptive restart turn the iteration count&amp;rsquo;s dependence on the condition number from &lt;span class="math-inline">$\kappa$&lt;/span>
 into &lt;span class="math-inline">$\sqrt{\kappa}$&lt;/span>
.&lt;/li>
&lt;/ul>
&lt;p>This post develops them in sequence: build the geometric intuition with as few inequalities as possible, prove the key theorems, then close with a least-squares experiment that pits GD, Heavy Ball, and Nesterov against each other. The goal is not to stack formulas; it is to let you look at a new problem and immediately answer &amp;ldquo;what step size, what rate, is acceleration worth it?&amp;rdquo;&lt;/p></description></item><item><title>Optimization (1): Convex Analysis Foundations</title><link>https://www.chenk.top/en/optimization-theory/01-convex-analysis-foundations/</link><pubDate>Wed, 14 Sep 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/optimization-theory/01-convex-analysis-foundations/</guid><description>&lt;p>This article is the foundation the rest of the series is built on. Almost every result we will prove later (convergence rates of gradient descent, Lagrangian duality, the proximal operator, even the analysis of stochastic methods) relies on a small set of facts about convex sets and convex functions. We will derive all of them from scratch.&lt;/p>
&lt;p>If you remember only three things from this article, make them these:&lt;/p></description></item></channel></rss>