<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Gradient Descent on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/gradient-descent/</link><description>Recent content in Gradient Descent on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sun, 25 Jan 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/gradient-descent/index.xml" rel="self" type="application/rss+xml"/><item><title>ML Math Derivations (6): Logistic Regression and Classification</title><link>https://www.chenk.top/en/ml-math-derivations/06-logistic-regression-and-classification/</link><pubDate>Sun, 25 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/06-logistic-regression-and-classification/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Hook.&lt;/strong> Linear regression maps inputs to any real number, but what if the output has to be a probability between 0 and 1? Logistic regression solves this with one elegant trick: it passes the linear score through the sigmoid, a squashing function. Despite its name, logistic regression is a &lt;em>classification&lt;/em> algorithm, and its recipe, a linear score followed by a nonlinearity, is the same computation every neuron in a modern neural network performs.&lt;/p>
&lt;/blockquote>
&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/ml-math-derivations/06-Logistic-Regression-and-Classification/illustration_1.png" alt="ML Math Derivations (6): Logistic Regression and Classification — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;h2 id="what-you-will-learn" class="heading-anchor">What You Will Learn&lt;a href="#what-you-will-learn" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Why sigmoid is the natural way to turn a real-valued score into a probability, and why its derivative is so clean.&lt;/li>
&lt;li>How cross-entropy loss falls out of maximum likelihood estimation in two lines.&lt;/li>
&lt;li>Why cross-entropy beats MSE for classification: a vanishing-gradient argument, made visible.&lt;/li>
&lt;li>The full gradient and Hessian for both binary and multi-class (softmax) cases, and why the loss is convex.&lt;/li>
&lt;li>L1, L2 and elastic-net regularization, and the Bayesian priors hiding behind them.&lt;/li>
&lt;li>Decision-boundary geometry and the threshold-free metrics (ROC / PR / AUC) that you actually need under class imbalance.&lt;/li>
&lt;/ul>
&lt;h2 id="prerequisites" class="heading-anchor">Prerequisites&lt;a href="#prerequisites" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Calculus: chain rule, partial derivatives.&lt;/li>
&lt;li>Linear algebra: matrix multiplication, transpose.&lt;/li>
&lt;li>Probability: Bernoulli and categorical distributions, likelihood.&lt;/li>
&lt;li>Familiarity with &lt;a href="https://www.chenk.top/en/ml-math-derivations/05-linear-regression">Part 5: Linear Regression&lt;/a>.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="from-linear-models-to-probabilistic-classification" class="heading-anchor">From Linear Models to Probabilistic Classification&lt;a href="#from-linear-models-to-probabilistic-classification" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;h3 id="the-problem-with-raw-linear-output" class="heading-anchor">The Problem with Raw Linear Output&lt;a href="#the-problem-with-raw-linear-output" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h3>&lt;p>Linear regression gives us &lt;span class="math-inline">$\hat y = \mathbf{w}^\top \mathbf{x}$&lt;/span>, which is unbounded. For classification, two things go wrong:&lt;/p></description></item><item><title>ML Math Derivations (5): Linear Regression</title><link>https://www.chenk.top/en/ml-math-derivations/05-linear-regression/</link><pubDate>Sat, 24 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/05-linear-regression/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Hook.&lt;/strong> In 1886 Francis Galton noticed something strange about heredity: children of unusually tall (or short) parents tended to be closer to the average than their parents were. He called this drift toward the mean &lt;em>regression&lt;/em>, and the name stuck. That statistical curiosity grew into arguably the most consequential model in machine learning, not because linear regression is powerful on its own, but because so many other algorithms (logistic regression, neural networks, kernel methods) are a twist on the same idea: &lt;strong>fit a line, but in the right space.&lt;/strong>&lt;/p></description></item><item><title>ML Math Derivations (4): Convex Optimization Theory</title><link>https://www.chenk.top/en/ml-math-derivations/04-convex-optimization-theory/</link><pubDate>Fri, 23 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/04-convex-optimization-theory/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/ml-math-derivations/04-Convex-Optimization-Theory/illustration_1.png" alt="ML Math Derivations (4): Convex Optimization Theory — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;h2 id="what-you-will-learn" class="heading-anchor">What You Will Learn&lt;a href="#what-you-will-learn" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;p>In 1947, George Dantzig proposed the simplex method for linear programming, and optimization became a practical computational discipline. Nearly eight decades later, optimization is the engine of machine learning: every model you train, from a one-line linear regression to a 70B-parameter language model, is obtained by (approximately) solving &lt;em>some&lt;/em> optimization problem.&lt;/p></description></item></channel></rss>