<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Forward Algorithm on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/forward-algorithm/</link><description>Recent content in Forward Algorithm on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Tue, 03 Feb 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/forward-algorithm/index.xml" rel="self" type="application/rss+xml"/><item><title>ML Math Derivations (15): Hidden Markov Models</title><link>https://www.chenk.top/en/ml-math-derivations/15-hidden-markov-models/</link><pubDate>Tue, 03 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/15-hidden-markov-models/</guid><description>&lt;p>You hear footsteps behind you in the fog. You can&amp;rsquo;t see the walker, only the sounds. From the rhythm and pitch — short, soft, hurried — can you guess whether they are walking, running, or limping? And if you observed an entire sequence, which gait sequence is most likely? How likely is &lt;em>any&lt;/em> sequence of sounds under your model of how walking works?&lt;/p>
&lt;p>These are the &lt;strong>three problems of HMMs&lt;/strong>, and the surprise is that all three reduce to one trick: write the joint &lt;span class="math-inline">$P(\mathbf{O}, \mathbf{I})$&lt;/span>
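 as a product of local factors along time, then &lt;strong>share sub-computations across time&lt;/strong> with dynamic programming. With &lt;span class="math-inline">$N$&lt;/span> hidden states and &lt;span class="math-inline">$T$&lt;/span> time steps, brute-force enumeration of every state path costs &lt;span class="math-inline">$O(N^T)$&lt;/span>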
. Forward-Backward, Viterbi, and each iteration of Baum-Welch cost &lt;span class="math-inline">$O(N^2 T)$&lt;/span>
. The exponent collapses because the Markov assumption makes the future conditionally independent of the past given the present.&lt;/p></description></item></channel></rss>