<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>MDP on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/mdp/</link><description>Recent content in MDP on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 01 Aug 2025 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/mdp/index.xml" rel="self" type="application/rss+xml"/><item><title>Reinforcement Learning (1): Fundamentals and Core Concepts</title><link>https://www.chenk.top/en/reinforcement-learning/01-fundamentals-and-core-concepts/</link><pubDate>Fri, 01 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/01-fundamentals-and-core-concepts/</guid><description>&lt;p>The first time you sat on a bicycle, nobody handed you a manual that said &lt;em>&amp;ldquo;if your tilt angle exceeds 7.4 degrees, apply 12% counter-steer.&amp;rdquo;&lt;/em> You wobbled, you over-corrected, you fell, you got back on. After a few hundred attempts your body simply &lt;em>knew&lt;/em> what to do, even though you could not put it into words.&lt;/p>
&lt;p>That trial-feedback-improvement loop is not just how we learn to ride bikes. It is how AlphaGo learned to defeat a world champion Go player, how Boston Dynamics robots learn to walk, and how recommendation systems quietly improve every time you click. They all share one mathematical framework called &lt;strong>reinforcement learning&lt;/strong> (RL).&lt;/p></description></item></channel></rss>