<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Q-Learning on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/q-learning/</link><description>Recent content in Q-Learning on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 06 Aug 2025 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/q-learning/index.xml" rel="self" type="application/rss+xml"/><item><title>Reinforcement Learning (2): Q-Learning and Deep Q-Networks (DQN)</title><link>https://www.chenk.top/en/reinforcement-learning/02-q-learning-and-dqn/</link><pubDate>Wed, 06 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/02-q-learning-and-dqn/</guid><description>&lt;p>In December 2013, a small DeepMind team uploaded a paper to arXiv with a striking claim: a single neural network, trained from raw pixels and the score, learned to play seven Atari games — and beat the previous best on six of them. No game-specific features. No hand-coded heuristics. The same architecture for Pong, Breakout, and Space Invaders. The algorithm was &lt;strong>Deep Q-Network (DQN)&lt;/strong>, and it kicked off the deep reinforcement learning era.&lt;/p></description></item><item><title>Reinforcement Learning (1): Fundamentals and Core Concepts</title><link>https://www.chenk.top/en/reinforcement-learning/01-fundamentals-and-core-concepts/</link><pubDate>Fri, 01 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/01-fundamentals-and-core-concepts/</guid><description>&lt;p>The first time you sat on a bicycle, nobody handed you a manual that said &lt;em>&amp;ldquo;if your tilt angle exceeds 7.4 degrees, apply 12% counter-steer.&amp;rdquo;&lt;/em> You wobbled, you over-corrected, you fell, you got back on. 
After a few hundred attempts, your body simply &lt;em>knew&lt;/em> what to do, even though you could not put it into words.&lt;/p>
&lt;p>That trial-feedback-improvement loop is not just how we learn to ride bikes. It is how AlphaGo learned to defeat the world Go champion, how Boston Dynamics robots learn to walk, and how recommendation systems quietly improve every time you click. They all share one mathematical framework called &lt;strong>reinforcement learning&lt;/strong> (RL).&lt;/p></description></item></channel></rss>