<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>DDPG on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/ddpg/</link><description>Recent content in DDPG on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 11 Aug 2025 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/ddpg/index.xml" rel="self" type="application/rss+xml"/><item><title>Reinforcement Learning (3): Policy Gradient and Actor-Critic Methods</title><link>https://www.chenk.top/en/reinforcement-learning/03-policy-gradient-and-actor-critic/</link><pubDate>Mon, 11 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/03-policy-gradient-and-actor-critic/</guid><description>&lt;p>DQN showed that deep RL can master Atari, but it has a hard ceiling: it only works in &lt;strong>discrete action spaces&lt;/strong>. Ask it to control a robot arm with seven continuous joint angles, and it breaks down: finding the highest-value action would mean solving an inner optimization problem every time you choose an action.&lt;/p>
&lt;p>&lt;strong>Policy gradient methods&lt;/strong> take a fundamentally different route. Instead of learning a value function and &lt;em>deriving&lt;/em> a policy from it, they &lt;strong>directly optimize the policy&lt;/strong>. That single change opens the door to continuous actions, stochastic strategies, and problems where optimal play is itself random (think rock-paper-scissors).&lt;/p></description></item></channel></rss>