<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Dyna on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/dyna/</link><description>Recent content in Dyna on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 21 Aug 2025 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/dyna/index.xml" rel="self" type="application/rss+xml"/><item><title>Reinforcement Learning (5): Model-Based RL and World Models</title><link>https://www.chenk.top/en/reinforcement-learning/05-model-based-rl-and-world-models/</link><pubDate>Thu, 21 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/05-model-based-rl-and-world-models/</guid><description>&lt;p>Every algorithm we have covered so far (DQN, REINFORCE, A2C, PPO, SAC) is &lt;strong>model-free&lt;/strong>: the agent treats the environment as a black box, throws actions at it, and updates its policy from the rewards that come back. The approach works, but it is profligate. DQN needs roughly &lt;strong>10 million frames&lt;/strong> to master Atari Pong. OpenAI Five trained on Dota 2 for the equivalent of &lt;strong>~45,000 years&lt;/strong> of self-play. Each AlphaStar agent experienced up to 200 years of real-time StarCraft II play.&lt;/p></description></item></channel></rss>