<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>RND on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/rnd/</link><description>Recent content in RND on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sat, 16 Aug 2025 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/rnd/index.xml" rel="self" type="application/rss+xml"/><item><title>Reinforcement Learning (4): Exploration Strategies and Curiosity-Driven Learning</title><link>https://www.chenk.top/en/reinforcement-learning/04-exploration-and-curiosity-driven-learning/</link><pubDate>Sat, 16 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/04-exploration-and-curiosity-driven-learning/</guid><description>&lt;p>Drop a fresh agent into Montezuma&amp;rsquo;s Revenge. To score a single point, it must walk to the right, jump over a skull, climb a rope, leap to a platform, and grab a key — roughly &lt;strong>a hundred precise actions in a row&lt;/strong>. Until the key is collected, the reward signal is always zero.&lt;/p>
&lt;p>A textbook DQN with &lt;span class="math-inline">$\varepsilon=0.1$&lt;/span>
 exploration has, by a generous estimate, a &lt;span class="math-inline">$0.1^{100} = 10^{-100}$&lt;/span>
 chance of stumbling onto that key by accident. Unsurprisingly, vanilla DQN scores &lt;strong>0&lt;/strong> on this game. Not &amp;ldquo;low&amp;rdquo; — literally zero, every episode, for the entire training run.&lt;/p></description></item></channel></rss>