Reinforcement Learning (4): Exploration Strategies and Curiosity-Driven Learning

Sat, 16 Aug 2025 09:00:00 +0000

Drop a fresh agent into Montezuma’s Revenge. To score a single point, it must walk to the right, jump over a skull, climb a rope, leap to a platform, and grab a key — roughly a hundred precise actions in a row. Until the key is collected, the reward signal is always zero.

A textbook DQN with $\varepsilon=0.1$ exploration has, by a generous estimate that treats each of the roughly 100 required actions as an independent exploratory move made with probability 0.1, a $0.1^{100} = 10^{-100}$ chance of stumbling onto that key by accident. Unsurprisingly, vanilla DQN scores 0 on this game. Not “low”: literally zero, every episode, for the entire training run.
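The back-of-the-envelope estimate above can be reproduced in a few lines. This is only a rough sketch: the helper name `log10_success_probability` is hypothetical, the independence of steps is a simplifying assumption, and the stricter variant assumes the full 18-action Atari action set, with the exploratory move also having to land on the one correct action. The calculation is done in log space so the stricter number does not underflow.

```python
import math


def log10_success_probability(n_steps: int, epsilon: float, n_actions: int = 1) -> float:
    """Return log10 of the chance that all n_steps required actions occur.

    Assumes (for illustration only) that each required action must come
    from an independent exploratory move: taken with probability epsilon,
    then chosen uniformly among n_actions.
    """
    per_step = epsilon / n_actions
    return n_steps * math.log10(per_step)


# Generous estimate from the text: each needed action happens with probability 0.1.
log_p = log10_success_probability(n_steps=100, epsilon=0.1)

# Stricter variant (assumption): the random move must also hit 1 of 18 actions.
log_p_strict = log10_success_probability(n_steps=100, epsilon=0.1, n_actions=18)

print(f"generous estimate: about 10^{log_p:.0f}")
print(f"stricter estimate: about 10^{log_p_strict:.0f}")
```

Either way the exponent is so negative that no realistic number of episodes would make a chance discovery plausible, which is the point of the estimate rather than its exact value.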

RND on Chen Kai Blog