Reinforcement Learning

Foundations of RL: MDPs, policy gradients, actor-critic, and offline RL.

12 articles

01
Reinforcement Learning (1): Fundamentals and Core Concepts
A beginner-friendly guide to the mathematical foundations of reinforcement learning -- MDPs, Bellman equations, dynamic …
2025-08-01 38 min
02
Reinforcement Learning (2): Q-Learning and Deep Q-Networks (DQN)
How DQN combined neural networks with Q-Learning to master Atari games -- covering experience replay, target networks, …
2025-08-06 32 min
03
Reinforcement Learning (3): Policy Gradient and Actor-Critic Methods
From REINFORCE to SAC -- how policy gradient methods directly optimize policies, naturally handle continuous actions, …
2025-08-11 28 min
04
Reinforcement Learning (4): Exploration Strategies and Curiosity-Driven Learning
How do RL agents discover rewards when the environment gives almost no feedback? From count-based methods to ICM, RND, …
2025-08-16 34 min
05
Reinforcement Learning (5): Model-Based RL and World Models
From Dyna and MBPO to World Models, Dreamer, and MuZero -- how learning a model lets agents plan in imagination and …
2025-08-21 28 min
06
Reinforcement Learning (6): PPO and TRPO — Trust Region Policy Optimization
Why PPO became the most widely used RL algorithm -- from TRPO's theoretical foundations through natural gradients to …
2025-08-26 32 min
07
Reinforcement Learning (7): Imitation Learning and Inverse RL
A practical, theory-grounded tour of imitation learning: behavioral cloning and its quadratic compounding error, DAgger …
2025-08-31 28 min
08
Reinforcement Learning (8): AlphaGo and Monte Carlo Tree Search
From MCTS to AlphaGo, AlphaGo Zero, AlphaZero, and MuZero. Understand UCT exploration-exploitation, self-play training, …
2025-09-05 28 min
09
Reinforcement Learning (9): Multi-Agent Reinforcement Learning
A working tour of multi-agent RL: Markov games, the non-stationarity and credit-assignment problems, CTDE, value …
2025-09-10 28 min
10
Reinforcement Learning (10): Offline Reinforcement Learning
Master offline RL: learn policies from fixed datasets without environment interaction. Covers distributional shift, …
2025-09-15 26 min
11
Reinforcement Learning (11): Hierarchical RL and Meta-Learning
A deep dive into hierarchical RL (Options, MAXQ, Feudal Networks, goal-conditioned policies) and meta-RL (MAML, FOMAML, …
2025-09-20 24 min
12
Reinforcement Learning (12): RLHF and LLM Applications
How RLHF turned base language models into ChatGPT and Claude: the SFT→Reward-Model→PPO pipeline, the Bradley-Terry …
2025-09-25 42 min