Reinforcement Learning

Foundations of RL: MDPs, policy gradients, actor-critic, and offline RL.

12 articles

  1. Reinforcement Learning (1): Fundamentals and Core Concepts

    A beginner-friendly guide to the mathematical foundations of reinforcement learning -- MDPs, Bellman equations, dynamic …

    38 min
  2. Reinforcement Learning (2): Q-Learning and Deep Q-Networks (DQN)

    How DQN combined neural networks with Q-Learning to master Atari games -- covering experience replay, target networks, …

    32 min
  3. Reinforcement Learning (3): Policy Gradient and Actor-Critic Methods

    From REINFORCE to SAC -- how policy gradient methods directly optimize policies, naturally handle continuous actions, …

    28 min
  4. Reinforcement Learning (4): Exploration Strategies and Curiosity-Driven Learning

    How do RL agents discover rewards when the environment gives almost no feedback? From count-based methods to ICM, RND, …

    34 min
  5. Reinforcement Learning (5): Model-Based RL and World Models

    From Dyna and MBPO to World Models, Dreamer, and MuZero -- how learning a model lets agents plan in imagination and …

    28 min
  6. Reinforcement Learning (6): PPO and TRPO — Trust Region Policy Optimization

    Why PPO became the most widely used RL algorithm -- from TRPO's theoretical foundations through natural gradients to …

    32 min
  7. Reinforcement Learning (7): Imitation Learning and Inverse RL

    A practical, theory-grounded tour of imitation learning: behavioral cloning and its quadratic compounding error, DAgger …

    28 min
  8. Reinforcement Learning (8): AlphaGo and Monte Carlo Tree Search

    From MCTS to AlphaGo, AlphaGo Zero, AlphaZero, and MuZero. Understand UCT exploration-exploitation, self-play training, …

    28 min
  9. Reinforcement Learning (9): Multi-Agent Reinforcement Learning

    A working tour of multi-agent RL: Markov games, the non-stationarity and credit-assignment problems, CTDE, value …

    28 min
  10. Reinforcement Learning (10): Offline Reinforcement Learning

    Master offline RL: learn policies from fixed datasets without environment interaction. Covers distributional shift, …

    26 min
  11. Reinforcement Learning (11): Hierarchical RL and Meta-Learning

    A deep dive into hierarchical RL (Options, MAXQ, Feudal Networks, goal-conditioned policies) and meta-RL (MAML, FOMAML, …

    24 min
  12. Reinforcement Learning (12): RLHF and LLM Applications

    How RLHF turned base language models into ChatGPT and Claude: the SFT→Reward-Model→PPO pipeline, the Bradley-Terry …

    40 min