Reinforcement Learning (3): Policy Gradient and Actor-Critic Methods

Mon, 11 Aug 2025 09:00:00 +0000

DQN showed that deep RL can master Atari, but it has a hard ceiling: it assumes a discrete action space, because choosing an action means taking a max over one Q-value per action. Ask it to control a robot arm with seven continuous joint angles and that max turns into an inner optimization problem you'd have to solve every time you choose an action.
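To make the contrast concrete, here is a minimal NumPy sketch. The Q-values, the `toy_q` critic, and the random-search loop are all hypothetical stand-ins chosen for illustration, not part of DQN itself. With a handful of discrete actions, greedy selection is a single `argmax`; with seven continuous joint angles, the same step becomes a search over a 7-dimensional space.

```python
import numpy as np

# Discrete case: one Q-value per action, so greedy selection is one argmax.
q_values = np.array([0.1, 0.7, 0.3])  # hypothetical Q(s, a) for 3 actions
greedy_action = int(np.argmax(q_values))

# Continuous case: Q(s, a) takes a real-valued action vector as input.
def toy_q(action):
    """Hypothetical critic for illustration: peaks at one joint configuration."""
    target = np.linspace(-0.5, 0.5, 7)
    return -np.sum((action - target) ** 2)

# A crude inner optimization: score many random candidates, keep the best.
# This has to be repeated for every single action choice.
rng = np.random.default_rng(0)
candidates = rng.uniform(-1.0, 1.0, size=(10_000, 7))
scores = np.array([toy_q(a) for a in candidates])
best_action = candidates[np.argmax(scores)]

print("discrete greedy action:", greedy_action)
print("approximate continuous argmax:", np.round(best_action, 2))
```

Even with ten thousand candidates the continuous result is only approximate, and the cost is paid at every step of every episode.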

Policy gradient methods take a fundamentally different route. Instead of learning a value function and deriving a policy from it, they optimize the policy directly. That single change opens the door to continuous actions, stochastic strategies, and problems where the optimal play is itself random (think rock-paper-scissors).
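As a rough illustration of what "optimizing the policy directly" can look like, the sketch below parameterizes a stochastic rock-paper-scissors policy with three logits and applies one score-function (REINFORCE-style) update. The reward value, the learning rate, and the helper names `softmax` and `grad_log_prob` are assumptions made for this example.

```python
import numpy as np

def softmax(logits):
    """Turn logits into action probabilities (shifted for numerical stability)."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def grad_log_prob(logits, action):
    """Gradient of log pi(action) with respect to the logits of a softmax policy."""
    one_hot = np.zeros_like(logits)
    one_hot[action] = 1.0
    return one_hot - softmax(logits)

rng = np.random.default_rng(0)
theta = np.zeros(3)               # logits for rock, paper, scissors
probs = softmax(theta)            # starts as the uniform random strategy
action = rng.choice(3, p=probs)   # the policy itself is stochastic

reward = 1.0                      # hypothetical reward for the sampled action
learning_rate = 0.1               # hypothetical step size
theta = theta + learning_rate * reward * grad_log_prob(theta, action)

print("initial probabilities:", probs)
print("after one update:", softmax(theta))
```

No value function appears anywhere: the parameters of the policy are nudged so that a rewarded action becomes more likely, and the policy stays a probability distribution throughout.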

