Tagged
DDPG
Reinforcement Learning (3): Policy Gradient and Actor-Critic Methods
From REINFORCE to SAC: how policy gradient methods optimize policies directly, handle continuous actions naturally, and underpin modern algorithms such as PPO, TD3, and SAC.