Tags

PlaNet

Aug 21, 2025 Reinforcement Learning 24 min read

From Dyna and MBPO to World Models, Dreamer, and MuZero: learn a model of the environment, let the agent plan in imagination, and improve sample efficiency by 10-100x.