<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>GAIL on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/gail/</link><description>Recent content in GAIL on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sun, 31 Aug 2025 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/gail/index.xml" rel="self" type="application/rss+xml"/><item><title>Reinforcement Learning (7): Imitation Learning and Inverse RL</title><link>https://www.chenk.top/en/reinforcement-learning/07-imitation-learning/</link><pubDate>Sun, 31 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/07-imitation-learning/</guid><description>&lt;p>Every algorithm in the previous chapters assumed access to a reward function. In practice, &lt;em>designing&lt;/em> that reward is often the hardest part of an RL project. Try writing one paragraph that captures &amp;ldquo;drive like a careful human&amp;rdquo;, &amp;ldquo;fold a shirt the way a tailor would&amp;rdquo;, or &amp;ldquo;summarise this document the way an expert editor would&amp;rdquo;. You can &lt;em>show&lt;/em> those behaviours far more easily than you can &lt;em>specify&lt;/em> them.&lt;/p>
&lt;p>Imitation learning takes that intuition seriously: instead of optimising a hand-engineered scalar reward, it learns from expert demonstrations &lt;span class="math-inline">$\mathcal{D} = \{(s_t, a_t)\}$&lt;/span>. This chapter walks through the four canonical methods (behavioral cloning, DAgger, maximum-entropy IRL, and GAIL/AIRL), not as isolated tricks but as a single ladder where each rung relaxes one assumption and pays for it with new structure.&lt;/p></description></item></channel></rss>