<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>GRU on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/gru/</link><description>Recent content in GRU on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Tue, 01 Oct 2024 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/gru/index.xml" rel="self" type="application/rss+xml"/><item><title>Time Series Forecasting (3): GRU — Lightweight Gates and Efficiency Trade-offs</title><link>https://www.chenk.top/en/time-series/gru/</link><pubDate>Tue, 01 Oct 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/gru/</guid><description>&lt;p>After you&amp;rsquo;ve used LSTM for a while, an obvious question shows up: aren&amp;rsquo;t three gates a bit much? The forget and input gates do related work (one decides what to drop, the other what to add), so couldn&amp;rsquo;t they be merged? And does the cell state really need to be a separate vector from the hidden state, or could the hidden state do double duty?&lt;/p>
&lt;p>That is exactly the question Cho et al. answered in 2014 with the &lt;strong>Gated Recurrent Unit&lt;/strong>. They collapsed three gates into two: an &lt;strong>update gate&lt;/strong> that controls how much of the old state to keep versus how much new content to absorb, and a &lt;strong>reset gate&lt;/strong> that controls how much of the old state feeds into the fresh candidate, down to ignoring it entirely. The cell state is folded back into the hidden state. The result is roughly 25% fewer parameters (three sets of weights per layer instead of LSTM&amp;rsquo;s four), training that runs 10-15% faster, and accuracy on most time-series tasks that is statistically indistinguishable from LSTM&amp;rsquo;s.&lt;/p></description></item></channel></rss>