<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Time Series on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/time-series/</link><description>Recent content in Time Series on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sun, 15 Dec 2024 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/time-series/index.xml" rel="self" type="application/rss+xml"/><item><title>Time Series Forecasting (8): Informer — Efficient Long-Sequence Forecasting</title><link>https://www.chenk.top/en/time-series/informer-long-sequence/</link><pubDate>Sun, 15 Dec 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/informer-long-sequence/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/time-series/informer-long-sequence/illustration_1.png" alt="Time Series Forecasting (8): Informer — Efficient Long-Sequence Forecasting — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;p>The Transformer is wonderful at sequence modeling — right up to the moment your sequence gets long. Vanilla self-attention costs &lt;span class="math-inline">$\mathcal{O}(L^2)$&lt;/span>
 in both compute and memory, so a one-week hourly window (168 steps) is fine, a one-month window (720 steps) is painful, and a three-month window (2160 steps) is essentially impossible on a single GPU. That is exactly the regime real-world long-horizon forecasting lives in: weather, energy, finance, IoT.&lt;/p></description></item><item><title>Time Series Forecasting (7): N-BEATS — Interpretable Deep Architecture</title><link>https://www.chenk.top/en/time-series/n-beats/</link><pubDate>Sat, 30 Nov 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/n-beats/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/time-series/n-beats/illustration_1.png" alt="Time Series Forecasting (7): N-BEATS — Interpretable Deep Architecture — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;p>The 2018 M4 forecasting competition served 100,000 series across six frequencies as a single benchmark. The leaderboard was dominated by hand-tuned ensembles built from decades of statistical-forecasting craft. Then a &lt;strong>pure neural network&lt;/strong> with no statistical preprocessing, no feature engineering, and no recurrence outperformed the competition winner on that same benchmark. That network was &lt;strong>N-BEATS&lt;/strong> by Oreshkin et al., a stack of fully-connected blocks with two residual paths. Its interpretable variant additionally splits the forecast into a polynomial trend and a Fourier seasonality, so the very thing classical statisticians wanted (a readable decomposition) came for free.&lt;/p></description></item><item><title>Time Series Forecasting (6): Temporal Convolutional Networks (TCN)</title><link>https://www.chenk.top/en/time-series/temporal-convolutional-networks/</link><pubDate>Fri, 15 Nov 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/temporal-convolutional-networks/</guid><description>&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/time-series/temporal-convolutional-networks/illustration_1.png" alt="Time Series Forecasting (6): Temporal Convolutional Networks (TCN) — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;p>For most of the 2010s, saying &amp;ldquo;deep learning for time series&amp;rdquo; meant using LSTM. The story changed in 2018 when Bai, Kolter, and Koltun published &lt;em>An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling&lt;/em>. Their result was surprisingly simple: use a stack of 1-D convolutions, make them causal (no peeking at the future), space the filter taps exponentially (dilation), wrap the whole thing in residual connections, and train. Task after task, the resulting &lt;strong>Temporal Convolutional Network&lt;/strong> (TCN) matched or beat LSTM/GRU — while training several times faster because every time step in the forward pass runs in parallel.&lt;/p></description></item><item><title>Time Series Forecasting (5): Transformer Architecture for Time Series</title><link>https://www.chenk.top/en/time-series/transformer/</link><pubDate>Thu, 31 Oct 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/transformer/</guid><description>&lt;p>The 2017 &lt;em>Attention Is All You Need&lt;/em> paper took the attention mechanism from the previous chapter to its logical extreme: &lt;strong>drop the RNN entirely&lt;/strong>. Transformers stack pure attention into a full sequence model — no recurrence, no hidden state propagating over time. Originally designed for machine translation, the architecture was quickly adapted to every other sequence task, time series included.&lt;/p>
&lt;p>Dropping a vanilla NLP Transformer onto a time-series problem runs into two immediate complications. The first is &lt;strong>position&lt;/strong>. Self-attention has no built-in sense of order: shuffle the input steps and each step&amp;rsquo;s output is unchanged, just shuffled along with it. For a time series, order is everything: a temperature curve that goes up-then-down and one that goes down-then-up are entirely different signals. NLP solves this with sinusoidal position encodings; do those still make sense for time series, or should we use learned encodings, or just concatenate calendar features (hour-of-day, day-of-week) directly into the input?&lt;/p></description></item><item><title>Time Series Forecasting (4): Attention Mechanisms — Direct Long-Range Dependencies</title><link>https://www.chenk.top/en/time-series/attention-mechanism/</link><pubDate>Wed, 16 Oct 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/attention-mechanism/</guid><description>&lt;p>RNNs and LSTMs handled &amp;ldquo;too many time steps&amp;rdquo; but left a subtler limitation in place: information has to travel &lt;strong>step by step&lt;/strong>. For step 100 to see what happened at step 1, the signal has to ride the hidden state through 99 intermediate stops, and each stop attenuates the signal a little and squashes it through a nonlinearity. Even with LSTM&amp;rsquo;s &amp;ldquo;highway&amp;rdquo; cell state, it&amp;rsquo;s still a single lane in a single direction.&lt;/p></description></item><item><title>Time Series Forecasting (3): GRU — Lightweight Gates and Efficiency Trade-offs</title><link>https://www.chenk.top/en/time-series/gru/</link><pubDate>Tue, 01 Oct 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/gru/</guid><description>&lt;p>After you&amp;rsquo;ve used LSTM for a while, an obvious question shows up: aren&amp;rsquo;t three gates a bit much? 
The forget and input gates seem to do related work — one decides what to drop, the other decides what to add — couldn&amp;rsquo;t they be merged? And does the cell state really need to be a separate vector from the hidden state, or could the hidden state do double duty?&lt;/p>
&lt;p>That is exactly the question Cho et al. answered in 2014 with the &lt;strong>Gated Recurrent Unit&lt;/strong>. They collapsed three gates into two: an &lt;strong>update gate&lt;/strong> that controls how much of the old state to keep versus how much new content to absorb, and a &lt;strong>reset gate&lt;/strong> that decides whether to ignore the old state entirely when computing a fresh candidate. The cell state is folded back into the hidden state. The result is roughly 25% fewer parameters, training that runs 10-15% faster, and accuracy on most time-series tasks that is statistically indistinguishable from LSTM.&lt;/p></description></item><item><title>Time Series Forecasting (2): LSTM — Gate Mechanisms and Long-Term Dependencies</title><link>https://www.chenk.top/en/time-series/lstm/</link><pubDate>Mon, 16 Sep 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/lstm/</guid><description>&lt;p>The first RNN I ever trained, back in 2017, was a small sales forecaster: 50 days in, the next day out. The forward pass ran cleanly, the loss went down, and yet the model had near-total amnesia about anything older than three days. The data had a clear monthly cycle. The model couldn&amp;rsquo;t see it. I assumed I needed more data, so I added rows and layers — and watched the training loss jump to NaN halfway through epoch two.&lt;/p></description></item><item><title>Time Series Forecasting (1): Traditional Statistical Models</title><link>https://www.chenk.top/en/time-series/01-traditional-models/</link><pubDate>Sun, 01 Sep 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/01-traditional-models/</guid><description>&lt;p>The first time I touched data that &amp;ldquo;looked like a time series&amp;rdquo; — hourly server CPU usage — my instinct was to throw it at a linear regression. Time on the x-axis, usage on the y-axis. The fit was terrible. 
The problem wasn&amp;rsquo;t the regression; the problem was that this kind of data has its own personality. It has trends, seasonality, and a stubborn dependence between consecutive observations. A vanilla regression treats every row as an independent sample and throws away the one piece of information that matters most: time itself.&lt;/p></description></item></channel></rss>