Time Series Forecasting (8): Informer — Efficient Long-Sequence Forecasting

Sun, 15 Dec 2024 09:00:00 +0000

The Transformer is wonderful at sequence modeling, right up to the moment your sequence gets long. Vanilla self-attention costs $\mathcal{O}(L^2)$ in both compute and memory, where $L$ is the sequence length, so a one-week hourly window (168 steps) is fine, a one-month window (720 steps) is painful, and a three-month window (2160 steps) becomes close to impractical on a single GPU once batch size, attention heads, and layers multiply that cost. That is exactly the regime real-world long-horizon forecasting lives in: weather, energy, finance, IoT.
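To make the quadratic growth concrete, here is a rough back-of-the-envelope sketch of how large the attention score matrices get for the three window lengths above. The helper name `attention_score_bytes` and its defaults (8 heads, float32 values, batch size 1, a single layer) are illustrative assumptions, not figures from the Informer paper, and the estimate ignores activations, gradients, and optimizer state.

```python
def attention_score_bytes(seq_len, n_heads=8, bytes_per_value=4, batch_size=1):
    """Bytes needed to hold the L x L attention scores of ONE layer.

    Assumed defaults (hypothetical): 8 heads, float32 scores, batch size 1.
    """
    return batch_size * n_heads * seq_len * seq_len * bytes_per_value


# Hourly windows from the text: one week, one month, three months.
windows = [("1 week", 168), ("1 month", 720), ("3 months", 2160)]

for label, seq_len in windows:
    entries = seq_len ** 2  # score entries per head
    mib = attention_score_bytes(seq_len) / 2**20
    print(f"{label:>8}: L={seq_len:5d}  entries/head={entries:>9,}  ~{mib:8.2f} MiB/layer")

# Memory ratio between the longest and shortest window: (2160 / 168)^2
ratio = attention_score_bytes(2160) / attention_score_bytes(168)
print(f"3 months vs 1 week: ~{ratio:.0f}x")
```

Under these assumptions the three-month window needs roughly 165 times the score memory of the one-week window, per layer and per sample. Multiply that by the batch size, the number of layers, and the extra copies kept for backpropagation, and the pressure on a single GPU adds up quickly.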

Informer on Chen Kai Blog
