NLP (3): RNN and Sequence Modeling

Sat, 11 Oct 2025 09:00:00 +0000

Open Google Translate, swipe-type a message, or dictate a memo to your phone: each of these systems consumes an ordered stream of tokens and produces another. A feed-forward network processes each input independently, but language is fundamentally sequential: the meaning of “mat” in “the cat sat on the mat” depends on the words that came before it. Recurrent Neural Networks (RNNs) handle this by maintaining a hidden state that is updated as each token is processed. The hidden state is the network’s running summary of the past, in effect its memory.
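To make the hidden-state idea concrete, here is a minimal sketch of a vanilla (Elman-style) RNN forward pass in NumPy. The names `rnn_forward`, `W_xh`, `W_hh`, and `b_h`, the tanh nonlinearity, the dimensions, and the random "token embeddings" are all assumptions chosen for illustration; they are not taken from the post itself.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence and return every hidden state.

    inputs: array of shape (seq_len, embed_dim), one row per token.
    """
    h = np.zeros(W_hh.shape[0])  # initial hidden state: no memory yet
    states = []
    for x_t in inputs:
        # The new state mixes the current token with the previous state.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

# Hypothetical sizes and random weights, purely for demonstration.
rng = np.random.default_rng(0)
embed_dim, hidden_dim, seq_len = 4, 3, 6
W_xh = rng.normal(scale=0.5, size=(hidden_dim, embed_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

tokens = rng.normal(size=(seq_len, embed_dim))  # stand-in for embeddings
states = rnn_forward(tokens, W_xh, W_hh, b_h)
print(states.shape)  # (6, 3)
```

Each row of `states` is the hidden state after one more token. Because every step feeds the previous state back in through `W_hh`, the last row depends on the whole sequence, which is exactly what a feed-forward layer applied per token cannot do.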

Time Series Forecasting (2): LSTM — Gate Mechanisms and Long-Term Dependencies

Mon, 16 Sep 2024 09:00:00 +0000

The first RNN I ever trained, back in 2017, was a small sales forecaster: 50 days in, the next day out. The forward pass ran cleanly, the loss went down, and yet the model had near-total amnesia about anything older than three days. The data had a clear monthly cycle. The model couldn’t see it. I assumed I needed more data, so I added rows and layers — and watched the training loss jump to NaN halfway through epoch two.

LSTM on Chen Kai Blog
