Tagged: Attention
NLP Part 4: Attention Mechanism and Transformer
From the Seq2Seq bottleneck to Attention Is All You Need. Build intuition for scaled dot-product attention, multi-head attention, positional encoding, and masking, then assemble a complete Transformer in PyTorch.
Time Series Forecasting (4): Attention Mechanisms -- Direct Long-Range Dependencies
Self-attention, multi-head attention, and positional encoding for time series. Step-by-step math, PyTorch implementations, and visualization techniques for interpretable forecasting.
Graph Contextualized Self-Attention Network (GC-SAN) for Session-based Recommendation
GC-SAN combines a session-graph GGNN (capturing local transitions) with multi-layer self-attention (capturing global dependencies) for session-based recommendation. Covers graph construction, message passing, attention fusion, and where the …