Tagged: Attention

Oct 16, 2025 NLP 18 min read

NLP Part 4: Attention Mechanism and Transformer

From the bottleneck of Seq2Seq to Attention Is All You Need. Build intuition for scaled dot-product attention, multi-head attention, positional encoding, masking, and assemble a complete Transformer in PyTorch.

Oct 16, 2024 Time Series Forecasting 12 min read

Time Series Forecasting (4): Attention Mechanisms -- Direct Long-Range Dependencies

Self-attention, multi-head attention, and positional encoding for time series. Step-by-step math, PyTorch implementations, and visualization techniques for interpretable forecasting.

Jan 15, 2023 Standalone 12 min read

Graph Contextualized Self-Attention Network (GC-SAN) for Session-based Recommendation

GC-SAN combines a session-graph GGNN (local transitions) with multi-layer self-attention (global dependencies) for session-based recommendation. Covers graph construction, message passing, attention fusion, and where the …