Tagged: Attention
NLP Part 4: Attention Mechanism and Transformer
From the Seq2Seq bottleneck to Attention Is All You Need. Build intuition for scaled dot-product attention, multi-head attention, positional encoding, and masking, then assemble a complete Transformer in PyTorch.
Time Series Forecasting (4): Attention Mechanisms -- Direct Long-Range Dependencies
Self-attention, multi-head attention, and positional encoding for time series. Step-by-step math, PyTorch implementations, and visualization techniques for interpretable forecasting.
Graph Contextualized Self-Attention Network (GC-SAN) for Session-based Recommendation
GC-SAN combines a session-graph GGNN (capturing local transitions) with multi-layer self-attention (capturing global dependencies) for session-based recommendation. Covers graph construction, message passing, attention fusion, and where the …