Tags
Yarn
LLM Engineering (6): Long Context — RoPE, YaRN, Sinks
How RoPE encodes position, why naive extension breaks, NTK-aware and YaRN scaling, ALiBi vs RoPE, attention sinks for streaming, and why 1M-context claims often fail at retrieval.
How RoPE encodes position, why naive extension breaks, NTK-aware and YaRN scaling, ALiBi vs RoPE, attention sinks for streaming, and why 1M-context claims often fail at retrieval.