LLM Engineering (6): Long Context — RoPE, YaRN, Sinks

Wed, 01 Apr 2026 09:00:00 +0000

“1M-token context” is one of the most over-claimed numbers in LLMs. That a model can attend to 1M tokens is a statement about architecture. That a model can use information at position 800K to answer a question is a statement about behavior, and it is much harder to satisfy. This chapter covers the math of position encoding, the engineering tricks that extend context beyond the training length, and why most long-context claims fail needle-in-a-haystack tests.

