<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>RoPE on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/rope/</link><description>Recent content in RoPE on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 01 Apr 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/rope/index.xml" rel="self" type="application/rss+xml"/><item><title>LLM Engineering (6): Long Context — RoPE, YaRN, Sinks</title><link>https://www.chenk.top/en/llm-engineering/06-long-context/</link><pubDate>Wed, 01 Apr 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/llm-engineering/06-long-context/</guid><description>&lt;p>&amp;ldquo;1M token context&amp;rdquo; is one of the most over-claimed numbers in LLMs. A model can attend to 1M tokens: that&amp;rsquo;s an architecture statement. A model can &lt;em>use&lt;/em> information at position 800K to answer a question: that&amp;rsquo;s a behavior statement, and a much harder one to satisfy. This chapter covers the math of position encoding, the engineering tricks that extend context beyond the training length, and why most long-context claims fail needle-in-a-haystack tests.&lt;/p></description></item><item><title>NLP (9): Deep Dive into LLM Architecture</title><link>https://www.chenk.top/en/nlp/llm-architecture-deep-dive/</link><pubDate>Mon, 10 Nov 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/llm-architecture-deep-dive/</guid><description>&lt;p>The 2017 Transformer paper drew one block. Every production LLM today still uses that diagram as a silhouette, but almost every internal piece has been replaced. Pre-norm replaced post-norm. RMSNorm replaced LayerNorm. SwiGLU replaced GELU. Rotary embeddings replaced sinusoids. Multi-head attention became grouped-query attention. The dense FFN sometimes became a sparse mixture of experts.
And the inference loop is dominated by a data structure that doesn&amp;rsquo;t appear in the original paper at all: the KV cache.&lt;/p></description></item></channel></rss>