<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>MoE on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/moe/</link><description>Recent content in MoE on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 27 Mar 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/moe/index.xml" rel="self" type="application/rss+xml"/><item><title>LLM Engineering (1): Architectures from Transformer to MoE</title><link>https://www.chenk.top/en/llm-engineering/01-architectures/</link><pubDate>Fri, 27 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/llm-engineering/01-architectures/</guid><description>&lt;p>The 2017 Transformer block is still the silhouette of every production LLM in 2026, but almost every internal piece has been swapped, sparsified, or specialized. This series covers the modern stack end to end: architecture, training, inference, retrieval, evaluation, safety, and deployment. Chapter 1 is about the block itself: what attention looks like in a 2026 model, how MoE breaks the link between parameter count and per-token FLOPs, and where the non-attention alternatives (Mamba, RWKV) actually beat the Transformer.&lt;/p></description></item><item><title>NLP (9): Deep Dive into LLM Architecture</title><link>https://www.chenk.top/en/nlp/llm-architecture-deep-dive/</link><pubDate>Mon, 10 Nov 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/llm-architecture-deep-dive/</guid><description>&lt;p>The 2017 Transformer paper drew one block. Every production LLM today still uses that diagram as a silhouette, but almost every internal piece has been replaced. Pre-norm replaced post-norm. RMSNorm replaced LayerNorm. SwiGLU replaced GELU. Rotary embeddings replaced sinusoids. Multi-head attention became grouped-query attention. The dense FFN sometimes became a sparse mixture of experts.
And the inference loop is dominated by a data structure that doesn&amp;rsquo;t appear in the original paper at all: the KV cache.&lt;/p></description></item></channel></rss>