<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Mamba on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/mamba/</link><description>Recent content in Mamba on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 27 Mar 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/mamba/index.xml" rel="self" type="application/rss+xml"/><item><title>LLM Engineering (1): Architectures from Transformer to MoE</title><link>https://www.chenk.top/en/llm-engineering/01-architectures/</link><pubDate>Fri, 27 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/llm-engineering/01-architectures/</guid><description>&lt;p>The 2017 Transformer block is still the silhouette of every production LLM in 2026, but almost every internal piece has been swapped, sparsified, or specialized. This series covers the modern stack end to end: architecture, training, inference, retrieval, evaluation, safety, and deployment. Chapter 1 is about the block itself: what attention looks like in a 2026 model, how MoE breaks the link between parameter count and FLOPs, and where the non-attention alternatives (Mamba, RWKV) actually beat the Transformer.&lt;/p></description></item></channel></rss>