LLM Engineering (1): Architectures from Transformer to MoE

Fri, 27 Mar 2026 09:00:00 +0000

The 2017 Transformer block is still the silhouette of every production LLM in 2026, but almost every internal piece has been swapped, sparsified, or specialized. This series covers the modern stack end to end: architecture, training, inference, retrieval, evaluation, safety, and deployment. Chapter 1 is about the block itself. It covers what attention looks like in a 2026 model, how Mixture-of-Experts (MoE) decouples total parameter count from per-token FLOPs, and where the non-attention alternatives (Mamba, RWKV) actually beat the Transformer.
