Tagged: MoE
Nov 10, 2025 NLP 17 min read

NLP (9): Deep Dive into LLM Architecture

Inside modern LLMs: pre-norm + RMSNorm + SwiGLU + RoPE + GQA, KV cache mechanics, FlashAttention's IO-aware schedule, sparse Mixture-of-Experts, and INT8 / INT4 quantization.