Tags: Tokenization

Mar 28, 2026 | LLM Engineering | 18 min read

LLM Engineering (2): Tokenization Deep Dive

BPE vs SentencePiece vs WordPiece, byte-level fallback, the CJK token-bloat problem, vocabulary expansion costs, and the chat-template tokens that silently shape every model's behavior.