Deployment

Apr 7, 2026 LLM Engineering 36 min read

LLM Engineering (12): Production — Deployment, Monitoring, Cost

Serving stack choices in detail, autoscaling LLMs, latency budgets, prompt and completion cost tracking, multi-model routing, FrugalGPT cascading, the observability you need from day one, and the on-call patterns that work.

Nov 25, 2025 NLP 36 min read

NLP (12): Frontiers and Practical Applications

Series finale: agents and tool use (Function Calling, ReAct), code generation (Code Llama, Codex), long-context attention (Longformer, Infini-attention), reasoning models (o1, R1), safety and alignment, evaluation, and …