Tags
Quantization
LLM Engineering (5): Inference Optimization
KV cache mechanics, paged attention, continuous batching, speculative decoding, INT8/INT4/AWQ/GPTQ quantization, and the vLLM vs SGLang vs TensorRT-LLM tradeoffs.
KV cache mechanics, paged attention, continuous batching, speculative decoding, INT8/INT4/AWQ/GPTQ quantization, and the vLLM vs SGLang vs TensorRT-LLM tradeoffs.