Tagged

LLM

Mar 22, 2026 Terraform Agents 10 min read

Terraform for AI Agents (6): LLM Gateway and Secrets Management

Centralise LLM API access through one gateway: per-agent quotas, request logging, and zero secrets outside KMS. Terraform-provisioned API Gateway plus self-hosted LiteLLM on ECS, with DashScope/OpenAI/Anthropic keys …

Mar 7, 2026 Aliyun PAI 5 min read

Aliyun PAI (3): PAI-DLC — Distributed Training Without the Cluster Pain

Submit a real multi-GPU training job on PAI-DLC, understand the resource pools (Lingjun vs general vs preemptible), and use AIMaster + EasyCKPT so a flaky node doesn't cost you a day.

Feb 26, 2026 Aliyun Bailian 6 min read

Aliyun Bailian (2): The Qwen LLM API in Production

Picking a Qwen variant by latency and cost, function calling done right, JSON mode without tears, and the enable_thinking + streaming requirement that the docs gloss over.

Feb 25, 2026 Aliyun Bailian 6 min read

Aliyun Bailian (1): Platform Overview and First Request

A practitioner's tour of Alibaba Cloud Bailian (DashScope) — what's actually in the model catalog, the two endpoint flavors, the async task pattern, and a working sample request to ground the rest of the series.

Jan 3, 2026 Recommendation Systems 17 min read

Recommendation Systems (12): Large Language Models and Recommendation

How LLMs reshape recommendation: enhancers (P5, M6-Rec), predictors (TALLRec, GenRec), and agents (LlamaRec, Chat-REC). Hybrid pipelines, cold-start wins, prompt design, and the cost/quality Pareto frontier.

Dec 31, 2025 Standalone 23 min read

AI Agents Complete Guide: From Theory to Industrial Practice

A practitioner-grade guide to building AI agents: planning (CoT/ReAct/ToT), memory architectures, tool use, reflection, multi-agent patterns, frameworks (LangChain, LangGraph, AutoGen, CrewAI), evaluation, and production …

Nov 25, 2025 NLP 18 min read

NLP (12): Frontiers and Practical Applications

Series finale: agents and tool use (Function Calling, ReAct), code generation (Code Llama, Codex), long-context attention (Longformer, Infini-attention), reasoning models (o1, R1), safety and alignment, evaluation, and …

Nov 20, 2025 NLP 17 min read

NLP (11): Multimodal Large Language Models

A deep dive into multimodal LLMs: contrastive vision-language pre-training with CLIP, parameter-efficient bridging with BLIP-2's Q-Former, visual instruction tuning with LLaVA, robust speech recognition with Whisper, …

Nov 15, 2025 NLP 16 min read

NLP (10): RAG and Knowledge Enhancement Systems

Build production-grade RAG systems from first principles: the retrieve-then-generate decomposition, vector indexes (FAISS / Milvus / Chroma / Weaviate / Pinecone), dense+sparse hybrid retrieval with RRF, cross-encoder …

Nov 10, 2025 NLP 17 min read

NLP (9): Deep Dive into LLM Architecture

Inside modern LLMs: pre-norm + RMSNorm + SwiGLU + RoPE + GQA, KV cache mechanics, FlashAttention's IO-aware schedule, sparse Mixture-of-Experts, and INT8 / INT4 quantization.

Nov 5, 2025 NLP 15 min read

NLP (8): Model Fine-tuning and PEFT

A deep dive into Parameter-Efficient Fine-Tuning. Why LoRA's low-rank update works, the math and memory accounting behind QLoRA, how Adapters and Prefix-Tuning differ, and how to choose between them in production.

Oct 31, 2025 NLP 18 min read

NLP (7): Prompt Engineering and In-Context Learning

From prompt anatomy to chain-of-thought, self-consistency and ReAct: a working theory of in-context learning, the variance you have to fight, and the patterns that scale to real systems.

Oct 15, 2025 Standalone 27 min read

Prompt Engineering Complete Guide: From Zero to Advanced Optimization

Master prompt engineering from zero-shot basics to Tree of Thoughts, DSPy, and automated optimization. Includes benchmarks, code, and a debugging toolkit.

Jun 21, 2025 Standalone 15 min read

LLM Workflows and Application Architecture: Enterprise Implementation Guide

From a single API call to a production LLM platform — workflow patterns, RAG, model routing, deployment, cost levers, observability, and enterprise integration, with the trade-offs that actually matter.

Mar 31, 2025 Standalone 10 min read

Prefix-Tuning: Optimizing Continuous Prompts for Generation

Prefix-Tuning adapts frozen LLMs by learning continuous key/value vectors injected into attention. Covers the method, reparameterization, KV-cache mechanics, and comparisons with prompt tuning, adapters, and LoRA.

Oct 12, 2024 Standalone 13 min read

MoSLoRA: Mixture-of-Subspaces in Low-Rank Adaptation

MoSLoRA boosts LoRA expressivity by mixing multiple low-rank subspaces with a lightweight mixer. Covers when vanilla LoRA fails, mixer design choices, and tuning tips.

Sep 20, 2023 Standalone 6 min read

Position Encoding Brief: From Sinusoidal to RoPE and ALiBi

A practitioner's tour of Transformer position encoding: why attention needs it at all, how sinusoidal/learned/relative/RoPE/ALiBi schemes differ, and which one to pick when long-context extrapolation matters.

Mar 13, 2023 Standalone 19 min read

Learning Rate: From Basics to Large-Scale Training

A practitioner's guide to the single most important hyperparameter: why too-large LR explodes, how warmup and schedules really work, the LR range test, the LR-batch-size-weight-decay coupling, and recent ideas like WSD, …

Dec 9, 2022 Standalone 10 min read

Optimizer Evolution: From Gradient Descent to Adam (and Beyond, 2025)

One article that traces the full lineage GD → SGD → Momentum → NAG → AdaGrad → RMSProp → Adam → AdamW, then onwards to Lion / Sophia / Schedule-Free. Each step is framed by the specific failure of the previous …

Nov 26, 2022 Standalone 12 min read

LLMGR: Integrating Large Language Models with Graphical Session-Based Recommendation

LLMGR uses an LLM as the semantic engine for session-based recommendation and a GNN as the ranker. Covers the hybrid encoding layer, two-stage prompt tuning, ~8.68% HR@20 lift, and how to deploy without running an LLM …

Aug 5, 2022 Standalone 21 min read

Multimodal LLMs and Downstream Tasks: A Practitioner's Guide

End-to-end map of multimodal LLMs: vision-language alignment, cross-modal fusion, the CLIP/BLIP/LLaVA families, downstream tasks (VQA, captioning, grounding, OCR), fine-tuning trade-offs, benchmarks, and what it takes to …