About

Chen Kai

Engineer · Writer · Building long-running agent systems @ Alibaba Cloud

chenkai66 · chenkai.nb.666@gmail.com · usually replies within 24h

I came up through math and machine learning, then bent everything toward the engineering side of cloud-native, long-running agents — the kind of system that runs for hours, days, sometimes weeks, and has to stay coherent the whole way.

Most of what I ship is glue: control planes that orchestrate sub-agents, harnesses that compress failures into reusable Skills, shared memory that doesn’t drift between sessions, and the unsexy production layer between a working demo and a system you’d actually leave running overnight.

When I’m not shipping, I’m writing: 346 original long-form posts in Chinese and 346 in English (692 total), organized into 33 series on this site. Each version is written from scratch rather than translated; Chinese leans toward concision, English toward exposition. The full list is under /projects.

These posts are not write-and-forget. I regularly revisit and revise published articles — updating toolchains, incorporating newer research, and refining engineering judgments — so that every piece reflects the current state of the art, not just the moment it was first written.

Now

What I am working on right now

01

AI4Marketing — one sentence, one campaign

Trend discovery → AI copy → image generation → short-video (talking-head / narration / drama) → TTS → scheduled multi-platform publishing, driven by 120+ API endpoints. Content atomization, multi-perspective AI debate, GEO (generative engine optimization), storyboard editing, content calendars, and analytics — 128K lines of TypeScript across 11 Alibaba Cloud services.

02

AI4Science — a research agent loop that never stops

Three pipelines + six systemd services running 24/7. 41,000+ papers read, 4,000+ research ideas generated, 268 experiments completed, 1,100+ manuscripts produced. Knowledge graph maintenance + signal assessment + adversarial debate to filter ideas + FSM-driven full experiment lifecycle: design → execution → statistical analysis → failure diagnosis → writing → three-round review.
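As a rough illustration of what an FSM-driven lifecycle like this can look like, here is a minimal sketch. It is not the production code: the `Stage` enum, the `NEXT_STAGE` table, and the `ExperimentLifecycle` class are hypothetical names, and the sketch assumes the stages run strictly in the order listed above, with review repeating for three rounds.

```python
from enum import Enum


class Stage(Enum):
    DESIGN = "design"
    EXECUTION = "execution"
    STATISTICAL_ANALYSIS = "statistical analysis"
    FAILURE_DIAGNOSIS = "failure diagnosis"
    WRITING = "writing"
    REVIEW = "review"
    DONE = "done"


# Hypothetical transition table: each stage has exactly one successor.
NEXT_STAGE = {
    Stage.DESIGN: Stage.EXECUTION,
    Stage.EXECUTION: Stage.STATISTICAL_ANALYSIS,
    Stage.STATISTICAL_ANALYSIS: Stage.FAILURE_DIAGNOSIS,
    Stage.FAILURE_DIAGNOSIS: Stage.WRITING,
    Stage.WRITING: Stage.REVIEW,
}

REVIEW_ROUNDS = 3  # "three-round review"


class ExperimentLifecycle:
    """Tracks one experiment as it moves through the stages."""

    def __init__(self):
        self.stage = Stage.DESIGN
        self.reviews_done = 0
        self.history = [Stage.DESIGN]

    def advance(self):
        """Finish the current stage and move to the next one."""
        if self.stage is Stage.DONE:
            raise RuntimeError("experiment already finished")
        if self.stage is Stage.REVIEW:
            # Stay in review until all rounds are completed.
            self.reviews_done += 1
            if self.reviews_done < REVIEW_ROUNDS:
                next_stage = Stage.REVIEW
            else:
                next_stage = Stage.DONE
        else:
            next_stage = NEXT_STAGE[self.stage]
        self.stage = next_stage
        self.history.append(next_stage)
        return next_stage


def run_to_completion():
    """Drive a fresh experiment through every stage."""
    fsm = ExperimentLifecycle()
    while fsm.stage is not Stage.DONE:
        fsm.advance()
    return fsm
```

Keeping the transitions in a table means any illegal jump (say, writing before analysis) has no entry to follow, which is the main appeal of an explicit state machine for a loop that runs unattended.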

03

DaaS — distill a company's docs into Agent Skills

Point it at a folder of product docs; an LLM reads everything and writes detail skills (how to use each capability) and intent skills (which path to pick). 11 product workspaces, 670+ auto-generated Skills, 96% adversarial recall. Ships with auto-generated MCP servers, D3 knowledge graph, drift detection, live SSE telemetry, and dual-currency subscription billing.

04

MiniGameForge — idea to publishable mini-game in 30 minutes

Pick a template, pick a style, hit generate. AI auto-produces code, art, and cutscene videos. Elevator agent pipeline (planner → executor → verifier), 7-category tripwire guardrails, visual self-debug (the agent watches the running screen and fixes its own bugs), and adaptive routing across 6 providers / 17 keys. Live at llm4marketing.asia.

05

llm-elevator — autonomous coding with domestic LLMs

78-module orchestration system that makes Qwen, DeepSeek, Kimi, and other domestic LLMs reliably complete long-horizon software engineering tasks. Core loop: Planner → Critic → Executor → Reviewer → Verifier → Git → Lesson. Cross-family model review prevents sycophancy, tripwire hard-blocks prevent loops, automatic model escalation on failure, and every success distills into a reusable Skill.
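To make two of those ideas concrete (tripwires that stop loops, and escalation to another model on failure), here is a small sketch. It is an illustration under stated assumptions, not llm-elevator's actual code: `run_with_escalation`, `Outcome`, `RunReport`, and the model names in the usage example are all hypothetical, and the tripwire here is simply "the same error repeated `max_repeats` times".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Outcome:
    ok: bool
    error: str = ""


@dataclass
class RunReport:
    solved_by: Optional[str] = None
    attempts: List[str] = field(default_factory=list)
    tripped: bool = False


def run_with_escalation(
    models: List[str],
    attempt: Callable[[str], Outcome],
    max_repeats: int = 2,
    max_attempts: int = 5,
) -> RunReport:
    """Try each model in order; escalate when a tripwire fires or attempts run out."""
    report = RunReport()
    for model in models:
        last_error, repeats = None, 0
        for _ in range(max_attempts):
            outcome = attempt(model)
            report.attempts.append(model)
            if outcome.ok:
                report.solved_by = model
                return report
            # Count how many times in a row the same error has come back.
            if outcome.error == last_error:
                repeats += 1
            else:
                last_error, repeats = outcome.error, 1
            if repeats >= max_repeats:
                # Tripwire: the model is looping on one failure, so stop retrying it.
                report.tripped = True
                break
    return report


# Usage with a stand-in attempt function (hypothetical model names).
def fake_attempt(model: str) -> Outcome:
    if model == "large-model":
        return Outcome(ok=True)
    return Outcome(ok=False, error="tests failed")


report = run_with_escalation(["small-model", "large-model"], fake_attempt)
```

In this run the first model fails twice with the same error, the tripwire fires, and the task escalates to the second model, which succeeds.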

06

chenk.top — long-form, in both languages

The site you're on. Both the Chinese and English versions are written from scratch, never translated: Chinese leans toward concision, English toward exposition. As of now: 30+ thematic series, nearly 700 long-form posts (692 across both languages), and 2200+ original charts, all written, illustrated, and edited by me. I regularly revisit and revise published articles to keep every piece current.

07

LLM App Security — securing AI-coded applications

An open-source book in progress. Dissects the real attack surface of LLM-powered apps — prompt injection, privilege escalation, supply-chain poisoning — and extracts engineering-grade defenses from production incidents. Each chapter maps to a security dimension with reproducible attack/defense labs and ready-to-deploy guard scripts.

The question I keep returning to

How do long-running systems stay resilient through failure, model swaps, cost pressure, and infrastructure migration?

Concrete levers: dynamic token budgeting across providers, compressing failures into reusable Skills, type-converging shared memory, and closing the gap between demo and production with observability, replay, and trust.

Principles

A few principles I code by

  1. Tools expire; judgment endures.
  2. Docs deserve more time than code.
  3. Premature abstraction is the most expensive engineering instinct.
  4. A small system that runs 30 days beats a flashy one that runs 30 minutes.
  5. Treat agents as real systems (with explicit cost, defined failure modes, and ops overhead), not "talking prompts".

Get in touch

Interesting projects, potential collaborations, or an idea still taking shape: all welcome. I usually reply within 24 hours.