<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Chen Kai Blog</title><link>https://www.chenk.top/en/</link><description>Recent content on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Thu, 26 Mar 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/index.xml" rel="self" type="application/rss+xml"/><item><title>Terraform for AI Agents (8): End-to-End — research-agent-stack in One Apply</title><link>https://www.chenk.top/en/terraform-agents/08-end-to-end-walkthrough/</link><pubDate>Thu, 26 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/08-end-to-end-walkthrough/</guid><description>&lt;p>This is the article where everything from articles 2 through 7 lands in one place. By the end you&amp;rsquo;ll have run &lt;code>terraform apply&lt;/code> once and produced a complete, observable, budgeted agent runtime stack on Alibaba Cloud. About 31 resources, ~7 minutes of wall clock.&lt;/p>
&lt;p>The stack we&amp;rsquo;re building:&lt;/p>
&lt;p>&lt;figure>
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/terraform-agents/08-end-to-end-walkthrough/fig1_full_stack.png" alt="research-agent-stack: every box, one terraform apply" loading="lazy" decoding="async">
 
&lt;/figure>
&lt;/p>
&lt;p>Five layers — edge, compute, memory, platform, ops — composed from the modules we built across this series.&lt;/p></description></item><item><title>Terraform for AI Agents (7): Observability, SLS Dashboards, and Cost Alarms</title><link>https://www.chenk.top/en/terraform-agents/07-observability-and-cost-control/</link><pubDate>Tue, 24 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/07-observability-and-cost-control/</guid><description>&lt;p>Agents are non-deterministic, multi-step, and call expensive APIs. The combination means you cannot debug them after the fact unless you instrumented them on day one. This article wires three pipelines through Terraform — logs, traces, metrics — into a unified dashboard, then layers four alarms that have actually fired and saved my projects in production.&lt;/p>
&lt;p>By the end you have one DingTalk channel that pings before the bill explodes, the latency climbs, the error rate spikes, or some agent starts looping on itself.&lt;/p></description></item><item><title>Terraform for AI Agents (6): LLM Gateway and Secrets Management</title><link>https://www.chenk.top/en/terraform-agents/06-llm-gateway-and-secrets/</link><pubDate>Sun, 22 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/06-llm-gateway-and-secrets/</guid><description>&lt;p>A pattern I see repeatedly in immature agent stacks: each agent has its own copy of &lt;code>OPENAI_API_KEY&lt;/code> in its own &lt;code>.env&lt;/code> file. Sometimes the same key, sometimes different ones, sometimes a colleague&amp;rsquo;s personal key from when they prototyped. When the bill arrives nobody can tell which agent caused which token spend, and when a key leaks (it always does) you&amp;rsquo;re playing whack-a-mole across a dozen &lt;code>.env&lt;/code> files.&lt;/p>
&lt;p>This article ends that. We build one &lt;strong>LLM gateway&lt;/strong> that:&lt;/p></description></item><item><title>Terraform for AI Agents (5): Storage — Vector, Relational, and Object Memory</title><link>https://www.chenk.top/en/terraform-agents/05-storage-for-agent-memory/</link><pubDate>Fri, 20 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/05-storage-for-agent-memory/</guid><description>&lt;p>An agent&amp;rsquo;s memory is the part most tutorials hand-wave. &amp;ldquo;Just put the embeddings in Pinecone, the sessions in Postgres, the screenshots in S3.&amp;rdquo; On Aliyun, all three exist as managed services, and Terraform-provisioning them right is the difference between &amp;ldquo;memory works&amp;rdquo; and &amp;ldquo;we lost three weeks of conversation history because the disk filled up at 4am&amp;rdquo;.&lt;/p>
&lt;p>This article covers all three layers, the Terraform for each, and the boring-but-critical lifecycle and backup rules.&lt;/p></description></item><item><title>Terraform for AI Agents (4): Compute — ECS, ACK, or Function Compute?</title><link>https://www.chenk.top/en/terraform-agents/04-compute-for-agent-runtime/</link><pubDate>Wed, 18 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/04-compute-for-agent-runtime/</guid><description>&lt;p>The single most important architecture decision in an agent system is &lt;em>where the agent loop process actually runs&lt;/em>. There are exactly three good answers on Aliyun. Picking the wrong one isn&amp;rsquo;t catastrophic — you can migrate later — but it costs you weeks of unnecessary scaffolding.&lt;/p>
&lt;p>This article walks through all three with working Terraform, the cost crossover, and the operational gotchas.&lt;/p>
&lt;h2 id="the-three-patterns">The three patterns&lt;/h2>
&lt;p>&lt;figure>
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/terraform-agents/04-compute-for-agent-runtime/fig1_three_compute_patterns.png" alt="Three places to run an agent: ECS, ACK, FC" loading="lazy" decoding="async">
 
&lt;/figure>
&lt;/p></description></item><item><title>Terraform for AI Agents (3): A Reusable VPC and Security Baseline</title><link>https://www.chenk.top/en/terraform-agents/03-vpc-and-security-baseline/</link><pubDate>Mon, 16 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/03-vpc-and-security-baseline/</guid><description>&lt;p>This article builds the single most copied piece of Terraform in my agent projects: a &lt;code>vpc-baseline&lt;/code> module that gives every later component (ECS, RDS, OpenSearch, ACK) a sane place to land.&lt;/p>
&lt;p>By the end you&amp;rsquo;ll have:&lt;/p>
&lt;ul>
&lt;li>A VPC across three availability zones in one region&lt;/li>
&lt;li>Six subnets (one public + one private per zone) with non-overlapping CIDRs&lt;/li>
&lt;li>A NAT gateway with EIP for private-subnet outbound to LLM APIs&lt;/li>
&lt;li>Three security groups stacked by tier (ALB → agent runtime → memory)&lt;/li>
&lt;li>Three KMS customer master keys, one per data domain (memory, secrets, logs)&lt;/li>
&lt;li>A clean module interface: name + CIDR + zones in, IDs out&lt;/li>
&lt;/ul>
&lt;p>It&amp;rsquo;s about 200 lines of HCL all-in. Type it once, refer to it forever.&lt;/p></description></item><item><title>Terraform for AI Agents (2): Provider, Auth, and Remote State on OSS</title><link>https://www.chenk.top/en/terraform-agents/02-provider-and-state-setup/</link><pubDate>Sat, 14 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/02-provider-and-state-setup/</guid><description>&lt;p>This is the article where you stop reading and start typing. By the end you will have:&lt;/p>
&lt;ol>
&lt;li>The &lt;code>alicloud&lt;/code> Terraform provider installed and version-pinned&lt;/li>
&lt;li>Authentication wired up — through the right method, not the convenient one&lt;/li>
&lt;li>Remote state on an OSS bucket with Tablestore-based locking&lt;/li>
&lt;li>Three workspaces (&lt;code>dev&lt;/code>, &lt;code>staging&lt;/code>, &lt;code>prod&lt;/code>) that share a backend but isolate state&lt;/li>
&lt;li>A working &lt;code>terraform plan&lt;/code> against an empty config&lt;/li>
&lt;/ol>
&lt;p>Nothing here provisions an agent yet. We&amp;rsquo;re laying the foundation that every later article assumes.&lt;/p></description></item><item><title>Terraform for AI Agents (1): Why IaC Is the Only Sane Way to Ship Agents</title><link>https://www.chenk.top/en/terraform-agents/01-why-terraform-for-agents/</link><pubDate>Thu, 12 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/terraform-agents/01-why-terraform-for-agents/</guid><description>&lt;p>I have shipped four agent systems on Alibaba Cloud in the last eighteen months. Three of them started life as a &lt;code>tmux&lt;/code> session on a single ECS instance someone created by clicking through the console. All three of those needed a panicked weekend of rebuilding when the second engineer joined the project, when the prod region had a stockout, or when the security team asked for a network diagram.&lt;/p>
&lt;p>The fourth started life as &lt;code>terraform apply&lt;/code>. It was the only one I haven&amp;rsquo;t lost a weekend to.&lt;/p></description></item><item><title>Aliyun PAI (5): Designer vs Model Gallery — When the GUIs Actually Earn Their Keep</title><link>https://www.chenk.top/en/aliyun-pai/05-pai-designer-vs-quickstart/</link><pubDate>Mon, 09 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-pai/05-pai-designer-vs-quickstart/</guid><description>&lt;p>The first four articles were about the underlying primitives — DSW, DLC, EAS — that you orchestrate with Python. This one is about the two GUI products that wrap those primitives and ship a runnable thing for users who do not want to write Python: &lt;strong>PAI-Designer&lt;/strong> for drag-and-drop tabular pipelines, and &lt;strong>Model Gallery&lt;/strong> for zero-code open-source model deployment and fine-tuning. They are not what serious engineers reach for first, but in two specific situations they are obviously the right answer.&lt;/p></description></item><item><title>Aliyun PAI (4): PAI-EAS — Model Serving, Cold Starts, and the TPS Lie</title><link>https://www.chenk.top/en/aliyun-pai/04-pai-eas-model-serving/</link><pubDate>Sun, 08 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-pai/04-pai-eas-model-serving/</guid><description>&lt;p>EAS is where the money goes. DSW costs you a few hundred RMB a month for dev. DLC costs you in spikes. EAS bills 24/7 because someone might call your endpoint, and that &amp;ldquo;minimum replica count&amp;rdquo; line in the autoscaler config is the single highest-leverage knob in the whole platform. This article is what I wish I&amp;rsquo;d known the day before we shipped our first production endpoint.&lt;/p>
&lt;h2 id="what-eas-is-per-the-docs">What EAS is, per the docs&lt;/h2>
&lt;p>The official &amp;ldquo;EAS overview&amp;rdquo; frames it as: &amp;ldquo;deploy trained models as online inference services or AI web applications, with heterogeneous resources, automatic scaling, one-click stress testing, canary releases, and real-time monitoring&amp;rdquo;. The two things to underline:&lt;/p></description></item><item><title>Aliyun PAI (3): PAI-DLC — Distributed Training Without the Cluster Pain</title><link>https://www.chenk.top/en/aliyun-pai/03-pai-dlc-distributed-training/</link><pubDate>Sat, 07 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-pai/03-pai-dlc-distributed-training/</guid><description>&lt;p>A DSW notebook is for one engineer on one GPU. The moment you need eight GPUs across two nodes, or the moment training runs longer than the eight hours you&amp;rsquo;ll keep the tab open, you switch to &lt;strong>DLC&lt;/strong>. DLC is PAI&amp;rsquo;s job-submission front-end for a managed Kubernetes cluster: you describe what you want (image, command, resources, data mounts), DLC schedules pods, runs them to completion, persists logs, and tells you what happened. The docs call this &lt;em>Deep Learning Containers&lt;/em>; we just say &amp;ldquo;DLC job&amp;rdquo;.&lt;/p></description></item><item><title>Aliyun PAI (2): PAI-DSW — Notebooks That Don't Eat Your Weights</title><link>https://www.chenk.top/en/aliyun-pai/02-pai-dsw-notebook/</link><pubDate>Fri, 06 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-pai/02-pai-dsw-notebook/</guid><description>&lt;p>Every time I onboard a new ML engineer to PAI the first day looks the same. They start a DSW instance, &lt;code>pip install&lt;/code> their world, train for an hour, restart the kernel for some reason, and then ask me where their model file went. The honest answer — &amp;ldquo;in &lt;code>/root&lt;/code> on a node that no longer exists&amp;rdquo; — is the kind of lesson you only need to learn once. 
This article is the version of that lesson you read in advance.&lt;/p></description></item><item><title>Aliyun PAI (1): Platform Overview and the Product Family Map</title><link>https://www.chenk.top/en/aliyun-pai/01-platform-overview/</link><pubDate>Thu, 05 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-pai/01-platform-overview/</guid><description>&lt;p>If your team trains or serves any model on Alibaba Cloud, sooner or later you will end up in the PAI console. PAI is the umbrella; underneath it sit the actual workhorses — a notebook product, a distributed training service, a model-serving service, plus a couple of GUI/quick-deploy layers on top. After about eighteen months of running real LLM workloads on it for an AI marketing platform, this series is the field guide I wish someone had handed me before I shipped my first endpoint.&lt;/p></description></item><item><title>Aliyun Bailian (5): Qwen-TTS for Multilingual Voice</title><link>https://www.chenk.top/en/aliyun-bailian/05-qwen-tts-voice/</link><pubDate>Sun, 01 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-bailian/05-qwen-tts-voice/</guid><description>&lt;p>The reason every Chinese-language product I&amp;rsquo;ve worked on ends up calling Qwen-TTS-Flash isn&amp;rsquo;t price — there are cheaper TTS APIs. It&amp;rsquo;s that Qwen-TTS is the only one that handles &lt;strong>mainland Chinese dialects&lt;/strong> (Cantonese, Sichuanese, Wu) and English in the same SDK, with voices that don&amp;rsquo;t sound like a 2019 customs announcement. 
After about six months of using it for a marketing-video voice-over pipeline, this is what I wish someone had told me on day one.&lt;/p></description></item><item><title>Aliyun Bailian (4): Wanxiang Video Generation End-to-End</title><link>https://www.chenk.top/en/aliyun-bailian/04-wanxiang-video-generation/</link><pubDate>Sat, 28 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-bailian/04-wanxiang-video-generation/</guid><description>&lt;p>Wanxiang is the API that has done the most for our marketing pipeline and caused the most production surprises. The model is genuinely good — &lt;code>wan2.5-t2v-plus&lt;/code> produces 720p clips that pass for an actual video team&amp;rsquo;s output most of the time — but the surface around it is async, native-protocol, has expiring URLs, and rate-limits in non-obvious ways. This article is the version of the docs that has been through six months of &amp;ldquo;why is this happening at 2am&amp;rdquo; tickets.&lt;/p></description></item><item><title>Aliyun Bailian (3): Qwen-Omni for Video, Audio, and Image Understanding</title><link>https://www.chenk.top/en/aliyun-bailian/03-qwen-omni-multimodal/</link><pubDate>Fri, 27 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-bailian/03-qwen-omni-multimodal/</guid><description>&lt;p>Of all the Bailian models, Qwen-Omni is the one that has pulled me out of the most product-roadmap holes. &amp;ldquo;Can you tell me what&amp;rsquo;s happening in this 2-minute promo video?&amp;rdquo; used to be a 3-week project involving frame extraction, captioning per frame, and a stitch step. With Qwen-Omni it is one HTTP request. But the docs are sparse on the gotchas, and there is one (streaming is mandatory) that has cost more than one team a half-day. 
Let&amp;rsquo;s not have that be you.&lt;/p></description></item><item><title>Aliyun Bailian (2): The Qwen LLM API in Production</title><link>https://www.chenk.top/en/aliyun-bailian/02-qwen-llm-api/</link><pubDate>Thu, 26 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-bailian/02-qwen-llm-api/</guid><description>&lt;p>This is the article in the series where most of the production wins live. The other models are interesting; the LLMs are what every product I have shipped on Bailian has called every minute of every day. The official Qwen API reference is dense and complete; this article is the readable companion that picks one path through it.&lt;/p>
&lt;h2 id="pick-the-right-qwen-variant-for-the-workload">Pick the right Qwen variant for the workload&lt;/h2>
&lt;p>The Qwen family is large. Most teams overspend by defaulting to &lt;code>qwen-max&lt;/code> everywhere. Most teams underspend on quality by defaulting to &lt;code>qwen-turbo&lt;/code>. The right answer is &amp;ldquo;match variant to job&amp;rdquo;:&lt;/p></description></item><item><title>Aliyun Bailian (1): Platform Overview and First Request</title><link>https://www.chenk.top/en/aliyun-bailian/01-platform-overview/</link><pubDate>Wed, 25 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/aliyun-bailian/01-platform-overview/</guid><description>&lt;p>If you ship anything that touches Chinese-language users, sooner or later you will end up calling a Bailian model. Qwen-Max is the cheapest sane way to get GPT-4-class Chinese understanding, the Wanxiang video models are the only production-grade text-to-video API I can buy with a Chinese invoice, and Qwen-TTS-Flash is the only TTS that handles Cantonese and Sichuanese without sounding like a customs announcement. After about a year of running these in production for an AI-marketing platform, this series is what I wish someone had handed me on day one.&lt;/p></description></item><item><title>ML Math Derivations (20): Regularization and Model Selection</title><link>https://www.chenk.top/en/ml-math-derivations/20-regularization-and-model-selection/</link><pubDate>Sun, 08 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/20-regularization-and-model-selection/</guid><description>&lt;h2 id="what-this-article-covers">What This Article Covers&lt;/h2>
&lt;p>A 100-million-parameter network trained on 50,000 images &lt;em>should&lt;/em> overfit catastrophically. Modern deep networks generalise anyway. &lt;strong>Why?&lt;/strong> Two ingredients: &lt;em>regularisation&lt;/em> (techniques that constrain capacity) and &lt;em>generalisation theory&lt;/em> (mathematics that says when learning works at all). This article is the closing chapter of the series, and we use it to gather every tool we have built — least squares, MAP estimation, optimisation, EM, neural networks — and turn them on the deepest open question in the field: &lt;em>why does learning generalise?&lt;/em>&lt;/p></description></item><item><title>ML Math Derivations (19): Neural Networks and Backpropagation</title><link>https://www.chenk.top/en/ml-math-derivations/19-neural-networks-and-backpropagation/</link><pubDate>Sat, 07 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/19-neural-networks-and-backpropagation/</guid><description>&lt;h2 id="what-this-article-covers">What This Article Covers&lt;/h2>
&lt;p>A single perceptron cannot solve XOR. Stack enough of them with nonlinear activations and you obtain a &lt;em>universal function approximator&lt;/em>. The remaining question is how such a network learns from data. The answer — &lt;strong>backpropagation&lt;/strong>, an efficient application of the chain rule that recycles intermediate results during a single backward sweep — is the engine behind every deep learning library written in the last forty years. Understanding it mathematically reveals two further truths: why deep networks suffer from vanishing or exploding gradients, and why the choice of weight initialization is much less arbitrary than it first appears.&lt;/p></description></item><item><title>ML Math Derivations (18): Clustering Algorithms</title><link>https://www.chenk.top/en/ml-math-derivations/18-clustering-algorithms/</link><pubDate>Fri, 06 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/18-clustering-algorithms/</guid><description>&lt;h2 id="what-this-article-covers">What This Article Covers&lt;/h2>
&lt;p>A million customer records arrive with no labels. Can you discover meaningful groups automatically? That is &lt;strong>clustering&lt;/strong>, the most fundamental unsupervised learning task. Unlike classification, clustering forces you to first answer a slippery question: &lt;em>what does &amp;ldquo;similar&amp;rdquo; even mean?&lt;/em> Every clustering algorithm is, at heart, a different answer to that question &amp;ndash; a different geometric, probabilistic, or graph-theoretic prior on what a &amp;ldquo;group&amp;rdquo; is.&lt;/p>
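&lt;p>A hedged sketch of the simplest such answer: &amp;ldquo;similar&amp;rdquo; means close in squared Euclidean distance, which is exactly the geometric prior K-means commits to. The toy points and starting centers below are invented for illustration.&lt;/p>

```python
# Minimal K-means in pure Python: the "similar means close in
# squared Euclidean distance" answer to the clustering question.
# Toy data and initial centers are made up for illustration.

def dist2(a, b):
    # squared Euclidean distance between two points
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, centers, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda j: dist2(p, centers[j]))
            clusters[j].append(p)
        # update step: each center moves to its cluster mean
        centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
    return centers, clusters

pts = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
centers, clusters = kmeans(pts, [(0.0, 0.0), (5.0, 5.0)])
```

&lt;p>Swap &lt;code>dist2&lt;/code> for a different metric and you have changed the algorithm&amp;rsquo;s answer to the &amp;ldquo;what is a group&amp;rdquo; question.&lt;/p>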
&lt;p>&lt;strong>What you will learn:&lt;/strong>&lt;/p></description></item><item><title>ML Math Derivations (17): Dimensionality Reduction and PCA</title><link>https://www.chenk.top/en/ml-math-derivations/17-dimensionality-reduction-and-pca/</link><pubDate>Thu, 05 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/17-dimensionality-reduction-and-pca/</guid><description>&lt;h2 id="what-this-article-covers">What This Article Covers&lt;/h2>
&lt;p>Feed a clustering algorithm $10{,}000$-dimensional data and it will most likely fail &amp;ndash; not because the algorithm is broken, but because &lt;strong>high-dimensional space is a hostile environment for distance-based learning&lt;/strong>. Volumes evaporate into thin shells, the ratio of nearest- to farthest-neighbour distances tends to $1$, and &amp;ldquo;closeness&amp;rdquo; stops carrying information. Dimensionality reduction is the response: project the data into a lower-dimensional space while keeping the structure that actually matters.&lt;/p></description></item><item><title>ML Math Derivations (16): Conditional Random Fields</title><link>https://www.chenk.top/en/ml-math-derivations/16-conditional-random-fields/</link><pubDate>Wed, 04 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/16-conditional-random-fields/</guid><description>&lt;h2 id="what-this-article-covers">What This Article Covers&lt;/h2>
&lt;p>Named entity recognition, POS tagging, information extraction &amp;ndash; every one of these tasks asks you to label each element of a sequence. HMMs (&lt;a href="https://www.chenk.top/en/ml-math-derivations/15-hidden-markov-models/">Part 15&lt;/a>) attack this problem &lt;strong>generatively&lt;/strong> by modelling the joint distribution $P(\mathbf{X},\mathbf{Y})$, but to make the joint factorise they pay a steep price: each observation is assumed independent of everything except its own hidden label. In real text, whether &lt;em>bank&lt;/em> is a noun or a verb depends on the preceding word, the following word, the suffix, capitalisation, dictionary lookups &amp;ndash; all of these features at once.&lt;/p></description></item><item><title>Machine Learning Mathematical Derivations (15): Hidden Markov Models</title><link>https://www.chenk.top/en/ml-math-derivations/15-hidden-markov-models/</link><pubDate>Tue, 03 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/15-hidden-markov-models/</guid><description>&lt;p>You hear footsteps behind you in a fog. You cannot see the walker, only the sounds. From the rhythm and pitch &amp;ndash; short, soft, hurried &amp;ndash; can you guess whether they are walking, running, or limping? And if you observed an entire sequence, which gait sequence is most likely? How likely is &lt;em>any&lt;/em> sequence of sounds under your model of how walking works?&lt;/p>
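&lt;p>The fog scenario can be written down directly: three hidden gaits, two audible sound types, and a forward pass that answers the &amp;ldquo;how likely is any sequence of sounds&amp;rdquo; question. A minimal sketch, with every probability invented for illustration:&lt;/p>

```python
# The footsteps scenario as a tiny HMM: three hidden gaits, two
# observable sound types. All numbers are made up for illustration.

states = ["walking", "running", "limping"]
obs_symbols = ["soft", "loud"]

pi = [0.6, 0.3, 0.1]                  # initial gait probabilities
A = [[0.8, 0.1, 0.1],                 # transition: row = current gait
     [0.2, 0.7, 0.1],
     [0.3, 0.1, 0.6]]
B = [[0.7, 0.3],                      # emission: P(sound | gait)
     [0.2, 0.8],
     [0.9, 0.1]]

def forward(obs):
    """P(observation sequence) via the forward algorithm, O(N^2 T)."""
    o0 = obs_symbols.index(obs[0])
    alpha = [pi[i] * B[i][o0] for i in range(3)]
    for t in range(1, len(obs)):
        o = obs_symbols.index(obs[t])
        alpha = [
            B[j][o] * sum(alpha[i] * A[i][j] for i in range(3))
            for j in range(3)
        ]
    return sum(alpha)

p = forward(["soft", "soft", "loud"])
```

&lt;p>The brute-force alternative sums over every gait sequence; the forward recursion shares those sub-sums across time, which is where the exponent collapses.&lt;/p>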
&lt;p>These are the &lt;strong>three problems of HMMs&lt;/strong>, and the surprise is that all three reduce to one trick: write the joint $P(\mathbf{O}, \mathbf{I})$ as a product of local factors along time, then &lt;strong>share sub-computations across time&lt;/strong> with dynamic programming. Brute force costs $O(N^T)$. Forward-Backward, Viterbi, and Baum-Welch all cost $O(N^2 T)$. The exponent collapses because the Markov assumption makes the future conditionally independent of the past given the present.&lt;/p></description></item><item><title>Machine Learning Mathematical Derivations (14): Variational Inference and Variational EM</title><link>https://www.chenk.top/en/ml-math-derivations/14-variational-inference-and-variational-em/</link><pubDate>Mon, 02 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/14-variational-inference-and-variational-em/</guid><description>&lt;p>When the posterior $p(\mathbf{z}\mid\mathbf{x})$ is intractable, you have two roads. &lt;strong>Sampling&lt;/strong> (MCMC) walks a Markov chain whose stationary distribution is the posterior — eventually exact, but slow and hard to diagnose. &lt;strong>Variational inference&lt;/strong> (VI) instead picks a simple family $\mathcal{Q}$ of distributions and finds the member $q^\star\in\mathcal{Q}$ that lies closest to the true posterior. 
Inference becomes optimization, and the same machinery that fits a neural network now fits a Bayesian model.&lt;/p></description></item><item><title>Machine Learning Mathematical Derivations (13): EM Algorithm and GMM</title><link>https://www.chenk.top/en/ml-math-derivations/13-em-algorithm-and-gmm/</link><pubDate>Sun, 01 Feb 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/13-em-algorithm-and-gmm/</guid><description>&lt;p>When data carries hidden structure &amp;ndash; a cluster label you never observed, a missing feature, a topic you cannot directly see &amp;ndash; maximum likelihood becomes painful. The log of a sum has no closed form, and gradient methods get tangled in the latent variables. The &lt;strong>EM algorithm&lt;/strong> sidesteps the difficulty with a deceptively simple idea: alternate between &lt;em>guessing&lt;/em> the hidden variables under a posterior (E-step) and &lt;em>fitting&lt;/em> the parameters as if those guesses were true (M-step). Each iteration is mathematically guaranteed to push the likelihood up. This post derives EM from first principles, proves the monotone-ascent property via Jensen&amp;rsquo;s inequality, and works through its most famous application: &lt;strong>Gaussian Mixture Models (GMM)&lt;/strong> &amp;ndash; the soft, elliptical generalisation of K-means.&lt;/p></description></item><item><title>Machine Learning Mathematical Derivations (12): XGBoost and LightGBM</title><link>https://www.chenk.top/en/ml-math-derivations/12-xgboost-and-lightgbm/</link><pubDate>Sat, 31 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/12-xgboost-and-lightgbm/</guid><description>&lt;p>XGBoost and LightGBM are the two libraries that quietly win most tabular-data battles &amp;mdash; on Kaggle leaderboards, in fraud-detection pipelines, in ad ranking, in churn models. They share the same backbone (gradient-boosted trees, Part 11) but make very different engineering bets:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>XGBoost&lt;/strong> sharpens the &lt;em>math&lt;/em>: it brings the second derivative of the loss into the objective, regularises the tree itself, and turns split selection into a closed-form score.&lt;/li>
&lt;li>&lt;strong>LightGBM&lt;/strong> sharpens the &lt;em>systems&lt;/em>: it bins features into a small histogram, grows trees leaf-by-leaf, throws away uninformative samples (GOSS) and bundles mutually exclusive sparse features (EFB).&lt;/li>
&lt;/ul>
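&lt;p>The &amp;ldquo;closed-form score&amp;rdquo; in the XGBoost bullet fits in a few lines. &lt;code>g&lt;/code> and &lt;code>h&lt;/code> are sums of the first and second derivatives of the loss over the samples in a node; &lt;code>lam&lt;/code> and &lt;code>gamma&lt;/code> are the regularisation knobs. A sketch with invented numbers, not library code:&lt;/p>

```python
# XGBoost's closed-form leaf weight and split gain, as a sketch.
# g_sum, h_sum: summed first / second derivatives of the loss over a node.
# lam, gamma: the L2 and per-leaf regularisation knobs. Toy numbers only.

def leaf_weight(g_sum, h_sum, lam):
    # optimal leaf value: w* = -G / (H + lambda)
    return -g_sum / (h_sum + lam)

def split_gain(gl, hl, gr, hr, lam, gamma):
    # structure-score improvement from splitting a node into (L, R)
    def score(g, h):
        return g * g / (h + lam)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma

gain = split_gain(-4.0, 2.0, 3.0, 2.5, 1.0, 0.0)
```

&lt;p>Raising &lt;code>gamma&lt;/code> makes marginal splits score negative, which is the formula-level reason that knob prunes trees.&lt;/p>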
&lt;p>The result is two tools that look interchangeable from the API but behave very differently when $N$ or $d$ becomes large. This post derives every formula behind those choices so you can read a tuning guide and know &lt;em>why&lt;/em> each knob exists.&lt;/p></description></item><item><title>Machine Learning Mathematical Derivations (11): Ensemble Learning</title><link>https://www.chenk.top/en/ml-math-derivations/11-ensemble-learning/</link><pubDate>Fri, 30 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/11-ensemble-learning/</guid><description>&lt;p>Why does a committee of mediocre classifiers outperform a single brilliant one? The answer is unromantic but precise: averaging cuts variance, sequential reweighting cuts bias, and a little randomisation breaks the correlation that would otherwise destroy both effects. This post derives the mathematics behind that picture &amp;mdash; bias&amp;ndash;variance decomposition, bootstrap aggregating, AdaBoost as forward stagewise minimisation of exponential loss, and gradient boosting as gradient descent in function space.&lt;/p>
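&lt;p>The &amp;ldquo;averaging cuts variance&amp;rdquo; claim is easy to check numerically. A minimal sketch, with an idealised estimator standing in for a real classifier; note it assumes fully independent committee members, which bagging only approximates:&lt;/p>

```python
# Averaging cuts variance: the mean of k independent noisy estimators
# has roughly 1/k the variance of a single one. Illustrative numbers.
import random

random.seed(0)

def estimator():
    # one "mediocre classifier": true value 0, noise variance 1
    return random.gauss(0.0, 1.0)

def variance(samples):
    m = sum(samples) / len(samples)
    return sum((s - m) ** 2 for s in samples) / len(samples)

single = [estimator() for _ in range(20000)]
committee = [
    sum(estimator() for _ in range(16)) / 16.0
    for _ in range(20000)
]
# variance(single) comes out near 1.0, variance(committee) near 1/16
```

&lt;p>Correlated members shrink that 1/16 back toward 1, which is exactly why random forests inject randomisation.&lt;/p>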
&lt;p>By the end you should be able to look at any ensemble method and say &lt;em>what it is reducing, why it works, and when it will fail.&lt;/em>&lt;/p></description></item><item><title>Machine Learning Mathematical Derivations (10): Semi-Naive Bayes and Bayesian Networks</title><link>https://www.chenk.top/en/ml-math-derivations/10-semi-naive-bayes-and-bayesian-networks/</link><pubDate>Thu, 29 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/10-semi-naive-bayes-and-bayesian-networks/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Hook.&lt;/strong> Naive Bayes assumes every feature is conditionally independent given the class. It is a convenient lie &amp;ndash; it buys training in a single pass over the data, yet classifiers that admit a few dependencies (tree structures, small graphs) systematically beat Naive Bayes by a few accuracy points on virtually every UCI benchmark. This part walks the spectrum from &amp;ldquo;no dependencies&amp;rdquo; (Naive Bayes) to &amp;ldquo;all dependencies&amp;rdquo; (full joint), showing the three sweet spots that practitioners actually use: SPODE, TAN and AODE. The same factorisation idea, taken to its general form, is the Bayesian network.&lt;/p></description></item><item><title>Machine Learning Mathematical Derivations (9): Naive Bayes</title><link>https://www.chenk.top/en/ml-math-derivations/09-naive-bayes/</link><pubDate>Wed, 28 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/09-naive-bayes/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Hook:&lt;/strong> A spam filter that trains in milliseconds, scales to a million features, has &lt;em>no hyperparameters worth tuning&lt;/em>, and still beats much fancier models on short-text problems. Naive Bayes pulls this off by making one outrageous assumption — every feature is independent given the class — and refusing to apologise for it. The assumption is wrong on essentially every real dataset, yet the classifier works. Understanding &lt;em>why&lt;/em> is a tour through generative modelling, MAP estimation, Dirichlet priors, and the bias–variance tradeoff. This article walks the entire path.&lt;/p></description></item><item><title>Machine Learning Mathematical Derivations (8): Support Vector Machines</title><link>https://www.chenk.top/en/ml-math-derivations/08-support-vector-machines/</link><pubDate>Tue, 27 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/08-support-vector-machines/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Hook.&lt;/strong> You have two clouds of points and infinitely many lines that separate them. Which line is &amp;ldquo;best&amp;rdquo;? SVM gives a startlingly geometric answer: the line that sits in the middle of the &lt;em>widest empty corridor&lt;/em> between the two classes. Push that single idea through Lagrangian duality and it produces a sparse model (only the points on the corridor wall matter), a quadratic program with a global optimum, and &amp;ndash; almost as a free gift &amp;ndash; the kernel trick that lets the same linear machinery carve curved boundaries in infinite-dimensional spaces.&lt;/p></description></item><item><title>Machine Learning Mathematical Derivations (7): Decision Trees</title><link>https://www.chenk.top/en/ml-math-derivations/07-decision-trees/</link><pubDate>Mon, 26 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/07-decision-trees/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Hook.&lt;/strong> A decision tree mimics how humans actually decide things: ask a question, branch on the answer, ask the next question. The math under that intuition is surprisingly rich — entropy from information theory tells us &lt;em>which&lt;/em> question to ask first, the Gini index gives a cheaper proxy that lands on essentially the same trees, and cost-complexity pruning gives a principled way to stop the tree from memorising noise. Almost every modern boosted ensemble (XGBoost, LightGBM, CatBoost) is just a clever sum of these objects, so getting the foundations right pays off many times over.&lt;/p></description></item><item><title>Machine Learning Mathematical Derivations (6): Logistic Regression and Classification</title><link>https://www.chenk.top/en/ml-math-derivations/06-logistic-regression-and-classification/</link><pubDate>Sun, 25 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/06-logistic-regression-and-classification/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Hook.&lt;/strong> Linear regression maps inputs to any real number — but what if the output has to be a probability between 0 and 1? Logistic regression solves this with one elegant trick: a sigmoid squashing function. Despite its name, logistic regression is a &lt;em>classification&lt;/em> algorithm, and its math underpins every neuron in every modern neural network.&lt;/p>
&lt;/blockquote>
&lt;h2 id="what-you-will-learn">What You Will Learn&lt;/h2>
&lt;ul>
&lt;li>Why sigmoid is the natural way to turn a real-valued score into a probability, and why its derivative is so clean.&lt;/li>
&lt;li>How cross-entropy loss falls out of maximum likelihood estimation in two lines.&lt;/li>
&lt;li>Why cross-entropy beats MSE for classification — a vanishing-gradient argument made visible.&lt;/li>
&lt;li>The full gradient and Hessian for both binary and multi-class (softmax) cases, and why the loss is convex.&lt;/li>
&lt;li>L1, L2 and elastic-net regularization, and the Bayesian priors hiding behind them.&lt;/li>
&lt;li>Decision-boundary geometry and the threshold-free metrics (ROC / PR / AUC) that you actually need under class imbalance.&lt;/li>
&lt;/ul>
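&lt;p>As a preview of the bullets above, here is a minimal NumPy sketch (the function names are mine, not the article&amp;rsquo;s) of the sigmoid, its clean derivative, and the cross-entropy gradient that falls out of maximum likelihood:&lt;/p>

```python
import numpy as np

def sigmoid(z):
    # numerically stable sigmoid: maps any real score into (0, 1)
    return np.where(z >= 0,
                    1.0 / (1.0 + np.exp(-z)),
                    np.exp(z) / (1.0 + np.exp(z)))

def sigmoid_grad(z):
    # the "clean" derivative: sigma(z) * (1 - sigma(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def cross_entropy_grad(X, y, w):
    # gradient of the mean negative log-likelihood: X^T (sigma(Xw) - y) / n
    p = sigmoid(X @ w)
    return X.T @ (p - y) / len(y)
```

&lt;p>Note the residual form of the gradient &amp;ndash; prediction minus label, pushed back through the features &amp;ndash; which mirrors linear regression&amp;rsquo;s gradient with the sigmoid inserted.&lt;/p>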
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Calculus: chain rule, partial derivatives.&lt;/li>
&lt;li>Linear algebra: matrix multiplication, transpose.&lt;/li>
&lt;li>Probability: Bernoulli and categorical distributions, likelihood.&lt;/li>
&lt;li>Familiarity with &lt;a href="https://www.chenk.top/en/Machine-Learning-Mathematical-Derivations-5-Linear-Regression/">Part 5: Linear Regression&lt;/a>.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="1-from-linear-models-to-probabilistic-classification">1. From Linear Models to Probabilistic Classification&lt;/h2>
&lt;h3 id="11-the-problem-with-raw-linear-output">1.1 The Problem with Raw Linear Output&lt;/h3>
&lt;p>Linear regression gives us $\hat y = \mathbf{w}^\top \mathbf{x}$, which is unbounded. For classification, two things go wrong:&lt;/p></description></item><item><title>Mathematical Derivation of Machine Learning (5): Linear Regression</title><link>https://www.chenk.top/en/ml-math-derivations/05-linear-regression/</link><pubDate>Sat, 24 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/05-linear-regression/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Hook.&lt;/strong> In 1886 Francis Galton noticed something strange about heredity: children of unusually tall (or short) parents tended to be closer to the average than their parents were. He called this drift toward the mean &lt;em>regression&lt;/em>, and the name stuck. The statistical curiosity grew up into the most consequential model in machine learning &amp;ndash; not because linear regression is powerful on its own, but because almost every other algorithm (logistic regression, neural networks, kernel methods) is some twist on the same idea: &lt;strong>fit a line, but in the right space.&lt;/strong>&lt;/p></description></item><item><title>ML Math Derivations (4): Convex Optimization Theory</title><link>https://www.chenk.top/en/ml-math-derivations/04-convex-optimization-theory/</link><pubDate>Fri, 23 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/04-convex-optimization-theory/</guid><description>&lt;h2 id="what-this-article-covers">What This Article Covers&lt;/h2>
&lt;p>In 1947, George Dantzig proposed the simplex method for linear programming, and a working theory of optimization was born. Eight decades later, optimization has become the engine of machine learning: every model you train, from a one-line linear regression to a 70B-parameter language model, is the answer to &lt;em>some&lt;/em> optimization problem.&lt;/p>
&lt;p>Among all such problems, &lt;strong>convex optimization holds a privileged place&lt;/strong>. The defining property is so strong it almost feels like cheating: every local minimum is automatically a global minimum, and a handful of well-understood algorithms come with airtight convergence guarantees. The whole reason we treat &amp;ldquo;convex&amp;rdquo; as a green flag and &amp;ldquo;non-convex&amp;rdquo; as a yellow one comes down to this single fact.&lt;/p></description></item><item><title>ML Math Derivations (3): Probability Theory and Statistical Inference</title><link>https://www.chenk.top/en/ml-math-derivations/03-probability-theory-and-statistical-inference/</link><pubDate>Thu, 22 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/03-probability-theory-and-statistical-inference/</guid><description>&lt;h2 id="what-this-article-covers">What This Article Covers&lt;/h2>
&lt;p>In 1912, Ronald Fisher introduced &lt;strong>maximum likelihood estimation&lt;/strong> in a short paper that quietly redefined statistics. His insight was almost embarrassingly simple: &lt;em>if a parameter setting makes the observed data extremely likely, that parameter setting is probably right&lt;/em>. Almost every modern learning algorithm — from logistic regression to large language models — is a descendant of this idea.&lt;/p>
&lt;p>But likelihood alone is not enough. To use it we need a vocabulary for uncertainty (probability spaces, distributions), guarantees that empirical quantities track population ones (laws of large numbers, central limit theorem), and tools for incorporating prior knowledge (Bayesian inference). This article assembles those pieces into a coherent foundation for everything that follows.&lt;/p></description></item><item><title>ML Math Derivations (2): Linear Algebra and Matrix Theory</title><link>https://www.chenk.top/en/ml-math-derivations/02-linear-algebra-and-matrix-theory/</link><pubDate>Wed, 21 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/02-linear-algebra-and-matrix-theory/</guid><description>&lt;h2 id="why-this-chapter-and-whats-different">Why this chapter, and what&amp;rsquo;s different&lt;/h2>
&lt;p>If you have already worked through a standard linear-algebra course you have seen most of these objects. &lt;strong>This chapter is not that course.&lt;/strong> It is the &lt;em>ML practitioner&amp;rsquo;s slice&lt;/em> of linear algebra: the half-dozen ideas that actually appear when you implement gradient descent, run PCA, train a neural net, or read a paper.&lt;/p>
&lt;p>Concretely the goals are:&lt;/p>
&lt;ol>
&lt;li>Build a &lt;strong>geometric intuition&lt;/strong> for what matrices &lt;em>do&lt;/em> (rotate, stretch, project, kill).&lt;/li>
&lt;li>Learn the four decompositions that show up everywhere &amp;ndash; spectral, &lt;strong>SVD&lt;/strong>, QR, Cholesky &amp;ndash; and &lt;em>which one to reach for&lt;/em>.&lt;/li>
&lt;li>Master enough &lt;strong>matrix calculus&lt;/strong> to derive any neural-net gradient on the back of an envelope.&lt;/li>
&lt;/ol>
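&lt;p>In the spirit of goal 2, here is a hedged one-screen sketch (the matrix is a toy example of mine) of the SVD as rotate&amp;ndash;stretch&amp;ndash;rotate, plus the rank-1 truncation that underlies PCA:&lt;/p>

```python
import numpy as np

# any matrix factors as U @ diag(s) @ Vt: rotate, stretch along axes, rotate again
A = np.array([[3.0, 1.0],
              [1.0, 3.0]])
U, s, Vt = np.linalg.svd(A)

# the singular values s are the stretch factors; reconstruction is exact
A_rebuilt = U @ np.diag(s) @ Vt
assert np.allclose(A, A_rebuilt)

# keeping only the top singular direction gives the best rank-1
# approximation in the least-squares sense (Eckart-Young)
A1 = s[0] * np.outer(U[:, 0], Vt[0])
```

&lt;p>For this symmetric positive-definite toy matrix the singular values coincide with the eigenvalues (4 and 2), which is exactly the spectral&amp;ndash;SVD connection the chapter develops.&lt;/p>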
&lt;p>We skim the algebra of row reduction, determinants by cofactor, and abstract vector-space proofs. If you need those, the references at the bottom give the standard treatments. Here, every concept comes back to a picture or a line of NumPy.&lt;/p></description></item><item><title>Solving Constrained Mean-Variance Portfolio Optimization Using Spiral Optimization</title><link>https://www.chenk.top/en/standalone/solving-constrained-mean-variance-portfolio-optimization-pro/</link><pubDate>Wed, 21 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/solving-constrained-mean-variance-portfolio-optimization-pro/</guid><description>&lt;p>Markowitz&amp;rsquo;s mean-variance model is elegant until you add real trading constraints: &amp;ldquo;if you buy a stock at all, hold at least 5% of it&amp;rdquo; and &amp;ldquo;pick exactly 10 names from the S&amp;amp;P 500.&amp;rdquo; The closed-form quadratic program quietly mutates into a &lt;em>mixed-integer nonlinear program&lt;/em> (MINLP), and the standard solver chain (Lagrange multipliers, KKT conditions, interior-point methods) stops working. The paper reviewed here applies the &lt;strong>Spiral Optimization Algorithm&lt;/strong> (SOA), a population-based metaheuristic, to this problem and shows it can find competitive feasible solutions where gradient methods fail outright.&lt;/p></description></item><item><title>ML Math Derivations (1): Introduction and Mathematical Foundations</title><link>https://www.chenk.top/en/ml-math-derivations/01-introduction-and-mathematical-foundations/</link><pubDate>Tue, 20 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ml-math-derivations/01-introduction-and-mathematical-foundations/</guid><description>&lt;h2 id="what-this-chapter-does">What this chapter does&lt;/h2>
&lt;p>In 2005 Google Research showed, on a public benchmark, that a statistical translation model trained on raw bilingual text could outperform decades of carefully engineered linguistic rules. The conclusion was uncomfortable for the experts of the day, but mathematically liberating: &lt;strong>a system that has never been told the rules of a language can still recover them, given enough examples.&lt;/strong> Why?&lt;/p>
&lt;p>The answer is not a trick of engineering &amp;ndash; it is a theorem. In this chapter we build, from first principles, the part of mathematics that explains &lt;em>when&lt;/em> learning from data is possible, &lt;em>how much data&lt;/em> is required, and &lt;em>what fundamentally limits&lt;/em> what any algorithm can do.&lt;/p></description></item><item><title>Recommendation Systems (16): Industrial Architecture and Best Practices</title><link>https://www.chenk.top/en/recommendation-systems/16-industrial-practice/</link><pubDate>Thu, 15 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/16-industrial-practice/</guid><description>&lt;blockquote>
&lt;p>The hardest part of a production recommendation system is not the model. It is the &lt;strong>system around the model&lt;/strong>: the feature store that prevents training/serving skew, the canary deployment that catches a regression before it hits 100M users, the orchestration that meets a 100ms p95 latency budget while running four ML models in sequence. This final article describes the architecture that every major tech company has converged on &amp;ndash; and the trade-offs hiding inside each layer.&lt;/p></description></item><item><title>Recommendation Systems (15): Real-Time Recommendation and Online Learning</title><link>https://www.chenk.top/en/recommendation-systems/15-real-time-online/</link><pubDate>Mon, 12 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/15-real-time-online/</guid><description>&lt;blockquote>
&lt;p>A user opens your app at 14:02 and searches for &amp;ldquo;trail running shoes&amp;rdquo;. By 15:30 they have moved on and are reading kitchen reviews. A model that retrains nightly is still showing them Salomon ads at 16:00 — and that gap is exactly the bug a real-time system fixes. The interesting part is not &amp;ldquo;make it faster&amp;rdquo; but &amp;ldquo;what &lt;em>should&lt;/em> be fast&amp;rdquo; — most features add nothing to AUC even when made real-time, and the wrong design point burns money for no lift.&lt;/p></description></item><item><title>Recommendation Systems (14): Cross-Domain Recommendation and Cold-Start Solutions</title><link>https://www.chenk.top/en/recommendation-systems/14-cross-domain-cold-start/</link><pubDate>Fri, 09 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/14-cross-domain-cold-start/</guid><description>&lt;blockquote>
&lt;p>When Netflix launches in a new country, it inherits millions of users with zero history and a catalog with no local ratings. Amazon faces the same problem each time it opens a new product category. Pure collaborative filtering — the workhorse of warm-state recommendation — has nothing to compute on. The discipline that makes recommendations work in this regime is a stack of techniques: bootstrap heuristics for the first request, meta-learning after a handful of interactions, cross-domain transfer when a related domain is rich, and bandits to keep exploring once the model is confident. This post walks through that stack, anchored to the papers it descends from.&lt;/p></description></item><item><title>Recommendation Systems (13): Fairness, Debiasing, and Explainability</title><link>https://www.chenk.top/en/recommendation-systems/13-fairness-explainability/</link><pubDate>Tue, 06 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/13-fairness-explainability/</guid><description>&lt;blockquote>
&lt;p>A user opens Spotify and the same fifty songs keep appearing. They open Amazon and the top results are always the items they have already considered. They open YouTube and every recommendation is one click away from a rabbit hole they cannot remember asking for. Each of these symptoms has a name, a cause, and a fix. This article is about all three.&lt;/p>
&lt;/blockquote>
&lt;h2 id="what-you-will-learn">What You Will Learn&lt;/h2>
&lt;ul>
&lt;li>The &lt;strong>seven biases&lt;/strong> that systematically distort what users see, where each one comes from, and how to measure it&lt;/li>
&lt;li>&lt;strong>Causal inference for recommenders&lt;/strong> — why correlations from logged data lie, and how IPS, doubly robust estimators, and propensity scoring give you unbiased signal&lt;/li>
&lt;li>&lt;strong>Production-grade debiasing&lt;/strong>: MACR for popularity bias, DICE for conformity bias, FairCo for amortized exposure fairness&lt;/li>
&lt;li>&lt;strong>Counterfactual fairness&lt;/strong> and adversarial training to keep protected attributes out of embeddings&lt;/li>
&lt;li>&lt;strong>Explainability that holds up under audit&lt;/strong>: LIME, SHAP, and counterfactual explanations&lt;/li>
&lt;li>A working &lt;strong>trade-off framework&lt;/strong> so you can pick where to operate on the accuracy–fairness Pareto frontier&lt;/li>
&lt;/ul>
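&lt;p>To make the IPS bullet concrete, here is a minimal inverse-propensity estimator sketch (toy data and names are mine, not the article&amp;rsquo;s): reweight each logged reward by how the target policy&amp;rsquo;s exposure probability compares to the logging policy&amp;rsquo;s, so over-exposed popular items stop dominating the estimate.&lt;/p>

```python
import numpy as np

def ips_estimate(rewards, p_log, p_target):
    # inverse propensity scoring: each logged outcome is reweighted by
    # p_target / p_log, debiasing the logging policy's exposure skew
    w = p_target / p_log
    return float(np.mean(w * rewards))

# toy log: a popular item shown 90% of the time, a tail item 10%
rewards  = np.array([1.0, 0.0, 1.0, 1.0])   # observed clicks
p_log    = np.array([0.9, 0.9, 0.1, 0.9])   # logging policy exposure probs
p_target = np.full(4, 0.5)                  # evaluate a uniform-exposure policy
estimate = ips_estimate(rewards, p_log, p_target)
```

&lt;p>When the target and logging policies coincide, the weights are all 1 and the estimator reduces to the plain empirical mean &amp;ndash; a quick sanity check on any IPS implementation.&lt;/p>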
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Embedding-based recommenders (&lt;a href="https://www.chenk.top/en/recommendation-systems-04-ctr-prediction/">Part 4&lt;/a> and &lt;a href="https://www.chenk.top/en/recommendation-systems-05-embedding-techniques/">Part 5&lt;/a>)&lt;/li>
&lt;li>Basic causal inference vocabulary helps but is not required — we build it from scratch&lt;/li>
&lt;li>Comfortable reading PyTorch-style pseudocode&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="part-1--the-seven-biases">Part 1 — The Seven Biases&lt;/h2>
&lt;p>Bias in a recommender is not one problem. It is at least seven, and they compound. Below is the working taxonomy used in the survey of Chen et al. (2023, &lt;em>Bias and Debias in Recommender System&lt;/em>) — the cleanest reference if you want the full literature map.&lt;/p></description></item><item><title>Recommendation Systems (12): Large Language Models and Recommendation</title><link>https://www.chenk.top/en/recommendation-systems/12-llm-recommendation/</link><pubDate>Sat, 03 Jan 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/12-llm-recommendation/</guid><description>&lt;p>A user opens a movie app and types: &lt;em>&amp;ldquo;Something like Inception, but less depressing.&amp;rdquo;&lt;/em> A traditional recommender — collaborative filtering, two-tower DNN, even DIN — sees zero useful tokens here. It has no &lt;code>like&lt;/code> button to count, no co-watch graph to traverse, no user ID with history. The query has to be turned into IDs before the system can do anything.&lt;/p>
&lt;p>A Large Language Model has the opposite problem: it has &lt;em>too much&lt;/em> world knowledge but doesn&amp;rsquo;t know who this user is. It knows Inception is a Christopher Nolan film with non-linear narrative and a hopeful-but-ambiguous ending; it knows what &amp;ldquo;depressing&amp;rdquo; means in cinema; it can name twenty films that fit. But it can&amp;rsquo;t tell you which of those twenty the &lt;em>current&lt;/em> user has already seen, rated badly, or left half-watched.&lt;/p></description></item><item><title>AI Agents Complete Guide: From Theory to Industrial Practice</title><link>https://www.chenk.top/en/standalone/ai-agents-complete-guide/</link><pubDate>Wed, 31 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/ai-agents-complete-guide/</guid><description>&lt;p>A chatbot answers questions. An &lt;em>agent&lt;/em> gets things done &amp;ndash; it browses, runs code, calls APIs, queries databases, and iterates until the job is finished. The same LLM sits behind both, but the wrapper is different: an agent runs inside a loop with tools, memory, and the ability to inspect its own work.&lt;/p>
&lt;p>This guide is the long-form version of that idea. It covers the four core capabilities (planning, memory, tool use, reflection), the major framework families, multi-agent collaboration, evaluation, and the production concerns that decide whether an agent ships or quietly fails on a Tuesday afternoon.&lt;/p></description></item><item><title>Recommendation Systems (11): Contrastive Learning and Self-Supervised Learning</title><link>https://www.chenk.top/en/recommendation-systems/11-contrastive-learning/</link><pubDate>Wed, 31 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/11-contrastive-learning/</guid><description>&lt;p>Classical recommenders learn from one signal: did a user click, watch, or buy? That signal is precious, but it is also brutally sparse. Most users touch fewer than 1% of the catalogue, most items are touched by fewer than 0.1% of users, and a brand-new item or user has nothing at all. Optimising a model directly against such sparse labels almost guarantees overfitting on the head and silence on the tail.&lt;/p></description></item><item><title>Recommendation Systems (10): Deep Interest Networks and Attention Mechanisms</title><link>https://www.chenk.top/en/recommendation-systems/10-deep-interest-networks/</link><pubDate>Sun, 28 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/10-deep-interest-networks/</guid><description>&lt;p>A good chef doesn&amp;rsquo;t cook the same dish for every guest. She watches you walk in, notes the wine you order, glances at how you eyed the chalkboard — and only then decides whether tonight&amp;rsquo;s special should be the steak or the risotto. Your past visits matter, but only the parts that fit &lt;em>this&lt;/em> mood.&lt;/p>
&lt;p>A recommendation model used to be a worse chef. It would take everything the user had ever clicked, average it into a single vector, and serve the same dish to everyone in the room. That vintage leather jacket you viewed last week and the random phone charger you clicked six months ago carried equal weight, regardless of what you were looking at right now.&lt;/p></description></item><item><title>Recommendation Systems (9): Multi-Task Learning and Multi-Objective Optimization</title><link>https://www.chenk.top/en/recommendation-systems/09-multi-task-learning/</link><pubDate>Thu, 25 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/09-multi-task-learning/</guid><description>&lt;p>A live e-commerce ranker is never optimizing one number. The same model that decides which product to show you is, in the same forward pass, predicting whether you will click, whether you will add it to cart, whether you will pay, whether you will return it, and whether you will leave a positive review. Each prediction is a different &lt;em>task&lt;/em> with its own data distribution, its own scarcity, and its own incentives. They are also tightly coupled: a clicker is more likely to convert, a converter is more likely to write a review, and a high-CTR thumbnail can buy clicks that depress watch time.&lt;/p></description></item><item><title>Recommendation Systems (8): Knowledge Graph-Enhanced Recommendation</title><link>https://www.chenk.top/en/recommendation-systems/08-knowledge-graph/</link><pubDate>Mon, 22 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/08-knowledge-graph/</guid><description>&lt;p>When you search for &lt;em>The Dark Knight&lt;/em> on a streaming platform, the system does not merely log that you watched it. It knows Christian Bale played Batman, Christopher Nolan directed it, it belongs to the Batman trilogy, and it shares cinematic DNA with other cerebral action films. 
This rich semantic web is a &lt;strong>knowledge graph (KG)&lt;/strong> &amp;ndash; a structured network of entities (movies, actors, directors, genres) connected by typed relations (&lt;code>acted_in&lt;/code>, &lt;code>directed_by&lt;/code>, &lt;code>part_of&lt;/code>).&lt;/p></description></item><item><title>Recommendation Systems (7): Graph Neural Networks and Social Recommendation</title><link>https://www.chenk.top/en/recommendation-systems/07-graph-neural-networks/</link><pubDate>Fri, 19 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/07-graph-neural-networks/</guid><description>&lt;p>When Netflix decides what to recommend next, it does not look at your watch history in isolation. Behind the scenes there is a web of relationships: movies that share actors, users with overlapping taste, ratings that ripple through the catalogue. The &amp;ldquo;graph&amp;rdquo; view is not a metaphor — every interaction matrix &lt;em>is&lt;/em> a graph, and treating it as one unlocks ideas that flat user/item embeddings cannot express.&lt;/p>
&lt;p>&lt;strong>Graph neural networks&lt;/strong> (GNNs) are the tool that lets us reason over that graph. Instead of learning each user and each item in isolation, a GNN says: &lt;em>your representation is shaped by the company you keep.&lt;/em> That single shift powers Pinterest&amp;rsquo;s billion-node PinSage, the strikingly simple LightGCN that beats heavier baselines on collaborative filtering, and the social-recommendation systems that fuse &amp;ldquo;what you watched&amp;rdquo; with &amp;ldquo;what your friends watched.&amp;rdquo;&lt;/p></description></item><item><title>Recommendation Systems (6): Sequential Recommendation and Session-based Modeling</title><link>https://www.chenk.top/en/recommendation-systems/06-sequential-recommendation/</link><pubDate>Tue, 16 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/06-sequential-recommendation/</guid><description>&lt;p>When you scroll TikTok, every recommendation feels eerily on-point — not because the system reads your mind, but because it reads the &lt;strong>order&lt;/strong> of what you just watched. A cooking video followed by a travel vlog tells a different story than the same two clips in reverse. That ordering is exactly the signal that sequential recommenders are built to exploit.&lt;/p>
&lt;p>Compare two friends recommending shows. The first knows your favourite genres but never asks what you watched last week. The second says, &lt;em>&amp;ldquo;You just finished three sci-fi thrillers in a row — try this one.&amp;rdquo;&lt;/em> Traditional collaborative filtering is friend one. Sequential recommendation is friend two.&lt;/p></description></item><item><title>Recommendation Systems (5): Embedding and Representation Learning</title><link>https://www.chenk.top/en/recommendation-systems/05-embedding-techniques/</link><pubDate>Sat, 13 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/05-embedding-techniques/</guid><description>&lt;p>When Netflix suggests &lt;em>Inception&lt;/em> to someone who just finished &lt;em>The Dark Knight&lt;/em>, the magic is not a hand-crafted &amp;ldquo;if-watched-Nolan-then&amp;rdquo; rule. It is geometry. Both films sit close together in a 128-dimensional &lt;strong>embedding space&lt;/strong> that the model has learned from billions of viewing events. Geometry replaces enumeration: instead of comparing a movie to fifteen thousand others through brittle similarity rules, the system asks a single question — &lt;strong>how far apart are these two vectors?&lt;/strong>&lt;/p></description></item><item><title>Recommendation Systems (4): CTR Prediction and Click-Through Rate Modeling</title><link>https://www.chenk.top/en/recommendation-systems/04-ctr-prediction/</link><pubDate>Wed, 10 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/04-ctr-prediction/</guid><description>&lt;p>Every time you scroll through a social-media feed, click a product recommendation, or watch a suggested video, a CTR (click-through rate) model decided what to show you. These models answer one deceptively small question:&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>&amp;ldquo;What is the probability that this specific user will click on this specific item, right now?&amp;rdquo;&lt;/strong>&lt;/p>
&lt;/blockquote>
&lt;p>Behind that question is one of the most economically valuable problems in machine learning. A 1% lift in CTR translates into millions of dollars at Google, Amazon, or Alibaba scale &amp;ndash; and the same models also drive video feeds, app stores, news apps, and dating apps. CTR prediction sits at the heart of the &lt;strong>ranking&lt;/strong> stage: candidate generation gives you a few thousand items, and the CTR model decides which dozen actually reach the user.&lt;/p></description></item><item><title>Recommendation Systems (3): Deep Learning Foundations</title><link>https://www.chenk.top/en/recommendation-systems/03-deep-learning-basics/</link><pubDate>Sun, 07 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/03-deep-learning-basics/</guid><description>&lt;p>In June 2016, Google published a one-page paper that quietly redrew the map of recommendation systems. The paper described &lt;strong>Wide &amp;amp; Deep Learning&lt;/strong>, the model then powering app recommendations inside Google Play &amp;ndash; a billion-user product. Within a year, every major tech company had a deep model in production. By 2019, the industry standard had shifted: matrix factorization was a baseline, not a system.&lt;/p>
&lt;p>What changed? Multi-layer neural networks brought four capabilities classical methods could not deliver:&lt;/p></description></item><item><title>Recommendation Systems (2): Collaborative Filtering and Matrix Factorization</title><link>https://www.chenk.top/en/recommendation-systems/02-collaborative-filtering/</link><pubDate>Thu, 04 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/02-collaborative-filtering/</guid><description>&lt;p>You finish &lt;em>The Shawshank Redemption&lt;/em> and want something with the same feeling. A genre filter would surface every prison drama ever made, most of them awful. Collaborative filtering takes a different route: it never looks at the movie itself. It looks at &lt;em>people who watched what you watched&lt;/em> and asks what else they loved.&lt;/p>
&lt;p>That single idea — let the crowd&amp;rsquo;s behaviour speak — powers Amazon, YouTube, Spotify and every modern feed. This article unpacks the algorithms behind it, from the neighbourhood methods of the 1990s to the matrix-factorization models that won the Netflix Prize.&lt;/p></description></item><item><title>Recommendation Systems (1): Fundamentals and Core Concepts</title><link>https://www.chenk.top/en/recommendation-systems/01-fundamentals/</link><pubDate>Mon, 01 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/01-fundamentals/</guid><description>&lt;p>Open Netflix and the homepage somehow knows you. Scroll TikTok and the next video is the one you didn&amp;rsquo;t realise you wanted. Drop into Spotify on a Monday morning and &lt;em>Discover Weekly&lt;/em> serves up thirty songs you&amp;rsquo;ve never heard of, and you save half of them.&lt;/p>
&lt;p>None of this is magic. It is one of the most commercially successful applications of machine learning, quietly running behind almost every consumer product you use: the &lt;strong>recommendation system&lt;/strong>.&lt;/p></description></item><item><title>NLP (12): Frontiers and Practical Applications</title><link>https://www.chenk.top/en/nlp/frontiers-applications/</link><pubDate>Tue, 25 Nov 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/frontiers-applications/</guid><description>&lt;p>We have spent eleven chapters climbing from raw text to multimodal foundation models. This twelfth and final chapter sits at the frontier and at the runway. It is where research stops being a paper and starts being a service: an LLM that calls tools, writes and debugs code, reasons through hundred-step problems, ingests a 200K-token contract, and serves a thousand concurrent users behind a FastAPI endpoint with p95 latency under 300 ms.&lt;/p></description></item><item><title>NLP (11): Multimodal Large Language Models</title><link>https://www.chenk.top/en/nlp/multimodal-nlp/</link><pubDate>Thu, 20 Nov 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/multimodal-nlp/</guid><description>&lt;p>Humans never perceive the world in one channel at a time. We watch a chart while reading the caption, hear a tone of voice while reading a face, glance at a screenshot while debating a bug. Pure-text language models are deaf and blind to all of that. 
&lt;strong>Multimodal Large Language Models (MLLMs)&lt;/strong> close the gap by aligning images, audio, and video into the same representation space the language model already speaks.&lt;/p></description></item><item><title>NLP (10): RAG and Knowledge Enhancement Systems</title><link>https://www.chenk.top/en/nlp/rag-knowledge-enhancement/</link><pubDate>Sat, 15 Nov 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/rag-knowledge-enhancement/</guid><description>&lt;p>A frozen language model is a confident liar. It cannot read yesterday&amp;rsquo;s incident report, your company wiki, or the patch notes that shipped this morning, so when you ask, it confabulates an answer that is grammatically perfect and factually wrong. &lt;strong>Retrieval-Augmented Generation (RAG)&lt;/strong> breaks the deadlock by separating &lt;em>memory&lt;/em> from &lt;em>reasoning&lt;/em>: keep the LLM small and stable, and put the volatile knowledge in an external store that you can update at any time. Before generating, retrieve the relevant evidence and condition the model on it.&lt;/p></description></item><item><title>NLP (9): Deep Dive into LLM Architecture</title><link>https://www.chenk.top/en/nlp/llm-architecture-deep-dive/</link><pubDate>Mon, 10 Nov 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/llm-architecture-deep-dive/</guid><description>&lt;p>The 2017 Transformer paper drew one block. Every production LLM today still uses that diagram as a silhouette, but almost every internal piece has been replaced. Pre-norm replaced post-norm. RMSNorm replaced LayerNorm. SwiGLU replaced GELU. Rotary embeddings replaced sinusoids. Multi-head attention became grouped-query attention. The dense FFN sometimes became a sparse mixture of experts. 
And the inference loop is dominated by a data structure that doesn&amp;rsquo;t appear in the original paper at all: the KV cache.&lt;/p></description></item><item><title>NLP (8): Model Fine-tuning and PEFT</title><link>https://www.chenk.top/en/nlp/fine-tuning-peft/</link><pubDate>Wed, 05 Nov 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/fine-tuning-peft/</guid><description>&lt;p>In 2020, fine-tuning a 7-billion-parameter language model was a project budget item: eight A100s, several days, and an engineer who knew how to babysit gradient checkpointing. In 2024, a graduate student does it on a laptop. The distance between those two worlds is almost entirely covered by one paper — Hu et al.&amp;rsquo;s LoRA (ICLR 2022) — and one follow-up — Dettmers et al.&amp;rsquo;s QLoRA (NeurIPS 2023).&lt;/p>
&lt;p>The shift is not just engineering. Parameter-Efficient Fine-Tuning (PEFT) reframes what it means to &amp;ldquo;have a model.&amp;rdquo; Instead of one binary blob per task, you keep a single frozen base model and a directory of small adapter files, each a few tens of megabytes. Switching tasks becomes loading a new adapter; serving N domains becomes O(1) base + N · ε.&lt;/p></description></item><item><title>NLP (7): Prompt Engineering and In-Context Learning</title><link>https://www.chenk.top/en/nlp/prompt-engineering-icl/</link><pubDate>Fri, 31 Oct 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/prompt-engineering-icl/</guid><description>&lt;p>The same model can produce a sharp answer or a confident hallucination. The difference is rarely the weights &amp;ndash; it is the framing. A vague request like &lt;em>&amp;ldquo;analyze this text&amp;rdquo;&lt;/em> gets you a generic summary; a prompt with a role, two clean examples, and a strict output schema gets you something a parser can consume. &lt;strong>Prompt engineering is the discipline of turning that gap into a repeatable system instead of a lucky shot.&lt;/strong>&lt;/p></description></item><item><title>NLP Part 6: GPT and Generative Language Models</title><link>https://www.chenk.top/en/nlp/gpt-generative-models/</link><pubDate>Sun, 26 Oct 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/gpt-generative-models/</guid><description>&lt;p>When you ask ChatGPT a question and a fluent multi-paragraph answer streams back token by token, you are watching a single deceptively simple loop: feed everything-so-far into a Transformer decoder, look at the probability distribution it produces over the vocabulary, pick one token, append it, repeat. That is &lt;em>all&lt;/em> an autoregressive language model does. 
The miracle is not the loop &amp;ndash; it is what happens when you scale the network behind the loop to hundreds of billions of parameters and train it on most of the internet.&lt;/p></description></item><item><title>NLP Part 5: BERT and Pretrained Models</title><link>https://www.chenk.top/en/nlp/bert-pretrained-models/</link><pubDate>Tue, 21 Oct 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/bert-pretrained-models/</guid><description>&lt;p>In October 2018, Google released BERT and broke eleven NLP benchmarks at once. The recipe is almost embarrassingly simple: take a Transformer encoder, train it to predict words that have been randomly hidden using both left and right context, and then fine-tune the same pretrained model for whatever downstream task you have. Before BERT, every task came with its own from-scratch model. After BERT, &amp;ldquo;pretrain once, fine-tune everywhere&amp;rdquo; became the default mental model for the entire field.&lt;/p></description></item><item><title>NLP Part 4: Attention Mechanism and Transformer</title><link>https://www.chenk.top/en/nlp/attention-transformer/</link><pubDate>Thu, 16 Oct 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/attention-transformer/</guid><description>&lt;p>In June 2017, eight researchers at Google Brain and Google Research published a paper with a deliberately bold title: &lt;em>Attention Is All You Need&lt;/em>. The architecture it introduced, the &lt;strong>Transformer&lt;/strong>, threw away recurrence entirely. There were no LSTMs, no GRUs, no left-to-right scanning of a sentence. Instead, every token in a sequence could look at every other token directly through a single mathematical operation: scaled dot-product attention.&lt;/p>
&lt;p>That one design decision unlocked massive parallelism on GPUs, eliminated the long-range dependency problems that had plagued RNNs for decades, and became the substrate on which BERT, GPT, T5, LLaMA, Claude, and essentially every modern large language model is built. If you understand this article well, the rest of the series is mostly variations on a theme.&lt;/p></description></item><item><title>Prompt Engineering Complete Guide: From Zero to Advanced Optimization</title><link>https://www.chenk.top/en/standalone/prompt-engineering-complete-guide/</link><pubDate>Wed, 15 Oct 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/prompt-engineering-complete-guide/</guid><description>&lt;p>The same model, two prompts: one gets 17% accuracy on grade-school math, the other gets 78%. The difference is not magic — it is prompt engineering. This guide shows you the techniques that work, the research behind them, and how to systematically optimize prompts for production.&lt;/p>
&lt;h2 id="what-you-will-learn">What you will learn&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Foundations&lt;/strong> — zero-shot, few-shot, many-shot, task decomposition, and the five-block prompt skeleton.&lt;/li>
&lt;li>&lt;strong>Reasoning techniques&lt;/strong> — Chain-of-Thought, Self-Consistency, Tree of Thoughts, Graph of Thoughts, ReAct.&lt;/li>
&lt;li>&lt;strong>Automation&lt;/strong> — Automatic Prompt Engineering (APE), DSPy, LLMLingua compression.&lt;/li>
&lt;li>&lt;strong>Practical templates&lt;/strong> — structured output, code generation, data extraction, multi-turn chat.&lt;/li>
&lt;li>&lt;strong>Evaluation and debugging&lt;/strong> — metrics, A/B testing, error analysis, the failure-mode toolkit.&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Prerequisites.&lt;/strong> Basic Python; experience calling any LLM API. No math background required.&lt;/p></description></item><item><title>NLP Part 3: RNN and Sequence Modeling</title><link>https://www.chenk.top/en/nlp/rnn-sequence-modeling/</link><pubDate>Sat, 11 Oct 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/rnn-sequence-modeling/</guid><description>&lt;p>Open Google Translate, swipe-type a message, dictate a memo to your phone — every one of these systems must consume an ordered stream of tokens and produce another. A feed-forward network treats each input independently, but language is fundamentally &lt;strong>sequential&lt;/strong>: the meaning of &amp;ldquo;mat&amp;rdquo; in &lt;em>the cat sat on the mat&lt;/em> depends on every word that came before. Recurrent Neural Networks (RNNs) handle this by maintaining a &lt;strong>hidden state&lt;/strong> that evolves as they consume each token. The hidden state is the network&amp;rsquo;s running summary of the past — its memory.&lt;/p></description></item><item><title>NLP Part 2: Word Embeddings and Language Models</title><link>https://www.chenk.top/en/nlp/word-embeddings-lm/</link><pubDate>Mon, 06 Oct 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/word-embeddings-lm/</guid><description>&lt;p>For decades, machines treated &amp;ldquo;king&amp;rdquo; and &amp;ldquo;queen&amp;rdquo; as unrelated symbols &amp;ndash; nothing more than two distinct slots in a vocabulary list. Then a single idea changed everything: what if every word lived in a continuous space, and meaning was just a &lt;em>direction&lt;/em>? Once that idea took hold, models could compute&lt;/p>
$$\vec{\text{king}} - \vec{\text{man}} + \vec{\text{woman}} \approx \vec{\text{queen}}$$&lt;p>and the entire trajectory of NLP turned toward representation learning. This article walks through that turn &amp;ndash; from the failure of one-hot vectors, to Word2Vec&amp;rsquo;s shallow networks, to the global statistics that GloVe exploits, to the subword n-grams that let FastText handle words it has never seen &amp;ndash; and finally connects embeddings to the language models that gave rise to them.&lt;/p></description></item><item><title>NLP Part 1: Introduction and Text Preprocessing</title><link>https://www.chenk.top/en/nlp/introduction-and-preprocessing/</link><pubDate>Wed, 01 Oct 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/introduction-and-preprocessing/</guid><description>&lt;p>Every time you ask Claude a question, autocomplete a sentence in Gmail, or read a Google Translate page, you are touching a stack that took seventy years to assemble. Natural Language Processing is the discipline that taught machines to read, score, transform, and write human language &amp;ndash; and the surprising thing is how much of the modern stack still rests on a small set of preprocessing primitives invented decades ago.&lt;/p></description></item><item><title>Reinforcement Learning (12): RLHF and LLM Applications</title><link>https://www.chenk.top/en/reinforcement-learning/12-rlhf-and-llm-applications/</link><pubDate>Thu, 25 Sep 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/12-rlhf-and-llm-applications/</guid><description>&lt;p>GPT-3 (June 2020) and ChatGPT (November 2022) share most of their weights. The base model could write fluent prose, complete code, and continue any pattern you gave it — and yet, asked a plain question, it would happily ramble, refuse for the wrong reasons, hallucinate citations, or produce a paragraph of toxicity. The two and a half years between them were not spent on bigger transformers. 
They were spent learning &lt;strong>how to ask the model to be useful&lt;/strong> — and that turned out to be a reinforcement-learning problem.&lt;/p></description></item><item><title>Low-Rank Matrix Approximation and the Pseudoinverse: From SVD to Regularization</title><link>https://www.chenk.top/en/standalone/low-rank-approximation-pseudoinverse/</link><pubDate>Mon, 22 Sep 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/low-rank-approximation-pseudoinverse/</guid><description>&lt;p>Real data matrices are almost never both square and full rank: correlated features, too few samples, and noise-induced ill-conditioning all make &amp;ldquo;matrix inverse&amp;rdquo; either undefined or numerically useless. The &lt;strong>pseudoinverse&lt;/strong> (Moore-Penrose inverse) preserves the &lt;em>spirit&lt;/em> of an inverse while dropping the impossible-to-meet requirements: it redefines the &amp;ldquo;solution&amp;rdquo; of a linear system as the &lt;strong>least-squares solution&lt;/strong>, breaking ties by picking the one with &lt;strong>minimum norm&lt;/strong>. This post derives the pseudoinverse from that least-squares viewpoint, gives the four Penrose conditions, builds it from the SVD, and connects this single object to &lt;strong>the Eckart-Young low-rank approximation theorem&lt;/strong>, &lt;strong>PCA&lt;/strong>, &lt;strong>recommender-system matrix factorization&lt;/strong>, and &lt;strong>LoRA fine-tuning&lt;/strong>.&lt;/p></description></item><item><title>Reinforcement Learning (11): Hierarchical RL and Meta-Learning</title><link>https://www.chenk.top/en/reinforcement-learning/11-hierarchical-and-meta-rl/</link><pubDate>Sat, 20 Sep 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/11-hierarchical-and-meta-rl/</guid><description>&lt;p>Standard RL treats every problem as a flat sequence of atomic decisions: observe state, pick an action, receive a reward, repeat. 
That works when the horizon is short and rewards are dense, but it breaks down on the kind of tasks humans solve effortlessly. &amp;ldquo;Make breakfast&amp;rdquo; is not one decision; it is a tree of subtasks &amp;mdash; &lt;em>brew coffee&lt;/em>, &lt;em>fry eggs&lt;/em>, &lt;em>toast bread&lt;/em>, &lt;em>plate it up&lt;/em> &amp;mdash; each of which is itself a small policy. &lt;strong>Hierarchical RL (HRL)&lt;/strong> lets agents reason and act at multiple timescales by treating macro-actions as first-class citizens.&lt;/p></description></item><item><title>Reinforcement Learning (10): Offline Reinforcement Learning</title><link>https://www.chenk.top/en/reinforcement-learning/10-offline-reinforcement-learning/</link><pubDate>Mon, 15 Sep 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/10-offline-reinforcement-learning/</guid><description>&lt;p>Every algorithm we have studied so far has the same loop at its core: act, observe, update. That loop is what makes RL work, but it is also what stops RL from being deployed. A self-driving stack cannot rehearse intersections by crashing into them. A clinical decision-support model cannot run a randomized policy on actual patients. A factory robot cannot try ten thousand grasp variants on a production line.&lt;/p>
&lt;p>What these settings &lt;em>do&lt;/em> have is logs &amp;ndash; millions of hours of human driving, decades of de-identified patient records, terabytes of behavior-cloning data. &lt;strong>Offline RL&lt;/strong> (also called &lt;em>batch RL&lt;/em>) is the subfield that asks: can we squeeze a strong policy out of a fixed dataset, with &lt;strong>zero new interaction&lt;/strong> with the environment?&lt;/p></description></item><item><title>Reinforcement Learning (9): Multi-Agent Reinforcement Learning</title><link>https://www.chenk.top/en/reinforcement-learning/09-multi-agent-rl/</link><pubDate>Wed, 10 Sep 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/09-multi-agent-rl/</guid><description>&lt;p>Single-agent RL rests on one quiet but enormous assumption: the environment is stationary. The transition kernel does not change while the agent learns. The moment a second learner shares the world, that assumption collapses. Each agent now sees an environment whose dynamics shift as its peers update, rewards become entangled across agents, and the joint action space explodes combinatorially. These are not engineering nuisances. They are the reason multi-agent RL needs its own algorithms instead of just &lt;em>running DQN n times in parallel&lt;/em>.&lt;/p>
Go has roughly $10^{170}$ legal positions, more than the number of atoms in the observable universe. No amount of brute-force search will ever crack it. AlphaGo&amp;rsquo;s victory came from a different idea: let a deep network supply the &lt;em>intuition&lt;/em> about which moves look promising, and let Monte Carlo Tree Search (MCTS) supply the &lt;em>deliberation&lt;/em> that verifies and sharpens that intuition.&lt;/p></description></item><item><title>Reinforcement Learning (7): Imitation Learning and Inverse RL</title><link>https://www.chenk.top/en/reinforcement-learning/07-imitation-learning/</link><pubDate>Sun, 31 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/07-imitation-learning/</guid><description>&lt;p>Every algorithm in the previous chapters assumed access to a reward function. In practice, &lt;em>designing&lt;/em> that reward is often the hardest part of an RL project. Try writing one paragraph that captures &amp;ldquo;drive like a careful human&amp;rdquo;, &amp;ldquo;fold a shirt the way a tailor would&amp;rdquo;, or &amp;ldquo;summarise this document the way an expert editor would&amp;rdquo;. You can &lt;em>show&lt;/em> those behaviours far more easily than you can &lt;em>specify&lt;/em> them.&lt;/p>
&lt;p>Imitation learning takes that intuition seriously: instead of optimising a hand-engineered scalar, it learns from expert demonstrations $\mathcal{D} = \{(s_t, a_t)\}$. This chapter walks through the four canonical methods &amp;ndash; behavioral cloning, DAgger, maximum-entropy IRL, and GAIL/AIRL &amp;ndash; not as isolated tricks but as a single ladder where each rung relaxes one assumption and pays for it with new structure.&lt;/p></description></item><item><title>Reinforcement Learning (6): PPO and TRPO -- Trust Region Policy Optimization</title><link>https://www.chenk.top/en/reinforcement-learning/06-ppo-and-trpo/</link><pubDate>Tue, 26 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/06-ppo-and-trpo/</guid><description>&lt;p>Policy gradients (Part 3) optimise the policy directly, sidestepping discrete &lt;code>argmax&lt;/code> operators and naturally handling stochastic strategies. They have one fatal flaw: &lt;strong>a single overlong step can destroy the policy&lt;/strong>, and because the data distribution is &lt;em>coupled&lt;/em> to the policy, recovery is nearly impossible.&lt;/p>
&lt;p>&lt;strong>Trust-region methods&lt;/strong> make this concrete: bound the change in &lt;em>behaviour&lt;/em>, not in parameters, at every update. TRPO does it through a hard KL constraint and a second-order solver. PPO mimics the same effect with one line of clipped arithmetic. The cheaper trick won: PPO trains OpenAI Five, ChatGPT&amp;rsquo;s RLHF stage, almost every modern robotics policy, and remains the workhorse of applied deep RL.&lt;/p></description></item><item><title>Reinforcement Learning (5): Model-Based RL and World Models</title><link>https://www.chenk.top/en/reinforcement-learning/05-model-based-rl-and-world-models/</link><pubDate>Thu, 21 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/05-model-based-rl-and-world-models/</guid><description>&lt;p>Every algorithm we have covered so far &amp;ndash; DQN, REINFORCE, A2C, PPO, SAC &amp;ndash; is &lt;strong>model-free&lt;/strong>: the agent treats the environment as a black box, throws actions at it, and updates its policy from the rewards that come back. The approach works, but it is profligate. DQN needs roughly &lt;strong>10 million frames&lt;/strong> to master Atari Pong. OpenAI Five trained on Dota 2 for the equivalent of &lt;strong>~45,000 years&lt;/strong> of self-play. AlphaStar consumed years of StarCraft for a single agent.&lt;/p></description></item><item><title>Reinforcement Learning (4): Exploration Strategies and Curiosity-Driven Learning</title><link>https://www.chenk.top/en/reinforcement-learning/04-exploration-and-curiosity-driven-learning/</link><pubDate>Sat, 16 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/04-exploration-and-curiosity-driven-learning/</guid><description>&lt;p>Drop a fresh agent into Montezuma&amp;rsquo;s Revenge. 
To score a single point it must walk to the right, jump a skull, climb a rope, leap to a platform, and grab a key &amp;ndash; roughly &lt;strong>a hundred precise actions in a row&lt;/strong>. Until that key is collected, every reward signal is exactly zero.&lt;/p>
&lt;p>A textbook DQN with $\varepsilon=0.1$ exploration has, by a generous estimate, a $0.1^{100} \approx 10^{-100}$ chance of stumbling onto that key by accident. Unsurprisingly, vanilla DQN scores &lt;strong>0&lt;/strong> on this game. Not &amp;ldquo;low&amp;rdquo; &amp;ndash; literally zero, every episode, for the entire training run.&lt;/p></description></item><item><title>Reinforcement Learning (3): Policy Gradient and Actor-Critic Methods</title><link>https://www.chenk.top/en/reinforcement-learning/03-policy-gradient-and-actor-critic/</link><pubDate>Mon, 11 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/03-policy-gradient-and-actor-critic/</guid><description>&lt;p>DQN proved that deep RL can master Atari, but it has a hard ceiling: it only works in &lt;strong>discrete action spaces&lt;/strong>. Ask it to control a robot arm with seven continuous joint angles and it falls apart &amp;ndash; you would have to solve an inner optimisation problem every time you choose an action.&lt;/p>
&lt;p>&lt;strong>Policy gradient methods&lt;/strong> take a fundamentally different route. Instead of learning a value function and &lt;em>deriving&lt;/em> a policy from it, they &lt;strong>directly optimise the policy&lt;/strong>. That single change opens the door to continuous actions, stochastic strategies, and problems where the optimal play is itself random (think rock-paper-scissors).&lt;/p></description></item><item><title>Reinforcement Learning (2): Q-Learning and Deep Q-Networks (DQN)</title><link>https://www.chenk.top/en/reinforcement-learning/02-q-learning-and-dqn/</link><pubDate>Wed, 06 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/02-q-learning-and-dqn/</guid><description>&lt;p>In December 2013, a small DeepMind team uploaded a paper to arXiv with a striking claim: a single neural network, trained from raw pixels and the score, learned to play seven Atari games &amp;ndash; and beat the previous best on six of them. No game-specific features. No hand-coded heuristics. The same architecture for Pong, Breakout, and Space Invaders. The algorithm was &lt;strong>Deep Q-Network (DQN)&lt;/strong>, and it kicked off the deep reinforcement learning era.&lt;/p></description></item><item><title>Reinforcement Learning (1): Fundamentals and Core Concepts</title><link>https://www.chenk.top/en/reinforcement-learning/01-fundamentals-and-core-concepts/</link><pubDate>Fri, 01 Aug 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/reinforcement-learning/01-fundamentals-and-core-concepts/</guid><description>&lt;p>The first time you sat on a bicycle, nobody handed you a manual that said &lt;em>&amp;ldquo;if your tilt angle exceeds 7.4 degrees, apply 12% counter-steer.&amp;rdquo;&lt;/em> You wobbled, you over-corrected, you fell, you got back on. After a few hundred attempts your body simply &lt;em>knew&lt;/em> what to do, even though you could not put it into words.&lt;/p>
&lt;p>That trial-feedback-improvement loop is not just how we learn to ride bikes. It is how AlphaGo learned to defeat the world Go champion, how Boston Dynamics robots learn to walk, and how recommendation systems quietly improve every time you click. They all share one mathematical framework called &lt;strong>reinforcement learning&lt;/strong> (RL).&lt;/p></description></item><item><title>Reparameterization Trick &amp; Gumbel-Softmax: A Deep Dive</title><link>https://www.chenk.top/en/standalone/reparameterization-gumbel-softmax/</link><pubDate>Thu, 24 Jul 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/reparameterization-gumbel-softmax/</guid><description>&lt;p>The moment your model contains a sampling step, training hits a hard wall: &lt;strong>how do gradients flow through a random node?&lt;/strong>&lt;/p>
&lt;p>The reparameterization trick has a clean answer — rewrite $z\sim p_\theta(z)$ as $z=g_\theta(\epsilon)$, isolating the randomness in a parameter-free noise variable $\epsilon$, so backprop can flow through $g_\theta$. The trouble starts with discrete variables: operations like $\arg\max$ are not differentiable. &lt;strong>Gumbel-Softmax&lt;/strong> (a.k.a. the Concrete distribution) replaces the discrete sample with a tempered softmax over Gumbel-perturbed logits, giving you a smooth, differentiable surrogate that you can train end-to-end.&lt;/p></description></item><item><title>Transfer Learning (12): Industrial Applications and Best Practices</title><link>https://www.chenk.top/en/transfer-learning/12-industrial-applications-and-best-practices/</link><pubDate>Sun, 06 Jul 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/12-industrial-applications-and-best-practices/</guid><description>&lt;p>This is the final part of the series. The previous eleven parts gave you the mechanics &amp;ndash; pretraining, fine-tuning, domain adaptation, few-shot and zero-shot learning, distillation, multi-task learning, multimodality, parameter-efficient methods, continual learning, and cross-lingual transfer. This part is about the work that happens once the notebook closes: deciding &lt;strong>whether&lt;/strong> to use transfer learning, &lt;strong>how&lt;/strong> to thread it into a production pipeline, and &lt;strong>how&lt;/strong> to know it is still working six months later.&lt;/p></description></item><item><title>Transfer Learning (11): Cross-Lingual Transfer</title><link>https://www.chenk.top/en/transfer-learning/11-cross-lingual-transfer/</link><pubDate>Mon, 30 Jun 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/11-cross-lingual-transfer/</guid><description>&lt;p>English has the labels. The world has 7,000+ languages. 
Cross-lingual transfer is what lets a sentiment classifier trained only on English IMDB reviews score Spanish tweets, what makes a question-answering model fine-tuned on SQuAD answer Hindi questions, and what allows a model that has never seen a single labeled Swahili sentence to do passable Swahili NER.&lt;/p>
&lt;p>This post derives why that is even possible. We start from the bilingual-embedding alignment that motivated the field, walk through the multilingual pretraining recipe (mBERT, XLM-R) that made parallel data optional, and end with the practical playbook &amp;ndash; zero-shot vs translate-train vs translate-test, when to pick which, and where the wheels come off.&lt;/p></description></item><item><title>Transfer Learning (10): Continual Learning</title><link>https://www.chenk.top/en/transfer-learning/10-continual-learning/</link><pubDate>Tue, 24 Jun 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/10-continual-learning/</guid><description>&lt;p>You can teach yourself to play guitar this year and you will still remember how to ride a bike. A neural network cannot. Fine-tune a vision model on CIFAR-10 then on SVHN, evaluate it on CIFAR-10 again, and accuracy collapses to barely above chance. The phenomenon is called &lt;strong>catastrophic forgetting&lt;/strong>, and overcoming it is the central problem of &lt;strong>continual learning (CL)&lt;/strong>: a learner that absorbs a stream of tasks $\mathcal{T}_1, \mathcal{T}_2, \ldots$ without re-accessing past data and without losing what it already knew.&lt;/p></description></item><item><title>LLM Workflows and Application Architecture: Enterprise Implementation Guide</title><link>https://www.chenk.top/en/standalone/llm-workflows-architecture/</link><pubDate>Sat, 21 Jun 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/llm-workflows-architecture/</guid><description>&lt;p>Most LLM tutorials end where the interesting work begins. They show you how to call a chat completion endpoint, attach a vector store, and wrap the whole thing in a Streamlit demo. None of that is wrong, but none of it is what breaks at 3 a.m. when 10,000 users hit your service at once and every other answer is a hallucination.&lt;/p>
&lt;p>This article is about everything that comes after the demo. It is opinionated on purpose: production LLM systems are mostly plain distributed systems with one non-deterministic component bolted on, and most of the engineering effort goes into containing that non-determinism. We will work through seven dimensions — application architecture, workflow patterns, the RAG-vs-fine-tune decision, deployment topology, cost, observability, and enterprise integration — keeping each one short, concrete, and grounded in the levers that actually move the needle.&lt;/p></description></item><item><title>Symplectic Geometry and Structure-Preserving Neural Networks</title><link>https://www.chenk.top/en/standalone/symplectic-geometry-and-structure-preserving-neural-networks/</link><pubDate>Sat, 21 Jun 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/symplectic-geometry-and-structure-preserving-neural-networks/</guid><description>&lt;p>Train a vanilla feedforward network to predict a one-dimensional harmonic oscillator. Validate it on the next ten time steps &amp;ndash; the error is fine. Now roll it out for a thousand steps. The orbit no longer closes, the energy creeps upward, and what should be a periodic motion turns into a slow spiral. The network learned to fit data points; it never learned the &lt;em>physics&lt;/em>. 
Structure-preserving networks fix this by baking geometric invariants &amp;ndash; energy conservation, the symplectic 2-form, the Euler-Lagrange equations &amp;ndash; directly into the architecture, so the learned model cannot violate them no matter how long you integrate.&lt;/p></description></item><item><title>Transfer Learning (9): Parameter-Efficient Fine-Tuning</title><link>https://www.chenk.top/en/transfer-learning/09-parameter-efficient-fine-tuning/</link><pubDate>Wed, 18 Jun 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/09-parameter-efficient-fine-tuning/</guid><description>&lt;p>How do you fine-tune a 175B-parameter model on a single GPU? Update only 0.1% of the parameters. Parameter-Efficient Fine-Tuning (PEFT) makes this possible &amp;ndash; and on most benchmarks it matches full fine-tuning. This post derives the math behind LoRA, Adapter, Prefix-Tuning, Prompt-Tuning, BitFit and QLoRA, and gives you a single picture for choosing among them.&lt;/p>
&lt;h2 id="what-you-will-learn">What You Will Learn&lt;/h2>
&lt;ul>
&lt;li>Why the low-rank assumption holds for weight updates&lt;/li>
&lt;li>LoRA: derivation, initialization, scaling, and weight merging&lt;/li>
&lt;li>Adapter: bottleneck architecture and where to insert it&lt;/li>
&lt;li>Prefix-Tuning vs Prompt-Tuning vs P-Tuning v2&lt;/li>
&lt;li>QLoRA: how 4-bit quantisation gets a 65B model on one GPU&lt;/li>
&lt;li>Method comparison and a selection guide grounded in GLUE numbers&lt;/li>
&lt;/ul>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Transformer architecture (attention, FFN, residual + LayerNorm)&lt;/li>
&lt;li>Matrix decomposition basics (rank, SVD)&lt;/li>
&lt;li>Transfer learning fundamentals (Parts 1-6)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="the-full-fine-tuning-problem">The Full Fine-Tuning Problem&lt;/h2>
&lt;p>Full fine-tuning updates every parameter $\boldsymbol{\theta}$:&lt;/p></description></item><item><title>Transfer Learning (8): Multimodal Transfer</title><link>https://www.chenk.top/en/transfer-learning/08-multimodal-transfer/</link><pubDate>Thu, 12 Jun 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/08-multimodal-transfer/</guid><description>&lt;p>How can a model classify an image of a Burmese cat correctly without ever having seen a label &amp;ldquo;Burmese cat&amp;rdquo;? Traditional supervised learning needs millions of labeled examples per class. CLIP, released by OpenAI in 2021, sidesteps that constraint entirely: it learns to put images and natural-language descriptions into the same vector space, and then &amp;ldquo;classification&amp;rdquo; reduces to picking which sentence — out of any candidate sentences you write down — sits closest to the image.&lt;/p></description></item><item><title>Transfer Learning (7): Zero-Shot Learning</title><link>https://www.chenk.top/en/transfer-learning/07-zero-shot-learning/</link><pubDate>Fri, 06 Jun 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/07-zero-shot-learning/</guid><description>&lt;p>You have never seen a zebra. I tell you it looks like a horse painted with black and white stripes, and the next time one walks into the zoo you recognise it instantly. No labelled examples, no fine-tuning — only a &lt;em>semantic bridge&lt;/em> between what you know (horses, stripes) and what you don&amp;rsquo;t (this new species).&lt;/p>
&lt;p>&lt;strong>Zero-shot learning (ZSL)&lt;/strong> is the machine-learning version of that trick. Train on a set of &lt;em>seen&lt;/em> classes for which you have labelled images. At test time, classify into a &lt;em>disjoint&lt;/em> set of &lt;em>unseen&lt;/em> classes that you have &lt;em>never&lt;/em> shown the model — using only a description of what those classes are: a list of attributes, a word embedding of the class name, a sentence, or an image-text contrastive prompt. The model&amp;rsquo;s only handle on the unseen classes is the geometry it has learned in a shared visual–semantic space.&lt;/p></description></item><item><title>Transfer Learning (6): Multi-Task Learning</title><link>https://www.chenk.top/en/transfer-learning/06-multi-task-learning/</link><pubDate>Sat, 31 May 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/06-multi-task-learning/</guid><description>&lt;p>A self-driving car looking through a single camera needs to do three things at once: detect cars and pedestrians, segment lanes and free space, and estimate how far away each pixel is. You could train three separate networks. You would burn 3x the parameters, run 3x the forward passes at inference, and ignore the obvious fact that all three tasks need the same kind of low-level features (edges, surfaces, occlusion cues).&lt;/p></description></item><item><title>Transfer Learning (5): Knowledge Distillation</title><link>https://www.chenk.top/en/transfer-learning/05-knowledge-distillation/</link><pubDate>Sun, 25 May 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/05-knowledge-distillation/</guid><description>&lt;p>You have a 340M-parameter BERT model that hits 95% accuracy. The product team wants it on a phone that can barely fit 10M parameters. Training a 10M model from scratch lands at 85%. 
Knowledge distillation closes most of the gap: train the small model on the &lt;em>output distribution&lt;/em> of the large one, not just on the labels, and you can reach 92%.&lt;/p>
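&lt;p>The softened targets that make this work can be sketched in a few lines of plain Python (the logits are invented for illustration; this is a toy, not the article&amp;rsquo;s implementation). Raising the softmax temperature is the standard trick from Hinton&amp;rsquo;s original paper:&lt;/p>

```python
import math

def softmax(logits, T=1.0):
    # Temperature T > 1 flattens the distribution, exposing the
    # teacher's similarity structure ("dark knowledge").
    exps = [math.exp(z / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Hypothetical teacher logits for classes [cat, tiger, dog, plane]
teacher_logits = [9.0, 5.0, 4.0, 0.5]

hard = softmax(teacher_logits, T=1.0)  # near one-hot: barely more than the label
soft = softmax(teacher_logits, T=4.0)  # softened targets the student trains on

# At T=1 the non-cat classes are almost invisible; at T=4 the
# cat/tiger/dog/plane similarity structure becomes a usable signal.
print([round(p, 3) for p in hard])
print([round(p, 3) for p in soft])
```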
&lt;p>The key insight, due to Hinton, is that a teacher&amp;rsquo;s &amp;ldquo;wrong&amp;rdquo; predictions are not noise &amp;ndash; they are information. When the teacher classifies a cat image and assigns 0.14 to &amp;ldquo;tiger&amp;rdquo;, 0.07 to &amp;ldquo;dog&amp;rdquo;, and 0.008 to &amp;ldquo;plane&amp;rdquo;, it is telling you that cats look a lot like tigers, somewhat like dogs, and nothing like aeroplanes. That structure &amp;ndash; &lt;strong>dark knowledge&lt;/strong> &amp;ndash; is invisible in a one-hot label, and learning it is what lets the student punch above its weight.&lt;/p></description></item><item><title>Transfer Learning (4): Few-Shot Learning</title><link>https://www.chenk.top/en/transfer-learning/04-few-shot-learning/</link><pubDate>Mon, 19 May 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/04-few-shot-learning/</guid><description>&lt;p>Show a child one photograph of a pangolin and they will spot pangolins for life. Show a deep learning model one photograph and it will give you a uniformly random guess. Few-shot learning is the field that closes that gap: building classifiers that work with only one to ten labeled examples per class.&lt;/p>
&lt;p>The trick is not to memorize individual classes harder. It is to learn &lt;em>how to learn&lt;/em> from very few examples, then carry that ability over to brand-new classes at test time. This article covers the two families that dominate the field today: &lt;strong>metric learning&lt;/strong>, which learns a good distance function, and &lt;strong>meta-learning&lt;/strong>, which learns a good initialization.&lt;/p></description></item><item><title>Transfer Learning (3): Domain Adaptation</title><link>https://www.chenk.top/en/transfer-learning/03-domain-adaptation/</link><pubDate>Tue, 13 May 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/03-domain-adaptation/</guid><description>&lt;p>Your autonomous-driving stack works perfectly on sunny California freeways. Then it rains in Seattle. Top-1 accuracy drops from 95% to 70%. The model did not get worse — the &lt;em>data distribution shifted&lt;/em>, and your training set never told it what wet asphalt looks like at dusk.&lt;/p>
&lt;p>This is the everyday problem of &lt;strong>domain adaptation&lt;/strong>: you have abundant labelled data in one distribution (the &lt;em>source&lt;/em>) and unlabelled data in another (the &lt;em>target&lt;/em>), and you need the model to perform on the target. This article shows you how, from first-principles theory to a working DANN implementation.&lt;/p></description></item><item><title>Transfer Learning (2): Pre-training and Fine-tuning</title><link>https://www.chenk.top/en/transfer-learning/02-pre-training-and-fine-tuning/</link><pubDate>Wed, 07 May 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/02-pre-training-and-fine-tuning/</guid><description>&lt;p>BERT changed NLP overnight. A model pre-trained on Wikipedia and BookCorpus could be fine-tuned on a few thousand labelled examples and beat task-specific architectures that researchers had spent years hand-crafting. The same pattern repeated in vision (ImageNet pre-training, then SimCLR, MAE), in speech (wav2vec 2.0), and in code (Codex). Today, &amp;ldquo;pre-train once, fine-tune everywhere&amp;rdquo; is the default recipe of modern deep learning.&lt;/p>
&lt;p>But &lt;em>why&lt;/em> does pre-training work? When should you freeze layers, when should you LoRA, and how small does your learning rate need to be? This article unpacks both the theory and the engineering practice behind the most successful transfer paradigm we have.&lt;/p></description></item><item><title>Transfer Learning (1): Fundamentals and Core Concepts</title><link>https://www.chenk.top/en/transfer-learning/01-fundamentals-and-core-concepts/</link><pubDate>Thu, 01 May 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/01-fundamentals-and-core-concepts/</guid><description>&lt;p>You spent two weeks training an ImageNet classifier on a rack of GPUs. On Monday morning your team lead asks for a chest-X-ray pneumonia model &amp;ndash; and the entire labelled dataset is &lt;strong>two hundred images&lt;/strong>. Do you book another two weeks of GPU time and start from scratch?&lt;/p>
&lt;p>Of course not. You take what the ImageNet model already knows about edges, textures and shapes, swap out the last layer, and fine-tune on the X-rays. Two hours later you have a model that beats anything you could have trained from random weights with so little data. That is &lt;strong>transfer learning&lt;/strong>, and it is the reason most real-world deep-learning projects ship in days instead of months.&lt;/p></description></item><item><title>Essence of Linear Algebra (18): Frontiers and Summary</title><link>https://www.chenk.top/en/linear-algebra/18-frontiers-and-summary/</link><pubDate>Wed, 30 Apr 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/18-frontiers-and-summary/</guid><description>&lt;p>We have walked the long road of linear algebra together. We started with arrows in the plane and ended at the gates of quantum computers, the inner workings of large language models, and the topology of data clouds. The remarkable thing &amp;ndash; the thing this series has tried to make visible &amp;ndash; is that the same handful of ideas keeps coming back. A vector is a state. A matrix is a transformation. A decomposition is the structure hiding inside the transformation. A norm tells you when you can trust your computation. Once you internalise that loop, every &amp;ldquo;frontier&amp;rdquo; looks less like a foreign country and more like another dialect of a language you already speak.&lt;/p></description></item><item><title>Essence of Linear Algebra (17): Linear Algebra in Computer Vision</title><link>https://www.chenk.top/en/linear-algebra/17-linear-algebra-in-computer-vision/</link><pubDate>Wed, 23 Apr 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/17-linear-algebra-in-computer-vision/</guid><description>&lt;p>Computer vision is the science of teaching machines to see. 
What is striking is how thoroughly the whole field reduces to linear algebra: an image is a matrix, a geometric transformation is a matrix product, a camera is a $3 \times 4$ projection matrix, two-view geometry is the equation $\mathbf{x}_2^\top \mathbf{F}\, \mathbf{x}_1 = 0$, and 3D reconstruction is a sparse linear least-squares problem. Once you see the field through that lens, what once looked like a zoo of algorithms turns out to be a small set of linear-algebraic ideas applied repeatedly.&lt;/p></description></item><item><title>Essence of Linear Algebra (16): Linear Algebra in Deep Learning</title><link>https://www.chenk.top/en/linear-algebra/16-linear-algebra-in-deep-learning/</link><pubDate>Wed, 16 Apr 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/16-linear-algebra-in-deep-learning/</guid><description>&lt;p>Strip away the marketing and a deep network is one thing: a long pipeline of matrix multiplications glued together by elementwise nonlinearities. Forward pass, backward pass, convolution, attention, normalization, fine-tuning &amp;ndash; every &amp;ldquo;trick&amp;rdquo; is a small twist on the same algebraic theme. Once you see the matrices, the field stops looking like a bag of recipes and starts looking like a single language.&lt;/p>
&lt;p>This chapter rebuilds the modern stack from that single language. We follow one signal &amp;ndash; a vector $\mathbf{x}$ &amp;ndash; as it flows through linear layers, gets convolved, gets attended to, gets normalized, and gets adapted by a low-rank update. At each step we name the matrix that does the work and the property of that matrix (rank, conditioning, transpose) that makes the trick succeed.&lt;/p></description></item><item><title>Essence of Linear Algebra (15): Linear Algebra in Machine Learning</title><link>https://www.chenk.top/en/linear-algebra/15-linear-algebra-in-machine-learning/</link><pubDate>Wed, 09 Apr 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/15-linear-algebra-in-machine-learning/</guid><description>&lt;p>Ask any senior ML engineer &amp;ldquo;what math do you actually use day to day?&amp;rdquo; and the answer is almost always &lt;strong>linear algebra&lt;/strong>. Calculus shows up in derivations; probability shows up in modeling; but the runtime of a real ML system is dominated by matrix-vector multiplies, decompositions, and projections. PyTorch&amp;rsquo;s &lt;code>Linear&lt;/code>, scikit-learn&amp;rsquo;s &lt;code>PCA&lt;/code>, Spark MLlib&amp;rsquo;s &lt;code>ALS&lt;/code>, and a Transformer&amp;rsquo;s attention head are all the same primitive in different costumes.&lt;/p>
&lt;p>This chapter walks through the algorithms that production ML systems actually run &amp;ndash; PCA, LDA, SVM with kernels, matrix factorization for recommenders, regularized linear regression, neural network layers, attention &amp;ndash; and shows the linear algebra that makes each of them tick. We focus on intuition first, geometry second, formulas third.&lt;/p></description></item><item><title>Essence of Linear Algebra (14): Random Matrix Theory</title><link>https://www.chenk.top/en/linear-algebra/14-random-matrix-theory/</link><pubDate>Wed, 02 Apr 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/14-random-matrix-theory/</guid><description>&lt;p>A million i.i.d. coin flips, arranged into a thousand-by-thousand symmetric matrix, somehow produce eigenvalues that fill a perfect semicircle. A noisy sample covariance matrix that should be the identity instead spreads its eigenvalues across an interval whose width you can predict before seeing a single number. The largest eigenvalue of a Wigner matrix has a tail distribution that turns up everywhere &amp;ndash; in growing crystals, in the longest increasing subsequence of a random permutation, in the energy levels of heavy nuclei. &lt;strong>Random matrix theory&lt;/strong> (RMT) is the study of why these regularities appear, and how to use them.&lt;/p></description></item><item><title>Prefix-Tuning: Optimizing Continuous Prompts for Generation</title><link>https://www.chenk.top/en/standalone/prefix-tuning/</link><pubDate>Mon, 31 Mar 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/prefix-tuning/</guid><description>&lt;p>Fine-tuning a 1.5B-parameter GPT-2 model for each downstream task means saving a fresh 1.5B-parameter checkpoint every time. Across a dozen tasks that is a substantial storage and serving headache, and it makes sharing a single base model essentially impossible. 
&lt;em>Prefix-Tuning&lt;/em> (Li &amp;amp; Liang, 2021) takes the opposite stance: freeze every weight of the language model, and learn a tiny block of continuous vectors — the &lt;em>prefix&lt;/em> — that is fed into the attention layers as if it were context the model already attended to. The model never changes; only the prefix does, and a different prefix produces a different &amp;ldquo;personality&amp;rdquo; on demand.&lt;/p></description></item><item><title>Essence of Linear Algebra (13): Tensors and Multilinear Algebra</title><link>https://www.chenk.top/en/linear-algebra/13-tensors-and-multilinear-algebra/</link><pubDate>Wed, 26 Mar 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/13-tensors-and-multilinear-algebra/</guid><description>&lt;p>If you&amp;rsquo;ve used PyTorch or TensorFlow, you&amp;rsquo;ve met the word &amp;ldquo;tensor&amp;rdquo; hundreds of times. PyTorch calls every array &lt;code>torch.Tensor&lt;/code>; TensorFlow puts it in the product name. But what &lt;em>is&lt;/em> a tensor, and why did frameworks borrow this physics-flavored word for what looks like a multi-dimensional array?&lt;/p>
&lt;p>The short answer of this chapter:&lt;/p>
&lt;blockquote>
&lt;p>A tensor is the natural generalization of a scalar, vector, and matrix to &lt;strong>arbitrary&lt;/strong> dimensions. Everything you know about matrices either lifts cleanly to tensors, or breaks in instructive ways.&lt;/p>&lt;/blockquote></description></item><item><title>Sparse Matrices and Compressed Sensing -- Less Is More</title><link>https://www.chenk.top/en/linear-algebra/12-sparse-matrices-and-compressed-sensing/</link><pubDate>Wed, 19 Mar 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/12-sparse-matrices-and-compressed-sensing/</guid><description>&lt;h2 id="the-less-is-more-miracle">The &amp;ldquo;Less Is More&amp;rdquo; Miracle&lt;/h2>
&lt;p>A raw 24-megapixel photograph weighs in at roughly 70 MB. JPEG compresses it to a few hundred kilobytes &amp;ndash; a 100$\times$ reduction &amp;ndash; and you cannot tell the difference. A traditional MRI scan takes thirty minutes; a modern compressed sensing MRI gets the same image in five.&lt;/p>
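&lt;p>The common mechanism is easy to see in miniature (the coefficient values below are invented): sort the transform coefficients by magnitude, keep a handful, and measure how much signal energy survives.&lt;/p>

```python
# Toy "compressible" signal: coefficient magnitudes that decay fast,
# the way DCT/wavelet coefficients of natural images do.
coeffs = [100.0, 40.0, 15.0, 6.0, 2.0, 0.9, 0.4, 0.15, 0.06, 0.02]

def energy(v):
    return sum(c * c for c in v)

k = 3
kept = sorted(coeffs, key=abs, reverse=True)[:k]

# Keeping just 3 of 10 coefficients retains over 99% of the energy,
# which is why discarding the rest is visually lossless.
ratio = energy(kept) / energy(coeffs)
print(round(ratio, 4))
```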
&lt;p>Both miracles run on the same engine: &lt;strong>sparsity&lt;/strong>. Most natural signals, written in the right basis, have only a handful of meaningful coefficients. Everything else is essentially zero.&lt;/p></description></item><item><title>Matrix Calculus and Optimization -- The Engine Behind Machine Learning</title><link>https://www.chenk.top/en/linear-algebra/11-matrix-calculus-and-optimization/</link><pubDate>Wed, 12 Mar 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/11-matrix-calculus-and-optimization/</guid><description>&lt;h2 id="from-shower-knobs-to-neural-networks">From Shower Knobs to Neural Networks&lt;/h2>
&lt;p>Every morning you train a tiny neural network. The water comes out too cold, so you nudge the knob &amp;ndash; a &lt;em>parameter&lt;/em> &amp;ndash; in some direction. A second later you observe a new temperature &amp;ndash; the &lt;em>error signal&lt;/em> &amp;ndash; and nudge again. After three or four iterations you have converged.&lt;/p>
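&lt;p>That morning routine is one-dimensional gradient descent. A toy sketch (the faucet model and step size are invented for illustration):&lt;/p>

```python
def temperature(knob):
    # Hypothetical linear faucet: each knob unit adds 8 degrees to a 20-degree base.
    return 20.0 + 8.0 * knob

target = 38.0
knob = 0.0
lr = 0.01  # step size

for _ in range(50):
    error = temperature(knob) - target  # the "error signal"
    grad = 2 * error * 8.0              # d/dknob of (temperature(knob) - target)^2
    knob -= lr * grad                   # nudge against the gradient

print(round(temperature(knob), 2))  # converges near 38
```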
&lt;p>Modern deep learning is the same loop, scaled up by seven orders of magnitude. The &amp;ldquo;knob&amp;rdquo; is a matrix $W$ with hundreds of millions of entries. The &amp;ldquo;error&amp;rdquo; is a scalar loss $L$. And the question is the same: &lt;strong>for each parameter, in which direction should I push, and by how much?&lt;/strong> The answer lives in a single object: the gradient $\partial L / \partial W$.&lt;/p>
&lt;p>The equations are right. The algorithm is right. So why is the computed answer completely wrong?&lt;/p>
&lt;p>The culprit is usually a single number called the &lt;strong>condition number&lt;/strong>. It measures how &lt;em>sensitive&lt;/em> a linear system is — whether a tiny wobble in the input gets amplified into a catastrophic error in the output. To talk about condition numbers we first need a way to measure the &amp;ldquo;size&amp;rdquo; of vectors and matrices. That is what norms do.&lt;/p></description></item><item><title>Singular Value Decomposition -- The Crown Jewel of Linear Algebra</title><link>https://www.chenk.top/en/linear-algebra/09-singular-value-decomposition/</link><pubDate>Wed, 26 Feb 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/09-singular-value-decomposition/</guid><description>&lt;h2 id="why-svd-earns-the-crown">Why SVD Earns the Crown&lt;/h2>
&lt;p>The spectral theorem of &lt;a href="https://www.chenk.top/en/linear-algebra/08-symmetric-matrices-and-quadratic-forms/">Chapter 8&lt;/a>
 gave us $A = Q\Lambda Q^T$ &amp;ndash; a beautifully clean factorisation, but &lt;strong>only for symmetric matrices&lt;/strong>. Most matrices that show up in practice are not symmetric, and many are not even square:&lt;/p>
&lt;ul>
&lt;li>a photograph stored as a $1920 \times 1080$ pixel matrix,&lt;/li>
&lt;li>a Netflix-style user&amp;ndash;movie rating matrix (millions of rows, thousands of columns),&lt;/li>
&lt;li>a document&amp;ndash;term matrix in NLP (documents by vocabulary),&lt;/li>
&lt;li>a gene-expression matrix in bioinformatics.&lt;/li>
&lt;/ul>
&lt;p>The singular value decomposition handles every one of them:&lt;/p>
$$A = U\,\Sigma\,V^{\!\top}.$$&lt;p>
This is the most powerful, most universally applicable decomposition in all of linear algebra.&lt;/p></description></item><item><title>Symmetric Matrices and Quadratic Forms -- The Best Matrices in Town</title><link>https://www.chenk.top/en/linear-algebra/08-symmetric-matrices-and-quadratic-forms/</link><pubDate>Wed, 19 Feb 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/08-symmetric-matrices-and-quadratic-forms/</guid><description>&lt;h2 id="why-symmetric-matrices-are-the-best">Why Symmetric Matrices Are the &amp;ldquo;Best&amp;rdquo;&lt;/h2>
&lt;p>Of all the matrices you will ever meet, &lt;strong>symmetric matrices&lt;/strong> are the most well-behaved. They have:&lt;/p>
&lt;ul>
&lt;li>only &lt;strong>real&lt;/strong> eigenvalues,&lt;/li>
&lt;li>a complete set of &lt;strong>orthogonal&lt;/strong> eigenvectors,&lt;/li>
&lt;li>and a &lt;strong>perfect diagonalization&lt;/strong> $A = Q\Lambda Q^T$ that costs nothing to invert.&lt;/li>
&lt;/ul>
&lt;p>This is not a curiosity. Almost every important matrix you actually compute with in physics, optimization, statistics, or machine learning is symmetric:&lt;/p>
&lt;ul>
&lt;li>A &lt;strong>covariance matrix&lt;/strong> $\Sigma = \tfrac{1}{n}X^TX$ records how features vary together. It is symmetric by construction.&lt;/li>
&lt;li>A &lt;strong>Hessian matrix&lt;/strong> $H_{ij} = \partial^2 f / \partial x_i \partial x_j$ records second derivatives. By Clairaut&amp;rsquo;s theorem, mixed partials commute, so $H$ is symmetric.&lt;/li>
&lt;li>A &lt;strong>stiffness matrix&lt;/strong> $K$ encodes how connected springs push on each other. Newton&amp;rsquo;s third law forces $K = K^T$.&lt;/li>
&lt;li>A &lt;strong>kernel&lt;/strong> or &lt;strong>Gram matrix&lt;/strong> $G_{ij} = \langle x_i, x_j \rangle$ measures pairwise similarity. Inner products are symmetric, so $G$ is too.&lt;/li>
&lt;/ul>
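&lt;p>The Gram-matrix claim is easy to verify directly (the vectors below are toy values invented for illustration):&lt;/p>

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Three arbitrary feature vectors
xs = [[1.0, 2.0, 0.0],
      [0.5, -1.0, 3.0],
      [2.0, 0.0, 1.0]]

# Gram matrix G[i][j] = dot(x_i, x_j)
G = [[dot(xi, xj) for xj in xs] for xi in xs]

# Symmetry of G is inherited from symmetry of the inner product,
# and the diagonal holds squared lengths, so it is non-negative.
assert all(G[i][j] == G[j][i] for i in range(3) for j in range(3))
print(G)
```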
&lt;p>This chapter explains why symmetry buys you so much, and how the geometry of &lt;strong>quadratic forms&lt;/strong> lets you read off the behaviour of a symmetric matrix at a glance.&lt;/p></description></item><item><title>Orthogonality and Projections -- When Vectors Mind Their Own Business</title><link>https://www.chenk.top/en/linear-algebra/07-orthogonality-and-projections/</link><pubDate>Wed, 12 Feb 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/07-orthogonality-and-projections/</guid><description>&lt;h2 id="why-orthogonality-matters">Why Orthogonality Matters&lt;/h2>
&lt;p>Two vectors are &lt;strong>orthogonal&lt;/strong> when they &amp;ldquo;do not interfere&amp;rdquo; with one another. That single idea &amp;ndash; one direction tells you nothing about the other &amp;ndash; powers GPS positioning, noise-canceling headphones, JPEG compression, recommendation systems, and most of numerical linear algebra.&lt;/p>
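&lt;p>&amp;ldquo;Do not interfere&amp;rdquo; has a concrete numerical payoff, previewed here with a toy orthonormal basis of the plane (numbers invented): each coordinate is recovered by its own dot product, untouched by the other axis.&lt;/p>

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# An orthonormal basis of the plane: the standard axes, rotated.
theta = 0.7
q1 = [math.cos(theta), math.sin(theta)]
q2 = [-math.sin(theta), math.cos(theta)]
assert math.isclose(dot(q1, q2), 0.0, abs_tol=1e-12)  # orthogonal: no interference

v = [3.0, -2.0]

# Coordinates come from independent dot products -- no linear system to solve --
# and they reconstruct v exactly.
c1, c2 = dot(v, q1), dot(v, q2)
recon = [c1 * q1[i] + c2 * q2[i] for i in range(2)]
assert all(math.isclose(recon[i], v[i], abs_tol=1e-12) for i in range(2))
print(c1, c2)
```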
&lt;p>Orthogonality is the single biggest computational shortcut in linear algebra. With a generic basis, finding coordinates is solving a linear system. With an &lt;strong>orthogonal&lt;/strong> basis, finding coordinates is one dot product per axis. Hard problem, easy problem, same problem &amp;ndash; just a better basis.&lt;/p></description></item><item><title>Eigenvalues and Eigenvectors</title><link>https://www.chenk.top/en/linear-algebra/06-eigenvalues-and-eigenvectors/</link><pubDate>Wed, 05 Feb 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/06-eigenvalues-and-eigenvectors/</guid><description>&lt;h2 id="the-big-question">The Big Question&lt;/h2>
&lt;p>Apply a matrix to a vector and almost anything can happen. Most vectors get rotated &lt;em>and&lt;/em> stretched, landing in a brand new direction. But scattered among them are a few special vectors that refuse to leave their span. They come out of the transformation pointing exactly the way they went in &amp;ndash; only longer, shorter, or flipped.&lt;/p>
&lt;p>These survivors are &lt;strong>eigenvectors&lt;/strong>. The factor by which they get scaled is the &lt;strong>eigenvalue&lt;/strong>.&lt;/p></description></item><item><title>Linear Systems and Column Space</title><link>https://www.chenk.top/en/linear-algebra/05-linear-systems-and-column-space/</link><pubDate>Wed, 29 Jan 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/05-linear-systems-and-column-space/</guid><description>&lt;h2 id="the-central-question">The Central Question&lt;/h2>
&lt;p>Almost everything in applied mathematics eventually lands on the same question:&lt;/p>
&lt;blockquote>
&lt;p>Given a matrix $A$ and a vector $\vec{b}$, does the equation $A\vec{x} = \vec{b}$ have a solution? If so, how many?&lt;/p>
&lt;/blockquote>
&lt;p>The mechanical answer is &amp;ldquo;row-reduce and look.&amp;rdquo; The &lt;em>structural&lt;/em> answer is far more interesting &amp;ndash; and it is the goal of this chapter. Three geometric objects tell you everything:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Column space&lt;/strong> $C(A)$ &amp;ndash; the set of vectors $A$ can reach. It decides &lt;strong>whether&lt;/strong> a solution exists.&lt;/li>
&lt;li>&lt;strong>Null space&lt;/strong> $N(A)$ &amp;ndash; the set of vectors $A$ crushes to zero. It decides &lt;strong>how many&lt;/strong> solutions exist.&lt;/li>
&lt;li>&lt;strong>Rank&lt;/strong> $r$ &amp;ndash; the dimension of the column space. It quantifies how much information $A$ preserves.&lt;/li>
&lt;/ul>
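&lt;p>The existence test can be made concrete: $A\vec{x} = \vec{b}$ is solvable exactly when appending $\vec{b}$ to $A$ does not raise the rank. A small sketch using textbook Gaussian elimination (the matrices are invented examples):&lt;/p>

```python
def rank(M, eps=1e-9):
    # Rank via Gaussian elimination (fine for small dense matrices).
    M = [row[:] for row in M]
    r = 0
    for col in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if abs(M[i][col]) > eps), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        for i in range(len(M)):
            if i != r and abs(M[i][col]) > eps:
                f = M[i][col] / M[r][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

# A is rank-deficient: column 2 is twice column 1.
A = [[1.0, 2.0], [2.0, 4.0]]
b_in = [3.0, 6.0]   # lies in C(A): solvable (infinitely many solutions)
b_out = [3.0, 5.0]  # outside C(A): no solution

aug = lambda b: [row + [bi] for row, bi in zip(A, b)]
print(rank(A), rank(aug(b_in)), rank(aug(b_out)))  # 1 1 2
```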
&lt;p>Once these three are clear, every linear-systems result &amp;ndash; existence, uniqueness, least squares, the four fundamental subspaces &amp;ndash; becomes the same story told from different angles.&lt;/p></description></item><item><title>The Secrets of Determinants</title><link>https://www.chenk.top/en/linear-algebra/04-the-secrets-of-determinants/</link><pubDate>Wed, 22 Jan 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/04-the-secrets-of-determinants/</guid><description>&lt;h2 id="beyond-the-formula">Beyond the Formula&lt;/h2>
&lt;p>In most classrooms, determinants are introduced as a formula to memorize:&lt;/p>
$$\det\begin{pmatrix}a &amp; b\\ c &amp; d\end{pmatrix} = ad - bc$$&lt;p>You plug in numbers, compute, and move on. That misses the point entirely.&lt;/p>
&lt;p>Here is the real meaning, in one sentence:&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>The determinant of $A$ is the factor by which $A$ scales area (in 2D) or volume (in 3D).&lt;/strong>&lt;/p>
&lt;/blockquote>
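&lt;p>That one sentence can be checked numerically (the matrix below is an arbitrary example): transform the corners of the unit square and compare the image&amp;rsquo;s area to the determinant.&lt;/p>

```python
import math

def det2(a, b, c, d):
    return a * d - b * c

def shoelace(pts):
    # Signed polygon area; the sign records orientation.
    n = len(pts)
    s = sum(pts[i][0] * pts[(i + 1) % n][1] - pts[(i + 1) % n][0] * pts[i][1]
            for i in range(n))
    return s / 2.0

# An arbitrary 2x2 matrix, applied to the unit square's corners.
a, b, c, d = 3.0, 1.0, 1.0, 2.0
apply = lambda x, y: (a * x + b * y, c * x + d * y)
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
image = [apply(x, y) for x, y in square]

# The image parallelogram's signed area equals det(A) times the
# unit square's area (which is 1).
assert math.isclose(shoelace(image), det2(a, b, c, d))
print(det2(a, b, c, d))  # 5.0
```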
&lt;p>Once you internalize this, every property of determinants stops being a rule to memorize and starts being something you can &lt;em>see&lt;/em>. The product rule $\det(AB) = \det(A)\det(B)$ becomes obvious &amp;ndash; two scalings compose multiplicatively. $\det(A) = 0$ means space gets crushed flat. $\det(A^{-1}) = 1/\det(A)$ says the inverse must undo the scaling. The sign of the determinant tells you whether orientation was preserved or flipped.&lt;/p></description></item><item><title>Matrices as Linear Transformations</title><link>https://www.chenk.top/en/linear-algebra/03-matrices-as-linear-transformations/</link><pubDate>Wed, 15 Jan 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/03-matrices-as-linear-transformations/</guid><description>&lt;h2 id="the-big-idea">The Big Idea&lt;/h2>
&lt;p>Open a traditional textbook and matrices show up as &amp;ldquo;rectangular arrays of numbers.&amp;rdquo; You learn rules for adding and multiplying them, but no one explains &lt;em>why&lt;/em> the multiplication rule looks the way it does, or why $AB \neq BA$ in general.&lt;/p>
&lt;p>Here is the secret the symbol-pushing version hides: &lt;strong>a matrix is a function that transforms space.&lt;/strong> Every $m \times n$ matrix is a machine that eats an $n$-dimensional vector and spits out an $m$-dimensional one. Once you can &lt;em>see&lt;/em> that, the strange rules stop being strange. They are simply the bookkeeping for what happens to the basis vectors.&lt;/p></description></item><item><title>Linear Combinations and Vector Spaces</title><link>https://www.chenk.top/en/linear-algebra/02-linear-combinations-and-vector-spaces/</link><pubDate>Wed, 08 Jan 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/02-linear-combinations-and-vector-spaces/</guid><description>&lt;h2 id="why-this-chapter-matters">Why This Chapter Matters&lt;/h2>
&lt;p>Open a box of crayons that contains only &lt;strong>red, green, and blue&lt;/strong>. How many colors can you draw? The honest answer is &lt;strong>infinitely many&lt;/strong> — every shade you have ever seen on a screen is just a different mix of those three. Three &amp;ldquo;ingredients&amp;rdquo; produce an entire universe.&lt;/p>
&lt;p>That recipe — &lt;em>take a few vectors, scale them, add them up&lt;/em> — is called a &lt;strong>linear combination&lt;/strong>. The whole of linear algebra is built on this one move. Once you understand it deeply, you also understand:&lt;/p></description></item><item><title>The Essence of Vectors -- More Than Just Arrows</title><link>https://www.chenk.top/en/linear-algebra/01-the-essence-of-vectors/</link><pubDate>Wed, 01 Jan 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linear-algebra/01-the-essence-of-vectors/</guid><description>&lt;h2 id="why-vectors-and-why-care">Why Vectors, and Why Care?&lt;/h2>
&lt;p>A physicist talks about a &lt;em>force&lt;/em>. A data scientist talks about a &lt;em>feature&lt;/em>. A game programmer talks about a &lt;em>velocity&lt;/em>. A quantum theorist talks about a &lt;em>state&lt;/em>. Different worlds, different languages &amp;ndash; but the same underlying object: &lt;strong>a vector&lt;/strong>.&lt;/p>
&lt;p>That is not a coincidence. A vector is the smallest piece of mathematics flexible enough to describe &lt;strong>anything you can add together and scale&lt;/strong>. Once you spot that pattern, you spot it everywhere.&lt;/p></description></item><item><title>Time Series Forecasting (8): Informer -- Efficient Long-Sequence Forecasting</title><link>https://www.chenk.top/en/time-series/informer-long-sequence/</link><pubDate>Sun, 15 Dec 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/informer-long-sequence/</guid><description>&lt;p>The Transformer is wonderful at sequence modeling &amp;ndash; right up to the moment your sequence gets long. Vanilla self-attention costs $\mathcal{O}(L^2)$ in both compute and memory, so a one-week hourly window (168 steps) is fine, a one-month window (720 steps) is painful, and a three-month window (2160 steps) is essentially impossible on a single GPU. That is exactly the regime real-world long-horizon forecasting lives in: weather, energy, finance, IoT.&lt;/p>
&lt;p>&lt;strong>Informer&lt;/strong> (Zhou et al., AAAI 2021 best paper) is the architecture that finally made Transformers practical for these settings. It does three things, each of which would be a contribution on its own:&lt;/p></description></item><item><title>Vim Essentials: Modal Editing, Motions, and a Repeatable Workflow</title><link>https://www.chenk.top/en/standalone/vim-essentials/</link><pubDate>Fri, 06 Dec 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/vim-essentials/</guid><description>&lt;p>Most people quit Vim because they try to memorize shortcuts. That is the wrong frame. Vim is a &lt;em>small language&lt;/em>: learn the grammar &amp;ndash; &lt;strong>operator + motion&lt;/strong> &amp;ndash; and you can express any edit without ever opening a cheat sheet again. This guide walks you through the 80% of Vim you will use daily, then shows how the remaining 20% composes naturally from the same handful of rules.&lt;/p>
&lt;h2 id="what-you-will-learn">What you will learn&lt;/h2>
&lt;ul>
&lt;li>The single core idea: &lt;strong>modes&lt;/strong> plus &lt;strong>composable operations&lt;/strong> (operator + motion)&lt;/li>
&lt;li>The handful of motions, text objects, and operators that cover almost everything&lt;/li>
&lt;li>File operations, search &amp;amp; replace, macros, marks, registers&lt;/li>
&lt;li>Buffers vs windows vs tabs &amp;ndash; the mental model people most often get wrong&lt;/li>
&lt;li>A minimal &lt;code>.vimrc&lt;/code> and a one-week deliberate-practice plan to build muscle memory&lt;/li>
&lt;/ul>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Any terminal (Vim ships with virtually every Unix-like system)&lt;/li>
&lt;li>A willingness to feel slow for about a week&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="1-the-core-idea----modes-plus-a-tiny-grammar">1. The core idea &amp;ndash; modes plus a tiny grammar&lt;/h2>
&lt;p>&lt;figure>
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/standalone/vim-essentials/fig1_mode_state_diagram.png" alt="The Four Modes of Vim" loading="lazy" decoding="async">
 
&lt;/figure>
&lt;/p></description></item><item><title>Time Series Forecasting (7): N-BEATS -- Interpretable Deep Architecture</title><link>https://www.chenk.top/en/time-series/n-beats/</link><pubDate>Sat, 30 Nov 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/n-beats/</guid><description>&lt;p>The 2018 M4 forecasting competition served 100,000 series across six frequencies as a single benchmark. The leaderboard was dominated by hand-tuned ensembles built from decades of statistical-forecasting craft. Then a &lt;strong>pure neural network&lt;/strong> with no statistical preprocessing, no feature engineering, and no recurrence won outright. That network was &lt;strong>N-BEATS&lt;/strong> by Oreshkin et al. &amp;ndash; a stack of fully-connected blocks with two residual paths. Its interpretable variant additionally split the forecast into a polynomial trend and a Fourier seasonality, so the very thing classical statisticians wanted (a readable decomposition) came for free.&lt;/p></description></item><item><title>Time Series Forecasting (6): Temporal Convolutional Networks (TCN)</title><link>https://www.chenk.top/en/time-series/temporal-convolutional-networks/</link><pubDate>Fri, 15 Nov 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/temporal-convolutional-networks/</guid><description>&lt;p>For most of the 2010s, anyone who said &amp;ldquo;deep learning for time series&amp;rdquo; meant LSTM. The story changed in 2018 when Bai, Kolter, and Koltun published &lt;em>An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling&lt;/em>. Their result was annoyingly simple: take a stack of 1-D convolutions, make them causal (no peeking at the future), space the filter taps out exponentially (dilation), wrap the whole thing in residual connections, and train. 
On task after task, the resulting &lt;strong>Temporal Convolutional Network&lt;/strong> (TCN) matched or beat LSTM/GRU &amp;ndash; while training several times faster because every time step in the forward pass runs in parallel.&lt;/p></description></item><item><title>Time Series Forecasting (5): Transformer Architecture for Time Series</title><link>https://www.chenk.top/en/time-series/transformer/</link><pubDate>Thu, 31 Oct 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/transformer/</guid><description>&lt;h2 id="what-you-will-learn">What You Will Learn&lt;/h2>
&lt;ul>
&lt;li>The full encoder-decoder Transformer, redrawn for time series&lt;/li>
&lt;li>Why position must be injected, and how sinusoidal / learned / time-aware encodings differ&lt;/li>
&lt;li>What multi-head attention actually learns over a temporal sequence&lt;/li>
&lt;li>Where vanilla attention breaks down ($O(n^2)$) and the four families of fixes: sparse, linear, patched, decoder-only&lt;/li>
&lt;li>A clean PyTorch reference implementation, plus when to reach for Autoformer / FEDformer / Informer / PatchTST&lt;/li>
&lt;/ul>
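&lt;p>As a concrete taste of the positional-encoding material, here is a minimal NumPy sketch of the sinusoidal variant (assuming an even &lt;code>d_model&lt;/code>; the function name is mine, not the article&amp;rsquo;s):&lt;/p>

```python
import numpy as np

def sinusoidal_encoding(n_positions, d_model):
    # Each position gets sin/cos waves at geometrically spaced frequencies;
    # even dims carry the sines, odd dims the cosines. Assumes d_model is even.
    positions = np.arange(n_positions)[:, None]       # (n, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_encoding(50, 16)   # one encoding row per time step
```

&lt;p>Because the frequencies are geometrically spaced, shifting the position by a fixed offset acts as a linear transform on the encoding &amp;ndash; the property that lets attention reason about relative lags.&lt;/p>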
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Self-attention and multi-head attention (Part 4)&lt;/li>
&lt;li>Encoder-decoder architectures and teacher forcing&lt;/li>
&lt;li>PyTorch fundamentals (&lt;code>nn.Module&lt;/code>, training loops)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="1-why-transformers-for-time-series">1. Why Transformers for Time Series&lt;/h2>
&lt;p>LSTM and GRU process a sequence step by step. Three things follow from
that:&lt;/p></description></item><item><title>Time Series Forecasting (4): Attention Mechanisms -- Direct Long-Range Dependencies</title><link>https://www.chenk.top/en/time-series/attention-mechanism/</link><pubDate>Wed, 16 Oct 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/attention-mechanism/</guid><description>&lt;h2 id="what-you-will-learn">What you will learn&lt;/h2>
&lt;ul>
&lt;li>Why recurrent models hit a wall on long-range dependencies, and how attention removes it.&lt;/li>
&lt;li>The Query / Key / Value mechanism, scaled dot-product attention, and the role of $1/\sqrt{d_k}$.&lt;/li>
&lt;li>Two classic scoring functions &amp;ndash; &lt;strong>Bahdanau&lt;/strong> (additive) and &lt;strong>Luong&lt;/strong> (multiplicative).&lt;/li>
&lt;li>How to wire &lt;strong>attention into an LSTM encoder/decoder&lt;/strong> for time series.&lt;/li>
&lt;li>&lt;strong>Multi-head attention&lt;/strong> specialised for time &amp;ndash; different heads for recency, period, anomaly.&lt;/li>
&lt;li>The $O(n^2)$ memory wall and how sparse / linear attention bypass it.&lt;/li>
&lt;li>A worked &lt;strong>stock-prediction case&lt;/strong> with attention-weight overlays.&lt;/li>
&lt;/ul>
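&lt;p>The Query / Key / Value mechanism and the $1/\sqrt{d_k}$ factor fit in a few lines; a single-head NumPy sketch (illustrative, not the article&amp;rsquo;s code):&lt;/p>

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Scores are query-key dot products, shrunk by 1/sqrt(d_k) so the softmax
    # is not pushed into its saturated, vanishing-gradient regime.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 8)) for _ in range(3))      # 6 time steps, d_k = 8
out, w = scaled_dot_product_attention(Q, K, V)
```

&lt;p>Each row of &lt;code>w&lt;/code> is a probability distribution over time steps &lt;ndash;&gt; exactly the attention-weight overlays plotted in the stock-prediction case.&lt;/p>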
&lt;p>&lt;strong>Prerequisites&lt;/strong>: RNN/LSTM/GRU intuition (Parts 2-3), basic linear algebra, PyTorch.&lt;/p></description></item><item><title>MoSLoRA: Mixture-of-Subspaces in Low-Rank Adaptation</title><link>https://www.chenk.top/en/standalone/moslora/</link><pubDate>Sat, 12 Oct 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/moslora/</guid><description>&lt;p>LoRA is the default tool for adapting a frozen base model: cheap, stable, mergeable, and good enough for most single-task settings. But the moment your fine-tuning data is genuinely heterogeneous &amp;ndash; code mixed with math, instruction following mixed with creative writing, several domains in one adapter &amp;ndash; a single low-rank subspace starts to feel cramped. You can grow $r$, but cost grows with it and you still get &lt;em>one&lt;/em> subspace, just a fatter one.&lt;/p></description></item><item><title>Tennis-Scene Computer Vision: From Paper Survey to Production</title><link>https://www.chenk.top/en/standalone/tennis-cv-system-design/</link><pubDate>Mon, 07 Oct 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/tennis-cv-system-design/</guid><description>&lt;p>A 6.7 cm tennis ball travels at over 200 km/h. Reconstructing its 3D trajectory from eight 4K cameras in real time, while simultaneously classifying what stroke each player is making, is a system problem that touches &lt;strong>small-object detection, multi-view geometry, Kalman filtering, physics modelling, and human-pose estimation&lt;/strong> — all at once. 
This post walks the same path you&amp;rsquo;d walk at deployment time: state the constraints, survey the literature, choose, then build, and finally lay out a millisecond-by-millisecond budget for what runs in production.&lt;/p></description></item><item><title>Time Series Forecasting (3): GRU -- Lightweight Gates and Efficiency Trade-offs</title><link>https://www.chenk.top/en/time-series/gru/</link><pubDate>Tue, 01 Oct 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/gru/</guid><description>&lt;h2 id="what-you-will-learn">What You Will Learn&lt;/h2>
&lt;ul>
&lt;li>How GRU&amp;rsquo;s &lt;strong>update gate&lt;/strong> $z_t$ and &lt;strong>reset gate&lt;/strong> $r_t$ achieve LSTM-quality memory with one fewer gate and one fewer state.&lt;/li>
&lt;li>Why GRU has exactly &lt;strong>25% fewer parameters&lt;/strong> than LSTM, and what that buys you in practice.&lt;/li>
&lt;li>How to read GRU &lt;strong>gate activations&lt;/strong> to debug what the model is paying attention to.&lt;/li>
&lt;li>A practical &lt;strong>decision matrix&lt;/strong> for picking GRU vs LSTM, backed by parameter, speed, and forecast-quality benchmarks.&lt;/li>
&lt;li>A clean PyTorch reference implementation with the regularisation and stability tricks that actually matter.&lt;/li>
&lt;/ul>
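&lt;p>The 25% figure is pure gate counting &amp;ndash; a quick sketch (one bias per gate, as in the classic formulation; PyTorch keeps two bias vectors per gate, which leaves the ratio unchanged):&lt;/p>

```python
def rnn_params(n_gates, input_size, hidden_size):
    # Each gate (or candidate) owns one input-to-hidden matrix,
    # one hidden-to-hidden matrix, and one bias vector.
    per_gate = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return n_gates * per_gate

lstm = rnn_params(4, 64, 128)   # forget, input, output gates + cell candidate
gru  = rnn_params(3, 64, 128)   # update, reset gates + candidate state
print(gru / lstm)               # 0.75, regardless of sizes
```

&lt;p>The per-gate cost is identical, so the ratio is exactly $3/4$ for every input and hidden size.&lt;/p>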
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Comfort with the LSTM gates from &lt;a href="https://www.chenk.top/en/time-series-lstm/">Part 2&lt;/a>.&lt;/li>
&lt;li>Basic PyTorch (&lt;code>nn.Module&lt;/code>, autograd, optimizers).&lt;/li>
&lt;li>Recall that gradient flow through tanh nonlinearities is what kills vanilla RNNs.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;p>&lt;figure>
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/time-series/gru/fig1_gru_cell_architecture.png" alt="GRU cell with reset and update gates and the (1 - z) gradient highway from h_{t-1} to h_t." loading="lazy" decoding="async">
 
&lt;/figure>

&lt;em>Figure 1. The GRU cell. Two gates (&lt;code>r&lt;/code>, &lt;code>z&lt;/code>) and one state (&lt;code>h&lt;/code>) replace LSTM&amp;rsquo;s three gates and separate cell state. The orange &lt;code>(1 - z) ⊙ h_{t-1}&lt;/code> skip path is the linear gradient highway that makes long-range learning tractable.&lt;/em>&lt;/p></description></item><item><title>Time Series Forecasting (2): LSTM -- Gate Mechanisms and Long-Term Dependencies</title><link>https://www.chenk.top/en/time-series/lstm/</link><pubDate>Mon, 16 Sep 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/lstm/</guid><description>&lt;h2 id="what-you-will-learn">What You Will Learn&lt;/h2>
&lt;ul>
&lt;li>Why vanilla RNNs fail on long sequences and how LSTM fixes the gradient problem&lt;/li>
&lt;li>The intuition behind each gate (forget, input, output) and the cell-state &amp;ldquo;highway&amp;rdquo;&lt;/li>
&lt;li>How to structure inputs/outputs for one-step and multi-step time series forecasting&lt;/li>
&lt;li>Practical recipes: regularization, sequence length, bidirectional vs stacked LSTM, when to choose LSTM vs GRU&lt;/li>
&lt;/ul>
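&lt;p>The input/output structure for one-step forecasting reduces to a sliding window; a NumPy sketch (names are illustrative, not from the article):&lt;/p>

```python
import numpy as np

def make_windows(series, lookback):
    # One-step-ahead supervision: each input is `lookback` consecutive values,
    # the target is the value immediately after the window.
    X = np.stack([series[i:i + lookback] for i in range(len(series) - lookback)])
    y = series[lookback:]
    return X[..., None], y   # (n, lookback, 1) for an LSTM; targets (n,)

series = np.arange(10.0)
X, y = make_windows(series, lookback=3)
```

&lt;p>Multi-step forecasting reuses the same construction with a vector of future values as the target, or feeds one-step predictions back in autoregressively.&lt;/p>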
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Basic understanding of neural networks (forward pass, backpropagation)&lt;/li>
&lt;li>Familiarity with PyTorch (&lt;code>nn.Module&lt;/code>, tensors, optimizers)&lt;/li>
&lt;li>Part 1 of this series (helpful but not required)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="1-the-problem-lstm-solves">1. The Problem LSTM Solves&lt;/h2>
$$h_t = \tanh(W_h h_{t-1} + W_x x_t + b).$$$$\frac{\partial h_T}{\partial h_k} = \prod_{t=k+1}^{T} \mathrm{diag}\!\left(1 - h_t^2\right) W_h.$$&lt;p>Two regimes appear:&lt;/p></description></item><item><title>Time Series Forecasting (1): Traditional Statistical Models</title><link>https://www.chenk.top/en/time-series/01-traditional-models/</link><pubDate>Sun, 01 Sep 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/time-series/01-traditional-models/</guid><description>&lt;blockquote>
&lt;p>&lt;a href="https://www.chenk.top/en/time-series-lstm/">Next: LSTM Deep Dive &amp;ndash;&amp;gt;&lt;/a>
&lt;/p>
&lt;/blockquote>
&lt;h2 id="what-you-will-learn">What You Will Learn&lt;/h2>
&lt;ul>
&lt;li>Why &lt;strong>stationarity&lt;/strong> is the entry ticket for the whole ARIMA family, and how differencing buys it.&lt;/li>
&lt;li>How to read &lt;strong>ACF and PACF&lt;/strong> plots like a Box-Jenkins practitioner: cut-off vs. tail-off as the rule for identifying $p$ and $q$.&lt;/li>
&lt;li>The full &lt;strong>ARIMA / SARIMA&lt;/strong> machinery, including how seasonality is folded in via lag-$s$ operators.&lt;/li>
&lt;li>Where &lt;strong>VAR, GARCH, exponential smoothing, Prophet and the Kalman filter&lt;/strong> sit on the same map &amp;ndash; mean dynamics vs. variance dynamics vs. state-space recursion.&lt;/li>
&lt;li>A decision rule for when a traditional model is the right answer and when to graduate to the deep models in the rest of this series.&lt;/li>
&lt;/ul>
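&lt;p>How differencing buys stationarity shows up directly in the ACF; a NumPy sketch on a synthetic trend (illustrative, not the article&amp;rsquo;s code):&lt;/p>

```python
import numpy as np

def acf(x, max_lag):
    # Sample autocorrelation: lag-k covariance over the lag-0 variance.
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

rng = np.random.default_rng(0)
noisy = np.arange(200.0) + rng.normal(0.0, 1.0, 200)  # linear trend + noise
r_raw = acf(noisy, 10)            # decays very slowly: the trend dominates
r_diff = acf(np.diff(noisy), 10)  # after one difference: short-memory noise
```

&lt;p>The raw series shows the classic slow ACF tail of a non-stationary process; after first differencing only a short-lag correlation survives (here near $-0.5$ at lag 1, the MA(1) signature of differenced white noise).&lt;/p>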
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Basic probability and statistics (mean, variance, covariance, correlation).&lt;/li>
&lt;li>Familiarity with NumPy and &lt;code>pandas&lt;/code> time indexes.&lt;/li>
&lt;li>A little linear algebra for the VAR / Kalman sections (matrix multiplication, eigenvalues).&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="1-why-traditional-models-still-matter">1. Why traditional models still matter&lt;/h2>
&lt;p>Before the deep-learning era, the time-series toolbox was already remarkably complete. ARIMA captures linear autocorrelation, SARIMA adds calendar effects, VAR generalises to vectors, GARCH models the variance, and the Kalman filter unifies the lot inside a state-space recursion. They share three properties that deep models do not give for free:&lt;/p></description></item><item><title>PDE and Machine Learning (8): Reaction-Diffusion Systems and Graph Neural Networks</title><link>https://www.chenk.top/en/pde-ml/08-reaction-diffusion-systems/</link><pubDate>Wed, 14 Aug 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/08-reaction-diffusion-systems/</guid><description>&lt;h2 id="what-this-article-covers">What This Article Covers&lt;/h2>
&lt;p>Stack 32 layers of GCN on a citation graph and accuracy collapses from 81 % to 20 %. Every node converges to the same vector. This is &lt;strong>over-smoothing&lt;/strong>, the GNN equivalent of heat death — and the diagnosis comes straight from PDE theory. &lt;strong>A GCN layer is one explicit-Euler step of the heat equation on a graph&lt;/strong>, and the heat equation has exactly one fixed point: the constant. The cure was published in 1952. Alan Turing showed that adding a &lt;em>reaction&lt;/em> term to a diffusion equation can make a uniform state spontaneously break apart into stripes, spots, or labyrinths. The same trick — a learned reaction term — keeps deep GNNs alive.&lt;/p></description></item><item><title>PDE and Machine Learning (7): Diffusion Models and Score Matching</title><link>https://www.chenk.top/en/pde-ml/07-diffusion-models/</link><pubDate>Tue, 30 Jul 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/07-diffusion-models/</guid><description>&lt;h2 id="what-this-article-covers">What This Article Covers&lt;/h2>
&lt;p>Since 2020, &lt;strong>diffusion models&lt;/strong> have become the dominant paradigm in generative AI. From DALL·E 2 to Stable Diffusion to Sora, their generation quality and training stability are unmatched by GANs and VAEs. Beneath this success lies a remarkably clean mathematical structure: &lt;strong>diffusion models are numerical solvers for partial differential equations&lt;/strong>.&lt;/p>
&lt;ul>
&lt;li>Adding Gaussian noise corresponds to integrating the &lt;strong>Fokker–Planck equation&lt;/strong> forward in time.&lt;/li>
&lt;li>Learning to denoise is equivalent to learning the &lt;strong>score function&lt;/strong> $\nabla\log p_t$.&lt;/li>
&lt;li>DDPM is a discretised &lt;strong>reverse SDE&lt;/strong>; DDIM is the corresponding &lt;strong>probability-flow ODE&lt;/strong>.&lt;/li>
&lt;li>Stable Diffusion is the same machinery, executed in a low-dimensional latent space.&lt;/li>
&lt;/ul>
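&lt;p>The forward corruption has a closed form; a NumPy sketch under the standard DDPM linear noise schedule (the schedule is an assumption here, not something this excerpt fixes):&lt;/p>

```python
import numpy as np

def forward_noise(x0, t, betas, rng):
    # Closed form of the forward diffusion: x_t is Gaussian with mean
    # sqrt(alpha_bar_t) * x0 and variance 1 - alpha_bar_t.
    alpha_bar = np.cumprod(1.0 - betas)[t]
    eps = rng.normal(size=x0.shape)
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps, alpha_bar

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)   # DDPM's linear schedule
x0 = np.ones(10000)                     # a toy "data" batch
xT, ab = forward_noise(x0, 999, betas, rng)
```

&lt;p>At the final step &lt;code>alpha_bar&lt;/code> is nearly zero, so $x_T$ is indistinguishable from a standard Gaussian &amp;ndash; the distribution the reverse SDE starts from.&lt;/p>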
&lt;p>&lt;strong>What you will learn&lt;/strong>&lt;/p></description></item><item><title>PDE and Machine Learning (6): Continuous Normalizing Flows and Neural ODE</title><link>https://www.chenk.top/en/pde-ml/06-continuous-normalizing-flows/</link><pubDate>Mon, 15 Jul 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/06-continuous-normalizing-flows/</guid><description>&lt;h2 id="what-this-article-covers">What This Article Covers&lt;/h2>
&lt;p>Generative modeling reduces to one geometric question: &lt;strong>how do you transform a simple distribution (a Gaussian) into a complex one (faces, molecules, motion)?&lt;/strong> Discrete normalizing flows stack invertible blocks, but each block needs a Jacobian determinant at $O(d^3)$ cost. &lt;strong>Neural ODEs&lt;/strong> replace discrete depth with a continuous ODE; &lt;strong>Continuous Normalizing Flows (CNF)&lt;/strong> then push densities through that ODE using the &lt;em>instantaneous&lt;/em> change-of-variables formula, dropping density computation to $O(d)$. &lt;strong>Flow Matching&lt;/strong> removes the divergence integral altogether and turns training into plain regression on a target velocity field.&lt;/p></description></item><item><title>PDE and Machine Learning (5): Symplectic Geometry and Structure-Preserving Networks</title><link>https://www.chenk.top/en/pde-ml/05-symplectic-geometry/</link><pubDate>Sun, 30 Jun 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/05-symplectic-geometry/</guid><description>&lt;h2 id="what-this-article-covers">What this article covers&lt;/h2>
&lt;p>Train an unconstrained neural network on pendulum data and ask it to extrapolate. After a few seconds of integration the prediction is fine; after a minute the pendulum has either crept to a halt or, more often, accelerated to escape velocity. Energy was supposed to be conserved, but the network has no idea what energy is. The bug is not in the data, the optimizer, or the depth of the network. &lt;strong>The bug is in the architecture.&lt;/strong> A standard MLP can represent any vector field, including unphysical ones, and a tiny systematic bias in that vector field is amplified into macroscopic energy drift over a long rollout.&lt;/p></description></item><item><title>PDE and Machine Learning (4): Variational Inference and the Fokker-Planck Equation</title><link>https://www.chenk.top/en/pde-ml/04-variational-inference/</link><pubDate>Sat, 15 Jun 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/04-variational-inference/</guid><description>&lt;h2 id="seven-dimensions-of-this-article">Seven Dimensions of This Article&lt;/h2>
&lt;ol>
&lt;li>&lt;strong>Motivation&lt;/strong>: why VI and MCMC look different but solve the same PDE.&lt;/li>
&lt;li>&lt;strong>Theory&lt;/strong>: derivation of the Fokker-Planck equation from the SDE.&lt;/li>
&lt;li>&lt;strong>Geometry&lt;/strong>: KL divergence as a Wasserstein gradient flow.&lt;/li>
&lt;li>&lt;strong>Algorithms&lt;/strong>: Langevin Monte Carlo, mean-field VI, and SVGD.&lt;/li>
&lt;li>&lt;strong>Convergence&lt;/strong>: log-Sobolev inequality and exponential KL decay.&lt;/li>
&lt;li>&lt;strong>Numerical experiments&lt;/strong>: 7 figures with reproducible code.&lt;/li>
&lt;li>&lt;strong>Application&lt;/strong>: Bayesian neural networks via posterior sampling.&lt;/li>
&lt;/ol>
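&lt;p>The Langevin Monte Carlo of item 4 is a one-liner per step; a NumPy sketch on a standard Gaussian target (the target is my choice for illustration):&lt;/p>

```python
import numpy as np

def langevin_sample(grad_log_p, n_chains, n_steps, step, rng):
    # Unadjusted Langevin: drift along the score plus injected Gaussian noise.
    # As the step size shrinks, the stationary density approaches the target.
    x = np.full(n_chains, 3.0)   # deliberately bad initialisation
    for _ in range(n_steps):
        x = x + step * grad_log_p(x) + np.sqrt(2.0 * step) * rng.normal(size=n_chains)
    return x

rng = np.random.default_rng(0)
# Target: standard Gaussian, so the score is simply -x.
samples = langevin_sample(lambda x: -x, n_chains=5000, n_steps=500, step=0.01, rng=rng)
```

&lt;p>The discretisation introduces a small $O(\text{step})$ bias in the stationary variance &amp;ndash; exactly the error term the continuous-time Fokker-Planck view makes precise.&lt;/p>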
&lt;h2 id="what-you-will-learn">What You Will Learn&lt;/h2>
&lt;ul>
&lt;li>How the Fokker-Planck equation governs probability density evolution from any Itô SDE.&lt;/li>
&lt;li>Langevin dynamics as a practical sampling algorithm and its discretization error.&lt;/li>
&lt;li>Why minimizing $\mathrm{KL}(q\|p^\star)$ in Wasserstein space &lt;em>is&lt;/em> the Fokker-Planck PDE.&lt;/li>
&lt;li>The deep equivalence between variational inference and Langevin MCMC in continuous time.&lt;/li>
&lt;li>Stein Variational Gradient Descent (SVGD): a deterministic particle method that bridges both worlds.&lt;/li>
&lt;li>Practical posterior inference for Bayesian neural networks.&lt;/li>
&lt;/ul>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Probability theory (Bayes&amp;rsquo; rule, KL divergence, expectations).&lt;/li>
&lt;li>Wasserstein gradient flows from Part 3.&lt;/li>
&lt;li>Light stochastic calculus intuition (Brownian motion, Itô integral).&lt;/li>
&lt;li>Python / PyTorch for the experiments.&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="1-the-inference-problem">1. The Inference Problem&lt;/h2>
&lt;p>Bayesian inference asks for the posterior&lt;/p></description></item><item><title>PDE and Machine Learning (3): Variational Principles and Optimization</title><link>https://www.chenk.top/en/pde-ml/03-variational-principles/</link><pubDate>Fri, 31 May 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/03-variational-principles/</guid><description>&lt;p>What is the essence of neural-network training? When we run gradient descent in a high-dimensional parameter space, is there a deeper continuous-time dynamics at work? As the network width tends to infinity, does discrete parameter updating converge to some elegant partial differential equation? The answers live at the intersection of the calculus of variations, optimal transport, and PDE theory.&lt;/p>
&lt;p>The last decade of deep-learning success has rested mostly on engineering intuition. Recently, however, mathematicians have made a striking observation: &lt;strong>viewing a neural network as a particle system on the space of probability measures&lt;/strong>, and studying its evolution under Wasserstein geometry, exposes the global structure of training — convergence guarantees, the role of over-parameterization, the meaning of initialization. The tool that makes this visible is &lt;strong>the variational principle&lt;/strong> — from least action in physics, through the JKO scheme of modern optimal transport, to the mean-field limit of neural networks.&lt;/p></description></item><item><title>PDE and Machine Learning (2) — Neural Operator Theory</title><link>https://www.chenk.top/en/pde-ml/02-neural-operator-theory/</link><pubDate>Thu, 16 May 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/02-neural-operator-theory/</guid><description>&lt;p>A classical PDE solver — finite difference, finite element, spectral — is a function: feed it one initial condition and one set of coefficients, get back one solution. A PINN is the same kind of object dressed in neural-network clothes: each new initial condition demands a fresh round of training. Switch the inflow velocity on a wing or move a single sensor reading in a forecast and you reset the clock.&lt;/p></description></item><item><title>PDE and Machine Learning (1): Physics-Informed Neural Networks</title><link>https://www.chenk.top/en/pde-ml/01-physics-informed-neural-networks/</link><pubDate>Wed, 01 May 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/pde-ml/01-physics-informed-neural-networks/</guid><description>&lt;blockquote>
&lt;p>&lt;strong>Series chapter 1 — about a 35-minute read.&lt;/strong> This is the foundation of the entire series. Neural operators, variational principles, score matching — every later chapter is, at heart, &lt;em>the same idea&lt;/em>: how do we encode physical or mathematical constraints directly into the optimisation objective of a neural network? Get PINNs right and the rest is &amp;ldquo;swap one constraint for another&amp;rdquo;.&lt;/p>
&lt;/blockquote>
&lt;hr>
&lt;h2 id="1-prologue-a-metal-rod">1 Prologue: a metal rod&lt;/h2>
&lt;p>Suppose you want the temperature distribution $u(x,t)$ along a metal rod. Half a century of numerical analysis offers two standard answers:&lt;/p></description></item><item><title>Ordinary Differential Equations (18): Frontiers and Series Finale</title><link>https://www.chenk.top/en/ode/18-advanced-topics-summary/</link><pubDate>Mon, 15 Apr 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/18-advanced-topics-summary/</guid><description>&lt;p>&lt;strong>The journey ends here.&lt;/strong> Eighteen chapters ago we picked up a falling apple. Today we&amp;rsquo;re going to finish in the same vein in which we began &amp;ndash; by treating ODEs as the &lt;em>universal language of change&lt;/em> &amp;ndash; but standing on a much taller mountain.&lt;/p>
&lt;p>This chapter does three things. First, it surveys four active research frontiers that are reshaping how we &lt;em>model&lt;/em> dynamical systems: Neural ODEs, delay equations, stochastic differential equations, and fractional calculus. Second, it reviews the entire series with a problem-solving flowchart and a chapter-by-chapter map. Third, it draws explicit connections from the classical theory you have just mastered to modern machine learning &amp;ndash; the place where ODEs are most alive in 2025.&lt;/p></description></item><item><title>Ordinary Differential Equations (17): Physics and Engineering Applications</title><link>https://www.chenk.top/en/ode/17-physics-engineering-applications/</link><pubDate>Fri, 29 Mar 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/17-physics-engineering-applications/</guid><description>&lt;p>&lt;strong>Differential equations are not a pure mathematical game &amp;ndash; they are the language for understanding the physical world.&lt;/strong> From celestial motion to circuit response, from a swinging pendulum to vortex shedding behind a bridge cable, every dynamical system &amp;ldquo;speaks&amp;rdquo; ODE.&lt;/p>
&lt;p>This chapter is a deliberate tour through five canonical applications. Each one will pay back the entire ODE toolkit we built in chapters 1-16: phase planes, eigenvalues, Laplace transforms, modal analysis, conservation laws, numerical integration, control. None of the examples is a &amp;ldquo;toy&amp;rdquo; &amp;ndash; they are all genuine working physics, written tightly so that the structure remains visible.&lt;/p></description></item><item><title>Ordinary Differential Equations (16): Fundamentals of Control Theory</title><link>https://www.chenk.top/en/ode/16-control-theory/</link><pubDate>Tue, 12 Mar 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/16-control-theory/</guid><description>&lt;p>&lt;strong>When you steer a car you constantly correct based on lane position. A thermostat compares room temperature with the setpoint and adjusts a heater. A rocket gimbal nudges its thrust vector to keep the booster vertical.&lt;/strong> Strip away the hardware and the same idea remains: &lt;em>measure, compare, act&lt;/em>. Control theory is the mathematics of that loop &amp;ndash; and its native language is the ordinary differential equation.&lt;/p>
&lt;p>This chapter shows how the entire ODE toolkit &amp;ndash; Laplace transforms (Ch 4), linear systems (Ch 6), eigenvalue stability (Ch 7), nonlinear stability (Ch 8) &amp;ndash; collapses into a single unified discipline whose job is no longer to &lt;em>describe&lt;/em> dynamics, but to &lt;em>design&lt;/em> them.&lt;/p></description></item><item><title>Ordinary Differential Equations (15): Population Dynamics</title><link>https://www.chenk.top/en/ode/15-population-dynamics/</link><pubDate>Sat, 24 Feb 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/15-population-dynamics/</guid><description>&lt;p>&lt;strong>Why do lynx and snowshoe hare populations cycle with eerie regularity over a 10-year period?&lt;/strong> Why does introducing a single new species sometimes collapse an entire ecosystem? Why do similar competitors sometimes coexist and sometimes drive each other extinct? The answers are not in the species; they are in the &lt;em>equations&lt;/em> relating the species. This chapter walks through the canonical models of mathematical ecology: from the single-population logistic and Allee models to multi-species competition, predator-prey oscillations, age structure, and spatial spread.&lt;/p></description></item><item><title>Ordinary Differential Equations (14): Epidemic Models and Epidemiology</title><link>https://www.chenk.top/en/ode/14-epidemiology/</link><pubDate>Wed, 07 Feb 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/14-epidemiology/</guid><description>&lt;p>&lt;strong>In early 2020 the entire world watched a small system of ordinary differential equations decide policy.&lt;/strong> &amp;ldquo;Flatten the curve&amp;rdquo; was not a slogan; it was the intuition of a specific equation. &lt;em>Herd immunity&lt;/em> was not a guess; it was the threshold $1 - 1/R_0$ derived in a single line. 
The SIR model &amp;ndash; four lines of math, written down in 1927 by Kermack and McKendrick &amp;ndash; turned out to be precise enough to drive trillion-dollar decisions.&lt;/p></description></item><item><title>Ordinary Differential Equations (13): Introduction to Partial Differential Equations</title><link>https://www.chenk.top/en/ode/13-pde-introduction/</link><pubDate>Sun, 21 Jan 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/13-pde-introduction/</guid><description>&lt;p>&lt;strong>Once a quantity depends on more than one variable, the ODE world splinters into a vastly richer one: partial differential equations.&lt;/strong> Heat in a metal rod is a function of position &lt;em>and&lt;/em> time; a vibrating string moves in space &lt;em>and&lt;/em> time; a steady electrostatic potential lives in three spatial dimensions. ODE techniques become tools, not solutions &amp;ndash; separation of variables turns one PDE into a &lt;em>family&lt;/em> of ODEs, the eigenvalues of those ODEs become the spectrum of the operator, and superposition stitches everything back together.&lt;/p></description></item><item><title>Ordinary Differential Equations (12): Boundary Value Problems</title><link>https://www.chenk.top/en/ode/12-boundary-value-problems/</link><pubDate>Thu, 04 Jan 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/12-boundary-value-problems/</guid><description>&lt;p>An initial value problem hands you a starting state and asks you to march forward. A boundary value problem (BVP) hands you partial information at two different points and asks you to find a path that fits both. The change is small in wording, large in consequence: BVPs can have a unique solution, no solution at all, or infinitely many. 
They demand a fundamentally different toolkit &amp;ndash; one that is iterative, global, and intimately connected to linear algebra.&lt;/p></description></item><item><title>Ordinary Differential Equations (11): Numerical Methods</title><link>https://www.chenk.top/en/ode/11-numerical-methods/</link><pubDate>Mon, 18 Dec 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/11-numerical-methods/</guid><description>&lt;p>Almost every interesting differential equation in science and engineering refuses to yield a closed-form solution. Nonlinear vector fields, variable coefficients, ten thousand coupled state variables &amp;ndash; pen and paper give up long before the problem does. Numerical integration is the way through. This chapter builds, evaluates, and compares the small set of algorithms that solve essentially every ODE you will meet, and gives you the diagnostics to know when an integrator is lying to you.&lt;/p></description></item><item><title>HCGR: Hyperbolic Contrastive Graph Representation Learning for Session-based Recommendation</title><link>https://www.chenk.top/en/standalone/hcgr/</link><pubDate>Sat, 16 Dec 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/hcgr/</guid><description>&lt;p>A user opens a sneaker app, taps &amp;ldquo;running shoes&amp;rdquo;, drills into a brand, then a price band, then a single SKU. That trajectory is a &lt;em>tree&lt;/em>: each click narrows the candidate set roughly multiplicatively. In Euclidean space you need many dimensions to keep all the leaves of that tree apart, because Euclidean volume only grows polynomially with radius. 
In hyperbolic space volume grows &lt;em>exponentially&lt;/em> with radius, so the tree fits naturally — a few dimensions are enough to keep the whole long tail untangled.&lt;/p></description></item><item><title>Ordinary Differential Equations (10): Bifurcation Theory</title><link>https://www.chenk.top/en/ode/10-bifurcation-theory/</link><pubDate>Fri, 01 Dec 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/10-bifurcation-theory/</guid><description>&lt;p>A lake stays clear for decades, then turns murky in a single season. A power grid hums along stably, then trips into a cascading blackout in seconds. A column under slowly increasing load is straight, straight, straight &amp;ndash; and then suddenly buckles.&lt;/p>
&lt;p>These are not failures of prediction. They are the universe doing exactly what dynamical systems theory says it must do: cross a &lt;strong>bifurcation&lt;/strong>. When a parameter drifts past a critical value, the topology of phase space rearranges itself, and what was once impossible becomes inevitable. This chapter is about classifying those rearrangements. There turn out to be only a handful of them, and once you see the catalogue you start spotting them everywhere.&lt;/p></description></item><item><title>ODE Chapter 9: Chaos Theory and the Lorenz System</title><link>https://www.chenk.top/en/ode/09-bifurcation-chaos/</link><pubDate>Tue, 14 Nov 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/09-bifurcation-chaos/</guid><description>&lt;p>&lt;strong>In 1961, Edward Lorenz restarted a weather simulation from a rounded-off number &amp;ndash; 0.506 instead of 0.506127.&lt;/strong> Within simulated weeks the forecast was unrecognisable. That single accident gave us &lt;strong>the butterfly effect&lt;/strong> and turned chaos from a metaphor into a science. The lesson is profound and sober: equations that are &lt;em>exactly&lt;/em> deterministic can still be &lt;em>practically&lt;/em> unpredictable.&lt;/p>
&lt;h2 id="what-you-will-learn">What You Will Learn&lt;/h2>
&lt;ul>
&lt;li>The four conditions that &lt;em>together&lt;/em> define chaos&lt;/li>
&lt;li>The Lorenz system: paradigm of deterministic chaos&lt;/li>
&lt;li>Butterfly effect, visualised on the attractor itself&lt;/li>
&lt;li>Lyapunov exponents: numerical fingerprint of chaos&lt;/li>
&lt;li>Bifurcation cascades and the period-doubling route to chaos&lt;/li>
&lt;li>Other chaotic systems: Rossler and the double pendulum&lt;/li>
&lt;li>Strange attractors, fractal dimension, stretching-and-folding&lt;/li>
&lt;li>Applications: weather, encryption, controlling chaos, ensemble forecasting&lt;/li>
&lt;/ul>
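&lt;p>The butterfly effect is easy to reproduce numerically; a forward-Euler NumPy sketch with the classic parameters $\sigma=10$, $\rho=28$, $\beta=8/3$ (a crude integrator chosen for brevity, not the chapter&amp;rsquo;s code):&lt;/p>

```python
import numpy as np

def lorenz_step(state, dt, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # One forward-Euler step of the Lorenz system; dt is kept tiny
    # because Euler is a blunt instrument on a chaotic flow.
    x, y, z = state
    return state + dt * np.array([sigma * (y - x),
                                  x * (rho - z) - y,
                                  x * y - beta * z])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-8, 0.0, 0.0])   # a perturbation far below any measurement
for _ in range(40000):               # 40 time units at dt = 0.001
    a = lorenz_step(a, 0.001)
    b = lorenz_step(b, 0.001)
separation = float(np.linalg.norm(a - b))
```

&lt;p>A $10^{-8}$ difference in one coordinate grows to the scale of the attractor itself &amp;ndash; Lorenz&amp;rsquo;s rounded-off restart, replayed in a dozen lines.&lt;/p>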
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Chapter 8: nonlinear systems, phase portraits, limit cycles&lt;/li>
&lt;li>Chapter 7: stability and bifurcation basics&lt;/li>
&lt;li>Comfort with 3D visualization&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="what-is-chaos">What Is Chaos?&lt;/h2>
&lt;p>A chaotic system satisfies &lt;strong>all four&lt;/strong> of:&lt;/p></description></item><item><title>ODE Chapter 8: Nonlinear Systems and Phase Portraits</title><link>https://www.chenk.top/en/ode/08-nonlinear-stability/</link><pubDate>Sat, 28 Oct 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/08-nonlinear-stability/</guid><description>&lt;p>&lt;strong>The real world is nonlinear.&lt;/strong> Predator-prey cycles, heartbeat rhythms, neuron firing &amp;ndash; none of these can be captured by linear equations. When superposition fails, the world acquires &lt;em>new&lt;/em> behaviors: limit cycles, multiple equilibria, bistability, hysteresis. This chapter gives you the geometric and analytic tools to read those behaviors directly off a 2D phase portrait.&lt;/p>
&lt;h2 id="what-you-will-learn">What You Will Learn&lt;/h2>
&lt;ul>
&lt;li>Why nonlinear systems are &lt;em>fundamentally&lt;/em> different from linear ones&lt;/li>
&lt;li>Lyapunov stability visualized: level sets, bowls, and basins&lt;/li>
&lt;li>Linearization vs. the full nonlinear picture (Hartman-Grobman in action)&lt;/li>
&lt;li>Lotka-Volterra predator-prey: closed orbits and conserved quantities&lt;/li>
&lt;li>Competition models: four canonical outcomes&lt;/li>
&lt;li>Van der Pol oscillator and the geometry of limit cycles&lt;/li>
&lt;li>Gradient and Hamiltonian systems&lt;/li>
&lt;li>Poincaré-Bendixson: why 2D systems cannot be chaotic&lt;/li>
&lt;/ul>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Chapter 6: linear systems, phase portrait classification&lt;/li>
&lt;li>Chapter 7: stability, linearization, Lyapunov functions&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="from-linear-to-nonlinear">From Linear to Nonlinear&lt;/h2>
&lt;p>Linear systems obey &lt;strong>superposition&lt;/strong>: if $\mathbf{x}_1$ and $\mathbf{x}_2$ are solutions, so is $c_1\mathbf{x}_1 + c_2\mathbf{x}_2$. This is the engine that powers the entire toolkit of Chapters 1-6 &amp;ndash; exponential ansatz, eigenvectors, fundamental matrices.&lt;/p></description></item><item><title>Kernel Methods: From Theory to Practice (RKHS, Common Kernels, and Hyperparameter Tuning)</title><link>https://www.chenk.top/en/standalone/kernel-methods/</link><pubDate>Sun, 15 Oct 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/kernel-methods/</guid><description>&lt;p>You have non-linear data and a linear algorithm. The kernel trick lets you run that linear algorithm on the non-linear data &amp;ndash; without ever writing down the high-dimensional feature map. This guide builds the intuition first, then the math, then a practical toolkit you can ship.&lt;/p>
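&lt;p>Before the theory, the trick itself in a dozen lines. A toy verification with points of my choosing (not the guide's example): for the homogeneous degree-2 polynomial kernel on the plane, the kernel value equals a dot product in an explicit 3-D feature space, computed without ever constructing that space.&lt;/p>

```python
# Kernel trick sanity check: k(x, z) = (x . z)**2 on R^2 corresponds to
# the explicit feature map phi(x) = (x1**2, sqrt(2)*x1*x2, x2**2).
# The kernel does O(d) work; the feature map is what it lets us skip.

import math

def k_poly2(x, z):
    # one 2-D dot product, squared
    return (x[0] * z[0] + x[1] * z[1]) ** 2

def phi(x):
    # the 3-D feature map the kernel implicitly uses
    return (x[0] ** 2, math.sqrt(2.0) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, z = (1.0, 2.0), (3.0, -1.0)
assert math.isclose(k_poly2(x, z), dot(phi(x), phi(z)))
```

For the RBF kernel the corresponding feature space is infinite-dimensional, which is why the implicit form is not just a convenience but the only option.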
&lt;h2 id="what-you-will-learn">What You Will Learn&lt;/h2>
&lt;ul>
&lt;li>The kernel trick: why it works and what it actually buys you&lt;/li>
&lt;li>Mathematical foundations: positive-definite kernels, RKHS, Mercer&amp;rsquo;s theorem&lt;/li>
&lt;li>Common kernels: RBF, polynomial, linear, Matérn, periodic, sigmoid&lt;/li>
&lt;li>Hyperparameter tuning: grid search, random search, marginal likelihood&lt;/li>
&lt;li>Troubleshooting: overfitting, underfitting, numerical instability, feature scaling&lt;/li>
&lt;li>A kernel-selection decision tree for SVM, GP, and Kernel PCA&lt;/li>
&lt;/ul>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Linear algebra basics (dot products, eigendecomposition)&lt;/li>
&lt;li>Familiarity with SVM or Gaussian Processes (conceptual)&lt;/li>
&lt;li>Python + scikit-learn&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h1 id="why-kernel-methods-matter">Why Kernel Methods Matter&lt;/h1>
&lt;p>&lt;figure>
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/standalone/kernel-methods/fig1_kernel_trick.png" alt="The Kernel Trick: a 2D ring becomes linearly separable in 3D" loading="lazy" decoding="async">
 
&lt;/figure>
&lt;/p></description></item><item><title>ODE Chapter 7: Stability Theory</title><link>https://www.chenk.top/en/ode/07-systems-and-phase-plane/</link><pubDate>Wed, 11 Oct 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/07-systems-and-phase-plane/</guid><description>&lt;p>&lt;strong>A small push hits a system. Does it return to rest, drift away, or break entirely?&lt;/strong> That single question decides whether bridges survive storms, ecosystems recover from droughts, and economies bounce back from crises. Stability theory answers it &amp;ndash; and it does so &lt;em>without ever solving the differential equation&lt;/em>. We will learn to read the destiny of a system off the geometry of its phase plane.&lt;/p>
&lt;h2 id="what-you-will-learn">What You Will Learn&lt;/h2>
&lt;ul>
&lt;li>Three precise notions: Lyapunov stable, asymptotically stable, unstable&lt;/li>
&lt;li>Linearization via the Jacobian and the Hartman-Grobman theorem&lt;/li>
&lt;li>Lyapunov&amp;rsquo;s direct method &amp;ndash; proving stability with energy-like functions&lt;/li>
&lt;li>LaSalle&amp;rsquo;s invariance principle for borderline cases&lt;/li>
&lt;li>Trace-determinant classification of all 2D linear systems&lt;/li>
&lt;li>Four canonical bifurcations: saddle-node, transcritical, pitchfork, Hopf&lt;/li>
&lt;li>Worked applications: pendulum, predator-prey, inverted pendulum control&lt;/li>
&lt;/ul>
&lt;h2 id="prerequisites">Prerequisites&lt;/h2>
&lt;ul>
&lt;li>Chapter 6: linear systems, eigenvalues, phase portraits&lt;/li>
&lt;li>Multivariable calculus: partial derivatives, Jacobian matrix&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="a-visual-tour-before-the-theory">A Visual Tour Before the Theory&lt;/h2>
&lt;p>Stability is, at heart, a &lt;em>geometric&lt;/em> statement about how trajectories move in phase space. Six pictures tell the entire story of 2D linear systems.&lt;/p></description></item><item><title>ODE Chapter 6: Linear Systems and the Matrix Exponential</title><link>https://www.chenk.top/en/ode/06-power-series/</link><pubDate>Sun, 24 Sep 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/06-power-series/</guid><description>&lt;p>&lt;strong>One equation describes one quantity. The world is rarely that obliging.&lt;/strong> Predator and prey populations push each other up and down. Currents and voltages in an RLC network oscillate together. Chemical species in a reaction network feed into one another. The moment two unknowns share an equation, you have a &lt;em>system&lt;/em>, and a single $y'=ay$ is no longer enough.&lt;/p>
&lt;p>The miracle of the linear case is this: the scalar formula $y(t)=e^{at}y_0$ generalizes verbatim once you learn what $e^{At}$ means for a &lt;em>matrix&lt;/em> $A$. Linear algebra and ODEs fuse into one object — the matrix exponential — and its eigenstructure tells you everything about the long-term behavior, the geometry of the flow, and the physics of normal modes and beats.&lt;/p></description></item><item><title>Position Encoding Brief: From Sinusoidal to RoPE and ALiBi</title><link>https://www.chenk.top/en/standalone/position-encoding-brief/</link><pubDate>Wed, 20 Sep 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/position-encoding-brief/</guid><description>&lt;p>Self-attention has a strange property that surprises most people the first time they compute it by hand: it does not know the order of its inputs. Permute the tokens and every attention score is permuted along with them — the function is exactly equivariant. So before we can do anything useful with a Transformer, we have to inject position information from the outside.&lt;/p>
&lt;p>That single design decision — &lt;em>how&lt;/em> to inject it — has spawned a remarkable amount of research. Sinusoidal, learned, relative, T5-style buckets, RoPE, ALiBi, NoPE, and more. This post is a practitioner&amp;rsquo;s brief: enough math to know why each scheme works, enough comparison to choose one, and a clear focus on the property that matters most in the LLM era — &lt;strong>length extrapolation&lt;/strong>, the ability to handle sequences longer than anything seen in training.&lt;/p></description></item><item><title>ODE Chapter 5: Power Series and Special Functions</title><link>https://www.chenk.top/en/ode/05-laplace-transform/</link><pubDate>Thu, 07 Sep 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/05-laplace-transform/</guid><description>&lt;p>&lt;strong>Some ODEs have no solutions in terms of familiar functions.&lt;/strong> The Bessel equation, the Legendre equation, the Airy equation &amp;ndash; all arise naturally in physics (heat conduction in cylinders, gravitational fields of planets, quantum tunneling). Their solutions &lt;em>define&lt;/em> entirely new functions. This chapter shows you how to find them using power series, why the Frobenius extension is forced upon us at singular points, and why the same handful of &amp;ldquo;special functions&amp;rdquo; keeps appearing across physics and engineering.&lt;/p></description></item><item><title>LAMP Stack on Alibaba Cloud ECS: From Fresh Instance to Production-Ready Web Server</title><link>https://www.chenk.top/en/standalone/lamp-on-ecs/</link><pubDate>Fri, 01 Sep 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/lamp-on-ecs/</guid><description>&lt;p>You have a fresh ECS instance and SSH access. Your goal is a public website running Apache, PHP and MySQL. Between you and that goal sit three classes of problems that catch every beginner the first time:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Network reachability&lt;/strong> &amp;ndash; packets are silently dropped at the cloud security group, the OS firewall, or the listening socket, and the symptom is the same in all three cases: nothing happens.&lt;/li>
&lt;li>&lt;strong>Service wiring&lt;/strong> &amp;ndash; Apache, PHP and MySQL are three separate processes that have to find each other through file extensions, Unix sockets and TCP ports. Each interface has its own failure mode.&lt;/li>
&lt;li>&lt;strong>Identity and permissions&lt;/strong> &amp;ndash; Apache runs as &lt;code>www-data&lt;/code>, MySQL runs as &lt;code>mysql&lt;/code>, files are owned by &lt;code>root&lt;/code> after &lt;code>wget&lt;/code>. The wrong combination produces 403, &amp;ldquo;Access denied&amp;rdquo;, or &lt;code>chmod 777&lt;/code> desperation.&lt;/li>
&lt;/ol>
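&lt;p>From a browser, all three layers fail the same way; a raw TCP probe can at least separate silent drops from refusals. The sketch below is illustrative (the host is a placeholder, and a real diagnosis would also run &lt;code>ss -tlnp&lt;/code> on the instance): a timeout points at the security group or OS firewall, a refusal means the packet arrived but nothing was listening.&lt;/p>

```python
# A minimal TCP probe that distinguishes the three failure layers from
# the outside. Diagnostic sketch, not part of the setup steps themselves.

import socket

def probe(host, port, timeout=3.0):
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"      # reachable and something is listening
    except socket.timeout:
        return "filtered"      # silent drop: suspect security group or firewall
    except ConnectionRefusedError:
        return "closed"        # packet arrived, but no process is listening

# probe("203.0.113.10", 80) -- placeholder address; "filtered" vs "closed"
# tells you which layer to look at first.
```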
&lt;p>This guide walks through all of them in the order you actually hit them on day one, then keeps going into the things that show up on day thirty: TLS, virtual hosts, backups, source compilation, and when to stop running everything on a single box.&lt;/p></description></item><item><title>Variational Autoencoder (VAE): From Intuition to Implementation and Troubleshooting</title><link>https://www.chenk.top/en/standalone/vae-guide/</link><pubDate>Sat, 26 Aug 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/vae-guide/</guid><description>&lt;p>A plain autoencoder compresses and reconstructs. A variational autoencoder learns something far more useful: a smooth, structured latent space you can &lt;em>sample&lt;/em> from to generate genuinely new data. That single change — making the encoder output a &lt;em>distribution&lt;/em> instead of a vector — turns the network from a fancy compressor into a generative model with a tractable likelihood lower bound.&lt;/p>
&lt;p>This guide walks the full path: why autoencoders fail at generation, how the ELBO derivation gets you to the loss function, why the reparameterization trick is the trick that makes everything trainable, a complete PyTorch implementation, and a tour of every common failure mode with concrete fixes.&lt;/p></description></item><item><title>paper2repo: GitHub Repository Recommendation for Academic Papers</title><link>https://www.chenk.top/en/standalone/paper2repo-github-repository-recommendation/</link><pubDate>Tue, 22 Aug 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/paper2repo-github-repository-recommendation/</guid><description>&lt;p>You read a paper, want the code, and the &amp;ldquo;code available at&amp;rdquo; link is dead, missing, or points to a stub. Search engines fall back to keyword matching over the README, which works for popular repos with descriptive names and dies on everything else. paper2repo (WWW 2020) frames this as a cross-platform recommendation problem: learn one embedding space in which a paper abstract and a GitHub repository are directly comparable by dot product, then rank.&lt;/p></description></item><item><title>ODE Chapter 4: The Laplace Transform</title><link>https://www.chenk.top/en/ode/04-constant-coefficients/</link><pubDate>Mon, 21 Aug 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/04-constant-coefficients/</guid><description>&lt;p>&lt;strong>The Laplace transform turns calculus into algebra.&lt;/strong> Instead of grinding through integration, guessing trial solutions, and bolting on initial conditions at the end, you transform the entire ODE — equation, forcing, and initial data — into a single polynomial equation in a complex variable $s$. You solve it like a high-school problem, then transform back. 
Along the way, the &lt;em>shape&lt;/em> of the solution becomes geometry: poles in the left half of the complex plane decay, poles on the right blow up, poles on the imaginary axis ring forever. This chapter develops that picture from first principles and connects it to the engineering tools — transfer functions, Bode plots, PID control — that turned the Laplace transform into the lingua franca of dynamics.&lt;/p></description></item><item><title>ODE Chapter 3: Higher-Order Linear Theory</title><link>https://www.chenk.top/en/ode/03-linear-theory/</link><pubDate>Fri, 04 Aug 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/03-linear-theory/</guid><description>&lt;p>&lt;strong>A first-order ODE has memory of one number; a second-order ODE has memory of two.&lt;/strong> That tiny extra degree of freedom is what lets the same equation describe a plucked guitar string, the suspension of your car, the L-C tank circuit inside an FM radio, and the swaying of a tall building in the wind. In every case the same three regimes appear &amp;ndash; oscillate, return-with-a-touch-of-overshoot, or crawl back &amp;ndash; and the same algebraic gadget, the &lt;em>characteristic equation&lt;/em>, predicts which one happens.&lt;/p></description></item><item><title>ODE Chapter 2: First-Order Methods</title><link>https://www.chenk.top/en/ode/02-first-order-methods/</link><pubDate>Tue, 18 Jul 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/02-first-order-methods/</guid><description>&lt;p>A bank account, a drug clearing the bloodstream, a tank of brine, a charging capacitor — they all obey the same kind of equation: a first-order ODE. The trick is recognising which of four shapes you are looking at, because each shape has a closed-form move that solves it cleanly. 
By the end of this chapter you will pattern-match an unfamiliar first-order equation in seconds and know exactly which lever to pull.&lt;/p></description></item><item><title>Session-based Recommendation with Graph Neural Networks (SR-GNN)</title><link>https://www.chenk.top/en/standalone/session-based-recommendation-with-graph-neural-networks/</link><pubDate>Thu, 13 Jul 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/session-based-recommendation-with-graph-neural-networks/</guid><description>&lt;p>A user clicks &lt;strong>A, B, C, B, D&lt;/strong>. A sequence model reads this as five tokens and folds them into a hidden state. &lt;strong>SR-GNN&lt;/strong> sees a &lt;em>graph&lt;/em> in which the edge &lt;code>B -&amp;gt; C&lt;/code> survives even after the user returns to &lt;code>B&lt;/code>, the node &lt;code>B&lt;/code> is reused (so its in/out neighbours both inform its embedding), and the geometry of the click stream is preserved as adjacency. That structural insight is why &lt;a href="https://arxiv.org/abs/1811.00855" target="_blank" rel="noopener noreferrer">SR-GNN (Wu et al., AAAI 2019) &lt;span aria-hidden="true" style="font-size:0.75em; opacity:0.55; margin-left:2px;">↗&lt;/span>&lt;/a>
 outperforms purely sequential baselines such as GRU4Rec and NARM on standard session-based recommendation (SBR) benchmarks.&lt;/p></description></item><item><title>ODE Chapter 1: Origins and Intuition</title><link>https://www.chenk.top/en/ode/01-origins-and-intuition/</link><pubDate>Sat, 01 Jul 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/ode/01-origins-and-intuition/</guid><description>&lt;p>&lt;strong>Everything around you is changing.&lt;/strong> Coffee cools, populations grow, pendulums swing, viruses spread, stocks oscillate, planets orbit. None of these systems are described by &lt;em>what something equals&lt;/em> — they are described by &lt;em>how fast something changes&lt;/em>. That second mode of description is what differential equations are for, and learning to read them is, quite literally, learning to read the language physics and biology are written in.&lt;/p>
&lt;p>This chapter rebuilds your intuition from scratch. We start with a single cup of coffee, derive the same equation that governs radioactive decay and capacitor discharge, then climb upward to direction fields, classification, and the existence-and-uniqueness theorem that tells you when an ODE has a sensible answer at all.&lt;/p></description></item><item><title>Multi-Cloud and Hybrid Architecture</title><link>https://www.chenk.top/en/cloud-computing/multi-cloud-hybrid/</link><pubDate>Wed, 14 Jun 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/cloud-computing/multi-cloud-hybrid/</guid><description>&lt;p>The first article in this series asked &amp;ldquo;what is the cloud, and why does it matter?&amp;rdquo; Eight articles later, the question has matured into something more practical: &lt;strong>which clouds, in what combination, and how do you operate the result without losing your mind?&lt;/strong> Multi-cloud and hybrid architectures are how serious organizations answer that question. They distribute workloads across providers and on-premises infrastructure for resilience, cost optimization, and strategic flexibility &amp;ndash; but they introduce a new class of problems that single-cloud architectures never face.&lt;/p></description></item><item><title>Cloud Operations and DevOps Practices</title><link>https://www.chenk.top/en/cloud-computing/operations-devops/</link><pubDate>Fri, 26 May 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/cloud-computing/operations-devops/</guid><description>&lt;p>In 2017 GitLab lost six hours of database state. An engineer, exhausted, ran &lt;code>rm -rf&lt;/code> on the wrong server during an incident. The backup procedures had silently been broken for months; nobody noticed because no one was restoring from backups. The lesson is not &amp;ldquo;be careful with rm&amp;rdquo;. The lesson is that operations is a &lt;em>system&lt;/em> - tools, runbooks, monitoring, automation, and the rituals around them. 
When the system is healthy, no single tired engineer can take down production. When the system is rotten, every late-night fix is one keystroke from disaster.&lt;/p></description></item><item><title>Cloud Security and Privacy Protection</title><link>https://www.chenk.top/en/cloud-computing/security-privacy/</link><pubDate>Sun, 07 May 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/cloud-computing/security-privacy/</guid><description>&lt;p>In 2019 Capital One lost a hundred million customer records. The exploit chain was small: a misconfigured WAF allowed server-side request forgery against the EC2 metadata endpoint, that endpoint handed back IAM credentials, and the IAM role those credentials belonged to had wildcard &lt;code>s3:*&lt;/code> on every bucket in the account. One misconfiguration, one over-broad role, one rule the security team had not written. The bill, before legal: more than 80 million dollars.&lt;/p></description></item><item><title>Cloud Network Architecture and SDN</title><link>https://www.chenk.top/en/cloud-computing/networking-sdn/</link><pubDate>Tue, 18 Apr 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/cloud-computing/networking-sdn/</guid><description>&lt;p>A cloud platform is, in the end, a network with computers attached. The compute layer scales by adding boxes; the storage layer scales by adding disks; the &lt;em>network&lt;/em> layer is what makes those boxes and disks behave as a single coherent system. Get the network right and the rest of the stack feels effortless. 
Get it wrong &amp;ndash; a missing route, a 5-tuple mismatch on a security group, an under-provisioned load balancer &amp;ndash; and the whole platform goes dark.&lt;/p></description></item><item><title>Cloud Storage Systems and Distributed Architecture</title><link>https://www.chenk.top/en/cloud-computing/storage-systems/</link><pubDate>Thu, 30 Mar 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/cloud-computing/storage-systems/</guid><description>&lt;p>When Netflix stores petabytes of video, when Instagram serves billions of photos, when a quant fund replays a year of market data in minutes &amp;ndash; behind every one of these workloads is a &lt;em>distributed storage system&lt;/em>. Storage looks deceptively simple from a developer&amp;rsquo;s window (&lt;code>PUT key&lt;/code>, &lt;code>GET key&lt;/code>), but the moment you cross the boundary of a single machine, you inherit a stack of problems that has driven decades of research: how to survive disk failures, how to scale linearly, how to provide a consistency model that does not surprise the application, and how to do all of this while paying cents per gigabyte rather than dollars.&lt;/p></description></item><item><title>Learning Rate: From Basics to Large-Scale Training</title><link>https://www.chenk.top/en/standalone/learning-rate-guide/</link><pubDate>Mon, 13 Mar 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/learning-rate-guide/</guid><description>&lt;p>Your model diverges. You halve the learning rate. Now it trains, but takes forever. You halve again — now the loss is a flat line. Sound familiar? Of all the knobs you can turn, &lt;strong>learning rate&lt;/strong> is the one that most often decides whether training converges, crawls, or blows up. 
This guide gives you the intuition, the minimal math, and a practical workflow to get it right — from a 12-layer CNN on your laptop to a 70B-parameter LLM on a thousand GPUs.&lt;/p></description></item><item><title>Cloud-Native and Container Technologies</title><link>https://www.chenk.top/en/cloud-computing/cloud-native-containers/</link><pubDate>Sat, 11 Mar 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/cloud-computing/cloud-native-containers/</guid><description>&lt;p>The shift from monolithic applications to cloud-native architectures is one of the most consequential changes in software engineering this decade. The headline &amp;ndash; containers and Kubernetes &amp;ndash; is well known. The interesting story is &lt;em>why&lt;/em> this stack won, what each layer actually does, and where the seams are that determine whether your platform feels effortless or feels like a maze.&lt;/p>
&lt;p>This article walks the cloud-native stack from first principles. We start with the architectural shift that motivates everything else, then dig into what a container really is at the Linux kernel level, climb up to Kubernetes orchestration, examine when a service mesh earns its complexity, and finish with packaging and delivery via Helm and GitOps. Examples are deliberately concrete: copy-pastable Dockerfiles, real manifests, and the trade-offs that matter when you run this in production.&lt;/p></description></item><item><title>Virtualization Technology Deep Dive</title><link>https://www.chenk.top/en/cloud-computing/virtualization/</link><pubDate>Mon, 20 Feb 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/cloud-computing/virtualization/</guid><description>&lt;p>Without virtualization, there is no cloud. Every EC2 instance, every Lambda invocation, every Kubernetes pod ultimately stands on the same trick: lying convincingly to an operating system about the hardware underneath it. This article walks the full stack &amp;ndash; from the CPU instructions that make the trick cheap, through the four hypervisors that dominate the market, to the production-grade tuning knobs that decide whether your VMs run at 70 % or 99 % of bare metal.&lt;/p></description></item><item><title>Cloud Computing Fundamentals and Architecture</title><link>https://www.chenk.top/en/cloud-computing/fundamentals/</link><pubDate>Wed, 01 Feb 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/cloud-computing/fundamentals/</guid><description>&lt;p>Every team building software in 2025 inherits the same buy-or-rent question their predecessors faced &amp;ndash; only the answer has flipped. Twenty years ago you put hardware in a closet; today you describe the hardware in YAML and a global provider conjures it up in seconds, bills it by the second, and tears it down when you stop paying. Cloud computing is not just &amp;ldquo;someone else&amp;rsquo;s computer&amp;rdquo;. 
It is a programmable, metered, multi-tenant abstraction over compute, storage and networking that has fundamentally changed how businesses are built and how engineers spend their day.&lt;/p></description></item><item><title>Graph Contextualized Self-Attention Network (GC-SAN) for Session-based Recommendation</title><link>https://www.chenk.top/en/standalone/gcsan/</link><pubDate>Sun, 15 Jan 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/gcsan/</guid><description>&lt;p>In session-based recommendation you only see a short anonymous click sequence &amp;ndash; no user profile, no long history, no demographics. Every signal you have lives inside that single window. &lt;strong>GC-SAN&lt;/strong> (IJCAI 2019) takes the strongest two ideas of the time &amp;ndash; SR-GNN&amp;rsquo;s session graph and the Transformer&amp;rsquo;s self-attention &amp;ndash; and stacks them: a &lt;em>graph&lt;/em> view captures local transition patterns and loops, a &lt;em>sequence&lt;/em> view captures long-range intent, and a tiny weighted sum decides how much of each to trust. The result is a clean &amp;ldquo;best of both worlds&amp;rdquo; baseline that is genuinely hard to beat at its parameter budget.&lt;/p></description></item><item><title>Computer Fundamentals: Deep Dive and System Integration</title><link>https://www.chenk.top/en/computer-fundamentals/06-deep-dive/</link><pubDate>Sat, 14 Jan 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/computer-fundamentals/06-deep-dive/</guid><description>&lt;p>We&amp;rsquo;ve spent five chapters opening one box at a time — the CPU, the cache hierarchy, storage, the motherboard and GPU, the network and power supply. Each part is interesting on its own, but a computer is not its components. A computer is what happens when those components have to agree, every nanosecond, on what to do next.&lt;/p>
&lt;p>This finale is about that conversation. We&amp;rsquo;ll wire everything together into a single picture, look at the system through the eyes of a profiler, revisit the 80-year-old design tension that still shapes every chip you buy, and end by looking forward — chiplets, photonic interconnects, and the quietly arriving quantum era.&lt;/p></description></item><item><title>Lipschitz Continuity, Strong Convexity &amp; Nesterov Acceleration</title><link>https://www.chenk.top/en/standalone/lipschitz-continuity-strong-convexity-nesterov/</link><pubDate>Tue, 27 Dec 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/lipschitz-continuity-strong-convexity-nesterov/</guid><description>&lt;p>A surprising amount of &amp;ldquo;optimizer folklore&amp;rdquo; collapses into three concepts:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>How fast can the gradient change?&lt;/strong> Lipschitz smoothness ($L$-smoothness) caps the usable step size at $1/L$.&lt;/li>
&lt;li>&lt;strong>How sharp is the bottom?&lt;/strong> $\mu$-strong convexity sets the convergence rate and forces the minimizer to be unique.&lt;/li>
&lt;li>&lt;strong>Can we get there faster without losing stability?&lt;/strong> Nesterov acceleration and adaptive restart cut the iteration count&amp;rsquo;s dependence on the condition number from $\kappa$ to $\sqrt{\kappa}$.&lt;/li>
&lt;/ul>
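&lt;p>All three bullets show up in about thirty lines. A minimal sketch (not the post's least-squares experiment): gradient descent and Nesterov's method on a 2-D quadratic with condition number 100, both using step size $1/L$, with the standard momentum $(\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)$ for Nesterov.&lt;/p>

```python
# GD vs Nesterov on f(p) = (L*p1**2 + mu*p2**2) / 2, kappa = L/mu = 100.
# Illustrative toy problem; constants chosen for readability.

import math

L, mu = 100.0, 1.0
kappa = L / mu
step = 1.0 / L                                        # largest safe step
beta = (math.sqrt(kappa) - 1.0) / (math.sqrt(kappa) + 1.0)  # momentum

def grad(p):
    return (L * p[0], mu * p[1])

def gd(p, iters):
    for _ in range(iters):
        g = grad(p)
        p = (p[0] - step * g[0], p[1] - step * g[1])
    return p

def nesterov(p, iters):
    p_prev = p
    for _ in range(iters):
        # extrapolate (look ahead), then take a gradient step from there
        y = tuple(pi + beta * (pi - qi) for pi, qi in zip(p, p_prev))
        g = grad(y)
        p_prev, p = p, (y[0] - step * g[0], y[1] - step * g[1])
    return p

def dist(p):
    return math.hypot(p[0], p[1])

x0 = (1.0, 1.0)
# After 100 iterations, plain GD still carries most of the slow coordinate
# (0.99**100 is about 0.37), while Nesterov has contracted it below 1e-3.
```

The experiment at the end of the post plays out the same contrast on a full least-squares problem, with Heavy Ball added for comparison.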
&lt;p>This post lays them out on a single thread: nail the geometric intuition with the minimum number of inequalities, prove the key theorems, then close with a least-squares experiment that pits GD, Heavy Ball, and Nesterov against each other. The goal is not to stack formulas — it is to make you able to look at a new problem and instantly answer &amp;ldquo;what step size, what rate, is acceleration worth it?&amp;rdquo;&lt;/p></description></item><item><title>Computer Fundamentals: Network, Power, and Troubleshooting</title><link>https://www.chenk.top/en/computer-fundamentals/05-network-power/</link><pubDate>Sat, 24 Dec 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/computer-fundamentals/05-network-power/</guid><description>&lt;p>Why does the gigabit NIC on your motherboard sometimes negotiate down to 100 Mbps? Why does a brand-new build with a 650 W &amp;ldquo;Gold&amp;rdquo; PSU randomly reboot under heavy GPU load? Why does the room next to the server rack always feel warm? These are the everyday consequences of two systems that most people never look at: &lt;strong>the network I/O pipeline&lt;/strong> and &lt;strong>the power-and-cooling chain&lt;/strong> that keeps the silicon alive.&lt;/p></description></item><item><title>Optimizer Evolution: From Gradient Descent to Adam (and Beyond, 2025)</title><link>https://www.chenk.top/en/standalone/optimizer-evolution-gd-to-adam/</link><pubDate>Fri, 09 Dec 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/optimizer-evolution-gd-to-adam/</guid><description>&lt;p>Why is &amp;ldquo;tuning the LR is an art&amp;rdquo; a meme for ResNet, while every modern LLM paper just writes &amp;ldquo;AdamW, $\beta_1{=}0.9, \beta_2{=}0.95, \mathrm{wd}{=}0.1$&amp;rdquo; and moves on? It is not an accident — it is the &lt;strong>end-point of three decades of optimizer evolution&lt;/strong>.&lt;/p>
&lt;p>This post walks the lineage end-to-end on a single thread: each step exists because of a &lt;strong>specific failure&lt;/strong> of the previous one. We end with the three directions that have actually entered the post-2023 large-model toolkit: Lion, Sophia, and Schedule-Free.&lt;/p></description></item><item><title>Computer Fundamentals: Motherboard, Graphics, and Expansion</title><link>https://www.chenk.top/en/computer-fundamentals/04-motherboard-gpu/</link><pubDate>Sat, 03 Dec 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/computer-fundamentals/04-motherboard-gpu/</guid><description>&lt;p>A modern desktop motherboard is an unusually honest object. Every important design decision — how many PCIe lanes the CPU exposes, which slots are wired straight to the CPU and which tunnel through the chipset, how the VRM is sized to feed a 250 W processor, why the second long PCIe slot only runs at ×4 — is laid out in plain copper on the PCB. If you can read the board, you can predict almost every performance cliff a user will hit. This fourth instalment of the &lt;strong>Computer Fundamentals Deep Dive Series&lt;/strong> teaches that reading skill, then turns it inward to the GPU, where the same lesson applies in miniature: a GPU is a chip whose entire architecture exists to keep thousands of arithmetic lanes fed with data, and almost everything else — caches, schedulers, tensor cores, HBM stacks — is in service of that goal.&lt;/p></description></item><item><title>LLMGR: Integrating Large Language Models with Graphical Session-Based Recommendation</title><link>https://www.chenk.top/en/standalone/llmgr/</link><pubDate>Sat, 26 Nov 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/llmgr/</guid><description>&lt;p>Session-based recommendation lives or dies on the click graph. New items have no edges. Long-tail items have a handful of noisy edges. Yet every item ships with a title and a description that the model never reads. 
&lt;strong>LLMGR&lt;/strong> plugs that hole: treat the LLM as a &amp;ldquo;semantic engine&amp;rdquo; that turns text into representations a graph encoder can fuse with, then let a GNN do what it does best &amp;ndash; rank. The headline result on Amazon Music/Beauty/Pantry: HR@20 up ~8.68%, NDCG@20 up ~10.71%, MRR@20 up ~11.75% over the strongest GNN baseline, with the largest uplift concentrated on cold-start items.&lt;/p></description></item><item><title>Computer Fundamentals: Storage Systems (HDD vs SSD)</title><link>https://www.chenk.top/en/computer-fundamentals/03-storage/</link><pubDate>Sat, 12 Nov 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/computer-fundamentals/03-storage/</guid><description>&lt;p>Why can a single SSD swap &amp;ldquo;resurrect&amp;rdquo; a five-year-old laptop? Why does a TLC drive rated for only 1 000 P/E cycles still last more than a decade for normal users? Why does a brand-new SSD that benchmarks at 3 500 MB/s sometimes collapse to 50 MB/s after a few weeks? This third instalment of the &lt;strong>Computer Fundamentals Deep Dive Series&lt;/strong> answers those questions from first principles. We will look at how rotating magnetic platters compare with charge-trap NAND cells, how the bandwidth of an interface (SATA, PCIe Gen 3/4/5) interacts with the parallelism of a protocol (AHCI vs NVMe), how RAID levels trade capacity for fault tolerance, how a file system organises bytes on a raw block device, and how to keep all of this fast and safe in production.&lt;/p></description></item><item><title>Computer Fundamentals: Memory and Cache Systems</title><link>https://www.chenk.top/en/computer-fundamentals/02-memory/</link><pubDate>Sat, 22 Oct 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/computer-fundamentals/02-memory/</guid><description>&lt;p>A CPU core can complete a multiplication in roughly &lt;strong>0.3 ns&lt;/strong>. A spinning hard disk needs &lt;strong>10 ms&lt;/strong> to seat its head over a sector. 
Between those two numbers sits a factor of about &lt;strong>30 million&lt;/strong>. Every line of memory engineering — caches, DRAM cells, page tables, TLBs, ECC, NUMA, channels — is a coordinated answer to that single, brutal asymmetry.&lt;/p>
&lt;p>This is part 2 of the &lt;strong>Computer Fundamentals Deep Dive&lt;/strong>. We will not stop at &amp;ldquo;DDR is fast and RAM is volatile&amp;rdquo;. We will trace a single load instruction from the CPU pipeline through the L1, L2, L3 caches, the TLB, the page table, the memory controller, the channels, and finally the DRAM cells themselves — and look at what each layer is actually doing, and why.&lt;/p></description></item><item><title>Computer Fundamentals: CPU and the Computing Core</title><link>https://www.chenk.top/en/computer-fundamentals/01-cpu/</link><pubDate>Sat, 01 Oct 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/computer-fundamentals/01-cpu/</guid><description>&lt;p>Why does your 100 Mbps internet only download at about 12 MB/s? Why does a &amp;ldquo;1 TB&amp;rdquo; hard drive show only 931 GB in Windows? Why does a 32-bit system top out around 3.2 GB of usable RAM? And what &lt;em>actually&lt;/em> happens, cycle by cycle, when the CPU runs your code?&lt;/p>
&lt;p>This is part 1 of the &lt;strong>Computer Fundamentals&lt;/strong> series. We start from bits and bytes, then go down into the CPU itself: pipelines, caches, branch prediction, out-of-order execution, multiple cores, and SMT. By the end you should be able to read a CPU spec sheet — or a perf profile — and know what each number is paying for.&lt;/p></description></item><item><title>LeetCode Patterns: Greedy Algorithms</title><link>https://www.chenk.top/en/leetcode/09-greedy-algorithms/</link><pubDate>Tue, 13 Sep 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/leetcode/09-greedy-algorithms/</guid><description>&lt;p>Greedy is the algorithm paradigm that feels too good to be true: at every step, take the choice that looks best right now, never look back, and somehow end up at the global optimum. When it works, the code is almost embarrassingly short. When it doesn&amp;rsquo;t, it produces confidently wrong answers — which is why the real skill is not writing greedy code, but recognising &lt;strong>when greedy is allowed&lt;/strong>.&lt;/p>
&lt;p>This article walks through the structural reason greedy is correct on some problems and broken on others, then applies that lens to seven LeetCode classics: &lt;strong>Jump Game&lt;/strong>, &lt;strong>Jump Game II&lt;/strong>, &lt;strong>Gas Station&lt;/strong>, &lt;strong>Best Time to Buy and Sell Stock II&lt;/strong>, &lt;strong>Non-overlapping Intervals&lt;/strong>, &lt;strong>Task Scheduler&lt;/strong>, and &lt;strong>Partition Labels&lt;/strong>.&lt;/p></description></item><item><title>LeetCode Patterns: Stack and Queue</title><link>https://www.chenk.top/en/leetcode/stack-and-queue/</link><pubDate>Mon, 29 Aug 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/leetcode/stack-and-queue/</guid><description>&lt;p>Stacks and queues look unassuming next to graphs or DP, but they sit underneath an astonishing fraction of interview problems. The reason is simple: most algorithmic questions are really questions about &lt;em>order of access&lt;/em>. Stacks give you LIFO (last in, first out); queues give you FIFO (first in, first out); and once you add the variants — monotonic stack, deque, priority queue — you have efficient answers for bracket matching, next-greater-element, sliding-window extrema, top-K, BFS, and a long tail of &amp;ldquo;implement X using Y&amp;rdquo; puzzles.&lt;/p></description></item><item><title>LeetCode Patterns: Backtracking Algorithms</title><link>https://www.chenk.top/en/leetcode/backtracking/</link><pubDate>Sun, 14 Aug 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/leetcode/backtracking/</guid><description>&lt;p>Backtracking is the algorithm you reach for whenever a problem asks you to &lt;em>enumerate&lt;/em> something — every permutation, every subset, every legal board, every path through a grid. 
It is brute force with a brain: you build a candidate solution one decision at a time, abandon it the moment a constraint says &amp;ldquo;this cannot work&amp;rdquo;, and undo your last move so the next branch sees a clean slate. The whole technique fits in three lines:&lt;/p></description></item><item><title>Multimodal LLMs and Downstream Tasks: A Practitioner's Guide</title><link>https://www.chenk.top/en/standalone/multimodal-llm-downstream-tasks/</link><pubDate>Fri, 05 Aug 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/multimodal-llm-downstream-tasks/</guid><description>&lt;p>Stuffing pixels, audio, and video into a language model so it can &amp;ldquo;see,&amp;rdquo; &amp;ldquo;hear,&amp;rdquo; and reason &amp;ndash; that was a research curiosity before CLIP landed in 2021. Today it&amp;rsquo;s table stakes for most consumer-facing AI products. But shipping a Multimodal LLM (MLLM) in production turns out to be hard in places people rarely talk about. Almost never the vision encoder. Almost always these four:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>Alignment.&lt;/strong> How does the language model &amp;ldquo;understand&amp;rdquo; what the vision encoder produces? Is the projector a 2-layer MLP or a Q-Former? Which parameters thaw during training?&lt;/li>
&lt;li>&lt;strong>Task framing.&lt;/strong> The same MLLM has to do captioning, VQA, grounding, OCR. Each needs a prompt template that doesn&amp;rsquo;t quietly drop several points of accuracy.&lt;/li>
&lt;li>&lt;strong>Cost.&lt;/strong> A 1024x1024 image becomes hundreds of visual tokens. Prefill is brutal. Stretch that to video and the bill goes vertical. Token compression, KV cache reuse, and batching are not optional.&lt;/li>
&lt;li>&lt;strong>Evaluation.&lt;/strong> A model that scores 80 on MMBench can still hallucinate confidently on your customer&amp;rsquo;s invoice. Public benchmarks are the easy part.&lt;/li>
&lt;/ol>
&lt;p>This post follows the natural research arc &amp;ndash; architecture, model families, downstream tasks, fine-tuning, evaluation, deployment &amp;ndash; and tries to be specific enough at each stop that you can act on it. Less &amp;ldquo;what&amp;rsquo;s possible,&amp;rdquo; more &amp;ldquo;what to actually pick.&amp;rdquo;&lt;/p></description></item><item><title>Operating System Fundamentals: A Deep Dive</title><link>https://www.chenk.top/en/standalone/operating-system-fundamentals-deep-dive/</link><pubDate>Mon, 01 Aug 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/operating-system-fundamentals-deep-dive/</guid><description>&lt;p>Open a terminal and type &lt;code>cat hello.txt&lt;/code>. The instant you press Enter, at least seven layers of machinery wake up: bash parses the line, fork+execve launches the cat process, the kernel hands it a virtual address space, cat issues a &lt;code>read()&lt;/code> syscall, the CPU traps into kernel mode, VFS dispatches to ext4, the block layer queues an NVMe request, the SSD DMA-writes the bytes back, an interrupt wakes cat, the bytes are copied through the page cache into the user buffer, and finally something appears on your screen.&lt;/p></description></item><item><title>LeetCode Patterns: Dynamic Programming Basics</title><link>https://www.chenk.top/en/leetcode/dynamic-programming-basics/</link><pubDate>Sat, 30 Jul 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/leetcode/dynamic-programming-basics/</guid><description>&lt;p>Dynamic programming has a reputation for being the algorithm topic that
separates &amp;ldquo;competent coder&amp;rdquo; from &amp;ldquo;interview wizard&amp;rdquo;. A lot of that
reputation is unearned. DP is not a bag of clever tricks; it is a single
recipe applied to problems that happen to have repeated subproblems. If
you can answer three questions cleanly, you can solve almost any DP
problem on LeetCode:&lt;/p>
&lt;ol>
&lt;li>&lt;strong>What does &lt;code>dp[i]&lt;/code> actually mean?&lt;/strong> (state)&lt;/li>
&lt;li>&lt;strong>How do I build &lt;code>dp[i]&lt;/code> from smaller answers?&lt;/strong> (transition)&lt;/li>
&lt;li>&lt;strong>What are the smallest answers I already know?&lt;/strong> (base case)&lt;/li>
&lt;/ol>
&lt;p>This article walks through that recipe, then applies it to the seven
problems every DP study list eventually converges on: Climbing Stairs,
House Robber, Coin Change, Longest Increasing Subsequence, 0/1 Knapsack,
Longest Common Subsequence, and Edit Distance.&lt;/p></description></item><item><title>Proximal Operator: From Moreau Envelope to ISTA/FISTA and ADMM</title><link>https://www.chenk.top/en/standalone/proximal-operator/</link><pubDate>Mon, 25 Jul 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/proximal-operator/</guid><description>&lt;p>When your objective contains a non-smooth piece (sparse regularisation, total variation, an indicator of a constraint set) or a constraint that is hard to handle directly, &amp;ldquo;just do gradient descent&amp;rdquo; stalls &amp;ndash; there is no gradient at the kink, or every step violates feasibility. The &lt;strong>proximal operator&lt;/strong> is the engineered, beautiful workaround: think of each update as &amp;ldquo;take a step on the smooth part, then run a tiny penalised minimisation that pulls the iterate back toward a structured solution&amp;rdquo;.&lt;/p></description></item><item><title>Graph Neural Networks for Learning Equivariant Representations of Neural Networks</title><link>https://www.chenk.top/en/standalone/gnn-equivariant-representations/</link><pubDate>Fri, 22 Jul 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/gnn-equivariant-representations/</guid><description>&lt;p>You can shuffle the hidden neurons of a trained MLP and get the &lt;em>exact&lt;/em> same function back &amp;ndash; but the flat parameter vector now looks completely different. This single fact ruins most attempts at &amp;ldquo;learning over neural networks&amp;rdquo;: naive representations treat two functionally identical models as two unrelated points in parameter space, and the downstream learner wastes capacity rediscovering a symmetry it should have for free. 
This paper &amp;ndash; &lt;em>Graph Neural Networks for Learning Equivariant Representations of Neural Networks&lt;/em> (Kofinas et al., ICML 2024) &amp;ndash; proposes the clean fix: turn the network itself into a graph, then use a GNN whose architecture &lt;em>natively&lt;/em> respects the relevant permutation symmetry.&lt;/p></description></item><item><title>LeetCode Patterns: Binary Tree Traversal and Construction</title><link>https://www.chenk.top/en/leetcode/binary-tree-traversal/</link><pubDate>Fri, 15 Jul 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/leetcode/binary-tree-traversal/</guid><description>&lt;p>A binary tree problem is rarely about the tree. It is about &lt;em>the order in which you touch nodes&lt;/em> and &lt;em>what you remember from the children before deciding what to do at the parent&lt;/em>. Once those two ideas click, the four traversal orders, the iterative rewrites, the construction problems, and even classics like Validate BST and Maximum Depth all collapse into a handful of variations on the same recipe. This article builds that recipe end to end.&lt;/p></description></item><item><title>LeetCode Patterns: Binary Search</title><link>https://www.chenk.top/en/leetcode/binary-search/</link><pubDate>Thu, 30 Jun 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/leetcode/binary-search/</guid><description>&lt;p>Binary search is the algorithm everyone thinks they understand until they have to write it under interview pressure. The idea is one sentence — &lt;em>halve the search space at every step&lt;/em> — but the implementation is a minefield of off-by-one errors, infinite loops, and subtly wrong return values. 
The goal of this article is not to give you yet another recitation of the standard template; it is to give you a &lt;strong>mental model&lt;/strong> that explains why each template looks the way it does, and a small toolkit (three templates plus the answer-space pattern) that covers the vast majority of LeetCode problems.&lt;/p></description></item><item><title>LeetCode Patterns: Sliding Window Technique</title><link>https://www.chenk.top/en/leetcode/sliding-window/</link><pubDate>Wed, 15 Jun 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/leetcode/sliding-window/</guid><description>&lt;p>If you have ever caught yourself writing a double &lt;code>for&lt;/code> loop to inspect every contiguous subarray, &lt;strong>sliding window&lt;/strong> is probably the optimisation you are missing. It turns an $O(nk)$ or $O(n^2)$ scan into a single linear pass by &lt;em>reusing the work&lt;/em> it has already done. This article walks through the technique from first principles, then drills four canonical LeetCode problems plus a monotonic-deque variant.&lt;/p>
&lt;h2 id="1-the-idea-in-one-picture">1. The Idea in One Picture&lt;/h2>
&lt;p>A sliding window is a contiguous range &lt;code>[left, right]&lt;/code> over an array or string. Instead of recomputing everything when the range moves, we &lt;strong>add the element entering on the right&lt;/strong> and &lt;strong>remove the element leaving on the left&lt;/strong>. Each element is touched at most twice, so the total cost is $O(n)$.&lt;/p></description></item><item><title>LeetCode Patterns: Linked List Operations</title><link>https://www.chenk.top/en/leetcode/linked-list-operations/</link><pubDate>Tue, 31 May 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/leetcode/linked-list-operations/</guid><description>&lt;p>A linked list is the simplest data structure that forces you to &lt;strong>think in pointers&lt;/strong>. Arrays let you index in $O(1)$ and forget about layout; linked lists hand you a head pointer and ask, &lt;em>&amp;ldquo;now what?&amp;rdquo;&lt;/em> That single shift — from indices to references — is what makes linked-list problems so common in interviews. They are short to state, brutal to get right, and reward exactly the habits good engineers build: drawing pictures, naming pointers, and &lt;strong>never dereferencing without checking for &lt;code>None&lt;/code>&lt;/strong>.&lt;/p></description></item><item><title>LeetCode Patterns: Two Pointers</title><link>https://www.chenk.top/en/leetcode/two-pointers/</link><pubDate>Mon, 16 May 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/leetcode/two-pointers/</guid><description>&lt;p>Hash tables buy you speed by spending memory. Two pointers is the opposite trade: spend a little structural assumption — the array is sorted, the list might have a cycle, the answer lives in a contiguous window — and you get $O(n)$ time with $O(1)$ extra space. 
The pattern looks trivial in code (two indices and a &lt;code>while&lt;/code> loop) but it has more failure modes than any other beginner technique: off-by-one indices, infinite loops, missed duplicates, wrong pointer moved on tie. The cure is to think in &lt;strong>invariants&lt;/strong> rather than in moves.&lt;/p></description></item><item><title>LeetCode Patterns: Hash Tables</title><link>https://www.chenk.top/en/leetcode/hash-tables/</link><pubDate>Sun, 01 May 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/leetcode/hash-tables/</guid><description>&lt;p>A hash table is the cheapest superpower in your toolbox. You spend a constant amount of memory per stored item, and in return every &amp;ldquo;is &lt;em>x&lt;/em> in here?&amp;rdquo; question costs roughly one CPU instruction. Whole families of &lt;code>O(n²)&lt;/code> brute-force solutions collapse into a single &lt;code>O(n)&lt;/code> pass once you reach for one.&lt;/p>
&lt;p>This article is the first instalment of the &lt;strong>LeetCode Patterns&lt;/strong> series. We will build hash table intuition from scratch, then work through four template problems — &lt;strong>Two Sum&lt;/strong>, &lt;strong>Group Anagrams&lt;/strong>, &lt;strong>Longest Substring Without Repeating Characters&lt;/strong>, and &lt;strong>Top K Frequent Elements&lt;/strong> — each illustrating a reusable pattern you will see again and again on harder problems.&lt;/p></description></item><item><title>Linux Pipelines and File Operations: Composing Tools into Data Flows</title><link>https://www.chenk.top/en/linux/pipelines/</link><pubDate>Sat, 02 Apr 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linux/pipelines/</guid><description>&lt;p>The biggest productivity jump on Linux is not memorising more commands. It is learning to &lt;strong>compose small tools&lt;/strong> into clean data flows. The pipe operator &lt;code>|&lt;/code> is the embodiment of the Unix philosophy: each tool does one thing and does it well (&lt;code>grep&lt;/code> only filters, &lt;code>awk&lt;/code> only extracts fields, &lt;code>sort&lt;/code> only sorts), and you chain them into a pipeline that is readable, debuggable, and obvious to maintain. 
This article starts from the data-flow model &amp;ndash; &lt;code>stdin&lt;/code>, &lt;code>stdout&lt;/code>, &lt;code>stderr&lt;/code> and the file descriptors behind them &amp;ndash; then walks through every common redirection form (&lt;code>&amp;gt;&lt;/code>, &lt;code>&amp;gt;&amp;gt;&lt;/code>, &lt;code>&amp;lt;&lt;/code>, &lt;code>2&amp;gt;&lt;/code>, &lt;code>2&amp;gt;&amp;amp;1&lt;/code>, &lt;code>&amp;amp;&amp;gt;&lt;/code>), builds up the text-processing toolchain (&lt;code>grep&lt;/code>, &lt;code>awk&lt;/code>, &lt;code>sed&lt;/code>, &lt;code>cut&lt;/code>, &lt;code>tr&lt;/code>, &lt;code>sort&lt;/code>, &lt;code>uniq&lt;/code>, &lt;code>xargs&lt;/code>, &lt;code>tee&lt;/code>), and ends with two patterns most introductions skip: named pipes (FIFOs) and process substitution. By the end you should be able to replace many &amp;ldquo;I need to write a script&amp;rdquo; tasks with one or two readable command lines, and read other people&amp;rsquo;s one-liners without squinting.&lt;/p></description></item><item><title>Linux Process and Resource Management: From `top` to cgroups</title><link>https://www.chenk.top/en/linux/process-resource-management/</link><pubDate>Sun, 20 Mar 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linux/process-resource-management/</guid><description>&lt;p>The job of a Linux operator is rarely &amp;ldquo;memorise more commands&amp;rdquo;. It is to take a fuzzy symptom — &lt;em>the site feels slow, the API timed out, the box is unresponsive&lt;/em> — and quickly &lt;strong>map it to the right axis&lt;/strong>: is the CPU saturated, is memory being eaten by cache (which is fine) or by a runaway process (which is not), is the disk queue full, is some socket leaking? 
Once the axis is named, the tool follows almost mechanically.&lt;/p></description></item><item><title>Linux Service Management: systemd, systemctl, and journald</title><link>https://www.chenk.top/en/linux/service-management/</link><pubDate>Mon, 07 Mar 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linux/service-management/</guid><description>&lt;p>A &amp;ldquo;service&amp;rdquo; on Linux is a long-running background process whose
job is to be there when something needs it: synchronise the clock,
listen for SSH connections, accept HTTP requests, run a backup at 3 AM.
You almost never start one of these by hand. Something has to start
them at boot, restart them when they crash, capture their logs, decide
what depends on what, and shut everything down cleanly when the machine
powers off. On every modern distribution that something is
&lt;strong>systemd&lt;/strong>.&lt;/p></description></item><item><title>Linux User Management: Users, Groups, sudo, and Security</title><link>https://www.chenk.top/en/linux/user-management/</link><pubDate>Tue, 22 Feb 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linux/user-management/</guid><description>&lt;p>If you only ever ran &lt;code>useradd&lt;/code> and &lt;code>passwd&lt;/code> on a single laptop, you can probably get away without thinking about any of this. The moment more than one human (or more than one service) shares a host, &amp;ldquo;user management&amp;rdquo; stops being paperwork and starts being the security model: it decides who can log in, which UID owns the files a process writes, which commands &lt;code>sudo&lt;/code> will lift to root, and how long a stolen password remains useful.&lt;/p></description></item><item><title>Linux Package Management: apt, dnf, pacman, and Building from Source</title><link>https://www.chenk.top/en/linux/package-management/</link><pubDate>Wed, 09 Feb 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linux/package-management/</guid><description>&lt;p>Most people learn package management as three commands: &lt;code>install&lt;/code>, &lt;code>remove&lt;/code>, &lt;code>upgrade&lt;/code>. That works until something goes wrong - a dependency conflict, an upgrade that won&amp;rsquo;t apply, a kernel that doesn&amp;rsquo;t boot, a mirror that times out from inside China. At that point you need a model of what is actually happening: what a &lt;em>package&lt;/em> contains, what the &lt;em>manager&lt;/em> is solving for, where it stores state, and how the difference between Debian&amp;rsquo;s &lt;code>apt/dpkg&lt;/code> and Red Hat&amp;rsquo;s &lt;code>dnf/rpm&lt;/code> shows up at 2 a.m. 
on a production box.&lt;/p></description></item><item><title>Linux Disk Management: Partitions, Filesystems, LVM, and the Mount Stack</title><link>https://www.chenk.top/en/linux/disk-management/</link><pubDate>Thu, 27 Jan 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linux/disk-management/</guid><description>&lt;p>Disk problems in production almost never have a one-line fix. You are
usually navigating a layered stack: the &lt;strong>block device&lt;/strong> (a physical
or virtual disk), the &lt;strong>partition table&lt;/strong> (MBR or GPT), an optional
&lt;strong>LVM&lt;/strong> layer that decouples filesystems from disks, the
&lt;strong>filesystem driver&lt;/strong> (ext4, xfs, btrfs) that gives meaning to the
raw bytes, and finally the &lt;strong>mount point&lt;/strong> in the directory tree that
applications actually open files through. Most outages I have seen
become tractable the moment you can name which layer is misbehaving.&lt;/p></description></item><item><title>Linux File Permissions: rwx, chmod, chown, and Beyond</title><link>https://www.chenk.top/en/linux/file-permissions/</link><pubDate>Fri, 14 Jan 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linux/file-permissions/</guid><description>&lt;p>File permissions look elementary — &lt;code>chmod 755&lt;/code>, done — but they remain one of the top causes of production incidents I see: a service won&amp;rsquo;t start, a deploy script silently does nothing, Nginx returns &lt;code>403&lt;/code>, a shared directory leaks, or &lt;code>rm&lt;/code> refuses on a file that &amp;ldquo;should&amp;rdquo; be removable. Memorising magic numbers does not get you out of any of these. What does is understanding three things at the same time:&lt;/p></description></item><item><title>Linux Basics: Core Concepts and Essential Commands</title><link>https://www.chenk.top/en/linux/basics/</link><pubDate>Sat, 01 Jan 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/linux/basics/</guid><description>&lt;p>The &amp;ldquo;difficulty&amp;rdquo; of Linux rarely lives in the commands themselves. The hard part is whether you have a clear &lt;em>map&lt;/em> of the system: why it dominates servers, what multi-user and per-file permissions actually buy you, what changes when you switch between Debian and Red Hat lineages, and what to do in the first ten minutes after an SSH prompt opens. This post is the &lt;strong>entry guide&lt;/strong> for the entire Linux series. It first builds the mental model &amp;ndash; philosophy, distributions, the FHS tree &amp;ndash; and then walks you through the commands you will use ten times an hour: &lt;code>cd ls pwd&lt;/code>, &lt;code>cp mv rm mkdir&lt;/code>, &lt;code>cat less head tail&lt;/code>, &lt;code>find grep&lt;/code>, plus pipelines, redirection, SSH, and a quick taste of permissions and processes. 
Each topic is intentionally &lt;strong>kept short&lt;/strong>; depth lives in the dedicated articles (File Permissions, Disk Management, User Management, Service Management, Process Management, Package Management, Advanced File Operations).&lt;/p></description></item><item><title>About</title><link>https://www.chenk.top/en/about/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.chenk.top/en/about/</guid><description/></item><item><title>Archives</title><link>https://www.chenk.top/en/archives/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.chenk.top/en/archives/</guid><description/></item><item><title>Projects</title><link>https://www.chenk.top/en/projects/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.chenk.top/en/projects/</guid><description/></item><item><title>Series</title><link>https://www.chenk.top/en/series/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://www.chenk.top/en/series/</guid><description/></item></channel></rss>