<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>LoRA on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/lora/</link><description>Recent content in LoRA on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 30 Mar 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/lora/index.xml" rel="self" type="application/rss+xml"/><item><title>LLM Engineering (4): Post-training — SFT, DPO, RLHF, RLAIF</title><link>https://www.chenk.top/en/llm-engineering/04-post-training/</link><pubDate>Mon, 30 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/llm-engineering/04-post-training/</guid><description>&lt;p>A base model from pretraining can complete text but cannot follow instructions, refuse harmful requests, or maintain a persona—these are post-training behaviors. Post-training is where the gap between a research paper&amp;rsquo;s claims and a production-grade model lies. This chapter covers what each post-training algorithm optimizes, why most reward models are subtly flawed, and the effective methods for 2026.&lt;/p>
&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/llm-engineering/04-post-training/illustration_1.png" alt="LLM Engineering (4): Post-training — SFT, DPO, RLHF, RLAIF — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p></description></item><item><title>NLP (8): Model Fine-tuning and PEFT</title><link>https://www.chenk.top/en/nlp/fine-tuning-peft/</link><pubDate>Wed, 05 Nov 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/fine-tuning-peft/</guid><description>&lt;p>In 2020, fine-tuning a 7-billion-parameter language model was a project budget item: eight A100s, several days, and an engineer who knew how to babysit gradient checkpointing. In 2024, a graduate student does it on a laptop. The distance between those two worlds is almost entirely covered by one paper — Hu et al.&amp;rsquo;s LoRA (ICLR 2022) — and one follow-up — Dettmers et al.&amp;rsquo;s QLoRA (NeurIPS 2023).&lt;/p>
&lt;p>The shift is not just engineering. Parameter-Efficient Fine-Tuning (PEFT) reframes what it means to &amp;ldquo;have a model.&amp;rdquo; Instead of one binary blob per task, you keep a single frozen base model and a directory of small adapter files, each a few tens of megabytes. Switching tasks becomes loading a new adapter; serving N domains becomes O(1) base + N · ε.&lt;/p></description></item><item><title>Transfer Learning (9): Parameter-Efficient Fine-Tuning</title><link>https://www.chenk.top/en/transfer-learning/09-parameter-efficient-fine-tuning/</link><pubDate>Wed, 18 Jun 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/09-parameter-efficient-fine-tuning/</guid><description>&lt;p>How do you fine-tune a 175B-parameter model on a single GPU? Update only 0.1% of the parameters. Parameter-Efficient Fine-Tuning (PEFT) makes this possible — and on most benchmarks it matches full fine-tuning. This post derives the math behind LoRA, Adapter, Prefix-Tuning, Prompt-Tuning, BitFit and QLoRA, and gives you a single picture for choosing among them.&lt;/p>
&lt;p>&lt;figure class="article-figure">
 &lt;img src="https://blog-pic-ck.oss-cn-beijing.aliyuncs.com/posts/en/transfer-learning/09-parameter-efficient-fine-tuning/illustration_1.png" alt="Transfer Learning (9): Parameter-Efficient Fine-Tuning — Chapter overview" loading="lazy" decoding="async" class="content-image">
 
&lt;/figure>
&lt;/p>
&lt;hr>
&lt;h2 id="what-you-will-learn" class="heading-anchor">What You Will Learn&lt;a href="#what-you-will-learn" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Why the low-rank assumption holds for weight updates&lt;/li>
&lt;li>LoRA: derivation, initialization, scaling, and weight merging&lt;/li>
&lt;li>Adapter: bottleneck architecture and where to insert it&lt;/li>
&lt;li>Prefix-Tuning vs Prompt-Tuning vs P-Tuning v2&lt;/li>
&lt;li>QLoRA: how 4-bit quantisation gets a 65B model on one GPU&lt;/li>
&lt;li>Method comparison and a selection guide grounded in GLUE numbers&lt;/li>
&lt;/ul>
&lt;h2 id="prerequisites" class="heading-anchor">Prerequisites&lt;a href="#prerequisites" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
&lt;/h2>&lt;ul>
&lt;li>Transformer architecture (attention, FFN, residual + LayerNorm)&lt;/li>
&lt;li>Matrix decomposition basics (rank, SVD)&lt;/li>
&lt;li>Transfer learning fundamentals (Parts 1-6)&lt;/li>
&lt;/ul>
&lt;hr>
&lt;h2 id="the-full-fine-tuning-problem" class="heading-anchor">The Full Fine-Tuning Problem&lt;a href="#the-full-fine-tuning-problem" class="heading-link" aria-label="Permalink to this section" title="Copy link to this section">#&lt;/a>
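&lt;/h2>&lt;p>Full fine-tuning updates every parameter of the pre-trained model to minimise the task loss:&lt;/p>
&lt;span class="math-block">$$\boldsymbol{\theta}^* = \arg\min_{\boldsymbol{\theta}} \mathcal{L}(\boldsymbol{\theta})$$&lt;/span>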
&lt;p>
For GPT-3 (175B params) this means roughly &lt;strong>700 GB of FP32 weights&lt;/strong>, plus gradients of the same size, plus optimiser states (twice that again with Adam), and one full copy of the weights per task. Even if training fits in memory, the per-task storage and serving cost is brutal: 100 customers means 100 copies of a 700 GB checkpoint.&lt;/p></description></item><item><title>Transfer Learning (2): Pre-training and Fine-tuning</title><link>https://www.chenk.top/en/transfer-learning/02-pre-training-and-fine-tuning/</link><pubDate>Wed, 07 May 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/02-pre-training-and-fine-tuning/</guid><description>&lt;p>BERT changed NLP overnight. A model pre-trained on Wikipedia and BookCorpus could be fine-tuned on a few thousand labelled examples and beat task-specific architectures that researchers had spent years hand-crafting. The same pattern repeated in vision (ImageNet pre-training, then SimCLR, MAE), in speech (wav2vec 2.0), and in code (Codex). Today, &amp;ldquo;pre-train once, fine-tune everywhere&amp;rdquo; is the default recipe of modern deep learning.&lt;/p>
&lt;p>But &lt;em>why&lt;/em> does pre-training work? When should you freeze layers, when should you use LoRA, and how small does your learning rate need to be? This article unpacks both the theory and the engineering practice behind the most successful transfer paradigm we have.&lt;/p></description></item></channel></rss>