<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Fine-Tuning on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/fine-tuning/</link><description>Recent content in Fine-Tuning on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Wed, 05 Nov 2025 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/fine-tuning/index.xml" rel="self" type="application/rss+xml"/><item><title>NLP (8): Model Fine-tuning and PEFT</title><link>https://www.chenk.top/en/nlp/fine-tuning-peft/</link><pubDate>Wed, 05 Nov 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/fine-tuning-peft/</guid><description>&lt;p>In 2020, fine-tuning a 7-billion-parameter language model was a project budget item: eight A100s, several days, and an engineer who knew how to babysit gradient checkpointing. In 2024, a graduate student does it on a laptop. The distance between those two worlds is almost entirely covered by two papers: Hu et al.&amp;rsquo;s LoRA (ICLR 2022) and its follow-up, Dettmers et al.&amp;rsquo;s QLoRA (NeurIPS 2023).&lt;/p>
&lt;p>The shift is not just engineering. Parameter-Efficient Fine-Tuning (PEFT) reframes what it means to &amp;ldquo;have a model.&amp;rdquo; Instead of one binary blob per task, you keep a single frozen base model and a directory of small adapter files, each a few tens of megabytes. Switching tasks becomes loading a new adapter; serving N domains costs one base model plus N adapters of size ε each, that is, O(1) base + N · ε.&lt;/p></description></item><item><title>Transfer Learning (2): Pre-training and Fine-tuning</title><link>https://www.chenk.top/en/transfer-learning/02-pre-training-and-fine-tuning/</link><pubDate>Wed, 07 May 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/02-pre-training-and-fine-tuning/</guid><description>&lt;p>BERT changed NLP overnight. A model pre-trained on Wikipedia and BookCorpus could be fine-tuned on a few thousand labelled examples and beat task-specific architectures that researchers had spent years hand-crafting. The same pattern repeated in vision (ImageNet pre-training, then SimCLR, MAE), in speech (wav2vec 2.0), and in code (Codex). Today, &amp;ldquo;pre-train once, fine-tune everywhere&amp;rdquo; is the default recipe of modern deep learning.&lt;/p>
&lt;p>But &lt;em>why&lt;/em> does pre-training work? When should you freeze layers, when should you use LoRA, and how small does your learning rate need to be? This article unpacks both the theory and the engineering practice behind the most successful transfer paradigm we have.&lt;/p></description></item></channel></rss>