<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>GPT on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/gpt/</link><description>Recent content in GPT on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sun, 26 Oct 2025 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/gpt/index.xml" rel="self" type="application/rss+xml"/><item><title>NLP (6): GPT and Generative Language Models</title><link>https://www.chenk.top/en/nlp/gpt-generative-models/</link><pubDate>Sun, 26 Oct 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/gpt-generative-models/</guid><description>&lt;p>When you ask ChatGPT a question and a fluent multi-paragraph answer streams back token by token, you are watching a single deceptively simple loop: feed everything-so-far into a Transformer decoder, look at the probability distribution it produces over the vocabulary, pick one token, append it, repeat. That is &lt;em>all&lt;/em> an autoregressive language model does. The miracle is not the loop — it is what happens when you scale the network behind the loop to hundreds of billions of parameters and train it on most of the internet.&lt;/p></description></item><item><title>Transfer Learning (2): Pre-training and Fine-tuning</title><link>https://www.chenk.top/en/transfer-learning/02-pre-training-and-fine-tuning/</link><pubDate>Wed, 07 May 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/02-pre-training-and-fine-tuning/</guid><description>&lt;p>BERT changed NLP overnight. A model pre-trained on Wikipedia and BookCorpus could be fine-tuned on a few thousand labelled examples and beat task-specific architectures that researchers had spent years hand-crafting. The same pattern repeated in vision (ImageNet pre-training, then SimCLR, MAE), in speech (wav2vec 2.0), and in code (Codex). 
Today, &amp;ldquo;pre-train once, fine-tune everywhere&amp;rdquo; is the default recipe of modern deep learning.&lt;/p>
&lt;p>But &lt;em>why&lt;/em> does pre-training work? When should you freeze layers, when should you use LoRA, and how small does your learning rate need to be? This article unpacks both the theory and the engineering practice behind the most successful transfer paradigm we have.&lt;/p></description></item></channel></rss>