<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Language Models on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/language-models/</link><description>Recent content in Language Models on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sun, 26 Oct 2025 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/language-models/index.xml" rel="self" type="application/rss+xml"/><item><title>NLP (6): GPT and Generative Language Models</title><link>https://www.chenk.top/en/nlp/gpt-generative-models/</link><pubDate>Sun, 26 Oct 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/gpt-generative-models/</guid><description>&lt;p>When you ask ChatGPT a question and a fluent multi-paragraph answer streams back token by token, you are watching a single deceptively simple loop: feed everything-so-far into a Transformer decoder, look at the probability distribution it produces over the vocabulary, pick one token, append it, repeat. That is &lt;em>all&lt;/em> an autoregressive language model does. The miracle is not the loop; it is what happens when you scale the network behind the loop to hundreds of billions of parameters and train it on a large share of the public web.&lt;/p></description></item></channel></rss>