<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Pretraining on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/pretraining/</link><description>Recent content in Pretraining on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Sun, 29 Mar 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/pretraining/index.xml" rel="self" type="application/rss+xml"/><item><title>LLM Engineering (3): Pretraining at Scale</title><link>https://www.chenk.top/en/llm-engineering/03-pretraining/</link><pubDate>Sun, 29 Mar 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/llm-engineering/03-pretraining/</guid><description>&lt;p>Pretraining is where most of an LLM&amp;rsquo;s capability comes from, and it&amp;rsquo;s also where the leaderboard-vs-reality gap is widest. Most published runs are heroic engineering more than they are scientific results. This chapter is about the parts of pretraining that you actually have to get right when you&amp;rsquo;re not OpenAI: the data, the parallelism choice, and the failure modes that only show up when the cluster is large enough to make a single bad NCCL all-reduce kill a 30-day run.&lt;/p></description></item></channel></rss>