<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>XLM-R on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/xlm-r/</link><description>Recent content in XLM-R on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Mon, 30 Jun 2025 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/xlm-r/index.xml" rel="self" type="application/rss+xml"/><item><title>Transfer Learning (11): Cross-Lingual Transfer</title><link>https://www.chenk.top/en/transfer-learning/11-cross-lingual-transfer/</link><pubDate>Mon, 30 Jun 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/transfer-learning/11-cross-lingual-transfer/</guid><description>&lt;p>English has the labels. The world has 7,000+ languages. Cross-lingual transfer is what lets a sentiment classifier trained only on English IMDB reviews score Spanish tweets, what makes a question-answering model fine-tuned on SQuAD answer Hindi questions, and what allows a model that has never seen a single labeled Swahili sentence to do passable Swahili NER.&lt;/p>
&lt;p>This post derives why that is even possible. We start from the bilingual-embedding alignment that motivated the field, walk through the multilingual pretraining recipe (mBERT, XLM-R) that made parallel data optional, and end with the practical playbook: zero-shot vs translate-train vs translate-test, when to pick which, and where the wheels come off.&lt;/p></description></item></channel></rss>