<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Embeddings on Chen Kai Blog</title><link>https://www.chenk.top/en/tags/embeddings/</link><description>Recent content in Embeddings on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Fri, 03 Apr 2026 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/tags/embeddings/index.xml" rel="self" type="application/rss+xml"/><item><title>LLM Engineering (8): Retrieval-Augmented Generation</title><link>https://www.chenk.top/en/llm-engineering/08-rag/</link><pubDate>Fri, 03 Apr 2026 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/llm-engineering/08-rag/</guid><description>&lt;p>RAG is the most over-deployed and under-engineered pattern in LLM applications. The 2024 demo loop (embed everything with &lt;code>text-embedding-3-large&lt;/code>, dump it into pgvector, take the top 5 by cosine similarity) works for 1,000 documents and a forgiving audience. It does not survive 100K real documents and a customer who notices when the answer is wrong. This chapter is what I wish more teams had known before they built their second generation of RAG.&lt;/p>
&lt;p>The original RAG paper (&lt;a href="https://arxiv.org/abs/2005.11401" target="_blank" rel="noopener noreferrer">Lewis et al., 2020 &lt;span aria-hidden="true" style="font-size:0.75em; opacity:0.55; margin-left:2px;">↗&lt;/span>&lt;/a>
) framed retrieval-augmented generation as a hybrid model: a dense retriever (DPR) whose query encoder was fine-tuned jointly with a generator (BART), so retrieval was optimized for end-task accuracy rather than for a separate retrieval objective. Production RAG in 2026 doesn&amp;rsquo;t look much like Lewis&amp;rsquo;s RAG: modern systems use frozen pre-trained embedders, separate rerankers, and decoder-only generators that are never trained against the retriever. But the core insight (parameterize knowledge separately from reasoning) survived and became the dominant paradigm. The &lt;a href="https://arxiv.org/abs/2312.10997" target="_blank" rel="noopener noreferrer">Gao et al. (2023) RAG survey &lt;span aria-hidden="true" style="font-size:0.75em; opacity:0.55; margin-left:2px;">↗&lt;/span>&lt;/a>
 is the best comprehensive overview of the post-2020 evolution into &amp;ldquo;Naive RAG → Advanced RAG → Modular RAG.&amp;rdquo;&lt;/p></description></item><item><title>Recommendation Systems (3): Deep Learning Foundations</title><link>https://www.chenk.top/en/recommendation-systems/03-deep-learning-basics/</link><pubDate>Sun, 07 Dec 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/recommendation-systems/03-deep-learning-basics/</guid><description>&lt;p>In June 2016, Google published a short paper that quietly redrew the map of recommendation systems. The paper described &lt;strong>Wide &amp;amp; Deep Learning&lt;/strong>, the model then powering app recommendations inside Google Play, a product with over a billion users. Within a year, every major tech company had a deep model in production. By 2019, the industry standard had shifted: matrix factorization was a baseline, not a system.&lt;/p>
&lt;p>What changed? Multi-layer neural networks brought four capabilities classical methods could not deliver:&lt;/p></description></item><item><title>NLP (10): RAG and Knowledge Enhancement Systems</title><link>https://www.chenk.top/en/nlp/rag-knowledge-enhancement/</link><pubDate>Sat, 15 Nov 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/nlp/rag-knowledge-enhancement/</guid><description>&lt;p>A frozen language model is a confident liar. It can&amp;rsquo;t read yesterday&amp;rsquo;s incident report, your company wiki, or the patch notes that shipped this morning, so when you ask, it confabulates an answer that is grammatically perfect but factually wrong. &lt;strong>Retrieval-Augmented Generation (RAG)&lt;/strong> breaks the deadlock by separating &lt;em>memory&lt;/em> from &lt;em>reasoning&lt;/em>: keep the LLM small and stable, and put the volatile knowledge in an external store that you can update anytime. Before generating, retrieve the relevant evidence and condition the model on it.&lt;/p></description></item></channel></rss>