<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Paper on Chen Kai Blog</title><link>https://www.chenk.top/en/categories/paper/</link><description>Recent content in Paper on Chen Kai Blog</description><generator>Hugo</generator><language>en</language><lastBuildDate>Tue, 29 Jul 2025 09:00:00 +0000</lastBuildDate><atom:link href="https://www.chenk.top/en/categories/paper/index.xml" rel="self" type="application/rss+xml"/><item><title>Prefix-Tuning: Optimizing Continuous Prompts for Generation</title><link>https://www.chenk.top/en/standalone/prefix-tuning/</link><pubDate>Tue, 29 Jul 2025 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/prefix-tuning/</guid><description>&lt;p>Fine-tuning a 1.5B-parameter GPT-2 model for each downstream task means saving a fresh 1.5B-parameter checkpoint every time. Across a dozen tasks, that is a substantial storage and serving headache, and it makes sharing a single base model essentially impossible. &lt;em>Prefix-Tuning&lt;/em> (Li &amp;amp; Liang, 2021) takes the opposite stance: freeze every weight of the language model, and learn a tiny block of continuous vectors — the &lt;em>prefix&lt;/em> — that is fed into the attention layers as if it were context the model already attended to. The model never changes; only the prefix does, and a different prefix produces a different &amp;ldquo;personality&amp;rdquo; on demand.&lt;/p></description></item><item><title>MoSLoRA: Mixture-of-Subspaces in Low-Rank Adaptation</title><link>https://www.chenk.top/en/standalone/moslora/</link><pubDate>Sun, 01 Sep 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/moslora/</guid><description>&lt;p>LoRA is the default tool for adapting a frozen base model: cheap, stable, mergeable, and good enough for most single-task settings. 
But the moment your fine-tuning data is genuinely heterogeneous — code mixed with math, instruction following mixed with creative writing, several domains in one adapter — a single low-rank subspace starts to feel cramped. You can grow &lt;span class="math-inline">$r$&lt;/span>, but cost grows with it and you still get &lt;em>one&lt;/em> subspace, just a fatter one.&lt;/p></description></item><item><title>HCGR: Hyperbolic Contrastive Graph Representation Learning for Session-based Recommendation</title><link>https://www.chenk.top/en/standalone/hcgr/</link><pubDate>Wed, 01 May 2024 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/hcgr/</guid><description>&lt;p>A user opens a sneaker app, taps &amp;ldquo;running shoes,&amp;rdquo; drills into a brand, then a price band, and finally a single SKU. This trajectory forms a &lt;em>tree&lt;/em>: each click narrows the candidate set roughly multiplicatively. In Euclidean space, you need many dimensions to keep all the leaves of the tree apart because the volume grows polynomially with radius. In hyperbolic space, volume grows &lt;em>exponentially&lt;/em> with radius, so the tree fits naturally — a few dimensions are enough to keep the long tail untangled.&lt;/p></description></item><item><title>paper2repo: GitHub Repository Recommendation for Academic Papers</title><link>https://www.chenk.top/en/standalone/paper2repo-github-repository-recommendation/</link><pubDate>Mon, 26 Jun 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/paper2repo-github-repository-recommendation/</guid><description>&lt;p>You read a paper, want the code, but the &amp;ldquo;code available at&amp;rdquo; link is dead, missing, or points to a stub. Search engines resort to keyword matching in the README, which works for popular repos with descriptive names but fails for others.
paper2repo (WWW 2020) frames this as a cross-platform recommendation problem: learn an embedding space where a paper abstract and a GitHub repository can be compared directly using a dot product, then rank them.&lt;/p></description></item><item><title>Session-based Recommendation with Graph Neural Networks (SR-GNN)</title><link>https://www.chenk.top/en/standalone/session-based-recommendation-with-graph-neural-networks/</link><pubDate>Sun, 25 Jun 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/session-based-recommendation-with-graph-neural-networks/</guid><description>&lt;p>A user clicks &lt;strong>A, B, C, B, D&lt;/strong>. A sequence model reads this as five tokens and folds them into a hidden state. &lt;strong>SR-GNN&lt;/strong> sees a &lt;em>graph&lt;/em> in which the edge &lt;code>B -&amp;gt; C&lt;/code> survives even after the user returns to &lt;code>B&lt;/code>, the node &lt;code>B&lt;/code> is reused (so its in/out neighbours both inform its embedding), and the geometry of the click stream is preserved as adjacency. That structural insight is why &lt;a href="https://arxiv.org/abs/1811.00855" target="_blank" rel="noopener noreferrer">SR-GNN (Wu et al., AAAI 2019) &lt;span aria-hidden="true" style="font-size:0.75em; opacity:0.55; margin-left:2px;">↗&lt;/span>&lt;/a>
 outperforms purely sequential baselines such as GRU4Rec and NARM on standard session-based recommendation (SBR) benchmarks.&lt;/p></description></item><item><title>Graph Contextualized Self-Attention Network (GC-SAN) for Session-based Recommendation</title><link>https://www.chenk.top/en/standalone/gcsan/</link><pubDate>Sun, 29 Jan 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/gcsan/</guid><description>&lt;p>In session-based recommendation you only see a short anonymous click sequence — no user profile, no long history, no demographics. Every signal you have lives inside that single window. &lt;strong>GC-SAN&lt;/strong> (IJCAI 2019) takes the two strongest ideas of the time — SR-GNN&amp;rsquo;s session graph and the Transformer&amp;rsquo;s self-attention — and stacks them: a &lt;em>graph&lt;/em> view captures local transition patterns and loops, a &lt;em>sequence&lt;/em> view captures long-range intent, and a tiny weighted sum decides how much of each to trust. The result is a clean &amp;ldquo;best of both worlds&amp;rdquo; baseline that is genuinely hard to beat at its parameter budget.&lt;/p></description></item><item><title>LLMGR: Integrating Large Language Models with Graphical Session-Based Recommendation</title><link>https://www.chenk.top/en/standalone/llmgr/</link><pubDate>Sun, 22 Jan 2023 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/llmgr/</guid><description>&lt;p>Session-based recommendation relies on the click graph. New items have no edges, and long-tail items have only a few noisy ones. Each item has a title and description, but a purely graph-based model never uses them. &lt;strong>LLMGR&lt;/strong> addresses this by treating the LLM as a &amp;ldquo;semantic engine&amp;rdquo; that converts text into representations a graph encoder can use, then lets a GNN handle ranking.
On Amazon Music/Beauty/Pantry, the results show HR@20 up ~8.68%, NDCG@20 up ~10.71%, and MRR@20 up ~11.75% over the strongest GNN baseline, with the biggest gains for cold-start items.&lt;/p></description></item><item><title>Graph Neural Networks for Learning Equivariant Representations of Neural Networks</title><link>https://www.chenk.top/en/standalone/gnn-equivariant-representations/</link><pubDate>Sun, 03 Apr 2022 09:00:00 +0000</pubDate><guid>https://www.chenk.top/en/standalone/gnn-equivariant-representations/</guid><description>&lt;p>Shuffling the hidden neurons of a trained MLP yields the exact same function, but the flat parameter vector looks entirely different. This fact ruins most attempts at &amp;ldquo;learning over neural networks&amp;rdquo;: naive representations treat two functionally identical models as unrelated points in parameter space, causing the downstream learner to waste capacity rediscovering a symmetry it should have for free. This paper, &lt;em>Graph Neural Networks for Learning Equivariant Representations of Neural Networks&lt;/em> (Kofinas et al., ICLR 2024), proposes a clean fix: turn the network into a graph and use a GNN whose architecture natively respects the relevant permutation symmetry.&lt;/p></description></item></channel></rss>