Paper on Chen Kai Blog

Prefix-Tuning: Optimizing Continuous Prompts for Generation

Tue, 29 Jul 2025 09:00:00 +0000

Fine-tuning a 1.5B-parameter GPT-2 model for each downstream task means saving a fresh 1.5B-parameter checkpoint every time. Across a dozen tasks, that is a substantial storage and serving headache, and it makes sharing a single base model essentially impossible. Prefix-Tuning (Li & Liang, 2021) takes the opposite stance: freeze every weight of the language model, and learn a tiny block of continuous vectors — the prefix — that is fed into the attention layers as if it were context the model already attended to. The model never changes; only the prefix does, and a different prefix produces a different “personality” on demand.
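
To make the "prefix as extra context" idea concrete, here is a minimal sketch of a single attention head in which learned prefix key/value vectors are placed in front of the real ones. The function names and the toy numbers are illustrative assumptions, not code from the paper; the sketch covers one head in one layer and leaves out training entirely.

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector (single head)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(dim)]

def attention_with_prefix(query, keys, values, prefix_keys, prefix_values):
    """Same attention, but the prefix vectors are attended to like extra context.

    Only prefix_keys / prefix_values would be trained; keys and values come
    from the frozen model and are left untouched.
    """
    return attention(query, prefix_keys + keys, prefix_values + values)

# Toy frozen activations (hypothetical numbers).
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0]]

base = attention(query, keys, values)
out_a = attention_with_prefix(query, keys, values, [[2.0, 0.0]], [[10.0, 10.0]])
out_b = attention_with_prefix(query, keys, values, [[0.0, 2.0]], [[-5.0, -5.0]])
print(base, out_a, out_b)
```

The same frozen `keys` and `values` give three different outputs depending on which prefix, if any, is attached, which is the behaviour the summary describes.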

MoSLoRA: Mixture-of-Subspaces in Low-Rank Adaptation

Sun, 01 Sep 2024 09:00:00 +0000

LoRA is the default tool for adapting a frozen base model: cheap, stable, mergeable, and good enough for most single-task settings. But the moment your fine-tuning data is genuinely heterogeneous (code mixed with math, instruction following mixed with creative writing, several domains in one adapter), a single low-rank subspace starts to feel cramped. You can grow the rank $$r$$, but cost grows with it and you still get one subspace, just a fatter one.
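
The cost side of that trade-off is easy to check with arithmetic. A rough sketch, assuming the standard LoRA factorization of a `d_out x d_in` weight update into two matrices of rank `r` (the 4096 layer size is just an example, not a figure from the paper):

```python
def full_update_params(d_in, d_out):
    """Trainable parameters if the whole weight matrix is updated."""
    return d_in * d_out

def lora_params(d_in, d_out, r):
    """Trainable parameters for a rank-r update: one (d_out x r) and one (r x d_in) factor."""
    return r * (d_in + d_out)

d_in = d_out = 4096  # example layer size (assumption)
for r in (4, 8, 16, 32):
    share = lora_params(d_in, d_out, r) / full_update_params(d_in, d_out)
    print(f"r={r:>2}  params={lora_params(d_in, d_out, r):>7}  share of full update={share:.4%}")
```

The parameter count grows linearly in `r`, yet the update is still a single rank-`r` subspace at every setting.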

HCGR: Hyperbolic Contrastive Graph Representation Learning for Session-based Recommendation

Wed, 01 May 2024 09:00:00 +0000

A user opens a sneaker app, taps “running shoes,” drills into a brand, then a price band, and finally a single SKU. This trajectory forms a tree: each click narrows the candidate set roughly multiplicatively. In Euclidean space, you need many dimensions to keep all the leaves of the tree apart because the volume grows polynomially with radius. In hyperbolic space, volume grows exponentially with radius, so the tree fits naturally — a few dimensions are enough to keep the long tail untangled.
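
The polynomial-versus-exponential contrast can be seen numerically in two dimensions. This is a small sketch using the textbook area formulas for a disk of radius `r`: the Euclidean plane, and the hyperbolic plane with curvature -1. The branching factor of 3 for the tree is an arbitrary assumption.

```python
import math

def euclidean_disk_area(r):
    """Area of a disk of radius r in the flat plane: pi * r^2."""
    return math.pi * r ** 2

def hyperbolic_disk_area(r):
    """Area of a disk of radius r in the hyperbolic plane (curvature -1)."""
    return 2 * math.pi * (math.cosh(r) - 1)

def tree_nodes(branching, depth):
    """Total nodes in a complete tree down to the given depth."""
    return sum(branching ** d for d in range(depth + 1))

for r in range(1, 11):
    print(
        f"r={r:>2}  euclidean={euclidean_disk_area(r):>8.1f}  "
        f"hyperbolic={hyperbolic_disk_area(r):>10.1f}  "
        f"tree(b=3)={tree_nodes(3, r):>6}"
    )
```

The hyperbolic column and the tree column both grow by a roughly constant factor per step, while the Euclidean column falls further behind with every extra level.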

paper2repo: GitHub Repository Recommendation for Academic Papers

Mon, 26 Jun 2023 09:00:00 +0000

You read a paper, want the code, but the “code available at” link is dead, missing, or points to a stub. Search engines resort to keyword matching in the README, which works for popular repos with descriptive names but fails for others. paper2repo (WWW 2020) frames this as a cross-platform recommendation problem: learn an embedding space where a paper abstract and a GitHub repository can be compared directly using a dot product, then rank them.
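
Once both sides live in the same space, the ranking step itself is tiny. A minimal sketch with made-up three-dimensional embeddings; the vectors and the repository names are hypothetical, and how paper2repo actually learns the embeddings is not shown here.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rank_repos(paper_vec, repo_vecs):
    """Return repository names sorted by dot-product score, best first."""
    return sorted(repo_vecs, key=lambda name: dot(paper_vec, repo_vecs[name]), reverse=True)

# Hypothetical embeddings standing in for the learned ones.
paper_vec = [0.9, 0.1, 0.0]
repo_vecs = {
    "repo_a": [0.8, 0.2, 0.1],
    "repo_b": [0.1, 0.9, 0.3],
    "repo_c": [0.5, 0.5, 0.5],
}

ranked = rank_repos(paper_vec, repo_vecs)
print(ranked)
```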

Session-based Recommendation with Graph Neural Networks (SR-GNN)

Sun, 25 Jun 2023 09:00:00 +0000

A user clicks A, B, C, B, D. A sequence model reads this as five tokens and folds them into a hidden state. SR-GNN sees a graph in which the edge B -> C survives even after the user returns to B, the node B is reused (so its in/out neighbours both inform its embedding), and the geometry of the click stream is preserved as adjacency. That structural insight is why SR-GNN (Wu et al., AAAI 2019) outperforms purely sequential baselines such as GRU4Rec and NARM on standard session-based recommendation (SBR) benchmarks.
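
The graph view of that click stream can be sketched in a few lines. This is an illustration of the session-graph idea only: `build_session_graph` is a name made up here, and the sketch stops at the adjacency structure, leaving out any edge weighting and the GNN that runs on top.

```python
from collections import defaultdict

def build_session_graph(clicks):
    """Turn an ordered click sequence into a directed graph.

    Returns (nodes, out_edges, in_edges): nodes are the unique items in
    first-seen order; the edge maps record which items follow or precede each item.
    """
    nodes = list(dict.fromkeys(clicks))  # de-duplicate while keeping order
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for src, dst in zip(clicks, clicks[1:]):  # consecutive click pairs
        out_edges[src].add(dst)
        in_edges[dst].add(src)
    return nodes, dict(out_edges), dict(in_edges)

nodes, out_edges, in_edges = build_session_graph(["A", "B", "C", "B", "D"])
print(nodes)
print(out_edges)
print(in_edges)
```

Five clicks collapse into four nodes, and B ends up with two incoming and two outgoing neighbours, which is the information a purely sequential reading folds away.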

Graph Contextualized Self-Attention Network (GC-SAN) for Session-based Recommendation

Sun, 29 Jan 2023 09:00:00 +0000

In session-based recommendation you only see a short anonymous click sequence — no user profile, no long history, no demographics. Every signal you have lives inside that single window. GC-SAN (IJCAI 2019) takes the strongest two ideas of the time — SR-GNN’s session graph and the Transformer’s self-attention — and stacks them: a graph view captures local transition patterns and loops, a sequence view captures long-range intent, and a tiny weighted sum decides how much of each to trust. The result is a clean “best of both worlds” baseline that is genuinely hard to beat at its parameter budget.
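
The fusion step can be pictured as a one-line interpolation. A sketch under the assumption that a single scalar weight `omega` blends the self-attention output with the graph-based embedding of the last click; the function name and the numbers are illustrative, not taken from the GC-SAN code.

```python
def fuse(attention_out, graph_last_click, omega):
    """Blend the sequence view and the graph view with one scalar weight.

    omega = 1.0 trusts only the self-attention output;
    omega = 0.0 trusts only the graph embedding of the last click.
    """
    return [omega * a + (1.0 - omega) * g for a, g in zip(attention_out, graph_last_click)]

attention_out = [1.0, 0.0]     # hypothetical long-range intent vector
graph_last_click = [0.0, 1.0]  # hypothetical local-transition vector

print(fuse(attention_out, graph_last_click, 0.25))
```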

LLMGR: Integrating Large Language Models with Graphical Session-Based Recommendation

Sun, 22 Jan 2023 09:00:00 +0000

Session-based recommendation relies on the click graph. New items lack edges, and long-tail items have only a few noisy ones. Each item has a title and description, but the model never uses them. LLMGR addresses this by treating the LLM as a “semantic engine” that converts text into representations a graph encoder can use, then lets a GNN handle ranking. On Amazon Music/Beauty/Pantry, the reported results show HR@20 up ~8.68%, NDCG@20 up ~10.71%, and MRR@20 up ~11.75% over the strongest GNN baseline, with the biggest gains for cold-start items.

Graph Neural Networks for Learning Equivariant Representations of Neural Networks

Sun, 03 Apr 2022 09:00:00 +0000

Shuffling the hidden neurons of a trained MLP yields the exact same function, but the flat parameter vector looks entirely different. This fact ruins most attempts at “learning over neural networks”: naive representations treat two functionally identical models as unrelated points in parameter space, causing the downstream learner to waste capacity rediscovering a symmetry it should have for free. This paper, Graph Neural Networks for Learning Equivariant Representations of Neural Networks (Kofinas et al., ICLR 2024), proposes a clean fix: turn the network into a graph and use a GNN whose architecture natively respects the relevant permutation symmetry.
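
The symmetry is easy to verify by hand. A minimal sketch with a randomly initialized one-hidden-layer MLP (sizes, seed, and permutation are arbitrary choices): reordering the hidden units, that is, the rows of the first layer and the matching columns of the second, leaves the output unchanged while the flattened parameters differ.

```python
import math
import random

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer tanh MLP; W1 is (hidden x in), W2 is (out x hidden)."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    return [sum(w * hi for w, hi in zip(row, h)) + b for row, b in zip(W2, b2)]

def flatten(W1, b1, W2, b2):
    """Concatenate all parameters into one flat list."""
    return [w for row in W1 for w in row] + b1 + [w for row in W2 for w in row] + b2

rng = random.Random(0)
n_in, n_hidden, n_out = 3, 4, 2
W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
W2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
b2 = [rng.uniform(-1, 1) for _ in range(n_out)]

perm = [2, 0, 3, 1]  # an arbitrary reordering of the hidden units
W1_p = [W1[i] for i in perm]                   # permute rows of layer 1
b1_p = [b1[i] for i in perm]                   # permute its biases the same way
W2_p = [[row[i] for i in perm] for row in W2]  # permute columns of layer 2

x = [0.5, -1.0, 2.0]
y = mlp(x, W1, b1, W2, b2)
y_p = mlp(x, W1_p, b1_p, W2_p, b2)

same_function = all(abs(a - b) < 1e-12 for a, b in zip(y, y_p))
same_vector = flatten(W1, b1, W2, b2) == flatten(W1_p, b1_p, W2_p, b2)
print(same_function, same_vector)
```

The outputs agree up to floating-point rounding even though the two flat vectors are different, so a learner that consumes the raw vector sees two unrelated inputs for one and the same function.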