Paper detail

Monkey Jump : MoE-Style PEFT for Efficient Multi-Task Learning

Mixture-of-experts variants of parameter-efficient fine-tuning enable per-token specialization, but they introduce additional trainable routers and expert parameters, increasing memory usage and training cost. This undermines the core goal of parameter-efficient fine-tuning. We propose Monkey Jump, a method that brings mixture-of-experts-style specialization to parameter-efficient fine-tuning without introducing extra trainable parameters for experts or routers. Instead of adding new adapters as experts, Monkey Jump treats the adapters already present in each Transformer block (such as query, key, value, up, and down projections) as implicit experts and routes tokens among them. Routing is performed using k-means clustering with exponentially moving averaged cluster centers, requiring no gradients and no learned parameters. We theoretically show that token-wise routing increases expressivity and can outperform shared adapters by avoiding cancellation effects. Across multi-task experiments covering 14 text, 14 image, and 19 video benchmarks, Monkey Jump achieves competitive performance with mixture-of-experts-based parameter-efficient fine-tuning methods while using 7 to 29 times fewer trainable parameters, up to 48 percent lower memory consumption, and 1.5 to 2 times faster training. Monkey Jump is architecture-agnostic and can be applied to any adapter-based parameter-efficient fine-tuning method.

preprint2026arXivOpen access
0citations
0reviews
0saves
Nocode
Nodataset
0institutions

Next steps

Decide what to do with this paper

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Log in to curate

Reading frame

Keep the important context close to the paper

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Institutions

Add specific reaction

Move through the context

Research map

Open full explorer

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Structured reviews

0 review(s)

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

Work discussion

0 comment(s)

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.