Paper detail

Think When Needed: Adaptive Reasoning-Driven Multimodal Embeddings with a Dual-LoRA Architecture

Multimodal large language models (MLLMs) have emerged as a powerful backbone for multimodal embeddings. Recent methods introduce chain-of-thought (CoT) reasoning into the embedding pipeline to improve retrieval quality, but remain costly in both model size and inference cost. They typically employ separate reasoner and embedder with substantial parameter overhead, and generate CoT indiscriminately for every input. However, we observe that for simple inputs, discriminative embeddings already perform well, and redundant reasoning can even mislead the model, degrading performance. To address these limitations, we propose Think When Needed (TWN), a unified multimodal embedding framework with adaptive reasoning. TWN introduces a dual-LoRA architecture that attaches reasoning and embedding adapters to a shared frozen backbone, detaching gradients at their interface to mitigate gradient conflicts introduced by joint optimization while keeping parameters close to a single model. Building on this, an adaptive think mechanism uses a self-supervised routing gate to decide per input whether to generate CoT, skipping unnecessary reasoning to reduce inference overhead and even improve retrieval quality. We further explore embedding-guided RL to optimize CoT quality beyond supervised training. On the 78 tasks of MMEB-V2, TWN achieves state-of-the-art embedding quality while being substantially more efficient than existing generative methods, requiring only 3-5% additional parameters relative to the backbone and up to 50% fewer reasoning tokens compared to the full generative mode.

preprint2026arXivOpen access
0citations
0reviews
0saves
Nocode
Nodataset
0institutions

Next steps

Decide what to do with this paper

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Log in to curate

Reading frame

Keep the important context close to the paper

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Institutions

Add specific reaction

Move through the context

Research map

Open full explorer

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Structured reviews

0 review(s)

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

Work discussion

0 comment(s)

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.