Paper detail

VISD: Enhancing Video Reasoning via Structured Self-Distillation

Training VideoLLMs for complex reasoning remains challenging due to sparse sequence level rewards and the lack of fine grained credit assignment over long, temporally grounded reasoning trajectories. While reinforcement learning with verifiable rewards (RLVR) provides reliable supervision, it fails to capture token level contributions, leading to inefficient learning. Conversely, existing self distillation methods offer dense supervision but lack structure and diagnostic specificity, and often interact unstably with reinforcement learning. In this work, we propose VISD, a structured self distillation framework that introduces diagnostically meaningful privileged information for video reasoning. VISD employs a video aware judge model to decompose reasoning quality into multiple dimensions, including answer correctness, logical consistency, and spatio-temporal grounding, and uses this structured feedback to guide a teacher policy for token level supervision. To stably integrate dense supervision with RL, we introduce a direction magnitude decoupling mechanism, where rollout level advantages computed from rewards determine update direction, while structured privileged signals modulate token level update magnitudes. This design enables semantically aligned and fine grained credit assignment, improving both reasoning faithfulness and training efficiency. Additionally, VISD incorporates curriculum scheduling and EMA based teacher stabilization to support robust optimization over long video sequences. Experiments on diverse benchmarks show that VISD consistently outperforms strong baselines, improving answer accuracy and spatio temporal grounding quality. Notably, VISD reaches these gains with nearly 2x faster convergence in optimization steps, highlighting the effectiveness of structured self supervision in improving both performance and sample efficiency for VideoLLMs.

preprint2026arXivOpen access
0citations
0reviews
0saves
Nocode
Nodataset
0institutions

Next steps

Decide what to do with this paper

Use like or dislike for the fast social read. The more specific scholarly feedback stays available below when needed.

Log in to curate

Reading frame

Keep the important context close to the paper

Keep the important signals around this paper in one place: votes, save state, collection context, reviews and the metadata you need before deciding what to do next.

Institutions

Add specific reaction

Move through the context

Research map

Open full explorer

Move through nearby people, institutions, topics and adjacent work without leaving the paper page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Structured reviews

0 review(s)

ContributeLeave structured feedbackUse the review template when you have a concrete strength, concern or method question.Open review form

No structured reviews yet. High-signal critique starts here.

Work discussion

0 comment(s)

DiscussAdd a high-signal commentKeep quick notes, caveats and replication pointers separate from formal reviews.Open comment form

No discussion yet. The first strong comment sets the tone.