
Beyond Execution: Static-Analysis Rewards and Hint-Conditioned Diffusion RL for Code Generation

Reinforcement Learning (RL) is an important paradigm for aligning Diffusion Language Models (DLMs) toward functional correctness in code generation. However, these models often encounter a "capability cliff" on complex tasks, where execution-based semantic rewards become too low to provide a viable learning signal. In this paper, we present a systematic empirical study of RL post-training for diffusion-based code generation along three axes: reward design, hint-conditioned sampling, and task difficulty. We investigate the effectiveness of execution-free rewards as alternatives to traditional unit-test execution, the role of training-time hint-conditioned diffusion sampling in mitigating exploration bottlenecks, and how the impact of these design choices varies across tasks of different difficulty levels. Across HumanEval, MBPP, and LiveCodeBench, we find that static checking is the strongest overall standalone execution-free reward in our setting, notably improving DiffuCoder from 53.9 to 67.1 on HumanEval and from 14.9 to 15.5 on LiveCodeBench while reducing rollout time by 9.4%. We further find that moderate AST-based hinting is most useful on harder benchmarks, and that the best reward design depends strongly on task difficulty: similarity-based rewards are more effective on easier subsets, whereas static checking is more reliable on harder subsets where execution rewards are low. These findings suggest that reward design and training guidance substantially affect diffusion RL performance in our evaluated code-generation setting.
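To make the three execution-free ideas above concrete, here is a minimal sketch, assuming Python candidates and a single reference solution per task. It is an illustration only, not the authors' implementation: the function names `static_check_reward`, `similarity_reward`, and `ast_hint`, the 0.0/0.5/1.0 reward levels, the character-level similarity measure, and the "prefix of body statements" hint scheme are all hypothetical choices. A real static checker might use a linter or type checker rather than Python's built-in `ast` module.

```python
import ast
import difflib


def static_check_reward(code: str, entry_point: str) -> float:
    """Hypothetical static-check reward (no execution):
    0.0 if the code does not parse, 0.5 if it parses but does not define
    the expected function, 1.0 if it parses and defines it."""
    try:
        tree = ast.parse(code)
    except (SyntaxError, ValueError):  # ValueError: e.g. null bytes on older Pythons
        return 0.0
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    return 1.0 if entry_point in defined else 0.5


def similarity_reward(code: str, reference: str) -> float:
    """Hypothetical similarity reward: character-level match ratio in [0, 1]
    between the candidate and a reference solution."""
    return difflib.SequenceMatcher(None, code, reference).ratio()


def ast_hint(reference: str, fraction: float) -> str:
    """Hypothetical AST-based hint: the source of the reference up to the end
    of the first `fraction` of the first function's top-level body statements.
    With fraction == 0 only the lines before the first body statement
    (the signature) are kept."""
    tree = ast.parse(reference)
    func = next(n for n in tree.body if isinstance(n, ast.FunctionDef))
    k = int(len(func.body) * fraction)
    lines = reference.splitlines()
    if k > 0:
        end = func.body[k - 1].end_lineno  # last line of the k-th statement
    else:
        end = func.body[0].lineno - 1  # stop just before the body starts
    return "\n".join(lines[:end])
```

For example, with `ref = "def add(a, b):\n    s = a + b\n    return s\n"`, `ast_hint(ref, 0.5)` keeps the signature and the first body statement, which a sampler could then hold fixed while the model fills in the rest. Neither reward function runs the candidate program, which is what makes rewards of this kind cheaper than unit-test execution.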

Preprint · arXiv · 2026 · Open access
