Source author record

Seonjin Na

Seonjin Na appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Cryptography and Security Machine Learning

Catalog footprint

What is connected

2works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding

RL post-training of frontier language models is increasingly bottlenecked by autoregressive rollout generation, making rollout acceleration a central systems challenge. Many existing efficiency methods improve throughput by changing the rollout or optimization regime, for example, through off-policy execution, replay, or lower-precision generation. We study speculative decoding as a lossless acceleration primitive for RL rollouts that preserves the target model's output distribution. We implement speculative decoding in NeMo-RL with a vLLM backend, supporting both synchronous and asynchronous pipelines and enabling speculation during RL rollouts. This benefit is realizable across speculation mechanisms, such as pretrained MTP heads, small external draft models or even techniques such as Eagle3, which are traditionally applied after RL phase. This yields a deployment path for state-of-the-art speculative decoding inside RL training. In a reasoning post-training workload at 8B scale under synchronous RL, speculative decoding improves rollout throughput by 1.8x. Using a high-fidelity performance simulator, we project that combining speculative decoding with asynchronous RL yields up to 2.5x end-to-end training speedup at 235B scale.

preprint2026arXiv

CuFuzz: Hardening CUDA Programs through Transformation and Fuzzing

GPUs have gained significant popularity over the past decade, extending beyond their original role in graphics rendering. This evolution has brought GPU security and reliability to the forefront of concerns. Prior research has shown that CUDA's lack of memory safety can lead to serious vulnerabilities. While fuzzing is effective for finding such bugs on CPUs, equivalent tools for GPUs are lacking due to architectural differences and lack of built-in error detection. In this paper, we propose CuFuzz, a novel compiler-runtime co-design solution to extend state-of-the-art CPU fuzzing tools to GPU programs. CuFuzz transforms GPU programs into CPU programs using compiler IR-level transformations to enable effective fuzz testing. To the best of our knowledge, CuFuzz is the first mechanism to bring fuzzing support to CUDA, addressing a critical gap in GPU security research. By leveraging CPU memory error detectors such as Address Sanitizer, CuFuzz aims to uncover memory safety bugs and related correctness vulnerabilities in CUDA code, enhancing the security and reliability of GPU-accelerated applications. To ensure high fuzzing throughput, we introduce two compiler-runtime co-optimizations tailored for GPU code: Partial Representative Execution (PREX) and Access-Index Preserving Pruning (AXIPrune), achieving average throughput improvements of 32x with PREX and an additional 33% gain with AXIPrune on top of PREX-optimized code. Together, these optimizations can yield up to a 224.31x speedup. In our fuzzing campaigns, CuFuzz uncovered 122 security vulnerabilities in widely used benchmarks.

Seonjin Na

What is connected

Connect this record

See the researcher in context

Building this map preview

2 published item(s)

Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding

CuFuzz: Hardening CUDA Programs through Transformation and Fuzzing