Researcher profile

Ruobing Han

Ruobing Han contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 13 - UnverifiedVerification L1Unclaimed author
2works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

2 published item(s)

preprint2026arXiv

CuFuzz: Hardening CUDA Programs through Transformation and Fuzzing

GPUs have gained significant popularity over the past decade, extending beyond their original role in graphics rendering. This evolution has brought GPU security and reliability to the forefront of concerns. Prior research has shown that CUDA's lack of memory safety can lead to serious vulnerabilities. While fuzzing is effective for finding such bugs on CPUs, equivalent tools for GPUs are lacking due to architectural differences and lack of built-in error detection. In this paper, we propose CuFuzz, a novel compiler-runtime co-design solution to extend state-of-the-art CPU fuzzing tools to GPU programs. CuFuzz transforms GPU programs into CPU programs using compiler IR-level transformations to enable effective fuzz testing. To the best of our knowledge, CuFuzz is the first mechanism to bring fuzzing support to CUDA, addressing a critical gap in GPU security research. By leveraging CPU memory error detectors such as Address Sanitizer, CuFuzz aims to uncover memory safety bugs and related correctness vulnerabilities in CUDA code, enhancing the security and reliability of GPU-accelerated applications. To ensure high fuzzing throughput, we introduce two compiler-runtime co-optimizations tailored for GPU code: Partial Representative Execution (PREX) and Access-Index Preserving Pruning (AXIPrune), achieving average throughput improvements of 32x with PREX and an additional 33% gain with AXIPrune on top of PREX-optimized code. Together, these optimizations can yield up to a 224.31x speedup. In our fuzzing campaigns, CuFuzz uncovered 122 security vulnerabilities in widely used benchmarks.

preprint2022arXiv

CuPBoP: CUDA for Parallelized and Broad-range Processors

CUDA is one of the most popular choices for GPU programming, but it can only be executed on NVIDIA GPUs. Executing CUDA on non-NVIDIA devices not only benefits the hardware community, but also allows data-parallel computation in heterogeneous systems. To make CUDA programs portable, some researchers have proposed using source-to-source translators to translate CUDA to portable programming languages that can be executed on non-NVIDIA devices. However, most CUDA translators require additional manual modifications on the translated code, which imposes a heavy workload on developers. In this paper, CuPBoP is proposed to execute CUDA on non-NVIDIA devices without relying on any portable programming languages. Compared with existing work that executes CUDA on non-NVIDIA devices, CuPBoP does not require manual modification of the CUDA source code, but it still achieves the highest coverage (69.6%), much higher than existing frameworks (56.6%) on the Rodinia benchmark. In particular, for CPU backends, CuPBoP supports several ISAs (e.g., X86, RISC-V, AArch64) and has close or even higher performance compared with other projects. We also compare and analyze the performance among CuPBoP, manually optimized OpenMP/MPI programs, and CUDA programs on the latest Ampere architecture GPU, and show future directions for supporting CUDA programs on non-NVIDIA devices with high performance