Source author record

Ziliang Zhang

Ziliang Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Artificial Intelligence, astro-ph.HE, astro-ph.IM, Computation and Language, physics.flu-dyn, Software Engineering

Catalog footprint

What is connected

3 works

6 topics

4 close collaborators

preprint, 2026, arXiv

Can LLMs Generate Reliable Test Case Generators? A Study on Competition-Level Programming Problems

Large Language Models (LLMs) have demonstrated remarkable capabilities in code generation and can tackle complex tasks during inference. However, the extent to which LLMs can be utilized for code checking or debugging through test case generation remains largely unexplored. We investigate this problem from the perspective of competition-level programming (CP) programs and propose TCGBench, a Benchmark for (LLM generation of) Test Case Generators. This benchmark comprises two tasks, aimed at studying the capabilities of LLMs in (1) generating valid test case generators for a given CP problem, and further (2) generating targeted test case generators that expose bugs in human-written code. Experimental results indicate that while state-of-the-art LLMs can generate valid test case generators in most cases, most LLMs struggle to generate targeted test cases that effectively reveal flaws in human code. In particular, even advanced reasoning models (e.g., o3-mini) fall significantly short of human performance in the task of generating targeted generators. Furthermore, we construct a high-quality, manually curated dataset of instructions for generating targeted generators. Analysis demonstrates that the performance of LLMs can be enhanced with the aid of this dataset, through both prompting and fine-tuning.
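To make the distinction between the two tasks concrete, here is a minimal Python sketch. It is not taken from TCGBench: the toy problem (maximum non-empty subarray sum), the constraints `N_MAX` and `V_MAX`, and every function name are assumptions chosen for illustration. A valid generator only has to respect the input constraints, while a targeted generator is built to trigger one specific bug, here a solution that wrongly starts its running best at 0.

```python
import random

# Hypothetical constraints for a toy problem (not from the benchmark):
# 1 <= n <= N_MAX and |a_i| <= V_MAX; task: maximum non-empty subarray sum.
N_MAX = 10
V_MAX = 100


def is_valid(case):
    """Check that a test case satisfies the assumed input constraints."""
    return 1 <= len(case) <= N_MAX and all(-V_MAX <= x <= V_MAX for x in case)


def random_generator(rng):
    """Task 1 style: produce any input that satisfies the constraints."""
    n = rng.randint(1, N_MAX)
    return [rng.randint(-V_MAX, V_MAX) for _ in range(n)]


def targeted_generator(rng):
    """Task 2 style: produce all-negative arrays to hit the bug below."""
    n = rng.randint(1, N_MAX)
    return [rng.randint(-V_MAX, -1) for _ in range(n)]


def reference_solution(a):
    """Kadane's algorithm over non-empty subarrays."""
    best = cur = a[0]
    for x in a[1:]:
        cur = max(x, cur + x)
        best = max(best, cur)
    return best


def buggy_solution(a):
    """Illustrative faulty submission: treats the empty subarray as allowed."""
    best = cur = 0  # bug: returns 0 when every element is negative
    for x in a:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best


def exposes_bug(case):
    """A case exposes the bug if it is valid and the two outputs differ."""
    return is_valid(case) and reference_solution(case) != buggy_solution(case)


if __name__ == "__main__":
    rng = random.Random(0)
    case = targeted_generator(rng)
    print(case, reference_solution(case), buggy_solution(case))
```

In this sketch the random generator will only occasionally produce an all-negative array, so it rarely separates the two solutions, whereas the targeted generator does so on every call.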

preprint, 2023, arXiv

Simulation of CO2 Storage using a Parameterization Method for Essential Trapping Physics: FluidFlower Benchmark Study

An efficient compositional framework is developed for simulation of CO2 storage in saline aquifers over the full cycle of injection, migration and post-migration processes. Essential trapping mechanisms, including structural, dissolution, and residual trapping, which operate at different time scales, are accurately captured in the presented unified framework. In particular, a parameterization method is proposed to efficiently describe the relevant physical processes. The proposed framework is validated by comparing the dynamics of gravity-induced convective transport with that reported in the literature. Results show good agreement for both the characteristics of descending fingers and the associated dissolution rate. The developed simulator is then applied to study the FluidFlower benchmark model. An experimental setup with heterogeneous geological layers is discretized into a two-dimensional computational domain where numerical simulation is performed. The impacts of hysteresis and of CO2 diffusion in the liquid phase on the migration and trapping of the CO2 plume are investigated. Inclusion of the hysteresis effect does not affect plume migration in this benchmark model, whereas diffusion plays an important role in promoting convective mixing. This work offers a promising approach to predict the migration of the CO2 plume, and to assess the amount of trapping from different mechanisms for long-term CO2 storage.

preprint, 2019, arXiv

Overview to the Hard X-ray Modulation Telescope (Insight-HXMT) Satellite

As China's first X-ray astronomical satellite, the Hard X-ray Modulation Telescope (HXMT), which was dubbed Insight-HXMT after its launch on June 15, 2017, is a wide-band (1-250 keV) slat-collimator-based X-ray astronomy satellite with the capability of all-sky monitoring in 0.2-3 MeV. It was designed to perform pointing, scanning and gamma-ray burst (GRB) observations and, based on the Direct Demodulation Method (DDM), the image of the scanned sky region can be reconstructed. Here we give an overview of the mission and its progress, including payload, core sciences, ground calibration/facility, ground segment, data archive, software, in-orbit performance, calibration, background model, observations and some preliminary results.