Researcher profile

Zhenyu Chen

Zhenyu Chen contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
11works
0followers
11topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

11 published item(s)

preprint2026arXiv

Collision energy and system size dependence of longitudinal flow decorrelation in heavy-ion collisions at RHIC energies

In heavy-ion collisions, the initial collision geometry and its fluctuations drive the collective expansion of final-state hadrons in the transverse plane. However, longitudinal fluctuations induce event-plane twist and flow magnitude asymmetries, collectively known as longitudinal flow decorrelation. Using a multi-phase transport (AMPT) model, we systematically investigate the dependence of collision energy and system size of this phenomenon with Au+Au collisions at $\sqrt{s_{\mathrm{NN}}}$ = 19.6, 27, 54.4, 200 GeV and isobar collisions (Zr+Zr and Ru+Ru) at $\sqrt{s_{\mathrm{NN}}}$ = 200 GeV. The results reveal two distinct decorrelation components: $r_n(η)$, which includes flow magnitude asymmetry and event-plane twist, and $R_n(η)$ which arises purely from event-plane twist. Both $r_n(η)$ and $R_n(η)$ decrease linearly with $η$ and exhibit a significant dependence on collision energy and the size of the system. Through the slope parameters $F_n$ in the linear parametrization $r_n(η) = 1-2F_nη$, we can quantify the strength of decorrelation. We further observe that both $F_2$ and $F_3$ demonstrate a pronounced power-law scaling behavior with collision energy, following the relation $F_n \propto log \sqrt{s_{NN}}$. These results provide valuable insights into the three-dimensional modeling of the initial stage and the evolution of relativistic heavy-ion collisions.

preprint2026arXiv

May the Feedback Be with You! Unlocking the Power of Feedback-Driven Deep Learning Framework Fuzzing via LLMs

Deep Learning (DL) frameworks have served as fundamental components in DL systems over the last decade. However, bugs in DL frameworks could lead to catastrophic consequences in critical scenarios. A simple yet effective way to find bugs in DL frameworks is fuzz testing (Fuzzing). Existing approaches focus on test generation, leaving execution results with high semantic value (e.g., coverage information, bug reports, and exception logs) in the wild, which can serve as multiple types of feedback. To fill this gap, we propose FUEL to effectively utilize the feedback information, which comprises two Large Language Models (LLMs): analysis LLM and generation LLM. Specifically, analysis LLM infers analysis summaries from feedback information, while the generation LLM creates tests guided by these summaries. Furthermore, based on multiple feedback guidance, we design two additional components: (i) a feedback-aware simulated annealing algorithm to select operators for test generation, enriching test diversity. (ii) a program self-repair strategy to automatically repair invalid tests, enhancing test validity. We evaluate FUEL on the two most popular DL frameworks, and experiment results show that FUEL can improve line code coverage of PyTorch and TensorFlow by 4.48% and 9.14% over four state-of-the-art baselines. By the time of submission, FUEL has detected 104 previously unknown bugs for PyTorch and TensorFlow, with 93 confirmed as new bugs, 53 already fixed. 14 vulnerabilities have been assigned CVE IDs, among which 7 are rated as high-severity with a CVSS score of "7.5 HIGH". Our artifact is available at https://github.com/NJU-iSE/FUEL

preprint2026arXiv

Taming Barren Plateaus in Arbitrary Parameterized Quantum Circuits without Sacrificing Expressibility

Quantum algorithms based on parameterized quantum circuits (PQCs) have enabled a wide range of applications on near-term quantum devices. However, existing PQC architectures face several challenges, among which the ``barren plateaus" phenomenon is particularly prominent. In such cases, the loss function concentrates exponentially with increasing system size, thereby hindering effective parameter optimization. To address this challenge, we propose a general and hardware-efficient method for eliminating barren plateaus in an arbitrary PQC. Specifically, our approach achieves this by inserting a layer of easily implementable quantum channels into the original PQC, each channel requiring only one ancilla qubit and four additional gates, yielding a modified PQC (MPQC) that is provably at least as expressive as the original PQC and, under mild assumptions, is guaranteed to be free from barren plateaus. Furthermore, by appropriately adjusting the structure of MPQCs, we rigorously prove that any parameter in the original PQC can be made trainable. Importantly, the absence of barren plateaus in MPQCs is robust against realistic noise, making our approach directly applicable to near-term quantum hardware. Numerical simulations demonstrate that MPQC effectively eliminates barren plateaus in PQCs for preparing thermal states of systems with up to 100 qubits and 2400 layers. Furthermore, in end-to-end simulations, MPQC significantly outperforms PQC in finding the ground-state energy of a complex Hamiltonian.

preprint2024arXiv

Coverage Goal Selector for Combining Multiple Criteria in Search-Based Unit Test Generation

Unit testing is critical to the software development process, ensuring the correctness of basic programming units in a program (e.g., a method). Search-based software testing (SBST) is an automated approach to generating test cases. SBST generates test cases with genetic algorithms by specifying the coverage criterion (e.g., branch coverage). However, a good test suite must have different properties, which cannot be captured using an individual coverage criterion. Therefore, the state-of-the-art approach combines multiple criteria to generate test cases. Since combining multiple coverage criteria brings multiple objectives for optimization, it hurts the test suites' coverage for certain criteria compared with using the single criterion. To cope with this problem, we propose a novel approach named \textbf{smart selection}. Based on the coverage correlations among criteria and the subsumption relationships among coverage goals, smart selection selects a subset of coverage goals to reduce the number of optimization objectives and avoid missing any properties of all criteria. We conduct experiments to evaluate smart selection on $400$ Java classes with three state-of-the-art genetic algorithms under the $2$-minute budget. On average, smart selection outperforms combining all goals on $65.1\%$ of the classes having significant differences between the two approaches. Secondly, we conduct experiments to verify our assumptions about coverage criteria relationships. Furthermore, we assess the coverage performance of smart selection under varying budgets of $5$, $8$, and $10$ minutes and explore its effect on bug detection, confirming the advantage of smart selection over combining all goals.

preprint2022arXiv

Deep Graph Learning for Spatially-Varying Indoor Lighting Prediction

Lighting prediction from a single image is becoming increasingly important in many vision and augmented reality (AR) applications in which shading and shadow consistency between virtual and real objects should be guaranteed. However, this is a notoriously ill-posed problem, especially for indoor scenarios, because of the complexity of indoor luminaires and the limited information involved in 2D images. In this paper, we propose a graph learning-based framework for indoor lighting estimation. At its core is a new lighting model (dubbed DSGLight) based on depth-augmented Spherical Gaussians (SG) and a Graph Convolutional Network (GCN) that infers the new lighting representation from a single LDR image of limited field-of-view. Our lighting model builds 128 evenly distributed SGs over the indoor panorama, where each SG encoding the lighting and the depth around that node. The proposed GCN then learns the mapping from the input image to DSGLight. Compared with existing lighting models, our DSGLight encodes both direct lighting and indirect environmental lighting more faithfully and compactly. It also makes network training and inference more stable. The estimated depth distribution enables temporally stable shading and shadows under spatially-varying lighting. Through thorough experiments, we show that our method obviously outperforms existing methods both qualitatively and quantitatively.

preprint2022arXiv

PSP: Million-level Protein Sequence Dataset for Protein Structure Prediction

Proteins are essential component of human life and their structures are important for function and mechanism analysis. Recent work has shown the potential of AI-driven methods for protein structure prediction. However, the development of new models is restricted by the lack of dataset and benchmark training procedure. To the best of our knowledge, the existing open source datasets are far less to satisfy the needs of modern protein sequence-structure related research. To solve this problem, we present the first million-level protein structure prediction dataset with high coverage and diversity, named as PSP. This dataset consists of 570k true structure sequences (10TB) and 745k complementary distillation sequences (15TB). We provide in addition the benchmark training procedure for SOTA protein structure prediction model on this dataset. We validate the utility of this dataset for training by participating CAMEO contest in which our model won the first place. We hope our PSP dataset together with the training benchmark can enable a broader community of AI/biology researchers for AI-driven protein related research.

preprint2022arXiv

Selectively Combining Multiple Coverage Goals in Search-Based Unit Test Generation

Unit testing is a critical part of software development process, ensuring the correctness of basic programming units in a program (e.g., a method). Search-based software testing (SBST) is an automated approach to generating test cases. SBST generates test cases with genetic algorithms by specifying the coverage criterion (e.g., branch coverage). However, a good test suite must have different properties, which cannot be captured by using an individual coverage criterion. Therefore, the state-of-the-art approach combines multiple criteria to generate test cases. As combining multiple coverage criteria brings multiple objectives for optimization, it hurts the test suites' coverage for certain criteria compared with using the single criterion. To cope with this problem, we propose a novel approach named \textbf{smart selection}. Based on the coverage correlations among criteria and the coverage goals' subsumption relationships, smart selection selects a subset of coverage goals to reduce the number of optimization objectives and avoid missing any properties of all criteria. We conduct experiments to evaluate smart selection on $400$ Java classes with three state-of-the-art genetic algorithms. On average, smart selection outperforms combining all goals on $65.1\%$ of the classes having significant differences between the two approaches.

preprint2021arXiv

Graph-Based Fuzz Testing for Deep Learning Inference Engine

With the wide use of Deep Learning (DL) systems, academy and industry begin to pay attention to their quality. Testing is one of the major methods of quality assurance. However, existing testing techniques focus on the quality of DL models but lacks attention to the core underlying inference engines (i.e., frameworks and libraries). Inspired by the success stories of fuzz testing, we design a graph-based fuzz testing method to improve the quality of DL inference engines. This method is naturally followed by the graph structure of DL models. A novel operator-level coverage criterion based on graph theory is introduced and six different mutations are implemented to generate diversified DL models by exploring combinations of model structures, parameters, and data inputs. The Monte Carlo Tree Search (MCTS) is used to drive DL model generation without a training process. The experimental results show that the MCTS outperforms the random method in boosting operator-level coverage and detecting exceptions. Our method has discovered more than 40 different exceptions in three types of undesired behaviors: model conversion failure, inference failure, output comparison failure. The mutation strategies are useful to generate new valid test inputs, by up to 8.2% more operator-level coverage on average and 8.6 more exceptions captured.

preprint2021arXiv

Prioritize Crowdsourced Test Reports via Deep Screenshot Understanding

Crowdsourced testing is increasingly dominant in mobile application (app) testing, but it is a great burden for app developers to inspect the incredible number of test reports. Many researches have been proposed to deal with test reports based only on texts or additionally simple image features. However, in mobile app testing, texts contained in test reports are condensed and the information is inadequate. Many screenshots are included as complements that contain much richer information beyond texts. This trend motivates us to prioritize crowdsourced test reports based on a deep screenshot understanding. In this paper, we present a novel crowdsourced test report prioritization approach, namely DeepPrior. We first represent the crowdsourced test reports with a novelly introduced feature, namely DeepFeature, that includes all the widgets along with their texts, coordinates, types, and even intents based on the deep analysis of the app screenshots, and the textual descriptions in the crowdsourced test reports. DeepFeature includes the Bug Feature, which directly describes the bugs, and the Context Feature, which depicts the thorough context of the bug. The similarity of the DeepFeature is used to represent the test reports' similarity and prioritize the crowdsourced test reports. We formally define the similarity as DeepSimilarity. We also conduct an empirical experiment to evaluate the effectiveness of the proposed technique with a large dataset group. The results show that DeepPrior is promising, and it outperforms the state-of-the-art approach with less than half the overhead.

preprint2020arXiv

DeepGini: Prioritizing Massive Tests to Enhance the Robustness of Deep Neural Networks

Deep neural networks (DNN) have been deployed in many software systems to assist in various classification tasks. In company with the fantastic effectiveness in classification, DNNs could also exhibit incorrect behaviors and result in accidents and losses. Therefore, testing techniques that can detect incorrect DNN behaviors and improve DNN quality are extremely necessary and critical. However, the testing oracle, which defines the correct output for a given input, is often not available in the automated testing. To obtain the oracle information, the testing tasks of DNN-based systems usually require expensive human efforts to label the testing data, which significantly slows down the process of quality assurance. To mitigate this problem, we propose DeepGini, a test prioritization technique designed based on a statistical perspective of DNN. Such a statistical perspective allows us to reduce the problem of measuring misclassification probability to the problem of measuring set impurity, which allows us to quickly identify possibly-misclassified tests. To evaluate, we conduct an extensive empirical study on popular datasets and prevalent DNN models. The experimental results demonstrate that DeepGini outperforms existing coverage-based techniques in prioritizing tests regarding both effectiveness and efficiency. Meanwhile, we observe that the tests prioritized at the front by DeepGini are more effective in improving the DNN quality in comparison with the coverage-based techniques.

preprint2020arXiv

Disentangle contributions to small-system collectivity via scans of light nucleus-nucleus collisions

The observation of multi-particle azimuthal correlations in high-energy small-system collisions has led to intense debate on its origin and the possible coexistence from two competing theoretical scenarios: one based on initial-state intrinsic momentum anisotropy (ISM), and the other based on final-state collective response to the collision geometry (FSM). To complement the previous scan of asymmetric collision systems ($p$+Au, $d$+Au and He+Au), we propose a scan of small symmetric collision systems at RHIC, such as C+C, O+O, Al+Al and Ar+Ar $\sqrt{s_{\mathrm{NN}}}=0.2$ TeV, to provide further insights in disentangling contributions from these two scenarios. These symmetric small systems have the advantage of providing a better controlled initial geometry dominated by the average shape of the overlap region, as opposed to fluctuation-driven geometries in asymmetric systems. A transport model is employed to investigate the expected geometry response in the FSM scenario. Different trends of elliptic flow with increasing charge particle multiplicity are observed between symmetric and asymmetric systems, while triangular flow appears to show a similar behavior. Furthermore, a comparison of O+O collisions at $\sqrt{s_{\mathrm{NN}}}=0.2$ TeV and at $\sqrt{s_{\mathrm{NN}}}=2.76-7$ TeV, as proposed at the LHC, provides a unique opportunity to disentangle the collision geometry effects at nucleon level from those arising from subnucleon fluctuations.