Source author record

Wenzhuo Yang

Wenzhuo Yang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Cryptography and Security Distributed, Parallel, and Cluster Computing Machine Learning Quantitative Methods Software Engineering

Catalog footprint

What is connected

6works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement

We present CodeEvolve, an evolutionary framework for improving program performance and code quality with Large Language Models (LLMs). CodeEvolve extends OpenEvolve with runtime-guided target selection, Monte Carlo Tree Search (MCTS), automated code refinement, and language-specific evaluation pipelines for Java and Salesforce Apex. The system uses Java Flight Recorder (JFR) profiles to build weighted component graphs and select optimization targets that account for most execution cost, reducing reliance on manual bottleneck identification. For each target, CodeEvolve generates candidate edits, evaluates them through build validation, unit tests, performance checks, static analysis, and LLM-based review, and retains only variants that preserve functional correctness. Across real-world optimization tasks, CodeEvolve improves performance and code metrics while maintaining correctness. On a large enterprise Java codebase, it achieves an average speedup of 15.22$\times$ across seven hotspot functions and outperforms single-pass LLM optimization on five of them. An ablation study on Apex optimization shows that the full MCTS-augmented configuration produces 19.5 valid programs out of 20 on average, indicating that search, filtering, and refinement each contribute to more reliable optimization.

preprint2026arXiv

Resolving the bias-precision paradox with stochastic causal representation learning for personalized medicine

Estimating individualized treatment effects from longitudinal observational data is central to data-driven medicine, yet existing methods face a fundamental limitation: reducing confounding bias often suppresses clinically informative heterogeneity, degrading patient-specific predictions. Here, we identify this tension as a bias-precision paradox in causal representation learning and introduce sampling-based maximum mean discrepancy (sMMD), a stochastic alignment strategy that replaces global adversarial balancing with subset-level matching. We instantiate this approach in a framework for counterfactual outcome prediction with attribution-grounded interpretability. Across two large-scale ICU cohorts (n = 27,783), our framework improves accuracy under distribution shift, reducing error by up to 11.5% and substantially increasing recall in high-risk tasks. Mechanistic analyses show that sMMD selectively preserves clinically decisive variables. In human-AI evaluation, our method outperforms clinicians-in-training and large language models, and improves clinician accuracy by 14.7% while reducing decision time, enabling interpretable, real-time clinical decision support.

preprint2022arXiv

MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation

Counterfactual explanation is an important Explainable AI technique to explain machine learning predictions. Despite being studied actively, existing optimization-based methods often assume that the underlying machine-learning model is differentiable and treat categorical attributes as continuous ones, which restricts their real-world applications when categorical attributes have many different values or the model is non-differentiable. To make counterfactual explanation suitable for real-world applications, we propose a novel framework of Model-Agnostic Counterfactual Explanation (MACE), which adopts a newly designed pipeline that can efficiently handle non-differentiable machine-learning models on a large number of feature values. in our MACE approach, we propose a novel RL-based method for finding good counterfactual examples and a gradient-less descent method for improving proximity. Experiments on public datasets validate the effectiveness with better validity, sparsity and proximity.

preprint2022arXiv

Privacy-Preserving Aggregation in Federated Learning: A Survey

Over the recent years, with the increasing adoption of Federated Learning (FL) algorithms and growing concerns over personal data privacy, Privacy-Preserving Federated Learning (PPFL) has attracted tremendous attention from both academia and industry. Practical PPFL typically allows multiple participants to individually train their machine learning models, which are then aggregated to construct a global model in a privacy-preserving manner. As such, Privacy-Preserving Aggregation (PPAgg) as the key protocol in PPFL has received substantial research interest. This survey aims to fill the gap between a large number of studies on PPFL, where PPAgg is adopted to provide a privacy guarantee, and the lack of a comprehensive survey on the PPAgg protocols applied in FL systems. In this survey, we review the PPAgg protocols proposed to address privacy and security issues in FL systems. The focus is placed on the construction of PPAgg protocols with an extensive analysis of the advantages and disadvantages of these selected PPAgg protocols and solutions. Additionally, we discuss the open-source FL frameworks that support PPAgg. Finally, we highlight important challenges and future research directions for applying PPAgg to FL systems and the combination of PPAgg with other technologies for further security improvement.

preprint2022arXiv

Understanding Security in Smart City Domains From the ANT-centric Perspective

A city is a large human settlement that serves the people who live there, and a smart city is a concept of how cities might better serve their residents through new forms of technology. In this paper, we focus on four major smart city domains according to Maslow's hierarchy of needs: smart utility, smart transportation, smart homes, and smart healthcare. Numerous IoT applications have been developed to achieve the intelligence that we desire in our smart domains, ranging from personal gadgets such as health trackers and smart watches to large-scale industrial IoT systems such as nuclear and energy management systems. However, many of the existing smart city IoT solutions can be made better by considering the suitability of their security strategies. Inappropriate system security designs generally occur in two scenarios: first, system designers recognize the importance of security but are unsure of where, when, or how to implement it; and second, system designers try to fit traditional security designs to meet the smart city security context. Thus, the objective of this paper is to provide application designers with the missing security link they may need to improve their security designs. By evaluating the specific context of each smart city domain and the context-specific security requirements, we aim to provide directions on when, where, and how they should implement security strategies and the possible security challenges they need to consider. In addition, we present a new perspective on security issues in smart cities from a data-centric viewpoint by referring to the reference architecture, the Activity-Network-Things (ANT)-centric architecture, built upon the concept of "security in a zero-trust environment". By doing so, we reduce the security risks posed by new system interactions or unanticipated user behaviors while avoiding the hassle of regularly upgrading security models.

preprint2021arXiv

Gain without Pain: Offsetting DP-injected Nosies Stealthily in Cross-device Federated Learning

Federated Learning (FL) is an emerging paradigm through which decentralized devices can collaboratively train a common model. However, a serious concern is the leakage of privacy from exchanged gradient information between clients and the parameter server (PS) in FL. To protect gradient information, clients can adopt differential privacy (DP) to add additional noises and distort original gradients before they are uploaded to the PS. Nevertheless, the model accuracy will be significantly impaired by DP noises, making DP impracticable in real systems. In this work, we propose a novel Noise Information Secretly Sharing (NISS) algorithm to alleviate the disturbance of DP noises by sharing negated noises among clients. We theoretically prove that: 1) If clients are trustworthy, DP noises can be perfectly offset on the PS; 2) Clients can easily distort negated DP noises to protect themselves in case that other clients are not totally trustworthy, though the cost lowers model accuracy. NISS is particularly applicable for FL across multiple IoT (Internet of Things) systems, in which all IoT devices need to collaboratively train a model. To verify the effectiveness and the superiority of the NISS algorithm, we conduct experiments with the MNIST and CIFAR-10 datasets. The experiment results verify our analysis and demonstrate that NISS can improve model accuracy by 21% on average and obtain better privacy protection if clients are trustworthy.

Wenzhuo Yang

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

CodeEvolve: LLM-Driven Evolutionary Optimization with Runtime-Enriched Target Selection for Multi-Language Code Enhancement

Resolving the bias-precision paradox with stochastic causal representation learning for personalized medicine

MACE: An Efficient Model-Agnostic Framework for Counterfactual Explanation

Privacy-Preserving Aggregation in Federated Learning: A Survey

Understanding Security in Smart City Domains From the ANT-centric Perspective

Gain without Pain: Offsetting DP-injected Nosies Stealthily in Cross-device Federated Learning