Researcher profile

Wenzhi Fang

Wenzhi Fang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
6 works
0 followers
8 topics
4 close collaborators

Published work

6 published item(s)

preprint | 2026 | arXiv

Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs

Multi-agent large language model (LLM) systems often rely on a controller to coordinate a pool of heterogeneous models, yet existing controllers are typically limited to one-shot routing: they select a model once and return its output directly. Such routing-only designs provide no mechanism to critique intermediate drafts or support iterative refinement. To address this limitation, we propose a critique-and-routing controller that casts multi-agent coordination as a sequential decision problem. At each turn, the controller evaluates the current draft, decides whether to stop or continue, and, if needed, selects the next agent for further refinement. We formulate this process as a finite-horizon Markov Decision Process (MDP) with explicit agent-utilization constraints, design a composite reward for controller decisions across turns, and optimize the controller via policy gradients under a Lagrangian-relaxed objective. Extensive experiments across multiple heterogeneous multi-agent systems and seven reasoning benchmarks show that our method consistently outperforms state-of-the-art baselines and substantially narrows the gap to the strongest agent, while using it for fewer than 25% of total calls.
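
The turn-by-turn loop described in this abstract (evaluate the draft, stop or continue, pick the next agent) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's controller: `run_controller`, `critique`, `choose_agent`, the `stop_score` threshold, and the toy agents are all hypothetical names, and the hand-written rules stand in for the learned policy that the paper trains with policy gradients.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical agent type: (task, current draft) -> revised draft.
Agent = Callable[[str, str], str]


def run_controller(
    task: str,
    agents: Dict[str, Agent],
    critique: Callable[[str, str], float],
    choose_agent: Callable[[str, str, List[str]], str],
    max_turns: int = 4,
    stop_score: float = 0.9,
) -> Tuple[str, List[str]]:
    """Finite-horizon loop: critique the draft, then stop or route to an agent."""
    draft: str = ""
    calls: List[str] = []
    for _ in range(max_turns):
        # Stop once at least one draft exists and the critique is satisfied.
        if calls and critique(task, draft) >= stop_score:
            break
        name = choose_agent(task, draft, calls)
        draft = agents[name](task, draft)
        calls.append(name)
    return draft, calls


# Toy stand-ins (assumptions): a cheap agent and a strong agent.
toy_agents: Dict[str, Agent] = {
    "small": lambda task, draft: draft + "s",
    "large": lambda task, draft: draft + "L",
}


def toy_critique(task: str, draft: str) -> float:
    # Pretend only the strong agent's contribution makes the draft acceptable.
    return 1.0 if "L" in draft else 0.3


def toy_choose(task: str, draft: str, calls: List[str]) -> str:
    # Try the cheap agent first, escalate to the strong one afterwards.
    return "large" if "small" in calls else "small"


final_draft, call_log = run_controller("demo task", toy_agents, toy_critique, toy_choose)
```

In this toy run the controller calls the cheap agent once, escalates once, and then stops, which mirrors the idea of reserving the strongest agent for a minority of calls.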

preprint | 2026 | arXiv

Large Language Models over Networks: Collaborative Intelligence under Resource Constraints

Large language models (LLMs) are transforming society, powering applications from smartphone assistants to autonomous driving. Yet cloud-based LLM services alone cannot serve a growing class of applications, including those operating under intermittent connectivity, sub-second latency budgets, data-residency constraints, or sustained high-volume inference. On-device deployment is in turn constrained by limited computation and memory. No single endpoint can deliver high-quality service across this spectrum. This article focuses on collaborative intelligence, a paradigm in which multiple independent LLMs distributed across device and cloud endpoints collaborate at the task level through natural language or structured messages. Such collaboration strives for superior response quality under heterogeneous resource constraints spanning computation, memory, communication, and cost across network tiers. We present collaborative inference along two complementary and composable dimensions: vertical device-cloud collaboration and horizontal multi-agent collaboration, which can be combined into hybrid topologies in practice. We then examine learning to collaborate, addressing the training of routing policies and the development of cooperative capabilities among LLMs. Finally, we identify open research challenges including scaling under resource heterogeneity and trustworthy collaborative intelligence.

preprint | 2026 | arXiv

PAAC: Privacy-Aware Agentic Device-Cloud Collaboration

Large language model (LLM) agents face a structural tension: cloud agents provide strong reasoning but expose user data, while on-device agents preserve privacy at the cost of overall capability. Existing device-cloud designs treat this boundary as a compute split rather than a trust boundary suited to agentic workloads, and existing sanitizers force a choice between policy flexibility and the structural fidelity tool calls require. In this work, we develop PAAC, a privacy-aware agentic framework that aligns planner-executor decomposition with the device-cloud boundary so that role specialization itself becomes the privacy mechanism. The cloud agent reasons over typed placeholder tokens that preserve each sensitive value's reasoning role while discarding its content, while the on-device agent identifies sensitive spans and distills each step's execution outcome into compact key findings. Sanitization confines the on-device LLM to proposing which spans to mask, while a deterministic registry performs all substitution and reversal, keeping actions directly executable on device. On three agentic benchmarks under strict privacy settings, PAAC dominates the Pareto frontier of privacy and accuracy, improving average accuracy by 15-36% and reducing average leakage by 2-6× over state-of-the-art device-cloud baselines, with the largest margins on privacy targets outside fixed entity taxonomies. We find consistent improvements on 17 additional benchmarks spanning 10 domains, including math, science, and finance.
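
The division of labor in this abstract (a model proposes spans, a deterministic registry substitutes and reverses them) can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `PlaceholderRegistry` class, the `<KIND_n>` token format, and the example spans are hypothetical and are not taken from the PAAC implementation.

```python
import re
from typing import Dict, List, Tuple


class PlaceholderRegistry:
    """Deterministic two-way map between sensitive values and typed tokens."""

    def __init__(self) -> None:
        self._token_to_value: Dict[str, str] = {}
        self._value_to_token: Dict[Tuple[str, str], str] = {}
        self._counts: Dict[str, int] = {}

    def token_for(self, kind: str, value: str) -> str:
        # Reuse the same token when the same (kind, value) pair reappears.
        key = (kind, value)
        if key not in self._value_to_token:
            self._counts[kind] = self._counts.get(kind, 0) + 1
            token = f"<{kind}_{self._counts[kind]}>"
            self._value_to_token[key] = token
            self._token_to_value[token] = value
        return self._value_to_token[key]

    def sanitize(self, text: str, spans: List[Tuple[str, str]]) -> str:
        # spans are (kind, value) pairs proposed by the on-device model.
        # Longest values go first so a short value cannot split a longer one.
        for kind, value in sorted(spans, key=lambda s: len(s[1]), reverse=True):
            text = text.replace(value, self.token_for(kind, value))
        return text

    def restore(self, text: str) -> str:
        # Swap known tokens back to their values; unknown tokens are left as-is.
        return re.sub(
            r"<[A-Z]+_\d+>",
            lambda m: self._token_to_value.get(m.group(0), m.group(0)),
            text,
        )


registry = PlaceholderRegistry()
message = "Email alice@example.com about invoice 4471."
proposed = [("EMAIL", "alice@example.com"), ("ID", "4471")]

masked = registry.sanitize(message, proposed)
# A cloud planner would only ever see `masked`; its action uses the tokens.
cloud_action = 'send_mail(to="<EMAIL_1>", subject="Invoice <ID_1>")'
executable = registry.restore(cloud_action)
```

Because the token keeps its type (`EMAIL`, `ID`), a planner can still reason about what role the value plays without seeing its content, and reversal on the device is a plain lookup rather than a model call.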

preprint | 2026 | arXiv

Self-Play Enhancement via Advantage-Weighted Refinement in Online Federated LLM Fine-Tuning with Real-Time Feedback

Recent work has advanced feedback-based learning systems, in which a foundation model takes in incoming feedback (e.g., from a user) to self-improve, creating a self-loop training system. However, existing methods assume an offline setup and also require privileged ground-truth contexts for training. Moreover, there is limited consideration of federated learning (FL), which is particularly well-suited to incorporating external feedback across large networks of end users but requires methods that train efficiently on resource-constrained edge devices. Therefore, we introduce SPEAR (Self-Play Enhancement via Advantage-Weighted Refinement), an efficient online learning algorithm for federated LLM fine-tuning. SPEAR uses a feedback-guided self-play loop to construct naturally contrastive pairs per prompt, which are trained with (i) standard maximum likelihood on correct completions and (ii) confidence-weighted unlikelihood on tail tokens of incorrect completions. Because it needs neither expensive group generations nor ground-truth contexts for training (i.e., only partial, non-answer feedback), in contrast with existing works, SPEAR can be trained both online and in a resource-efficient manner. We validate SPEAR across various benchmark datasets, demonstrating its superior performance in comparison to state-of-the-art baselines. The implementation code is publicly available at https://github.com/lee3296/SPEAR.

preprint | 2020 | arXiv

Outage Minimization for Intelligent Reflecting Surface Aided MISO Communication Systems via Stochastic Beamforming

An intelligent reflecting surface (IRS) has the potential to significantly enhance network performance by reconfiguring the wireless propagation environment. It is, however, difficult to obtain accurate downlink channel state information (CSI) for efficient beamforming design in IRS-aided wireless networks. In this article, we consider an IRS-aided downlink multiple-input single-output (MISO) network, where the base station (BS) is not required to know the underlying channel distribution. We formulate an outage probability minimization problem by jointly optimizing the beamforming vector at the BS and the phase-shift matrix at the IRS, while taking into account the transmit power and unimodular constraints. The formulated problem turns out to be a non-convex non-smooth stochastic optimization problem. To this end, we employ the sigmoid function as a surrogate to tackle the non-smoothness of the objective function. In addition, we propose a data-driven efficient alternating stochastic gradient descent (SGD) algorithm to solve the problem by utilizing the historical channel samples. Simulation results demonstrate the performance gains of the proposed algorithm over the benchmark methods in terms of minimizing the outage probability.
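
The smoothing step mentioned in this abstract can be made concrete with a small numeric sketch. An empirical outage probability is an average of 0/1 indicators over channel samples, which has no useful gradient; replacing each indicator with a sigmoid gives a differentiable approximation. The function names, the `temperature` parameter, and the sample SNR values below are assumptions for illustration, and the sketch covers only the surrogate, not the paper's beamforming or phase-shift optimization.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def empirical_outage(snrs: List[float], threshold: float) -> float:
    """Fraction of samples whose SNR falls below the threshold (non-smooth)."""
    return sum(1.0 for s in snrs if s < threshold) / len(snrs)


def smoothed_outage(snrs: List[float], threshold: float, temperature: float = 0.1) -> float:
    """Sigmoid surrogate: each 0/1 indicator becomes sigmoid((threshold - snr) / temperature)."""
    return sum(sigmoid((threshold - s) / temperature) for s in snrs) / len(snrs)


# Hypothetical SNR values standing in for historical channel samples.
snr_samples = [0.2, 0.8, 1.5, 2.5, 3.0]
exact = empirical_outage(snr_samples, threshold=1.0)
smooth = smoothed_outage(snr_samples, threshold=1.0, temperature=0.05)
```

A smaller `temperature` tracks the indicator more closely but makes gradients vanish away from the threshold, so the choice trades approximation accuracy against optimization behavior.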

preprint | 2020 | arXiv

Stochastic Beamforming for Reconfigurable Intelligent Surface Aided Over-the-Air Computation

Over-the-air computation (AirComp) is a promising technology that is capable of achieving fast data aggregation in Internet of Things (IoT) networks. The mean-squared error (MSE) performance of AirComp is bottlenecked by unfavorable channel conditions. This limitation can be mitigated by deploying a reconfigurable intelligent surface (RIS), which reconfigures the propagation environment to facilitate the receiving power equalization. The achievable performance of RIS relies on the availability of accurate channel state information (CSI), which, however, is generally difficult to obtain. In this paper, we consider an RIS-aided AirComp IoT network, where an access point (AP) aggregates sensing data from distributed devices. Without assuming any prior knowledge of the underlying channel distribution, we formulate a stochastic optimization problem to maximize the probability that the MSE is below a certain threshold. The formulated problem turns out to be non-convex and highly intractable. To this end, we propose a data-driven approach to jointly optimize the receive beamforming vector at the AP and the phase-shift vector at the RIS based on historical channel realizations. After smoothing the objective function by adopting the sigmoid function, we develop an alternating stochastic variance reduced gradient (SVRG) algorithm with a fast convergence rate to solve the problem. Simulation results demonstrate the effectiveness of the proposed algorithm and the importance of deploying an RIS in reducing the MSE outage probability.