Researcher profile

Jiayu Zhang

Jiayu Zhang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
12topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

A3: Android Agent Arena for Mobile GUI Agents with Essential-State Procedural Evaluation

The advancement of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has catalyzed the development of mobile graphic user interface (GUI) AI agents, which is designed to autonomously perform tasks on mobile devices. However, a significant gap persists in mobile GUI agent evaluation, where existing benchmarks predominantly rely on either static frame assessments such as AndroidControl or offline static apps such as AndroidWorld and thus fail to capture agent performance in dynamic, real-world online mobile apps. To address this gap, we present Android Agent Arena (A3), a novel "essential-state" based procedural evaluation system for mobile GUI agents. A3 introduces a benchmark of 100 tasks derived from 20 widely-used, dynamic online apps across 20 categories from the Google Play Store, ensuring evaluation comprehension. A3 also presents a novel "essential-state" based procedural evaluation method that leverages MLLMs as reward models to progressively verify task completion and process achievement. This evaluation approach address the limitations of traditional function based evaluation methods on online dynamic apps. Furthermore, A3 includes a toolkit to streamline Android device interaction, reset online environment and apps and facilitate data collection from both human and agent demonstrations. The complete A3 system, including the benchmark and tools, will be publicly released to provide a robust foundation for future research and development in mobile GUI agents.

preprint2026arXiv

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Recent advances in coding agents suggest rapid progress toward autonomous software development, yet existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software systems. Most prior evaluations focus on localized code generation, scaffolded completion, or short-term repair tasks, leaving open the question of whether agents can sustain coherent reasoning, planning, and execution over the extended horizons demanded by real-world repository construction. To address this gap, we present NL2Repo Bench, a benchmark explicitly designed to evaluate the long-horizon repository generation ability of coding agents. Given only a single natural-language requirements document and an empty workspace, agents must autonomously design the architecture, manage dependencies, implement multi-module logic, and produce a fully installable Python library. Our experiments across state-of-the-art open- and closed-source models reveal that long-horizon repository generation remains largely unsolved: even the strongest agents achieve below 40% average test pass rates and rarely complete an entire repository correctly. Detailed analysis uncovers fundamental long-horizon failure modes, including premature termination, loss of global coherence, fragile cross-file dependencies, and inadequate planning over hundreds of interaction steps. NL2Repo Bench establishes a rigorous, verifiable testbed for measuring sustained agentic competence and highlights long-horizon reasoning as a central bottleneck for the next generation of autonomous coding agents.

preprint2026arXiv

Optimal Underreporting and Competitive Equilibrium

This paper develops a dynamic insurance market model comprising two competing insurance companies and a continuum of insureds, and examines the interaction between strategic underreporting by the insureds and competitive pricing between the insurance companies under a Bonus-Malus System (BMS) framework. For the first time in an oligopolistic setting, we establish the existence and uniqueness of the insureds' optimal reporting barrier, as well as its continuous dependence on the BMS premiums. For the 2-class BMS case, we prove the existence of Nash equilibrium premium strategies and conduct an extensive sensitivity analysis on the impact of the model parameters on the equilibrium premiums.

preprint2026arXiv

Scale-Invariant Neural Network Optimization: Norm Geometry and Heavy-Tailed Noise

A growing lesson from neural network optimization is that optimizer design should respect how the model is parametrized. Scale-invariant methods become important because their normalized layerwise updates can not only support hyperparameter transfer across model sizes but exploit input-output matrix norm geometry. At the same time, stochastic gradient noises in deep learning are often far from sub-Gaussian and may exhibit heavy tails. These crucial observations have shaped recent algorithmic principles for training neural networks, yet their joint theoretical consequences remain underexplored. In particular, it is unclear what dimension dependence is unavoidable for scale-invariant methods with general input-output matrix norm, and whether higher-order smoothness can accelerate training under heavy-tailed noise. We study these questions through nonconvex smooth stochastic optimization over $\mathbb{R}^{m\times n}$ with general norms, where the goal is to achieve an $ε$-stationary point under $p^{\mathrm{th}}$-moment heavy-tailed noise. Our first contribution is a dimension-dependent lower bound: when $\frac{\max\{m,n\}}{(\min\{m,n\})^2}$ is large enough, any scale-invariant first-order method with spectral norm requires $Ω(\min\{m, n\}ε^{-\frac{3p-2}{p-1}})$ oracle calls. We prove that a batched Scion method with spectral norm achieves the matching upper bound of $O(\min\{m, n\}ε^{-\frac{3p-2}{p-1}})$. To exploit higher-order smoothness, we propose a transported Scion method and improve the bound to $O(\min\{m, n\}ε^{-\frac{5p-3}{2p-2}})$ when the norm is spectral and the Hessian is Lipschitz. Finally, we incorporate practical heuristics into our transported method and evaluate it across multiple architectures and model sizes, demonstrating its flexibility and compatibility in training neural networks.

preprint2020arXiv

Delegating Quantum Computation in the Quantum Random Oracle Model

A delegation scheme allows a computationally weak client to use a server's resources to help it evaluate a complex circuit without leaking any information about the input (other than its length) to the server. In this paper, we consider delegation schemes for quantum circuits, where we try to minimize the quantum operations needed by the client. We construct a new scheme for delegating a large circuit family, which we call "C+P circuits". "C+P" circuits are the circuits composed of Toffoli gates and diagonal gates. Our scheme is non-interactive, requires very little quantum computation from the client (proportional to input length but independent of the circuit size), and can be proved secure in the quantum random oracle model, without relying on additional assumptions, such as the existence of fully homomorphic encryption. In practice the random oracle can be replaced by an appropriate hash function or block cipher, for example, SHA-3, AES. This protocol allows a client to delegate the most expensive part of some quantum algorithms, for example, Shor's algorithm. The previous protocols that are powerful enough to delegate Shor's algorithm require either many rounds of interactions or the existence of FHE. The protocol requires asymptotically fewer quantum gates on the client side compared to running Shor's algorithm locally. To hide the inputs, our scheme uses an encoding that maps one input qubit to multiple qubits. We then provide a novel generalization of classical garbled circuits ("reversible garbled circuits") to allow the computation of Toffoli circuits on this encoding. We also give a technique that can support the computation of phase gates on this encoding. To prove the security of this protocol, we study key dependent message(KDM) security in the quantum random oracle model. KDM security was not previously studied in quantum settings.

preprint2020arXiv

Joint Beam Training and Data Transmission Design for Covert Millimeter-Wave Communication

Covert communication prevents legitimate transmission from being detected by a warden while maintaining certain covert rate at the intended user. Prior works have considered the design of covert communication over conventional low-frequency bands, but few works so far have explored the higher-frequency millimeter-wave (mmWave) spectrum. The directional nature of mmWave communication makes it attractive for covert transmission. However, how to establish such directional link in a covert manner in the first place remains as a significant challenge. In this paper, we consider a covert mmWave communication system, where legitimate parties Alice and Bob adopt beam training approach for directional link establishment. Accounting for the training overhead, we develop a new design framework that jointly optimizes beam training duration, training power and data transmission power to maximize the effective throughput of Alice-Bob link while ensuring the covertness constraint at warden Willie is met. We further propose a dual-decomposition successive convex approximation algorithm to solve the problem efficiently. Numerical studies demonstrate interesting tradeoff among the key design parameters considered and also the necessity of joint design of beam training and data transmission for covert mmWave communication.

preprint2020arXiv

Who Is Charging My Phone? Identifying Wireless Chargers via Fingerprinting

With the increasing popularity of the Internet of Things(IoT) devices, the demand for fast and convenient battery charging services grows rapidly. Wireless charging is a promising technology for such a purpose and its usage has become ubiquitous. However, the close distance between the charger and the device being charged not only makes proximity-based and near field communication attacks possible, but also introduces a new type of vulnerabilities. In this paper, we propose to create fingerprints for wireless chargers based on the intrinsic non-linear distortion effects of the underlying charging circuit. Using such fingerprints, we design the WirelessID system to detect potential short-range malicious wireless charging attacks. WirelessID collects signals in the standby state of the charging process and sends them to a trusted server, which can extract the fingerprint and then identify the charger.