Source author record

Jiayu Zhang

Jiayu Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

astro-ph.IM Cryptography and Security physics.ins-det Artificial Intelligence Computation and Language cond-mat.mtrl-sci eess.SP eess.SY Information Theory Machine Learning math.IT math.OC q-fin.MF quant-ph Systems and Control

Catalog footprint

What is connected

10works

15topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

A3: Android Agent Arena for Mobile GUI Agents with Essential-State Procedural Evaluation

The advancement of Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) has catalyzed the development of mobile graphic user interface (GUI) AI agents, which is designed to autonomously perform tasks on mobile devices. However, a significant gap persists in mobile GUI agent evaluation, where existing benchmarks predominantly rely on either static frame assessments such as AndroidControl or offline static apps such as AndroidWorld and thus fail to capture agent performance in dynamic, real-world online mobile apps. To address this gap, we present Android Agent Arena (A3), a novel "essential-state" based procedural evaluation system for mobile GUI agents. A3 introduces a benchmark of 100 tasks derived from 20 widely-used, dynamic online apps across 20 categories from the Google Play Store, ensuring evaluation comprehension. A3 also presents a novel "essential-state" based procedural evaluation method that leverages MLLMs as reward models to progressively verify task completion and process achievement. This evaluation approach address the limitations of traditional function based evaluation methods on online dynamic apps. Furthermore, A3 includes a toolkit to streamline Android device interaction, reset online environment and apps and facilitate data collection from both human and agent demonstrations. The complete A3 system, including the benchmark and tools, will be publicly released to provide a robust foundation for future research and development in mobile GUI agents.

preprint2026arXiv

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Recent advances in coding agents suggest rapid progress toward autonomous software development, yet existing benchmarks fail to rigorously evaluate the long-horizon capabilities required to build complete software systems. Most prior evaluations focus on localized code generation, scaffolded completion, or short-term repair tasks, leaving open the question of whether agents can sustain coherent reasoning, planning, and execution over the extended horizons demanded by real-world repository construction. To address this gap, we present NL2Repo Bench, a benchmark explicitly designed to evaluate the long-horizon repository generation ability of coding agents. Given only a single natural-language requirements document and an empty workspace, agents must autonomously design the architecture, manage dependencies, implement multi-module logic, and produce a fully installable Python library. Our experiments across state-of-the-art open- and closed-source models reveal that long-horizon repository generation remains largely unsolved: even the strongest agents achieve below 40% average test pass rates and rarely complete an entire repository correctly. Detailed analysis uncovers fundamental long-horizon failure modes, including premature termination, loss of global coherence, fragile cross-file dependencies, and inadequate planning over hundreds of interaction steps. NL2Repo Bench establishes a rigorous, verifiable testbed for measuring sustained agentic competence and highlights long-horizon reasoning as a central bottleneck for the next generation of autonomous coding agents.

preprint2026arXiv

Optimal Underreporting and Competitive Equilibrium

This paper develops a dynamic insurance market model comprising two competing insurance companies and a continuum of insureds, and examines the interaction between strategic underreporting by the insureds and competitive pricing between the insurance companies under a Bonus-Malus System (BMS) framework. For the first time in an oligopolistic setting, we establish the existence and uniqueness of the insureds' optimal reporting barrier, as well as its continuous dependence on the BMS premiums. For the 2-class BMS case, we prove the existence of Nash equilibrium premium strategies and conduct an extensive sensitivity analysis on the impact of the model parameters on the equilibrium premiums.

preprint2026arXiv

Scale-Invariant Neural Network Optimization: Norm Geometry and Heavy-Tailed Noise

A growing lesson from neural network optimization is that optimizer design should respect how the model is parametrized. Scale-invariant methods become important because their normalized layerwise updates can not only support hyperparameter transfer across model sizes but exploit input-output matrix norm geometry. At the same time, stochastic gradient noises in deep learning are often far from sub-Gaussian and may exhibit heavy tails. These crucial observations have shaped recent algorithmic principles for training neural networks, yet their joint theoretical consequences remain underexplored. In particular, it is unclear what dimension dependence is unavoidable for scale-invariant methods with general input-output matrix norm, and whether higher-order smoothness can accelerate training under heavy-tailed noise. We study these questions through nonconvex smooth stochastic optimization over $\mathbb{R}^{m\times n}$ with general norms, where the goal is to achieve an $ε$-stationary point under $p^{\mathrm{th}}$-moment heavy-tailed noise. Our first contribution is a dimension-dependent lower bound: when $\frac{\max\{m,n\}}{(\min\{m,n\})^2}$ is large enough, any scale-invariant first-order method with spectral norm requires $Ω(\min\{m, n\}ε^{-\frac{3p-2}{p-1}})$ oracle calls. We prove that a batched Scion method with spectral norm achieves the matching upper bound of $O(\min\{m, n\}ε^{-\frac{3p-2}{p-1}})$. To exploit higher-order smoothness, we propose a transported Scion method and improve the bound to $O(\min\{m, n\}ε^{-\frac{5p-3}{2p-2}})$ when the norm is spectral and the Hessian is Lipschitz. Finally, we incorporate practical heuristics into our transported method and evaluate it across multiple architectures and model sizes, demonstrating its flexibility and compatibility in training neural networks.

preprint2020arXiv

Delegating Quantum Computation in the Quantum Random Oracle Model

A delegation scheme allows a computationally weak client to use a server's resources to help it evaluate a complex circuit without leaking any information about the input (other than its length) to the server. In this paper, we consider delegation schemes for quantum circuits, where we try to minimize the quantum operations needed by the client. We construct a new scheme for delegating a large circuit family, which we call "C+P circuits". "C+P" circuits are the circuits composed of Toffoli gates and diagonal gates. Our scheme is non-interactive, requires very little quantum computation from the client (proportional to input length but independent of the circuit size), and can be proved secure in the quantum random oracle model, without relying on additional assumptions, such as the existence of fully homomorphic encryption. In practice the random oracle can be replaced by an appropriate hash function or block cipher, for example, SHA-3, AES. This protocol allows a client to delegate the most expensive part of some quantum algorithms, for example, Shor's algorithm. The previous protocols that are powerful enough to delegate Shor's algorithm require either many rounds of interactions or the existence of FHE. The protocol requires asymptotically fewer quantum gates on the client side compared to running Shor's algorithm locally. To hide the inputs, our scheme uses an encoding that maps one input qubit to multiple qubits. We then provide a novel generalization of classical garbled circuits ("reversible garbled circuits") to allow the computation of Toffoli circuits on this encoding. We also give a technique that can support the computation of phase gates on this encoding. To prove the security of this protocol, we study key dependent message(KDM) security in the quantum random oracle model. KDM security was not previously studied in quantum settings.

preprint2020arXiv

Joint Beam Training and Data Transmission Design for Covert Millimeter-Wave Communication

Covert communication prevents legitimate transmission from being detected by a warden while maintaining certain covert rate at the intended user. Prior works have considered the design of covert communication over conventional low-frequency bands, but few works so far have explored the higher-frequency millimeter-wave (mmWave) spectrum. The directional nature of mmWave communication makes it attractive for covert transmission. However, how to establish such directional link in a covert manner in the first place remains as a significant challenge. In this paper, we consider a covert mmWave communication system, where legitimate parties Alice and Bob adopt beam training approach for directional link establishment. Accounting for the training overhead, we develop a new design framework that jointly optimizes beam training duration, training power and data transmission power to maximize the effective throughput of Alice-Bob link while ensuring the covertness constraint at warden Willie is met. We further propose a dual-decomposition successive convex approximation algorithm to solve the problem efficiently. Numerical studies demonstrate interesting tradeoff among the key design parameters considered and also the necessity of joint design of beam training and data transmission for covert mmWave communication.

preprint2020arXiv

Who Is Charging My Phone? Identifying Wireless Chargers via Fingerprinting

With the increasing popularity of the Internet of Things(IoT) devices, the demand for fast and convenient battery charging services grows rapidly. Wireless charging is a promising technology for such a purpose and its usage has become ubiquitous. However, the close distance between the charger and the device being charged not only makes proximity-based and near field communication attacks possible, but also introduces a new type of vulnerabilities. In this paper, we propose to create fingerprints for wireless chargers based on the intrinsic non-linear distortion effects of the underlying charging circuit. Using such fingerprints, we design the WirelessID system to detect potential short-range malicious wireless charging attacks. WirelessID collects signals in the standby state of the charging process and sends them to a trusted server, which can extract the fingerprint and then identify the charger.

preprint2016arXiv

Efficient Thermal Conductance in Organometallic Perovskite CH3NH3PbI3 Films

Perovskite-based optoelectronic devices have shown great promise for solar conversion and other optoelectronic applications, but their long-term performance instability is regarded as a major obstacle to their widespread deployment. Previous works have shown that the ultralow thermal conductivity and inefficient heat spreading might put an intrinsic limit on the lifetime of perovskite devices. Here, we report the observation of a remarkably efficient thermal conductance, with conductivity of 11.2 +/- 0.8 W m^-1 K^-1 at room temperature, in densely-packed perovskite CH3NH3PbI3 films, via noncontact time-domain thermal reflectance measurements. The temperature-dependent experiments suggest the important roles of organic cations and structural phase transitions, which are further confirmed by temperature-dependent Raman spectra. The thermal conductivity at room temperature observed here is over one order of magnitude larger than that in the early report, suggesting that perovskite device performance will not be limited by thermal stability.

preprint2015arXiv

Ground-based verification and data processing of Yutu rover Active Particle-induced X-ray Spectrometer

The Active Particle-induced X-ray Spectrometer (APXS) is one of the payloads on board the Yutu rover of Chang'E-3 mission. In order to assess the instrumental performance of APXS, a ground verification test was done for two unknown samples (basaltic rock, mixed powder sample). In this paper, the details of the experiment configurations and data analysis method are presented. The results show that the elemental abundance of major elements can be well determined by the APXS with relative deviations < 15 wt. % (detection distance = 30 mm, acquisition time = 30 min). The derived detection limit of each major element is inversely proportional to acquisition time and directly proportional to detection distance, suggesting that the appropriate distance should be < 50mm.

preprint2014arXiv

Physical Design and Monte Carlo Simulations of a Space Radiation Detector onboard the SJ-10 satellite

A radiation gene box (RGB) onboard the SJ-10 satellite is a device carrying mice and drosophila cells to determine the biological effects of space radiation environment. The shielded fluxes of different radioactive sources were calculated and the linear energy transfers of gamma-rays, electrons, protons and alpha-particles in tissue were acquired using A-150 tissue-equivalent plastic. Then, a conceptual model of a space radiation instrument employing three semiconductor sub-detectors for deriving the charged and uncharged radiation environment of the RGB was designed. The energy depositions in the three sub-detectors were classified into fifteen channels (bins) in an algorithm derived from the Monte Carlo method. The physical feasibility of the conceptual instrument was also verified by Monte Carlo simulations.

Jiayu Zhang

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

A3: Android Agent Arena for Mobile GUI Agents with Essential-State Procedural Evaluation

NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents

Optimal Underreporting and Competitive Equilibrium

Scale-Invariant Neural Network Optimization: Norm Geometry and Heavy-Tailed Noise

Delegating Quantum Computation in the Quantum Random Oracle Model

Joint Beam Training and Data Transmission Design for Covert Millimeter-Wave Communication

Who Is Charging My Phone? Identifying Wireless Chargers via Fingerprinting

Efficient Thermal Conductance in Organometallic Perovskite CH3NH3PbI3 Films

Ground-based verification and data processing of Yutu rover Active Particle-induced X-ray Spectrometer

Physical Design and Monte Carlo Simulations of a Space Radiation Detector onboard the SJ-10 satellite