Source author record

Zhiyuan Yao

Zhiyuan Yao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

cond-mat.stat-mech; cond-mat.str-el; Artificial Intelligence; cond-mat.quant-gas; Machine Learning; Computation and Language; cond-mat.dis-nn; Networking and Internet Architecture; astro-ph.GA; Distributed, Parallel, and Cluster Computing; physics.comp-ph; quant-ph

Catalog footprint

What is connected

13 works

12 topics

4 close collaborators

preprint, 2026, arXiv

Inverse Knowledge Search over Verifiable Reasoning: Synthesizing a Scientific Encyclopedia from a Long Chains-of-Thought Knowledge Base

Most scientific materials compress reasoning, presenting conclusions while omitting the derivational chains that justify them. This compression hinders verification by lacking explicit, step-wise justifications and inhibits cross-domain links by collapsing the very pathways that establish the logical and causal connections between concepts. We introduce a scalable framework that decompresses scientific reasoning, constructing a verifiable Long Chain-of-Thought (LCoT) knowledge base and projecting it into an emergent encyclopedia, SciencePedia. Our pipeline operationalizes an endpoint-driven, reductionist strategy: a Socratic agent, guided by a curriculum of around 200 courses, generates approximately 3 million first-principles questions. To ensure high fidelity, multiple independent solver models generate LCoTs, which are then rigorously filtered by prompt sanitization and cross-model answer consensus, retaining only those with verifiable endpoints. This verified corpus powers the Brainstorm Search Engine, which performs inverse knowledge search -- retrieving diverse, first-principles derivations that culminate in a target concept. This engine, in turn, feeds the Plato synthesizer, which narrates these verified chains into coherent articles. The initial SciencePedia comprises approximately 200,000 fine-grained entries spanning mathematics, physics, chemistry, biology, engineering, and computation. In evaluations across six disciplines, Plato-synthesized articles (conditioned on retrieved LCoTs) exhibit substantially higher knowledge-point density and significantly lower factual error rates than an equally-prompted baseline without retrieval (as judged by an external LLM). Built on this verifiable LCoT knowledge base, this reasoning-centric approach enables trustworthy, cross-domain scientific synthesis at scale and establishes the foundation for an ever-expanding encyclopedia.

preprint, 2026, arXiv

RealMem: Benchmarking LLMs in Real-World Memory-Driven Interaction

As Large Language Models (LLMs) evolve from static dialogue interfaces to autonomous general agents, effective memory is paramount to ensuring long-term consistency. However, existing benchmarks primarily focus on casual conversation or task-oriented dialogue, failing to capture "long-term project-oriented" interactions where agents must track evolving goals. To bridge this gap, we introduce RealMem, the first benchmark grounded in realistic project scenarios. RealMem comprises over 2,000 cross-session dialogues across eleven scenarios, utilizing natural user queries for evaluation. We propose a synthesis pipeline that integrates Project Foundation Construction, Multi-Agent Dialogue Generation, and Memory and Schedule Management to simulate the dynamic evolution of memory. Experiments reveal that current memory systems face significant challenges in managing the long-term project states and dynamic context dependencies inherent in real-world projects. Our code and datasets are available at https://github.com/AvatarMemory/RealMemBench.

preprint, 2026, arXiv

Self-Distilled Agentic Reinforcement Learning

Reinforcement learning (RL) has emerged as a central paradigm for post-training LLM agents, yet its trajectory-level reward signal provides only coarse supervision for long-horizon interaction. On-Policy Self-Distillation (OPSD) complements RL by introducing dense token-level guidance from a teacher branch augmented with privileged context. However, transferring OPSD to multi-turn agents proves problematic: compounding multi-turn instability destabilizes supervision, while skill-conditioned privileged guidance requires asymmetric treatment, because negative teacher rejections may arise from imperfect skill retrieval or utilization. We introduce SDAR (Self-Distilled Agentic Reinforcement Learning), which treats OPSD as a gated auxiliary objective while keeping RL as the primary optimization backbone. SDAR maps detached token-level signals into a sigmoid gate, strengthening distillation on teacher-endorsed positive-gap tokens and softly attenuating negative teacher rejections. Across the Qwen2.5 and Qwen3 families on ALFWorld, WebShop, and Search-QA, SDAR substantially improves over GRPO (+9.4% on ALFWorld, +7.0% on Search-QA, +10.2% on WebShop-Acc), avoids the instability of naive GRPO+OPSD, and consistently outperforms hybrid RL--OPSD baselines across model scales.
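The gating idea described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the definition of the "gap" as teacher log-probability minus student log-probability, the `temperature` and `aux_weight` parameters, and all function names are assumptions made for illustration only.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_weights(teacher_logp, student_logp, temperature=1.0):
    """Hypothetical per-token gate in (0, 1).

    Assumption: the gap is teacher log-prob minus student log-prob.
    A positive gap (teacher endorses the token) gives a gate above 0.5;
    a negative gap is softly attenuated toward 0 rather than cut off.
    In an autograd framework these gates would be detached (treated as
    constants); plain Python floats carry no gradient anyway.
    """
    return [sigmoid((t - s) / temperature)
            for t, s in zip(teacher_logp, student_logp)]


def combined_loss(rl_loss, distill_losses, gates, aux_weight=0.1):
    """RL loss plus a gated, down-weighted auxiliary distillation term."""
    gated = sum(g * d for g, d in zip(gates, distill_losses)) / len(gates)
    return rl_loss + aux_weight * gated


# Two tokens: one with a positive gap (+1.0), one with a negative gap (-1.0).
gates = gate_weights([-0.5, -2.0], [-1.5, -1.0])
loss = combined_loss(rl_loss=2.0, distill_losses=[1.0, 1.0], gates=gates)
```

In this toy setup the RL term stays dominant and the distillation term only nudges the objective, which mirrors the abstract's description of OPSD as an auxiliary objective rather than the primary one.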

preprint, 2023, arXiv

Probing quantum many-body correlations by universal ramping dynamics

Ramping a physical parameter is one of the most common experimental protocols in studying a quantum system, and ramping dynamics has been widely used in preparing a quantum state and probing physical properties. Here, we present a novel method of probing quantum many-body correlation by ramping dynamics. We ramp a Hamiltonian parameter to the same target value from different initial values and with different velocities, and we show that the first-order correction on the finite ramping velocity is universal and path-independent, revealing a novel quantum many-body correlation function of the equilibrium phases at the target values. We term this method the non-adiabatic linear response, since it is the leading-order correction beyond the adiabatic limit. We demonstrate this method experimentally by studying the Bose-Hubbard model with ultracold atoms in three-dimensional optical lattices. Unlike the conventional linear response that reveals whether the quasi-particle dispersion of a quantum phase is gapped or gapless, this probe is more sensitive to whether the quasi-particle lifetime is long enough such that the quantum phase possesses a well-defined quasi-particle description. In the Bose-Hubbard model, this non-adiabatic linear response is significant in the quantum critical regime where well-defined quasi-particles are absent. In contrast, this response is vanishingly small in both the superfluid and the Mott insulator, which possess well-defined quasi-particles. Because our proposal uses the most common experimental protocol, we envision that our method can find broad applications in probing various quantum systems.

preprint, 2022, arXiv

Efficient Data-Driven Network Functions

Cloud environments require dynamic and adaptive networking policies. In production, heuristics are preferred over advanced learning algorithms in Virtual Network Functions (VNFs) because of high-performance constraints. This paper proposes Aquarius to passively yet efficiently gather observations and enable the use of machine learning to collect, infer, and supply accurate networking state information, without incurring additional signalling and management overhead. This paper illustrates the use of Aquarius with a traffic classifier, an autoscaling system, and a load balancer, and demonstrates the use of three different machine learning paradigms (unsupervised, supervised, and reinforcement learning) within Aquarius for inferring network state. Testbed evaluations show that Aquarius increases network state visibility and brings notable performance gains with low overhead.

preprint, 2022, arXiv

Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center

This paper presents the network load balancing problem, a challenging real-world task for multi-agent reinforcement learning (MARL) methods. Traditional heuristic solutions like Weighted-Cost Multi-Path (WCMP) and Local Shortest Queue (LSQ) are less flexible under changing workload distributions and arrival rates, and achieve a poor balance among multiple load balancers. The cooperative network load balancing task is formulated as a Dec-POMDP problem, which naturally induces the MARL methods. To bridge the reality gap for applying learning-based methods, all methods are directly trained and evaluated on an emulation system at moderate to large scale. Experiments on realistic testbeds show that the independent and "selfish" load balancing strategies are not necessarily the globally optimal ones, while the proposed MARL solution has a superior performance over different realistic settings. Additionally, the potential difficulties of MARL methods for network load balancing are analysed, which helps to draw the attention of the learning and network communities to such challenges.

preprint, 2022, arXiv

Towards Modern Card Games with Large-Scale Action Spaces Through Action Representation

Axie Infinity is a complicated card game with a huge-scale action space, which makes it difficult to solve using generic Reinforcement Learning (RL) algorithms. We propose a hybrid RL framework to learn action representations and game strategies. To avoid evaluating every action in the large feasible action set, our method evaluates actions in a fixed-size set which is determined using action representations. We compare the performance of our method with two baseline methods in terms of sample efficiency and the winning rates of the trained models. We empirically show that our method achieves the best overall winning rate and the best sample efficiency among the three methods.

preprint, 2021, arXiv

Noise Enhanced Neural Networks for Analytic Continuation

Analytic continuation maps imaginary-time Green's functions obtained by various theoretical/numerical methods to real-time response functions that can be directly compared with experiments. Analytic continuation is an important bridge between many-body theories and experiments but is also a challenging problem because such mappings are ill-conditioned. In this work, we develop a neural network-based method for this problem. The training data is generated either using synthetic Gaussian-type spectral functions or from exactly solvable models where the analytic continuation can be obtained analytically. We then apply the trained neural network to the testing data, either with synthetic noise or intrinsic noise in Monte Carlo simulations. We conclude that the best performance is always achieved when a proper amount of noise is added to the training data. Moreover, our method can successfully capture multi-peak structure in the resulting response function for the cases with the best performance. The method can be combined with Monte Carlo simulations to compare with experiments on real-time dynamics.

preprint, 2021, arXiv

Quantum Many-Body Scars and Quantum Criticality

In this letter, we study the PXP Hamiltonian with an external magnetic field that exhibits both quantum scar states and quantum criticality. It is known that this model hosts a series of quantum many-body scar states violating quantum thermalization at zero magnetic field, and it also exhibits an Ising quantum phase transition driven by finite magnetic field. Although the former involves the properties of generic excited states and the latter concerns the low-energy physics, we discover two surprising connections between them, inspired by the observation that both states possess log-volume law entanglement entropies. First, we show that the quantum many-body scar states can be tracked to a set of quantum critical states, whose nature can be understood as pair-wisely occupied Fermi sea states. Second, we show that the partial violation of quantum thermalization diminishes in the quantum critical regime. We envision that these connections can be extended to general situations and readily verified in existing cold atom experimental platforms.

preprint, 2020, arXiv

Many-Body Localization from Dynamical Gauge Fields

A recent experiment [Nature Physics 10, 1 (2019)] has realized a dynamical gauge system with $\mathbb{Z}_2$ gauge symmetry in a double-well potential. In this work we propose a method to generalize this model from a single double well to a one-dimensional chain. We show that although there is no disordered potential in the original model, the phenomenon of many-body localization can occur. The key ingredient is that different symmetry sectors with different local gauge charges play the role of different disorder configurations, which becomes clear after exactly mapping our model to a transverse Ising model in a random longitudinal field. We show that both the ergodic regime and the many-body localized regime exist in this model from four different metrics, which include level statistics, volume law versus area law of entanglement entropy of eigenstates, quench dynamics of entanglement entropy and physical observables.

preprint, 2019, arXiv

Hot Gas Flows on Parsec Scale in the Low-Luminosity Active Galactic Nucleus NGC 3115

NGC 3115 is known as the low-luminosity active galactic nucleus which hosts the nearest ($z\sim0.002$) billion solar mass supermassive black hole ($\sim1.5\times10^9~M_\odot$). Its Bondi radius $r_\mathrm{B}$ ($\sim3\farcs6$) can be readily resolved with Chandra, which offers us an excellent opportunity to investigate the accretion flow onto a supermassive black hole. In this paper, we perform two-dimensional hydrodynamical numerical simulations, tailored for NGC 3115, on the mass flow across the Bondi radius. Our best fittings for the density and temperature agree well with the observations of the hot interstellar medium in the centre of NGC 3115. We find that the flow properties are solely determined by the local galaxy properties in the galaxy centre: (1) stellar winds (including supernova ejecta) supply the mass and energy sources for the accreting gas; (2) similar to the one-dimensional calculations, a stagnation radius $r_\mathrm{st}\sim0.1~r_\mathrm{B}$ is also found in the two-dimensional simulations, which divides the mass flow into an inflow-outflow structure; (3) the radiatively inefficient accretion flow theory applies well inside the stagnation radius, where the gravity is dominated by the supermassive black hole and the gas is supported by rotation; (4) beyond the stagnation radius, the stellar gravity dominates the spherical-like fluid dynamics and causes the transition from a steep density profile outside to a flat density profile inside the Bondi radius.

preprint, 2016, arXiv

Superfluid--Insulator Transition in Strongly Disordered One-dimensional Systems

We present an asymptotically exact renormalization-group theory of the superfluid--insulator transition in one-dimensional disordered systems, with emphasis on an accurate description of the interplay between the Giamarchi--Schulz (instanton--anti-instanton) and weak-link (scratched-XY) criticalities. Combining the theory with extensive quantum Monte Carlo simulations allows us to shed new light on the ground-state phase diagram of the one-dimensional disordered Bose-Hubbard model at unit filling.

preprint, 2014, arXiv

Critical Exponents of the Superfluid-Bose Glass Transition in Three-Dimensions

Recent experimental and numerical studies of the critical-temperature exponent $ϕ$ for the superfluid-Bose glass universality in three-dimensional systems report strong violations of the key quantum critical relation, $ϕ=νz$, where $z$ and $ν$ are the dynamic and correlation length exponents, respectively, and question the fundamental concepts underlying quantum critical phenomena. Using Monte Carlo simulations of the disordered Bose-Hubbard model, we demonstrate that previous work on the superfluid-to-normal fluid transition-temperature dependence on chemical potential (or magnetic field, in spin systems), $T_c \propto (μ-μ_c)^ϕ$, was misinterpreting transient behavior on approach to the fluctuation region with the genuine critical law. When the model parameters are modified to have a broad quantum critical region, simulations of both quantum and classical models reveal that the $ϕ=νz$ law [with $ϕ=2.7(2)$, $z=3$, and $ν= 0.88(5)$] holds true, resolving the $ϕ$-exponent "crisis".
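As a quick consistency check on the values quoted in this abstract (using only those numbers, and assuming $z=3$ is exact so that the uncertainty on $ν$ simply scales by a factor of 3):

$$
νz = 0.88(5) \times 3 = 2.64(15),
$$

which agrees with the quoted $ϕ=2.7(2)$ within the stated uncertainties.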
