Source author record

Zeyu Zheng

Zeyu Zheng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher Unclaimed source record

Artificial Intelligence Machine Learning q-fin.ST Computation and Language math.CO math.OC physics.data-an cs.CY eess.SP math.PR Networking and Internet Architecture physics.app-ph physics.geo-ph

Catalog footprint

What is connected

21 works

13 topics

4 close collaborators

preprint 2026 arXiv

AutoResearchClaw: Self-Reinforcing Autonomous Research with Human-AI Collaboration

Automating scientific discovery requires more than generating papers from ideas. Real research is iterative: hypotheses are challenged from multiple perspectives, experiments fail and inform the next attempt, and lessons accumulate across cycles. Existing autonomous research systems often model this process as a linear pipeline: they rely on single-agent reasoning, stop when execution fails, and do not carry experience across runs. We present AutoResearchClaw, a multi-agent autonomous research pipeline built on five mechanisms: structured multi-agent debate for hypothesis generation and result analysis, a self-healing executor with a Pivot/Refine decision loop that transforms failures into information, verifiable result reporting that prevents fabricated numbers and hallucinated citations, human-in-the-loop collaboration with seven intervention modes spanning full autonomy to step-by-step oversight, and cross-run evolution that converts past mistakes into future safeguards. On ARC-Bench, a 25-topic experiment-stage benchmark, AutoResearchClaw outperforms AI Scientist v2 by 54.7%. A human-in-the-loop ablation across seven intervention modes reveals that precise, targeted collaboration at high-leverage decision points consistently outperforms both full autonomy and exhaustive step-by-step oversight. We position AutoResearchClaw as a research amplifier that augments rather than replaces human scientific judgment. Code is available at https://github.com/aiming-lab/AutoResearchClaw.

preprint 2026 arXiv

ClawForge: Generating Executable Interactive Benchmarks for Command-Line Agents

Interactive agent benchmarks face a tension between scalable construction and realistic workflow evaluation. Hand-authored tasks are expensive to extend and revise, while static prompt evaluation misses failures that only appear when agents operate over persistent state. Existing interactive benchmarks have advanced agent evaluation significantly, but most initialize tasks from clean state and do not systematically test how agents handle pre-existing partial, stale, or conflicting artifacts. We present ClawForge, a generator-backed benchmark framework for executable command-line workflows under state conflict. The framework compiles scenario templates, grounded slots, initialized state, reference trajectories, and validators into reproducible task specifications, and evaluates agents step by step over persistent workflow surfaces using normalized end state and observable side effects rather than exact trajectory matching. We instantiate this framework as ClawForge-Bench (17 scenarios, 6 ability categories). Results across seven frontier models show that the best model reaches only 45.3% strict accuracy, wrong-state replacement remains below 17% for all models, and the widest model separation (17% to 90%) is driven by whether agents inspect existing state before acting. Partial-credit and step-efficiency analyses further reveal that many failures are near-miss closures rather than early breakdowns, and that models exhibit qualitatively different failure styles under state conflict.

preprint 2026 arXiv

Contexting as Recommendation: Evolutionary Collaborative Filtering for Context Engineering

Large Language Models (LLMs) are highly sensitive to their input contexts, motivating the development of automated context engineering. However, existing methods predominantly treat this as a global search problem, seeking a single context strategy that maximizes average performance across a dataset. This restrictive assumption overlooks the fact that different inputs often require distinct guidance, leaving substantial instance-level performance gains untapped. In this paper, we propose a paradigm shift by formulating context engineering as a recommendation problem. We introduce Neural Collaborative Context Engineering (NCCE), a framework that transitions optimization from a static global search to dynamic, instance-wise routing. NCCE first bootstraps a diverse catalog of anchor contexts and then employs a novel Context-CF Co-Evolution mechanism. This stage establishes a synergistic feedback loop: a lightweight Neural Collaborative Filtering (NCF) model learns instance-context preferences to guide the generation of specialized context variants, while the newly evaluated contexts continuously refine the NCF model's understanding of latent preferences. At inference time, the trained NCF model acts as a context router, dynamically assigning the most suitable context strategy to each unseen instance. Theoretical proofs and comprehensive experiments demonstrate that by matching individual inputs with their optimal contexts, NCCE significantly improves task accuracy, highlighting the critical importance of personalization in LLM context engineering.

preprint 2026 arXiv

EvolveMem: Self-Evolving Memory Architecture via AutoResearch for LLM Agents

Long-term memory is essential for LLM agents that operate across multiple sessions, yet existing memory systems treat retrieval infrastructure as fixed: stored content evolves while scoring functions, fusion strategies, and answer-generation policies remain frozen at deployment. We argue that truly adaptive memory requires co-evolution at two levels: the stored knowledge and the retrieval mechanism that queries it. We present EvolveMem, a self-evolving memory architecture that exposes its full retrieval configuration as a structured action space optimized by an LLM-powered diagnosis module. In each evolution round, the module reads per-question failure logs, identifies root causes, and proposes targeted configuration adjustments; a guarded meta-analyzer applies them with automatic revert-on-regression and explore-on-stagnation safeguards. This closed-loop self-evolution realizes an AutoResearch process: the system autonomously conducts iterative research cycles on its own architecture, replacing manual configuration tuning. Starting from a minimal baseline, the process converges autonomously, discovering effective retrieval strategies including entirely new configuration dimensions not present in the original action space. On LoCoMo, EvolveMem outperforms the strongest baseline by 25.7% relative and achieves a 78.0% relative improvement over the minimal baseline. On MemBench, EvolveMem exceeds the strongest baseline by 18.9% relative. Evolved configurations transfer across benchmarks with positive rather than catastrophic transfer, indicating that the self-evolution process captures universal retrieval principles rather than benchmark-specific heuristics. Code is available at https://github.com/aiming-lab/SimpleMem.

preprint 2026 arXiv

OBLR-PO: A Theoretical Framework for Stable Reinforcement Learning

Existing reinforcement learning (RL)-based post-training methods for large language models have advanced rapidly, yet their design has largely been guided by heuristics rather than systematic theoretical principles. This gap limits our understanding of the properties of the gradient estimators and the associated optimization algorithms, thereby constraining opportunities to improve training stability and overall performance. In this work, we provide a unified theoretical framework that characterizes the statistical properties of commonly used policy-gradient estimators under mild assumptions. Our analysis establishes unbiasedness, derives exact variance expressions, and yields an optimization-loss upper bound that enables principled reasoning about learning dynamics. Building on these results, we prove convergence guarantees and derive an adaptive learning-rate schedule governed by the signal-to-noise ratio (SNR) of gradients. We further show that the variance-optimal baseline is a gradient-weighted estimator, offering a new principle for variance reduction and naturally enhancing stability beyond existing methods. These insights motivate Optimal Baseline and Learning-Rate Policy Optimization (OBLR-PO), an algorithm that jointly adapts learning rates and baselines in a theoretically grounded manner. Experiments on Qwen3-4B-Base and Qwen3-8B-Base demonstrate consistent gains over existing policy optimization methods, validating that our theoretical contributions translate into practical improvements in large-scale post-training.

preprint 2026 arXiv

Position: Academic Conferences are Potentially Facing Denominator Gaming Caused by Fully Automated Scientific Agents

The implicit policy of maintaining relatively stable acceptance rates at top AI conferences, despite exponentially growing submissions, introduces a critical structural vulnerability. This position paper characterizes a new systemic threat we term Agentic Denominator Gaming, in which a malicious actor deploys AI agents to generate and submit a large volume of superficially plausible but low-quality papers. Crucially, their objective is not the acceptance of low-quality papers, but rather to inflate the submission denominator and overwhelm reviewing capacity. Under a relatively stable acceptance rate, this dilution can systematically increase the publication probability of a small, targeted set of legitimate papers. We analyze the practical feasibility of this threat and its broader consequences, including intensified reviewer burnout, degraded review quality, and the emergence of industrialized automated agent mills. Finally, we propose and evaluate a range of mitigation strategies, and argue that durable protection will require system-level policy and incentive reforms, rather than relying primarily on technical detection alone.
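The dilution effect described above can be illustrated with a toy calculation. This is a minimal sketch under simplifying assumptions that are not taken from the paper: the venue accepts a fixed fraction of all submissions, every injected paper is rejected, and the accepted slots are spread uniformly over the legitimate papers. The function name and the numbers are hypothetical.

```python
def legit_acceptance_probability(n_legit: int, n_injected: int, rate: float) -> float:
    """Chance that a legitimate paper is accepted under a fixed acceptance rate.

    Illustrative assumptions: slots = rate * total submissions, all injected
    papers are rejected, and slots are shared uniformly by legitimate papers.
    """
    slots = rate * (n_legit + n_injected)
    return min(1.0, slots / n_legit)


# Hypothetical venue: 10,000 legitimate submissions, 25% acceptance rate.
baseline = legit_acceptance_probability(10_000, 0, 0.25)
# The same venue after 2,000 low-quality papers inflate the denominator.
diluted = legit_acceptance_probability(10_000, 2_000, 0.25)
```

Under these assumptions, a 20% inflation of the submission count raises the per-paper acceptance chance of legitimate papers from `baseline` to `diluted` without any injected paper being accepted.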

preprint 2026 arXiv

Selection of the Best Policy under Fairness Constraints for Subpopulations

Many high-stakes decisions in health care, public policy, and clinical development require committing to a single policy that will be applied uniformly across a heterogeneous population. Regulatory and fairness standards sometimes require that the chosen policy performs adequately in every pre-specified subpopulation, not only on average. We formalize this as a Selection of the Best with Fairness Constraints (SBFC) problem, in which the goal is to identify the policy with the highest average performance among those policies that meet a minimum per-subpopulation threshold. We establish an instance-specific lower bound on the sample complexity of the SBFC problem. We then develop a Track-and-Stop with Constraints on Subpopulation (T-a-S-CS) algorithm that achieves the lower bound asymptotically. We extend the framework to general closed-set and penalty-based fairness specifications with matching guarantees. Numerical experiments and a case study using the International Stroke Trial demonstrate substantial efficiency gains over policy-level allocation baselines.
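The selection target in this abstract can be sketched for the idealized case where the true per-subpopulation means are known. This is only an illustration of the objective, not the T-a-S-CS algorithm, which has to learn these means from samples. The function name, the subpopulation weights, and the numbers are hypothetical.

```python
from typing import Dict, List, Optional


def select_best_fair_policy(
    means: Dict[str, List[float]],
    weights: List[float],
    threshold: float,
) -> Optional[str]:
    """Return the policy with the highest weighted-average performance among
    policies whose performance in every subpopulation is at least `threshold`.
    Returns None when no policy is feasible."""
    best_name, best_avg = None, float("-inf")
    for name, per_group in means.items():
        if min(per_group) < threshold:
            continue  # violates the per-subpopulation constraint
        avg = sum(w * m for w, m in zip(weights, per_group))
        if avg > best_avg:
            best_name, best_avg = name, avg
    return best_name


# Hypothetical means for three policies over two equally weighted subpopulations.
means = {"A": [0.9, 0.2], "B": [0.6, 0.5], "C": [0.55, 0.52]}
choice = select_best_fair_policy(means, weights=[0.5, 0.5], threshold=0.4)
```

Policy A ties for the best average but fails the threshold in the second subpopulation, so the constrained choice differs from what an average-only selection could return.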

preprint 2022 arXiv

A Short Proof of a Convex Representation for Stationary Distributions of Markov Chains with an Application to State Space Truncation

In an influential paper, Courtois and Semal (1984) establish that when $G$ is an irreducible substochastic matrix for which $\sum_{n=0}^{\infty}G^n <\infty$, then the stationary distribution of any stochastic matrix $P\ge G$ can be expressed as a convex combination of the normalized rows of $(I-G)^{-1} = \sum_{n=0}^{\infty} G^n$. In this note, we give a short proof of this result that extends the theory to the countably infinite and continuous state space settings. This result plays an important role in obtaining error bounds in algorithms involving nearly decomposable Markov chains, and also in state truncations for Markov chains. We also use the representation to establish a new total variation distance error bound for truncated Markov chains.
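For a finite chain, the representation can be checked numerically. Writing $P = G + R$ with $R \ge 0$, stationarity $\pi = \pi P$ gives $\pi = (\pi R)(I-G)^{-1}$, so the weight on the $i$-th normalized row is $(\pi R)_i$ times the $i$-th row sum of $(I-G)^{-1}$. The sketch below is a sanity check on a hypothetical 3-state example, not the proof given in the note; the matrices and helper names are assumptions made for illustration.

```python
def mat_mul(a, b):
    """Plain-Python matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def neumann_sum(g, terms=500):
    """Approximate (I - G)^{-1} = sum_{n>=0} G^n by truncating the series."""
    n = len(g)
    power = [[float(i == j) for j in range(n)] for i in range(n)]  # G^0 = I
    total = [[0.0] * n for _ in range(n)]
    for _ in range(terms):
        total = [[total[i][j] + power[i][j] for j in range(n)] for i in range(n)]
        power = mat_mul(power, g)
    return total


def stationary(p, iters=2000):
    """Stationary distribution by power iteration (P irreducible and aperiodic)."""
    n = len(p)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical example: G is substochastic, R >= 0, and P = G + R is stochastic.
G = [[0.2, 0.3, 0.1], [0.1, 0.4, 0.2], [0.3, 0.1, 0.3]]
R = [[0.4, 0.0, 0.0], [0.0, 0.0, 0.3], [0.1, 0.2, 0.0]]
P = [[G[i][j] + R[i][j] for j in range(3)] for i in range(3)]

pi = stationary(P)
H = neumann_sum(G)                      # approximates (I - G)^{-1}
row_sums = [sum(row) for row in H]
mu = [sum(pi[i] * R[i][j] for i in range(3)) for j in range(3)]  # mu = pi R
weights = [mu[i] * row_sums[i] for i in range(3)]                # convex weights
recon = [sum(weights[i] * H[i][j] / row_sums[i] for i in range(3)) for j in range(3)]
```

The weights are nonnegative and sum to one, and `recon` reproduces `pi` up to floating-point error.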

preprint 2022 arXiv

Adaptive Pairwise Weights for Temporal Credit Assignment

How much credit (or blame) should an action taken in a state get for a future reward? This is the fundamental temporal credit assignment problem in Reinforcement Learning (RL). One of the earliest and still most widely used heuristics is to assign this credit based on a scalar coefficient, $λ$ (treated as a hyperparameter), raised to the power of the time interval between the state-action and the reward. In this empirical paper, we explore heuristics based on more general pairwise weightings that are functions of the state in which the action was taken, the state at the time of the reward, as well as the time interval between the two. Of course it isn't clear what these pairwise weight functions should be, and because they are too complex to be treated as hyperparameters we develop a metagradient procedure for learning these weight functions during the usual RL training of a policy. Our empirical work shows that it is often possible to learn these pairwise weight functions during learning of the policy to achieve better performance than competing approaches.
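The contrast between the scalar heuristic and a pairwise weighting can be sketched as follows. This is a toy illustration with hand-written weight functions; the paper learns its weight functions with a metagradient procedure, which is not shown here. The names `weighted_returns`, `scalar_weight`, and `pairwise_weight`, and the convention that `rewards[j]` is received in `states[j]`, are assumptions made for this example.

```python
from typing import Callable, List, Sequence

# (state at action time, state at reward time, time interval) -> credit weight
WeightFn = Callable[[int, int, int], float]


def weighted_returns(
    states: Sequence[int], rewards: Sequence[float], weight_fn: WeightFn
) -> List[float]:
    """For each time t, sum later rewards weighted by weight_fn.

    Assumed convention: rewards[j] is the reward received in states[j].
    """
    horizon = len(rewards)
    return [
        sum(weight_fn(states[t], states[j], j - t) * rewards[j] for j in range(t, horizon))
        for t in range(horizon)
    ]


lam = 0.5


def scalar_weight(s_action: int, s_reward: int, interval: int) -> float:
    """Classical heuristic: the weight depends only on the time interval."""
    return lam ** interval


def pairwise_weight(s_action: int, s_reward: int, interval: int) -> float:
    """Hypothetical pairwise weight: full credit from state 0, lambda decay otherwise."""
    return 1.0 if s_action == 0 else lam ** interval


states = [0, 1, 2]
rewards = [0.0, 0.0, 1.0]
scalar_returns = weighted_returns(states, rewards, scalar_weight)
pairwise_returns = weighted_returns(states, rewards, pairwise_weight)
```

With the scalar weight, the credit for the final reward decays geometrically with distance; the pairwise weight can override that decay for particular state pairs.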

preprint 2022 arXiv

Common kings of a chain of cycles in a strong tournament

It is known that every strong tournament has directed cycles of any length, and thereby strong subtournaments of any size. In this note, we prove that they also can share a common vertex which is a king of all of them. This common vertex can be any king in the whole tournament. Further, the Hamiltonian cycles in them can be recursively constructed by inserting an additional vertex to one directed edge.

preprint 2022 arXiv

Extremal planar graphs with no cycles of particular lengths

In this paper we estimate the planar Turán number $\mathrm{ex}_\mathcal{P}(n,H)$ of some graphs $H$, i.e., the maximum number of edges in a planar graph $G$ of $n$ vertices not containing $H$ as a subgraph. We give a new, short proof when $H=C_5$, and study the cases when $G$ is bipartite or triangle-free and $H$ is a short even cycle. The proofs are mostly new applications or variants of the "contribution method" introduced by Ghosh, Győri, Martin, Paulos and Xiao in arXiv:2004.14094.

preprint 2022 arXiv

Gradient-based Algorithms for Convex Discrete Optimization via Simulation

We propose new sequential simulation-optimization algorithms for general convex optimization via simulation problems with high-dimensional discrete decision space. The performance of each choice of discrete decision variables is evaluated via stochastic simulation replications. If an upper bound on the overall level of uncertainties is known, our proposed simulation-optimization algorithms utilize the discrete convex structure and are guaranteed with high probability to find a solution that is close to the best within any given user-specified precision level. The proposed algorithms work for any general convex problem and the efficiency is demonstrated by proven upper bounds on simulation costs. The upper bounds demonstrate a polynomial dependence on the dimension and scale of the decision space. For some discrete optimization via simulation problems, a gradient estimator may be available at low costs along with a single simulation replication. By integrating gradient estimators, which are possibly biased, we propose simulation-optimization algorithms to achieve optimality guarantees with a reduced dependence on the dimension under moderate assumptions on the bias.

preprint 2022 arXiv

GrASP: Gradient-Based Affordance Selection for Planning

Planning with a learned model is arguably a key component of intelligence. There are several challenges in realizing such a component in large-scale reinforcement learning (RL) problems. One such challenge is dealing effectively with continuous action spaces when using tree-search planning (e.g., it is not feasible to consider every action even at just the root node of the tree). In this paper we present a method for selecting affordances useful for planning -- for learning which small number of actions/options from a continuous space of actions/options to consider in the tree-expansion process during planning. We consider affordances that are goal-and-state-conditional mappings to actions/options as well as unconditional affordances that simply select actions/options available in all states. Our selection method is gradient based: we compute gradients through the planning procedure to update the parameters of the function that represents affordances. Our empirical work shows that it is feasible to learn to select both primitive-action and option affordances, and that simultaneously learning to select affordances and planning with a learned value-equivalent model can outperform model-free RL.

preprint 2022 arXiv

Stochastic Localization Methods for Convex Discrete Optimization via Simulation

We develop and analyze a set of new sequential simulation-optimization algorithms for large-scale multi-dimensional discrete optimization via simulation problems with a convexity structure. Here "large-scale" means that the decision variable has a large number of values to choose from in each dimension. The proposed algorithms are targeted to identify a solution that is close to the optimal solution given any precision level with any given probability. To achieve this target, utilizing the convexity structure, our algorithm design does not need to scan all the choices of the decision variable, but instead sequentially draws a subset of choices of the decision variable and uses them to "localize" potentially near-optimal solutions to an adaptively shrinking region. To show the power of the localization operation, we first consider one-dimensional large-scale problems. We propose the shrinking uniform sampling algorithm, which is proved to achieve the target with an optimal expected simulation cost under an asymptotic criterion. For multi-dimensional problems, we combine the idea of localization with subgradient information and propose a framework to design stochastic cutting-plane methods and the dimension reduction algorithm, whose expected simulation costs have a low dependence on the scale and the dimension of the problems. The proposed algorithms do not require prior information about the Lipschitz constant of the objective function and the simulation costs are upper bounded by a value that is independent of the Lipschitz constant. Finally, we propose an adaptive algorithm to deal with the unknown noise variance case under the assumption that the randomness of the system is Gaussian. We implement the proposed algorithms on both synthetic and queueing simulation optimization problems, and demonstrate better performance than benchmark methods.

preprint 2020 arXiv

Automated Optical Multi-layer Design via Deep Reinforcement Learning

Optical multi-layer thin films are widely used in optical and energy applications requiring photonic designs. Engineers often design such structures based on their physical intuition. However, solely relying on human experts can be time-consuming and may lead to sub-optimal designs, especially when the design space is large. In this work, we frame the multi-layer optical design task as a sequence generation problem. A deep sequence generation network is proposed for efficiently generating optical layer sequences. We train the deep sequence generation network with proximal policy optimization to generate multi-layer structures with desired properties. The proposed method is applied to two energy applications. Our algorithm successfully discovered high-performance designs, outperforming structures designed by human experts in task 1, and a state-of-the-art memetic algorithm in task 2.

preprint 2020 arXiv

What Can Learned Intrinsic Rewards Capture?

The objective of a reinforcement learning agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. These rewards are typically given and immutable. In this paper, we instead consider the proposition that the reward function itself can be a good locus of learned knowledge. To investigate this, we propose a scalable meta-gradient framework for learning useful intrinsic reward functions across multiple lifetimes of experience. Through several proof-of-concept experiments, we show that it is feasible to learn and capture knowledge about long-term exploration and exploitation into a reward function. Furthermore, we show that unlike policy transfer methods that capture "how" the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing "what" the agent should strive to do.

preprint 2014 arXiv

Predicting market instability: New dynamics between volume and volatility

Econophysics and econometrics agree that there is a correlation between volume and volatility in a time series. Using empirical data and their distributions, we further investigate this correlation and discover new ways that volatility and volume interact, particularly when the levels of both are high. We find that the distribution of the volume-conditional volatility is well fit by a power-law function with an exponential cutoff. We find that the volume-conditional volatility distribution scales with volume, and we collapse these distributions onto a single curve. We exploit the characteristics of the volume-volatility scatter plot to find a strong correlation between logarithmic volume and a quantity we define as local maximum volatility (LMV), which indicates the largest volatility observed in a given range of trading volumes. This finding supports our empirical analysis showing that volume is an excellent predictor of the maximum value of volatility for both same-day and near-future time periods. We also use a joint conditional probability that includes both volatility and volume to demonstrate that invoking both allows us to better predict the largest next-day volatility than invoking either one alone.

preprint 2013 arXiv

Analysis of Realized Volatility in Two Trading Sessions of the Japanese Stock Market

We analyze realized volatilities constructed using high-frequency stock data on the Tokyo Stock Exchange. To avoid the non-trading-hours issue in volatility calculations, we define two realized volatilities calculated separately in the two trading sessions of the Tokyo Stock Exchange, i.e. morning and afternoon sessions. After calculating the realized volatilities at various sampling frequencies, we evaluate the bias from the microstructure noise as a function of sampling frequency. Taking into account the bias in realized volatility, we examine returns standardized by realized volatilities and confirm that price returns on the Tokyo Stock Exchange are described approximately by Gaussian time series with time-varying volatility, i.e. consistent with a mixture of distributions hypothesis.
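The per-session construction can be sketched as below. This is a minimal example that takes realized volatility to be the sum of squared intraday log returns (a quantity often called realized variance); the exact estimator and the bias correction used in the paper are not reproduced. The price series and the `step` parameter, which mimics a coarser sampling frequency, are hypothetical.

```python
import math
from typing import Sequence


def realized_volatility(prices: Sequence[float], step: int = 1) -> float:
    """Sum of squared log returns of `prices` sampled every `step` observations."""
    sampled = prices[::step]
    return sum(math.log(b / a) ** 2 for a, b in zip(sampled, sampled[1:]))


# Hypothetical intraday prices, kept separate for the two sessions so that the
# gap between sessions never enters a return.
morning = [100.0, 100.5, 100.2, 100.8, 101.0]
afternoon = [101.3, 101.1, 101.6, 101.4, 101.9]

rv_morning = realized_volatility(morning)
rv_afternoon = realized_volatility(afternoon)
rv_morning_coarse = realized_volatility(morning, step=2)  # lower sampling frequency
```

Comparing `rv_morning` with `rv_morning_coarse` across many days and step sizes is one simple way to see how the estimate depends on sampling frequency, which is where microstructure-noise bias shows up.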

preprint 2013 arXiv

Carbon-dioxide emissions trading and hierarchical structure in worldwide finance and commodities markets

In a highly interdependent economic world, the nature of relationships between financial entities is becoming an increasingly important area of study. Recently, many studies have shown the usefulness of minimal spanning trees (MST) in extracting interactions between financial entities. Here, we propose a modified MST network whose metric distance is defined in terms of cross-correlation coefficient absolute values, enabling the connections between anticorrelated entities to manifest properly. We investigate 69 daily time series, comprising three types of financial assets: 28 stock market indicators, 21 currency futures, and 20 commodity futures. We show that though the resulting MST network evolves over time, the financial assets of similar type tend to have connections which are stable over time. In addition, we find a characteristic time lag between the volatility time series of the stock market indicators and those of the EU CO2 emission allowance (EUA) and crude oil futures (WTI). This time lag is given by the peak of the cross-correlation function of the volatility time series EUA (or WTI) with that of the stock market indicators, and is markedly different (>20 days) from 0, showing that the volatility of stock market indicators today can predict the volatility of EU emissions allowances and of crude oil in the near future.
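The idea of building the tree from absolute correlation values can be sketched with a small Kruskal-style construction. The distance $d = \sqrt{2(1-|\rho|)}$ used here is an assumption (the common correlation distance with $|\rho|$ substituted for $\rho$); the abstract does not give the exact formula. The asset names and the correlation matrix are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def abs_corr_distance(rho: float) -> float:
    """Assumed distance: strongly anticorrelated pairs are as close as correlated ones."""
    return math.sqrt(2.0 * (1.0 - abs(rho)))


def minimum_spanning_tree(
    names: Sequence[str], corr: Sequence[Sequence[float]]
) -> List[Tuple[str, str, float]]:
    """Kruskal's algorithm over all pairs, using a simple union-find."""
    n = len(names)
    edges = sorted(
        (abs_corr_distance(corr[i][j]), i, j) for i in range(n) for j in range(i + 1, n)
    )
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # the edge joins two components, so keep it
            parent[root_i] = root_j
            tree.append((names[i], names[j], dist))
    return tree


# Hypothetical correlations: "stock" and "currency" are strongly anticorrelated.
names = ["stock", "currency", "commodity"]
corr = [[1.0, -0.9, 0.2], [-0.9, 1.0, 0.1], [0.2, 0.1, 1.0]]
tree = minimum_spanning_tree(names, corr)
```

With a signed distance the anticorrelated pair would be the farthest apart; with the absolute-value distance it becomes the first edge of the tree.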

preprint 2012 arXiv

Service Composition in Service-Oriented Wireless Sensor Networks with Persistent Queries

The service-oriented wireless sensor network (WSN) has recently been proposed as an architecture for rapidly developing applications in WSNs. In WSNs, a query task may require a set of services and may be carried out repetitively with a given frequency during its lifetime. A service composition solution shall be provided for each execution of such a persistent query task. Due to the energy-saving strategy, some sensors may be scheduled to be in sleep mode periodically. Thus, a service composition solution may not always be valid during the lifetime of a persistent query. When a query task needs to be conducted over a new service composition solution, a routing update procedure is involved which consumes energy. In this paper, we study service composition design which minimizes the number of service composition solutions during the lifetime of a persistent query. We also aim to minimize the total service composition cost when the minimum number of required service composition solutions is derived. A greedy algorithm and a dynamic programming algorithm are proposed to achieve these two objectives, respectively. The optimality of both algorithms provides the service composition solutions for a persistent query with minimum energy consumption.

preprint 2011 arXiv

Scaling of Seismic Memory with Earthquake Size

It has been observed that earthquake events possess short-term memory, i.e. that events occurring in a particular location are dependent on the short history of that location. We conduct an analysis to see whether real-time earthquake data also possess long-term memory and, if so, whether such autocorrelations depend on the size of earthquakes within close spatiotemporal proximity. We analyze the seismic waveform database recorded by 64 stations in Japan, including the 2011 "Great East Japan Earthquake", one of the five most powerful earthquakes ever recorded, which resulted in a tsunami and devastating nuclear accidents. We explore the question of seismic memory through use of mean conditional intervals and detrended fluctuation analysis (DFA). We find that the waveform sign series show long-range power-law anticorrelations while the interval series show long-range power-law correlations. We find size-dependence in earthquake autocorrelations: as earthquake size increases, both of these correlation behaviors strengthen. We also find that the DFA scaling exponent $α$ has no dependence on earthquake hypocenter depth or epicentral distance.
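The DFA scaling exponent mentioned above can be estimated with a short first-order DFA routine. This is a generic sketch run on synthetic white noise, not the paper's pipeline or its seismic data; the window sizes, the seed, and the function name are assumptions. An exponent near 0.5 indicates no long-range correlation, values below 0.5 indicate anticorrelation, and values above 0.5 indicate positive correlation.

```python
import math
import random
from typing import List, Sequence


def dfa_exponent(series: Sequence[float], window_sizes: Sequence[int]) -> float:
    """First-order DFA: slope of log F(n) against log n."""
    mean = sum(series) / len(series)
    profile: List[float] = []
    running = 0.0
    for x in series:  # integrate the mean-removed series
        running += x - mean
        profile.append(running)

    log_n, log_f = [], []
    for n in window_sizes:
        t_mean = (n - 1) / 2.0
        sxx = sum((t - t_mean) ** 2 for t in range(n))
        sq_sum, count = 0.0, 0
        for start in range(0, len(profile) - n + 1, n):
            seg = profile[start:start + n]
            y_mean = sum(seg) / n
            slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(seg)) / sxx
            # squared residuals around the least-squares line of this window
            sq_sum += sum(
                (y - (y_mean + slope * (t - t_mean))) ** 2 for t, y in enumerate(seg)
            )
            count += n
        log_n.append(math.log(n))
        log_f.append(0.5 * math.log(sq_sum / count))  # log of the RMS fluctuation

    x_mean = sum(log_n) / len(log_n)
    y_mean = sum(log_f) / len(log_f)
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(log_n, log_f)) / sum(
        (x - x_mean) ** 2 for x in log_n
    )


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(4096)]  # uncorrelated test signal
alpha = dfa_exponent(noise, [8, 16, 32, 64, 128])
```

For this uncorrelated input the estimate should land close to 0.5; applying the same routine to a sign series or an interval series is how anticorrelated and correlated behavior would be distinguished.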

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Artificial Intelligence Machine Learning q-fin.ST Computation and Language math.CO math.OC physics.data-an cs.CY eess.SP math.PR Networking and Internet Architecture physics.app-ph physics.geo-ph

Source provenance

Where this author record came from

arxiv confidence 95%

external id: arxiv:2605.09945:author:3:zeyu-zheng

Imported May 20, 2026 Synced May 21, 2026

arxiv confidence 95%

external id: arxiv:2605.13941:author:4:zeyu-zheng

Imported May 20, 2026 Synced May 21, 2026

arxiv confidence 95%

external id: arxiv:2605.09915:author:6:zeyu-zheng

Imported May 20, 2026 Synced May 20, 2026

arxiv confidence 95%

external id: arxiv:2605.14133:author:9:zeyu-zheng

Imported May 20, 2026 Synced May 20, 2026

arxiv confidence 95%

external id: arxiv:2605.15721:author:5:zeyu-zheng

Imported May 20, 2026 Synced May 20, 2026

arxiv confidence 95%

external id: arxiv:2605.20025:author:32:zeyu-zheng

Imported May 20, 2026 Synced May 20, 2026

3 works

Cihang Xie

Researcher

Cihang Xie contributes to research discovery and scholarly infrastructure.

Open to collaborate

3 works

H. Eugene Stanley

Researcher

H. Eugene Stanley contributes to research discovery and scholarly infrastructure.

Open to collaborate

3 works

Huaxiu Yao

Researcher

Huaxiu Yao contributes to research discovery and scholarly infrastructure.

Open to collaborate

3 works

Jiaqi Liu

Researcher

Jiaqi Liu contributes to research discovery and scholarly infrastructure.

Open to collaborate
