Researcher profile

Qingyun Wu

Qingyun Wu contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
10 works
0 followers
8 topics
4 close collaborators

Published work

10 published item(s)

preprint, 2026, arXiv

A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence

Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks but remain fundamentally static, unable to adapt their internal parameters to novel tasks, evolving knowledge domains, or dynamic interaction contexts. As LLMs are increasingly deployed in open-ended, interactive environments, this static nature has become a critical bottleneck, necessitating agents that can adaptively reason, act, and evolve in real time. This paradigm shift -- from scaling static models to developing self-evolving agents -- has sparked growing interest in architectures and methods enabling continual learning and adaptation from data, interactions, and experiences. This survey provides the first systematic and comprehensive review of self-evolving agents, organizing the field around three foundational dimensions: what, when, and how to evolve. We examine evolutionary mechanisms across agent components (e.g., models, memory, tools, architecture), categorize adaptation methods by stages (e.g., intra-test-time, inter-test-time), and analyze the algorithmic and architectural designs that guide evolutionary adaptation (e.g., scalar rewards, textual feedback, single-agent and multi-agent systems). Additionally, we analyze evaluation metrics and benchmarks tailored for self-evolving agents, highlight applications in domains such as coding, education, and healthcare, and identify critical challenges and research directions in safety, scalability, and co-evolutionary dynamics. By providing a structured framework for understanding and designing self-evolving agents, this survey establishes a roadmap for advancing more adaptive, robust, and versatile agentic systems in both research and real-world deployments, and ultimately sheds light on the realization of Artificial Super Intelligence (ASI) where agents evolve autonomously and perform beyond human-level intelligence across tasks.

preprint, 2026, arXiv

EVOCHAMBER: Test-Time Co-evolution of Multi-Agent System at Individual, Team, and Population Scales

We argue that multi-agent test-time evolution is not single-agent evolution replicated N times. A single-agent learner can only evolve its own context and memory. A multi-agent system additionally evolves who collaborates, how they collaborate, and how knowledge flows across the population. These components have no single-agent counterpart and can produce phenomena such as emergent specialization. Yet prior test-time methods either confine experiences to individual agents, forfeiting cross-agent learning, or broadcast symmetrically to all agents, erasing the specialization that makes collaboration valuable. We present EVOCHAMBER, a training-free framework that instantiates test-time evolution at three levels over a coevolving agent pool. At its core is CODREAM (Collaborative Dreaming), a post-task protocol triggered on team failure or disagreement, in which agents collaboratively reflect, distill insights, and route them asymmetrically from strong to weak agents on the failed niche, preserving specialization while filling knowledge gaps. Team-level operators assemble niche-conditioned teams and select collaboration structures online. Population-level lifecycle operators fork, merge, prune, and seed agents under performance pressure. On three heterogeneous task streams with Qwen3-8B, EVOCHAMBER reaches 63.9% on competition math, 75.7% on code, and 87.1% on multi-domain reasoning, outperforming the best baseline by 32% relative on math and confirming asymmetric cross-agent transfer as the primary driver in ablation. Starting from several identically initialized agents, four to five stable niche specialists spontaneously emerge, a structural signature of multi-agent evolution that no single-agent learner can express. See our code at: https://github.com/Mercury7353/EvoChamber

preprint, 2026, arXiv

MetaAgent-X : Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning

Automatic multi-agent systems (MAS) aim to instantiate agent workflows without relying on manually designed or fixed orchestration. However, existing automatic MAS approaches remain only partially adaptive: they either perform training-free test-time search or optimize the meta-level designer while keeping downstream execution agents frozen, which creates a frozen-executor ceiling and leaves the end-to-end training of self-designing and self-executing agentic models unexplored. To address this, we introduce MetaAgent-X, an end-to-end reinforcement learning framework that jointly optimizes automatic MAS design and execution. MetaAgent-X enables script-based MAS generation, execution rollout collection, and credit assignment for both designer and executor trajectories. To support stable and scalable optimization, we propose Executor Designer Hierarchical Rollout and Stagewise Co-evolution to improve training stability and expose the dynamics of designer-executor co-evolution. MetaAgent-X consistently outperforms existing automatic MAS baselines, achieving up to 21.7% gains. Comprehensive ablations show that both designer and executor improve throughout training, and that effective automatic MAS learning follows a stagewise co-evolution process. These results establish end-to-end trainable automatic MAS as a practical paradigm for building self-designing and self-executing agentic models.

preprint, 2022, arXiv

Cataloguing MoSi$_2$N$_4$ and WSi$_2$N$_4$ van der Waals Heterostructures: An Exceptional Material Platform for Excitonic Solar Cell Applications

Two-dimensional (2D) material van der Waals heterostructures (vdWHs) provide a revolutionary route towards high-performance solar energy conversion devices beyond the conventional silicon-based pn junction solar cells. Despite tremendous research progress in recent years, the search for vdWHs with exceptional excitonic solar cell conversion efficiency and optical properties remains an open theoretical and experimental quest. Here we show that the vdWH family composed of MoSi$_2$N$_4$ and WSi$_2$N$_4$ monolayers provides a compelling material platform for developing high-performance ultrathin excitonic solar cells and photonic devices. Using first-principles calculations, we construct and classify 51 types of MoSi$_2$N$_4$ and WSi$_2$N$_4$-based [(Mo,W)Si$_2$N$_4$] vdWHs composed of various metallic, semimetallic, semiconducting, insulating and topological 2D materials. Intriguingly, MoSi$_2$N$_4$/(InSe, WSe$_2$) are identified as Type-II vdWHs with exceptional excitonic solar cell power conversion efficiency reaching well over 20%, which is competitive with state-of-the-art silicon solar cells. The (Mo,W)Si$_2$N$_4$ vdWH family exhibits strong optical absorption in both the visible and ultraviolet regimes. Exceedingly large peak ultraviolet absorptions over 40%, approaching the maximum absorption limit of a free-standing 2D material, can be achieved in (Mo,W)Si$_2$N$_4$/$α_2$-(Mo,W)Ge$_2$P$_4$ vdWHs. Our findings unravel the enormous potential of (Mo,W)Si$_2$N$_4$ vdWHs in designing ultimately compact excitonic solar cell device technology.

preprint, 2022, arXiv

Provably Efficient Reinforcement Learning for Online Adaptive Influence Maximization

Online influence maximization aims to maximize the influence spread of content in a social network with an unknown network model by selecting a few seed nodes. Recent studies followed a non-adaptive setting, where the seed nodes are selected before the start of the diffusion process and network parameters are updated when the diffusion stops. We consider an adaptive version of the content-dependent online influence maximization problem where the seed nodes are sequentially activated based on real-time feedback. In this paper, we formulate the problem as an infinite-horizon discounted MDP under a linear diffusion process and present a model-based reinforcement learning solution. Our algorithm maintains a network model estimate and selects seed users adaptively, exploring the social network while improving the optimal policy optimistically. We establish a $\widetilde O(\sqrt{T})$ regret bound for our algorithm. Empirical evaluations on a synthetic network demonstrate the efficiency of our algorithm.

preprint, 2022, arXiv

Tunable electronic properties and band alignments of MoSi$_2$N$_4$/GaN and MoSi$_2$N$_4$/ZnO van der Waals heterostructures

Van der Waals heterostructures (VDWHs) are an emerging strategy to engineer the electronic properties of two-dimensional (2D) material systems. Motivated by the recent discovery of MoSi$_2$N$_4$, a synthetic septuple-layered 2D semiconductor with exceptional mechanical and electronic properties, we investigate the synergy of MoSi$_2$N$_4$ with wide band gap (WBG) 2D monolayers of GaN and ZnO using first-principles calculations. We find that MoSi$_2$N$_4$/GaN is a direct band gap Type-I VDWH while MoSi$_2$N$_4$/ZnO is an indirect band gap Type-II VDWH. Intriguingly, by applying an electric field or mechanical strain along the out-of-plane direction, the band structures of MoSi$_2$N$_4$/GaN and MoSi$_2$N$_4$/ZnO can be substantially modified, exhibiting rich transitional behaviors, such as the Type-I-to-Type-II band alignment and the direct-to-indirect band gap transitions. These findings reveal the potential of MoSi$_2$N$_4$-based WBG VDWHs as tunable hybrid materials with enormous design flexibility in ultracompact optoelectronic applications.

preprint, 2020, arXiv

Electrical Contact between an Ultrathin Topological Dirac Semimetal and a Two-Dimensional Material

Ultrathin films of the topological Dirac semimetal Na$_3$Bi have recently been revealed as unusual electronic materials with field-tunable topological phases. Here we investigate the electronic and transport properties of ultrathin Na$_3$Bi as an electrical contact to a two-dimensional (2D) metal, i.e. graphene, and to 2D semiconductors, i.e. MoS$_2$ and WS$_2$ monolayers. Using combined first-principles density functional theory and nonequilibrium Green's function simulation, we show that the electrical coupling between Na$_3$Bi bilayer thin film and graphene results in a notable interlayer charge transfer, thus inducing sizable $n$-type doping in the Na$_3$Bi/graphene heterostructures. In the case of MoS$_2$ and WS$_2$ monolayers, the lateral Schottky transport barrier is significantly lower than that of many commonly studied bulk metals, thus revealing the Na$_3$Bi bilayer as a high-efficiency electrical contact material for 2D semiconductors. These findings open up an avenue of utilizing topological semimetal thin films as electrical contacts to 2D materials, and further expand the family of 2D heterostructure devices into the realm of topological materials.

preprint, 2020, arXiv

Estimation-Action-Reflection: Towards Deep Interaction Between Conversational and Recommender Systems

Recommender systems are embracing conversational technologies to obtain user preferences dynamically, and to overcome inherent limitations of their static models. A successful Conversational Recommender System (CRS) requires proper handling of interactions between conversation and recommendation. We argue that three fundamental problems need to be solved: 1) what questions to ask regarding item attributes, 2) when to recommend items, and 3) how to adapt to the users' online feedback. To the best of our knowledge, a unified framework that addresses these problems is lacking. In this work, we fill this gap by proposing a new CRS framework named Estimation-Action-Reflection, or EAR, which consists of three stages to better converse with users: (1) Estimation, which builds predictive models to estimate user preference on both items and item attributes; (2) Action, which learns a dialogue policy to determine whether to ask about attributes or recommend items, based on the Estimation stage and conversation history; and (3) Reflection, which updates the recommender model when a user rejects the recommendations made by the Action stage. We present two conversation scenarios on binary and enumerated questions, and conduct extensive experiments on two datasets from Yelp and LastFM, for each scenario, respectively. Our experiments demonstrate significant improvements over the state-of-the-art method CRM [32], corresponding to fewer conversation turns and a higher level of recommendation hits.
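
The three stages described in this abstract can be illustrated with a toy loop. This is only a minimal sketch under strong assumptions, not the paper's method: the function names (`estimate`, `act`, `reflect`), the additive attribute weights, the fixed `threshold` rule and the `step` penalty are all hypothetical stand-ins for the predictive models and dialogue policy the abstract mentions.

```python
def estimate(weights, known_attrs, items):
    """Estimation (toy): score each item by the summed weights of the
    attributes the user has confirmed so far. Hypothetical stand-in for a
    learned preference model."""
    return {
        name: sum(weights.get(a, 0.0) for a in attrs if a in known_attrs)
        for name, attrs in items.items()
    }


def act(scores, threshold):
    """Action (toy): recommend the top-scoring item once it clears a fixed
    threshold, otherwise ask about another attribute. Hypothetical stand-in
    for a learned dialogue policy."""
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return ("recommend", best)
    return ("ask", None)


def reflect(weights, rejected_attrs, step=0.5):
    """Reflection (toy): after a rejected recommendation, lower the weights
    of the rejected item's attributes in place and return them."""
    for a in rejected_attrs:
        weights[a] = weights.get(a, 0.0) - step
    return weights


# Hypothetical catalogue and starting weights, for illustration only.
items = {"jazz_bar": {"jazz", "bar"}, "rock_club": {"rock", "bar"}}
weights = {"jazz": 1.0, "rock": 0.2, "bar": 0.5}

first = act(estimate(weights, {"bar"}, items), threshold=1.0)
second = act(estimate(weights, {"bar", "jazz"}, items), threshold=1.0)
reflect(weights, items["jazz_bar"])  # the user rejects "jazz_bar"
third = act(estimate(weights, {"bar", "jazz"}, items), threshold=1.0)
```

In this toy run the system first asks, then recommends `jazz_bar`, and after the rejection falls back to asking because the penalized weights no longer clear the threshold.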

preprint, 2020, arXiv

Fast Distributed Bandits for Online Recommendation Systems

Contextual bandit algorithms are commonly used in recommender systems, where content popularity can change rapidly. These algorithms continuously learn latent mappings between users and items, based on contexts associated with them both. Recent recommendation algorithms that learn clustering or social structures between users have exhibited higher recommendation accuracy. However, as the number of users and items in the environment increases, the time required to generate recommendations grows significantly. As a result, these algorithms cannot be deployed in practice. The state-of-the-art distributed bandit algorithm, DCCB, relies on a peer-to-peer network to share information among distributed workers. However, this approach does not scale well with the increasing number of users. Furthermore, it suffers from slow discovery of clusters, resulting in accuracy degradation. To address the above issues, this paper proposes a novel distributed bandit-based algorithm called DistCLUB. This algorithm lazily creates clusters in a distributed manner, and dramatically reduces the network data sharing requirement, achieving high scalability. Additionally, DistCLUB finds clusters much faster, achieving better accuracy than the state-of-the-art algorithm. Evaluation over both real-world benchmarks and synthetic datasets shows that DistCLUB is on average 8.87x faster than DCCB, and achieves 14.5% higher normalized prediction performance.
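
For readers unfamiliar with contextual bandits, the following is a minimal single-machine sketch of a generic LinUCB-style learner with one linear model per arm. It is not DistCLUB or DCCB and does no clustering or distribution; the class name `LinUCBArm`, the helper `select_arm`, and the exploration parameter `alpha` are assumptions chosen for illustration.

```python
import math


class LinUCBArm:
    """Per-arm ridge-regression state for a disjoint LinUCB-style bandit."""

    def __init__(self, dim):
        # A_inv starts as the identity, i.e. the inverse of the ridge term I.
        self.A_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _matvec(self, v):
        """Return A_inv @ v using plain lists."""
        return [sum(row[j] * v[j] for j in range(len(v))) for row in self.A_inv]

    def ucb(self, x, alpha):
        """Estimated reward for context x plus an exploration bonus."""
        a_inv_x = self._matvec(x)
        theta = self._matvec(self.b)  # theta = A^{-1} b
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        """Sherman-Morrison rank-one update of A^{-1}, then b += reward * x."""
        a_inv_x = self._matvec(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.A_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def select_arm(arms, x, alpha=1.0):
    """Pick the arm with the highest upper confidence bound (first on ties)."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)
```

A typical loop calls `select_arm` with the current context, observes a reward for the chosen arm, and passes it to that arm's `update`. Each arm's state here is independent of the others, whereas the clustering approaches named in the abstract share information across users.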

preprint, 2020, arXiv

Unifying Clustered and Non-stationary Bandits

Non-stationary bandits and online clustering of bandits lift the restrictive assumptions in contextual bandits and provide solutions to many important real-world scenarios. Though the essence in solving these two problems overlaps considerably, they have been studied independently. In this paper, we connect these two strands of bandit research under the notion of test of homogeneity, which seamlessly addresses change detection for non-stationary bandits and cluster identification for online clustering of bandits in a unified solution framework. Rigorous regret analysis and extensive empirical evaluations demonstrate the value of our proposed solution, especially its flexibility in handling various environment assumptions.