Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
20 works
0 followers
16 topics
4 close collaborators

Published work

20 published items

preprint, 2026, arXiv

EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems

Large language model (LLM)-based multi-agent systems have shown strong potential on complex tasks through agent specialization, tool use, and collaborative reasoning. However, most automated multi-agent system design methods still follow a one-shot paradigm: a workflow is optimized or selected before execution and then reused unchanged throughout the task. This static coordination strategy is ill-suited for long-horizon tasks whose subgoals, intermediate evidence, and information needs evolve over multiple execution stages. We propose EvoMAS, a framework for execution-time multi-agent workflow construction. EvoMAS formulates workflow construction as a meta-level sequential decision problem along a single task trajectory. At each stage, it constructs an explicit task state through a Planner-Evaluator-Updater pipeline and uses a learned Workflow Adapter to instantiate a stage-specific layered workflow from a fixed pool of candidate agents. The adapter is trained with policy gradients using sparse, verifiable terminal task success as the main supervision signal, while evaluator-based process reward is analyzed separately under very-hard sparse-reward settings. Experiments on GAIA, HLE, and DeepResearcher show that EvoMAS outperforms single-agent baselines and recent automated multi-agent workflow design methods. Our analyses further show that explicit task-state construction and learned workflow adaptation provide complementary benefits. Additional results indicate that process reward is most useful when terminal success is extremely sparse, and qualitative case studies illustrate that EvoMAS adapts agent coordination as the task state evolves.

preprint, 2024, arXiv

LLM-Powered Hierarchical Language Agent for Real-time Human-AI Coordination

AI agents powered by Large Language Models (LLMs) have made significant advances, enabling them to assist humans in diverse complex tasks and leading to a revolution in human-AI coordination. LLM-powered agents typically require invoking LLM APIs and employing manually designed complex prompts, which results in high inference latency. While this paradigm works well in scenarios with minimal interactive demands, such as code generation, it is unsuitable for highly interactive and real-time applications, such as gaming. Traditional gaming AI often employs small models or reactive policies, enabling fast inference but offering limited task completion and interaction abilities. In this work, we consider Overcooked as our testbed, where players can communicate in natural language and cooperate to serve orders. We propose a Hierarchical Language Agent (HLA) for human-AI coordination that provides strong reasoning abilities while maintaining real-time execution. In particular, HLA adopts a hierarchical framework and comprises three modules: a proficient LLM, referred to as Slow Mind, for intention reasoning and language interaction; a lightweight LLM, referred to as Fast Mind, for generating macro actions; and a reactive policy, referred to as Executor, for transforming macro actions into atomic actions. Human studies show that HLA outperforms other baseline agents, including slow-mind-only agents and fast-mind-only agents, with stronger cooperation abilities, faster responses, and more consistent language communications.

preprint, 2024, arXiv

Policy-regularized Offline Multi-objective Reinforcement Learning

In this paper, we aim to utilize only offline trajectory data to train a policy for multi-objective RL. We extend the offline policy-regularized method, a widely-adopted approach for single-objective offline RL problems, into the multi-objective setting in order to achieve the above goal. However, such methods face a new challenge in offline MORL settings, namely the preference-inconsistent demonstration problem. We propose two solutions to this problem: 1) filtering out preference-inconsistent demonstrations via approximating behavior preferences, and 2) adopting regularization techniques with high policy expressiveness. Moreover, we integrate the preference-conditioned scalarized update method into policy-regularized offline RL, in order to simultaneously learn a set of policies using a single policy network, thus reducing the computational cost induced by the training of a large number of individual policies for various preferences. Finally, we introduce Regularization Weight Adaptation to dynamically determine appropriate regularization weights for arbitrary target preferences during deployment. Empirical results on various multi-objective datasets demonstrate the capability of our approach in solving offline MORL problems.

preprint, 2023, arXiv

Plan To Predict: Learning an Uncertainty-Foreseeing Model for Model-Based Reinforcement Learning

In Model-based Reinforcement Learning (MBRL), model learning is critical since an inaccurate model can bias policy learning via generating misleading samples. However, learning an accurate model can be difficult since the policy is continually updated and the induced distribution over visited states used for model learning shifts accordingly. Prior methods alleviate this issue by quantifying the uncertainty of model-generated samples. However, these methods only quantify the uncertainty passively after the samples were generated, rather than foreseeing the uncertainty before model trajectories fall into those highly uncertain regions. The resulting low-quality samples can induce unstable learning targets and hinder the optimization of the policy. Moreover, while being learned to minimize one-step prediction errors, the model is generally used to predict for multiple steps, leading to a mismatch between the objectives of model learning and model usage. To this end, we propose Plan To Predict (P2P), an MBRL framework that treats the model rollout process as a sequential decision making problem by reversely considering the model as a decision maker and the current policy as the dynamics. In this way, the model can quickly adapt to the current policy and foresee the multi-step future uncertainty when generating trajectories. Theoretically, we show that the performance of P2P can be guaranteed by approximately optimizing a lower bound of the true environment return. Empirical results demonstrate that P2P achieves state-of-the-art performance on several challenging benchmark tasks.

preprint, 2022, arXiv

Constrained Sequence-to-Tree Generation for Hierarchical Text Classification

Hierarchical Text Classification (HTC) is a challenging task where a document can be assigned to multiple hierarchically structured categories within a taxonomy. The majority of prior studies consider HTC as a flat multi-label classification problem, which inevitably leads to the "label inconsistency" problem. In this paper, we formulate HTC as a sequence generation task and introduce a sequence-to-tree framework (Seq2Tree) for modeling the hierarchical label structure. Moreover, we design a constrained decoding strategy with dynamic vocabulary to secure the label consistency of the results. Compared with previous works, the proposed approach achieves significant and consistent improvements on three benchmark datasets.
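The idea of constrained decoding with a dynamic vocabulary can be illustrated with a minimal sketch. This is not the Seq2Tree implementation: the toy taxonomy, the `END` token, the `score` callback and the greedy single-path search are all assumptions made for illustration. At each step the decoder may only choose among the children of the last emitted label, so the output is always a valid path in the taxonomy.

```python
from typing import Callable, Dict, List

# Hypothetical toy taxonomy (parent label -> child labels); not from the paper.
TAXONOMY: Dict[str, List[str]] = {
    "ROOT": ["Science", "Sports"],
    "Science": ["Physics", "Biology"],
    "Sports": ["Tennis"],
    "Physics": [],
    "Biology": [],
    "Tennis": [],
}
END = "<end>"  # assumed stop token


def allowed_next(path: List[str]) -> List[str]:
    """Dynamic vocabulary: children of the last label, plus END once a label exists."""
    parent = path[-1] if path else "ROOT"
    return TAXONOMY[parent] + ([END] if path else [])


def constrained_greedy_decode(
    score: Callable[[List[str], str], float], max_depth: int = 8
) -> List[str]:
    """Greedily pick the best-scoring label among the currently allowed ones."""
    path: List[str] = []
    for _ in range(max_depth):
        candidates = allowed_next(path)
        best = max(candidates, key=lambda label: score(path, label))
        if best == END:
            break
        path.append(best)
    return path
```

Even if the scoring function strongly prefers a deep label such as "Biology" at the first step, the decoder cannot emit it before its parent "Science", which is the consistency property the abstract refers to.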

preprint, 2022, arXiv

ESCM$^2$: Entire Space Counterfactual Multi-Task Model for Post-Click Conversion Rate Estimation

Accurate estimation of post-click conversion rate (CVR) is critical for building recommender systems, which has long been confronted with sample selection bias and data sparsity issues. Methods in the Entire Space Multi-task Model (ESMM) family leverage the sequential pattern of user actions, i.e. $\text{impression} \rightarrow \text{click} \rightarrow \text{conversion}$, to address the data sparsity issue. However, they still fail to ensure the unbiasedness of CVR estimates. In this paper, we theoretically demonstrate that ESMM suffers from the following two problems: (1) Inherent Estimation Bias (IEB), where the estimated CVR of ESMM is inherently higher than the ground truth; (2) Potential Independence Priority (PIP) for CTCVR estimation, where there is a risk that the ESMM overlooks the causality from click to conversion. To this end, we devise a principled approach named Entire Space Counterfactual Multi-task Modelling (ESCM$^2$), which employs a counterfactual risk minimizer as a regularizer in ESMM to address both IEB and PIP issues simultaneously. Extensive experiments on offline datasets and online environments demonstrate that our proposed ESCM$^2$ can largely mitigate the inherent IEB and PIP issues and achieve better performance than baseline models.
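For orientation, the impression, click, conversion sequence corresponds to the usual chain-rule factorization of the click-and-conversion probability. The symbols here are assumptions for illustration and are not taken from the abstract: $x$ is the impression features, $o \in \{0,1\}$ the click indicator and $r \in \{0,1\}$ the conversion indicator.

```latex
\underbrace{P(o = 1,\, r = 1 \mid x)}_{\text{CTCVR}}
  \;=\;
\underbrace{P(o = 1 \mid x)}_{\text{CTR}}
  \;\times\;
\underbrace{P(r = 1 \mid o = 1,\, x)}_{\text{CVR}}
```

Both CTR and CTCVR are defined for every impression, which is why entire-space methods can train on all impressions rather than only on clicked ones.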

preprint, 2022, arXiv

High-harmonic generation approaching the quantum critical point of strongly correlated systems

By employing the exact diagonalization method, we investigate the high-harmonic generation (HHG) of correlated systems under strong laser irradiation. For the extended Hubbard model on a periodic chain, HHG close to the quantum critical point (QCP) is more significant than in the two neighboring gapped phases (i.e., charge-density-wave and spin-density-wave states), especially at low frequencies. We confirm that systems in the vicinity of the QCP are supersensitive to the external field and that more optical-transition channels via excited states are responsible for HHG. This feature holds the potential of obtaining high-efficiency harmonics by making use of materials approaching the QCP. Based on the two-dimensional Haldane model, we further propose that the even- or odd-order components of the generated harmonics can serve as spectral signals to distinguish topologically ordered phases from locally ordered ones. Our findings pave the way to achieving ultrafast light sources from HHG in strongly correlated materials and to studying quantum phase transitions by nonlinear optics in strong laser fields.

preprint, 2022, arXiv

Malliavin calculus and its application to robust optimal portfolio for an insider

Insider information and model uncertainty are two unavoidable problems for portfolio selection theory in practice. This paper studies the robust optimal portfolio strategy for an investor who owns general insider information under model uncertainty. On the mathematical side, we improve some properties of the forward integral and use Malliavin calculus to derive the anticipating Itô formula. Then we use forward integrals to formulate the insider-trading problem with model uncertainty. We give the half characterization of the robust optimal portfolio and obtain the semimartingale decomposition of the driving noise $W$ with respect to the insider information filtration, which turns the problem into a nonanticipative stochastic differential game problem. We give the total characterization by the stochastic maximum principle. When considering two typical situations where the insider is 'small' and 'large', we give the corresponding BSDEs to characterize the robust optimal portfolio strategy, and derive the closed form of the portfolio and the value function in the case of the small insider by the Donsker $\delta$ functional. We present the simulation results and give an economic analysis of optimal strategies under different situations.

preprint, 2022, arXiv

Note on surface growth approach for bulk reconstruction

In a recent paper, a novel surface growth approach for reconstructing bulk geometry and matter fields was proposed, and it was shown that this picture can be explicitly realized by the one-shot entanglement distillation tensor network and the surface state correspondence. In the present paper, we give a direct analysis of the growth of the bulk minimal surfaces in asymptotically AdS$_3$ spacetime and show that bulk geometry can be efficiently reproduced in this way, which provides further support for the surface growth approach in entanglement wedge reconstruction.

preprint, 2022, arXiv

Passive Motion Detection via mmWave Communication System

In this paper, an integrated passive sensing and communication system working in the 60 GHz band is elaborated, and the sensing performance is investigated in an application of hand gesture recognition. Specifically, in this integrated system, there are two radio frequency (RF) chains at the receiver and one at the transmitter. Each RF chain is connected with one phased array for analog beamforming. To facilitate simultaneous sensing and communication, the transmitter delivers one stream of information-bearing signals via two beam lobes: one is aligned with the main signal propagation path and the other is directed to the sensing target. Signals from the two lobes are received by the two RF chains at the receiver, respectively. By cross ambiguity coherent processing, the time-Doppler spectrograms of hand gestures can be obtained. Relying on the passive sensing system, a dataset of received signals, where three types of hand gestures are sensed, is collected by using Line-of-Sight (LoS) and Non-Line-of-Sight (NLoS) paths as the reference channel, respectively. Then a neural network is trained on the dataset for motion detection. It is shown that the classification accuracy rate is high as long as sufficient sensing time is assured. Finally, an empirical model characterizing the relation between the classification accuracy and sensing duration is derived analytically.

preprint, 2022, arXiv

Quasi-Monte Carlo-Based Conditional Malliavin Method for Continuous-Time Asian Option Greeks

Although many methods for computing the Greeks of discrete-time Asian options have been proposed, few methods to calculate the Greeks of continuous-time Asian options are known. In this paper, we develop an integration by parts formula in the multi-dimensional Malliavin calculus, and apply it to obtain the Greeks formulae for continuous-time Asian options in the multi-asset situation. We combine the Malliavin method with the quasi-Monte Carlo method to calculate the Greeks in simulation. We discuss the asymptotic convergence of simulation estimates for the continuous-time Asian option Greeks obtained by Malliavin derivatives. We propose to use the conditional quasi-Monte Carlo method to smooth Malliavin Greeks, and show that the analytical calculation of conditional expectations is viable for many types of Asian options. We prove that the new estimates for Greeks have good smoothness. For binary Asian options, Asian call options and up-and-out Asian call options, for instance, our estimates are infinitely differentiable. We take the gradient principal component analysis method as a dimension reduction technique in simulation. Numerical experiments demonstrate the large efficiency improvement of the proposed method, especially for Asian options with discontinuous payoff functions.

preprint, 2022, arXiv

Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

Many advances in cooperative multi-agent reinforcement learning (MARL) are based on two common design principles: value decomposition and parameter sharing. A typical MARL algorithm of this fashion decomposes a centralized Q-function into local Q-networks with parameters shared across agents. Such an algorithmic paradigm enables centralized training and decentralized execution (CTDE) and leads to efficient learning in practice. Despite all the advantages, we revisit these two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition and parameter sharing can be problematic and lead to undesired outcomes. In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases, which partially supports some recent empirical observations that PG can be effective in many MARL testbeds. Inspired by our theoretical analysis, we present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors and empirically validate our findings on a variety of domains, ranging from the simplified matrix and grid-world games to complex benchmarks such as StarCraft Multi-Agent Challenge and Google Research Football. We hope our insights could benefit the community towards developing more general and more powerful MARL algorithms. Check our project website at https://sites.google.com/view/revisiting-marl.

preprint, 2022, arXiv

Robust optimal investment and risk control for an insurer with general insider information

In this paper, we study the robust optimal investment and risk control problem for an insurer who owns insider information about the financial market and the insurance market under model uncertainty. Both the financial risky asset process and the insurance risk process are assumed to be very general jump diffusion processes. The insider information is of the most general form rather than the initial enlargement type. We use the theory of forward integrals to give the first half characterization of the robust optimal strategy and transform the anticipating stochastic differential game problem into the nonanticipative stochastic differential game problem. Then we adopt the stochastic maximum principle to obtain the total characterization of the robust strategy. We discuss the two typical situations when the insurer is 'small' and 'large' by Malliavin calculus. For the 'small' insurer, we obtain the closed-form solution in the continuous case and the half closed-form solution in the case with jumps. For the 'large' insurer, we reduce the problem to the quadratic backward stochastic differential equation (BSDE) and obtain the closed-form solution in the continuous case without model uncertainty. We discuss some impacts of the model uncertainty, insider information and the 'large' insurer on the optimal strategy.

preprint, 2022, arXiv

Solid-like high harmonic generation from rotationally periodic systems

High harmonic generation (HHG) from crystals in strong laser fields has been understood through the band theory of solids, which is based on the periodic boundary condition (PBC) of translational invariance. For systems having a PBC of rotational invariance, in principle an analogous Bloch theorem can be developed and applied. Taking a ring-type cluster of cyclo[18]carbon as a representative, we theoretically suggest a quasi-band model and study its HHG by solving the time-dependent Liouville-von Neumann equation. Under irradiation by a circularly polarized laser, explicit selection rules for left-handed and right-handed harmonics are observed, while in a linearly polarized laser field, cyclo[18]carbon exhibits solid-like HHG originating from intra-band oscillations and inter-band transitions, which in turn is promising for optically detecting the symmetry and geometry of controversial structures. In a sense, this work presents a connection linking the high harmonics of gases and solids.

preprint, 2022, arXiv

The Role of Shift Vector in High-Harmonic Generation from Non-Centrosymmetric Topological Insulators under Strong Laser Fields

As a promising avenue to obtain new extreme ultraviolet light sources and detect electronic properties, high-harmonic generation (HHG) has been actively developed in both theory and experiment. In solids lacking inversion symmetry, when electrons undergo a nonadiabatic transition, a directional charge shift occurs and is characterized by the shift vector, which measures the real-space shift of the photoexcited electron and hole. For the first time, we have revealed that the shift vector plays a prominent role in the real-space tunneling mechanism of the three-step model for electrons under strong laser fields. Since the shift vector is determined by the topological properties of the related wave functions, we expect that HHG with its contribution can provide direct knowledge of the band topology in non-centrosymmetric topological insulators (TIs). In both the Kane-Mele model and the realistic material BiTeI, we have found that the shift vector reverses when band inversion happens during the topological phase transition between normal and topological insulators. Under oscillating strong laser fields, the reversal of the shift vector leads to completely opposite radiation times of high-order harmonics. This makes HHG a feasible all-optical strong-field method to directly identify the band inversion in non-centrosymmetric TIs.

preprint, 2020, arXiv

Electron-Backscattering-Assisted High Harmonic Generation from Bilayer Nanostructures

In the framework of time-dependent density functional theory, we obtain high-order harmonics of photon energies up to 10 $U_p$ from bilayer crystals with an interlayer spacing d = 70 Å. At grazing incidence, a clear double-plateau structure is observed in the harmonic spectrum. The photon energy of the second plateau, far beyond atomic-like harmonics, can be well explained by the inclusion of backscattering of ionized electrons. Ab initio simulations reveal that the cutoff of the second plateau is continuously extended with increasing d. Our classical calculations predict that the maximum electronic kinetic energy is linearly dependent on d over a wide range. Moreover, the harmonic yield in the second plateau is significantly enhanced by increases in the wavelength of the driving laser. Owing to the confined spreading of the electronic wave packet, a beneficial wavelength scaling of $\lambda^{2.85}$ is obtained. This study therefore establishes a novel and efficient way of producing high-energy light sources based on layered nanostructures.

preprint, 2020, arXiv

Multi-IF: An Approach to Anomaly Detection in Self-Driving Systems

Autonomous driving vehicles (ADVs) are implemented with rich software functions and equipped with many sensors, which in turn brings a broad attack surface. Moreover, the execution environment of ADVs is often open and complex. Hence, ADVs are always at risk of safety and security threats. This paper proposes a fast method called Multi-IF, which uses multiple invocation features of system calls to detect anomalies in self-driving systems. Since self-driving functions take most of the computation resources and are upgraded frequently, Multi-IF is designed to work under such resource constraints and support frequent updates. Given the collected sequences of system calls, a combination of different syntax patterns is used to analyze those sequences and construct their feature vectors. Taking the feature vectors as inputs, a one-class support vector machine, trained with the feature vectors from normal sequences, is adopted to determine whether the current sequence of system calls is abnormal. The evaluations on both simulated and real data show that the proposed method is effective in identifying abnormal behavior after minutes of feature extraction and training. Further comparisons with existing methods on the ADFA-LD data set also validate that the proposed approach achieves a higher accuracy with less time overhead.

preprint, 2020, arXiv

Reinforcement Learning in Healthcare: A Survey

As a subfield of machine learning, reinforcement learning (RL) aims at empowering one's capabilities in behavioural decision making by using interaction experience with the world and evaluative feedback. Unlike traditional supervised learning methods that usually rely on one-shot, exhaustive and supervised reward signals, RL tackles sequential decision making problems with sampled, evaluative and delayed feedback simultaneously. Such distinctive features make RL a suitable candidate for developing powerful solutions in a variety of healthcare domains, where diagnosing decisions or treatment regimes are usually characterized by a prolonged and sequential procedure. This survey discusses the broad applications of RL techniques in healthcare domains, in order to provide the research community with a systematic understanding of theoretical foundations, enabling methods and techniques, existing challenges, and new insights of this emerging paradigm. After briefly examining theoretical foundations and key techniques in RL research from efficient and representational directions, we provide an overview of RL applications in healthcare domains ranging from dynamic treatment regimes in chronic diseases and critical care, automated medical diagnosis from both unstructured and structured clinical data, as well as many other control or scheduling domains that have infiltrated many aspects of a healthcare system. Finally, we summarize the challenges and open issues in current research, and point out some potential solutions and directions for future research.

preprint, 2020, arXiv

Super-resolution single-photon imaging at 8.2 kilometers

Single-photon light detection and ranging (LiDAR), offering single-photon sensitivity and picosecond time resolution, has been widely adopted for active imaging applications. Long-range active imaging is a great challenge, because the spatial resolution degrades significantly with the imaging range due to the diffraction limit of the optics, and only weak echo signal photons return, mixed with strong background noise. Here we propose and demonstrate a photon-efficient LiDAR approach that can achieve sub-Rayleigh resolution imaging over long ranges. This approach exploits fine sub-pixel scanning and a deconvolution algorithm tailored to this long-range application. Using this approach, we experimentally demonstrated active three-dimensional (3D) single-photon imaging by recognizing different postures of a mannequin model at a stand-off distance of 8.2 km in both daylight and night. The observed spatial (transversal) resolution is about 5.5 cm at 8.2 km, which is about twice the system's resolution. This also beats the optical system's Rayleigh criterion. The results are valuable for geosciences and target recognition over long ranges.

preprint, 2019, arXiv

Single-photon computational 3D imaging at 45 km

Long-range active imaging has a variety of applications in remote sensing and target recognition. Single-photon LiDAR (light detection and ranging) offers single-photon sensitivity and picosecond timing resolution, which is desirable for high-precision three-dimensional (3D) imaging over long distances. Despite important progress, further extending the imaging range presents enormous challenges because only weak echo photons return and are mixed with strong noise. Herein, we tackled these challenges by constructing a high-efficiency, low-noise confocal single-photon LiDAR system, and developing a long-range-tailored computational algorithm that provides high photon efficiency and super-resolution in the transverse domain. Using this technique, we experimentally demonstrated active single-photon 3D-imaging at a distance of up to 45 km in an urban environment, with a low return-signal level of $\sim$1 photon per pixel. Our system is feasible for imaging at a few hundred kilometers by refining the setup, and thus represents a significant milestone towards rapid, low-power, and high-resolution LiDAR over extra-long ranges.