Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
12 works
0 followers
17 topics
4 close collaborators

Published work

12 published items

preprint · 2026 · arXiv

Cavity Multimodes as an Array for High-Frequency Gravitational Waves

Microwave cavities operated in the presence of a background magnetic field provide a promising avenue for detecting high-frequency gravitational waves (HFGWs). We demonstrate for the first time that the distinct antenna patterns of multiple electromagnetic modes within a single cavity enable localization and reconstruction of key properties of an incoming HFGW signal, including its polarization ratio and frequency drift rate. Using a 9-cell cavity commonly employed in particle accelerators as a representative example, we analyze the time-domain response of 18 nearly degenerate modes, which can be sequentially excited by a frequency-drifting signal. The sensitivity is further enhanced by the number of available modes, in close analogy to the scaling achieved by a network of independent detectors, enabling sensitivity to astrophysically plausible binary sources.
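The comparison with a network of independent detectors rests on a generic rule: if each channel (a mode or a detector) has independent Gaussian noise and the channels are combined optimally, their squared signal-to-noise ratios add. The sketch below assumes exactly that and is not the paper's 9-cell cavity analysis; `network_snr` and the equal per-mode SNR are hypothetical.

```python
import math

def network_snr(per_channel_snr):
    """Optimal combined SNR for channels with independent Gaussian noise.

    Assumption: noise is uncorrelated between channels, so the squared
    SNRs add (a quadrature sum).
    """
    return math.sqrt(sum(s * s for s in per_channel_snr))

# 18 modes with a hypothetical equal per-mode SNR of 1.0:
print(network_snr([1.0] * 18))  # sqrt(18) ≈ 4.243
```

With N equal channels the combined SNR grows as the square root of N, which is the scaling a network of N identical independent detectors would give.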

preprint · 2026 · arXiv

Look Before You Leap: Autonomous Exploration for LLM Agents

Large-language-model-based agents often fail in unfamiliar environments due to premature exploitation: a tendency to act on prior knowledge before acquiring sufficient environment-specific information. We identify autonomous exploration as a critical yet underexplored capability for building adaptive agents. To formalize and quantify this capability, we introduce Exploration Checkpoint Coverage, a verifiable metric that measures how broadly an agent discovers key states, objects, and affordances. Our systematic evaluation reveals that agents trained with standard task-oriented reinforcement learning consistently exhibit narrow and repetitive behaviors that impede downstream performance. To address this limitation, we develop a training strategy that interleaves task-execution rollouts and exploration rollouts, each optimized by its corresponding verifiable reward. Building on this training strategy, we propose the Explore-then-Act paradigm, which decouples information gathering from task execution: agents first spend an interaction budget acquiring grounded environmental knowledge, then leverage it for task resolution. Our results demonstrate that learning to explore systematically is essential for building generalizable, real-world-ready agents.
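The abstract does not give a formula for Exploration Checkpoint Coverage; one natural reading is the fraction of a predefined set of key states, objects, and affordances that an agent's trajectory reaches at least once. The sketch below implements only that reading; the checkpoint names and `checkpoint_coverage` are hypothetical.

```python
def checkpoint_coverage(trajectory, checkpoints):
    """Fraction of checkpoints observed at least once in a trajectory.

    Assumption: each trajectory step is a set of observed items
    (states, objects, affordances) and `checkpoints` is a non-empty set.
    """
    seen = set()
    for observed in trajectory:
        seen |= observed
    return len(seen & checkpoints) / len(checkpoints)

checkpoints = {"kitchen", "fridge", "open_fridge", "bedroom"}
trajectory = [{"kitchen"}, {"kitchen", "fridge"}, {"fridge", "open_fridge"}]
print(checkpoint_coverage(trajectory, checkpoints))  # 0.75
```

Because the metric counts distinct checkpoints, a narrow, repetitive trajectory that revisits the same states scores no higher than a single visit.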

preprint · 2026 · arXiv

MAP-Law: Coverage-Driven Retrieval Control for Multi-Turn Legal Consultation

Legal consultation is a high-stakes, knowledge-intensive task that requires agents to identify relevant legal issues, retrieve authoritative support, and determine when evidence is sufficient for a recommendation. Although retrieval-augmented generation has improved grounding in legal question answering, many multi-turn legal agents still rely on fixed retrieval depth or coarse heuristic control. This often leads to either insufficient support for key legal elements or excessive retrieval that increases context burden and weakens answer focus. We propose MAP-Law, a coverage-driven framework for retrieval control in multi-turn legal consultation. MAP-Law models consultation as a controlled retrieval process over a joint structured state consisting of issue nodes, legal element nodes, and evidence nodes. After each retrieval round, the agent computes Element Coverage, Evidence Coverage, and Marginal Gain, and uses these signals to decide whether to continue retrieval, redirect the search, or generate the final response. In this way, MAP-Law turns stopping from a fixed hyperparameter into an interpretable and auditable decision aligned with legal argumentative structure. Experiments on a self-constructed dataset of 50 cases across eight labor-law scenarios show that MAP-Law with DeepSeek as the action selector achieves an Element Coverage of 0.860 using only 2.9 retrieval rounds and 5.8 evidence pieces on average. Compared with a fixed seven-round baseline, it reduces evidence volume by over 80% and retrieval rounds by 58%. Ablation results further confirm the independent contributions of coverage-driven stopping, joint graph representation, and LLM-based action selection.
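The per-round control loop can be pictured as a decision rule over the coverage signals. In the paper an LLM (for example DeepSeek) acts as the action selector; this sketch substitutes fixed thresholds purely to show the shape of the continue / redirect / answer decision. The thresholds, signal definitions, and `decide` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RoundSignals:
    element_coverage: float   # share of required legal elements with support
    evidence_coverage: float  # share of those elements backed by authoritative evidence
    marginal_gain: float      # coverage gained in the latest retrieval round

def decide(s: RoundSignals, target: float = 0.85, min_gain: float = 0.05) -> str:
    """Hypothetical threshold rule standing in for the LLM action selector."""
    if s.element_coverage >= target and s.evidence_coverage >= target:
        return "answer"    # support is sufficient: generate the final response
    if s.marginal_gain < min_gain:
        return "redirect"  # retrieval has stalled: change the search direction
    return "continue"      # coverage is still growing: retrieve another round

print(decide(RoundSignals(0.90, 0.88, 0.10)))  # answer
print(decide(RoundSignals(0.40, 0.30, 0.01)))  # redirect
print(decide(RoundSignals(0.40, 0.30, 0.20)))  # continue
```

As a consistency check on the reported figures, 2.9 rounds against a fixed 7 is (7 - 2.9) / 7 ≈ 58.6% fewer rounds, in line with the stated 58%.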

preprint · 2026 · arXiv

MAP: A Map-then-Act Paradigm for Long-Horizon Interactive Agent Reasoning

Current interactive LLM agents rely on goal-conditioned stepwise planning, in which environmental understanding is acquired reactively during execution rather than established beforehand. This temporal inversion leads to Delayed Environmental Perception: agents must infer environmental constraints through trial and error, resulting in an Epistemic Bottleneck that traps them in inefficient failure cycles. Inspired by human affordance perception and cognitive map theory, we propose the Map-then-Act Paradigm (MAP), a plug-and-play framework that moves environment understanding ahead of execution. MAP consists of three stages: (1) Global Exploration, which acquires environment-general priors; (2) Task-Specific Mapping, which constructs a structured cognitive map; and (3) Knowledge-Augmented Execution, which solves tasks grounded in the map. Experiments show consistent gains across benchmarks and LLMs. On ARC-AGI-3, MAP enables frontier models to surpass near-zero baseline performance in 22 of 25 game environments. We further introduce MAP-2K, a dataset of map-then-act trajectories, and show that training on it outperforms training on expert execution traces, suggesting that understanding environments is more fundamental than imitation.

preprint · 2026 · arXiv

VeriContest: A Competitive-Programming Benchmark for Verifiable Code Generation

Large language models can generate useful code from natural language, but their outputs come without correctness guarantees. Verifiable code generation offers a path beyond testing by requiring models to produce not only executable code, but also formal specifications and machine-checkable proofs. Progress in this direction, however, is difficult to measure: existing benchmarks are often small, focus on only one part of the pipeline, lack ground-truth proofs or rigorous specification validation, or target verification settings far from mainstream software development. We present VeriContest, a benchmark of 946 competitive-programming problems from LeetCode and Codeforces for verifiable code generation in Rust with Verus. Each problem pairs a natural language description with expert-validated formal specifications, judge-accepted Rust code, Verus-checked proofs, and positive and negative test suites. VeriContest is constructed through a three-phase pipeline that scales from manually verified seed problems to semi-automated expansion with human-in-the-loop review. To further strengthen benchmark quality, we use testing as an additional quality-assurance layer for validating postcondition completeness. VeriContest supports isolated and compositional evaluation of specification generation, code generation, proof generation, and end-to-end verified program synthesis. Evaluating ten state-of-the-art models reveals a sharp gap between coding ability and verifiable code generation: the strongest model reaches 92.18% on natural-language-to-code generation, but only 48.31% on specification generation, 13.95% on proof generation, and 5.29% end-to-end. These results identify proof and specification generation as the central bottlenecks for models and establish VeriContest as a rigorous platform for measuring and training future systems that generate code with machine-checkable correctness.
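The use of testing to validate postcondition completeness can be illustrated outside Verus. A postcondition is a predicate over (input, output): it should accept every known-correct pair from the positive tests and reject every wrong pair from the negative tests. One that accepts a negative pair is too weak. This is a Python sketch, not Verus; the sorting example and `check_postcondition` are hypothetical.

```python
from collections import Counter

def check_postcondition(post, positives, negatives):
    """Return (accepts_all_positive, rejects_all_negative) for a predicate post(x, y)."""
    accepts_all_positive = all(post(x, y) for x, y in positives)
    rejects_all_negative = not any(post(x, y) for x, y in negatives)
    return accepts_all_positive, rejects_all_negative

def is_sorted(ys):
    return all(a <= b for a, b in zip(ys, ys[1:]))

# Too weak: only demands that the output is sorted.
weak = lambda xs, ys: is_sorted(ys)
# Stronger: sorted AND a permutation of the input.
strong = lambda xs, ys: is_sorted(ys) and Counter(xs) == Counter(ys)

positives = [([3, 1, 2], [1, 2, 3]), ([], [])]
negatives = [([3, 1, 2], [1, 2]), ([3, 1, 2], [0, 0, 0])]

print(check_postcondition(weak, positives, negatives))    # (True, False)
print(check_postcondition(strong, positives, negatives))  # (True, True)
```

The negative tests are what expose the weak specification: a sorted but wrong output passes it, so a verified program against that postcondition could still be incorrect.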

preprint · 2025 · arXiv

AgentTutor: Empowering Personalized Learning with Multi-Turn Interactive Teaching in Intelligent Education Systems

The rapid advancement of large language models (LLMs) has shown their potential to transform intelligent education systems (IESs) through automated teaching and learning-support applications. However, current IESs often rely on single-turn, static question answering, which cannot assess learners' cognitive levels, cannot adjust teaching strategies based on real-time feedback, and is limited to simple one-off responses. To address these issues, we introduce AgentTutor, a multi-turn interactive intelligent education system for personalized learning. It combines an LLM-powered generative multi-agent system with a learner-specific personalized learning profile environment that dynamically optimizes and delivers teaching strategies based on learners' learning status, personalized goals, learning preferences, and multimodal study materials. It includes five key modules: curriculum decomposition, learner assessment, dynamic strategy, teaching reflection, and knowledge & experience memory. Extensive experiments on multiple benchmark datasets show that AgentTutor significantly improves learners' performance, is effective in multi-turn interactions, and is competitive with other baselines in teaching quality.

preprint · 2023 · arXiv

Teacher Forcing Recovers Reward Functions for Text Generation

Reinforcement learning (RL) has been widely used in text generation to alleviate the exposure bias issue or to utilize non-parallel datasets. The reward function plays an important role in making RL training successful. However, previous reward functions are typically task-specific and sparse, restricting the use of RL. In our work, we propose a task-agnostic approach that derives a step-wise reward function directly from a model trained with teacher forcing. We additionally propose a simple modification to stabilize the RL training on non-parallel datasets with our induced reward function. Empirical results show that our method outperforms self-training and reward regression methods on several text generation tasks, confirming the effectiveness of our reward function.
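To make "step-wise reward" concrete: one simple way to get a dense reward from a teacher-forced language model is to score each generated token by the log-probability that model assigns to it. This is a generic illustration under that assumption, not the derivation used in the paper; the toy bigram table and `stepwise_rewards` are hypothetical.

```python
import math

def stepwise_rewards(next_token_probs, tokens, bos="<s>"):
    """Per-step reward r_t = log p(y_t | y_{t-1}) under a toy bigram model.

    Assumption: the bigram table stands in for a teacher-forced model;
    unseen transitions get a tiny floor probability to avoid log(0).
    """
    rewards = []
    prev = bos
    for tok in tokens:
        p = next_token_probs.get(prev, {}).get(tok, 1e-9)
        rewards.append(math.log(p))
        prev = tok
    return rewards

model = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "cat": {"sat": 0.9, "ran": 0.1},
}
r = stepwise_rewards(model, ["the", "cat", "sat"])
print([round(x, 3) for x in r])  # [-0.511, -0.693, -0.105]
```

Unlike a single sequence-level score, a reward at every token tells the RL learner where in the output quality was gained or lost.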

preprint · 2022 · arXiv

Attack detection based on machine learning algorithms for different variants of Spectre attacks and different Meltdown attack implementations

To improve overall processor performance, computer architects use various optimization techniques in modern processors, such as speculative execution, branch prediction, and out-of-order execution. These techniques are critical for improving instruction execution speed, both now and in the future. In recent years, however, researchers have discovered that they introduce hidden, inherent security flaws, exploited by attacks such as Meltdown and Spectre. These attacks combine out-of-order or speculative execution with cache-based side channels to leak protected data. The impact of these vulnerabilities is enormous because they are prevalent in existing processors and may persist in future ones. To date, Meltdown and Spectre have not been effectively addressed; instead, multiple attack variants and implementations have evolved from them. This paper selects four hardware performance events through feature selection and uses machine learning algorithms to build a real-time detection mechanism for Spectre v1, v2, and v4 and for different implementations of Meltdown, achieving an accuracy of over 99%. To verify the practicality of the detection model, it is tested with a variety of benign programs and with Spectre implementations that differ from those used during modeling; the accuracy again exceeds 99%, showing that the approach can cope with different attack variants, and with different implementations of the same attack, that may occur in practice.

preprint · 2022 · arXiv

Shifts in BCFW method for QED

We study the application of BCFW recursion relations to the QED processes $0\to e^- e^+ n\gamma$. Based on 6-point amplitudes (both MHVA and NMHVA) computed from Feynman diagrams in the Berends-Giele gauge, we conduct a comprehensive study of all the different shifts. We then propose a new shift (the LLYZ shift), which yields the full amplitudes for these processes and offers some practical computational advantages. We compare the number of terms and the independent amplitudes of this new shift with those of a few typical shifts.
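For orientation, here is the standard BCFW construction that the different shifts vary, written in one common convention for massless spinor-helicity variables $p_k = \lambda_k \tilde\lambda_k$. This is textbook background, not the LLYZ shift proposed in the paper. The $[i,j\rangle$ shift deforms two spinors by a complex parameter $z$, keeping both momenta on shell and total momentum conserved; if the shifted amplitude vanishes as $z \to \infty$, Cauchy's theorem reduces it to a sum over factorization channels $I$:

```latex
\hat\lambda_i = \lambda_i + z\,\lambda_j, \qquad
\hat{\tilde\lambda}_j = \tilde\lambda_j - z\,\tilde\lambda_i, \qquad
\hat p_i + \hat p_j = p_i + p_j,
\\[4pt]
A(0) = \sum_{I} \sum_{h=\pm} A_L^{h}(z_I)\,\frac{1}{P_I^2}\,A_R^{-h}(z_I),
\qquad \hat P_I^2(z_I) = 0 .
```

Which pair $(i,j)$ is shifted controls the large-$z$ behavior and the number of channels, which is why the choice of shift matters for both validity and the number of terms.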

preprint · 2022 · arXiv

Stringent axion constraints with Event Horizon Telescope polarimetric measurements of M87$^\star$

The unprecedented angular resolution of the Event Horizon Telescope (EHT) has created exciting opportunities in the search for new physics. Recently, the linear polarization of radiation emitted near the supermassive black hole M87$^\star$ was measured on four separate days, making it possible to test for the existence of a dense axion cloud produced by a spinning black hole. The presence of an axion cloud leads to a frequency-independent oscillation in the electric vector position angle (EVPA) of this linear polarization. For a nearly face-on M87$^\star$, this oscillation in the EVPA appears as a propagating wave along the photon ring. In this paper, we leverage the azimuthal distribution of EVPA measured by the EHT to study the axion-photon coupling. We propose a novel differential analysis procedure to reduce the astrophysical background, and derive stringent constraints on the existence of axions in the previously unexplored mass window $\sim 10^{-21}$ to $10^{-20}$ eV.
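A quick worked number shows why multi-day polarimetry probes this mass window: an axion field of mass $m_a$ oscillates with period $T = h/(m_a c^2)$, and the EVPA oscillation it induces shares that period. The helper below only evaluates that formula; it is illustrative and not part of the paper's analysis.

```python
# Planck constant in eV*s, from the exact SI values of h and the electronvolt.
H_EV_S = 6.62607015e-34 / 1.602176634e-19
SECONDS_PER_DAY = 86400.0

def axion_period_days(mass_ev):
    """Oscillation period T = h / (m c^2) of an axion field of mass m (in eV), in days."""
    return H_EV_S / mass_ev / SECONDS_PER_DAY

for m in (1e-21, 1e-20):
    print(f"m = {m:.0e} eV -> T = {axion_period_days(m):.1f} days")
# m = 1e-21 eV -> T = 47.9 days
# m = 1e-20 eV -> T = 4.8 days
```

The window therefore corresponds to periods from roughly 5 days to roughly 48 days, a timescale that observations spread over several days can sample.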

preprint · 2020 · arXiv

Data-driven surrogate modelling and benchmarking for process equipment

In chemical process engineering, surrogate models of complex systems are often needed for tasks such as domain exploration, sensitivity analysis of design parameters, and optimization. A suite of computational fluid dynamics (CFD) simulations geared toward modeling chemical process equipment has been developed and validated against experimental results from the literature. Various regression-based active learning strategies are explored with these CFD simulators in the loop, under the constraint of a limited function-evaluation budget. Specifically, five sampling strategies and five regression techniques are compared on a set of four test cases of industrial significance and varying complexity. Gaussian process regression performed consistently well in these applications. This quantitative study outlines the pros and cons of the available techniques and highlights best practices for their adoption. The test cases and tools are available under an open-source license to ensure reproducibility and to encourage the wider research community to contribute to the CFD models and to develop and benchmark new, improved algorithms tailored to this field.
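Gaussian process regression fits budget-limited active learning well because its posterior variance indicates where the surrogate is least certain. Below is a minimal sketch of one such strategy, maximum-variance sampling, assuming an RBF kernel with fixed hyperparameters and a cheap stand-in function in place of a CFD run. The kernel settings, candidate grid, and function names are hypothetical, and this need not match any of the five strategies compared in the paper.

```python
import numpy as np

def rbf(a, b, length=0.2, var=1.0):
    """Squared-exponential kernel between 1-D point arrays a and b."""
    d = a[:, None] - b[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """GP posterior mean and variance at x_test (zero prior mean)."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_train, x_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = Ks.T @ alpha
    v = np.linalg.solve(L, Ks)
    var = np.diag(rbf(x_test, x_test)) - np.sum(v**2, axis=0)
    return mean, var

def next_query(x_train, y_train, candidates):
    """Pick the candidate where the posterior variance is largest."""
    _, var = gp_posterior(x_train, y_train, candidates)
    return candidates[np.argmax(var)]

def simulator(x):
    """Hypothetical stand-in for an expensive CFD evaluation."""
    return np.sin(6 * x)

candidates = np.linspace(0.0, 1.0, 11)
x_train = np.array([0.0, 1.0])
y_train = simulator(x_train)
for _ in range(3):  # budget of three further simulator calls
    x_new = next_query(x_train, y_train, candidates)
    x_train = np.append(x_train, x_new)
    y_train = np.append(y_train, simulator(x_new))
print(x_train)
```

Starting from the two endpoints, the first query lands at the midpoint 0.5, where the posterior is least constrained; later queries fill the remaining gaps.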

preprint · 2020 · arXiv

Finite-Blocklength and Error-Exponent Analyses for LDPC Codes in Point-to-Point and Multiple Access Communication

This paper applies error-exponent and dispersion-style analyses to derive finite-blocklength achievability bounds for low-density parity-check (LDPC) codes over the point-to-point channel (PPC) and multiple access channel (MAC). The error-exponent analysis applies Gallager's error exponent to bound achievable symmetrical and asymmetrical rates in the MAC. The dispersion-style analysis begins with a generalization of the random coding union (RCU) bound from random code ensembles with i.i.d. codewords to random code ensembles in which codewords may be statistically dependent; this generalization is useful since the codewords of random linear codes such as random LDPC codes are dependent. Application of the RCU bound yields improved finite-blocklength error bounds and asymptotic achievability results for i.i.d. random codes and new finite-blocklength error bounds and achievability results for LDPC codes. For discrete, memoryless channels, these results show that LDPC codes achieve first- and second-order performance that is optimal for the PPC and identical to the best-prior results for the MAC.
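The classical RCU bound for i.i.d. random codes, the starting point that the paper generalizes to dependent codewords, is easy to evaluate on a binary symmetric channel. With crossover probability δ < 1/2, blocklength n, and M uniformly random codewords, it reads ε ≤ Σ_t C(n,t) δ^t (1−δ)^(n−t) · min{1, (M−1) Σ_{k≤t} C(n,k) 2^(−n)}. The sketch evaluates only this i.i.d. formula, not the paper's LDPC bounds; the parameter values are illustrative.

```python
from math import comb

def rcu_bsc(n, M, delta):
    """RCU upper bound on block error probability for M i.i.d. uniform
    codewords of length n over a BSC(delta), with delta < 1/2."""
    total = 0.0
    cum = 0  # running sum of C(n, k) for k <= t
    for t in range(n + 1):
        cum += comb(n, t)
        # probability that a random competing codeword lies within distance t of the output
        p_confuse = cum / 2**n
        # probability that the channel flips exactly t bits
        p_t = comb(n, t) * delta**t * (1 - delta) ** (n - t)
        total += p_t * min(1.0, (M - 1) * p_confuse)
    return total

n, delta = 100, 0.05
for k in (30, 40, 50):  # rate k/n with M = 2**k codewords
    print(k / n, rcu_bsc(n, 2**k, delta))
```

The bound rises with the rate k/n, and the `min{1, ·}` term keeps each channel outcome's contribution from exceeding its own probability.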