Source author record

Yang Liu

Yang Liu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Catalog footprint

What is connected: 37 works, 33 topics, 4 close collaborators

Preprint, 2026, arXiv

A deep learning approach for predicting multiple observables in Au+Au collisions at RHIC

We develop a neural network model, based on the processes of high-energy heavy-ion collisions, to study and predict several experimental observables in Au+Au collisions. We present a data-driven deep learning framework for predicting multiple bulk observables in Au+Au collisions at RHIC energies. A single neural network is trained exclusively on experimental measurements of charged-particle pseudorapidity density distributions, transverse-momentum spectra and elliptic flow coefficients over a broad range of collision energies and centralities. The network architecture is inspired by the stages of a heavy-ion collision, from the quark-gluon plasma to chemical and kinetic freeze-out, and employs locally connected hidden layers and a structured input design that encodes basic geometric and kinematic features of the system. We demonstrate that these physics-motivated choices significantly improve test performance compared to purely fully connected baselines. The trained model is then used to predict the above observables at collision energies not yet explored experimentally at RHIC, and the results are validated using the energy dependence of the total charged-particle multiplicity per participant pair as well as comparisons to a CLVisc hydrodynamic calculation with TRENTo initial conditions. Our findings indicate that such physics-guided neural networks can serve as efficient surrogates to fill critical data gaps at RHIC and to support further phenomenological studies of QGP properties.

Preprint, 2026, arXiv

A MINRES-based Linesearch Algorithm for Nonconvex Optimization with Non-positive Curvature Detection

We propose a MINRES-based Newton-type algorithm for solving unconstrained nonconvex optimization problems. Our approach uses the minimal residual method (MINRES), a well-known solver for indefinite symmetric linear systems, to compute descent directions that leverage second-order and non-positive curvature (NPC) information. Comprehensive asymptotic convergence properties are derived under standard assumptions. In particular, under the Kurdyka-Łojasiewicz inequality and a mild NPC-detectability condition, we prove that our algorithm can avoid strict saddle points and converge to second-order critical points. This is primarily achieved by integrating proper regularization techniques and forward linesearch mechanisms along NPC directions. Furthermore, fast local superlinear convergence to potentially non-isolated minima is established, when the local Polyak-Łojasiewicz condition is satisfied. Numerical experiments on the CUTEst test collection and on a deep auto-encoder problem illustrate the efficiency of the proposed method.

Preprint, 2026, arXiv

A Parity-Consistent Decomposition Method for the Weight Distribution of Pre-Transformed Polar Codes

This paper introduces an efficient algorithm based on the Parity-Consistent Decomposition (PCD) method to determine the weight distribution (WD) of pre-transformed polar codes. First, to address the bit dependencies introduced by the pre-transformation matrix, we propose an iterative algorithm to construct an Expanded Information Set. By expanding the information bits within this set into 0s and 1s, we eliminate the correlations among information bits, thereby enabling the recursive calculation of the Hamming weight distribution using the PCD method. Second, to further reduce computational complexity, we establish the theory of equivalence classes for pre-transformed polar codes. Codes within the same equivalence class share an identical weight distribution but correspond to different Expanded Information Set sizes. By selecting the pre-transformation matrix that minimizes the Expanded Information Set size within an equivalence class, we optimize the computation process. Numerical results demonstrate that the proposed method significantly reduces computational complexity compared to existing deterministic algorithms.
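To make the quantity being computed concrete, the sketch below enumerates the weight distribution of a tiny pre-transformed polar code by brute force. This is not the PCD method from the abstract; it is the exponential-cost baseline that such algorithms aim to avoid. The function names, the convention `u = v T` followed by `x = u G_N`, and the example information set are assumptions made for illustration.

```python
from itertools import product

def polar_row(i, n):
    """Row i of the n-fold Kronecker power of F = [[1, 0], [1, 1]] over GF(2)."""
    # Entry (i, j) is 1 exactly when the set bits of j are a subset of those of i.
    return [1 if (i & j) == j else 0 for j in range(1 << n)]

def weight_distribution(n, info_set, T=None):
    """Count codewords by Hamming weight through exhaustive enumeration.

    T is an optional N x N binary pre-transformation matrix (assumed convention:
    u = v T over GF(2)); T=None means no pre-transformation.
    """
    N = 1 << n
    rows = [polar_row(i, n) for i in range(N)]
    info = sorted(info_set)
    counts = {}
    for bits in product((0, 1), repeat=len(info)):
        v = [0] * N                      # frozen positions stay at 0
        for pos, b in zip(info, bits):
            v[pos] = b
        if T is None:
            u = v
        else:
            u = [sum(v[k] & T[k][j] for k in range(N)) % 2 for j in range(N)]
        # Polar transform x = u G_N over GF(2).
        x = [sum(u[i] & rows[i][j] for i in range(N)) % 2 for j in range(N)]
        w = sum(x)
        counts[w] = counts.get(w, 0) + 1
    return dict(sorted(counts.items()))

# Length-8 code with information set {3, 5, 6, 7} and no pre-transformation.
print(weight_distribution(3, {3, 5, 6, 7}))  # {0: 1, 4: 14, 8: 1}
```

The loop visits all 2^K information vectors, so it is only usable for very short codes; that cost is what motivates deterministic methods with lower complexity.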

Preprint, 2026, arXiv

AgentOrchestra: Orchestrating Multi-Agent Intelligence with the Tool-Environment-Agent (TEA) Protocol

Recent advances in LLM-based agent systems have shown promise in tackling complex, long-horizon tasks. However, existing LLM-based agent protocols (e.g., A2A and MCP) under-specify cross-entity lifecycle and context management, version tracking, and ad-hoc environment integration, which in turn encourages fixed, monolithic agent compositions and brittle glue code. To address these limitations, we introduce the Tool-Environment-Agent (TEA) protocol, a unified abstraction that models environments, agents, and tools as first-class resources with explicit lifecycles and versioned interfaces. TEA provides a principled foundation for end-to-end lifecycle and version management, and for associating each run with its context and outputs across components, improving traceability and reproducibility. Moreover, TEA enables continual self-evolution of agent-associated components through a closed feedback loop, producing improved versions while supporting version selection and rollback. Building on TEA, we present AgentOrchestra, a hierarchical multi-agent framework in which a central planner orchestrates specialized sub-agents for web navigation, data analysis, and file operations, and supports continual adaptation by dynamically instantiating, retrieving, and refining tools online during execution. We evaluate AgentOrchestra on three challenging benchmarks, where it consistently outperforms strong baselines and achieves 89.04% on GAIA, establishing state-of-the-art performance to the best of our knowledge. Overall, our results provide evidence that TEA and hierarchical orchestration improve scalability and generality in multi-agent systems.

Preprint, 2026, arXiv

AliCPT Sensitivity to Cosmic Reheating

We present the first assessment of the Ali Cosmic Microwave Background Polarization Telescope's (AliCPT) sensitivity to the reheating epoch after cosmic inflation, based on its ability to detect primordial gravitational waves. We consider three models of inflation, an $α$-attractor T-model, RGI inflation and QCD-driven warm inflation. Assuming a fiducial value of $r=0.01$, we find that AliCPT-1, in its fully loaded focal plane detector configuration and combined with Planck, can provide measurements of the order of magnitude of the reheating temperature with an accuracy around $10\%$. For QCD-driven warm inflation this can be translated into a constraint on the inflaton coupling to gluons, which can be probed independently in axion search experiments. Our results constitute the first demonstration of AliCPT's ability to probe the initial temperature of the hot big bang and the microphysical parameter connecting cosmic inflation and particle physics.

Preprint, 2026, arXiv

BabyVision: Visual Reasoning Beyond Language

While humans develop core visual skills long before acquiring language, contemporary Multimodal LLMs (MLLMs) still rely heavily on linguistic priors to compensate for their fragile visual understanding. We uncovered a crucial fact: state-of-the-art MLLMs consistently fail on basic visual tasks that humans, even 3-year-olds, can solve effortlessly. To systematically investigate this gap, we introduce BabyVision, a benchmark designed to assess core visual abilities independent of linguistic knowledge for MLLMs. BabyVision spans a wide range of tasks, with 388 items divided into 22 subclasses across four key categories. Empirical results and human evaluation reveal that leading MLLMs perform significantly below human baselines. Gemini3-Pro-Preview scores 49.7, lagging behind 6-year-old humans and falling well behind the average adult score of 94.1. These results show that, despite excelling in knowledge-heavy evaluations, current MLLMs still lack fundamental visual primitives. Progress in BabyVision represents a step toward human-level visual perception and reasoning capabilities. We also explore solving visual reasoning with generation models by proposing BabyVision-Gen and an automatic evaluation toolkit. Our code and benchmark data are released at https://github.com/UniPat-AI/BabyVision for reproduction.

Preprint, 2026, arXiv

Browse and Concentrate: Comprehending Multimodal Content via prior-LLM Context Fusion

With the rise of Large Language Models (LLMs), Multimodal Large Language Models (MLLMs) that combine LLMs with pre-trained vision models have recently demonstrated impressive performance across diverse vision-language tasks. However, they fall short in comprehending context involving multiple images. A primary reason for this shortcoming is that the visual features of each image are encoded individually by frozen encoders before being fed into the LLM backbone, lacking awareness of other images and the multimodal instructions. We term this issue prior-LLM modality isolation and propose a two-phase paradigm, browse-and-concentrate, to enable in-depth multimodal context fusion prior to feeding the features into LLMs. This paradigm initially "browses" through the inputs for essential insights, and then revisits the inputs to "concentrate" on crucial details, guided by these insights, to achieve a more comprehensive understanding of the multimodal inputs. Additionally, we develop training strategies specifically to enhance the understanding of multi-image inputs. Our method markedly boosts performance on 7 multi-image scenarios, increasing average accuracy by 2.13% and 7.60% against strong MLLM baselines with 3B and 11B LLMs, respectively.

Preprint, 2026, arXiv

Controllable Financial Market Generation with Diffusion Guided Meta Agent

Generative modeling has transformed many fields, such as language and visual modeling, while its application in financial markets remains under-explored. As the minimal unit within a financial market is an order, order-flow modeling represents a fundamental generative financial task. However, current approaches often yield unsatisfactory fidelity in generating order flow, and their generation lacks controllability, thereby limiting their practical applications. In this paper, we formulate the challenge of controllable financial market generation, and propose a Diffusion Guided Meta Agent (DigMA) model to address it. Specifically, we employ a conditional diffusion model to capture the dynamics of the market state represented by time-evolving distribution parameters of the mid-price return rate and the order arrival rate, and we define a meta agent with financial economic priors to generate orders from the corresponding distributions. Extensive experimental results show that DigMA achieves superior controllability and generation fidelity. Moreover, we validate its effectiveness as a generative environment for downstream high-frequency trading tasks and its computational efficiency.

Preprint, 2026, arXiv

FLOP-Efficient Training: Early Stopping Based on Test-Time Compute Awareness

Scaling training compute, measured in FLOPs, has long been shown to improve the accuracy of large language models, yet training remains resource-intensive. Prior work shows that increasing test-time compute (TTC), for example through iterative sampling, can allow smaller models to rival or surpass much larger ones at lower overall cost. We introduce TTC-aware training, where an intermediate checkpoint and a corresponding TTC configuration can together match or exceed the accuracy of a fully trained model while requiring substantially fewer training FLOPs. Building on this insight, we propose an early stopping algorithm that jointly selects a checkpoint and TTC configuration to minimize training compute without sacrificing accuracy. To make this practical, we develop an efficient TTC evaluation method that avoids exhaustive search, and we formalize a break-even bound that identifies when increased inference compute compensates for reduced training compute. Experiments demonstrate up to 92% reductions in training FLOPs while maintaining and sometimes remarkably improving accuracy. These results highlight a new perspective for balancing training and inference compute in model development, enabling faster deployment cycles and more frequent model refreshes. Code will be publicly released.
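The joint selection described in this abstract can be pictured with a small sketch. The code below is an illustration under stated assumptions, not the paper's algorithm: it searches a hand-written candidate list exhaustively (the abstract says the actual method avoids exhaustive search), and `break_even_queries` is one simple way to express a break-even point, not necessarily the bound formalized by the authors. All field names and numbers are made up.

```python
def select_checkpoint(candidates, target_accuracy):
    """Pick the (checkpoint, TTC config) pair that meets the target with the fewest training FLOPs."""
    feasible = [c for c in candidates if c["accuracy"] >= target_accuracy]
    if not feasible:
        return None  # no pair reaches the target yet, so training should continue
    # Ties on training FLOPs are broken in favour of cheaper inference.
    return min(feasible, key=lambda c: (c["train_flops"], c["infer_flops"]))

def break_even_queries(full_train, ckpt_train, base_infer, ttc_infer):
    """Number of served queries at which extra inference compute cancels the training savings."""
    extra_per_query = ttc_infer - base_infer
    if extra_per_query <= 0:
        return float("inf")  # TTC adds no per-query cost, so the savings never run out
    return (full_train - ckpt_train) / extra_per_query

# Toy candidates: every value here is hypothetical.
candidates = [
    {"step": 1000, "ttc": "1 sample", "train_flops": 2e20, "infer_flops": 1e12, "accuracy": 0.61},
    {"step": 1000, "ttc": "8 samples", "train_flops": 2e20, "infer_flops": 8e12, "accuracy": 0.72},
    {"step": 5000, "ttc": "1 sample", "train_flops": 1e21, "infer_flops": 1e12, "accuracy": 0.71},
]
best = select_checkpoint(candidates, target_accuracy=0.70)
print(best["step"], best["ttc"])
```

In this toy setup the early checkpoint with more sampling is preferred over the fully trained model, and the break-even count says how many queries can be served before the extra inference cost exceeds the training compute saved.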

Preprint, 2026, arXiv

From Failure to Mastery: Generating Hard Samples for Tool-use Agents

The advancement of LLM agents with tool-use capabilities requires diverse and complex training corpora. Existing data generation methods, which predominantly follow a paradigm of random sampling and shallow generation, often yield simple and homogeneous trajectories that fail to capture complex, implicit logical dependencies. To bridge this gap, we introduce HardGen, an automatic agentic pipeline designed to generate hard tool-use training samples with verifiable reasoning. Firstly, HardGen establishes a dynamic API Graph built upon agent failure cases, from which it samples to synthesize hard traces. Secondly, these traces serve as conditional priors to guide the instantiation of modular, abstract advanced tools, which are subsequently leveraged to formulate hard queries. Finally, the advanced tools and hard queries enable the generation of verifiable complex Chain-of-Thought (CoT), with a closed-loop evaluation feedback steering the continuous refinement of the process. Extensive evaluations demonstrate that a 4B parameter model trained with our curated dataset achieves superior performance compared to several leading open-source and closed-source competitors (e.g., GPT-5.2, Gemini-3-Pro and Claude-Opus-4.5). Our code, models, and dataset will be open-sourced to facilitate future research.

Preprint, 2026, arXiv

GCR: Geometry-Consistent Routing for Task-Agnostic Continual Anomaly Detection

Feature-based anomaly detection is widely adopted in industrial inspection due to the strong representational power of large pre-trained vision encoders. While most existing methods focus on improving within-category anomaly scoring, practical deployments increasingly require task-agnostic operation under continual category expansion, where the category identity is unknown at test time. In this setting, overall performance is often dominated by expert selection, namely routing an input to an appropriate normality model before any head-specific scoring is applied. However, routing rules that compare head-specific anomaly scores across independently constructed heads are unreliable in practice, as score distributions can differ substantially across categories in scale and tail behavior. We propose GCR, a lightweight mixture-of-experts framework for stabilizing task-agnostic continual anomaly detection through geometry-consistent routing. GCR routes each test image directly in a shared frozen patch-embedding space by minimizing an accumulated nearest-prototype distance to category-specific prototype banks, and then computes anomaly maps only within the routed expert using a standard prototype-based scoring rule. By separating cross-head decision making from within-head anomaly scoring, GCR avoids cross-head score comparability issues without requiring end-to-end representation learning. Experiments on MVTec AD and VisA show that geometry-consistent routing substantially improves routing stability and mitigates continual performance collapse, achieving near-zero forgetting while maintaining competitive detection and localization performance. These results indicate that many failures previously attributed to representation forgetting can instead be explained by decision-rule instability in cross-head routing. Code is available at https://github.com/jw-chae/GCR
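The routing rule in this abstract separates two decisions: which expert to use, and how anomalous each patch is within that expert. A minimal sketch of that separation follows, assuming Euclidean distance, a plain sum as the "accumulated" distance, and toy 2-D embeddings. The function names and data are hypothetical and are not taken from the GCR repository.

```python
import math

def nearest_prototype_distance(patch, bank):
    """Distance from one patch embedding to its closest prototype in a bank."""
    return min(math.dist(patch, proto) for proto in bank)

def route(patches, banks):
    """Choose the expert whose prototype bank gives the smallest accumulated distance."""
    costs = {
        name: sum(nearest_prototype_distance(p, bank) for p in patches)
        for name, bank in banks.items()
    }
    best = min(costs, key=costs.get)
    return best, costs

def anomaly_map(patches, bank):
    """Per-patch anomaly score computed only inside the routed expert."""
    return [nearest_prototype_distance(p, bank) for p in patches]

# Toy prototype banks for two categories (hypothetical values).
banks = {
    "bottle": [(0.0, 0.0), (1.0, 0.0)],
    "cable": [(5.0, 5.0), (6.0, 5.0)],
}
patches = [(0.1, 0.0), (0.9, 0.1), (3.0, 3.0)]  # the last patch is an outlier

expert, costs = route(patches, banks)
scores = anomaly_map(patches, banks[expert])
print(expert, scores)
```

Because routing compares distances in one shared embedding space, it never compares anomaly scores produced by different heads, which is the comparability problem the abstract describes.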

Preprint, 2026, arXiv

Hardwired-Neurons Language Processing Units as General-Purpose Cognitive Substrates

The rapid advancement of Large Language Models (LLMs) has established language as a core general-purpose cognitive substrate, driving the demand for specialized Language Processing Units (LPUs) tailored for LLM inference. To overcome the growing energy consumption of LLM inference systems, this paper proposes a Hardwired-Neurons Language Processing Unit (HNLPU), which physically hardwires LLM weight parameters into the computational fabric, improving computational efficiency by several orders of magnitude through extreme specialization. However, a significant challenge still lies in the scale of modern LLMs. A straightforward hardwiring of gpt-oss 120 B would require fabricating photomask sets valued at over 6 billion dollars, rendering this solution economically impractical. To address this challenge, we propose the novel Metal-Embedding methodology. Instead of embedding weights in a 2D grid of silicon device cells, Metal-Embedding embeds weight parameters into the 3D topology of metal wires. This brings two benefits: (1) a 15x increase in density, and (2) 60 out of 70 photomask layers being homogeneous across chips, including all EUV photomasks. In total, Metal-Embedding reduces the photomask cost by 112x, bringing the Non-Recurring Engineering (NRE) cost of HNLPU into an economically viable range. Experimental results show that HNLPU achieves 249,960 tokens/s (5,555x/85x that of GPU/WSE), 36 tokens/J (1,047x/283x that of GPU/WSE), a total die area of 13,232 mm², and an estimated NRE of 59.46 to 123.5 million dollars at 5 nm technology. Analysis shows that HNLPU achieves a 41.7-80.4x improvement in cost-effectiveness and a 357x reduction in carbon footprint compared to OpenAI-scale H100 clusters, under an annual weight updating assumption.

Preprint, 2026, arXiv

Image-Text Knowledge Modeling for Unsupervised Multi-Scenario Person Re-Identification

We propose unsupervised multi-scenario (UMS) person re-identification (ReID) as a new task that expands ReID across diverse scenarios (cross-resolution, clothing change, etc.) within a single coherent framework. To tackle UMS-ReID, we introduce image-text knowledge modeling (ITKM) -- a three-stage framework that effectively exploits the representational power of vision-language models. We start with a pre-trained CLIP model with an image encoder and a text encoder. In Stage I, we introduce a scenario embedding in the image encoder and fine-tune the encoder to adaptively leverage knowledge from multiple scenarios. In Stage II, we optimize a set of learned text embeddings to associate with pseudo-labels from Stage I and introduce a multi-scenario separation loss to increase the divergence between inter-scenario text representations. In Stage III, we first introduce cluster-level and instance-level heterogeneous matching modules to obtain reliable heterogeneous positive pairs (e.g., a visible image and an infrared image of the same person) within each scenario. Next, we propose a dynamic text representation update strategy to maintain consistency between text and image supervision signals. Experimental results across multiple scenarios demonstrate the superiority and generalizability of ITKM; it not only outperforms existing scenario-specific methods but also enhances overall performance by integrating knowledge from multiple scenarios.

Preprint, 2026, arXiv

Investigating the Anisotropy of Dispersion Measure Contribution from the Galactic Halo by Using Fast Radio Bursts

We propose a data-driven approach to reconstruct the all-sky distribution of the dispersion measure contribution from the Galactic halo ($\mathrm{DM_{halo}}$) through a spherical harmonic expansion, enabling an investigation of its possible anisotropies. Based on the NE2001 model and using 92 localized and 574 unlocalized non-repeating fast radio bursts (FRBs) at Galactic latitudes $|b|>15^\circ$, we find a significant dipole anisotropy in $\mathrm{DM_{halo}}$, pointing toward $(l=130^\circ,\, b=+5^\circ)$ with a $1σ$ uncertainty of approximately $28^\circ$. The $\mathrm{DM_{halo}}$ value in this direction is $63\pm9~\mathrm{pc~cm^{-3}}$, exceeding the all-sky mean by about $2.6σ$. This result is not significantly affected by the choice of Galactic ISM models. Furthermore, even when using a refined sample of 62 localized FRBs (excluding CHIME detections, repeaters, and unlocalized events), the dipole anisotropic structure persists, with a direction of $(l=141^\circ,\, b=+51^\circ)$ and a larger $1σ$ uncertainty of $\sim 44^\circ$. Model comparisons using the Akaike Information Criterion and Bayesian evidence yield consistent preferences, and together they suggest that current FRB data slightly favor the existence of a dipole structure in $\mathrm{DM_{halo}}$. If this feature is not a statistical fluctuation or systematic error, its physical origin requires further investigation. Future FRB samples with larger sizes and more complete sky coverage will be essential to confirm or refute this possible anisotropic structure.
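For readers unfamiliar with the expansion mentioned in this abstract, a generic form is written out below. The symbols $a_{\ell m}$, $\mathrm{DM}_0$, $D$, and $(l_d, b_d)$ are notation assumed here for illustration; the abstract does not state the authors' parameterization or truncation order.

```latex
% Spherical harmonic expansion of the halo contribution on the sky
\mathrm{DM_{halo}}(l, b) = \sum_{\ell=0}^{\ell_{\max}} \sum_{m=-\ell}^{\ell} a_{\ell m}\, Y_{\ell m}(l, b)

% Truncating at \ell_{\max} = 1 leaves a monopole plus a dipole
\mathrm{DM_{halo}}(l, b) = \mathrm{DM}_0 + D \cos\theta,
\qquad
\cos\theta = \sin b \,\sin b_d + \cos b \,\cos b_d \,\cos(l - l_d)
```

Here $\theta$ is the angle between a line of sight $(l, b)$ and the dipole direction $(l_d, b_d)$, so the quoted direction and its uncertainty describe where the $\ell = 1$ term peaks.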

Preprint, 2026, arXiv

Large Language Models for Limited Noisy Data: A Gravitational Wave Identification Study

This work investigates whether large language models (LLMs) offer advantages over traditional neural networks for astronomical data processing in regimes with non-Gaussian, non-stationary noise and limited labeled samples. Gravitational wave observations provide a suitable test case: using only 90 LIGO events, fine-tuned LLMs achieve 97.4% accuracy in identifying signals. Further experiments show that, in contrast to traditional networks that rely on large simulated datasets, additional simulated samples do not improve LLM performance, while scaling studies reveal predictable gains with increasing model size and dataset size. These results indicate that LLMs can extract discriminative structure directly from observational data and provide an efficient assessment for gravitational wave identification. The same strategy may extend to other astronomical domains with similar noise properties, such as radio or pulsar observations.

Preprint, 2026, arXiv

Laughlin pumping assisted by surface acoustic waves

The quantum Hall effect is a fascinating electrical transport phenomenon signified by precise quantization of Hall conductivity $σ_\mathrm{xy}$ and vanishing longitudinal conductivity $σ_\mathrm{xx}$. Laughlin proposed an elegant explanation in which adiabatic insertion of a flux tube pumps charge through the system. This analysis unveils the fundamental role of gauge invariance and provides a compelling argument about the fractional charge of fractional quantum Hall states. While it has been used extensively as a theoretical tool, a quantitative experimental investigation is lacking despite multiple attempts. Here we report successful realizations of Laughlin pumping in several integer and fractional quantum Hall states. One essential technical innovation is using surface acoustic waves to periodically clear the charges accumulated during the pumping process. Magnetic fluxes are inserted at a constant rate so there is no need to perform complicated data fitting. Furthermore, our setting can reliably extract $σ_\mathrm{xx}$ that is several orders of magnitude lower than the limit of conventional techniques. Effective energy gaps can be deduced from the temperature dependence of $σ_\mathrm{xx}$, which are drastically different from those provided by conventional transport data. This work not only brings a famous gedanken experiment to reality but also serves as a portal for many future investigations.

Preprint, 2026, arXiv

Learn Like Humans: Use Meta-cognitive Reflection for Efficient Self-Improvement

While Large Language Models (LLMs) enable complex autonomous behavior, current agents remain constrained by static, human-designed prompts that limit adaptability. Existing self-improving frameworks attempt to bridge this gap but typically rely on inefficient, multi-turn recursive loops that incur high computational costs. To address this, we propose Metacognitive Agent Reflective Self-improvement (MARS), a framework that achieves efficient self-evolution within a single recurrence cycle. Inspired by educational psychology, MARS mimics human learning by integrating principle-based reflection (abstracting normative rules to avoid errors) and procedural reflection (deriving step-by-step strategies for success). By synthesizing these insights into optimized instructions, MARS allows agents to systematically refine their reasoning logic without continuous online feedback. Extensive experiments on six benchmarks demonstrate that MARS outperforms state-of-the-art self-evolving systems while significantly reducing computational overhead.

Preprint, 2026, arXiv

LIME: Link-based user-item Interaction Modeling with decoupled xor attention for Efficient test time scaling

Scaling large recommendation systems requires advancing three major frontiers: processing longer user histories, expanding candidate sets, and increasing model capacity. While promising, transformers' computational cost scales quadratically with the user sequence length and linearly with the number of candidates. This trade-off makes it prohibitively expensive to expand candidate sets or increase sequence length at inference, despite the significant performance improvements. We introduce LIME, a novel architecture that resolves this trade-off. Through two key innovations, LIME fundamentally reduces computational complexity. First, low-rank "link embeddings" enable pre-computation of attention weights by decoupling user and candidate interactions, making the inference cost nearly independent of candidate set size. Second, a linear attention mechanism, LIME-XOR, reduces the complexity with respect to user sequence length from quadratic ($O(N^2)$) to linear ($O(N)$). Experiments on public and industrial datasets show LIME achieves near-parity with state-of-the-art transformers but with a 10x inference speedup on large candidate sets or long sequence lengths. When tested on a major recommendation platform, LIME improved user engagement while maintaining minimal inference costs with respect to candidate set size and user history length, establishing a new paradigm for efficient and expressive recommendation systems.
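The general idea behind linear attention, on which the complexity claim rests, can be shown with a generic kernelized variant. The sketch below is not LIME-XOR and does not use link embeddings; it is a textbook-style linear attention with an assumed feature map `phi`, included only to show why a history summary computed once makes per-query cost independent of history length.

```python
import math

def phi(x):
    """Positive feature map (ELU + 1); an assumed choice, not taken from the paper."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def summarize_history(keys, values):
    """One pass over the N history items: S = sum phi(k) v^T and z = sum phi(k)."""
    d, dv = len(keys[0]), len(values[0])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for k, v in zip(keys, values):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for b in range(dv):
                S[a][b] += fk[a] * v[b]
    return S, z

def attend(query, S, z):
    """Attention output for one query from the summary; cost does not depend on N."""
    fq = phi(query)
    denom = sum(fq[a] * z[a] for a in range(len(z)))
    return [sum(fq[a] * S[a][b] for a in range(len(z))) / denom for b in range(len(S[0]))]

def attend_explicit(query, keys, values):
    """Reference computation that touches every history item for every query."""
    fq = phi(query)
    weights = [sum(p * q for p, q in zip(fq, phi(k))) for k in keys]
    total = sum(weights)
    return [sum(w * v[b] for w, v in zip(weights, values)) / total for b in range(len(values[0]))]

keys = [[0.1, -0.2], [0.5, 0.3], [-1.0, 0.7]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
query = [0.3, -0.4]

S, z = summarize_history(keys, values)
print(attend(query, S, z))
print(attend_explicit(query, keys, values))
```

Both functions return the same output up to rounding. With many queries (for example, many candidates scored against one user history), the summary is built once and reused, which is the kind of saving the abstract refers to.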

Preprint, 2026, arXiv

Mitigating Measurement Crosstalk via Pulse Shaping

Quantum error correction protocols require rapid and repeated qubit measurements. While multiplexed readout in superconducting quantum systems improves efficiency, fast probe pulses introduce spectral broadening, leading to signal leakage into neighboring readout resonators. This crosstalk results in qubit dephasing and degraded readout fidelity. Here, we introduce a pulse shaping technique inspired by the derivative removal by adiabatic gate (DRAG) protocol to suppress measurement crosstalk during fast readout. By engineering a spectral notch at neighboring resonator frequencies, the method effectively mitigates spurious signal interference. Our approach integrates seamlessly with existing readout architectures, enabling fast, low-crosstalk multiplexed measurements without additional hardware overhead, a critical advancement for scalable quantum computing.
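How a derivative term produces a spectral notch can be seen from a short, generic calculation. The envelope $\varepsilon(t)$, the detuning $\Delta$ of the neighboring resonator from the probe carrier, and the Fourier sign convention are assumptions made here; the abstract does not give the authors' actual pulse shape.

```latex
% Fourier convention and the derivative rule (boundary terms vanish
% because the envelope is zero at the start and end of the pulse)
\tilde{\varepsilon}(\omega) = \int \varepsilon(t)\, e^{-i\omega t}\, dt,
\qquad
\mathcal{F}[\dot{\varepsilon}](\omega) = i\omega\, \tilde{\varepsilon}(\omega)

% Adding a quadrature derivative term rescales the spectrum
\varepsilon_{\mathrm{s}}(t) = \varepsilon(t) + \frac{i}{\Delta}\, \dot{\varepsilon}(t)
\;\Longrightarrow\;
\tilde{\varepsilon}_{\mathrm{s}}(\omega) = \left(1 - \frac{\omega}{\Delta}\right) \tilde{\varepsilon}(\omega)
```

The shaped spectrum therefore vanishes at $\omega = \Delta$ regardless of the original envelope, which is the sense in which a first-order DRAG-style correction places a notch at a chosen frequency.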

Preprint, 2026, arXiv

Multigap nodeless superconductivity in Dirac semimetal PdTe

PdTe has recently been reported to be a type-II Dirac semimetal, while bulk nodal and surface nodeless superconductivity (SC) have been claimed to coexist. In this work, we apply the point-contact spectroscopy (PCS) method to systematically study the superconducting gap in PdTe single crystals with a SC transition temperature $T_{c}=4.3$ K. The obtained differential conductance curves show a common deviation from single-gap superconducting behavior and can be better fitted by a two-gap Blonder-Tinkham-Klapwijk model, suggesting a larger gap $Δ_{L}$ with $2Δ_{L}=3.7\,k_{B}T_{c}$ and a smaller gap $Δ_{S}$ with $2Δ_{S}=1.1$-$2.2\,k_{B}T_{c}$, with weak interband scattering. The variations of conductance spectra among different contacts are proposed to be caused by the anisotropy of Fermi surface topology associated with different gaps.
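The gap ratios quoted in this abstract can be converted to energies with a short calculation. The only input not taken from the abstract is the Boltzmann constant, $k_B \approx 0.08617$ meV/K; the resulting values are derived here, not quoted from the paper.

```latex
k_B T_c \approx 0.08617\ \mathrm{meV/K} \times 4.3\ \mathrm{K} \approx 0.371\ \mathrm{meV}

\Delta_L = \tfrac{3.7}{2}\, k_B T_c \approx 1.85 \times 0.371\ \mathrm{meV} \approx 0.69\ \mathrm{meV}

\Delta_S = \tfrac{1.1\text{--}2.2}{2}\, k_B T_c \approx (0.55\text{--}1.1) \times 0.371\ \mathrm{meV} \approx 0.20\text{--}0.41\ \mathrm{meV}
```

For comparison, the weak-coupling BCS value is $2\Delta \approx 3.53\,k_B T_c$, so the larger gap sits slightly above it and the smaller gap well below it.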

Preprint, 2026, arXiv

NEWTON: Agentic Planning for Physically Grounded Video Generation

Video generation models produce visually compelling results but systematically violate physical commonsense -- on VideoPhy-2, the best model achieves only 32.6% joint accuracy. We identify a specification bottleneck: text prompts are a lossy compression of the physical world, omitting the parameters that fully determine dynamics, and no amount of model scaling can recover what was never specified. From this diagnosis we derive three properties that physics conditioning must satisfy -- sufficiency, dynamism, and verifiability -- and show that no existing approach satisfies all three. We present NEWTON, in which video generation is demoted from the system output to one action inside an agent's toolbox: a learned planner orchestrates physics-aware tools (keyframe generation, scientific computation, prompt refinement) to construct rich conditioning, and a verifier closes the loop for iterative re-planning. The planner is the sole trainable component, optimized on-policy via Flow-GRPO inside the live multi-turn loop. On VideoPhy-2, NEWTON improves joint accuracy from 21.4% to 29.7% on LTX-Video and from 30.7% to 37.4% on Veo-3.1, without modifying either generator. Our project page: https://Newton026.github.io/newton

Preprint, 2026, arXiv

Observations and Remedies for Large Language Model Bias in Self-Consuming Performative Loop

The rapid advancement of large language models (LLMs) has led to growing interest in using synthetic data to train future models. However, this creates a self-consuming retraining loop, where models are trained on their own outputs, which may cause performance drops and induce emerging biases. In real-world applications, previously deployed LLMs may influence the data they generate, leading to a dynamic system driven by user feedback. For example, if a model continues to underserve users from a group, less query data will be collected from this particular demographic of users. In this study, we introduce the concept of the Self-Consuming Performative Loop (SCPL) and investigate the role of synthetic data in shaping bias during these dynamic iterative training processes under controlled performative feedback. This controlled setting is motivated by the inaccessibility of real-world user preference data from dynamic production systems, and enables us to isolate and analyze feedback-driven bias evolution in a principled manner. We focus on two types of loops: the typical retraining setting and the incremental fine-tuning setting, which is largely underexplored. Through experiments on three real-world tasks, we find that the performative loop increases preference bias and decreases disparate bias. We design a reward-based rejection sampling strategy to mitigate the bias, moving towards more trustworthy self-improving systems.
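Reward-based rejection sampling, in its most basic form, keeps only the synthetic samples whose reward clears a threshold before they enter the next training round. The sketch below shows that basic form only. The reward function, threshold, and sample fields are hypothetical, and the abstract does not specify the authors' reward or acceptance rule.

```python
def rejection_sample(samples, reward_fn, threshold):
    """Keep samples whose reward is at least the threshold; also report the acceptance rate."""
    accepted = [s for s in samples if reward_fn(s) >= threshold]
    rate = len(accepted) / len(samples) if samples else 0.0
    return accepted, rate

# Toy synthetic generations with a precomputed reward (made-up values).
samples = [
    {"text": "response A", "reward": 0.91},
    {"text": "response B", "reward": 0.40},
    {"text": "response C", "reward": 0.75},
    {"text": "response D", "reward": 0.10},
]

accepted, rate = rejection_sample(samples, lambda s: s["reward"], threshold=0.7)
print([s["text"] for s in accepted], rate)
```

In a retraining loop, only `accepted` would be added to the next round's training data, so the choice of reward determines which biases are filtered out and which are left in.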

Preprint, 2026, arXiv

Patient-Zero: Scaling Synthetic Patient Agents to Real-World Distributions without Real Patient Data

Synthetic data generation with Large Language Models (LLMs) has emerged as a promising solution in the medical domain to mitigate data scarcity and privacy constraints. However, existing approaches remain constrained by their derivative nature, relying on real-world records, which pose privacy risks and distribution biases. Furthermore, current patient agents face the Stability-Plasticity Dilemma, struggling to maintain clinical consistency during dynamic inquiries. To address these challenges, we introduce Patient-Zero, a novel framework for ab initio patient simulation that requires no real medical records. Our Medically-Aligned Hierarchical Synthesis framework generates comprehensive and diverse patient records from abstract clinical guidelines via stratified attribute permutation. To support rigorous clinical interaction, we design a Dual-Track Cognitive Memory System that enables agents to dynamically update memory while preserving logical consistency and persona adherence. Extensive evaluations show that Patient-Zero establishes a new state-of-the-art in both data quality and interaction fidelity. In human expert evaluations, senior licensed physicians judge our synthetic data to be statistically indistinguishable from real human-authored data and higher in clinical quality. Furthermore, a downstream medical reasoning model trained on our synthetic dataset shows substantial performance gains (MedQA +24.0%; MMLU +14.5%), demonstrating the practical utility of our framework.

preprint 2026 arXiv

Performance Test and Circuit Simulation for R12699-406-M4 Photomultiplier Tube Base

The next-generation liquid xenon experiments like PandaX-xT target an energy range from sub-keV to multi-MeV to address the requirement of multiple physics searches. The Hamamatsu R12699-406-M4 photomultiplier tubes (PMTs) were developed and selected as photon sensors for PandaX-xT. Their voltage-divider base is optimized for a broad dynamic range, from single-photoelectron (SPE) sensitivity to 30 nC collected charge (matching the 2.5 MeV Q-value of $^{136}$Xe neutrinoless double beta decay (NLDBD)). Using a dedicated test bench, we characterize the saturation and suppression responses of R12699-406-M4 PMTs with this base design. Based on measured PMT-base responses, we develop a circuit simulation model that accurately reproduces the physical mechanisms underlying these effects with key parameters tuned via experimental data. The combined simulation and bench-test approach guides base design and optimization, enabling improved detector dynamic range and supporting future saturation and suppression correction studies in data analysis.

preprint 2026 arXiv

ProtSAE: Disentangling and Interpreting Protein Language Models via Semantically-Guided Sparse Autoencoders

Sparse Autoencoder (SAE) has emerged as a powerful tool for mechanistic interpretability of large language models. Recent works apply SAE to protein language models (PLMs), aiming to extract and analyze biologically meaningful features from their latent spaces. However, SAE suffers from semantic entanglement, where individual neurons often mix multiple nonlinear concepts, making it difficult to reliably interpret or manipulate model behaviors. In this paper, we propose a semantically-guided SAE, called ProtSAE. Unlike existing SAE which requires annotation datasets to filter and interpret activations, we guide semantic disentanglement during training using both annotation datasets and domain knowledge to mitigate the effects of entangled attributes. We design interpretability experiments showing that ProtSAE learns more biologically relevant and interpretable hidden features compared to previous methods. Performance analyses further demonstrate that ProtSAE maintains high reconstruction fidelity while achieving better results in interpretable probing. We also show the potential of ProtSAE in steering PLMs for downstream generation tasks.

preprint 2026 arXiv

Rethinking Soft Interference Cancellation (IC) for MIMO: A Hard-Decision IC Inspired Recursive Scheme

Multiple-input multiple-output (MIMO) technology has been regarded as one of the most important technologies to enable emerging applications in current and next generation wireless communication systems, for which signal detection methods have been endowed with higher requirements, such as finer bit-error ratio (BER) performance, lower complexity, and smaller memory. Existing detectors mainly include hard-decision-based ordered successive interference cancellation (HD-OSIC) schemes with relatively simple implementation, and linear-minimum-mean-square-error-based iterative soft interference cancellation (LMMSE-ISIC) schemes exhibiting near-optimal BER performance, whose advantages are combined by the detector developed in this paper. Specifically, we first elaborate that the LMMSE-ISIC scheme is the extension of the HD-OSIC counterpart, via comparing our proposed reordered description based on the equivalent channel matrix for the LMMSE-ISIC detection process with the other. Then, we propose a recursive scheme with speed advantage and memory saving for LMMSE-ISIC by extending that for HD-OSIC, where the LMMSE-ISIC estimate and the filtering bias are updated highly efficiently. Compared to the existing best low-complexity LMMSE-ISIC scheme, theoretically, the required computations and memory units in each iteration of our proposed scheme decrease by at least 87.50% and 80.00%, respectively, and simulation results demonstrate that our proposed scheme always yields identical BER performance.

preprint 2026 arXiv

SLAP: Scalable Language-Audio Pretraining with Variable-Duration Audio and Multi-Objective Training

Contrastive language-audio pretraining (CLAP) has achieved notable success in learning semantically rich audio representations and is widely adopted for various audio-related tasks. However, current CLAP models face several key limitations. First, they are typically trained on relatively small datasets, often comprising a few million audio samples. Second, existing CLAP models are restricted to short and fixed duration, which constrains their usage in real-world scenarios with variable-duration audio. Third, the standard contrastive training objective operates on global representations, which may hinder the learning of dense, fine-grained audio features. To address these challenges, we introduce Scalable Language-Audio Pretraining (SLAP), which scales language-audio pretraining to 109 million audio-text pairs with variable audio durations and incorporates multiple training objectives. SLAP unifies contrastive loss with additional self-supervised and captioning losses in a single-stage training, facilitating the learning of richer dense audio representations. The proposed SLAP model achieves new state-of-the-art performance on audio-text retrieval and zero-shot audio classification tasks, demonstrating its effectiveness across diverse benchmarks.

preprint 2026 arXiv

Spatial Multi-Task Learning for Breast Cancer Molecular Subtype Prediction from Single-Phase DCE-MRI

Accurate molecular subtype classification is essential for personalized breast cancer treatment, yet conventional immunohistochemical analysis relies on invasive biopsies and is prone to sampling bias. Although dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) enables non-invasive tumor characterization, clinical workflows typically acquire only single-phase post-contrast images to reduce scan time and contrast agent dose. In this study, we propose a spatial multi-task learning framework for breast cancer molecular subtype prediction from clinically practical single-phase DCE-MRI. The framework simultaneously predicts estrogen receptor (ER), progesterone receptor (PR), human epidermal growth factor receptor 2 (HER2) status, and the Ki-67 proliferation index -- biomarkers that collectively define molecular subtypes. The architecture integrates a deep feature extraction network with multi-scale spatial attention to capture intratumoral and peritumoral characteristics, together with a region-of-interest weighting module that emphasizes the tumor core, rim, and surrounding tissue. Multi-task learning exploits biological correlations among biomarkers through shared representations with task-specific prediction branches. Experiments on a dataset of 960 cases (886 internal cases split 7:1:2 for training/validation/testing, and 74 external cases evaluated via five-fold cross-validation) demonstrate that the proposed method achieves an AUC of 0.893, 0.824, and 0.857 for ER, PR, and HER2 classification, respectively, and a mean absolute error of 8.2% for Ki-67 regression, significantly outperforming radiomics and single-task deep learning baselines. These results indicate the feasibility of accurate, non-invasive molecular subtype prediction using standard imaging protocols.

preprint 2026 arXiv

SPEC-RL: Accelerating On-Policy Reinforcement Learning with Speculative Rollouts

Large Language Models (LLMs) increasingly rely on reinforcement learning with verifiable rewards (RLVR) to elicit reliable chain-of-thought reasoning. However, the training process remains bottlenecked by the computationally expensive rollout stage. Existing acceleration methods (such as parallelization, objective- and data-driven modifications, and replay buffers) either incur diminishing returns, introduce bias, or overlook redundancy across iterations. We identify that rollouts from consecutive training epochs frequently share a large portion of overlapping segments, wasting computation. To address this, we propose SPEC-RL, a novel framework that integrates SPECulative decoding with the RL rollout process. SPEC-RL reuses prior trajectory segments as speculative prefixes and extends them via a draft-and-verify mechanism, avoiding redundant generation while ensuring policy consistency. Experiments on diverse math reasoning and generalization benchmarks, including AIME24, MATH-500, OlympiadBench, MMLU-STEM, and others, demonstrate that SPEC-RL reduces rollout time by 2-3x without compromising policy quality. As a purely rollout-stage enhancement, SPEC-RL integrates seamlessly with mainstream algorithms (e.g., PPO, GRPO, DAPO), offering a general and practical path to scale RLVR for large reasoning models. Our code is available at https://github.com/ShopeeLLM/Spec-RL
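
The draft-and-verify idea in this abstract can be illustrated with a small sketch. It assumes the standard speculative-sampling acceptance rule (keep a cached token with probability min(1, p_new / p_old)); the abstract does not state which rule SPEC-RL actually uses, and the function name, argument names, and probabilities below are hypothetical.

```python
def verified_prefix_length(old_probs, new_probs, uniforms):
    """Count how many leading cached tokens survive verification.

    old_probs: probability each cached token had under the previous policy.
    new_probs: probability of the same token under the current policy.
    uniforms:  one uniform(0, 1) draw per token, passed in for determinism.
    Assumption: the standard speculative-sampling acceptance test is used.
    """
    accepted = 0
    for p_old, p_new, u in zip(old_probs, new_probs, uniforms):
        if u < min(1.0, p_new / p_old):
            accepted += 1      # token is consistent with the current policy
        else:
            break              # first rejection: regenerate from this position
    return accepted


# Hypothetical cached rollout of three tokens.
prefix = verified_prefix_length(
    old_probs=[0.5, 0.5, 0.5],
    new_probs=[0.6, 0.25, 0.9],
    uniforms=[0.9, 0.7, 0.1],
)
```

Only the tokens after the accepted prefix would need fresh generation, which is where the rollout savings would come from under this assumption.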

preprint 2026 arXiv

Stephanie2: Thinking, Waiting, and Making Decisions Like Humans in Step-by-Step AI Social Chat

Instant-messaging human social chat typically progresses through a sequence of short messages. Existing step-by-step AI chatting systems typically split a one-shot generation into multiple messages and send them sequentially, but they lack an active waiting mechanism and exhibit unnatural message pacing. In order to address these issues, we propose Stephanie2, a novel next-generation step-wise decision-making dialogue agent. With active waiting and message-pace adaptation, Stephanie2 explicitly decides at each step whether to send or wait, and models latency as the sum of thinking time and typing time to achieve more natural pacing. We further introduce a time-window-based dual-agent dialogue system to generate pseudo dialogue histories for human and automatic evaluations. Experiments show that Stephanie2 clearly outperforms Stephanie1 on metrics such as naturalness and engagement, and achieves a higher pass rate on human evaluation with the role identification Turing test.
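
As an illustration of the pacing model described above, the following minimal sketch computes message latency as thinking time plus typing time. The constant thinking time, the typing speed, and both function names are assumptions made for the example; the abstract does not say how Stephanie2 estimates either term.

```python
def message_latency(text, thinking_seconds=1.0, chars_per_second=5.0):
    """Latency before a message is sent: thinking time plus typing time.

    Both parameters are illustrative placeholders, not values from the paper.
    """
    typing_seconds = len(text) / chars_per_second
    return thinking_seconds + typing_seconds


def schedule(messages, thinking_seconds=1.0, chars_per_second=5.0):
    """Return (send_time, text) pairs, accumulating latency message by message."""
    clock = 0.0
    timeline = []
    for text in messages:
        clock += message_latency(text, thinking_seconds, chars_per_second)
        timeline.append((clock, text))
    return timeline


timeline = schedule(["hello", "hi"])
```

Longer messages are delayed more than short ones, which is the behaviour a fixed per-message delay cannot reproduce.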

preprint 2026 arXiv

Streaming Hallucination Detection in Long Chain-of-Thought Reasoning

Long chain-of-thought (CoT) reasoning improves the performance of large language models, yet hallucinations in such settings often emerge subtly and propagate across reasoning steps. We suggest that hallucination in long CoT reasoning is better understood as an evolving latent state rather than a one-off erroneous event. Accordingly, we treat step-level hallucination judgments as local observations and introduce a cumulative prefix-level hallucination signal that tracks the global evolution of the reasoning state over the entire trajectory. Overall, our approach enables streaming hallucination detection in long CoT reasoning, providing real-time, interpretable evidence.

preprint 2026 arXiv

TokenSeg: Efficient 3D Medical Image Segmentation via Hierarchical Visual Token Compression

Three-dimensional medical image segmentation is a fundamental yet computationally demanding task due to the cubic growth of voxel processing and the redundant computation on homogeneous regions. To address these limitations, we propose TokenSeg, a boundary-aware sparse token representation framework for efficient 3D medical volume segmentation. Specifically, (1) we design a multi-scale hierarchical encoder that extracts 400 candidate tokens across four resolution levels to capture both global anatomical context and fine boundary details; (2) we introduce a boundary-aware tokenizer that combines VQ-VAE quantization with importance scoring to select 100 salient tokens, over 60% of which lie near tumor boundaries; and (3) we develop a sparse-to-dense decoder that reconstructs full-resolution masks through token reprojection, progressive upsampling, and skip connections. Extensive experiments on a 3D breast DCE-MRI dataset comprising 960 cases demonstrate that TokenSeg achieves state-of-the-art performance with 94.49% Dice and 89.61% IoU, while reducing GPU memory and inference latency by 64% and 68%, respectively. To verify the generalization capability, our evaluations on MSD cardiac and brain MRI benchmark datasets demonstrate that TokenSeg consistently delivers optimal performance across heterogeneous anatomical structures. These results highlight the effectiveness of anatomically informed sparse representation for accurate and efficient 3D medical image segmentation.
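
For readers unfamiliar with the two overlap metrics quoted above, here is a minimal sketch of how Dice and IoU are commonly computed for a pair of binary masks. The flat 0/1 list representation and the convention for two empty masks are assumptions of this example, not details taken from the paper.

```python
def dice_and_iou(pred, target):
    """Dice and IoU for two flat sequences of 0/1 voxel labels of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    pred_sum = sum(1 for p in pred if p)
    target_sum = sum(1 for t in target if t)
    union = pred_sum + target_sum - intersection
    if union == 0:
        # Both masks empty: scored as perfect agreement (a convention, assumed here).
        return 1.0, 1.0
    dice = 2.0 * intersection / (pred_sum + target_sum)
    iou = intersection / union
    return dice, iou


# Toy masks with one overlapping voxel.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
```

For a single mask pair the two metrics are related by Dice = 2·IoU / (1 + IoU); scores averaged over many cases, as reported figures typically are, need not satisfy this identity exactly.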

preprint 2025 arXiv

An AI-Driven Thermal-Fluid Testbed for Advanced Small Modular Reactors: Integration of Digital Twin and Large Language Models

This paper presents a multipurpose artificial intelligence (AI)-driven thermal-fluid testbed designed to advance Small Modular Reactor technologies by seamlessly integrating physical experimentation with advanced computational intelligence. The platform uniquely combines a versatile three-loop thermal-fluid facility with a high-fidelity digital twin and sophisticated AI frameworks for real-time prediction, control, and operational assistance. Methodologically, the testbed's digital twin, built upon the System Analysis Module code, is coupled with a Gated Recurrent Unit (GRU) neural network. This machine learning model, trained on experimental data, enables faster-than-real-time simulation, providing predictive insights into the system's dynamic behavior. The practical application of this AI integration is showcased through case studies. In an AI-driven control framework, the GRU model accurately forecasts future system states and the corresponding control actions required to meet operational demands. Furthermore, an intelligent assistant, powered by a large language model, translates complex sensor data and simulation outputs into natural language, offering operators actionable analysis and safety recommendations. Comprehensive validation against experimental transients confirms the platform's high fidelity, with the GRU model achieving a temperature prediction root mean square error of 1.42 K. This work establishes an integrated research environment at the intersection of AI and thermal-fluid science, showcasing how AI-driven methodologies in modeling, control, and operator support can accelerate the innovation and deployment of next-generation nuclear systems.

preprint 2025 arXiv

Chiral Integrable Boundary States in the SU(4) Alternating Spin Chain

Previously identified integrable boundary states in ABJM theory are exclusively achiral. This paper presents the first chiral integrable boundary states in the $SU(4)$ alternating spin chain from the planar two-loop dilatation operator in the scalar sector. Utilizing a sufficient condition for the untwisted integrable condition, we identify specific two-site and four-site basis boundary states as chiral integrable states. Numerical evidence indicates that other basis states are unlikely to be chiral integrable. Furthermore, we compute the overlaps between these chiral integrable basis states and on-shell Bethe eigenstates.

preprint 2025 arXiv

Robust Bayesian Dynamic Programming for On-policy Risk-sensitive Reinforcement Learning

We propose a novel framework for risk-sensitive reinforcement learning (RSRL) that incorporates robustness against transition uncertainty. We define two distinct yet coupled risk measures: an inner risk measure addressing state and cost randomness and an outer risk measure capturing transition dynamics uncertainty. Our framework unifies and generalizes most existing RL frameworks by permitting general coherent risk measures for both inner and outer risk measures. Within this framework, we construct a risk-sensitive robust Markov decision process (RSRMDP), derive its Bellman equation, and provide error analysis under a given posterior distribution. We further develop a Bayesian Dynamic Programming (Bayesian DP) algorithm that alternates between posterior updates and value iteration. The approach employs an estimator for the risk-based Bellman operator that combines Monte Carlo sampling with convex optimization, for which we prove strong consistency guarantees. Furthermore, we demonstrate that the algorithm converges to a near-optimal policy in the training environment and analyze both the sample complexity and the computational complexity under the Dirichlet posterior and CVaR. Finally, we validate our approach through two numerical experiments. The results exhibit excellent convergence properties while providing intuitive demonstrations of its advantages in both risk-sensitivity and robustness. Empirically, we further demonstrate the advantages of the proposed algorithm through an application on option hedging.
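
To make the CVaR risk measure mentioned above concrete, the sketch below estimates it from Monte Carlo cost samples using the well-known Rockafellar-Uryasev representation, CVaR_α(X) = min_t { t + E[(X − t)^+] / (1 − α) }. This is a generic textbook estimator, not the paper's Bellman-operator estimator; the function name and the sample data are hypothetical.

```python
def empirical_cvar(costs, alpha):
    """CVaR at level alpha of an empirical cost distribution.

    Uses the Rockafellar-Uryasev form. The objective is convex and piecewise
    linear in t with kinks at the samples, so it is enough to evaluate it at
    each sample value (O(n^2), fine for a small illustration).
    """
    if not costs:
        raise ValueError("need at least one cost sample")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    n = len(costs)

    def objective(t):
        expected_excess = sum(max(c - t, 0.0) for c in costs) / n
        return t + expected_excess / (1.0 - alpha)

    return min(objective(t) for t in costs)


# Ten equally likely costs: CVaR at 0.8 averages the worst 20% (9 and 10).
samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cvar_80 = empirical_cvar(samples, 0.8)
```

At alpha = 0 the estimate reduces to the plain sample mean, which recovers the risk-neutral case.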

preprint 2025 arXiv

SLM-TTA: A Framework for Test-Time Adaptation of Generative Spoken Language Models

Spoken Language Models (SLMs) are increasingly central to modern speech-driven applications, but performance degrades under acoustic shift (real-world noise, reverberation, and microphone variation). Prior solutions rely on offline domain adaptation, which is post-hoc, data-intensive, and slow. We introduce the first test-time adaptation (TTA) framework for generative SLMs that process interleaved audio-text prompts. Our method updates a small, targeted subset of parameters during inference using only the incoming utterance, requiring no source data or labels. This stabilizes token distributions and improves robustness to acoustic variability without degrading core task accuracy. Evaluated on automatic speech recognition, speech translation, and 19 audio understanding tasks from AIR-Bench, our approach yields consistent gains under diverse corruptions. Because adaptation touches only a small fraction of weights, it is both compute- and memory-efficient, supporting deployment on resource-constrained platforms. This work enhances the robustness and adaptability of generative SLMs for real-world speech-driven applications.

preprint 2024 arXiv

Nonasymptotic Convergence Rate of Quasi-Monte Carlo: Applications to Linear Elliptic PDEs with Lognormal Coefficients and Importance Samplings

This study analyzes the nonasymptotic convergence behavior of the quasi-Monte Carlo (QMC) method with applications to linear elliptic partial differential equations (PDEs) with lognormal coefficients. Building upon the error analysis presented in (Owen, 2006), we derive a nonasymptotic convergence estimate depending on the specific integrands, the input dimensionality, and the finite number of samples used in the QMC quadrature. We discuss the effects of the variance and dimensionality of the input random variable. Then, we apply the QMC method with importance sampling (IS) to approximate deterministic, real-valued, bounded linear functionals that depend on the solution of a linear elliptic PDE with a lognormal diffusivity coefficient in bounded domains of $\mathbb{R}^d$, where the random coefficient is modeled as a stationary Gaussian random field parameterized by the trigonometric and wavelet-type basis. We propose two types of IS distributions, analyze their effects on the QMC convergence rate, and observe the improvements.
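
As a rough illustration of QMC quadrature with a finite number of samples, the sketch below compares a Halton point set with pseudo-random Monte Carlo on a toy integrand over the unit square. The integrand, the choice of the Halton sequence, the sample size, and the seed are all assumptions made for this example; the paper's setting (lognormal PDE coefficients with importance sampling) is not reproduced here.

```python
import random


def radical_inverse(index, base):
    """Van der Corput radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton_points(n, bases=(2, 3)):
    """First n points of the Halton sequence, starting from index 1."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n + 1)]


def integrand(point):
    """Toy integrand f(x, y) = x * y, whose exact integral on [0, 1]^2 is 0.25."""
    x, y = point
    return x * y


def estimate(points):
    """Equal-weight quadrature: the average of the integrand over the points."""
    return sum(integrand(p) for p in points) / len(points)


n = 1024
qmc_estimate = estimate(halton_points(n))

rng = random.Random(0)  # fixed seed so the Monte Carlo run is repeatable
mc_estimate = estimate([(rng.random(), rng.random()) for _ in range(n)])
```

Comparing `abs(qmc_estimate - 0.25)` with `abs(mc_estimate - 0.25)` for several values of `n` gives a feel for finite-sample behaviour, which is the regime a nonasymptotic analysis is concerned with.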

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Source provenance