Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
17works
0followers
17topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

17 published item(s)

preprint2026arXiv

All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection

We introduce RFC Bench, a benchmark for evaluating large language models on financial misinformation under realistic news. RFC Bench operates at the paragraph level and captures the contextual complexity of financial news where meaning emerges from dispersed cues. The benchmark defines two complementary tasks: reference free misinformation detection and comparison based diagnosis using paired original perturbed inputs. Experiments reveal a consistent pattern: performance is substantially stronger when comparative context is available, while reference free settings expose significant weaknesses, including unstable predictions and elevated invalid outputs. These results indicate that current models struggle to maintain coherent belief states without external grounding. By highlighting this gap, RFC Bench provides a structured testbed for studying reference free reasoning and advancing more reliable financial misinformation detection in real world settings.

preprint2026arXiv

An AI-guided mechanotyping instrument for fully automated oocyte quality assessment

The mechanical properties of oocytes are regarded as important indicators of their developmental potential. During fertilization, deviations from the normal mechanical range can hinder sperm penetration, ultimately reducing fertilization efficiency and compromising embryo quality. However, current methods for measuring oocyte mechanics often suffer from serious cellular damage, low automation levels, and large measurement errors. To address these limitations, we developed an AI-guided micronewton-scale mechanical measurement system for safe and automated oocyte quality assessment. The system integrates voice interaction with automated experimental workflows to control a magnetically actuated microgripper, which applies defined loading forces to induce micron-scale compressive deformation of the oocyte. Combined with AI-assisted object detection and image segmentation algorithms, the system captures cellular deformation in real time, enabling precise calculation of the oocyte's compressive modulus. This measurement system enables automated, quantitative, and non-destructive evaluation of oocyte mechanical properties, providing an effective approach for oocyte quality screening in in vitro fertilization (IVF) and other assisted reproductive technologies (ART).

preprint2026arXiv

Attention Sinks and Outliers in Attention Residuals

We propose OASIS, an outlier- and sink-aware technique built on inter-layer null signaling. As AttnResidual architectures introduce an additional depth-wise normalization channel, they improve inter-layer routing flexibility but also exacerbate attention sinks, activation outliers, and the resulting degradation in inference stability and quantization robustness. OASIS addresses this issue by introducing a Softmax1-based null space and coupling token-level null evidence to depth routing through an inter-layer null signal, thereby reducing sink-dominated routing and improving structural robustness. Theoretically, we show that the dual-normalization design of AttnResidual intensifies sink formation and quantization brittleness. Experimentally, we compare OASIS against five baselines on three real-world datasets and observe consistent improvements in both attention sink and post-quantization performance. Notably, OASIS achieves an average reduction of 9.26% in maximum infinity norm and 2.60% in average kurtosis across the evaluated settings, while lowering perplexity by 75.85% under W8A8 and improving GSM8K Pass@1 by 12.42% under W4A4.

preprint2026arXiv

Constant Inapproximability of Pacing Equilibria in Second-Price Auctions

In this paper, we revisit the problem of approximating a pacing equilibrium in second-price auctions, introduced by Conitzer, Kroer, Sodomka, and Moses [Oper. Res. 2022]. We show that finding a constant-factor approximation of a pacing equilibrium is PPAD-hard, thereby strengthening previous results of Chen, Kroer, and Kumar [Math. Oper. Res. 2024], which established PPAD-hardness only for inverse-polynomial approximations.

preprint2026arXiv

Counter-diabatic driving for fast spin control in a two-electron double quantum dot

The techniques of shortcuts to adiabaticity have been proposed to accelerate the "slow" adiabatic processes in various quantum systems with the applications in quantum information processing. In this paper, we study the counter-diabatic driving for fast adiabatic spin manipulation in a two-electron double quantum dot by designing time-dependent electric fields in the presence of spin-orbit coupling. To simplify implementation and find an alternative shortcut, we further transform the Hamiltonian in term of Lie algebra, which allows one to use a single Cartesian component of electric fields. In addition, the relation between energy and time is quantified to show the lower bound for the operation time when the maximum amplitude of electric fields is given. Finally, the fidelity is discussed with respect to noise and systematic errors, which demonstrates that the decoherence effect induced by stochastic environment can be avoided in speeded-up adiabatic control.

preprint2026arXiv

Data-Driven Knowledge Transfer in Batch $Q^*$ Learning

In data-driven decision-making in marketing, healthcare, and education, it is desirable to utilize a large amount of data from existing ventures to navigate high-dimensional feature spaces and address data scarcity in new ventures. We explore knowledge transfer in dynamic decision-making by concentrating on batch stationary environments and formally defining task discrepancies through the lens of Markov decision processes (MDPs). We propose a framework of Transferred Fitted $Q$-Iteration algorithm with general function approximation, enabling the direct estimation of the optimal action-state function $Q^*$ using both target and source data. We establish the relationship between statistical performance and MDP task discrepancy under sieve approximation, shedding light on the impact of source and target sample sizes and task discrepancy on the effectiveness of knowledge transfer. We show that the final learning error of the $Q^*$ function is significantly improved from the single task rate both theoretically and empirically.

preprint2026arXiv

Evaluating the Diagnostic Classification Ability of Multimodal Large Language Models: Insights from the Osteoarthritis Initiative

Multimodal large language models (MLLMs) show promising performance on medical visual question answering (VQA) and report generation, but these generation and explanation abilities do not reliably transfer to disease-specific classification. We evaluated MLLM architectures on knee osteoarthritis (OA) radiograph classification, which remains underrepresented in existing medical MLLM benchmarks, even though knee OA affects an estimated 300 to 400 million people worldwide. Through systematic ablation studies manipulating the vision encoder, the connector, and the large language model (LLM) across diverse training strategies, we measured each component's contribution to diagnostic accuracy. In our classification task, a trained vision encoder alone could outperform full MLLM pipelines in classification accuracy and fine-tuning the LLM provided no meaningful improvement over prompt-based guidance. And LoRA fine-tuning on a small, class-balanced dataset (500 images) gave better results than training on a much larger but class-imbalanced set (5,778 images), indicating that data balance and quality can matter more than raw scale for this task. These findings suggest that for domain-specific medical classification, LLMs are more effective as interpreters and report generators rather than as primary classifiers. Therefore, the MLLM architecture appears less suitable for medical image diagnostic classification tasks that demand high certainty. We recommend prioritizing vision encoder optimization and careful dataset curation when developing clinically applicable systems.

preprint2026arXiv

GDRO: Group-level Reward Post-training Suitable for Diffusion Models

Recent advancements adopt online reinforcement learning (RL) from LLMs to text-to-image rectified flow diffusion models for reward alignment. The use of group-level rewards successfully aligns the model with the targeted reward. However, it faces challenges including low efficiency, dependency on stochastic samplers, and reward hacking. The problem is that rectified flow models are fundamentally different from LLMs: 1) For efficiency, online image sampling takes much more time and dominates the time of training. 2) For stochasticity, rectified flow is deterministic once the initial noise is fixed. Aiming at these problems and inspired by the effects of group-level rewards from LLMs, we design Group-level Direct Reward Optimization (GDRO). GDRO is a new post-training paradigm for group-level reward alignment that combines the characteristics of rectified flow models. Through rigorous theoretical analysis, we point out that GDRO supports full offline training that saves the large time cost for image rollout sampling. Also, it is diffusion-sampler-independent, which eliminates the need for the ODE-to-SDE approximation to obtain stochasticity. We also empirically study the reward hacking trap that may mislead the evaluation, and involve this factor in the evaluation using a corrected score that not only considers the original evaluation reward but also the trend of reward hacking. Extensive experiments demonstrate that GDRO effectively and efficiently improves the reward score of the diffusion model through group-wise offline optimization across the OCR and GenEval tasks, while demonstrating strong stability and robustness in mitigating reward hacking.

preprint2026arXiv

Graphene-assisted resonant transmission and enhanced Goos-Hänchen shift in a frustrated total internal reflection configuration

Graphene-assisted resonant transmission and enhanced Goos-Hänchen shift are investigated in a two-prism frustrated-total-internal-reflection configuration. Due to the excitation of surface plasmons induced by graphene in low terahertz frequency range, there exist the resonant transmission and anomalous Goos-Hänchen shifts in such optical tunneling configuration. As compared to the case of quantum well, graphene sheet with unique optical properties can enhance the resonant transmission with relatively low loss, and modulate the large negative and positive Goos-Hänchen shifts by adjusting chemical potential or electron relaxation time. These intriguing phenomena may lead to some potential applications in graphene-based electro-optic devices.

preprint2026arXiv

Knowledge Distillation for LLM-Based Human Activity Recognition in Homes

Human Activity Recognition (HAR) is a central problem for context-aware applications, especially for smart homes and assisted living. A few very recent studies have shown that Large Language Models (LLMs) can be used for HAR at home, reaching high performance and addressing key challenges. In this paper, we provide new experimental results regarding the use of LLMs for HAR, on two state-of-the-art datasets. More specifically, we show how recognition performance evolves depending on the size of the LLM used. Moreover, we experiment on the use of knowledge distillation techniques to fine-tune smaller LLMs with HAR reasoning examples generated by larger LLMs. We show that such fine-tuned models can perform almost as well as the largest LLMs, while having 50 times less parameters.

preprint2026arXiv

On the Limitations of Rank-One Model Editing in Answering Multi-hop Questions

Recent advances in Knowledge Editing (KE), particularly Rank-One Model Editing (ROME), show superior efficiency over fine-tuning and in-context learning for updating single-hop facts in transformers. However, these methods face significant challenges when applied to multi-hop reasoning tasks requiring knowledge chaining. In this work, we study the effect of editing knowledge with ROME on different layer depths and identify three key failure modes. First, the "hopping-too-late" problem occurs as later layers lack access to necessary intermediate representations. Second, generalization ability deteriorates sharply when editing later layers. Third, the model overfits to edited knowledge, incorrectly prioritizing edited-hop answers regardless of context. To mitigate the issues of "hopping-too-late" and generalisation decay, we propose Redundant Editing, a simple yet effective strategy that enhances multi-hop reasoning. Our experiments demonstrate that this approach can improve accuracy on 2-hop questions by at least 15.5 percentage points, representing a 96% increase over the previous single-edit strategy, while trading off some specificity and language naturalness.

preprint2026arXiv

Persistent Sheaf Laplacian Analysis of Protein Stability and Solubility Changes upon Mutation

Genetic mutations frequently disrupt protein structure, stability, and solubility, acting as primary drivers for a wide spectrum of diseases. Despite the critical importance of these molecular alterations, existing computational models often lack interpretability, and fail to integrate essential physicochemical interaction. To overcome these limitations, we propose SheafLapNet, a unified predictive framework grounded in the mathematical theory of Topological Deep Learning (TDL) and Persistent Sheaf Laplacian (PSL). Unlike standard Topological Data Analysis (TDA) tools such as persistent homology, which are often insensitive to heterogeneous information, PSL explicitly encodes specific physical and chemical information such as partial charges directly into the topological analysis. SheafLapNet synergizes these sheaf-theoretic invariants with advanced protein transformer features and auxiliary physical descriptors to capture intrinsic molecular interactions in a multiscale and mechanistic manner. To validate our framework, we employ rigorous benchmarks for both regression and classification tasks. For stability prediction, we utilize the comprehensive S2648 and S350 datasets. For solubility prediction, we employ the PON-Sol2 dataset, which provides annotations for increased, decreased, or neutral solubility changes. By integrating these multi-perspective features, SheafLapNet achieves state-of-the-art performance across these diverse benchmarks, demonstrating that sheaf-theoretic modeling significantly enhances both interpretability and generalizability in predicting mutation-induced structural and functional changes.

preprint2026arXiv

PICABench: How Far Are We from Physically Realistic Image Editing?

Image editing has achieved remarkable progress recently. Modern editing models could already follow complex instructions to manipulate the original content. However, beyond completing the editing instructions, the accompanying physical effects are the key to the generation realism. For example, removing an object should also remove its shadow, reflections, and interactions with nearby objects. Unfortunately, existing models and benchmarks mainly focus on instruction completion but overlook these physical effects. So, at this moment, how far are we from physically realistic image editing? To answer this, we introduce PICABench, which systematically evaluates physical realism across eight sub-dimension (spanning optics, mechanics, and state transitions) for most of the common editing operations (add, remove, attribute change, etc.). We further propose the PICAEval, a reliable evaluation protocol that uses VLM-as-a-judge with per-case, region-level human annotations and questions. Beyond benchmarking, we also explore effective solutions by learning physics from videos and construct a training dataset PICA-100K. After evaluating most of the mainstream models, we observe that physical realism remains a challenging problem with large rooms to explore. We hope that our benchmark and proposed solutions can serve as a foundation for future work moving from naive content editing toward physically consistent realism.

preprint2026arXiv

Quantum state engineering of spin-orbit coupled ultracold atoms in a Morse potential

Achieving full control of a Bose-Einstein condensate can have valuable applications in metrology, quantum information processing, and quantum condensed matter physics. We propose protocols to simultaneously control the internal (related to its pseudospin-1/2) and motional (position-related) states of a spin-orbit-coupled Bose-Einstein condensate confined in a Morse potential. In the presence of synthetic spin-orbit coupling, the state transition of a noninteracting condensate can be implemented by Raman coupling and detuning terms designed by invariant-based inverse engineering. The state transfer may also be driven by tuning the direction of the spin-orbit-coupling field and modulating the magnitude of the effective synthetic magnetic field. The results can be generalized for interacting condensates by changing the time-dependent detuning to compensate for the interaction. We find that a two-level algorithm for the inverse engineering remains numerically accurate even if the entire set of possible states is considered. The proposed approach is robust against the laser-field noise and systematic device-dependent errors.

preprint2026arXiv

SfMamba: Efficient Source-Free Domain Adaptation via Selective Scan Modeling

Source-free domain adaptation (SFDA) tackles the critical challenge of adapting source-pretrained models to unlabeled target domains without access to source data, overcoming data privacy and storage limitations in real-world applications. However, existing SFDA approaches struggle with the trade-off between perception field and computational efficiency in domain-invariant feature learning. Recently, Mamba has offered a promising solution through its selective scan mechanism, which enables long-range dependency modeling with linear complexity. However, the Visual Mamba (i.e., VMamba) remains limited in capturing channel-wise frequency characteristics critical for domain alignment and maintaining spatial robustness under significant domain shifts. To address these, we propose a framework called SfMamba to fully explore the stable dependency in source-free model transfer. SfMamba introduces Channel-wise Visual State-Space block that enables channel-sequence scanning for domain-invariant feature extraction. In addition, SfMamba involves a Semantic-Consistent Shuffle strategy that disrupts background patch sequences in 2D selective scan while preserving prediction consistency to mitigate error accumulation. Comprehensive evaluations across multiple benchmarks show that SfMamba achieves consistently stronger performance than existing methods while maintaining favorable parameter efficiency, offering a practical solution for SFDA. Our code is available at https://github.com/chenxi52/SfMamba.

preprint2026arXiv

Video Detective: Seek Critical Clues Recurrently to Answer Question from Long Videos

Long Video Question-Answering (LVQA) presents a significant challenge for Multi-modal Large Language Models (MLLMs) due to immense context and overloaded information, which could also lead to prohibitive memory consumption. While existing methods attempt to address these issues by reducing visual tokens or extending model's context length, they may miss useful information or take considerable computation. In fact, when answering given questions, only a small amount of crucial information is required. Therefore, we propose an efficient question-aware memory mechanism, enabling MLLMs to recurrently seek these critical clues. Our approach, named VideoDetective, simplifies this task by iteratively processing video sub-segments. For each sub-segment, a question-aware compression strategy is employed by introducing a few special memory tokens to achieve purposefully compression. This allows models to effectively seek critical clues while reducing visual tokens. Then, due to history context could have a significant impact, we recurrently aggregate and store these memory tokens to update history context, which would be reused for subsequent sub-segments. Furthermore, to more effectively measure model's long video understanding ability, we introduce GLVC (Grounding Long Video Clues), a long video question-answering dataset, which features grounding critical and concrete clues scattered throughout entire videos. Experimental results demonstrate our method enables MLLMs with limited context length of 32K to efficiently process 100K tokens (3600 frames, an hour-long video sampled at 1fps), requiring only 2 minutes and 37GB GPU memory usage. Evaluation results across multiple long video benchmarks illustrate our method can more effectively seek critical clues from massive information.

preprint2025arXiv

Dynamite: Real-Time Debriefing Slide Authoring through AI-Enhanced Multimodal Interaction

Facilitating class-wide debriefings after small-group discussions is a common strategy in ethics education. Instructor interviews revealed that effective debriefings should highlight frequently discussed themes and surface underrepresented viewpoints, making accurate representations of insight occurrence essential. Yet authoring presentations in real time is cognitively overwhelming due to the volume of data and tight time constraints. We present Dynamite, an AI-assisted system that enables semantic updates to instructor-authored slides during live classroom discussions. These updates are powered by semantic data binding, which links slide content to evolving discussion data, and semantic suggestions, which offer revision options aligned with pedagogical goals. In a within-subject in-lab study with 12 participants, Dynamite outperformed a text-based AI baseline in content accuracy and quality. Participants used voice and sketch input to quickly organize semantic blocks, then applied suggestions to accelerate refinement as data stabilized.