Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
14works
0followers
15topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

14 published item(s)

preprint2026arXiv

Enhancing Visual Question Answering with Multimodal LLMs via Chain-of-Question Guided Retrieval-Augmented Generation

With advances in multimodal research and deep learning, Multimodal Large Language Models (MLLMs) have emerged as a powerful paradigm for a wide range of multimodal tasks. As a core problem in vision-language research, Visual Question Answering (VQA) has increasingly employed MLLMs to improve performance, particularly in open-domain settings where external knowledge is essential. In this work, we aim to further enhance retrieval-based VQA by more effectively integrating MLLMs with structured reasoning and knowledge acquisition. We introduce a logical prompting strategy that fuses Chain-of-Thought (CoT) reasoning with Visual Question Decomposition (VQD), termed CoVQD, to guide retrieval toward more accurate and relevant knowledge for MLLM inference. Building on this idea, we propose a new framework, CoVQD-guided RAG (CgRAG), which enables MLLMs to access more comprehensive and coherent external knowledge while benefiting from structured visual-text reasoning guidance, thereby improving generalization and reliability in complex cross-domain VQA scenarios. Extensive experiments on E-VQA, InfoSeek, and OKVQA benchmarks demonstrate the effectiveness of the proposed method.

preprint2026arXiv

MetaRA: Metamorphic Robustness Assessment for Multimodal Large Language Model-based Visual Question Answering Systems

Visual Question Answering (VQA), as the representative multimodal task, serves as a key benchmark for evaluating the reasoning capabilities of Multimodal Large Language Models (MLLMs). However, existing evaluations largely rely on static datasets and accuracy-based metrics, which fail to capture robustness, consistency, and generalization. Inspired by Metamorphic Testing (MT), we propose Metamorphic Robustness Assessment (MetaRA), a testing framework that employs Metamorphic Relations (MRs) to systematically probe vulnerabilities in MLLM-based VQA systems. MetaRA generates controlled variations of image-question inputs based on specific MRs and evaluates models across diverse conditions. Applying MetaRA to multiple MLLM-based VQA models across different tasks reveals nuanced failure patterns, including sensitivity to linguistic perturbations, over-reliance on superficial visual cues, and deeper weaknesses in multimodal reasoning. Experimental results demonstrate that MetaRA provides richer diagnostic insights than conventional accuracy metrics, exposing failure modes that remain hidden under standard benchmarks. Overall, this work highlights the need for systematic robustness evaluation in VQA and positions metamorphic assessment as a scalable, model-agnostic approach toward trustworthy multimodal AI.

preprint2026arXiv

Model-Assisted Causal Inference for the Treatment Effect on Recurrent Events in the Presence of Terminal Events

This paper is motivated by evaluating the benefits of patients receiving mechanical circulatory support (MCS) devices in end-stage heart failure management inference, in which hypothesis testing for a treatment effect on the risk of recurrent events is challenged in the presence of terminal events. Existing methods based on cumulative frequency unreasonably disadvantage longer survivors as they tend to experience more recurrent events. The While-Alive-based (WA) test has provided a solution to address this survival-length-bias problem, and it performs well when the recurrent event rate holds constant over time. However, if such a constant-rate assumption is violated, the WA test can exhibit an inflated type I error and inaccurate estimation of treatment effects. To fill this methodological gap, we propose a Proportional Rate Marginal Structural Model-assisted Test (PR-MSMaT) in the causal inference framework of separable treatment effects for recurrent and terminal events. Using the simulation study, we demonstrate that our PR-MSMaT can properly control type I error while gaining power comparable to the WA test under time-varying recurrent event rates. We employ PR-MSMaT to compare different MCS devices with the postoperative risk of gastrointestinal bleeding among patients enrolled in the Interagency Registry of Mechanically Assisted Circulatory Support program.

preprint2026arXiv

PC-MNet: Dual-Level Congruity Modeling for Multimodal Sarcasm Detection via Polarity-Modulated Attention

Multimodal sarcasm detection, which aims to precisely identify pragmatic incongruities between literal text and nonverbal cues, has gained substantial attention in multimodal understanding. Recent advancements have predominantly relied on naïve similarity-based attention mechanisms and uniform late fusion strategies.Furthermore, given that functional entanglement restricts traditional late fusions, we incorporate a scalar congruity routing mechanism and a prior-guided contextual graph. This mechanism anchors a generalized incongruity manifold through a two-stage asymmetric optimization driven by inconsistency-aware contrastive learning, selectively fusing only the most discriminative multi-granularity evidence. Extensive experiments on the \texttt{MUStARD} benchmark and its spurious-correlation-mitigated balanced datasets demonstrate that our approach achieves new state-of-the-art performance, surpassing the strongest multimodal baseline by a substantial 3.14\% improvement in Macro-F1. By architecturally isolating atomic, composition, and contextual conflicts. This work provides a robust, decoupled paradigm for modeling subtle pragmatic incongruities in human communication.

preprint2026arXiv

Short-term electricity load forecasting with multi-frequency reconstruction diffusion

Diffusion models have emerged as a powerful method in various applications. However, their application to Short-Term Electricity Load Forecasting (STELF) -- a typical scenario in energy systems -- remains largely unexplored. Considering the nonlinear and fluctuating characteristics of the load data, effectively utilizing the powerful modeling capabilities of diffusion models to enhance STELF accuracy remains a challenge. This paper proposes a novel diffusion model with multi-frequency reconstruction for STELF, referred to as the Multi-Frequency-Reconstruction-based Diffusion (MFRD) model. The MFRD model achieves accurate load forecasting through four key steps: (1) The original data is combined with the decomposed multi-frequency modes to form a new data representation; (2) The diffusion model adds noise to the new data, effectively reducing and weakening the noise in the original data; (3) The reverse process adopts a denoising network that combines Long Short-Term Memory (LSTM) and Transformer to enhance noise removal; and (4) The inference process generates the final predictions based on the trained denoising network. To validate the effectiveness of the MFRD model, we conducted experiments on two data platforms: Australian Energy Market Operator (AEMO) and Independent System Operator of New England (ISO-NE). The experimental results show that our model consistently outperforms the compared models.

preprint2022arXiv

A Tree-based Model Averaging Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources

Accurately estimating personalized treatment effects within a study site (e.g., a hospital) has been challenging due to limited sample size. Furthermore, privacy considerations and lack of resources prevent a site from leveraging subject-level data from other sites. We propose a tree-based model averaging approach to improve the estimation accuracy of conditional average treatment effects (CATE) at a target site by leveraging models derived from other potentially heterogeneous sites, without them sharing subject-level data. To our best knowledge, there is no established model averaging approach for distributed data with a focus on improving the estimation of treatment effects. Specifically, under distributed data networks, our framework provides an interpretable tree-based ensemble of CATE estimators that joins models across study sites, while actively modeling the heterogeneity in data sources through site partitioning. The performance of this approach is demonstrated by a real-world study of the causal effects of oxygen therapy on hospital survival rate and backed up by comprehensive simulation results.

preprint2022arXiv

Persistent Homotopy Groups of Metric Spaces

We study notions of persistent homotopy groups of compact metric spaces together with their stability properties in the Gromov-Hausdorff sense. We pay particular attention to the case of fundamental groups, for which we obtain a more precise description. Under fairly mild assumptions on the spaces, we proved that the classical fundamental group has an underlying tree-like structure (i.e. a dendrogram) and an associated ultra-metric.

preprint2022arXiv

Supervised Homogeneity Fusion: a Combinatorial Approach

Fusing regression coefficients into homogenous groups can unveil those coefficients that share a common value within each group. Such groupwise homogeneity reduces the intrinsic dimension of the parameter space and unleashes sharper statistical accuracy. We propose and investigate a new combinatorial grouping approach called $L_0$-Fusion that is amenable to mixed integer optimization (MIO). On the statistical aspect, we identify a fundamental quantity called grouping sensitivity that underpins the difficulty of recovering the true groups. We show that $L_0$-Fusion achieves grouping consistency under the weakest possible requirement of the grouping sensitivity: if this requirement is violated, then the minimax risk of group misspecification will fail to converge to zero. Moreover, we show that in the high-dimensional regime, one can apply $L_0$-Fusion coupled with a sure screening set of features without any essential loss of statistical efficiency, while reducing the computational cost substantially. On the algorithmic aspect, we provide a MIO formulation for $L_0$-Fusion along with a warm start strategy. Simulation and real data analysis demonstrate that $L_0$-Fusion exhibits superiority over its competitors in terms of grouping accuracy.

preprint2022arXiv

Unconventional Excitonic States with Phonon Sidebands in Layered Silicon Diphosphide

Many-body interactions between quasiparticles (electrons, excitons, and phonons) have led to the emergence of new complex correlated states and are at the core of condensed matter physics and material science. In low-dimensional materials, unique electronic properties for these correlated states could significantly affect their optical properties. Herein, combining photoluminescence, optical reflection measurements and theoretical calculations, we demonstrate an unconventional excitonic state and its bound phonon sideband in layered silicon diphosphide (SiP$_2$), in which the bound electron-hole pair is composed of electrons confined within one-dimensional phosphorus$-$phosphorus chains and holes extended in two-dimensional SiP$_2$ layers. The excitonic state and the emergent phonon sideband show linear dichroism and large energy redshifts with increasing temperature. Within the $GW$ plus Bethe$-$Salpeter equation calculations and solving the generalized Holstein model non-perturbatively, we confirm that the observed sideband feature results from the correlated interaction between excitons and optical phonons. Such a layered material provides a new platform to study excitonic physics and many-particle effects.

preprint2021arXiv

Feature refinement: An expression-specific feature learning and fusion method for micro-expression recognition

Micro-Expression Recognition has become challenging, as it is extremely difficult to extract the subtle facial changes of micro-expressions. Recently, several approaches proposed several expression-shared features algorithms for micro-expression recognition. However, they do not reveal the specific discriminative characteristics, which lead to sub-optimal performance. This paper proposes a novel Feature Refinement ({FR}) with expression-specific feature learning and fusion for micro-expression recognition. It aims to obtain salient and discriminative features for specific expressions and also predict expression by fusing the expression-specific features. FR consists of an expression proposal module with attention mechanism and a classification branch. First, an inception module is designed based on optical flow to obtain expression-shared features. Second, in order to extract salient and discriminative features for specific expression, expression-shared features are fed into an expression proposal module with attention factors and proposal loss. Last, in the classification branch, labels of categories are predicted by a fusion of the expression-specific features. Experiments on three publicly available databases validate the effectiveness of FR under different protocol. Results on public benchmarks demonstrate that our FR provides salient and discriminative information for micro-expression recognition. The results also show our FR achieves better or competitive performance with the existing state-of-the-art methods on micro-expression recognition.

preprint2020arXiv

Distributed Simultaneous Inference in Generalized Linear Models via Confidence Distribution

We propose a distributed method for simultaneous inference for datasets with sample size much larger than the number of covariates, i.e., N >> p, in the generalized linear models framework. When such datasets are too big to be analyzed entirely by a single centralized computer, or when datasets are already stored in distributed database systems, the strategy of divide-and-combine has been the method of choice for scalability. Due to partition, the sub-dataset sample sizes may be uneven and some possibly close to p, which calls for regularization techniques to improve numerical stability. However, there is a lack of clear theoretical justification and practical guidelines to combine results obtained from separate regularized estimators, especially when the final objective is simultaneous inference for a group of regression parameters. In this paper, we develop a strategy to combine bias-corrected lasso-type estimates by using confidence distributions. We show that the resulting combined estimator achieves the same estimation efficiency as that of the maximum likelihood estimator using the centralized data. As demonstrated by simulated and real data examples, our divide-and-combine method yields nearly identical inference as the centralized benchmark.

preprint2020arXiv

Reducing Parameter Space for Neural Network Training

For neural networks (NNs) with rectified linear unit (ReLU) or binary activation functions, we show that their training can be accomplished in a reduced parameter space. Specifically, the weights in each neuron can be trained on the unit sphere, as opposed to the entire space, and the threshold can be trained in a bounded interval, as opposed to the real line. We show that the NNs in the reduced parameter space are mathematically equivalent to the standard NNs with parameters in the whole space. The reduced parameter space shall facilitate the optimization procedure for the network training, as the search space becomes (much) smaller. We demonstrate the improved training performance using numerical examples.

preprint2020arXiv

Remote weak signal measurement via bound states in optomechanical system

A scheme for remote weak signal sensor is proposed in which a coupled resonator optical waveguide~(CROW), as a transmitter, couples to a hybrid optomechanical cavity and an observing cavity, respectively. The non-Markovian theory is employed to study the weak force sensor by treating the CROW as a non-Markovian reservoir of the cavity fields, and the negative-effective-mass~(NEM) oscillator is introduced to cancel the back-action noise. Under certain conditions, dissipationless bound states can be formed such that weak signal can be transferred in the CROW without dissipation. Our results show that ultrahigh sensitivity can be achieved with the assistance of the bound states under certain parameters regime.

preprint2020arXiv

Simultaneous blockade of a photon phonon, and magnon induced by a two-level atom

The hybrid microwave optomechanical-magnetic system has recently emerged as a promising candidate for coherent information processing because of the ultrastrong microwave photon-magnon coupling and the longlife of the magnon and phonon. As a quantum information processing device, the realization of a single excitation holds special meaning for the hybrid system. In this paper, we introduce a single two-level atom into the optomechanical-magnetic system and show that an unconventional blockade due to destructive interference cannot offer a blockade of both the photon and magnon. Meanwhile, under the condition of single excitation resonance, the blockade of the photon, phonon, and magnon can be achieved simultaneously even in a weak optomechanical region, but the phonon blockade still requires the cryogenic temperature condition.