Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
22 works
0 followers
20 topics
4 close collaborators

Published work

22 published items

preprint 2026 arXiv

CUDAHercules: Benchmarking Hardware-Aware Expert-level CUDA Optimization for LLMs

Large language models show promise for automated CUDA programming; however, even the strongest coding models (e.g., Claude-Opus-4.6) may still fall short of expert-level, architecture-aware optimization. We introduce CUDAHercules, a benchmark that evaluates generated CUDA against end-to-end human-expert SOTA systems. It spans single kernels, module-level operators, full applications, and unsolved challenge tasks across Ampere, Hopper, and Blackwell GPUs, with end-to-end tasks gated by domain-specific semantic validators. Evaluating models such as Claude-Opus-4.6 and GPT-5.4 shows a large gap between runnable CUDA and expert CUDA engineering: models often compile and pass tests, but rarely recover the optimization strategies needed to match expert performance. Application semantics further reduce success, and iterative or tool-augmented feedback can improve correctness while drifting toward slow fallback implementations. These results show that automated CUDA programming remains far from fully solved and requires stronger hardware reasoning, better tool use, and training objectives that connect code understanding to hardware architecture-grounded intelligence.

preprint 2026 arXiv

Exploring Recommender System Evaluation: A Multi-Modal User Agent Framework for A/B Testing

In recommender systems, online A/B testing is a crucial method for evaluating the performance of different models. However, conducting online A/B testing often presents significant challenges, including substantial economic costs, user experience degradation, and considerable time requirements. Given the strong capabilities of large language models (LLMs), LLM-based agents show great potential to replace traditional online A/B testing. Nonetheless, current agents fail to simulate the perception process and interaction patterns, owing to the lack of real environments and visual perception capability. To address these challenges, we introduce a multi-modal user agent for A/B testing (A/B Agent). Specifically, we construct a recommendation sandbox environment for A/B testing, enabling multimodal and multi-page interactions that align with real user behavior on online platforms. The designed agent leverages multimodal information perception and fine-grained user preferences, and integrates profiles, action memory retrieval, and a fatigue system to simulate complex human decision-making. We validated the potential of the agent as an alternative to traditional A/B testing from three perspectives: model, data, and features. Furthermore, we found that the data generated by A/B Agent can effectively enhance the capabilities of recommendation models. Our code is publicly available at https://github.com/Applied-Machine-Learning-Lab/ABAgent.

preprint 2026 arXiv

Jailbreaking Large Language Models through Iterative Tool-Disguised Attacks via Reinforcement Learning

Large language models (LLMs) have demonstrated remarkable capabilities across diverse applications; however, they remain critically vulnerable to jailbreak attacks that elicit harmful responses violating human values and safety guidelines. Despite extensive research on defense mechanisms, existing safeguards prove insufficient against sophisticated adversarial strategies. In this work, we propose iMIST (interactive Multi-step Progressive Tool-disguised Jailbreak Attack), a novel adaptive jailbreak method that synergistically exploits vulnerabilities in current defense mechanisms. iMIST disguises malicious queries as normal tool invocations to bypass content filters, while simultaneously introducing an interactive progressive optimization algorithm that dynamically escalates response harmfulness through multi-turn dialogues guided by real-time harmfulness assessment. Our experiments on widely-used models demonstrate that iMIST achieves higher attack effectiveness, while maintaining low rejection rates. These results reveal critical vulnerabilities in current LLM safety mechanisms and underscore the urgent need for more robust defense strategies.

preprint 2026 arXiv

JWST Insights into Narrow-line Little Red Dots

James Webb Space Telescope (JWST) has revealed a population of red and compact objects with a unique V-shape SED at z >= 4 known as Little Red Dots (LRDs). Most of the LRDs with existing spectral observations exhibit broad Balmer lines and are thus likely to host active galactic nuclei (AGNs). Here we present a study of LRDs with no broad H-alpha component. Our sample consists of five LRDs at z~5 with H-alpha line widths of about 250 km/s. They are selected from 32 LRDs that have NIRSpec high- or medium-resolution grating spectra covering H-alpha. During our construction of the sample, we find that approximately 20 percent of the LRD candidates previously selected do not show red continuum emission but resemble the V-shape spectra due to strong line emission. Compared to normal star-forming galaxies, narrow-line LRDs tend to have relatively higher H-alpha line widths and luminosities. If these LRDs are dominated by galaxies, our SED modeling suggests that they are dusty, compact star-forming galaxies with high stellar masses and star formation rates (SFRs). Alternatively, if their SEDs are produced by AGNs, the inferred central black hole masses (MBH) are in the range of 10^5 to 10^6 solar masses, placing them at the low-mass end of the AGN population. They may represent an early stage of super-Eddington growth, where the black holes have yet to accumulate significant masses. With large measurement uncertainties, these black holes appear slightly overmassive relative to the local MBH-Mstar relation, but consistent or undermassive with respect to the MBH-sigma and MBH-Mdyn relations. We further find that nearly half of the high-redshift broad-line AGNs exhibit V-shape SEDs. (abridged)

preprint 2026 arXiv

Noise Reduction for Pufferfish Privacy: A Practical Noise Calibration Method

This paper introduces a relaxed noise calibration method to enhance data utility while attaining pufferfish privacy. This work builds on the existing $1$-Wasserstein (Kantorovich) mechanism by alleviating the existing overly strict condition that leads to excessive noise, and proposes a practical mechanism design algorithm as a general solution. We prove that a strict noise reduction by our approach always exists compared to $1$-Wasserstein mechanism for all privacy budgets $ε$ and prior beliefs, and the noise reduction (also represents improvement on data utility) gains increase significantly for low privacy budget situations--which are commonly seen in real-world deployments. We also analyze the variation and optimality of the noise reduction with different prior distributions. Moreover, all the properties of the noise reduction still exist in the worst-case $1$-Wasserstein mechanism we introduced, when the additive noise is largest. We further show that the worst-case $1$-Wasserstein mechanism is equivalent to the $\ell_1$-sensitivity method. Experimental results on three real-world datasets demonstrate $47\%$ to $87\%$ improvement in data utility.

preprint 2026 arXiv

QCalEval: Benchmarking Vision-Language Models for Quantum Calibration Plot Understanding

Quantum computing calibration depends on interpreting experimental data, and calibration plots provide the most universal human-readable representation for this task, yet no systematic evaluation exists of how well vision-language models (VLMs) interpret them. We introduce QCalEval, the first VLM benchmark for quantum calibration plots: 243 samples across 87 scenario types from 22 experiment families, spanning superconducting qubits and neutral atoms, evaluated on six question types in both zero-shot and in-context learning settings. The best general-purpose zero-shot model reaches a mean score of 72.3, and many open-weight models degrade under multi-image in-context learning, whereas frontier closed models improve substantially. A supervised fine-tuning ablation at the 9-billion-parameter scale shows that SFT improves zero-shot performance but cannot close the multimodal in-context learning gap. As a reference case study, we release NVIDIA Ising Calibration 1, an open-weight model based on Qwen3.5-35B-A3B that reaches 74.7 zero-shot average score.

preprint 2026 arXiv

Reading the Cell, Designing the Cure: Perturbation-Conditioned Molecular Diffusion for Function-Oriented Drug Design

When reliable target structures are unavailable at scale or phenotypes arise from dysregulated pathways, transcriptomic perturbations provide a system-level functional readout for drug action. In this work, we formalize Transcriptome-based Drug Design (TBDD) as a generative inverse problem: designing drug molecules conditioned on desired transcriptomic state transitions. We analyze the inherently ill-posed nature of this task, which is further complicated by the profound domain gap between biology and chemistry and by the sparsity of transcriptomic signals. To address these challenges, we propose our model (A CellUlar Response Engine), a multi-resolution transcriptome-guided diffusion framework. The model features a specialized Transcriptome Perturbation Functional Feature Extractor (TFE) that (1) distills function-oriented perturbation embeddings from pre/post states, (2) aligns these signatures to dual chemical views to bridge the cross-modal gap, and (3) performs heterogeneity-aware aggregation to extract robust state-specific signals from noisy transcriptomic data. Extensive evaluations on both standard benchmarks and rigorous out-of-distribution protocols demonstrate that the model consistently outperforms strong baselines in structural quality and functional consistency. Furthermore, we validate its practical utility via a zero-shot gene-inhibitor design task, highlighting the potential of phenotype-driven generative discovery.

preprint 2024 arXiv

Nonlinear vibration of a dipteran flight robot system with rotational geometric nonlinearity

The dipteran flight mechanism of insects is commonly used to design nonlinear flight robot systems. However, the dynamic response of the click mechanism of a nonlinear robot system with multiple stability is still unclear. In this paper, a novel dipteran robot model with a click mechanism is proposed based on the multiple stability of snap-through buckling. The equation of motion of the nonlinear flight robot system is obtained by using the Euler-Lagrange equation. The nonlinear potential energy, the elastic force, the equilibrium bifurcation, and the equilibrium stability are investigated to show the multiple stability characteristics. The transient sets of bifurcation and persistent set of regions in the system parameter plane and the corresponding phase portraits are obtained with multiple stability of single and double well behaviors. Then, the periodic free vibration responses are defined by the analytical solution of three kinds of elliptical functions, and the amplitude-frequency responses are investigated by numerical integration. Based on the topological equivalent method, the chaotic thresholds of the homoclinic orbits for the chaotic vibration of the harmonically forced robot system are derived to show the chaotic parametric condition. Finally, a prototype of the nonlinear flapping robot is manufactured and the experimental system is set up. Tests of the nonlinear static moment-of-force curves, periodic response and dynamic flight vibration of the dipteran robot system are carried out. It is shown that the test results agree well with the theoretical analysis and numerical simulation. These results have potential application in the structural design of efficient flight robots.

preprint 2023 arXiv

Active RIS vs. Passive RIS: Which Will Prevail in 6G?

As a revolutionary paradigm for controlling wireless channels, reconfigurable intelligent surfaces (RISs) have emerged as a candidate technology for future 6G networks. However, due to the "multiplicative fading" effect, the existing passive RISs only achieve limited capacity gains in many scenarios with strong direct links. In this paper, the concept of active RISs is proposed to overcome this fundamental limitation. Unlike passive RISs that reflect signals without amplification, active RISs can amplify the reflected signals via amplifiers integrated into their elements. To characterize the signal amplification and incorporate the noise introduced by the active components, we develop and verify the signal model of active RISs through the experimental measurements based on a fabricated active RIS element. Based on the verified signal model, we further analyze the asymptotic performance of active RISs to reveal the substantial capacity gain they provide for wireless communications. Finally, we formulate the sum-rate maximization problem for an active RIS aided multi-user multiple-input single-output (MU-MISO) system and a joint transmit beamforming and reflect precoding scheme is proposed to solve this problem. Simulation results show that, in a typical wireless system, passive RISs can realize only a limited sum-rate gain of 22%, while active RISs can achieve a significant sum-rate gain of 130%, thus overcoming the "multiplicative fading" effect.

preprint 2023 arXiv

Nonlinear energy harvesting system with multiple stability

Nonlinear energy harvesting systems based on forced vibration with electromechanical coupling are widely used to capture ambient vibration energy and convert mechanical energy into electrical energy. However, the nonlinear response mechanism of the friction induced vibration (FIV) energy harvesting system with multiple stability and stick-slip motion is still unclear. In the current paper, a novel nonlinear energy harvesting model with multiple stability of single-, double- and triple-well potential is proposed based on a V-shaped spring structure and a belt conveying system. The dynamic equations for the energy harvesting system with multiple stability and self-excited friction are established by using the Euler-Lagrange equations. Next, the nonlinear restoring force, friction force, and potential energy surfaces for the static characteristics of the energy harvesting system are obtained to show the nonlinear varying stiffness, multiple equilibrium points, discontinuous behaviors and multiple well response. Then, the equilibrium surface of bifurcation sets of the autonomous system is given to show the third-order quasi zero stiffness (QZS3), fifth-order quasi zero stiffness (QZS5), double well (DW) and triple well (TW). Furthermore, the response amplitudes of charge, current, voltage and power of the forced electromechanical coupled vibration system for QZS3, QZS5, DW and TW are analyzed by numerical solution. Finally, a prototype of the FIV energy harvesting system is manufactured and the experimental system is set up. The experimental results for the static restoring force, damping force and electrical output agree well with the numerical results, which validates the proposed FIV energy harvesting model.

preprint 2022 arXiv

Active RISs: Signal Modeling, Asymptotic Analysis, and Beamforming Design

Reconfigurable intelligent surfaces (RISs) have emerged as a candidate technology for future 6G networks. However, due to the "multiplicative fading" effect, the existing passive RISs only achieve a negligible capacity gain in environments with strong direct links. In this paper, the concept of active RISs is studied to overcome this fundamental limitation. Unlike the existing passive RISs that reflect signals without amplification, active RISs can amplify the reflected signals via amplifiers integrated into their elements. To characterize the signal amplification and incorporate the noise introduced by the active components, we verify the signal model of active RISs through the experimental measurements on a fabricated active RIS element. Based on the verified signal model, we formulate the sum-rate maximization problem for an active RIS aided multi-user multiple-input single-output (MU-MISO) system and a joint transmit precoding and reflect beamforming algorithm is proposed to solve this problem. Simulation results show that, in a typical wireless system, the existing passive RISs can realize only a negligible sum-rate gain of 3%, while the active RISs can achieve a significant sum-rate gain of 62%, thus overcoming the "multiplicative fading" effect. Finally, we develop a 64-element active RIS aided wireless communication prototype, and the significant gain of active RISs is validated by field test.

preprint 2022 arXiv

Distance-Aware Precoding for Near-Field Capacity Improvement

Extremely large-scale MIMO (XL-MIMO) is a promising technology to improve the capacity for future 6G networks. With a very large number of antennas, the near-field property of XL-MIMO systems becomes dominant. Unlike the classical far-field line-of-sight (LoS) channel with only one available data stream, the significantly increased degrees of freedom (DoFs) are available in the near-field LoS channel. However, limited by the small number of radio frequency (RF) chains, the existing hybrid precoding architecture widely used for 5G is not able to fully exploit the extra DoFs in the near-field region. In this paper, the available DoFs and the capacity of the near-field LoS channel are theoretically analyzed at first. Then, to exploit the near-field effect as a new possibility for capacity improvement, the distance-aware precoding (DAP) scheme is proposed. We develop the DAP architecture, where a dedicated selection circuit is inserted to connect phase shifters and RF chains. Moreover, each RF chain can be flexibly configured to active or inactive according to the distance-related DoFs in the proposed DAP architecture. Based on the developed DAP architecture, a DAP algorithm is proposed to optimize the number of activated RF chains and precoding matrices to match the increased DoFs in the near-field region. Finally, simulation results verify that, the proposed DAP scheme can efficiently utilize the extra DoFs in the near-field region to improve the spectrum efficiency and the energy efficiency as well.

preprint 2022 arXiv

ICAF: Iterative Contrastive Alignment Framework for Multimodal Abstractive Summarization

Integrating multimodal knowledge for the abstractive summarization task is a work-in-progress research area, with present techniques inheriting the fusion-then-generation paradigm. Due to semantic gaps between computer vision and natural language processing, current methods often treat multiple data points as separate objects and rely on attention mechanisms to search for connections in order to fuse them together. In addition, the lack of awareness of cross-modal matching in many frameworks leads to performance reduction. To address these two drawbacks, we propose an Iterative Contrastive Alignment Framework (ICAF) that uses recurrent alignment and contrast to capture the coherence between images and texts. Specifically, we design a recurrent alignment (RA) layer to gradually investigate fine-grained semantic relationships between image patches and text tokens. At each step during the encoding process, cross-modal contrastive losses are applied to directly optimize the embedding space. According to ROUGE, relevance scores, and human evaluation, our model outperforms the state-of-the-art baselines on the MSMO dataset. Experiments on the applicability of our proposed framework and on hyperparameter settings have also been conducted.

preprint 2022 arXiv

Pattern-Division Multiplexing for Continuous-Aperture MIMO

In recent years, continuous-aperture multiple-input multiple-output (CAP-MIMO) is reinvestigated to achieve improved communication performance with limited antenna apertures. Unlike the classical MIMO composed of discrete antennas, CAP-MIMO has a continuous antenna surface, which is expected to generate any current distribution (i.e., pattern) and induce controllable spatial electromagnetic waves. In this way, the information can be modulated on the electromagnetic waves, which makes it promising to approach the ultimate capacity of finite apertures. The pattern design for CAP-MIMO is the key factor to determine the communication performance, but it has not been well studied in the literature. In this paper, we propose the pattern-division multiplexing to design the patterns for CAP-MIMO. Specifically, we first derive the system model of a typical multi-user CAP-MIMO system, which allows us to formulate the sum-rate maximization problem. Then, we propose a general pattern-division multiplexing technique to transform the design of continuous pattern functions to the design of their projection lengths on finite orthogonal bases. Based on this technique, we further propose a pattern design scheme to solve the formulated sum-rate maximization problem. Simulation results show that, the sum-rate achieved by the proposed scheme is about 260% higher than that achieved by the benchmark scheme.

preprint 2022 arXiv

SparCAssist: A Model Risk Assessment Assistant Based on Sparse Generated Counterfactuals

We introduce SparCAssist, a general-purpose risk assessment tool for machine learning models trained for language tasks. It evaluates models' risk by inspecting their behavior on counterfactuals, namely out-of-distribution instances generated based on the given data instance. The counterfactuals are generated by replacing tokens in rationale subsequences identified by ExPred, while the replacements are retrieved using HotFlip or Masked-Language-Model-based algorithms. The main purpose of our system is to help human annotators assess the model's risk on deployment. The counterfactual instances generated during the assessment are a by-product and can be used to train more robust NLP models in the future.

preprint 2022 arXiv

Towards Explainability in NLP: Analyzing and Calculating Word Saliency through Word Properties

The wide use of black-box models in natural language processing brings great challenges to the understanding of the decision basis, the trustworthiness of the prediction results, and the improvement of the model performance. The words in text samples have properties that reflect their semantics and contextual information, such as the part of speech, the position, etc. These properties may have certain relationships with the word saliency, which is of great help for studying the explainability of the model predictions. In this paper, we explore the relationships between the word saliency and the word properties. According to the analysis results, we further establish a mapping model, Seq2Saliency, from the words in a text sample and their properties to the saliency values based on the idea of sequence tagging. In addition, we establish a new dataset called PrSalM, which contains each word in the text samples, the word properties, and the word saliency values. The experimental evaluations are conducted to analyze the saliency of words with different properties. The effectiveness of the Seq2Saliency model is verified.

preprint 2021 arXiv

Dissonance Between Human and Machine Understanding

Complex machine learning models are deployed in several critical domains including healthcare and autonomous vehicles nowadays, albeit as functional black boxes. Consequently, there has been a recent surge in interpreting decisions of such complex models in order to explain their actions to humans. Models that correspond to human interpretation of a task are more desirable in certain contexts and can help attribute liability, build trust, expose biases and in turn build better models. It is, therefore, crucial to understand how and which models conform to human understanding of tasks. In this paper, we present a large-scale crowdsourcing study that reveals and quantifies the dissonance between human and machine understanding, through the lens of an image classification task. In particular, we seek to answer the following questions: Which (well-performing) complex ML models are closer to humans in their use of features to make accurate predictions? How does task difficulty affect the feature selection capability of machines in comparison to humans? Are humans consistently better at selecting features that make image recognition more accurate? Our findings have important implications on human-machine collaboration, considering that a long term goal in the field of artificial intelligence is to make machines capable of learning and reasoning like humans.

preprint 2021 arXiv

Explain and Predict, and then Predict Again

A desirable property of learning systems is to be both effective and interpretable. Towards this goal, recent models have been proposed that first generate an extractive explanation from the input text and then generate a prediction on just the explanation called explain-then-predict models. These models primarily consider the task input as a supervision signal in learning an extractive explanation and do not effectively integrate rationales data as an additional inductive bias to improve task performance. We propose a novel yet simple approach ExPred, that uses multi-task learning in the explanation generation phase effectively trading-off explanation and prediction losses. And then we use another prediction network on just the extracted explanations for optimizing the task performance. We conduct an extensive evaluation of our approach on three diverse language datasets -- fact verification, sentiment classification, and QA -- and find that we substantially outperform existing approaches.

preprint 2020 arXiv

Capacity Improvement in Wideband Reconfigurable Intelligent Surface-Aided Cell-Free Network

Thanks to its strong ability to resist inter-cell interference, the cell-free network has been considered a promising technique to improve the network capacity of future wireless systems. However, further capacity enhancement requires deploying more base stations (BSs) with high cost and power consumption. To address this issue, inspired by the recently proposed technique called reconfigurable intelligent surface (RIS), we propose the concept of the RIS-aided cell-free network to improve the network capacity with low cost and power consumption. Then, for the proposed RIS-aided cell-free network in the typical wideband scenario, we formulate the joint precoding design problem at the BSs and RISs to maximize the network capacity. Due to the non-convexity and high complexity of the formulated problem, we develop an alternating optimization algorithm to solve this challenging problem. Note that most of the scenarios considered in existing works are special cases of the general scenario in this paper, and the proposed joint precoding framework can also serve as a general solution to maximize the capacity in most existing RIS-aided scenarios. Finally, simulation results verify that, compared with the conventional cell-free network, the network capacity of the proposed scheme can be improved significantly.

preprint 2020 arXiv

Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks

Recently, increasing attention has been drawn to the internal mechanisms of convolutional neural networks and the reasons why a network makes specific decisions. In this paper, we develop a novel post-hoc visual explanation method called Score-CAM based on class activation mapping. Unlike previous class activation mapping based approaches, Score-CAM removes the dependence on gradients by obtaining the weight of each activation map through its forward-pass score on the target class; the final result is obtained by a linear combination of weights and activation maps. We demonstrate that Score-CAM achieves better visual performance and fairness for interpreting the decision-making process. Our approach outperforms previous methods on both recognition and localization tasks, and it also passes the sanity check. We also indicate its application as a debugging tool. Official code has been released.
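
The weighting idea in this abstract can be illustrated with a minimal NumPy sketch. It is not the released implementation: the function name `score_cam`, the toy scoring function, the assumption that activation maps are already at input resolution, the softmax normalization of the scores, and the final ReLU are all choices made here for illustration only.

```python
import numpy as np

def score_cam(activation_maps, image, score_fn, target_class):
    """Gradient-free saliency sketch (hypothetical helper, not the official code).

    activation_maps: array (K, H, W), assumed already resized to the image size.
    image:           array (H, W).
    score_fn:        callable mapping an image to a 1-D array of class scores.
    """
    scores = []
    for amap in activation_maps:
        span = amap.max() - amap.min()
        # Normalize each map to [0, 1]; a constant map becomes an all-zero mask.
        mask = (amap - amap.min()) / span if span > 0 else np.zeros_like(amap)
        # Forward pass on the masked input; keep the target-class score.
        scores.append(score_fn(image * mask)[target_class])
    scores = np.asarray(scores, dtype=float)
    # Assumption: turn the scores into weights with a numerically stable softmax.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Linear combination of weights and activation maps, then ReLU (assumption).
    cam = np.tensordot(weights, activation_maps, axes=1)
    return np.maximum(cam, 0.0), weights

# Toy setup: class 0 responds to the left half of the image, class 1 to the right.
def toy_score_fn(img):
    return np.array([img[:, :2].sum(), img[:, 2:].sum()])

image = np.ones((4, 4))
left = np.zeros((4, 4))
left[:, :2] = 1.0
right = np.zeros((4, 4))
right[:, 2:] = 1.0
maps = np.stack([left, right])

cam, weights = score_cam(maps, image, toy_score_fn, target_class=0)
```

In this toy case the map covering the left half receives almost all of the weight for class 0, so the resulting saliency map highlights the left half without any gradient computation.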

preprint 2020 arXiv

The propagating mechanism of Chapman-Jouguet deflagration

The deflagration-to-detonation transition (DDT) process is of great importance to both combustion theory and industrial safety. In this study, the propagating mechanism of Chapman-Jouguet (C-J) deflagration is studied. Firstly, three models are put forth to decouple the C-J detonation front. These three models are (a) to introduce an expansion parameter into the one-dimensional energy equation, (b) to increase the activation energy of the chemical reaction model and (c) to decouple the shock wave from the flame front by an artificial method. The C-J deflagration is obtained after the C-J detonation is decoupled by one-dimensional numerical simulations with different models, chemical reaction kinetics and numerical schemes. Secondly, the propagating mechanism of C-J deflagration is discussed. For the C-J deflagration with a propagating velocity of about half the C-J detonation velocity, the static temperature behind the leading shock wave is too low to ignite the combustion. However, the total temperature of the flow induced by the leading shock wave is high enough to ignite the mixture. The induced flow is slowed down by the rarefaction waves from the wall and its static temperature increases. The flame and the leading shock wave propagate with almost the same velocity and the double-discontinuity structure of the flow field remains stable. The propagating velocity equals the sound speed of the combustion products, which is about half the C-J detonation velocity.

preprint 2018 arXiv

Pathological Evidence Exploration in Deep Retinal Image Diagnosis

Though deep learning has shown successful performance in classifying the label and severity stage of certain diseases, most such models give little evidence of how they make predictions. Here, we propose to exploit the interpretability of deep learning applications in medical diagnosis. Inspired by Koch's Postulates, a well-known strategy in medical research to identify the properties of a pathogen, we define a pathological descriptor that can be extracted from the activated neurons of a diabetic retinopathy detector. To visualize the symptoms and features encoded in this descriptor, we propose a GAN-based method to synthesize pathological retinal images given the descriptor and a binary vessel segmentation. Besides, with this descriptor, we can arbitrarily manipulate the position and quantity of lesions. As verified by a panel of 5 licensed ophthalmologists, our synthesized images carry the symptoms that are directly related to diabetic retinopathy diagnosis. The panel survey also shows that our generated images are both qualitatively and quantitatively superior to those of existing methods.