Source author record

Xinyue Zhang

Xinyue Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Artificial Intelligence, Machine Learning, Computation and Language, cond-mat.mtrl-sci, Cryptography and Security, eess.IV, Emerging Technologies, physics.atom-ph, physics.optics

Catalog footprint

What is connected

8 works

10 topics

4 close collaborators

preprint, 2026, arXiv

Can LLMs See Without Pixels? Benchmarking Spatial Intelligence from Textual Descriptions

Recent advancements in Spatial Intelligence (SI) have predominantly relied on Vision-Language Models (VLMs), yet a critical question remains: does spatial understanding originate from visual encoders or the fundamental reasoning backbone? Inspired by this question, we introduce SiT-Bench, a novel benchmark designed to evaluate the SI performance of Large Language Models (LLMs) without pixel-level input. It comprises over 3,800 expert-annotated items across five primary categories and 17 subtasks, ranging from egocentric navigation and perspective transformation to fine-grained robotic manipulation. By converting single/multi-view scenes into high-fidelity, coordinate-aware textual descriptions, we challenge LLMs to perform symbolic textual reasoning rather than visual pattern matching. Evaluation results of state-of-the-art (SOTA) LLMs reveal that while models achieve proficiency in localized semantic tasks, a significant "spatial gap" remains in global consistency. Notably, we find that explicit spatial reasoning significantly boosts performance, suggesting that LLMs possess latent world-modeling potential. Our proposed dataset SiT-Bench serves as a foundational resource to foster the development of spatially-grounded LLM backbones for future VLMs and embodied agents. Our code and benchmark will be released at https://github.com/binisalegend/SiT-Bench.

preprint, 2026, arXiv

Physics-Informed Deep Recurrent Back-Projection Network for Tunnel Propagation Modeling

Accurate and efficient modeling of radio wave propagation in railway tunnels is critical for ensuring reliable communication-based train control (CBTC) systems. Fine-grid parabolic wave equation (PWE) solvers provide high-fidelity field predictions but are computationally expensive for large-scale tunnels, whereas coarse-grid models lose essential modal and geometric details. To address this challenge, we propose a physics-informed recurrent back-projection propagation network (PRBPN) that reconstructs fine-resolution received-signal-strength (RSS) fields from coarse PWE slices. The network integrates multi-slice temporal fusion with an iterative projection/back-projection mechanism that enforces physical consistency and avoids any pre-upsampling stage, resulting in strong data efficiency and improved generalization. Simulations across four tunnel cross-section geometries and four frequencies show that the proposed PRBPN closely tracks fine-mesh PWE references. Engineering-level validation on the Massif Central tunnel in France further confirms robustness in data-scarce scenarios, with the network trained on only a few coarse/fine RSS pairs. These results indicate that the proposed PRBPN can substantially reduce reliance on computationally intensive fine-grid solvers while maintaining high-fidelity tunnel propagation predictions.

preprint, 2025, arXiv

OmniBench: Towards The Future of Universal Omni-Language Models

Recent advancements in multimodal large language models (MLLMs) have aimed to integrate and interpret data across diverse modalities. However, the capacity of these models to concurrently process and reason about multiple modalities remains underexplored, partly due to the lack of comprehensive modality-wise benchmarks. We introduce OmniBench, a novel benchmark designed to rigorously evaluate models' ability to recognize, interpret, and reason across visual, acoustic, and textual inputs simultaneously. We define language models capable of such tri-modal processing as omni-language models (OLMs). OmniBench is distinguished by high-quality human annotations, ensuring that accurate responses require integrated understanding and reasoning across all three modalities. Our main findings reveal that: i) open-source OLMs exhibit critical limitations in instruction-following and reasoning capabilities within tri-modal contexts; and ii) most baseline models perform poorly (below 50% accuracy) even when provided with alternative textual representations of images and/or audio. These results suggest that the ability to construct a consistent context from text, image, and audio is often overlooked in existing MLLM training paradigms. To address this gap, we curate an instruction tuning dataset of 84.5K training samples, OmniInstruct, for training OLMs to adapt to tri-modal contexts. We advocate for future research to focus on developing more robust tri-modal integration techniques and training strategies to enhance OLMs. Code, data and a live leaderboard can be found at https://m-a-p.ai/OmniBench.

preprint, 2022, arXiv

CheXplaining in Style: Counterfactual Explanations for Chest X-rays using StyleGAN

Deep learning models used in medical image analysis are prone to raising reliability concerns due to their black-box nature. To shed light on these black-box models, previous works predominantly focus on identifying the contribution of input features to the diagnosis, i.e., feature attribution. In this work, we explore counterfactual explanations to identify what patterns the models rely on for diagnosis. Specifically, we investigate the effect of changing features within chest X-rays on the classifier's output to understand its decision mechanism. We leverage a StyleGAN-based approach (StyleEx) to create counterfactual explanations for chest X-rays by manipulating specific latent directions in their latent space. In addition, we propose EigenFind to significantly reduce the computation time of generated explanations. We clinically evaluate the relevancy of our counterfactual explanations with the help of radiologists. Our code is publicly available.

preprint, 2020, arXiv

Differentially Private and Fair Classification via Calibrated Functional Mechanism

Machine learning is increasingly becoming a powerful tool to make decisions in a wide variety of applications, such as medical diagnosis and autonomous driving. Privacy concerns related to the training data and unfair behaviors of some decisions with regard to certain attributes (e.g., sex, race) are becoming more critical. Thus, constructing a fair machine learning model while simultaneously providing privacy protection becomes a challenging problem. In this paper, we focus on the design of a classification model with fairness and differential privacy guarantees by combining the functional mechanism and decision boundary fairness. In order to enforce $\epsilon$-differential privacy and fairness, we leverage the functional mechanism to add different amounts of Laplace noise, depending on the attribute, to the polynomial coefficients of the objective function, taking the fairness constraint into consideration. We further propose a utility-enhancement scheme, called the relaxed functional mechanism, which adds Gaussian noise instead of Laplace noise and hence achieves $(\epsilon,\delta)$-differential privacy. Based on the relaxed functional mechanism, we can design an $(\epsilon,\delta)$-differentially private and fair classification model. Moreover, our theoretical analysis and empirical results demonstrate that our two approaches achieve both fairness and differential privacy while preserving good utility, and that they outperform the state-of-the-art algorithms.

preprint, 2020, arXiv

Optical anapole mode in nanostructured lithium niobate for enhancing second harmonic generation

Second harmonic generation (SHG) with a material of large transparency is an attractive way of generating coherent light sources in exotic wavelength ranges such as VUV, UV and visible light. It is of critical importance to improve nonlinear conversion efficiency in order to find practical applications in quantum light sources, high-resolution nonlinear microscopy, etc. Here, enhanced SHG with conversion efficiency up to the order of 0.01% at an SH wavelength of 282 nm under 11 GW/cm$^{2}$ pump power is investigated, achieved via the excitation of an anapole in a lithium niobate (LiNbO$_{3}$, or LN) nanodisk through the dominating $d_{33}$ nonlinear coefficient. The anapole has the advantages of strongly suppressed far-field scattering and a well-confined internal field, which help to boost the nonlinear conversion. Anapole excitation in the LN nanodisk is facilitated by the high index contrast between LN and a substrate with near-zero-index properties, obtained via a hyperbolic metamaterial structure design. By tailoring the multilayer structure of the hyperbolic metamaterial, the anapole excitation can be tuned to different wavelengths. This indicates that enhanced SHG can be achieved over a wide range of pump wavelengths via different designs of the epsilon-near-zero (ENZ) hyperbolic metamaterial substrate. The nanostructure proposed in this work might be significant for enhanced light-matter interactions at the nanoscale, such as in integrated optics.

preprint, 2019, arXiv

Stable halogen 2D Materials: the case of iodine and astatine

Two-dimensional (2D) materials have wide applications in electronic devices, energy storage, catalysis, etc. So far, most of the pure-element 2D materials are composed of group IIIA, IVA, and VA elements. Beyond this scope, the orbital hybridization configuration becomes a key factor influencing the stability of 2D structures. Here we show that sp2d3 hybridization of the outermost electrons, in the O shell for iodine and the P shell for astatine, builds up a triangular configuration (beta-type) that forms the 2D structures beta-iodiene and beta-astatiene. Each atom is connected by pi bonds and surrounded by 6 atoms. The pi bonds become possible, and the band gap approaches zero, because of the interaction of the unpaired single electron on each atom, depending on the reduction in bond length. By applying compressive strain or spin-orbit coupling (SOC), Dirac points or topologically nontrivial points can be obtained in beta-iodiene and beta-astatiene. Our discovery paves a new way to construct 2D materials.

preprint, 2013, arXiv

Production of very-high-$n$ strontium Rydberg atoms

The production of very-high-$n$, $n\sim300$-500, strontium Rydberg atoms is explored using a crossed laser-atom beam geometry. $n$$^{1}$S$_{0}$ and $n$$^{1}$D$_{2}$ states are created by two-photon excitation via the 5s5p $^{1}$P$_{1}$ intermediate state using radiation with wavelengths of $\sim$461 and $\sim$413 nm. Rydberg atom densities as high as $\sim 3 \times 10^{5}$ cm$^{-3}$ have been achieved, sufficient that Rydberg-Rydberg interactions can become important. The isotope shifts in the Rydberg series limits are determined by tuning the 461 nm light to preferentially excite the different strontium isotopes. Photoexcitation in the presence of an applied electric field is examined. The initially quadratic Stark shift of the $n$$^{1}$P$_{1}$ and $n$$^{1}$D$_{2}$ states becomes near-linear at higher fields and the possible use of $n$$^{1}$D$_{2}$ states to create strongly-polarized, quasi-one-dimensional electronic states in strontium is discussed. The data are analyzed with the aid of a two-active-electron (TAE) approximation. The two-electron Hamiltonian, within which the Sr$^{2+}$ core is represented by a semi-empirical potential, is numerically diagonalized allowing calculation of the energies of high-$n$ Rydberg states and their photoexcitation probabilities.