Source author record

Han Xu

Han Xu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

physics.atom-ph, Machine Learning, Computer Vision, physics.chem-ph, Cryptography and Security, physics.optics, cond-mat.mtrl-sci, cond-mat.soft, eess.SY, physics.app-ph, quant-ph, Systems and Control

Catalog footprint

What is connected

15 works

12 topics

4 close collaborators

preprint · 2026 · arXiv

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

We present GLM-5V-Turbo, a step toward native foundation models for multimodal agents. As foundation models are increasingly deployed in real environments, agentic capability depends not only on language reasoning, but also on the ability to perceive, interpret, and act over heterogeneous contexts such as images, videos, webpages, documents, and GUIs. GLM-5V-Turbo is built around this objective: multimodal perception is integrated as a core component of reasoning, planning, tool use, and execution, rather than as an auxiliary interface to a language model. This report summarizes the main improvements behind GLM-5V-Turbo across model design, multimodal training, reinforcement learning, toolchain expansion, and integration with agent frameworks. These developments lead to strong performance in multimodal coding, visual tool use, and framework-based agentic tasks, while preserving competitive text-only coding capability. More importantly, our development process offers practical insights for building multimodal agents, highlighting the central role of multimodal perception, hierarchical optimization, and reliable end-to-end verification.

preprint · 2026 · arXiv

Interpretable Probability Estimation with LLMs via Shapley Reconstruction

Large Language Models (LLMs) demonstrate potential to estimate the probability of uncertain events by leveraging their extensive knowledge and reasoning capabilities. This ability can be applied to support intelligent decision-making across diverse fields, such as financial forecasting and preventive healthcare. However, directly prompting LLMs for probability estimation faces significant challenges: their outputs are often noisy, and the underlying prediction process is opaque. In this paper, we propose PRISM: Probability Reconstruction via Shapley Measures, a framework that brings transparency and precision to LLM-based probability estimation. PRISM decomposes an LLM's prediction by quantifying the marginal contribution of each input factor using Shapley values. These factor-level contributions are then aggregated to reconstruct a calibrated final estimate. In our experiments, we demonstrate that PRISM improves predictive accuracy over direct prompting and other baselines across multiple domains, including finance, healthcare, and agriculture. Beyond performance, PRISM provides a transparent prediction pipeline: our case studies visualize how individual factors shape the final estimate, helping build trust in LLM-based decision support systems.
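
The Shapley decomposition mentioned in this abstract can be illustrated with a minimal sketch of the textbook Shapley value, computed exactly over all factor subsets. This is not PRISM's implementation: the abstract does not describe how the LLM is queried or how contributions are calibrated, so `toy_estimate`, the factor names, and the effect sizes below are hypothetical stand-ins for an LLM's probability estimate given a subset of input factors.

```python
from itertools import combinations
from math import factorial


def shapley_values(factors, value):
    """Exact Shapley value of each factor for a set function `value`.

    `value` maps a frozenset of factors to a number (here, a probability).
    Cost is exponential in len(factors), so this suits small factor sets only.
    """
    n = len(factors)
    phi = {}
    for f in factors:
        others = [g for g in factors if g != f]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain f.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


def toy_estimate(subset):
    """Hypothetical stand-in for an LLM probability estimate (assumption)."""
    effects = {"age": 0.15, "smoker": 0.25, "exercise": -0.05}
    p = 0.10 + sum(effects[f] for f in subset)
    if "age" in subset and "smoker" in subset:
        p += 0.10  # interaction term, shared between the two factors
    return p


factors = ["age", "smoker", "exercise"]
phi = shapley_values(factors, toy_estimate)

# Efficiency property: baseline plus all contributions equals the full estimate.
reconstructed = toy_estimate(frozenset()) + sum(phi.values())
```

In this toy setting the interaction term is split evenly between `age` and `smoker`, and the contributions sum back to the estimate obtained with every factor present. How PRISM turns such contributions into a calibrated estimate is not specified in the abstract.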

preprint · 2026 · arXiv

On Discrete Age of Information of Status Updating System With General Packet Arrival Processes

Characterizing Age of Information (AoI) in status updating systems with general arrival and service processes is of great significance, because the interarrival and service times of updates can be arbitrary in the real world. While expressions for the average continuous AoI under G/G/1/1 queues have been derived in the paper by Soysal and Ulukus, the discrete case remained unsolved. To address it, this paper gives a full characterization of the probability generating functions (PGFs) of discrete AoI under G/G/1/1 settings when preemption is allowed. In the non-preemptive case, this paper gives expressions for the PGF of discrete AoI under G/Geo/1/1 settings, which also extends the former results. The average discrete AoI is derived and discussed based on these new theoretical findings.
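
As a rough illustration of the quantities named in this abstract, the sketch below computes a discrete AoI sample path and its empirical probability generating function. The slot convention (age resets to the delivery slot minus the generation slot, and otherwise grows by one per slot), the function names, and the toy delivery schedule are all assumptions made for illustration. None of them come from the paper, and the paper's closed-form PGF results are not reproduced here.

```python
from collections import Counter


def age_sample_path(deliveries, horizon, initial_age=0):
    """Discrete AoI per slot under an assumed convention.

    `deliveries` maps a delivery slot to the generation slot of the update
    delivered in it. On delivery the age resets to (delivery - generation);
    in every other slot it grows by one.
    """
    ages = []
    age = initial_age
    for t in range(horizon):
        if t in deliveries:
            age = t - deliveries[t]
        ages.append(age)
        age += 1
    return ages


def empirical_pgf(ages, z):
    """Empirical PGF G(z) = sum_k P(age = k) * z**k over the observed slots."""
    counts = Counter(ages)
    n = len(ages)
    return sum((c / n) * z**k for k, c in counts.items())


# Toy schedule (assumption): updates generated in slots 1 and 3
# are delivered in slots 2 and 5.
ages = age_sample_path({2: 1, 5: 3}, horizon=7)
mean_age = sum(ages) / len(ages)

# The derivative of the PGF at z = 1 recovers the mean age.
h = 1e-6
pgf_slope = (empirical_pgf(ages, 1 + h) - empirical_pgf(ages, 1 - h)) / (2 * h)
```

The last two lines check the standard identity G'(1) = E[age] numerically, which is the route from a PGF to an average AoI.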

preprint · 2022 · arXiv

Color Invariant Skin Segmentation

This paper addresses the problem of automatically detecting human skin in images without reliance on color information. A primary motivation of the work has been to achieve results that are consistent across the full range of skin tones, even while using a training dataset that is significantly biased toward lighter skin tones. Previous skin-detection methods have used color cues almost exclusively, and we present a new approach that performs well in the absence of such information. A key aspect of the work is dataset repair through augmentation that is applied strategically during training, with the goal of color invariant feature learning to enhance generalization. We have demonstrated the concept using two architectures, and experimental results show improvements in both precision and recall for most Fitzpatrick skin tones in the benchmark ECU dataset. We further tested the system with the RFW dataset to show that the proposed method performs much more consistently across different ethnicities, thereby reducing the chance of bias based on skin color. To demonstrate the effectiveness of our work, extensive experiments were performed on grayscale images as well as images obtained under unconstrained illumination and with artificial filters. Source code: https://github.com/HanXuMartin/Color-Invariant-Skin-Segmentation

preprint · 2022 · arXiv

Defense Against Gradient Leakage Attacks via Learning to Obscure Data

Federated learning is considered an effective privacy-preserving learning mechanism that separates the client's data from the model training process. However, federated learning is still at risk of privacy leakage because attackers can deliberately conduct gradient leakage attacks to reconstruct the client data. Recently, popular strategies such as gradient perturbation methods and input encryption methods have been proposed to defend against gradient leakage attacks. Nevertheless, these defenses can either greatly sacrifice model performance or be evaded by more advanced attacks. In this paper, we propose a new defense method to protect the privacy of clients' data by learning to obscure data. Our defense method can generate synthetic samples that are totally distinct from the original samples, but that also maximally preserve their predictive features and guarantee the model performance. Furthermore, our defense strategy makes it extremely difficult for the gradient leakage attack and its variants to reconstruct the client data. Through extensive experiments, we show that our proposed defense method obtains better privacy protection while preserving high accuracy compared with state-of-the-art methods.

preprint · 2022 · arXiv

Laser-Induced Graphitisation of Diamond Under 30 fs Laser Pulse Irradiation

The degree of laser-induced graphitisation from an sp3-bonded to an sp2-bonded carbon fraction in a single-crystal chemical vapour deposited (CVD) diamond under varying fluence of ultrashort pulsed laser (30 fs, 800 nm, 1 kHz) irradiation has been studied. The tetrahedral CVD sp3-phase was found to transition to primarily an sp2-aromatic crystalline graphitic fraction below the critical fluence of 3.9 J/cm², above which predominantly amorphous carbon was formed. A fractional increase of fluence from 3.3 J/cm² to 3.9 J/cm² (~20%) resulted in a substantial (~three-fold) increase in the depth of the sp2-graphitised areas owing to the non-linear interactions associated with fs-laser irradiation. Additionally, formation of the C=O carbonyl group was observed below the critical threshold fluence; the C=O cleavage occurred gradually with increasing irradiation fluence of the 30 fs laser light. The implications of these findings for the enhancement of fs-driven processing of diamond are discussed.

preprint · 2021 · arXiv

Strong field ionisation of Argon: Electron momentum spectra and nondipole effects

We investigate the influence of relativistic nondipole effects on the photoelectron spectra of argon, particularly in the low kinetic energy region (0 eV - 5 eV). In our experiment, we use intense linearly polarised 800 nm laser pulses to ionise Ar from a jet and we record photoelectron energy and momentum distributions using a reaction microscope (REMI). Our measurements show that nondipole effects can cause an energy-dependent asymmetry along the laser propagation direction in the photoelectron energy and momentum spectra. Model simulations based on the time-dependent Dirac equation (TDDE) reproduce our measurement results. An electron trajectory analysis based on a classical model reveals that the negative momentum shift acquired by the photoelectron along the laser propagation direction is caused by the interplay between the radiation pressure induced by the Lorentz force during its free propagation in the continuum and re-scattering by the Coulomb potential of the parent ion when it is driven back by the laser field.

preprint · 2020 · arXiv

Collaborative Attention Network for Person Re-identification

Jointly utilizing global and local features to improve model accuracy is becoming a popular approach for the person re-identification (ReID) problem, because previous works using global features alone have very limited capacity for extracting discriminative local patterns in the obtained feature representation. Existing works that attempt to collect local patterns either explicitly slice the global feature into several local pieces in a handcrafted way, or apply the attention mechanism to implicitly infer the importance of different local regions. In this paper, we show that by explicitly learning the importance of small local parts and part combinations, we can further improve the final feature representation for ReID. Specifically, we first separate the global feature into multiple local slices at different scales with a proposed multi-branch structure. Then we introduce the Collaborative Attention Network (CAN) to automatically learn the combination of features from adjacent slices. In this way, the combination keeps the intrinsic relation between adjacent features across local regions and scales, without losing information by partitioning the global features. Experimental results on several widely used public datasets, including Market-1501, DukeMTMC-ReID and CUHK03, show that the proposed method outperforms many existing state-of-the-art methods.

preprint · 2020 · arXiv

DeepRobust: A PyTorch Library for Adversarial Attacks and Defenses

DeepRobust is a PyTorch adversarial learning library which aims to build a comprehensive and easy-to-use platform to foster this research field. It currently contains more than 10 attack algorithms and 8 defense algorithms in the image domain, and 9 attack algorithms and 4 defense algorithms in the graph domain, under a variety of deep learning architectures. In this manual, we introduce the main contents of DeepRobust with detailed instructions. The library is kept updated and can be found at https://github.com/DSE-MSU/DeepRobust.

preprint · 2020 · arXiv

Observation of Dynamic Stark Resonances in Strong-Field Excitation

We investigate AC Stark-shifted resonances in argon with ultrashort near-infrared pulses. Using 30 fs pulses, we observe periodic enhancements of the excitation yield in the intensity regions corresponding to the absorption of 13 and 14 photons. By reducing the pulse duration to 6 fs with only a few optical cycles, we also demonstrate that the enhancements are significantly reduced beyond what is measurable in the experiment. Comparing these to numerical predictions, which are in quantitative agreement with experimental results, we find that even though the quantum-state distribution can be broad, the enhancements are largely due to efficient population of a select few AC Stark-shifted resonant states rather than the closing of an ionization channel. Because these resonances are dependent on the frequency and intensity of the laser field, the broad bandwidth of the 6 fs pulses means that the resonance condition is fulfilled across a large range of intensities. This is further exaggerated by volume-averaging effects, resulting in excitation of the 5g state at almost all intensities and reducing the apparent magnitude of the enhancements. For 30 fs pulses, volume averaging also broadens the quantum state distribution but the enhancements are still large enough to survive. In this case, selectivity of excitation to a single state is reduced below 25% of the relative population. However, an analysis of TDSE simulations indicates that excitation of up to 60% into a single state is possible if volume averaging can be eliminated and the intensity can be precisely controlled.

preprint · 2020 · arXiv

Yet Meta Learning Can Adapt Fast, It Can Also Break Easily

Meta learning algorithms have been widely applied in many tasks for efficient learning, such as few-shot image classification and fast reinforcement learning. During meta training, the meta learner develops a common learning strategy, or experience, from a variety of learning tasks. Therefore, during meta test, the meta learner can use the learned strategy to quickly adapt to new tasks even with a few training samples. However, there is still a dark side about meta learning in terms of reliability and robustness. In particular, is meta learning vulnerable to adversarial attacks? In other words, would a well-trained meta learner utilize its learned experience to build wrong or likely useless knowledge, if an adversary unnoticeably manipulates the given training set? Without the understanding of this problem, it is extremely risky to apply meta learning in safety-critical applications. Thus, in this paper, we perform the initial study about adversarial attacks on meta learning under the few-shot classification problem. In particular, we formally define key elements of adversarial attacks unique to meta learning and propose the first attacking algorithm against meta learning under various settings. We evaluate the effectiveness of the proposed attacking strategy as well as the robustness of several representative meta learning algorithms. Experimental results demonstrate that the proposed attacking strategy can easily break the meta learner and meta learning is vulnerable to adversarial attacks. The implementation of the proposed framework will be released upon the acceptance of this paper.

preprint · 2016 · arXiv

Linking particle properties to paste extrusion flow characteristics using discrete element simulations

Extrusion is a widely used process for forming pastes into designed shapes, and is central to the manufacture of many industrial products. The extrusion through a square-entry die of a model paste of non-Brownian spheres suspended in a Newtonian fluid is investigated using discrete element simulations, capturing individual particle contacts and hydrodynamic interactions. The simulations reveal inhomogeneous velocity and stress distributions, originating in the inherent microstructure formed by the constituent particles. Such features are shown to be relevant to generic paste extrusion behaviour, such as die swell. The pressure drop across the extruder is correlated with the extrudate velocity using the Benbow-Bridgwater equation, with the empirical parameters being linked directly to particle properties such as surface friction, and processing conditions such as extruder wall roughness. Our model and results bring recent advances in suspension rheology into an industrial setting, laying foundations for future model development, paste formulation and extrusion design.

preprint · 2016 · arXiv

The interaction of excited atoms and few-cycle laser pulses

This work describes the first observations of the ionisation of neon in a metastable atomic state utilising a strong-field, few-cycle light pulse. We compare the observations to theoretical predictions based on the Ammosov-Delone-Krainov (ADK) theory and a solution to the time-dependent Schrödinger equation (TDSE). The TDSE provides better agreement with the experimental data than the ADK theory. We optically pump the target atomic species and demonstrate that the ionisation rate depends on the spin state of the target atoms. We provide a physically transparent interpretation of this spin dependence in the frameworks of the spin-polarised Hartree-Fock and random-phase approximations.

preprint · 2015 · arXiv

Experimental observation of the elusive double-peak structure in R-dependent strong-field ionization rate of H2+

When a diatomic molecule is ionized by an intense laser field, the ionization rate depends very strongly on the inter-nuclear separation. That dependence exhibits a pronounced maximum at the inter-nuclear separation known as the critical distance. This phenomenon was first demonstrated theoretically in H2+ and became known as charge-resonance enhanced ionization (CREI, in reference to a proposed physical mechanism) or simply enhanced ionization (EI). All theoretical models of this phenomenon predict a double-peak structure in the R-dependent ionization rate of H2+. However, such a double-peak structure has never been observed experimentally. It was even suggested that it is impossible to observe due to the fast motion of the nuclear wavepackets. Here we report a few-cycle pump-probe experiment which clearly resolves that elusive double-peak structure. In the experiment, an expanding H2+ ion produced by an intense pump pulse is probed by a much weaker probe pulse. The predicted double-peak structure is clearly seen in the delay-dependent kinetic energy spectra of protons when pump and probe pulses are polarized parallel to each other. No structure is seen when the probe is polarized perpendicular to the pump.

preprint · 2012 · arXiv

Carrier-Envelope-Phase Dependent Dissociation of Hydrogen

We studied the dependence of dissociative ionization in H2 on the carrier-envelope phase (CEP) of few-cycle (6 fs) near-infrared (NIR) laser pulses. For low-energy channels, we present the first experimental observation of CEP dependence for the total dissociation yield and the highest degree of asymmetry reported to date (40%). The observed modulations in both asymmetry and total yield can be understood in terms of interference between different n-photon dissociation pathways (n and (n+1) photon channels for asymmetry, n and (n+2) photon channels for yield), as suggested by the general theory of CEP effects (Roudnev and Esry, Phys. Rev. Lett. 99, 220406 (2007)). The yield modulation is found to be π-periodic in CEP, with its phase strongly dependent on fragment kinetic energy (and reversing its sign within the studied energy range), indicating that the dissociation does not simply follow the CEP dependence of the maximum electric field, as naive intuition might suggest. We also find that a positively chirped pulse can lead to a higher dissociation probability than a transform-limited pulse.
