Source author record

Zihao Zhu

Zihao Zhu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision; cond-mat.str-el; Artificial Intelligence; Computation and Language; cond-mat.mtrl-sci; Cryptography and Security; Machine Learning; physics.app-ph; physics.optics; Robotics

Catalog footprint

What is connected

8 works

10 topics

4 close collaborators

preprint, 2026, arXiv

HoloMotion-1 Technical Report

In this report, we present HoloMotion-1, a humanoid motion foundation model for zero-shot whole-body motion tracking. A key innovation of HoloMotion-1 is to scale control-policy training with a large-scale hybrid motion corpus, where video-reconstructed motions from in-the-wild videos provide the dominant source of motion diversity, while curated motion-capture and in-house motion data provide higher-fidelity supervision and deployment-oriented coverage. This data regime enables HoloMotion-1 to move beyond conventional MoCap-only training and exposes the policy to substantially broader behaviors, capture conditions, and motion styles. Learning from such heterogeneous data introduces new challenges, including reconstruction noise, source-domain mismatch, uneven motion quality, and the need for temporal modeling under large behavioral variation. To address these challenges, HoloMotion-1 integrates large-capacity temporal modeling, a sparsely activated Mixture-of-Experts Transformer with KV-cache inference for real-time control, and a sequence-level training strategy that improves learning efficiency on extended motion sequences. Extensive experiments on multiple unseen motion benchmarks show that HoloMotion-1 generalizes robustly across diverse motion types and capture conditions, significantly improves tracking accuracy over prior methods, and transfers directly to a real humanoid robot without task-specific fine-tuning.

preprint, 2026, arXiv

Trust It or Not: Evidential Uncertainty for Feed-Forward 3D Reconstruction with Trust3R

Geometric foundation models hold promise for unconstrained dense geometry prediction from uncalibrated images. However, in current feed-forward designs, their predicted confidence scores are heuristic, lack probabilistic interpretation, and often fail to indicate where and how much the predicted geometry can be trusted. To address this gap, we present Trust3R, a lightweight evidential uncertainty framework for feed-forward 3D reconstruction. Trust3R combines gated residual mean refinement with a Normal-Inverse-Wishart evidential head, yielding a closed-form multivariate Student-t distribution for per-point geometric uncertainty. This design provides probabilistically grounded pointmap uncertainty estimates while adding moderate inference overhead. We evaluate on diverse indoor and outdoor benchmarks and compare against MASt3R's built-in confidence map as well as common uncertainty-aware baselines spanning single-pass heteroscedastic regression and sampling-based methods such as MC dropout and deep ensembles. Experimental results show that Trust3R consistently improves risk-coverage and sparsification, and generally improves geometric accuracy. These gains are reflected in stronger uncertainty ranking across benchmarks, with 25% lower AURC and 41% lower AUSE on ScanNet++, providing a practical reliability signal for uncertainty-aware weighting in downstream geometry pipelines. The project page and code are available at https://trust3r-z.github.io/.

preprint, 2024, arXiv

Attacks in Adversarial Machine Learning: A Systematic Survey from the Life-cycle Perspective

Adversarial machine learning (AML) studies the adversarial phenomenon of machine learning, in which models may make predictions that are unexpected or inconsistent with those of humans. Several paradigms have recently been developed to explore this adversarial phenomenon at different stages of a machine learning system, such as backdoor attacks occurring at the pre-training, in-training and inference stages; weight attacks occurring at the post-training, deployment and inference stages; and adversarial attacks occurring at the inference stage. However, although these adversarial paradigms share a common goal, they have developed almost independently, and there is still no big picture of AML. In this work, we aim to provide a unified perspective to the AML community to systematically review the overall progress of this field. We first provide a general definition of AML, and then propose a unified mathematical framework covering existing attack paradigms. According to the proposed unified framework, we build a full taxonomy to systematically categorize and review existing representative methods for each paradigm. Moreover, this unified framework makes it easy to identify the connections and differences among attack paradigms, which may inspire future researchers to develop more advanced ones. Finally, to facilitate viewing of the taxonomy and the related literature in adversarial machine learning, we provide a website, i.e., http://adversarial-ml.com, where the taxonomies and literature will be continuously updated.

preprint, 2022, arXiv

From Shallow to Deep: Compositional Reasoning over Graphs for Visual Question Answering

To achieve a general visual question answering (VQA) system, it is essential to learn to answer deeper questions that require compositional reasoning on the image and external knowledge. Meanwhile, the reasoning process should be explicit and explainable so that the working mechanism of the model can be understood. This is effortless for humans but challenging for machines. In this paper, we propose a Hierarchical Graph Neural Module Network (HGNMN) that reasons over multi-layer graphs with neural modules to address the above issues. Specifically, we first encode the image by multi-layer graphs from the visual, semantic and commonsense views, since the clues that support the answer may exist in different modalities. Our model consists of several well-designed neural modules that perform specific functions over graphs, which can be used to conduct multi-step reasoning within and between different graphs. Compared to existing modular networks, we extend visual reasoning from one graph to multiple graphs. We can explicitly trace the reasoning process according to module weights and graph attentions. Experiments show that our model not only achieves state-of-the-art performance on the CRIC dataset but also obtains explicit and explainable reasoning procedures.

preprint, 2022, arXiv

Intrinsic new properties of a quantum spin liquid

Quantum fluctuations are expected to lead to highly entangled spin-liquid states in certain two-dimensional spin-1/2 compounds. We have synthesized and measured thermodynamic properties and muon spin relaxation rates in the copper-based two-dimensional triangular-lattice spin liquids Lu$_3$Cu$_2$Sb$_3$O$_{14}$ and Lu$_3$CuZnSb$_3$O$_{14}$. The former is the least disordered of this kind discovered to date. Magnetic entropy generation at high temperatures has been ruled out after carefully correcting for the lattice specific heat. Surprisingly, roughly half of the magnetic entropy is missing down to temperatures of O(10$^{-3}$) the exchange energy, independent of magnetic field up to $gμ_B H \gtrsim k_BΘ_W$, where $Θ_W$ is the Weiss temperature. The magnetic specific heat divided by temperature $C_M(T)/T$ and muon spin relaxation rate $λ(T)$ are both temperature-independent at low temperatures, followed by logarithmic decreases with increasing temperature. This behavior can be simply characterized by scale-invariant time-dependent fluctuations with a single parameter. Since no cooperative effects due to impurities are observed, the measured properties are intrinsic. They are evidence that in Lu$_3$Cu$_2$Sb$_3$O$_{14}$ massive quantum fluctuations lead to either a gigantic specific heat peak from singlet excitations at very low temperatures or, perhaps less likely, an extensively degenerate possibly topological singlet ground state.

preprint, 2021, arXiv

Material-structure integrated design for ultra-broadband microwave metamaterial absorber

We propose a material-structure integrated design method for broadband absorption in dielectric metamaterials, achieved by combining a genetic algorithm with a simulation platform. A multi-layered metamaterial absorber with ultra-broadband absorption from 5.3 to 18 GHz (a relative bandwidth as high as 109%) is realized numerically and experimentally. In addition, simulated results demonstrate that the proposed metamaterial exhibits good incident-angle and polarization tolerance, which are also significant criteria for practical applications. By investigating the working principle with theoretical calculation and numerical simulation, we find that the merging of multiple resonance modes, encompassing quarter-wavelength interference cancellation, the spoof surface plasmon polariton mode, the dielectric resonance mode and the grating mode, is responsible for the remarkable ultra-broadband absorption. Analysis of the respective contributions of material and structure indicates that each plays an indispensable role in activating different resonance modes, and that the synergy of material and structure is essential to achieve the desired target performance. The material-structure integrated design philosophy highlights the superiority of coupling material and structure and provides an effective comprehensive optimization strategy for dielectric metamaterials.
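The 109% figure quoted in this abstract is consistent with the common definition of relative (fractional) bandwidth: the width of the band divided by its arithmetic center frequency. The abstract does not state which definition the authors used, so the sketch below is an assumption; the function name `relative_bandwidth` is hypothetical and introduced only for illustration.

```python
def relative_bandwidth(f_low_ghz: float, f_high_ghz: float) -> float:
    """Fractional bandwidth: band width divided by the arithmetic center frequency.

    Assumed definition; the abstract only reports the band edges and the result.
    """
    center = (f_low_ghz + f_high_ghz) / 2.0
    return (f_high_ghz - f_low_ghz) / center


# Band edges taken from the abstract: 5.3 GHz to 18 GHz.
rbw = relative_bandwidth(5.3, 18.0)
print(f"{rbw:.1%}")  # prints "109.0%"
```

Under this definition the band is 12.7 GHz wide around a center of 11.65 GHz, which reproduces the reported value.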

preprint, 2020, arXiv

DAM: Deliberation, Abandon and Memory Networks for Generating Detailed and Non-repetitive Responses in Visual Dialogue

The Visual Dialogue task requires an agent to engage in a conversation with a human about an image. The ability to generate detailed and non-repetitive responses is crucial for the agent to achieve human-like conversation. In this paper, we propose a novel generative decoding architecture to generate high-quality responses, which moves away from decoding the whole encoded semantics towards a design that advocates both transparency and flexibility. In this architecture, word generation is decomposed into a series of attention-based information selection steps, performed by the novel recurrent Deliberation, Abandon and Memory (DAM) module. Each DAM module performs an adaptive combination of the response-level semantics captured from the encoder and the word-level semantics specifically selected for generating each word. Therefore, the responses contain more detailed and non-repetitive descriptions while maintaining semantic accuracy. Furthermore, DAM can flexibly cooperate with existing visual dialogue encoders and adapts to the encoder structures by constraining the information selection mode in DAM. We apply DAM to three typical encoders and verify the performance on the VisDial v1.0 dataset. Experimental results show that the proposed models achieve new state-of-the-art performance with high-quality responses. The code is available at https://github.com/JXZe/DAM.

preprint, 2020, arXiv

Persistent spin dynamics and absence of spin freezing in the $H$-$T$ phase diagram of the 2D triangular antiferromagnet YbMgGaO$_4$

We report results of muon spin relaxation and rotation ($μ$SR) experiments on the spin-liquid candidate YbMgGaO$_{4}$. No static magnetism $\gtrsim 0.003μ_B$ per Yb ion, ordered or disordered, is observed down to 22 mK, a factor of two lower in temperature than previous measurements. Persistent (temperature-independent) spin dynamics are observed up to 0.20 K and at least 1 kOe, thus extending previous zero-field $μ$SR results over a substantial region of the $H$-$T$ phase diagram. Knight shift measurements in a 10-kOe transverse field reveal two lines with nearly equal amplitudes. Inhomogeneous muon depolarization in a longitudinal field, previously characterized by stretched-exponential relaxation due to spatial inhomogeneity, is fit equally well with two exponentials, also of equal amplitudes. We attribute these results to two interstitial muon sites in the unit cell, rather than disorder or other spatial distribution. Further evidence for this attribution is found from agreement between the ratio of the two measured relaxation rates and calculated mean-square local Yb$^{3+}$ dipolar fields at candidate muon sites. Zero-field data can be understood as a combination of two-exponential dynamic relaxation and quasistatic nuclear dipolar fields.
