Researcher profile

Gang Zhang

Gang Zhang contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
18 works
0 followers
12 topics
4 close collaborators

Published work

18 published items

preprint, 2026, arXiv

SonicBench: Dissecting the Physical Perception Bottleneck in Large Audio Language Models

Large Audio Language Models (LALMs) excel at semantic and paralinguistic tasks, yet their ability to perceive the fundamental physical attributes of audio such as pitch, loudness, and spatial location remains under-explored. To bridge this gap, we introduce SonicBench, a psychophysically grounded benchmark that systematically evaluates 12 core physical attributes across five perceptual dimensions. Unlike previous datasets, SonicBench uses a controllable generation toolbox to construct stimuli for two complementary paradigms: recognition (absolute judgment) and comparison (relative judgment). This design allows us to probe not only sensory precision but also relational reasoning capabilities, a domain where humans typically exhibit greater proficiency. Our evaluation reveals a substantial deficiency in LALMs' foundational auditory understanding; most models perform near random guessing and, contrary to human patterns, fail to show the expected advantage on comparison tasks. Furthermore, explicit reasoning yields minimal gains. However, our linear probing analysis demonstrates crucially that frozen audio encoders do successfully capture these physical cues (accuracy at least 60%), suggesting that the primary bottleneck lies in the alignment and decoding stages, where models fail to leverage the sensory signals they have already captured.

preprint, 2025, arXiv

Revisiting $d$-distance (independent) domination in trees and in bipartite graphs

The $d$-distance $p$-packing domination number $\gamma_d^p(G)$ of $G$ is the minimum size of a set of vertices of $G$ which is both a $d$-distance dominating set and a $p$-packing. In 1994, Beineke and Henning conjectured that if $d\ge 1$ and $T$ is a tree of order $n \geq d+1$, then $\gamma_d^1(T) \leq \frac{n}{d+1}$. They supported the conjecture by proving it for $d\in \{1,2,3\}$. In this paper, it is proved that $\gamma_d^1(G) \leq \frac{n}{d+1}$ holds for any bipartite graph $G$ of order $n \geq d+1$, and any $d\ge 1$. Trees $T$ for which $\gamma_d^1(T) = \frac{n}{d+1}$ holds are characterized. It is also proved that if $T$ has $\ell$ leaves, then $\gamma_d^1(T) \leq \frac{n-\ell}{d}$ (provided that $n-\ell \geq d$), and $\gamma_d^1(T) \leq \frac{n+\ell}{d+2}$ (provided that $n\geq d$). The latter result extends Favaron's theorem from 1992 asserting that $\gamma_1^1(T) \leq \frac{n+\ell}{3}$. In both cases, trees that attain the equality are characterized and relevant conclusions for the $d$-distance domination number of trees are derived.
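The three upper bounds quoted in this abstract can be checked by brute force on a small tree. The sketch below is illustrative only and is not taken from the paper: it assumes that a $p$-packing is a set whose vertices are pairwise at distance greater than $p$ (so a 1-packing is an independent set), and the helper names `all_distances`, `gamma_d_p` and `path` are hypothetical.

```python
from collections import deque
from itertools import combinations


def all_distances(adj):
    """Run a BFS from every vertex of a connected graph; dist[u][v] is the hop distance."""
    dist = {}
    for source in adj:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        dist[source] = seen
    return dist


def gamma_d_p(adj, d, p):
    """Size of a smallest set that d-distance dominates the graph and is a p-packing.

    Assumption: a p-packing has pairwise distances strictly greater than p.
    Exhaustive search, so only suitable for very small graphs.
    """
    dist = all_distances(adj)
    vertices = list(adj)
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            is_packing = all(dist[u][v] > p for u, v in combinations(subset, 2))
            is_dominating = all(
                any(dist[s][v] <= d for s in subset) for v in vertices
            )
            if is_packing and is_dominating:
                return k
    return None  # no such set exists


def path(n):
    """Adjacency lists of the path on vertices 0..n-1 (a tree with 2 leaves for n >= 2)."""
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


n, leaves = 7, 2
tree = path(n)
for d in (1, 2):
    gamma = gamma_d_p(tree, d, 1)
    bounds = (n / (d + 1), (n - leaves) / d, (n + leaves) / (d + 2))
    print(f"d={d}: gamma={gamma}, bounds={bounds}")
```

On the 7-vertex path with $d=1$ the search returns 3, which meets the $\frac{n+\ell}{d+2}$ bound with equality; the search is exponential in the number of vertices and is meant only for sanity checks of this kind.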

preprint, 2022, arXiv

Anharmonic quantum thermal transport across a van der Waals interface

We investigate anharmonic phonon scattering across a weakly interacting interface by developing a quantum mechanics-based theory. We find that the contribution from anharmonic three-phonon scattering to the interfacial thermal conductance can be cast into the Landauer formula with a temperature-dependent transmission function. Surprisingly, in the weak coupling limit, the transmission due to anharmonic phonon scattering is unbounded with increasing temperature, which is physically impossible for two-phonon processes. We further reveal that the anharmonic contribution in a real heterogeneous interface (e.g., between graphene and monolayer molybdenum disulfide) can dominate over the harmonic process even at room temperature, highlighting the important role of anharmonicity in weakly interacting heterogeneous systems.
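For orientation, the textbook Landauer form of the phonon thermal conductance is written out below, with the transmission function allowed to depend on temperature as the abstract describes. This is the generic expression, not the paper's derived result; the symbols $G$, $\Xi$ and $n$ are notational assumptions introduced here.

```latex
G(T) = \frac{1}{2\pi} \int_0^{\infty} \hbar\omega \, \Xi(\omega, T) \,
       \frac{\partial n(\omega, T)}{\partial T} \, d\omega ,
\qquad
n(\omega, T) = \frac{1}{e^{\hbar\omega / k_B T} - 1}
```

In the purely harmonic case the transmission $\Xi$ depends on frequency only, and all of the temperature dependence of $G$ comes from the Bose-Einstein factor $n(\omega, T)$.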

preprint, 2022, arXiv

CPGNet: Cascade Point-Grid Fusion Network for Real-Time LiDAR Semantic Segmentation

LiDAR semantic segmentation, which is essential for advanced autonomous driving, must be accurate, fast, and easy to deploy on mobile platforms. Previous point-based or sparse voxel-based methods are far from real-time because they rely on time-consuming neighbor searching or sparse 3D convolution. Recent 2D projection-based methods, including range view and multi-view fusion, can run in real time, but suffer from lower accuracy due to information loss during the 2D projection. Moreover, to improve performance, previous methods usually adopt test time augmentation (TTA), which further slows down inference. To achieve a better speed-accuracy trade-off, we propose the Cascade Point-Grid Fusion Network (CPGNet), which ensures both effectiveness and efficiency mainly through the following two techniques: 1) the novel Point-Grid (PG) fusion block extracts semantic features mainly on the 2D projected grid for efficiency, while summarizing both 2D and 3D features on the 3D points for minimal information loss; 2) the proposed transformation consistency loss narrows the gap between single-pass model inference and TTA. Experiments on the SemanticKITTI and nuScenes benchmarks demonstrate that CPGNet without ensemble models or TTA is comparable with the state-of-the-art RPVNet, while running 4.7 times faster.

preprint, 2022, arXiv

Effects of high order interatomic potential on elastic phonon scatterings

Interatomic potentials beyond quadratic order provide scattering sources for phonon transport in a lattice. Using a weakly interacting interface model, we investigate the relation between the order of the interatomic potential and multiple-phonon scattering processes. We find that a high-order interatomic potential not only causes multiple-phonon scattering processes, but also has a significant impact on elastic phonon scattering processes. Using the fourth-order potential as an example, we show that it can significantly affect elastic phonon scattering through the formation of localized phonons. This impact is closely related to the correlations of interfacial atoms and becomes more significant with increasing temperature. Our work suggests that it is insufficient to consider only the quadratic potential when investigating elastic phonon transport.

preprint, 2022, arXiv

Eliminating edge electronic and phonon states of phosphorene nanoribbon by unique edge reconstruction

Edge termination plays a vital role in determining the properties of 2D materials. By performing ab initio simulations, a lowest-energy U-edge [ZZ(U)] reconstruction is revealed in bilayer phosphorene. This reconstruction reduces the edge energy by 60% compared with the pristine edge and occurs almost without an energy barrier, implying that it should be the dominant edge in reality. The electronic band structure of a phosphorene nanoribbon with this reconstruction resembles that of the intrinsic 2D layer, exhibiting nearly edgeless band characteristics. Although ZZ(U) changes the topology of the phosphorene nanoribbon (PNR), simulated TEM, STEM and STM images indicate that it is very hard to identify. One possible identification method is IR/Raman analysis, because the ZZ(U) edge alters the vibrational modes dramatically. In addition, it increases the thermal conductivity of the PNR by factors of 1.4 and 2.3 relative to the pristine and Klein edges, respectively.

preprint, 2022, arXiv

Equalized Focal Loss for Dense Long-Tailed Object Detection

Despite the recent success of long-tailed object detection, almost all long-tailed object detectors are developed based on the two-stage paradigm. In practice, one-stage detectors are more prevalent in the industry because they have a simple and fast pipeline that is easy to deploy. However, in the long-tailed scenario, this line of work has not been explored so far. In this paper, we investigate whether one-stage detectors can perform well in this case. We discover the primary obstacle that prevents one-stage detectors from achieving excellent performance is: categories suffer from different degrees of positive-negative imbalance problems under the long-tailed data distribution. The conventional focal loss balances the training process with the same modulating factor for all categories, thus failing to handle the long-tailed problem. To address this issue, we propose the Equalized Focal Loss (EFL) that rebalances the loss contribution of positive and negative samples of different categories independently according to their imbalance degrees. Specifically, EFL adopts a category-relevant modulating factor which can be adjusted dynamically by the training status of different categories. Extensive experiments conducted on the challenging LVIS v1 benchmark demonstrate the effectiveness of our proposed method. With an end-to-end training pipeline, EFL achieves 29.2% in terms of overall AP and obtains significant performance improvements on rare categories, surpassing all existing state-of-the-art methods. The code is available at https://github.com/ModelTC/EOD.
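The idea of a category-relevant modulating factor can be illustrated with the conventional focal loss evaluated with a separate focusing parameter per category. The sketch below is an assumption-laden illustration and not the EFL formulation from the paper: the abstract does not give the formula or say how the factor is derived from the training status, so the `gammas` values and the function names `focal_loss` and `category_aware_focal_loss` are hypothetical placeholders.

```python
import math


def focal_loss(p, y, gamma, alpha=0.25):
    """Conventional binary focal loss for one prediction.

    p: predicted probability in (0, 1); y: label in {0, 1};
    gamma: focusing (modulating) parameter; alpha: positive-class weight.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def category_aware_focal_loss(probs, labels, gammas, alpha=0.25):
    """Sum of per-category focal losses, each with its own focusing parameter.

    Hypothetical illustration only: how the per-category gammas would be
    chosen or updated during training is NOT specified here.
    """
    return sum(
        focal_loss(p, y, g, alpha) for p, y, g in zip(probs, labels, gammas)
    )


# One sample scored against three categories; the gammas are made-up values.
probs = [0.9, 0.2, 0.6]
labels = [1, 0, 0]
shared = category_aware_focal_loss(probs, labels, gammas=[2.0, 2.0, 2.0])
per_category = category_aware_focal_loss(probs, labels, gammas=[2.0, 2.0, 4.0])
print(shared, per_category)
```

With a single shared gamma this reduces to the conventional focal loss summed over categories; giving one category a different gamma changes only that category's contribution, which is the degree of freedom the abstract refers to.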

preprint, 2022, arXiv

UFO: Unified Feature Optimization

This paper proposes a novel Unified Feature Optimization (UFO) paradigm for training and deploying deep models under real-world, large-scale scenarios that require a collection of multiple AI functions. UFO aims to benefit each single task with large-scale pretraining on all tasks. Compared with well-known foundation models, UFO has two different points of emphasis, i.e., relatively smaller model size and no adaptation cost: 1) UFO squeezes a wide range of tasks into a moderate-sized unified model in a multi-task learning manner and further trims the model size when transferred to downstream tasks. 2) UFO does not emphasize transfer to novel tasks. Instead, it aims to make the trimmed model dedicated to one or more already-seen tasks. With these two characteristics, UFO provides great convenience for flexible deployment while maintaining the benefits of large-scale pretraining. A key merit of UFO is that the trimming process not only reduces the model size and inference cost, but also improves the accuracy on certain tasks. Specifically, UFO considers multi-task training, which has a two-fold impact on the unified model: some closely related tasks benefit each other, while some tasks conflict with each other. UFO manages to reduce the conflicts and preserve the mutual benefits through a novel Network Architecture Search (NAS) method. Experiments on a wide range of deep representation learning tasks (i.e., face recognition, person re-identification, vehicle re-identification and product retrieval) show that the model trimmed from UFO achieves higher accuracy than its single-task-trained counterpart while having a smaller model size, validating the concept of UFO. In addition, UFO supported the release of a 17-billion-parameter computer vision (CV) foundation model, which is the largest CV model in the industry.

preprint, 2021, arXiv

Quantum operation of fermionic systems and process tomography using Majorana fermion gates

Quantum tomography is an important tool for the characterisation of quantum operations. In this paper, we present a framework of quantum tomography in fermionic systems. Compared with qubit systems, fermions obey the superselection rule, which sets constraints on states, processes and measurements in a fermionic system. As a result, we can only partly reconstruct an operation that acts on a subset of fermion modes, and the full reconstruction always requires at least one ancillary fermion mode in addition to the subset. We also report a protocol for the full reconstruction based on gates in a Majorana fermion quantum computer, including a set of circuits for realising the informationally complete state preparation and measurement.

preprint, 2021, arXiv

Real Image Super Resolution Via Heterogeneous Model Ensemble using GP-NAS

With advances in deep neural networks (DNNs), recent state-of-the-art (SOTA) image super-resolution (SR) methods have achieved impressive performance using deep residual networks with dense skip connections. While these models perform well on benchmark datasets where low-resolution (LR) images are constructed from high-resolution (HR) references with a known blur kernel, real image SR is more challenging because both images in the LR-HR pair are collected from real cameras. Based on existing dense residual networks, a Gaussian process based neural architecture search (GP-NAS) scheme is used to find candidate network architectures in a large search space by varying the number of dense residual blocks, the block size and the number of features. A suite of heterogeneous models with diverse network structures and hyperparameters is selected for model ensembling to achieve outstanding performance in real image SR. The proposed method won first place in all three tracks of the AIM 2020 Real Image Super-Resolution Challenge.

preprint, 2021, arXiv

Theoretical analysis of thermal boundary conductance of MoS2-SiO2 and WS2-SiO2 interface

Understanding the physical processes involved in interfacial heat transfer is critical for the interpretation of thermometric measurements and the optimization of heat dissipation in nanoelectronic devices that are based on transition metal dichalcogenide (TMD) semiconductors. We model the phononic and electronic contributions to the thermal boundary conductance (TBC) variability for the MoS$_{2}$-SiO$_{2}$ and WS$_{2}$-SiO$_{2}$ interfaces. A phenomenological theory to model diffuse phonon transport at disordered interfaces is introduced and yields $G$=13.5 and 12.4 MW/K/m$^{2}$ at 300 K for the MoS$_{2}$-SiO$_{2}$ and WS$_{2}$-SiO$_{2}$ interfaces, respectively. We compare its predictions to those of the coherent phonon model and find that the former fits the MoS$_{2}$-SiO$_{2}$ data from experiments and simulations significantly better. Our analysis suggests that heat dissipation at the TMD-SiO$_{2}$ interface is dominated by phonons scattered diffusely by the rough interface, although the electronic TBC contribution can be significant even at low electron densities ($n\leq10^{12}$ cm$^{-2}$) and may explain some of the variation in the experimental TBC data from the literature. The physical insights from our study can be useful for the development of thermally aware designs in TMD-based nanoelectronics.

preprint, 2020, arXiv

1st Place Solution of LVIS Challenge 2020: A Good Box is not a Guarantee of a Good Mask

This article introduces the solutions of the team lvisTraveler for LVIS Challenge 2020. In this work, two characteristics of the LVIS dataset are mainly considered: the long-tailed distribution and the high-quality instance segmentation masks. We adopt a two-stage training pipeline. In the first stage, we incorporate EQL and self-training to learn a generalized representation. In the second stage, we utilize Balanced GroupSoftmax to improve the classifier, and propose a novel proposal assignment strategy and a new balanced mask loss for the mask head to get more precise mask predictions. Finally, we achieve 41.5 and 41.2 AP on the LVIS v1.0 val and test-dev splits respectively, outperforming the baseline based on X101-FPN-MaskRCNN by a large margin.

preprint, 2020, arXiv

Detecting and Studying High-Energy Collider Neutrinos with FASER at the LHC

Neutrinos are copiously produced at particle colliders, but no collider neutrino has ever been detected. Colliders, and particularly hadron colliders, produce both neutrinos and anti-neutrinos of all flavors at very high energies, and they are therefore highly complementary to those from other sources. FASER, the recently approved Forward Search Experiment at the Large Hadron Collider, is ideally located to provide the first detection and study of collider neutrinos. We investigate the prospects for neutrino studies of a proposed component of FASER, FASER$\nu$, a 25cm x 25cm x 1.35m emulsion detector to be placed directly in front of the FASER spectrometer in tunnel TI12. FASER$\nu$ consists of 1000 layers of emulsion films interleaved with 1-mm-thick tungsten plates, with a total tungsten target mass of 1.2 tons. We estimate the neutrino fluxes and interaction rates at FASER$\nu$, describe the FASER$\nu$ detector, and analyze the characteristics of the signals and primary backgrounds. For an integrated luminosity of 150 fb$^{-1}$ to be collected during Run 3 of the 14 TeV Large Hadron Collider from 2021-23, and assuming standard model cross sections, approximately 1300 electron neutrinos, 20,000 muon neutrinos, and 20 tau neutrinos will interact in FASER$\nu$, with mean energies of 600 GeV to 1 TeV, depending on the flavor. With such rates and energies, FASER will measure neutrino cross sections at energies where they are currently unconstrained, will bound models of forward particle production, and could open a new window on physics beyond the standard model.
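The quoted target mass can be cross-checked against the quoted geometry with a few lines of arithmetic. This is a rough consistency check under stated assumptions, not a figure from the paper: it assumes each tungsten plate spans the full 25cm x 25cm cross-section and uses a tungsten density of about 19.3 g/cm^3, neither of which is stated in the abstract. The function name `tungsten_mass_kg` is hypothetical.

```python
# Figures quoted in the abstract.
WIDTH_M = 0.25             # transverse size, 25 cm
HEIGHT_M = 0.25            # transverse size, 25 cm
N_PLATES = 1000            # number of tungsten plates
PLATE_THICKNESS_M = 1e-3   # each plate is 1 mm thick

# Assumption (not from the abstract): density of tungsten, about 19.3 g/cm^3.
TUNGSTEN_DENSITY_KG_M3 = 19_300.0


def tungsten_mass_kg(width_m, height_m, n_plates, thickness_m, density_kg_m3):
    """Total mass of a stack of identical rectangular plates.

    Assumes every plate covers the full width x height cross-section.
    """
    volume_m3 = width_m * height_m * n_plates * thickness_m
    return volume_m3 * density_kg_m3


mass = tungsten_mass_kg(
    WIDTH_M, HEIGHT_M, N_PLATES, PLATE_THICKNESS_M, TUNGSTEN_DENSITY_KG_M3
)
print(f"Estimated tungsten mass: {mass:.0f} kg")
```

Under these assumptions the stack contains 1 m of tungsten in total thickness and weighs roughly 1.2 tonnes, in line with the target mass given in the abstract.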

preprint, 2020, arXiv

Magnon-magnon interaction and magnon relaxation time in ferromagnetic Cr2Ge2Te6 monolayer

Despite the intense attention on, and the huge potential of, two-dimensional (2D) magnets for applications in novel magnetic, magneto-optical, magneto-thermal and magneto-electronic devices, a robust strategy to systematically understand magnon-magnon interactions (MMI) at finite temperature has yet to be developed. In this paper, we present a first-principles theoretical method to introduce the finite-temperature magnon-magnon interaction into the Heisenberg Hamiltonian through a nonlinear correction energy. The Wick theorem is used to decouple the four-magnon operators to two-magnon order. We demonstrate the capabilities of this method by studying the strength of MMI in the Cr2Ge2Te6 (CGT) monolayer. The spin wave spectrum at finite temperature and the time-dependent spin autocorrelation function are explored. It is found that the magnon relaxation time due to magnon-magnon scattering increases with temperature because of the reduction in magnon energy, while it decreases with wavevector and external magnetic field. Our results provide new insight into magnon damping and energy dissipation in two-dimensional ferromagnetic materials.

preprint, 2020, arXiv

Material Platforms for Defect Qubits and Single Photon Emitters

Quantum technology has grown out of quantum information theory and now provides a valuable tool that researchers from numerous fields can add to their toolbox of research methods. To date, various systems have been exploited to promote the application of quantum information processing. The systems that can be used for quantum technology include superconducting circuits, ultra-cold atoms, trapped ions, semiconductor quantum dots, and solid-state spins and emitters. In this review, we discuss the state of the art in material platforms for spin-based quantum technology, with a focus on the progress in solid-state spins and emitters in several leading host materials, including diamond, silicon carbide, boron nitride, silicon, two-dimensional semiconductors, and other materials. We highlight how first-principles calculations can serve as an exceptionally robust tool for finding novel defect qubits and single photon emitters in solids, through detailed predictions of the electronic, magnetic and optical properties.

preprint, 2020, arXiv

NTIRE 2020 Challenge on Real Image Denoising: Dataset, Methods and Results

This paper reviews the NTIRE 2020 challenge on real image denoising with a focus on the newly introduced dataset, the proposed methods and their results. The challenge is a new version of the previous NTIRE 2019 challenge on real image denoising that was based on the SIDD benchmark. This challenge is based on newly collected validation and testing image datasets and is hence named SIDD+. It has two tracks for quantitatively evaluating image denoising performance in (1) the Bayer-pattern rawRGB and (2) the standard RGB (sRGB) color spaces. Each track had ~250 registered participants. A total of 22 teams, proposing 24 methods, competed in the final phase of the challenge. The methods proposed by the participating teams represent the current state-of-the-art performance in image denoising targeting real noisy images. The newly collected SIDD+ datasets are publicly available at: https://bit.ly/siddplus_data.

preprint, 2020, arXiv

Technical Proposal: FASERnu

FASERnu is a proposed small and inexpensive emulsion detector designed to detect collider neutrinos for the first time and study their properties. FASERnu will be located directly in front of FASER, 480 m from the ATLAS interaction point along the beam collision axis in the unused service tunnel TI12. From 2021-23 during Run 3 of the 14 TeV LHC, roughly 1,300 electron neutrinos, 20,000 muon neutrinos, and 20 tau neutrinos will interact in FASERnu with TeV-scale energies. With the ability to observe these interactions, reconstruct their energies, and distinguish flavors, FASERnu will probe the production, propagation, and interactions of neutrinos at the highest human-made energies ever recorded. The FASERnu detector will be composed of 1000 emulsion layers interleaved with tungsten plates. The total volume of the emulsion and tungsten is 25cm x 25cm x 1.35m, and the tungsten target mass is 1.2 tonnes. From 2021-23, 7 sets of emulsion layers will be installed, with replacement roughly every 20-50 1/fb in planned Technical Stops. In this document, we summarize FASERnu's physics goals and discuss the estimates of neutrino flux and interaction rates. We then describe the FASERnu detector in detail, including plans for assembly, transport, installation, and emulsion replacement, and procedures for emulsion readout and analyzing the data. We close with cost estimates for the detector components and infrastructure work and a timeline for the experiment.

preprint, 2019, arXiv

Three-terminal interface as a thermoelectric generator beyond Seebeck effect

We investigate thermoelectric transport through interfaces with inelastic scattering by developing a quantum theory, which has been extensively validated against existing theories. We find that under a temperature bias, while a two-terminal conductor-insulator interface behaves only as a thermal resistor, a three-terminal conductor-insulator-conductor interface can function as an electricity generator, driven by phonon-mediated electron scattering with heat-charge current separation. Unlike conventional thermoelectricity, which is a bulk property caused by the Seebeck effect, this thermoelectric behavior is a property of an interface driven by electron-phonon scattering.