Source author record

Zhe Qu

Zhe Qu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

cond-mat.mtrl-sci; cond-mat.str-el; Machine Learning; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Cryptography and Security; Computer Vision; cond-mat.supr-con; eess.AS; Networking and Internet Architecture; quant-ph; Sound

Catalog footprint

What is connected

18 works

12 topics

4 close collaborators

preprint | 2026 | arXiv

Collective Communication for 100k+ GPUs

The increasing scale of large language models (LLMs) necessitates highly efficient collective communication frameworks, particularly as training workloads extend to hundreds of thousands of GPUs. Traditional communication methods face significant throughput and latency limitations at this scale, hindering both the development and deployment of state-of-the-art models. This paper presents the NCCLX collective communication framework, developed at Meta, engineered to optimize performance across the full LLM lifecycle, from the synchronous demands of large-scale training to the low-latency requirements of inference. The framework is designed to support complex workloads on clusters exceeding 100,000 GPUs, ensuring reliable, high-throughput, and low-latency data exchange. Empirical evaluation on the Llama4 model demonstrates substantial improvements in communication efficiency. This research contributes a robust solution for enabling the next generation of LLMs to operate at unprecedented scales.

preprint | 2026 | arXiv

MSD-Score: Multi-Scale Distributional Scoring for Reference-Free Image Caption Evaluation

Evaluating image captions without references remains challenging because global embedding similarity often misses fine-grained mismatches such as hallucinated objects, missing attributes, or incorrect relations. We propose MSD-Score, a reference-free metric that models image patch and text token embeddings as von Mises-Fisher mixtures on the unit hypersphere. Instead of treating each modality as a single point, MSD-Score formulates image-text matching as a multi-scale distributional scoring problem. Semantic discrepancies are quantified via a weighted bi-directional KL divergence and combined with global similarity in a multi-scale framework for both single- and multi-candidate evaluations. Extensive experiments show that MSD-Score achieves state-of-the-art correlation with human judgments among reference-free metrics. Beyond accuracy, its probabilistic formulation yields transparent and decomposable diagnostics of local grounding errors, providing a deterministic complementary signal to holistic similarity metrics and judge-based evaluators.

preprint | 2022 | arXiv

CDPA: Common and Distinctive Pattern Analysis between High-dimensional Datasets

A representative model in integrative analysis of two high-dimensional correlated datasets is to decompose each data matrix into a low-rank common matrix generated by latent factors shared across datasets, a low-rank distinctive matrix corresponding to each dataset, and an additive noise matrix. Existing decomposition methods claim that their common matrices capture the common pattern of the two datasets. However, their so-called common pattern only denotes the common latent factors but ignores the common pattern between the two coefficient matrices of these common latent factors. We propose a new unsupervised learning method, called the common and distinctive pattern analysis (CDPA), which appropriately defines the two types of data patterns by further incorporating the common and distinctive patterns of the coefficient matrices. A consistent estimation approach is developed for high-dimensional settings, and shows reasonably good finite-sample performance in simulations. Our simulation studies and real data analysis corroborate that the proposed CDPA can provide better characterization of common and distinctive patterns and thereby benefit data mining.

preprint | 2022 | arXiv

Generalized Federated Learning via Sharpness Aware Minimization

Federated Learning (FL) is a promising framework for performing privacy-preserving, distributed learning with a set of clients. However, the data distribution among clients is often non-IID, i.e., subject to distribution shift, which makes efficient optimization difficult. To tackle this problem, many FL algorithms focus on mitigating the effects of data heterogeneity across clients by increasing the performance of the global model. However, almost all algorithms use Empirical Risk Minimization (ERM) as the local optimizer, which can easily drive the global model into a sharp valley and cause a large deviation for some local clients. Therefore, in this paper, we revisit the solutions to the distribution shift problem in FL with a focus on local learning generality. To this end, we propose a general, effective algorithm, FedSAM, based on a Sharpness Aware Minimization (SAM) local optimizer, and develop a momentum FL algorithm, MoFedSAM, to bridge local and global models. Theoretically, we present a convergence analysis of these two algorithms and a generalization bound for FedSAM. Empirically, our proposed algorithms substantially outperform existing FL studies and significantly decrease the learning deviation.
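The idea of using SAM as the local optimizer can be illustrated with a minimal sketch. This is not the authors' FedSAM implementation: the least-squares loss, the helper names (`sam_step`, `make_lsq_grad`, `fed_round`), the unweighted server average, and the values of `lr` and `rho` are all assumptions chosen for illustration, and the momentum variant MoFedSAM is not shown.

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM step: perturb toward higher loss, then descend from there."""
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g))
    if norm > 0:
        # Approximate worst-case point inside an L2 ball of radius rho.
        w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    else:
        w_adv = list(w)
    # Update the ORIGINAL weights with the gradient taken at the perturbed point.
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]

def make_lsq_grad(X, y):
    """Gradient of 0.5 * mean((Xw - y)^2) for one client's data (illustrative loss)."""
    n = len(y)
    def grad(w):
        residuals = [sum(xj * wj for xj, wj in zip(x, w)) - yi for x, yi in zip(X, y)]
        return [sum(r * x[j] for r, x in zip(residuals, X)) / n for j in range(len(w))]
    return grad

def fed_round(w_global, client_grads, local_steps=5, lr=0.1, rho=0.05):
    """Each client runs local SAM steps from the global model; the server averages."""
    local_models = []
    for grad_fn in client_grads:
        w = list(w_global)
        for _ in range(local_steps):
            w = sam_step(w, grad_fn, lr, rho)
        local_models.append(w)
    k = len(local_models)
    return [sum(m[j] for m in local_models) / k for j in range(len(w_global))]
```

Replacing `sam_step` with a plain gradient step recovers an ERM-style local update, which makes it easy to compare the two on the same clients.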

preprint | 2022 | arXiv

LoMar: A Local Defense Against Poisoning Attack on Federated Learning

Federated learning (FL) provides a highly efficient decentralized machine learning framework, where the training data remains distributed at remote clients in a network. Though FL enables a privacy-preserving mobile edge computing framework using IoT devices, recent studies have shown that this approach is susceptible to poisoning attacks from the side of remote clients. To address poisoning attacks on FL, we provide a two-phase defense algorithm called Local Malicious Factor (LoMar). In phase I, LoMar scores model updates from each remote client by measuring the relative distribution over their neighbors using a kernel density estimation method. In phase II, an optimal threshold is approximated to distinguish malicious and clean updates from a statistical perspective. Comprehensive experiments on four real-world datasets have been conducted, and the experimental results show that our defense strategy can effectively protect the FL system. Specifically, the defense performance on the Amazon dataset under a label-flipping attack indicates that, compared with FG+Krum, LoMar increases the target label testing accuracy from $96.0\%$ to $98.8\%$, and the overall averaged testing accuracy from $90.1\%$ to $97.0\%$.
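The two phases can be sketched roughly as follows. This is a simplified stand-in, not LoMar's actual scoring rule or threshold derivation: the Gaussian kernel, the k-nearest-neighbor choice, the ratio-based score, the fixed `threshold`, and all function names are assumptions made for illustration.

```python
import math

def gaussian_kde_density(point, others, bandwidth=1.0):
    """Average Gaussian kernel value between `point` and each vector in `others`."""
    total = 0.0
    for o in others:
        sq = sum((a - b) ** 2 for a, b in zip(point, o))
        total += math.exp(-sq / (2.0 * bandwidth ** 2))
    return total / len(others)

def relative_density_scores(updates, k=3, bandwidth=1.0):
    """Phase I (sketch): score each update by its density relative to its neighbors."""
    n = len(updates)
    neighbors, density = [], []
    for i, u in enumerate(updates):
        # k nearest neighbors by squared Euclidean distance, excluding the update itself.
        idx = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: sum((a - b) ** 2 for a, b in zip(u, updates[j])),
        )[:k]
        neighbors.append(idx)
        density.append(gaussian_kde_density(u, [updates[j] for j in idx], bandwidth))
    scores = []
    for i in range(n):
        ref = sum(density[j] for j in neighbors[i]) / len(neighbors[i])
        scores.append(density[i] / ref if ref > 0 else 0.0)
    return scores

def flag_suspicious(scores, threshold=0.5):
    """Phase II (sketch): a fixed cutoff stands in for the approximated threshold."""
    return [s < threshold for s in scores]
```

An update that sits far from its neighbors gets a density much lower than theirs, so its score falls toward zero, while updates inside a tight cluster score close to one.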

preprint | 2022 | arXiv

On the Convergence of Multi-Server Federated Learning with Overlapping Area

Multi-server federated learning (FL) has been considered a promising solution to the limited communication resource problem of single-server FL. We consider a typical multi-server FL architecture, where the coverage areas of regional servers may overlap. The key point of this architecture is that the clients located in the overlapping areas update their local models based on the average model of all accessible regional models, which enables indirect model sharing among different regional servers. Due to the complicated network topology, the convergence analysis is much more challenging than for single-server FL. In this paper, we first propose a novel MS-FedAvg algorithm for this multi-server FL architecture and analyze its convergence on non-IID datasets for general non-convex settings. Since the number of clients located in each regional server is much smaller than in single-server FL, the bandwidth of each client should be large enough to successfully communicate training models with the server, which indicates that full client participation can work in multi-server FL. We also provide the convergence analysis of the partial client participation scheme and develop a new biased partial participation strategy to further accelerate convergence. Our results indicate that the convergence results highly depend on the ratio of the number of clients in each area type to the total number of clients in all three strategies. The extensive experiments show remarkable performance and support our theoretical results.
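The indirect model sharing described above can be sketched in a few lines. This is an illustrative reading of the abstract, not the MS-FedAvg algorithm itself: the unweighted averages, the assumption that a client in an overlapping area uploads to every server it can reach, and the names `multi_server_round`, `coverage`, and `local_update` are all hypothetical.

```python
def mean_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]

def multi_server_round(regional_models, coverage, local_update):
    """
    regional_models: dict mapping server name -> parameter vector
    coverage:        dict mapping client name -> list of reachable servers
    local_update:    callable(client, start_model) -> updated model (hypothetical hook)
    """
    uploads = {s: [] for s in regional_models}
    for client, servers in coverage.items():
        # A client in an overlapping area starts from the average of all
        # regional models it can access.
        start = mean_vectors([regional_models[s] for s in servers])
        updated = local_update(client, start)
        # Assumption: the client reports its result to every reachable server.
        for s in servers:
            uploads[s].append(updated)
    # Each regional server averages what it received; a server with no
    # uploads keeps its previous model.
    return {
        s: mean_vectors(ups) if ups else regional_models[s]
        for s, ups in uploads.items()
    }
```

With two servers and one shared client, the shared client's averaged starting point pulls both regional models toward each other even though the servers never exchange parameters directly.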

preprint | 2022 | arXiv

Perception-Aware Attack: Creating Adversarial Music via Reverse-Engineering Human Perception

Recently, adversarial machine learning attacks have posed serious security threats against practical audio signal classification systems, including speech recognition, speaker recognition, and music copyright detection. Previous studies have mainly focused on ensuring the effectiveness of attacking an audio signal classifier via creating a small noise-like perturbation on the original signal. It is still unclear if an attacker is able to create audio signal perturbations that can be well perceived by human beings in addition to its attack effectiveness. This is particularly important for music signals as they are carefully crafted with human-enjoyable audio characteristics. In this work, we formulate the adversarial attack against music signals as a new perception-aware attack framework, which integrates human study into adversarial attack design. Specifically, we conduct a human study to quantify the human perception with respect to a change of a music signal. We invite human participants to rate their perceived deviation based on pairs of original and perturbed music signals, and reverse-engineer the human perception process by regression analysis to predict the human-perceived deviation given a perturbed signal. The perception-aware attack is then formulated as an optimization problem that finds an optimal perturbation signal to minimize the prediction of perceived deviation from the regressed human perception model. We use the perception-aware framework to design a realistic adversarial music attack against YouTube's copyright detector. Experiments show that the perception-aware attack produces adversarial music with significantly better perceptual quality than prior work.

preprint | 2022 | arXiv

Regulate the direct-indirect electronic band gap transition by electron-phonon interaction in BaSnO3

Neutron powder diffraction, specific heat, thermal conductivity, and Raman scattering measurements were performed to study the interplay of lattice, phonons, and electrons in Sr-doped Ba1-xSrxSnO3 (x ≤ 0.1). Although Ba1-xSrxSnO3 kept the cubic lattice, the Raman spectra suggested a dynamic distortion at low temperature. Density functional theory was applied to analyze the electronic structures and phonon dispersions of Ba1-xSrxSnO3 (x = 0, 0.0125), and the behavior of the electron bands around the Fermi level was discussed. According to the experimental and theoretical results, Sr doping played a significant role in tuning the indirect band gap of BaSnO3 and influenced the electron-phonon interaction.

preprint | 2021 | arXiv

Stragglers Are Not Disaster: A Hybrid Federated Learning Algorithm with Delayed Gradients

Federated learning (FL) is a new machine learning framework which trains a joint model across a large number of decentralized computing devices. Existing methods, e.g., Federated Averaging (FedAvg), are able to provide an optimization guarantee by synchronously training the joint model, but usually suffer from stragglers, i.e., IoT devices with low computing power or communication bandwidth, especially on heterogeneous optimization problems. To mitigate the influence of stragglers, this paper presents a novel FL algorithm, namely Hybrid Federated Learning (HFL), to balance learning efficiency and effectiveness. It consists of two major components: a synchronous kernel and an asynchronous updater. Unlike traditional synchronous FL methods, our HFL introduces the asynchronous updater, which actively pulls unsynchronized and delayed local weights from stragglers. An adaptive approximation method, Adaptive Delayed-SGD (AD-SGD), is proposed to merge the delayed local updates into the joint model. The theoretical analysis of HFL shows that the convergence rate of the proposed algorithm is $\mathcal{O}(\frac{1}{t+\tau})$ for both convex and non-convex optimization problems.

preprint | 2021 | arXiv

Topological charge-entropy scaling in kagome Chern magnet TbMn$_6$Sn$_6$

In ordinary materials, electrons conduct both electricity and heat, where their charge-entropy relations observe the Mott formula and the Wiedemann-Franz law. In topological quantum materials, the transverse motion of relativistic electrons can be strongly affected by the quantum field arising around the topological fermions, where a simple model description of their charge-entropy relations remains elusive. Here we report the topological charge-entropy scaling in the kagome Chern magnet TbMn$_6$Sn$_6$, featuring pristine Mn kagome lattices with strong out-of-plane magnetization. Through both electric and thermoelectric transports, we observe quantum oscillations with a nontrivial Berry phase, a large Fermi velocity and two-dimensionality, supporting the existence of Dirac fermions in the magnetic kagome lattice. This quantum magnet further exhibits large anomalous Hall, anomalous Nernst, and anomalous thermal Hall effects, all of which persist to above room temperature. Remarkably, we show that the charge-entropy scaling relations of these anomalous transverse transports can be ubiquitously described by the Berry curvature field effects in a Chern-gapped Dirac model. Our work points to a model kagome Chern magnet for the proof-of-principle elaboration of the topological charge-entropy scaling.

preprint | 2013 | arXiv

Single Crystal Growth, Transport, and Electronic Band Structure of YCoGa$_5$

Single crystals of YCoGa5 have been grown via Ga self-flux. In this paper, we report the single crystal growth, crystallographic parameters, resistivity, heat capacity, and band structure results of YCoGa5. YCoGa5 adopts the HoCoGa5-type structure (space group P4/mmm (No. 123), Z = 1, a = 4.2131(6) Å, c = 6.7929(13) Å), which is isostructural to the extensively studied heavy fermion superconductor system CeMIn5 (M = Co, Rh, Ir) and the unconventional superconductor PuCoGa5 with Tc = 18.5 K. No superconductivity is observed down to 1.75 K. Band structure calculation results show that its band at the Fermi level is mainly composed of Co-3d and Ga-4p electron states, which explains the similarity of its physical properties to those of YbCoGa5 and LuCoGa5.

preprint | 2013 | arXiv

Spin-phonon coupling probed by infrared transmission spectroscopy in the double perovskite Ba$_2$YMoO$_6$

In this work, we investigate the local structural distortion of the double perovskite Ba$_2$YMoO$_6$ by means of infrared transmission spectroscopy. At 300 K, three bands are observed at $\sim$ 255.1 cm$^{-1}$, $\sim$ 343.4 cm$^{-1}$, and $\sim$ 561.5 cm$^{-1}$, which are related to the motion between the cation Ba$^{2+}$ and the anion YMO$_6^{-2}$, the Y-O stretching motion and the stretching vibration of the MoO$_6$ octahedron, respectively. These modes continue to harden upon cooling owing to the shrinkage of the lattice constant. When the temperature decreases to $T \leq$ 130 K, around which the spin singlet dimer begins to form, an additional phonon mode appears at $\sim$ 611 cm$^{-1}$, suggesting the occurrence of local distortion of the MoO$_6$ octahedra. With further decrease of the temperature, its intensity increases while its peak position remains unchanged. These results indicate that the formation of the spin singlet dimers is accompanied by local structural distortion of the MoO$_6$ octahedra, providing evidence for strong spin-phonon coupling in the double perovskite Ba$_2$YMoO$_6$.

preprint | 2013 | arXiv

The effect of Al doping on the structure and magnetism in cobaltite CaBaCo4O7

We report the effects of Al doping on the structure and magnetic properties of CaBa(Co$_{1-x}$Al$_{x}$)$_4$O$_7$ (0$\leq$x$\leq$0.25). The system exhibits a structural transition from an orthorhombic symmetry to a hexagonal symmetry when the Al content exceeds $x =$ 0.1. The Curie temperature and the value of the magnetization decrease with increasing Al doping level, indicating that the ferrimagnetic ground state is gradually suppressed. The ground state eventually transitions into a spin-glass state for $x >$ 0.1. Moreover, the short-range magnetic correlations, which occur at high temperatures in CaBaCo$_4$O$_7$, are found to be gradually suppressed with increasing Al content and eventually disappear for $x =$ 0.25. By comparing our results with other Co-site doping cases, we suggest that the lattice and the spin degrees of freedom are relatively decoupled in CaBaCo$_4$O$_7$.

preprint | 2012 | arXiv

The effect of disorder on quantum phase transition in the double layered ruthenates (Sr1-xCax)3Ru2O7

(Sr1-xCax)3Ru2O7 is characterized by complex magnetic states, ranging from a long-range antiferromagnetically ordered state through an unusual heavy-mass nearly ferromagnetic (NFM) state to an itinerant metamagnetic (IMM) state. The NFM state, which occurs in the 0.4 > x > 0.08 composition range, freezes into a cluster-spin-glass (CSG) phase at low temperatures [Z. Qu et al., Phys. Rev. B 78, 180407(R) (2008)]. In this article, we present scaling analyses of the magnetization and the specific heat for (Sr1-xCax)3Ru2O7 in the 0.4 > x > 0.08 composition range. We find that in a temperature region immediately above the spin freezing temperature $T_f$, the isothermal magnetization $M(H)$ and the temperature dependence of the electronic specific heat $C_e(T)$ exhibit anomalous power-law singularities; both quantities are controlled by a single exponent. The temperature dependence of the magnetization $M(T)$ also displays a power-law behavior, but its exponent differs remarkably from that derived from $M(H)$ and $C_e(T)$. Our analyses further reveal that the magnetization data $M(H,T)$ obey a phenomenological scaling law of $M(H,T) \propto H^{\alpha} f(H/T^{\delta})$ in a temperature region between the spin freezing temperature $T_f$ and the scaling temperature $T_{\mathrm{scaling}}$. $T_{\mathrm{scaling}}$ systematically decreases with the decrease of Ca content. This scaling law breaks down near the critical concentration x = 0.1 where a CSG-to-IMM phase transition occurs. We discuss these behaviors in terms of the effect of disorder on the quantum phase transition.

preprint | 2011 | arXiv

Magnetic properties of the ferrimagnetic cobaltite CaBaCo4O7

The magnetic properties of the ferrimagnetic cobaltite CaBaCo$_4$O$_7$ are systematically investigated. We find that the susceptibility exhibits a downward deviation below $\sim$ 360 K, suggesting the occurrence of short-range magnetic correlations at temperatures well above $T_C$. The effective moment is determined to be 4.5 $\mu_B$/f.u., which is consistent with that expected for the Co$^{2+}$/Co$^{3+}$ high spin species. Using a criterion given by Banerjee [Phys. Lett. 12, 16 (1964)], we demonstrate that the paramagnetic to ferrimagnetic transition in CaBaCo$_4$O$_7$ has a first order character.
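The Banerjee criterion mentioned above reads the order of a magnetic transition from the slope of an $H/M$ versus $M^2$ isotherm: a negative slope points to a first order transition, a positive slope to a second order one. A minimal sketch of that check is given below, assuming a single isotherm and a plain least-squares fit; the function names and the synthetic data are illustrative only and are not taken from the paper.

```python
def arrott_slope(M, H):
    """Least-squares slope of H/M plotted against M^2 for one isotherm."""
    xs = [m * m for m in M]
    ys = [h / m for h, m in zip(H, M)]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

def banerjee_order(M, H):
    """Classify the transition from the sign of the fitted slope."""
    return "first order" if arrott_slope(M, H) < 0 else "second order"

# Synthetic isotherm with H = a*M + b*M^3, so H/M = a + b*M^2 and the slope is b.
M = [0.5, 1.0, 1.5, 2.0]
H = [2.0 * m - 0.3 * m ** 3 for m in M]
print(banerjee_order(M, H))
```

In practice the slope is inspected on isotherms close to the transition temperature, and real data may only show the negative slope over part of the field range, so a single global fit like this one is a simplification.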

preprint | 2011 | arXiv

Spin dynamics in triangular lattice antiferromagnets CuCr$_{1-x}$Mg$_x$O$_2$

Electron spin resonance (ESR) spectroscopy was employed to investigate the spin dynamics in triangular lattice antiferromagnets CuCr$_{1-x}$Mg$_{x}$O$_2$ with $x =$ 0 and 0.02. All spectra can be well fitted by a single Lorentzian lineshape. The analysis of the $g$ factor, the linewidth $\Delta H$, and the ESR intensity $I$ as a function of temperature suggests the development of significant antiferromagnetic (AFM) spin fluctuations at temperatures well above $T_N$ in both samples. However, the evolution of the AFM spin fluctuations is different for each sample. For the undoped sample, the ESR intensity $I$ is almost temperature independent between $\sim$ 100 K and 50 K and then drops rapidly below 50 K. For $x =$ 0.02, however, $I$ increases monotonically with cooling and decreases rapidly only below $T_N$. These results indicate that the AFM spin fluctuations are extremely strong in the undoped sample and appear to be suppressed upon Mg doping.

preprint | 2009 | arXiv

Complex electronic states in double layered ruthenates (Sr1-xCax)3Ru2O7

The magnetic ground state of (Sr$_{1-x}$Ca$_x$)$_3$Ru$_2$O$_7$ (0 $\leq x \leq$ 1) is complex, ranging from an itinerant metamagnetic state (0 $\leq x <$ 0.08), to an unusual heavy-mass, nearly ferromagnetic (FM) state (0.08 $< x <$ 0.4), and finally to an antiferromagnetic (AFM) state (0.4 $\leq x \leq$ 1). In this report we elucidate the electronic properties for these magnetic states, and show that the electronic and magnetic properties are strongly coupled in this system. The electronic ground state evolves from an AFM quasi-two-dimensional metal for $x =$ 1.0, to an Anderson localized state for $0.4 \leq x < 1.0$ (the AFM region). When the magnetic state undergoes a transition from the AFM to the nearly FM state, the electronic ground state switches to a weakly localized state induced by magnetic scattering for $0.25 \leq x < 0.4$, and then to a magnetic metallic state with the in-plane resistivity $ρ_{ab} \propto T^α$ ($α>$ 2) for $0.08 < x < 0.25$. The system eventually transforms into a Fermi liquid ground state when the magnetic ground state enters the itinerant metamagnetic state for $x < 0.08$. When $x$ approaches the critical composition ($x \sim$ 0.08), the Fermi liquid temperature is suppressed to zero Kelvin, and non-Fermi liquid behavior is observed. These results demonstrate the strong interplay between charge and spin degrees of freedom in the double layered ruthenates.

preprint | 2008 | arXiv

Unusual heavy-mass nearly ferromagnetic state with a surprisingly large Wilson ratio in the double layered ruthenates (Sr$_{1-x}$Ca$_{x}$)$_{3}$Ru$_{2}$O$_{7}$

We report an unusual nearly ferromagnetic, heavy-mass state with a surprisingly large Wilson ratio $R_{\textrm{w}}$ (e.g., $R_{\textrm{w}}\sim$ 700 for $x =$ 0.2) in double layered ruthenates (Sr$_{1-x}$Ca$_{x}$)$_{3}$Ru$_{2}$O$_{7}$ with 0.08 $< x <$ 0.4. This state does not evolve into a long-range ferromagnetically ordered state despite considerably strong ferromagnetic correlations, but freezes into a cluster-spin-glass at low temperatures. In addition, evidence of non-Fermi liquid behavior is observed as the spin freezing temperature of the cluster-spin-glass approaches zero near $x \approx$ 0.1. We discuss the origin of this unique magnetic state from the Fermi surface information probed by Hall effect measurements.
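For orientation, the Wilson ratio quoted above is conventionally defined from the low-temperature spin susceptibility $\chi$ and the Sommerfeld coefficient $\gamma$ of the electronic specific heat. The abstract does not state which normalization the authors use, so the standard form below is an assumption:

```latex
R_{\mathrm{W}} = \frac{\pi^{2} k_{\mathrm{B}}^{2}}{3 \mu_{\mathrm{B}}^{2}} \, \frac{\chi}{\gamma}
```

With this normalization a free electron gas has $R_{\mathrm{W}} = 1$, so a value near 700 indicates a susceptibility that is enhanced far more strongly than the specific heat.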
