Source author record

Lu Zhao

Lu Zhao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher; Unclaimed source record

hep-ph; nucl-th; hep-ex; hep-lat; Artificial Intelligence; Computer Vision; cond-mat.mtrl-sci; cond-mat.supr-con; Distributed, Parallel, and Cluster Computing; math.NA; Numerical Analysis

Catalog footprint

What is connected

11 works

11 topics

4 close collaborators


preprint, 2026, arXiv

MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training

The training of large-scale Mixture of Experts (MoE) models faces a critical memory bottleneck due to severe load imbalance caused by dynamic token routing. This imbalance leads to memory overflow on GPUs with limited capacity, constraining model scalability. Existing load balancing methods, which cap expert capacity, compromise model accuracy and fail on memory-constrained hardware. To address this, we propose MemFine, a memory-aware fine-grained scheduling framework for MoE training. MemFine decomposes the token distribution and expert computation into manageable chunks and employs a chunked recomputation strategy, dynamically optimized through a theoretical memory model to balance memory efficiency and throughput. Experiments demonstrate that MemFine reduces activation memory by 48.03% and improves throughput by 4.42% compared to full recomputation-based baselines, enabling stable large-scale MoE training on memory-limited GPUs.

preprint, 2022, arXiv

Incoherent phonon transport dominates heat conduction across van der Waals superlattices

Heat conduction mechanisms in superlattices could be different across different types of interfaces. Van der Waals superlattices are structures physically assembled through weak van der Waals interactions by design, and may host properties beyond the traditional limits of lattice matching and processing compatibility, offering new types of interfaces. In this work, natural van der Waals (SnS)$_{1.17}$(NbS$_2$)$_n$ superlattices are synthesized, and their thermal conductivities are measured by time-domain thermoreflectance as a function of interface density. Our results show that heat conduction of (SnS)$_{1.17}$(NbS$_2$)$_n$ superlattices is dominated by interface scattering when the coherence length of phonons is larger than the superlattice period, indicating that incoherent phonon transport dominates cross-plane heat conduction in van der Waals superlattices even when the period is atomically thin and abrupt. Moreover, our result suggests that the widely accepted heat conduction mechanism for conventional superlattices, in which coherent phonons dominate when the period is short, is not applicable due to symmetry breaking in most van der Waals superlattices. Our findings provide new insight for understanding the thermal behavior of van der Waals superlattices, and suggest approaches for effective thermal management of superlattices depending on the distinct types of interfaces.

preprint, 2022, arXiv

STN: Scalable Tensorizing Networks via Structure-Aware Training and Adaptive Compression

Deep neural networks (DNNs) have delivered remarkable performance in many computer vision tasks. However, over-parameterized representations of popular architectures dramatically increase their computational complexity and storage costs, and hinder their availability on edge devices with constrained resources. Although many tensor decomposition (TD) methods have been well studied for compressing DNNs to learn compact representations, they suffer from non-negligible performance degradation in practice. In this paper, we propose Scalable Tensorizing Networks (STN), which dynamically and adaptively adjust the model size and decomposition structure without retraining. First, we account for compression during training by adding a low-rank regularizer to guarantee networks' desired low-rank characteristics in full tensor format. Then, considering that network layers exhibit various low-rank structures, STN is obtained by a data-driven adaptive TD approach, for which the topological structure of decomposition per layer is learned from the pre-trained model, and the ranks are selected appropriately under specified storage constraints. As a result, STN is compatible with arbitrary network architectures and achieves higher compression performance and flexibility than other tensorizing versions. Comprehensive experiments on several popular architectures and benchmarks substantiate the superiority of our model in improving parameter efficiency.

preprint, 2021, arXiv

Inverse obstacle scattering for elastic waves in the time domain

This paper concerns an inverse elastic scattering problem, which is to determine a rigid obstacle from time domain scattered field data for a single incident plane wave. By using the Helmholtz decomposition, we reduce the initial-boundary value problem of the time domain Navier equation to a coupled initial-boundary value problem of wave equations, and prove the uniqueness of the solution for the coupled problem by employing the energy method. The retarded single layer potential is introduced to establish the coupled boundary integral equations, and the uniqueness is discussed for the solution of the coupled boundary integral equations. Based on the convolution quadrature method for time discretization, the coupled boundary integral equations are reformulated into a system of boundary integral equations in the s-domain, and then a convolution quadrature based nonlinear integral equation method is proposed for the inverse problem. Numerical experiments are presented to show the feasibility and effectiveness of the proposed method.

preprint, 2015, arXiv

The recoil correction and spin-orbit force for the possible $B^* \bar{B}^{*}$ and $D^* \bar{D}^{*}$ states

In the framework of the one-boson exchange model, we have calculated the effective potentials between two heavy mesons $B^* \bar{B}^{*}$ and $D^* \bar{D}^{*}$ from the t- and u-channel $π$-, $η$-, $ρ$-, $ω$- and $σ$-meson exchanges. We keep the recoil corrections to the $B^* \bar{B}^{*}$ and $D^* \bar{D}^{*}$ systems up to $O(\frac{1}{M^2})$, which turns out to be important for the very loosely bound molecular states. Our numerical results show that the momentum-related corrections are favorable to the formation of the molecular states with $I^G=1^+$, $J^{PC}=1^{+-}$ in the $B^* \bar{B}^{*}$ and $D^* \bar{D}^{*}$ systems.

preprint, 2014, arXiv

Fabrication of superconducting nanowires based on ultra-thin Nb films by means of nanoimprint lithography

Nanoimprint lithography (NIL) is an attractive nonconventional lithographic technique for the fabrication of superconducting nanowires for superconducting nanowire single-photon detectors (SNSPDs) with large effective detection areas or multi-element devices consisting of hundreds of SNSPDs, due to its low cost and high throughput. In this work, NIL was used to pattern superconducting nanowires with meander-type structures based on ultra-thin (~4 nm) Nb films deposited by DC-magnetron sputtering at room temperature. A combination of thermal-NIL and UV-NIL was exploited to transfer the meander pattern from the imprint hard mold to Nb films. The hard mold, based on a Si wafer, was defined by e-beam lithography (EBL) and was almost nonexpendable due to the application of IPS as a soft mold to transfer the patterns to the imprint resist in the NIL process. The specimens fabricated by NIL keep good superconducting properties, which are comparable to those obtained by the conventional EBL process.

preprint, 2014, arXiv

Hidden-Charm Tetraquarks and Charged Zc States

Experimentally, several charged axial-vector hidden-charm states were reported. Within the framework of the color-magnetic interaction, we have systematically considered the mass spectrum of the hidden-charm and hidden-bottom tetraquark states. It is impossible to accommodate all three charged states $Z_c(3900)$, $Z_c(4025)$ and $Z_c(4200)$ within the axial vector tetraquark spectrum simultaneously. Not all of these three states are tetraquark candidates. Moreover, the eigenvector of the chromomagnetic interaction contains valuable information about the decay pattern of the tetraquark states. The dominant decay mode of the lowest axial vector tetraquark state is $J/ψπ$ while its $D^*\bar{D}$ and $\bar{D}^*D^*$ modes are strongly suppressed, which is in contrast with the fact that the dominant decay modes of $Z_c(3900)$ and $Z_c(4025)$ are $\bar{D}D^*$ and $\bar{D}^*D^*$ respectively. We emphasize that all the available experimental information indicates that $Z_c(4200)$ is a very promising candidate for the lowest axial vector hidden-charm tetraquark state.

preprint, 2014, arXiv

The Spin-orbit force, recoil corrections and possible $B \bar{B}^{*}$ and $D \bar{D}^{*}$ molecular states

In the framework of the one boson exchange model, we have calculated the effective potentials between two heavy mesons $B \bar{B}^{*}$ and $D \bar{D}^{*}$ from the t- and u-channel $π$, $η$, $ρ$, $ω$ and $σ$ meson exchange for four sets of quantum numbers: $I=0$, $J^{PC}=1^{++}$; $I=0$, $J^{PC}=1^{+-}$; $I=1$, $J^{PC}=1^{++}$; $I=1$, $J^{PC}=1^{+-}$. We keep the recoil corrections to the $B \bar{B}^{*}$ and $D \bar{D}^{*}$ systems up to $O(\frac{1}{M^2})$. The spin-orbit force appears at $O(\frac{1}{M})$, which turns out to be important for the very loosely bound molecular states. Our numerical results show that the momentum-related corrections are unfavorable to the formation of the molecular states in the $I=0$, $J^{PC}=1^{++}$ and $I=1$, $J^{PC}=1^{+-}$ channels in the $D \bar{D}^{*}$ systems.

preprint, 2013, arXiv

A possible NN*(1440) quasi-molecular state

Inspired by the recent observation of a narrow resonance-like structure around 2360 MeV in the $p + n \to d + π^0 + π^0$ cross section, the possibility of forming an NN*(1440) quasi-molecular state is investigated by using a meson exchange model in which the π, σ, ρ and ω exchanges in the t- and u-channels are considered. By adopting the coupling constants extracted from the relevant NN scattering and N*(1440) decay data, it is found that a deuteron-like quasi-molecular state of NN*(1440) with a binding energy in the range from 2 to 67 MeV can be formed. Therefore, it is speculated that the observed structure around 2360 MeV might be, or may have a large component of, the NN*(1440) quasi-molecular state.

preprint, 2013, arXiv

Prediction of super-heavy $N^*$ and $Λ^*$ resonances with hidden beauty

The meson-baryon coupled channel unitary approach with the local hidden gauge formalism is extended to the hidden beauty sector. A few narrow $N^*$ and $Λ^*$ resonances around 11 GeV are predicted as dynamically generated states from the interactions of heavy beauty mesons and baryons. Production cross sections of these predicted resonances in $pp$ and $ep$ collisions are estimated as a guide for the possible experimental search at relevant facilities.

preprint, 2013, arXiv

The meson-exchange model for the $Λ\barΛ$ interaction

In the present work, we apply the one-boson-exchange potential (OBEP) model to investigate the possibility of Y(2175) and $η(2225)$ as bound states of $Λ\barΛ(^3S_1)$ and $Λ\barΛ(^1S_0)$ respectively. We consider the effective potential from the pseudoscalar $η$-exchange and $η^{'}$-exchange, the scalar $σ$-exchange, and the vector $ω$-exchange and $ϕ$-exchange. The $η$ and $η^{'}$ meson-exchange potential is repulsive for the $^1S_0$ state and attractive for the $^3S_1$ state. The results depend very sensitively on the cutoff parameter of the $ω$-exchange ($Λ_ω$) and least sensitively on that of the $ϕ$-exchange ($Λ_ϕ$). Our result suggests the possible interpretation of Y(2175) and $η(2225)$ as bound states of $Λ\barΛ(^3S_1)$ and $Λ\barΛ(^1S_0)$ respectively.
