Researcher profile

Lu Zhao

Lu Zhao contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 17 - Unverified; Verification L1; Unclaimed author
4 works
0 followers
6 topics
4 close collaborators

Published work

4 published items

Preprint, 2026, arXiv

MemFine: Memory-Aware Fine-Grained Scheduling for MoE Training

The training of large-scale Mixture of Experts (MoE) models faces a critical memory bottleneck due to severe load imbalance caused by dynamic token routing. This imbalance leads to memory overflow on GPUs with limited capacity, constraining model scalability. Existing load balancing methods, which cap expert capacity, compromise model accuracy and fail on memory-constrained hardware. To address this, we propose MemFine, a memory-aware fine-grained scheduling framework for MoE training. MemFine decomposes the token distribution and expert computation into manageable chunks and employs a chunked recomputation strategy, dynamically optimized through a theoretical memory model to balance memory efficiency and throughput. Experiments demonstrate that MemFine reduces activation memory by 48.03% and improves throughput by 4.42% compared to full recomputation-based baselines, enabling stable large-scale MoE training on memory-limited GPUs.
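
The chunking idea in the abstract above can be illustrated with a toy calculation. This is only a sketch under stated assumptions, not MemFine's memory model or scheduler: it assumes activation memory grows linearly with the number of tokens an expert processes at once, and the names `min_chunks`, `peak_activation_bytes`, `bytes_per_token`, and `budget_bytes` are hypothetical.

```python
import math


def min_chunks(tokens: int, bytes_per_token: int, budget_bytes: int) -> int:
    """Smallest number of chunks such that one chunk's activations fit the budget.

    Assumption (not from the paper): activation memory is linear in the
    number of tokens processed at once.
    """
    if budget_bytes < bytes_per_token:
        raise ValueError("budget cannot hold even one token's activations")
    max_tokens_per_chunk = budget_bytes // bytes_per_token
    return math.ceil(tokens / max_tokens_per_chunk)


def peak_activation_bytes(tokens: int, bytes_per_token: int, num_chunks: int) -> int:
    """Peak activation memory when tokens are split into near-equal chunks."""
    largest_chunk = math.ceil(tokens / num_chunks)
    return largest_chunk * bytes_per_token


if __name__ == "__main__":
    # Hypothetical numbers: an overloaded expert receives 10,000 tokens,
    # each needing 1 KiB of activations, with a 4 MiB activation budget.
    tokens, per_token, budget = 10_000, 1024, 4 * 1024 * 1024
    chunks = min_chunks(tokens, per_token, budget)
    print("chunks needed:", chunks)
    print("unchunked peak:", peak_activation_bytes(tokens, per_token, 1))
    print("chunked peak:", peak_activation_bytes(tokens, per_token, chunks))
```

In this toy model, more chunks lower the peak at the cost of more passes, which is the kind of memory versus throughput trade-off the abstract says MemFine tunes dynamically.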

Preprint, 2022, arXiv

Incoherent phonon transport dominates heat conduction across van der Waals superlattices

Heat conduction mechanisms in superlattices can differ across different types of interfaces. Van der Waals superlattices are structures physically assembled through weak van der Waals interactions by design, and may host properties beyond the traditional limits of lattice matching and processing compatibility, offering new types of interfaces. In this work, natural van der Waals (SnS)1.17(NbS2)n superlattices are synthesized, and their thermal conductivities are measured by time-domain thermoreflectance as a function of interface density. Our results show that heat conduction of (SnS)1.17(NbS2)n superlattices is dominated by interface scattering when the coherence length of phonons is larger than the superlattice period, indicating that incoherent phonon transport dominates cross-plane heat conduction in van der Waals superlattices even when the period is atomically thin and abrupt. Moreover, our results suggest that the widely accepted heat conduction mechanism for conventional superlattices, in which coherent phonons dominate when the period is short, is not applicable due to symmetry breaking in most van der Waals superlattices. Our findings provide new insight into the thermal behavior of van der Waals superlattices and suggest approaches for effective thermal management of superlattices depending on the distinct types of interfaces.

Preprint, 2022, arXiv

STN: Scalable Tensorizing Networks via Structure-Aware Training and Adaptive Compression

Deep neural networks (DNNs) have delivered remarkable performance in many computer vision tasks. However, over-parameterized representations of popular architectures dramatically increase their computational complexity and storage costs, and hinder their deployment on edge devices with constrained resources. Although many tensor decomposition (TD) methods have been well studied for compressing DNNs to learn compact representations, they suffer from non-negligible performance degradation in practice. In this paper, we propose Scalable Tensorizing Networks (STN), which dynamically and adaptively adjust the model size and decomposition structure without retraining. First, we account for compression during training by adding a low-rank regularizer to guarantee the networks' desired low-rank characteristics in full tensor format. Then, considering that network layers exhibit various low-rank structures, STN is obtained by a data-driven adaptive TD approach, in which the topological structure of the decomposition per layer is learned from the pre-trained model, and the ranks are selected appropriately under specified storage constraints. As a result, STN is compatible with arbitrary network architectures and achieves higher compression performance and flexibility than other tensorizing versions. Comprehensive experiments on several popular architectures and benchmarks substantiate the superiority of our model in improving parameter efficiency.
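
Selecting ranks under a storage constraint, as the abstract above describes, can be illustrated in the simplest possible setting. The sketch below swaps in a plain two-factor matrix factorization as a stand-in for STN's learned decomposition structure, so it is not the paper's method. It relies only on the fact that a rank-r factorization of an m x n weight matrix stores r(m + n) parameters instead of mn. The function names and the budget value are hypothetical.

```python
def low_rank_params(m: int, n: int, r: int) -> int:
    """Parameters stored by a rank-r factorization W ~ A @ B,
    with A of shape (m, r) and B of shape (r, n)."""
    return r * (m + n)


def max_rank_under_budget(m: int, n: int, budget_params: int) -> int:
    """Largest rank whose factorization fits within budget_params.

    Returns 0 when even rank 1 does not fit. The rank is capped at
    min(m, n), the maximum possible rank of an m x n matrix.
    """
    return min(budget_params // (m + n), min(m, n))


if __name__ == "__main__":
    # Hypothetical layer: a 512 x 256 dense weight matrix, with a storage
    # budget of one quarter of its original parameter count.
    m, n = 512, 256
    budget = (m * n) // 4
    r = max_rank_under_budget(m, n, budget)
    print("dense parameters:", m * n)
    print("selected rank:", r)
    print("factorized parameters:", low_rank_params(m, n, r))
```

A per-layer budget like this is the crudest form of the constraint. The abstract indicates that STN additionally learns which decomposition topology suits each layer.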

Preprint, 2021, arXiv

Inverse obstacle scattering for elastic waves in the time domain

This paper concerns an inverse elastic scattering problem: determining a rigid obstacle from time-domain scattered field data for a single incident plane wave. Using the Helmholtz decomposition, we reduce the initial-boundary value problem of the time-domain Navier equation to a coupled initial-boundary value problem of wave equations, and prove the uniqueness of the solution of the coupled problem by employing the energy method. The retarded single-layer potential is introduced to establish the coupled boundary integral equations, and uniqueness is discussed for the solution of the coupled boundary integral equations. Based on the convolution quadrature method for time discretization, the coupled boundary integral equations are reformulated into a system of boundary integral equations in the s-domain, and then a convolution quadrature based nonlinear integral equation method is proposed for the inverse problem. Numerical experiments are presented to show the feasibility and effectiveness of the proposed method.
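
The reduction mentioned in the abstract above can be written out in its standard textbook form. The following is a sketch assuming a two-dimensional, homogeneous, isotropic medium with density $\rho$ and Lamé parameters $\lambda$, $\mu$. The symbols $\phi$, $\psi$, $c_p$, $c_s$, $\Gamma$ and $u^{\mathrm{inc}}$ are notation chosen here and may differ from the paper's.

```latex
% Time-domain Navier equation for the scattered displacement u:
\rho\,\partial_t^2 u - \mu\,\Delta u - (\lambda+\mu)\,\nabla(\nabla\cdot u) = 0 .

% Helmholtz decomposition (2D), with scalar potentials \phi and \psi:
u = \nabla\phi + \operatorname{\mathbf{curl}}\psi ,
\qquad
\operatorname{\mathbf{curl}}\psi = (\partial_{x_2}\psi,\; -\partial_{x_1}\psi) .

% Substituting, and using \nabla\cdot\operatorname{\mathbf{curl}}\psi = 0:
\nabla\!\left[\rho\,\partial_t^2\phi - (\lambda+2\mu)\,\Delta\phi\right]
+ \operatorname{\mathbf{curl}}\!\left[\rho\,\partial_t^2\psi - \mu\,\Delta\psi\right] = 0 ,

% which holds if each potential satisfies a scalar wave equation:
\partial_t^2\phi = c_p^2\,\Delta\phi , \qquad
\partial_t^2\psi = c_s^2\,\Delta\psi , \qquad
c_p^2 = \frac{\lambda+2\mu}{\rho} , \quad c_s^2 = \frac{\mu}{\rho} .

% The two wave equations are coupled only through the rigid-obstacle
% boundary condition on the obstacle boundary \Gamma:
\nabla\phi + \operatorname{\mathbf{curl}}\psi = -\,u^{\mathrm{inc}}
\quad \text{on } \Gamma .
```

The compressional and shear potentials propagate independently at speeds $c_p$ and $c_s$, so the coupling in the "coupled initial-boundary value problem" enters only at the obstacle boundary.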