Source author record

Han Meng

Han Meng appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

cond-mat.mes-hall Machine Learning Artificial Intelligence Computation and Language Distributed, Parallel, and Cluster Computing eess.SP

Catalog footprint

6 works · 6 topics · 4 close collaborators


Preprint · 2026 · arXiv

ChunkFlow: Communication-Aware Chunked Prefetching for Layerwise Offloading in Distributed Diffusion Transformer Inference

Layerwise offloading reduces the GPU memory footprint of large diffusion transformer (DiT) inference by prefetching upcoming layers from host memory, but its effectiveness hinges on hiding prefetch latency behind per-layer computation. This assumption breaks down when the per-GPU compute workload is small. Moreover, on PCIe-only nodes, prefetch and inter-GPU collective communications such as all-reduce and all-to-all contend on the shared PCIe path, exposing prefetch latency even when compute would otherwise hide it. We revisit layerwise offloading as a co-scheduling problem between prefetch and communication, guided by a first-order analytical model that predicts when prefetch can be hidden by computation. Building on this model, we design ChunkFlow, a communication-aware, chunk-granular offloading runtime that adaptively yields to collective communication and smoothly trades GPU memory for prefetch volume. On three representative diffusion transformers running on two H100 GPUs over PCIe with Ulysses sequence parallelism, ChunkFlow delivers up to 1.28x step-time speedup over SGLang's existing layerwise offloading, reduces peak GPU memory by up to 49% over the no-offload baseline at near-identical step time once the workload is large enough, and exposes a tunable memory-latency tradeoff that recovers near-zero step-time overhead in the small-workload regime.
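The abstract does not spell out ChunkFlow's analytical model, so the following is only a hypothetical first-order illustration of the condition it describes: prefetch is hidden when the shared PCIe link can move both the prefetched weights and the collective traffic within one layer's compute time. The names LayerCost, exposed_prefetch_latency and min_resident_fraction, and the assumption that prefetch and collective bytes simply add up on a single link, are illustrative choices, not the paper's formulation. Keeping a fraction of each layer resident on the GPU stands in for the memory-for-prefetch-volume tradeoff.

```python
from dataclasses import dataclass


@dataclass
class LayerCost:
    layer_bytes: float  # weight bytes of one offloaded layer
    compute_s: float    # per-layer compute time on one GPU, in seconds
    comm_bytes: float   # collective (all-reduce / all-to-all) bytes on the same PCIe path during that layer
    pcie_bw: float      # effective PCIe bandwidth in bytes/s (assumed shared by prefetch and collectives)


def exposed_prefetch_latency(c: LayerCost, resident_fraction: float = 0.0) -> float:
    """Hypothetical first-order estimate of prefetch time not hidden behind compute.

    Assumes prefetch and collective traffic serialize on one PCIe link, so the
    link must carry both within the layer's compute window. resident_fraction
    is the share of the layer kept on the GPU, which shrinks the prefetch volume.
    """
    if not 0.0 <= resident_fraction <= 1.0:
        raise ValueError("resident_fraction must be in [0, 1]")
    prefetch_bytes = (1.0 - resident_fraction) * c.layer_bytes
    link_busy_s = (prefetch_bytes + c.comm_bytes) / c.pcie_bw
    return max(0.0, link_busy_s - c.compute_s)


def min_resident_fraction(c: LayerCost) -> float:
    """Smallest resident share that minimizes exposed latency under the model above (zero when achievable)."""
    # Bytes the link can still prefetch after collectives take their share of the compute window.
    free_bytes = c.compute_s * c.pcie_bw - c.comm_bytes
    if free_bytes >= c.layer_bytes:
        return 0.0  # compute fully hides prefetch: offload the whole layer
    if free_bytes <= 0.0:
        return 1.0  # collectives alone fill the window: keep the whole layer resident
    return 1.0 - free_bytes / c.layer_bytes


if __name__ == "__main__":
    # Hypothetical inputs, not measurements from the paper.
    layer = LayerCost(layer_bytes=1e9, compute_s=0.02, comm_bytes=2e8, pcie_bw=25e9)
    print(exposed_prefetch_latency(layer), min_resident_fraction(layer))
```

With these made-up inputs (a 1 GB layer, 20 ms of compute, 200 MB of collective traffic, 25 GB/s of effective bandwidth), the toy model predicts about 28 ms of exposed latency with full offloading and none once roughly 70% of the layer stays resident, which mirrors the small-workload regime the abstract highlights.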

Preprint · 2026 · arXiv

DIP: Dynamic In-Context Planner For Diffusion Language Models

Diffusion language models (DLMs) have shown strong potential for general natural language tasks with in-context examples. However, due to the bidirectional attention mechanism, DLMs incur substantial computational cost as context length increases. This work addresses this issue with a key discovery: unlike the sequential generation in autoregressive language models (ARLMs), the diffusion generation paradigm in DLMs allows efficient dynamic adjustment of the context during generation. Building on this insight, we propose Dynamic In-Context Planner (DIP), a context-optimization method that dynamically selects and inserts in-context examples during generation, rather than providing all examples in the prompt upfront. Results show that DIP maintains generation quality while achieving up to 12.9× inference speedup over standard inference and 1.17× over KV cache-enhanced inference.
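The abstract does not say how DIP scores or schedules examples, so this is only a hypothetical sketch of the general idea: because a DLM re-attends over the whole sequence at every denoising step, a planner is free to change which in-context examples are present from one step to the next. The functions plan_context and attention_cost, the top-k rule, the toy relevance score, and the quadratic per-step cost model (ignoring KV caching and constant factors) are all assumptions for illustration, not DIP's algorithm.

```python
from typing import Callable, List, Sequence


def plan_context(
    examples: Sequence[str],
    score: Callable[[str, int], float],
    num_steps: int,
    k: int,
) -> List[List[str]]:
    """Hypothetical planner: at each denoising step, keep only the k highest-scoring examples."""
    schedule = []
    for step in range(num_steps):
        ranked = sorted(examples, key=lambda ex: score(ex, step), reverse=True)
        schedule.append(list(ranked[:k]))
    return schedule


def attention_cost(schedule: List[List[str]], example_len: int, prompt_len: int, gen_len: int) -> int:
    """Rough cost model: bidirectional attention over the full sequence at each step, ~L^2 per step."""
    total = 0
    for chosen in schedule:
        seq_len = prompt_len + len(chosen) * example_len + gen_len
        total += seq_len * seq_len
    return total


if __name__ == "__main__":
    examples = [f"ex{i}" for i in range(8)]

    def toy_score(ex: str, step: int) -> float:
        # Toy relevance: prefer examples whose index is close to the current step.
        return -abs(int(ex[2:]) - step)

    static = [list(examples)] * 10  # all 8 examples present at every step
    dynamic = plan_context(examples, toy_score, num_steps=10, k=2)
    print(attention_cost(static, 100, 50, 100), attention_cost(dynamic, 100, 50, 100))
```

In this toy setting each step attends over 350 tokens instead of 950, so the estimated attention cost drops by roughly 7×; that figure reflects only the toy model, not DIP's reported speedups.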

Preprint · 2022 · arXiv

Spatio-Temporal-Frequency Graph Attention Convolutional Network for Aircraft Recognition Based on Heterogeneous Radar Network

This paper proposes a knowledge-and-data-driven, graph-neural-network-based collaborative learning model for reliable aircraft recognition in a heterogeneous radar network. The aircraft recognizability analysis shows that (1) the semantic features of an aircraft are motion patterns driven by its kinetic characteristics, and (2) the grammatical features contained in the radar cross-section (RCS) signals exhibit spatial-temporal-frequency (STF) diversity determined by both the electromagnetic radiation shape and the motion pattern of the aircraft. An STF graph attention convolutional network (STFGACN) is then developed to distill semantic features from the RCS signals received by the heterogeneous radar network. Extensive experimental results verify that the STFGACN outperforms the baseline methods in terms of detection accuracy, and ablation experiments further show that expanding the information dimension yields considerable benefits for robust performance in the low signal-to-noise ratio region.
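The abstract does not detail the STFGACN architecture. As a hedged illustration of the graph-attention building block such networks commonly use, the sketch below implements one standard single-head, GAT-style attention layer in NumPy, treating each radar as a graph node whose feature vector is assumed to summarize its RCS signal. The function name, the node features, the adjacency, and the random weights are hypothetical, and the model's actual spatial-temporal-frequency design is not reproduced.

```python
import numpy as np


def gat_layer(H: np.ndarray, A: np.ndarray, W: np.ndarray, a: np.ndarray, slope: float = 0.2):
    """One single-head graph attention layer (GAT-style), without the output nonlinearity.

    H: (N, F) node features, e.g. per-radar RCS feature vectors (hypothetical)
    A: (N, N) adjacency, nonzero where radars i and j exchange information
    W: (F, F2) shared linear projection
    a: (2 * F2,) attention vector scoring the concatenation [W h_i || W h_j]
    """
    Z = H @ W  # projected features, shape (N, F2)
    f2 = Z.shape[1]
    # a^T [z_i || z_j] splits into a term for node i plus a term for neighbor j.
    e = (Z @ a[:f2])[:, None] + (Z @ a[f2:])[None, :]
    e = np.where(e > 0, e, slope * e)  # LeakyReLU
    mask = (A != 0) | np.eye(A.shape[0], dtype=bool)  # attend to neighbors and to self
    e = np.where(mask, e, -np.inf)
    e = e - e.max(axis=1, keepdims=True)  # numerically stable softmax over each row
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ Z, alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H = rng.normal(size=(4, 6))  # 4 radars, 6 features each (synthetic)
    A = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
    out, alpha = gat_layer(H, A, rng.normal(size=(6, 3)), rng.normal(size=6))
    print(out.shape, alpha.round(3))
```

Each row of the attention matrix sums to one over a radar's neighbors (and itself), so information is aggregated only along the links of the radar network.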

Preprint · 2020 · arXiv

Thermal conductivity of one-dimensional carbon-boron nitride van der Waals heterostructure: A molecular dynamics study

Investigating thermal transport in van der Waals heterostructures is of scientific interest and of practical importance for their broad range of applications. In this work, the thermal conductivity of a one-dimensional heterostructure consisting of carbon and boron nitride nanotubes is systematically investigated via molecular dynamics simulations. Thermal conductivity is found to depend strongly on temperature, length and diameter. In addition, axial strain and the strength of the van der Waals interaction are shown to modulate thermal conductivity by up to about 43% and 37%, respectively. Moreover, the dependence of thermal conductivity on the chirality of the component nanotubes is studied. These results are explained using insights from lattice dynamics. This work not only provides feasible strategies to modulate thermal conductivity but also enhances the understanding of the fundamental physics of phonon transport in one-dimensional heterostructures.

Preprint · 2016 · arXiv

Nano-Cross-Junction Effect on Phonon Transport in Silicon-Nanowire-Cages

Wave effects of phonons can give rise to control over heat conduction beyond that achievable by particle scattering at surfaces and interfaces. In this work, we propose a new class of 3D nanostructure: a silicon-nanowire-cage (SiNWC) structure consisting of silicon nanowires (SiNWs) connected by nano-cross-junctions (NCJs). We perform equilibrium molecular dynamics (MD) simulations and find an ultralow thermal conductivity for the SiNWC, 0.173 W m⁻¹ K⁻¹, which is one order of magnitude lower than that of SiNWs. Through further modal analysis and atomistic Green's function calculations, we identify that this large reduction is due to significant phonon localization, induced by phonon local resonance and hybridization at the junctions, across a wide range of phonon modes. Unlike in phononic crystals, this localization effect does not require the cage to be periodic, and it can be realized in structures that are easier to synthesize, for instance in the form of a randomly oriented SiNW network.
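The abstract reports a thermal conductivity from equilibrium MD without stating the post-processing. EMD conductivities are commonly obtained from the Green-Kubo relation, κ = V / (k_B T²) ∫₀^∞ ⟨J(0) J(t)⟩ dt, for one Cartesian component of the heat-flux density J. The sketch below assumes that route, SI units, and a trapezoidal integral truncated at a chosen correlation time; the function names are hypothetical, and practical workflows also average over directions and independent runs.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant, J/K


def heat_flux_acf(flux, max_lag: int) -> np.ndarray:
    """<J(0) J(t)> for one heat-flux-density component (W/m^2), averaged over time origins."""
    flux = np.asarray(flux, dtype=float)
    n = flux.size
    if not 0 < max_lag <= n:
        raise ValueError("max_lag must be in (0, len(flux)]")
    return np.array([np.mean(flux[: n - lag] * flux[lag:]) for lag in range(max_lag)])


def green_kubo_kappa(acf: np.ndarray, dt: float, volume: float, temperature: float) -> float:
    """kappa = V / (k_B T^2) * integral of the ACF (trapezoidal rule, step dt), in W/(m K)."""
    integral = dt * (acf.sum() - 0.5 * (acf[0] + acf[-1]))
    return volume / (K_B * temperature**2) * integral
```

Here dt is the time spacing between flux samples (in seconds) and volume is the simulation-cell volume (in m³); the upper integration limit, set by max_lag, must be long enough for the autocorrelation to decay but short enough to avoid accumulating noise.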

Preprint · 2016 · arXiv

The unexpected thermal conductivity from graphene disk, carbon nanocone to carbon nanotube

Graphene and single-wall carbon nanotubes (SWCNTs) have attracted great attention because of their ultra-high thermal conductivity. However, few works have quantitatively explored the relations between their thermal conductivities. The carbon nanocone (CNC) is a graded structure that falls between the graphene disk (GD) and the SWCNT. We perform non-equilibrium molecular dynamics (NEMD) simulations to study the thermal conductivity of CNCs with different apex angles and compare it with that of the GD and the SWCNT. Our results show that, unlike the homogeneous thermal conductivity of the SWCNT, the CNC has a natural graded thermal conductivity similar to that of the GD. Unexpectedly, the graded rate remains almost unchanged as the apex angle decreases from 180° (GD) to 19°, but then suddenly declines to zero as the apex angle decreases from 19° to 0° (SWCNT). More interestingly, the graded effect is not diminished when the interatomic force constant is weakened and the mean free path is shortened. That is, in addition to the nanoscale, the graded effect can be observed in macroscale graphene or CNC structures.
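In NEMD, thermal conductivity is usually extracted from Fourier's law using the imposed heat current Q and the steady-state temperature profile, and a position-dependent (graded) conductivity can be estimated locally as κ(x) = −Q / (A(x) dT/dx). The sketch assumes binned temperatures along the heat-flow direction and a known cross-sectional area per bin; how A(x) is defined for a disk or cone (for example, a circumference times an assumed effective thickness) is a modeling choice the abstract does not specify, and the function name is hypothetical.

```python
import numpy as np


def local_conductivity(x, temperature, heat_current: float, area) -> np.ndarray:
    """Local Fourier's-law conductivity kappa(x) = -Q / (A(x) * dT/dx) from an NEMD profile.

    x:            bin centers along the heat-flow direction (m)
    temperature:  time-averaged bin temperatures (K)
    heat_current: steady-state heat current Q crossing every section (W)
    area:         cross-sectional area per bin (m^2), scalar or array matching x
    """
    x = np.asarray(x, dtype=float)
    temperature = np.asarray(temperature, dtype=float)
    grad = np.gradient(temperature, x)  # central differences inside, one-sided at the ends
    return -heat_current / (np.asarray(area, dtype=float) * grad)
```

For radial heat flow in a disk of assumed thickness h, one would pass area = 2 * np.pi * r * h with r as the coordinate; a κ that changes systematically with r would then show up as the kind of gradation the abstract describes.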
