
TIDE: Efficient and Lossless MoE Diffusion LLM Inference with I/O-aware Expert Offload

Diffusion Large Language Models (dLLMs) have emerged as a competitive alternative to autoregressive (AR) models, offering better hardware utilization and bidirectional context through parallel block-level decoding. However, as dLLMs scale up with mixture-of-experts (MoE) architectures, deploying them on resource-constrained devices remains an open challenge: existing methods designed for AR models incur either prohibitive I/O overhead or significant compute bottlenecks. In this work, we propose TIDE, a resource-efficient inference system that leverages the temporal stability of expert activations across diffusion steps within a block. Specifically, we introduce an interval-based expert refresh strategy that updates expert placement in an I/O-aware fashion. To optimize performance, we formulate inference scheduling as a mathematical programming problem and solve for the refresh interval that minimizes I/O traffic and CPU computation. Most importantly, TIDE is a lossless optimization that requires no model training, providing "free lunch" acceleration for dLLM inference. On a single GPU-CPU system, TIDE achieves up to 1.4× and 1.5× throughput improvements over prior baselines on the LLaDA2.0-mini and LLaDA2.0-flash models, respectively.
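The trade-off the abstract describes can be illustrated with a toy cost model: refreshing the GPU-resident experts more often costs more CPU-to-GPU transfers, while refreshing less often leaves more routed tokens to be computed by CPU-resident experts. This is a minimal sketch under stated assumptions, not TIDE's implementation. The cost constants, the per-step routing trace, the "top experts by routed tokens" placement rule and the names `simulate` and `best_interval` are all hypothetical, and an exhaustive sweep over candidate intervals stands in for the paper's mathematical-programming formulation.

```python
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical per-unit costs in arbitrary units; NOT measurements from the paper.
IO_COST_PER_EXPERT = 4.0   # copy one expert's weights from CPU to GPU memory
CPU_COST_PER_TOKEN = 1.0   # compute one routed token with a CPU-resident expert

# One denoising step of a (hypothetical) routing trace: expert id -> tokens routed to it.
Step = Dict[int, int]


def simulate(trace: Sequence[Step], interval: int, gpu_slots: int) -> Tuple[float, float]:
    """Return (io_cost, cpu_cost) when GPU residency is refreshed every `interval` steps."""
    if interval < 1:
        raise ValueError("interval must be >= 1")
    resident: Set[int] = set()
    io_cost = 0.0
    cpu_cost = 0.0
    for t, routed in enumerate(trace):
        if t % interval == 0:
            # Refresh: keep the `gpu_slots` experts with the most routed tokens at this
            # step, betting that routing stays similar for the next interval - 1 steps.
            ranked = sorted(routed, key=lambda e: (-routed[e], e))
            new_resident = set(ranked[:gpu_slots])
            # Only experts that were not already on the GPU need a transfer.
            io_cost += IO_COST_PER_EXPERT * len(new_resident - resident)
            resident = new_resident
        # Tokens routed to experts that are not GPU-resident are computed on the CPU.
        cpu_cost += CPU_COST_PER_TOKEN * sum(
            n for e, n in routed.items() if e not in resident
        )
    return io_cost, cpu_cost


def best_interval(trace: Sequence[Step], candidates: List[int], gpu_slots: int) -> int:
    """Exhaustive sweep (a stand-in for a mathematical-programming solver).

    Minimizes total cost; on ties, prefers the longer interval (fewer refreshes).
    """
    return min(candidates, key=lambda k: (sum(simulate(trace, k, gpu_slots)), -k))


# Toy trace: routing is stable for 4 steps, then shifts to a different expert set.
trace = [{0: 6, 1: 4, 2: 1}] * 4 + [{3: 6, 4: 4, 2: 1}] * 4
print(simulate(trace, 4, 2))                  # (16.0, 8.0)
print(simulate(trace, 8, 2))                  # (8.0, 48.0)
print(best_interval(trace, [1, 2, 4, 8], 2))  # 4
```

In this toy trace an interval of 8 never refreshes after the routing shift, so most tokens fall back to the CPU, while intervals of 1, 2 and 4 reach the same total cost; the tie-break then picks 4, the option with the fewest refreshes. Real traces would fluctuate step to step, which is where the choice of interval starts to matter in both directions.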

Preprint · 2026 · arXiv · Open access

Zhiben Chen Youpeng Zhao Yang Sui Jun Wang Yuzhang Shang

Computation and Language


Signal facts


Open access · 5 authors · 1 topic · 1 save


Citations: 0 · Reviews: 0 · Saves: 1 · Code: not linked · Dataset: not linked · Institutions: 0



Institutions

No institution affiliation has been imported for this paper yet.



Discuss

Iaroslav Argunov · May 20, 2026, 8:27 PM

Wow! Good work!

