Source author record

Tianlun Hu

Tianlun Hu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Machine Learning; Networking and Internet Architecture; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Multiagent Systems

Catalog footprint

What is connected

3 works

5 topics

4 close collaborators

Preprint, 2026, arXiv

Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend

Mixture-of-Experts (MoE) inference requires large-scale token exchange across devices, making dispatch and combine major bottlenecks in both prefill and decode. Beyond network transfer, routing-driven layout transformation, temporary relay, and output restoration can add substantial overhead. Existing MoE communication paths are often buffer-centric, using explicit inter-process relay and reordering buffers around collective transfer. This report presents a relay-buffer-free communication design for MoE inference acceleration on Ascend systems. The design reorganizes dispatch and combine around direct placement into destination expert windows and direct reading from remote expert windows. Built on globally pooled high-bandwidth memory and symmetric-memory allocation, it removes most intermediate relay and reordering buffers while retaining only lightweight control state, including counts, offsets, and synchronization metadata. We instantiate the design as two schedules for the main phases of MoE inference: a prefill schedule with richer planning state for throughput-oriented execution, and a compact decode schedule for latency-sensitive execution. Experiments on Ascend-based MoE workloads show reduced dispatch and combine latency in both settings. At the serving level, the implementation improves time to first token (TTFT), preserves competitive time per output token (TPOT), and enlarges the feasible scheduling space under practical latency constraints. These results indicate that, on platforms with globally addressable device memory, reducing intermediate buffering and output restoration around expert execution is an effective direction for accelerating MoE inference.
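To make the "counts, offsets, direct placement, direct read" idea more concrete, here is a minimal single-process sketch in Python. It is not the report's Ascend implementation: a single flat list stands in for the globally pooled, symmetric HBM region, each expert owns a contiguous window inside it, and the function names (`plan_dispatch`, `dispatch`, `run_experts`, `combine`) and the toy expert (scale by `e + 1`) are hypothetical. The only control state kept is per-expert counts, window offsets, and each token copy's slot, and both dispatch and combine index the windows directly instead of going through a relay or reorder buffer.

```python
from typing import List, Tuple

# Hypothetical sketch: one flat Python list stands in for the globally pooled,
# symmetric HBM region; each expert owns a contiguous "window" inside it.


def plan_dispatch(
    assignments: List[List[int]], num_experts: int
) -> Tuple[List[int], List[int], List[List[int]]]:
    """Return per-expert counts, window start offsets, and each token's slot."""
    counts = [0] * num_experts
    slots: List[List[int]] = []
    for experts in assignments:
        token_slots = []
        for e in experts:
            token_slots.append(counts[e])  # next free position in expert e's window
            counts[e] += 1
        slots.append(token_slots)
    # Exclusive prefix sum over counts gives the start of each expert window.
    offsets, running = [], 0
    for c in counts:
        offsets.append(running)
        running += c
    return counts, offsets, slots


def dispatch(tokens, assignments, num_experts):
    """Write every routed token copy straight into its destination window."""
    counts, offsets, slots = plan_dispatch(assignments, num_experts)
    pooled = [0.0] * sum(counts)
    for token, experts, token_slots in zip(tokens, assignments, slots):
        for e, s in zip(experts, token_slots):
            pooled[offsets[e] + s] = token  # direct placement, no relay/reorder buffer
    return pooled, counts, offsets, slots


def run_experts(pooled, counts, offsets):
    """Toy expert e scales its inputs by (e + 1); outputs stay in place."""
    out = list(pooled)
    for e, (start, n) in enumerate(zip(offsets, counts)):
        for i in range(start, start + n):
            out[i] = pooled[i] * (e + 1)
    return out


def combine(expert_out, assignments, weights, offsets, slots):
    """Read each token's expert outputs directly from the windows and mix them."""
    result = []
    for experts, w, token_slots in zip(assignments, weights, slots):
        acc = 0.0
        for e, wk, s in zip(experts, w, token_slots):
            acc += wk * expert_out[offsets[e] + s]  # direct read, no restoration pass
        result.append(acc)
    return result


if __name__ == "__main__":
    tokens = [1.0, 2.0, 3.0]
    assignments = [[0, 1], [1, 2], [0, 2]]  # top-2 routing decisions per token
    weights = [[0.5, 0.5], [0.25, 0.75], [1.0, 0.0]]
    pooled, counts, offsets, slots = dispatch(tokens, assignments, num_experts=3)
    expert_out = run_experts(pooled, counts, offsets)
    print(combine(expert_out, assignments, weights, offsets, slots))  # [1.5, 5.5, 3.0]
```

In a real multi-device setting the counts would first have to be exchanged so that every rank agrees on the window offsets, and the synchronization metadata the abstract mentions would guard reads until writes complete; the sketch omits both.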

Preprint, 2022, arXiv

Inter-Cell Slicing Resource Partitioning via Coordinated Multi-Agent Deep Reinforcement Learning

Network slicing enables the operator to configure virtual network instances for diverse services with specific requirements. To achieve slice-aware radio resource scheduling, dynamic slicing resource partitioning is needed to orchestrate multi-cell slice resources and mitigate inter-cell interference. It is, however, challenging to derive analytical solutions due to the complex inter-cell interdependencies, inter-slice resource constraints, and service-specific requirements. In this paper, we propose a multi-agent deep reinforcement learning (DRL) approach that improves the max-min slice performance while respecting resource capacity constraints. We design two coordination schemes that allow distributed agents to coordinate and mitigate inter-cell interference. The proposed approach is extensively evaluated in a system-level simulator. The numerical results show that the proposed approach with inter-agent coordination outperforms the centralized approach in terms of delay and convergence. The proposed approach also achieves more than a two-fold increase in resource efficiency compared to the baseline approach.

Preprint, 2022, arXiv

Knowledge Transfer in Deep Reinforcement Learning for Slice-Aware Mobility Robustness Optimization

The legacy mobility robustness optimization (MRO) in self-organizing networks aims at improving handover performance by optimizing cell-specific handover parameters. However, such solutions cannot satisfy the needs of next-generation networks with network slicing, because they only guarantee received signal strength but not per-slice service quality. To provide truly seamless mobility service, we propose a deep reinforcement learning-based slice-aware mobility robustness optimization (SAMRO) approach, which improves handover performance with per-slice service assurance by optimizing slice-specific handover parameters. Moreover, to allow safe and sample-efficient online training, we develop a two-step transfer learning scheme: 1) regularized offline reinforcement learning, and 2) effective online fine-tuning with mixed experience replay. System-level simulations show that, compared with legacy MRO algorithms, SAMRO significantly improves slice-aware service continuation while optimizing handover performance.
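The abstract names "mixed experience replay" for the online fine-tuning step but does not say how batches are mixed. As a rough illustration only, the Python sketch below assumes one common variant: a fixed fraction of each training batch comes from a bounded buffer of fresh online transitions, and the rest is drawn from the fixed offline dataset used in the first step. The class `MixedReplay` and its `online_fraction` parameter are hypothetical and are not taken from the paper.

```python
import random
from collections import deque
from typing import Any, List, Optional, Sequence


class MixedReplay:
    """Hypothetical mixed replay: each batch blends offline and online transitions."""

    def __init__(self, offline: Sequence[Any], capacity: int,
                 online_fraction: float, seed: Optional[int] = None) -> None:
        if not offline:
            raise ValueError("offline dataset must be non-empty")
        if not 0.0 <= online_fraction <= 1.0:
            raise ValueError("online_fraction must be in [0, 1]")
        self.offline = list(offline)                 # fixed data from the offline RL step
        self.online: deque = deque(maxlen=capacity)  # FIFO buffer of fresh interactions
        self.online_fraction = online_fraction
        self.rng = random.Random(seed)

    def add(self, transition: Any) -> None:
        self.online.append(transition)  # oldest transition drops out when full

    def sample(self, batch_size: int) -> List[Any]:
        # Online share is capped by what has been collected so far.
        n_online = min(round(batch_size * self.online_fraction), len(self.online))
        n_offline = batch_size - n_online
        batch = self.rng.sample(list(self.online), n_online)   # without replacement
        batch += self.rng.choices(self.offline, k=n_offline)   # with replacement
        self.rng.shuffle(batch)
        return batch
```

For example, with `online_fraction=0.25` and at least 8 online transitions collected, `sample(32)` returns 8 online and 24 offline transitions. Early in fine-tuning, when few online transitions exist, batches fall back mostly on offline data, which is one way to keep updates stable before enough fresh experience accumulates.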