Researcher profile

Tianlun Hu

Tianlun Hu contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2026arXiv

Relay Buffer Independent Communication over Pooled HBM for Efficient MoE Inference on Ascend

Mixture-of-Experts (MoE) inference requires large-scale token exchange across devices, making dispatch and combine major bottlenecks in both prefill and decode. Beyond network transfer, routing-driven layout transformation, temporary relay, and output restoration can add substantial overhead. Existing MoE communication paths are often buffer-centric, using explicit inter-process relay and reordering buffers around collective transfer. This report presents a relay-buffer-free communication design for MoE inference acceleration on Ascend systems. The design reorganizes dispatch and combine around direct placement into destination expert windows and direct reading from remote expert windows. Built on globally pooled high-bandwidth memory and symmetric-memory allocation, it removes most intermediate relay and reordering buffers while retaining only lightweight control state, including counts, offsets, and synchronization metadata. We instantiate the design as two schedules for the main phases of MoE inference: a prefill schedule with richer planning state for throughput-oriented execution, and a compact decode schedule for latency-sensitive execution. Experiments on Ascend-based MoE workloads show reduced dispatch and combine latency in both settings. At the serving level, the implementation improves time to first token (TTFT), preserves competitive time per output token (TPOT), and enlarges the feasible scheduling space under practical latency constraints. These results indicate that, on platforms with globally addressable device memory, reducing intermediate buffering and output restoration around expert execution is an effective direction for accelerating MoE inference.

preprint2022arXiv

Inter-Cell Slicing Resource Partitioning via Coordinated Multi-Agent Deep Reinforcement Learning

Network slicing enables the operator to configure virtual network instances for diverse services with specific requirements. To achieve the slice-aware radio resource scheduling, dynamic slicing resource partitioning is needed to orchestrate multi-cell slice resources and mitigate inter-cell interference. It is, however, challenging to derive the analytical solutions due to the complex inter-cell interdependencies, interslice resource constraints, and service-specific requirements. In this paper, we propose a multi-agent deep reinforcement learning (DRL) approach that improves the max-min slice performance while maintaining the constraints of resource capacity. We design two coordination schemes to allow distributed agents to coordinate and mitigate inter-cell interference. The proposed approach is extensively evaluated in a system-level simulator. The numerical results show that the proposed approach with inter-agent coordination outperforms the centralized approach in terms of delay and convergence. The proposed approach improves more than two-fold increase in resource efficiency as compared to the baseline approach.

preprint2022arXiv

Knowledge Transfer in Deep Reinforcement Learning for Slice-Aware Mobility Robustness Optimization

The legacy mobility robustness optimization (MRO) in self-organizing networks aims at improving handover performance by optimizing cell-specific handover parameters. However, such solutions cannot satisfy the needs of next-generation network with network slicing, because it only guarantees the received signal strength but not the per-slice service quality. To provide the truly seamless mobility service, we propose a deep reinforcement learning-based slice-aware mobility robustness optimization (SAMRO) approach, which improves handover performance with per-slice service assurance by optimizing slice-specific handover parameters. Moreover, to allow safe and sample efficient online training, we develop a two-step transfer learning scheme: 1) regularized offline reinforcement learning, and 2) effective online fine-tuning with mixed experience replay. System-level simulations show that compared against the legacy MRO algorithms, SAMRO significantly improves slice-aware service continuation while optimizing the handover performance.