Researcher profile

Guanghui Yu

Guanghui Yu contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 19 - Unverified | Verification L1 | Unclaimed author
5 works
0 followers
4 topics
4 close collaborators

Published work

5 published items

preprint | 2026 | arXiv

Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference

When large language models (LLMs) serve real-time inference in commercial online advertising systems, end-to-end latency must be strictly bounded to the millisecond range. Yet every token generated during the decode phase triggers thousands of kernel launches, and kernel launch overhead alone can account for 14.6% of end-to-end inference time. MegaKernel eliminates launch overhead and inter-operator HBM round-trips by fusing multiple operators into a single persistent kernel. However, existing MegaKernel implementations face a fundamental tension between portability and efficiency on resource-constrained GPUs such as NVIDIA Ada: hand-tuned solutions are tightly coupled to specific architectures and lack portability, while auto-compiled approaches introduce runtime dynamic scheduling whose branch penalties are unacceptable in latency-critical settings. We observe that under a fixed deployment configuration, the optimal execution path of a MegaKernel is uniquely determined, and runtime dynamic decision-making can be entirely hoisted to compile time. Building on this insight, we propose Ada-MK: (1) a three-dimensional shared-memory constraint model combined with K-dimension splitting that reduces peak shared memory usage by 50%; (2) MLIR-based fine-grained DAG offline search that solidifies the optimal execution path, completely eliminating runtime branching; and (3) a heterogeneous hybrid inference engine that embeds MegaKernel as a plugin into TensorRT-LLM, combining high-throughput Prefill with low-latency Decode. On an NVIDIA L20, Ada-MK improves single-batch throughput by up to 23.6% over vanilla TensorRT-LLM and 50.2% over vLLM, achieving positive gains across all tested scenarios--the first industrial deployment of MegaKernel in a commercial online advertising system.
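The abstract does not spell out its shared-memory constraint model, so the following is only a generic illustration of why splitting the K dimension lowers peak shared-memory use in a tiled matrix multiply. The tile sizes, the element width, and the function names `tile_smem_bytes` and `matmul_k_split` are assumptions made for this sketch and do not come from the paper.

```python
def tile_smem_bytes(bm, bn, bk, elem_bytes=2):
    """Bytes needed to stage one A tile (bm x bk) and one B tile (bk x bn).

    Hypothetical cost model: only the two input tiles live in shared memory.
    """
    return (bm * bk + bk * bn) * elem_bytes


def matmul_k_split(a, b, bk):
    """Multiply a (m x k) by b (k x n), consuming K in chunks of size bk.

    Each chunk stands in for the slice that would be staged at one time;
    partial products are accumulated into the same output.
    """
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for k0 in range(0, k, bk):
        k1 = min(k0 + bk, k)
        for i in range(m):
            for j in range(n):
                c[i][j] += sum(a[i][kk] * b[kk][j] for kk in range(k0, k1))
    return c


# Halving the K chunk halves the staging buffer in this simple model.
full = tile_smem_bytes(64, 64, 64)
half = tile_smem_bytes(64, 64, 32)

a = [[1, 2, 3, 4], [5, 6, 7, 8]]
b = [[1, 0], [0, 1], [1, 0], [0, 1]]
unsplit = matmul_k_split(a, b, bk=4)
split = matmul_k_split(a, b, bk=2)
```

In this model the result is unchanged by the split, while the buffer that must be resident at once shrinks in proportion to the chunk size. Whether the paper's 50% figure arises in the same way is not stated in the abstract.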

preprint | 2026 | arXiv

Efficient LLM-based Advertising via Model Compression and Parallel Verification

Large language models (LLMs) have shown remarkable potential in advertising scenarios such as ad creative generation and targeted advertising. However, deploying LLMs in real-time advertising systems poses significant challenges due to their high inference latency and computational cost. In this paper, we propose an Efficient Generative Targeting framework that integrates adaptive group quantization, layer-adaptive hierarchical sparsification, and prefix-tree parallel verification to accelerate LLM inference while preserving generation quality. Extensive experiments on two real-world advertising scenarios demonstrate that our framework achieves significant speedup with acceptable quality degradation, making it operationally viable for practical deployments.
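As a rough illustration of the group quantization idea named in the abstract, here is a minimal sketch of plain symmetric per-group quantization. It is not the paper's adaptive scheme, whose details the abstract does not give; the group size, bit width, and the function names `quantize_groups` and `dequantize_groups` are assumptions chosen for the example.

```python
def quantize_groups(weights, group_size=4, bits=4):
    """Quantize a flat list of weights with one scale per group.

    Symmetric scheme: each group maps its largest magnitude to qmax.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed codes
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Fall back to 1.0 so an all-zero group does not divide by zero.
        scale = max(abs(w) for w in group) / qmax or 1.0
        scales.append(scale)
        codes.extend(max(-qmax, min(qmax, round(w / scale))) for w in group)
    return codes, scales


def dequantize_groups(codes, scales, group_size=4):
    """Reconstruct approximate weights from integer codes and group scales."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


weights = [0.1, -0.7, 0.35, 0.0, 70.0, -35.0, 7.0, 0.0]
codes, scales = quantize_groups(weights)
restored = dequantize_groups(codes, scales)
```

Because each group carries its own scale, the large values in the second group do not coarsen the resolution of the small values in the first, which is the usual motivation for quantizing in groups rather than per tensor.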

preprint | 2022 | arXiv

Waveform Design Using Half-duplex Devices for 6G Joint Communications and Sensing

Joint communications and sensing (JCS) is a promising 6G technology, and the challenge is how to integrate the two efficiently. Existing frequency-division and time-division coexistence schemes yield little integration gain. Directly using orthogonal frequency-division multiplexing (OFDM) for sensing requires complex in-band full-duplex operation to cancel the self-interference (SI). To solve these problems, this paper proposes novel coexistence schemes that achieve super sensing range (SSR) and simple SI cancellation. SSR enables JCS to attain the sensing range of a sensing-only scheme while sharing resources with communications. Random time-division is proposed to gain a super Doppler range. Flexible sensing implanted OFDM (FSI-OFDM) is also proposed. FSI-OFDM uses random sensing occasions to gain a super Doppler range and uses fixed tail sensing occasions to achieve a super distance range. The simulation results show that the proposed schemes can achieve SSR with limited resources.

preprint | 2020 | arXiv

6G White Paper on Machine Learning in Wireless Communication Networks

The focus of this white paper is on machine learning (ML) in wireless communications. 6G wireless communication networks will be the backbone of the digital transformation of societies by providing ubiquitous, reliable, and near-instant wireless connectivity for humans and machines. Recent advances in ML research have enabled a wide range of novel technologies such as self-driving vehicles and voice assistants. Such innovation is possible as a result of the availability of advanced ML models, large datasets, and high computational power. On the other hand, the ever-increasing demand for connectivity will require a lot of innovation in 6G wireless networks, and ML tools will play a major role in solving problems in the wireless domain. In this paper, we provide an overview of the vision of how ML will impact wireless communication systems. We first give an overview of the ML methods that have the highest potential to be used in wireless networks. Then, we discuss the problems that can be solved by using ML in various layers of the network such as the physical layer, medium access layer, and application layer. Zero-touch optimization of wireless networks using ML is another interesting aspect that is discussed in this paper. Finally, at the end of each section, important research questions that the section aims to answer are presented.

preprint | 2019 | arXiv

6G Mobile Communication Network: Vision, Challenges and Key Technologies

With the start of large-scale commercial deployment of 5G networks, more and more researchers and related organizations have begun to consider the next generation of mobile communication systems. This article explores the 6G concept for the 2030s. First, it summarizes the future 6G vision with four keywords: "Intelligent Connectivity", "Deep Connectivity", "Holographic Connectivity" and "Ubiquitous Connectivity", which together constitute the overall 6G vision of "Wherever you think, everything follows your heart". Then, the technical requirements and challenges of realizing the 6G vision are analyzed, including peak throughput, higher energy efficiency, connection everywhere and anytime, new theories and technologies, self-aggregating communications fabric, and some non-technical challenges. Next, the potential key technologies of 6G are classified and presented: communication technologies on new spectrum, including terahertz communication and visible light communication; fundamental technologies, including sparse theory (compressed sensing), new channel coding technology, large-scale antenna and flexible spectrum usage; and special technical features, including Space-Air-Ground-Sea integrated communication and wireless tactile network. By exploring the 6G vision, requirements and challenges, as well as potential key technologies, this article attempts to outline the overall framework of 6G and to provide directional guidance for subsequent 6G research.

Keywords: 6G, vision, terahertz, VLC, compressed sensing, free duplex, wireless tactile network