Source author record

Guanghui Yu

Guanghui Yu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

eess.SP, Computation and Language, Information Theory, math.IT, cond-mat.mes-hall, cond-mat.mtrl-sci, cond-mat.str-el, cond-mat.supr-con

Catalog footprint

What is connected

6 works

8 topics

4 close collaborators

preprint · 2026 · arXiv

Ada-MK: Adaptive MegaKernel Optimization via Automated DAG-based Search for LLM Inference

When large language models (LLMs) serve real-time inference in commercial online advertising systems, end-to-end latency must be strictly bounded to the millisecond range. Yet every token generated during the decode phase triggers thousands of kernel launches, and kernel launch overhead alone can account for 14.6% of end-to-end inference time. MegaKernel eliminates launch overhead and inter-operator HBM round-trips by fusing multiple operators into a single persistent kernel. However, existing MegaKernel implementations face a fundamental tension between portability and efficiency on resource-constrained GPUs such as NVIDIA Ada: hand-tuned solutions are tightly coupled to specific architectures and lack portability, while auto-compiled approaches introduce runtime dynamic scheduling whose branch penalties are unacceptable in latency-critical settings. We observe that under a fixed deployment configuration, the optimal execution path of a MegaKernel is uniquely determined, and runtime dynamic decision-making can be entirely hoisted to compile time. Building on this insight, we propose Ada-MK: (1) a three-dimensional shared-memory constraint model combined with K-dimension splitting that reduces peak shared memory usage by 50%; (2) MLIR-based fine-grained DAG offline search that solidifies the optimal execution path, completely eliminating runtime branching; and (3) a heterogeneous hybrid inference engine that embeds MegaKernel as a plugin into TensorRT-LLM, combining high-throughput Prefill with low-latency Decode. On an NVIDIA L20, Ada-MK improves single-batch throughput by up to 23.6% over vanilla TensorRT-LLM and 50.2% over vLLM, achieving positive gains across all tested scenarios--the first industrial deployment of MegaKernel in a commercial online advertising system.
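The abstract does not spell out the shared-memory constraint model, so the following is only a rough illustration of why splitting along the K dimension can halve peak shared memory. It assumes a generic tiled GEMM that stages one A tile (bm x bk) and one B tile (bk x bn) per pipeline stage; the function name, tile sizes, FP16 element size and stage count are all hypothetical and are not taken from Ada-MK.

```python
def gemm_tile_smem_bytes(bm, bn, bk, dtype_bytes=2, stages=2):
    """Shared memory needed to stage A (bm x bk) and B (bk x bn) tiles.

    Generic tiled-GEMM accounting, not the Ada-MK constraint model.
    """
    return (bm * bk + bk * bn) * dtype_bytes * stages


# Hypothetical tile shapes: the same output tile, with the K extent halved.
full = gemm_tile_smem_bytes(bm=128, bn=128, bk=64)
split = gemm_tile_smem_bytes(bm=128, bn=128, bk=32)
reduction = 1 - split / full

print(f"full-K tile:  {full} bytes")
print(f"split-K tile: {split} bytes")
print(f"reduction:    {reduction:.0%}")
```

Both staged tiles scale linearly with bk, so under these assumptions halving the K extent halves the staging footprint while leaving the output tile shape unchanged.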

preprint · 2026 · arXiv

Efficient LLM-based Advertising via Model Compression and Parallel Verification

Large language models (LLMs) have shown remarkable potential in advertising scenarios such as ad creative generation and targeted advertising. However, deploying LLMs in real-time advertising systems poses significant challenges due to their high inference latency and computational cost. In this paper, we propose an Efficient Generative Targeting framework that integrates adaptive group quantization, layer-adaptive hierarchical sparsification, and prefix-tree parallel verification to accelerate LLM inference while preserving generation quality. Extensive experiments on two real-world advertising scenarios demonstrate that our framework achieves significant speedup with acceptable quality degradation, making it operationally viable for practical deployments.

preprint · 2022 · arXiv

Waveform Design Using Half-duplex Devices for 6G Joint Communications and Sensing

Joint communications and sensing (JCS) is a promising 6G technology, and the challenge is how to integrate the two efficiently. Existing frequency-division and time-division coexistence can hardly bring an integration gain. Directly using orthogonal frequency-division multiplexing (OFDM) for sensing requires complex in-band full-duplex operation to cancel the self-interference (SI). To solve these problems, this paper proposes novel coexistence schemes that achieve super sensing range (SSR) and simple SI cancellation. SSR enables JCS to attain the sensing range of a sensing-only scheme while sharing resources with communications. Random time-division is proposed to gain a super Doppler range. Flexible sensing implanted OFDM (FSI-OFDM) is also proposed. FSI-OFDM uses random sensing occasions to gain super Doppler range, and uses fixed tail sensing occasions to achieve super distance range. The simulation results show that the proposed schemes can gain SSR with limited resources.

preprint · 2020 · arXiv

6G White Paper on Machine Learning in Wireless Communication Networks

The focus of this white paper is on machine learning (ML) in wireless communications. 6G wireless communication networks will be the backbone of the digital transformation of societies by providing ubiquitous, reliable, and near-instant wireless connectivity for humans and machines. Recent advances in ML research have enabled a wide range of novel technologies such as self-driving vehicles and voice assistants. Such innovation is possible as a result of the availability of advanced ML models, large datasets, and high computational power. On the other hand, the ever-increasing demand for connectivity will require a lot of innovation in 6G wireless networks, and ML tools will play a major role in solving problems in the wireless domain. In this paper, we provide an overview of the vision of how ML will impact wireless communication systems. We first give an overview of the ML methods that have the highest potential to be used in wireless networks. Then, we discuss the problems that can be solved by using ML in various layers of the network such as the physical layer, medium access layer, and application layer. Zero-touch optimization of wireless networks using ML is another interesting aspect that is discussed in this paper. Finally, at the end of each section, important research questions that the section aims to answer are presented.

preprint · 2019 · arXiv

6G Mobile Communication Network: Vision, Challenges and Key Technologies

With the start of large-scale commercial deployment of 5G networks, more and more researchers and related organizations have begun to consider the next generation of mobile communication systems. This article explores the 6G concept for the 2030s. First, it summarizes the future 6G vision with four keywords: "Intelligent Connectivity", "Deep Connectivity", "Holographic Connectivity" and "Ubiquitous Connectivity"; together these four keywords constitute the overall 6G vision of "Wherever you think, everything follows your heart". Then, the technical requirements and challenges to realize the 6G vision are analyzed, including peak throughput, higher energy efficiency, connection everywhere and anytime, new theories and technologies, self-aggregating communications fabric, and some non-technical challenges. Next, the potential key technologies of 6G are classified and presented: communication technologies on new spectrum, including terahertz communication and visible light communication; fundamental technologies, including sparse theory (compressed sensing), new channel coding technology, large-scale antenna and flexible spectrum usage; and special technical features, including Space-Air-Ground-Sea integrated communication and wireless tactile network. By exploring the 6G vision, requirements and challenges, as well as potential key technologies, this article attempts to outline the overall framework of 6G and to provide directional guidance for subsequent 6G research. Keywords: 6G, vision, terahertz, VLC, compressed sensing, free duplex, wireless tactile network

preprint · 2014 · arXiv

High-T_c superconductivity in ultrathin Bi_2Sr_2CaCu_2O_8+x down to half-unit-cell thickness by protection with graphene

High-T_c superconductors confined to two dimensions exhibit novel physical phenomena, such as the superconductor-insulator transition. In the Bi_2Sr_2CaCu_2O_8+x (Bi2212) model system, despite extensive studies, the intrinsic superconducting properties at the thinness limit have been difficult to determine. Here we report a method to fabricate high-quality single-crystal Bi2212 films down to half-unit-cell thickness in the form of a graphene/Bi2212 van der Waals heterostructure, in which sharp superconducting transitions are observed. The heterostructure also exhibits a nonlinear current-voltage characteristic due to the Dirac nature of the graphene band structure. More interestingly, although the critical temperature remains essentially the same with reduced thickness of Bi2212, the slope of the normal-state T-linear resistivity varies by a factor of 4-5, and the sheet resistance increases by three orders of magnitude, indicating a surprising decoupling of the normal-state resistance and superconductivity. The developed technique is versatile and can be applied to investigate other two-dimensional (2D) superconducting materials.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint