Source author record

Jianwen Ding

Jianwen Ding appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Networking and Internet Architecture; Distributed, Parallel, and Cluster Computing; Information Theory; Machine Learning; eess.SY; math.IT; Systems and Control

Catalog footprint

What is connected

4 works

7 topics

4 close collaborators


Preprint, 2026, arXiv

GELATO: Generative Entropy- and Lyapunov-based Adaptive Token Offloading for Device-Edge Speculative LLM Inference

The recent growth of on-device Large Language Model (LLM) inference has driven significant interest in device-edge collaborative LLM inference. Speculative Decoding (SD), in which a lightweight draft model rapidly generates candidate tokens that a powerful target model then verifies, is an increasingly adopted architecture. However, a fundamental challenge lies in achieving per-token resource scheduling that adapts the SD paradigm to resource-constrained edge environments. This paper proposes a Generative Entropy- and Lyapunov-based Adaptive Token Offloading framework, named GELATO, to maximize decoding throughput under energy constraints in a device-edge collaborative SD system. Specifically, an outer drift-plus-penalty loop makes online decisions to establish a reference drafting budget, managing the long-term energy-throughput trade-off. Further, a nested entropy-driven generation mechanism executes early exiting to adapt to per-token dynamic generative uncertainty. Theoretical analysis establishes a rigorous performance bound on long-term throughput for GELATO. Extensive evaluations demonstrate that GELATO achieves a globally optimal trade-off, outperforming state-of-the-art distributed SD architectures by 64.98% in token throughput and reducing energy consumption by 47.47% in resource-constrained environments, while preserving LLM decoding quality.
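The entropy-driven early exit mentioned in this abstract can be illustrated with a minimal sketch. It is not the GELATO implementation: the function names, the greedy token choice, and the fixed entropy threshold are assumptions made for illustration. The draft model keeps proposing tokens until its next-token distribution becomes too uncertain or the drafting budget is used up.

```python
import math
from typing import Callable, List, Sequence


def shannon_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def draft_with_early_exit(
    next_dist: Callable[[List[int]], Sequence[float]],
    prefix: List[int],
    budget: int,
    entropy_threshold: float,
) -> List[int]:
    """Draft up to `budget` tokens, stopping early on high entropy.

    `next_dist` is a hypothetical stand-in for the draft model: it maps
    a token context to a next-token probability vector.
    """
    drafted: List[int] = []
    context = list(prefix)
    for _ in range(budget):
        probs = next_dist(context)
        if shannon_entropy(probs) > entropy_threshold:
            # The draft model is uncertain here, so stop drafting and
            # leave the remaining tokens to the target model.
            break
        # Greedy choice of the most likely token (an assumption).
        token = max(range(len(probs)), key=probs.__getitem__)
        drafted.append(token)
        context.append(token)
    return drafted
```

In this sketch `budget` plays the role of the reference drafting budget set by the outer loop, and the entropy check is what shortens a draft when the model is unsure.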

Preprint, 2026, arXiv

Hierarchical Online-Scheduling for Energy-Efficient Split Inference with Progressive Transmission

Device-edge collaborative inference with Deep Neural Networks (DNNs) faces fundamental trade-offs among accuracy, latency and energy consumption. Current scheduling exhibits two drawbacks: a granularity mismatch between coarse, task-level decisions and fine-grained, packet-level channel dynamics, and insufficient awareness of per-task complexity. Consequently, scheduling solely at the task level leads to inefficient resource utilization. This paper proposes a novel ENergy-ACcuracy Hierarchical optimization framework for split Inference, named ENACHI, that jointly optimizes task- and packet-level scheduling to maximize accuracy under energy and delay constraints. A two-tier Lyapunov-based framework is developed for ENACHI, with a progressive transmission technique further integrated to enhance adaptivity. At the task level, an outer drift-plus-penalty loop makes online decisions for DNN partitioning and bandwidth allocation, and establishes a reference power budget to manage the long-term energy-accuracy trade-off. At the packet level, an uncertainty-aware progressive transmission mechanism is employed to adaptively manage per-sample task complexity. This is integrated with a nested inner control loop implementing a novel reference-tracking policy, which dynamically adjusts per-slot transmit power to adapt to fluctuating channel conditions. Experiments on the ImageNet dataset demonstrate that ENACHI outperforms state-of-the-art benchmarks under varying deadlines and bandwidths, achieving a 43.12% gain in inference accuracy with a 62.13% reduction in energy consumption under stringent deadlines, and exhibits high scalability by maintaining stable energy consumption in congested multi-user scenarios.
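The outer drift-plus-penalty loop named in this abstract follows a standard Lyapunov optimization pattern, sketched below in its generic textbook form. This is not the ENACHI algorithm: the action set, the `(energy, utility)` encoding, and all function names are hypothetical. A virtual queue tracks how far cumulative energy use has run ahead of the long-term budget, and each slot picks the action that minimizes queue-weighted energy minus `V` times utility.

```python
from typing import List, Sequence, Tuple

# Hypothetical action encoding: (energy cost, utility such as accuracy).
Action = Tuple[float, float]


def drift_plus_penalty_step(queue: float, actions: Sequence[Action], V: float) -> Action:
    """Pick the action minimizing queue * energy - V * utility."""
    return min(actions, key=lambda a: queue * a[0] - V * a[1])


def update_virtual_queue(queue: float, energy: float, avg_budget: float) -> float:
    """Virtual energy queue: grows when a slot spends more than the budget."""
    return max(queue + energy - avg_budget, 0.0)


def run(slots: Sequence[Sequence[Action]], avg_budget: float, V: float) -> Tuple[List[Action], float]:
    """Run the online loop over per-slot action sets; return choices and final queue."""
    queue = 0.0
    chosen: List[Action] = []
    for actions in slots:
        action = drift_plus_penalty_step(queue, actions, V)
        chosen.append(action)
        queue = update_virtual_queue(queue, action[0], avg_budget)
    return chosen, queue
```

A larger `V` favors utility over energy, while a growing queue pushes later decisions toward cheaper actions, which is how the long-term average energy constraint is enforced in this kind of scheme.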

Preprint, 2022, arXiv

5G for Railways: the Next Generation Railway Dedicated Communications

To cope with increasing traffic, provide various new services, further ensure safety and security, and significantly improve travel comfort, a new communication system for railways is required. Since 2019, public networks worldwide have been evolving to fifth generation communication (5G), whereas the main communication system of railways is still based on second generation communication (2G). It is thus necessary for railways to replace the current 2G-based technology with a next generation railway dedicated communication system with improved capacity and capability, and the 5G for railways (5G-R) technology is a promising solution for future intelligent railways. This article reviews the current developments in next generation railway communications, followed by a discussion of the typical services that 5G-R can provide to intelligent railways. Then, the main application scenarios of 5G-R are summarized and system configurations are compared. Some key technologies of 5G-R, such as network architecture, massive MIMO, millimeter-wave, multiple access schemes, ultra-reliable low latency communication, and advanced video processing, are presented and analyzed. Finally, some challenges of 5G-R are highlighted.

Preprint, 2022, arXiv

Triple-Band Scheduling with Millimeter Wave and Terahertz Bands for Wireless Backhaul

With the explosive growth of mobile traffic demand, densely deployed small cells underlying macrocells have great potential for 5G and beyond wireless networks. In this paper, we consider the problem of supporting traffic flows with diverse QoS requirements by exploiting three high frequency bands, i.e., the 28 GHz band, the E-band, and the Terahertz (THz) band. Cooperation among the three bands helps maximize the number of flows whose QoS requirements are satisfied. To solve the formulated nonlinear integer programming problem, we propose a triple-band scheduling scheme that selects the optimum scheduling band for each flow among the three frequency bands. The proposed scheme also uses resources efficiently when scheduling flow transmissions in time slots. Extensive simulations demonstrate the superior performance of the proposed scheme over three baseline schemes with respect to the number of completed flows and the system throughput.
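To make the band-selection problem in this abstract concrete, here is a plain greedy heuristic. It is not the scheme proposed in the paper, which the abstract does not detail. The per-slot rates, the single shared slot count per band, and the smallest-flow-first ordering are all assumptions. Each flow is placed on the feasible band that leaves the most free slots, and a flow that fits nowhere is left unscheduled.

```python
import math
from typing import Dict, List, Optional


def greedy_band_assignment(
    demands: List[float],
    rates: Dict[str, float],
    slots_per_band: int,
) -> Dict[int, Optional[str]]:
    """Assign each flow to one band, or None if it cannot be completed.

    demands: data each flow must deliver within the frame (hypothetical units).
    rates: data a band carries per time slot (hypothetical values).
    """
    remaining = {band: slots_per_band for band in rates}
    assignment: Dict[int, Optional[str]] = {i: None for i in range(len(demands))}
    # Serve small flows first so that more flows can complete.
    for i in sorted(range(len(demands)), key=lambda j: demands[j]):
        best_band: Optional[str] = None
        best_left = -1
        for band, rate in rates.items():
            needed = math.ceil(demands[i] / rate)
            left = remaining[band] - needed
            # Keep the feasible band that leaves the most free slots.
            if left >= 0 and left > best_left:
                best_band, best_left = band, left
        if best_band is not None:
            remaining[best_band] -= math.ceil(demands[i] / rates[best_band])
            assignment[i] = best_band
    return assignment
```

Counting the non-`None` entries of the result gives the number of completed flows, the metric the abstract reports. A real scheduler would also have to model interference, link blockage, and per-flow QoS deadlines, none of which appear in this sketch.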
