Source author record

Dajun Zhang

Dajun Zhang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computation and Language Hardware Architecture Machine Learning Networking and Internet Architecture

Catalog footprint

What is connected

2works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning

Reasoning LLMs produce thousands of chain-of-thought tokens whose KV cache must reside in scarce GPU HBM. The dominant response -- permanently evicting low-importance tokens -- is catastrophic for reasoning: accuracy collapses to 0-2.5% when half the cache is removed. We ask a different question: must every token live in HBM, or can some live elsewhere? We introduce a semantics-aware memory hierarchy that sorts tokens into four tiers -- HBM, DDR, compressed, and evicted -- using cumulative attention scoring. Low-importance tokens are moved to CPU memory rather than destroyed; before each attention step they are prefetched back at full precision, contributing exactly the same terms as if they had never left the GPU. We formalize this as zero-approximation-error offloading and derive our central finding: accuracy depends solely on how many tokens are permanently discarded (the eviction ratio), not on how many remain in HBM. A controlled 3x3 grid over HBM and eviction ratios confirms this across three model scales (7B-32B) and four benchmarks. With only 3% eviction, the hierarchy retains 91% of full-cache accuracy on GSM8K and 71% on MATH-500 (n=200); at 14B scale it matches the uncompressed baseline (90% vs. 86%) while halving HBM occupancy. A head-to-head reproduction of R-KV -- the current SOTA eviction method -- on our setup achieves only 0-32% at comparable budgets. A system prototype with real GPU-CPU data movement shows that the price of this preservation is modest -- 5-7% transfer overhead -- and scaling analysis projects 2-48 GB HBM savings at production batch sizes.

preprint2016arXiv

Trust-based Secure Routing in Software-defined Vehicular Ad Hoc Networks

With the rising interest of expedient, safe, and high-efficient transportation, vehicular ad hoc networks (VANETs) have turned into a critical technology in smart transportation systems. Because of the high mobility of nodes, VANETs are vulnerable to security attacks. In this paper, we propose a novel framework of software-defined VANETs with trust management. Specifically, we separate the forwarding plane in VANETs from the control plane, which is responsible for the control functionality, such as routing protocols and trust management in VANETs. Using the on-demand distance vector routing (TAODV) protocol as an example, we present a routing protocol named software-defined trust based ad hoc on-demand distance vector routing (SD-TAODV). Simulation results are presented to show the effectiveness of the proposed software-defined VANETs with trust management.