Researcher profile

Yuanzhe Li

Yuanzhe Li contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified
Verification L1
Unclaimed author
2 works
0 followers
4 topics
4 close collaborators

Published work

2 published items

Preprint, 2026, arXiv

AIConfigurator: Lightning-Fast Configuration Optimization for Multi-Framework LLM Serving

Optimizing Large Language Model (LLM) inference in production systems is increasingly difficult due to dynamic workloads, stringent latency/throughput targets, and a rapidly expanding configuration space. This complexity spans not only distributed parallelism strategies (tensor/pipeline/expert) but also framework-specific runtime parameters, such as whether CUDA graphs are enabled, the KV-cache memory fraction, and the maximum token capacity, all of which drastically affect performance. The diversity of modern inference frameworks (e.g., TRT-LLM, vLLM, SGLang), each employing distinct kernels and execution policies, makes manual tuning both framework-specific and computationally prohibitive. We present AIConfigurator, a unified performance-modeling system that enables rapid, framework-agnostic inference configuration search without requiring GPU-based profiling. AIConfigurator combines (1) a methodology that decomposes inference into analytically modelable primitives (GEMM, attention, communication, and memory operations) while capturing framework-specific scheduling dynamics; (2) a calibrated kernel-level performance database for these primitives across a wide range of hardware platforms and popular open-weights models (GPT-OSS, Qwen, DeepSeek, Llama, Mistral); and (3) an abstraction layer that automatically resolves optimal launch parameters for the target backend, integrating into production-grade orchestration systems. Evaluation on production LLM serving workloads demonstrates that AIConfigurator identifies serving configurations that improve performance by up to 40% for dense models (e.g., Qwen3-32B) and 50% for MoE architectures (e.g., DeepSeek-V3), while completing searches within 30 seconds on average. This enables rapid exploration of vast design spaces, from cluster topology down to engine-specific flags.

Preprint, 2026, arXiv

Programmable calculus operations in electromagnetic space using space-time-coding metasurface

With the rapid advancement of metasurfaces and the increasing demand for programmable metasurfaces to simplify information systems, wave-based computation using metasurfaces has emerged as an attractive research topic. To facilitate mathematical operations in electromagnetic (EM) space, we propose a space-time-coding metasurface (STCM) system capable of directly performing calculus operations on the spatial energy distributions of EM waves. By exploiting the harmonic characteristics induced by time-varying coding, the responses of meta-atoms at specific harmonics can be flexibly controlled, which enables the metasurface system to address more complex tasks. Owing to its programmability, the STCM can switch functions in real time to accommodate different calculus tasks. To fully leverage the capability of the STCM, we not only present the space-time coding sequences for differentiation and integration of EM waves, but also develop and numerically simulate space-time coding sequences that can independently and simultaneously implement different calculus operations on the same incident EM waves. To validate the feasibility of the EM calculus operations, proof-of-concept experiments are conducted using a programmable 2-bit STCM. Good agreement among theory, numerical simulations, and experiments confirms the feasibility of performing calculus operations in EM space and demonstrates the broad application prospects of the STCM in EM wave manipulation, wireless communications, and signal processing.