Source author record

Shao Tang

Shao Tang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

cond-mat.str-el, Artificial Intelligence, Computation and Language, cond-mat.dis-nn, cond-mat.mtrl-sci, Data Structures and Algorithms, Human-Computer Interaction

Catalog footprint

What is connected

7 works

7 topics

4 close collaborators

preprint, 2026, arXiv

Distilling the Essence: Efficient Reasoning Distillation via Sequence Truncation

Distilling the capabilities of a large reasoning model (LRM) into a smaller student model often involves training on substantial amounts of reasoning data. However, knowledge distillation (KD) over lengthy sequences with prompt (P), chain-of-thought (CoT), and answer (A) sections makes the process computationally expensive. In this work, we investigate how the allocation of supervision across different sections (P, CoT, A) affects student performance. Our analysis shows that selective KD over only the CoT tokens can be effective when the prompt and answer information is encompassed by it. Building on this insight, we establish a truncation protocol to quantify computation-quality tradeoffs as a function of sequence length. We observe that beyond a specific length, longer training sequences provide marginal returns for downstream performance but require substantially higher memory and FLOPs. To this end, training on only the first $50\%$ of tokens of every training sequence can retain, on average, $\approx91\%$ of full-sequence performance on math benchmarks while reducing training time, memory usage, and FLOPs by about $50\%$ each. Code is available at https://github.com/weiruichen01/distilling-the-essence.
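The truncation idea in this abstract can be illustrated with a minimal sketch. It is not taken from the authors' repository: the function names `truncate_sequences` and `token_count`, the `keep_fraction` parameter, and the choice to round the kept length up are all assumptions made for illustration.

```python
import math
from typing import List, Sequence


def truncate_sequences(
    sequences: Sequence[Sequence[int]], keep_fraction: float = 0.5
) -> List[List[int]]:
    """Keep only the leading share of tokens in each training sequence.

    Hypothetical helper: rounding up (ceil) is an assumption, chosen so that
    every non-empty sequence keeps at least one token.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    truncated = []
    for tokens in sequences:
        keep = math.ceil(len(tokens) * keep_fraction)
        truncated.append(list(tokens[:keep]))
    return truncated


def token_count(sequences: Sequence[Sequence[int]]) -> int:
    """Total number of tokens across all sequences."""
    return sum(len(tokens) for tokens in sequences)


if __name__ == "__main__":
    # Toy token-id sequences standing in for tokenized P + CoT + A examples.
    batch = [[1, 2, 3, 4, 5, 6, 7, 8], [10, 11, 12]]
    short = truncate_sequences(batch, keep_fraction=0.5)
    print(short)
    print(token_count(batch), "->", token_count(short))
```

Token count is only a rough proxy for training cost; the abstract's figures for time, memory, and FLOPs come from the authors' own measurements, which this sketch does not reproduce.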

preprint, 2026, arXiv

EZBlender: Efficient 3D Editing with Plan-and-ReAct Agent

As a cornerstone of the modern digital economy, 3D modeling and rendering demand substantial resources and manual effort when scene editing is performed in the traditional manner. Despite recent progress in VLM-based agents for 3D editing, the fundamental trade-off between editing precision and agent responsiveness remains unresolved. To overcome these limitations, we present EZBlender, a Blender agent with a hybrid framework that combines planning-based task decomposition and reactive local autonomy for efficient human-AI collaboration and semantically faithful 3D editing. Specifically, this previously unexplored Plan-and-ReAct design not only preserves editing quality but also significantly reduces latency and computational cost. To further validate the efficiency and effectiveness of the proposed edge-autonomy architecture, we construct a dedicated multi-tasking benchmark of a kind that has not been systematically investigated in prior research. In addition, we provide a comprehensive analysis of language model preference, system responsiveness, and economic efficiency.

preprint, 2026, arXiv

LLM Query Scheduling with Prefix Reuse and Latency Constraints

The efficient deployment of large language models (LLMs) in online settings requires optimizing inference performance under stringent latency constraints, particularly the time-to-first-token (TTFT) and time-per-output-token (TPOT). This paper focuses on the query scheduling problem for LLM inference with prefix reuse, a technique that leverages shared prefixes across queries to reduce computational overhead. Our work reveals previously unknown limitations of the existing first-come-first-serve (FCFS) and longest-prefix-match (LPM) scheduling strategies with respect to satisfying latency constraints. We present a formal theoretical framework for LLM query scheduling under RadixAttention, a prefix reuse mechanism that stores and reuses intermediate representations in a radix tree structure. Our analysis establishes the NP-hardness of the scheduling problem with prefix reuse under TTFT constraints and proposes a novel scheduling algorithm, $k$-LPM, which generalizes existing methods by balancing prefix reuse and fairness in query processing. Theoretical guarantees demonstrate that $k$-LPM achieves improved TTFT performance under realistic traffic patterns captured by a data generative model. Empirical evaluations in a realistic serving setting validate our findings, showing significant reductions in P99 TTFT compared to baseline methods.
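The contrast between FCFS and LPM ordering can be sketched on a toy queue. The abstract does not spell out the $k$-LPM rule, so the interleaving below (up to `k` longest-prefix-match picks, then one first-come-first-serve pick, repeated) is an assumption made purely for illustration, as are the names `interleaved_schedule` and `cached_prefix_len`. A plain list of already-served queries stands in for the radix tree.

```python
from typing import List, Sequence


def shared_prefix_len(a: str, b: str) -> int:
    """Number of leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def cached_prefix_len(query: str, served: Sequence[str]) -> int:
    """Longest prefix of `query` shared with any already-served query.

    A simple stand-in for a radix-tree lookup (assumption for illustration).
    """
    return max((shared_prefix_len(query, s) for s in served), default=0)


def interleaved_schedule(queries: Sequence[str], k: int) -> List[int]:
    """Return the service order as indices into `queries` (arrival order).

    Assumed rule, not confirmed by the abstract: take up to k
    longest-prefix-match picks, then one FCFS pick, and repeat.
    k = 0 gives pure FCFS; a very large k gives pure LPM.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    waiting = list(range(len(queries)))  # stays sorted by arrival
    served: List[str] = []
    order: List[int] = []
    lpm_streak = 0
    while waiting:
        if lpm_streak < k:
            # LPM pick: longest cached prefix, ties broken by earliest arrival.
            pick = max(
                waiting,
                key=lambda i: (cached_prefix_len(queries[i], served), -i),
            )
            lpm_streak += 1
        else:
            # FCFS pick: the earliest waiting query.
            pick = waiting[0]
            lpm_streak = 0
        waiting.remove(pick)
        served.append(queries[pick])
        order.append(pick)
    return order


if __name__ == "__main__":
    queue = ["aa1", "bb1", "cc1", "aa2", "aa3", "bb2"]
    for k in (0, 2, 10):
        print(k, interleaved_schedule(queue, k))
```

In this toy queue, pure LPM serves the query "cc1" last even though it arrived third, while the interleaved variant lets an early arrival through after every `k` prefix-driven picks. This is the kind of reuse-versus-fairness balance the abstract describes, without any claim that the sketch matches the paper's algorithm or guarantees.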

preprint, 2016, arXiv

Shock-waves and commutation speed of memristors

Progress in silicon-based technology is nearing its physical limit, as the minimum feature size of components is reaching a mere 10 nm. The resistive switching behaviour of transition metal oxides and the associated memristor device is emerging as a competitive technology for next-generation electronics. Significant progress has already been made in the past decade and devices are beginning to hit the market; however, it has been mainly the result of empirical trial and error. Hence, gaining theoretical insight is essential. In the present work we report the striking result of a connection between resistive switching and shock wave formation, a classic topic of non-linear dynamics. We argue that the profile of oxygen vacancies that migrate during the commutation forms a shock wave that propagates through a highly resistive region of the device. We validate the scenario by means of model simulations and experiments in a manganese-oxide-based memristor device. The shock wave scenario brings unprecedented physical insight and makes it possible to rationalize the process of oxygen-vacancy-driven resistive change, with direct implications for a key technological aspect: the commutation speed.

preprint, 2015, arXiv

Mottness-induced healing in strongly correlated superconductors

We study impurity healing effects in models of strongly correlated superconductors. We show that in general both the range and the amplitude of the spatial variations caused by nonmagnetic impurities are significantly suppressed in the superconducting as well as in the normal states. We explicitly quantify the weights of the local and the non-local responses to inhomogeneities and show that the former are overwhelmingly dominant over the latter. By quantifying the spatial range of the local response, we show that it is restricted to only a few lattice spacings over a significant range of dopings in the vicinity of the Mott insulating state. We demonstrate that this healing effect is ultimately due to the suppression of charge fluctuations induced by Mottness. We also define and solve analytically a simplified yet accurate model of healing, within which we obtain simple expressions for quantities of direct experimental relevance.

preprint, 2015, arXiv

Quantum criticality at the Anderson transition: a TMT perspective

We present a complete analytical and numerical solution of the Typical Medium Theory (TMT) for the Anderson metal-insulator transition. This approach self-consistently calculates the typical amplitude of the electronic wave-functions, thus representing the conceptually simplest order-parameter theory for the Anderson transition. We identify all possible universality classes for the critical behavior, which can be found within such a mean-field approach. This provides insights into how interaction-induced renormalizations of the disorder potential may produce qualitative modifications of the critical behavior. We also formulate a simplified description of the leading critical behavior, thus obtaining an effective Landau theory for Anderson localization.

preprint, 2015, arXiv

Strong correlations generically protect d-wave superconductivity against disorder

We address the question of why strongly correlated d-wave superconductors, such as the cuprates, prove to be surprisingly robust against the introduction of non-magnetic impurities. We show that, very generally, both the pair-breaking and the normal-state transport scattering rates are significantly suppressed by strong correlation effects arising in the proximity to a Mott insulating state. We also show that the correlation-renormalized scattering amplitude is generically enhanced in the forward direction, an effect which was previously often ascribed to the specific scattering by charged impurities outside the copper-oxide planes.
