Source author record

Zhuoran Li

Zhuoran Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

eess.IV, Multimedia, Computation and Language, Machine Learning, math.AP, math.CA

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators

preprint, 2026, arXiv

Outcome-Grounded Advantage Reshaping for Fine-Grained Credit Assignment in Mathematical Reasoning

Group Relative Policy Optimization (GRPO) has emerged as a promising critic-free reinforcement learning paradigm for reasoning tasks. However, standard GRPO employs a coarse-grained credit assignment mechanism that propagates group-level rewards uniformly to every token in a sequence, neglecting the varying contribution of individual reasoning steps. We address this limitation by introducing Outcome-grounded Advantage Reshaping (OAR), a fine-grained credit assignment mechanism that redistributes advantages based on how much each token influences the model's final answer. We instantiate OAR via two complementary strategies: (1) OAR-P, which estimates outcome sensitivity through counterfactual token perturbations, serving as a high-fidelity attribution signal; (2) OAR-G, which uses an input-gradient sensitivity proxy to approximate the influence signal with a single backward pass. These importance signals are integrated with a conservative Bi-Level advantage reshaping scheme that suppresses low-impact tokens and boosts pivotal ones while preserving the overall advantage mass. Empirical results on extensive mathematical reasoning benchmarks demonstrate that while OAR-P sets the performance upper bound, OAR-G achieves comparable gains with negligible computational overhead, both significantly outperforming a strong GRPO baseline, pushing the boundaries of critic-free LLM reasoning.
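To make the contrast in this abstract concrete, here is a minimal sketch of group-relative advantages broadcast uniformly to every token, next to one simple way of redistributing an advantage across tokens while keeping its total unchanged. It is not the paper's Bi-Level scheme and it does not compute OAR-P or OAR-G scores: the function names, the `floor` parameter, and the per-token `importance` values are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each sampled response's reward within its group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def uniform_token_advantages(advantage, num_tokens):
    """Coarse credit assignment: every token receives the same advantage."""
    return [advantage] * num_tokens


def reshape_advantages(advantage, importance, floor=0.5):
    """Hypothetical reshaping: scale each token by its relative importance.

    Assumes non-negative importance scores. The weights average to 1, so the
    summed advantage over the sequence equals the uniform case. `floor` keeps
    low-importance tokens from being zeroed out entirely.
    """
    avg = mean(importance)
    if avg <= 0:
        # No usable signal: fall back to uniform credit assignment.
        return [advantage] * len(importance)
    weights = [floor + (1.0 - floor) * s / avg for s in importance]
    return [advantage * w for w in weights]


if __name__ == "__main__":
    rewards = [1.0, 0.0, 0.0, 1.0]      # one outcome reward per sampled response
    advantages = group_relative_advantages(rewards)
    importance = [0.0, 1.0, 3.0]        # placeholder per-token importance scores
    print(uniform_token_advantages(advantages[0], len(importance)))
    print(reshape_advantages(advantages[0], importance))
```

Because the weights average to one, the reshaped per-token advantages sum to the same value as the uniform ones, which is the "preserving the overall advantage mass" property in its simplest form.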

preprint, 2022, arXiv

$L^2$ Schrödinger maximal estimates associated with finite type phases in $\mathbb{R}^2$

In this paper, we establish Schrödinger maximal estimates associated with the finite type phases \begin{equation*} ϕ(ξ_1,ξ_2):=ξ^m_1+ξ^m_2,\;(ξ_1,ξ_2)\in [0,1]^2, \end{equation*} where $m \geq 4$ is an even number. Following [12], we prove an $L^2$ fractal restriction estimate associated with the surfaces \begin{equation*} F^2_m:=\{(ξ_1,ξ_2,ϕ(ξ_1,ξ_2)):\;(ξ_1,ξ_2)\in [0,1]^2\} \end{equation*} as the main result, which also gives results on the average Fourier decay of fractal measures associated with these surfaces. The key ingredients of the proof include the rescaling technique from [16], Bourgain-Demeter's $\ell^2$ decoupling inequality, the reduction of dimension arguments from [17] and induction on scales.

preprint, 2020, arXiv

Assessing the Quality-of-Experience of Adaptive Bitrate Video Streaming

The diversity of video delivery pipelines poses a grand challenge to the evaluation of adaptive bitrate (ABR) streaming algorithms and objective quality-of-experience (QoE) models. Here we introduce the largest subject-rated database of its kind so far, namely WaterlooSQoE-IV, consisting of 1350 adaptive streaming videos created from diverse source contents, video encoders, network traces, ABR algorithms, and viewing devices. We collect human opinions for each video with a series of carefully designed subjective experiments. Subsequent data analysis and testing/comparison of ABR algorithms and QoE models using the database lead to a series of novel observations and interesting findings, in terms of the effectiveness of subjective experiment methodologies, the interactions between user experience and source content, viewing device and encoder type, the heterogeneities in the bias and preference of user experiences, the behaviors of ABR algorithms, and the performance of objective QoE models. Most importantly, our results suggest that a better objective QoE model, or a better understanding of human perceptual experience and behaviour, is the dominant factor in improving the performance of ABR algorithms, as opposed to advanced optimization frameworks, machine learning strategies or bandwidth predictors, on which the majority of ABR research has focused in the past decade. On the other hand, our performance evaluation of 11 QoE models shows only a moderate correlation between state-of-the-art QoE models and subjective ratings, implying room for improvement in both QoE modeling and ABR algorithms. The database is made publicly available at: https://ece.uwaterloo.ca/~zduanmu/waterloosqoe4/.

preprint, 2020, arXiv

Characterizing Generalized Rate-Distortion Performance of Video Coding: An Eigen Analysis Approach

Rate-distortion (RD) theory is at the heart of lossy data compression. Here we aim to model the generalized RD (GRD) trade-off between the visual quality of a compressed video and its encoding profiles (e.g., bitrate and spatial resolution). We first define the theoretical functional space $\mathcal{W}$ of the GRD function by analyzing its mathematical properties. We show that $\mathcal{W}$ is a convex set in a Hilbert space, inspiring a computational model of the GRD function, and a method of estimating model parameters from sparse measurements. To demonstrate the feasibility of our idea, we collect a large-scale database of real-world GRD functions, which turn out to live in a low-dimensional subspace of $\mathcal{W}$. Combining the GRD reconstruction framework and the learned low-dimensional space, we create a low-parameter eigen GRD method to accurately estimate the GRD function of a source video content from only a few queries. Experimental results on the database show that the learned GRD method significantly outperforms state-of-the-art empirical RD estimation methods both in accuracy and efficiency. Last, we demonstrate the promise of the proposed model in video codec comparison.
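The idea of recovering a full curve from a few queries, given a learned low-dimensional basis, can be illustrated with a plain least-squares fit. This is only a sketch under assumptions: the mean curve, the basis vectors, and the function names are hypothetical, the curve is treated as a flat list of values, and the constraints that define $\mathcal{W}$ in the paper are ignored.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def reconstruct_from_samples(mean_curve, basis, sample_idx, sample_vals):
    """Fit basis coefficients to a few observed points, then rebuild the curve.

    Assumes at least as many samples as basis vectors and that the basis,
    restricted to the sampled positions, has full rank.
    """
    k = len(basis)
    # Deviation of each observed sample from the mean curve.
    resid = [v - mean_curve[i] for i, v in zip(sample_idx, sample_vals)]
    # Each basis vector restricted to the sampled positions.
    sub = [[vec[i] for i in sample_idx] for vec in basis]
    # Normal equations of the least-squares problem.
    gram = [[sum(p * q for p, q in zip(sub[a], sub[b])) for b in range(k)]
            for a in range(k)]
    rhs = [sum(p * r for p, r in zip(sub[a], resid)) for a in range(k)]
    coeffs = solve_linear_system(gram, rhs)
    return [mu + sum(c * vec[i] for c, vec in zip(coeffs, basis))
            for i, mu in enumerate(mean_curve)]


if __name__ == "__main__":
    mean_curve = [1.0, 1.0, 1.0, 1.0]                        # hypothetical mean
    basis = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]   # hypothetical basis
    print(reconstruct_from_samples(mean_curve, basis, [0, 1], [3.0, 4.0]))
```

With a two-vector basis, two well-placed samples are enough to determine the whole curve, which is the sense in which a low-dimensional subspace allows estimation "from only a few queries".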
