Researcher profile

Junren Chen

Junren Chen contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 17 - Unverified | Verification L1 | Unclaimed author
4 works
0 followers
5 topics
4 close collaborators

Published work

4 published items

Preprint | 2026 | arXiv

How You Begin is How You Reason: Driving Exploration in RLVR via Prefix-Tuned Priors

Reinforcement learning with verifiable rewards (RLVR) has recently thrived in large language model (LLM) reasoning tasks. However, reward sparsity and the long reasoning horizon make effective exploration challenging. In practice, this challenge manifests as the entropy collapse phenomenon, where RLVR improves single-rollout accuracy but fails to expand coverage of successful reasoning trajectories. Passive exploration techniques such as entropy regularization tend to neglect generation quality, resulting in noisy rollouts. In response to this issue, we propose an Information-Maximizing Augmented eXploration (IMAX) framework that trains a pool of soft prefixes to reshape the base model's prior over reasoning trajectories. Rather than relying on RL to incentivize exploration on top of the base model, each prefix acts as a trainable control knob that induces a distinct rollout distribution from the same backbone model. To encourage discovery of diverse and task-relevant reasoning behaviors, we derive an Information Maximization (InfoMax) reward to complement the verifiable rewards for RL training. IMAX is, in general, algorithm-agnostic and can be seamlessly integrated into existing RLVR pipelines. Experimental results show that, across three backbone scales, IMAX consistently improves reasoning performance over standard RLVR, with gains of up to 11.60% in Pass@4 and 10.57% in Avg@4.
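The abstract contrasts single-rollout accuracy with coverage and reports gains in Pass@4 and Avg@4. The sketch below shows how these two metrics are commonly computed from per-rollout correctness flags. It assumes the usual definitions (Pass@k: at least one of k rollouts is correct; Avg@k: mean accuracy over the k rollouts); the paper's exact evaluation protocol is not given here, and the function names and sample data are hypothetical.

```python
from typing import List


def pass_at_k(rollouts: List[List[bool]]) -> float:
    """Fraction of problems with at least one correct rollout (coverage)."""
    return sum(any(r) for r in rollouts) / len(rollouts)


def avg_at_k(rollouts: List[List[bool]]) -> float:
    """Per-problem rollout accuracy, averaged over problems."""
    return sum(sum(r) / len(r) for r in rollouts) / len(rollouts)


# Hypothetical correctness flags: 3 problems, 4 rollouts each.
results = [
    [True, True, True, True],      # always solved
    [False, False, True, False],   # solved once: counts fully for Pass@4
    [False, False, False, False],  # never solved
]

print(pass_at_k(results))  # 2 of 3 problems are covered
print(avg_at_k(results))   # (1.0 + 0.25 + 0.0) / 3
```

The second problem illustrates the gap between the metrics: a single lucky rollout counts fully toward Pass@4 but contributes only 0.25 to Avg@4.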

Preprint | 2026 | arXiv

Robust Mean Estimation under Quantization

We consider the problem of mean estimation under quantization and adversarial corruption. We construct multivariate robust estimators that are optimal up to logarithmic factors in two different settings. The first is a one-bit setting, where each bit depends only on a single sample, and the second is a partial quantization setting, in which the estimator may use a small fraction of unquantized data.

Preprint | 2023 | arXiv

High Dimensional Statistical Estimation under Uniformly Dithered One-bit Quantization

In this paper, we propose a uniformly dithered 1-bit quantization scheme for high-dimensional statistical estimation. The scheme contains truncation, dithering, and quantization as typical steps. As canonical examples, the quantization scheme is applied to the estimation problems of sparse covariance matrix estimation, sparse linear regression (i.e., compressed sensing), and matrix completion. We study both sub-Gaussian and heavy-tailed regimes, where the underlying distribution of heavy-tailed data is assumed to have bounded moments of some order. We propose new estimators based on 1-bit quantized data. In the sub-Gaussian regime, our estimators achieve near minimax rates, indicating that our quantization scheme costs very little. In the heavy-tailed regime, while the rates of our estimators become essentially slower, these results are either the first ones in a 1-bit quantized and heavy-tailed setting, or already improve on existing comparable results in some respects. Under the observations in our setting, the rates are almost tight in compressed sensing and matrix completion. Our 1-bit compressed sensing results feature general sensing vectors that are sub-Gaussian or even heavy-tailed. We are also the first to investigate a novel setting where both the covariate and response are quantized. In addition, our approach to 1-bit matrix completion does not rely on likelihood and represents the first method robust to pre-quantization noise with unknown distribution. Experimental results on synthetic data are presented to support our theoretical analysis.
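The three steps named in the abstract (truncation, dithering, quantization) can be illustrated on the simplest possible task, estimating a scalar mean from 1-bit data. This is a minimal sketch, not the paper's estimators: it relies on the standard fact that for |x| <= delta and a dither drawn uniformly from [-delta, delta], delta * sign(x + dither) has expectation x. Using the same value `delta` for both the truncation threshold and the dither range is an assumption made for brevity, and all function names are hypothetical.

```python
import random


def truncate(x: float, tau: float) -> float:
    """Clip x to the interval [-tau, tau]."""
    return max(-tau, min(tau, x))


def dithered_one_bit(x: float, delta: float, rng: random.Random) -> int:
    """Add a uniform dither on [-delta, delta], then keep only the sign."""
    dither = rng.uniform(-delta, delta)
    return 1 if x + dither >= 0 else -1


def estimate_mean(samples, delta: float, rng: random.Random) -> float:
    """Average the rescaled bits; unbiased for the truncated mean."""
    bits = [dithered_one_bit(truncate(x, delta), delta, rng) for x in samples]
    return delta * sum(bits) / len(bits)


rng = random.Random(0)  # fixed seed for reproducibility
true_mean = 0.5
samples = [rng.gauss(true_mean, 1.0) for _ in range(100_000)]
estimate = estimate_mean(samples, delta=3.0, rng=rng)
print(estimate)  # close to true_mean, despite using one bit per sample
```

The choice of `delta` carries a trade-off: a larger value reduces truncation bias but inflates the variance of each rescaled bit, which is bounded by delta squared.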

Preprint | 2022 | arXiv

Error Bound of Empirical $\ell_2$ Risk Minimization for Noisy Standard and Generalized Phase Retrieval Problems

In this paper, we study the estimation performance of empirical $\ell_2$ risk minimization (ERM) in noisy (standard) phase retrieval (NPR) given by $y_k = |α_k^*x_0|^2+η_k$, or noisy generalized phase retrieval (NGPR) formulated as $y_k = x_0^*A_kx_0 + η_k$, where $x_0\in\mathbb{K}^d$ is the desired signal, $n$ is the sample size, and $η = (η_1,...,η_n)^\top$ is the noise vector. We establish new error bounds under different noise patterns, and our proofs are valid for both $\mathbb{K}=\mathbb{R}$ and $\mathbb{K}=\mathbb{C}$. In NPR under an arbitrary noise vector $η$, we derive a new error bound $O\big(\|η\|_\infty\sqrt{\frac{d}{n}} + \frac{|\mathbf{1}^\topη|}{n}\big)$, which is tighter than the currently known one $O\big(\frac{\|η\|}{\sqrt{n}}\big)$ in many cases. In NGPR, we show $O\big(\|η\|\frac{\sqrt{d}}{n}\big)$ for arbitrary $η$. In both problems, the bounds for arbitrary noise immediately give rise to $\tilde{O}(\sqrt{\frac{d}{n}})$ for sub-Gaussian or sub-exponential random noise, with some conventional but inessential assumptions (e.g., independent or zero-mean condition) removed or weakened. In addition, we make a first attempt at ERM under heavy-tailed random noise assumed to have bounded $l$-th moment. To achieve a trade-off between bias and variance, we truncate the responses and propose a corresponding robust ERM estimator, which is shown to possess the guarantee $\tilde{O}\big(\big[\sqrt{\frac{d}{n}}\big]^{1-1/l}\big)$ in both NPR and NGPR. All the error bounds straightforwardly extend to the more general problems of rank-$r$ matrix recovery, and these results deliver the conclusion that the full-rank frame $\{A_k\}_{k=1}^n$ in NGPR is more robust to biased noise than the rank-1 frame $\{α_kα_k^*\}_{k=1}^n$ in NPR. Extensive experimental results are presented to illustrate our theoretical findings.
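To make the NPR model and the $\ell_2$ empirical risk concrete, the sketch below generates real-valued data $y_k = (α_k^\top x_0)^2 + η_k$ and evaluates the objective $\frac{1}{n}\sum_k \big((α_k^\top x)^2 - y_k\big)^2$ that ERM minimizes. It assumes $\mathbb{K}=\mathbb{R}$, Gaussian sensing vectors and Gaussian noise purely for illustration; the clipping rule in `truncate` and its threshold are assumptions, since the abstract does not specify how responses are truncated. No minimization is performed here.

```python
import random


def dot(a, x):
    """Real inner product of two equal-length vectors."""
    return sum(ai * xi for ai, xi in zip(a, x))


def empirical_risk(x, sensing, y):
    """(1/n) * sum_k ((a_k^T x)^2 - y_k)^2 for real-valued signals."""
    return sum((dot(a, x) ** 2 - yk) ** 2 for a, yk in zip(sensing, y)) / len(y)


def truncate(yk, tau):
    """Clip a response to [-tau, tau] (hypothetical truncation rule)."""
    return max(-tau, min(tau, yk))


rng = random.Random(0)  # fixed seed for reproducibility
d, n = 5, 200
x0 = [rng.gauss(0, 1) for _ in range(d)]
sensing = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(n)]
noise = [rng.gauss(0, 0.1) for _ in range(n)]

y_clean = [dot(a, x0) ** 2 for a in sensing]
y_noisy = [yc + e for yc, e in zip(y_clean, noise)]

neg_x0 = [-v for v in x0]
risk_clean = empirical_risk(x0, sensing, y_clean)      # zero without noise
risk_noisy = empirical_risk(x0, sensing, y_noisy)      # mean squared noise
risk_flipped = empirical_risk(neg_x0, sensing, y_noisy)  # same as at x0
print(risk_clean, risk_noisy, risk_flipped)
```

The risk is identical at `x0` and `-x0`, reflecting the sign ambiguity of real phase retrieval (a global phase in the complex case), so estimation error in this problem is measured up to that ambiguity.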