Researcher profile

Junyu Zhang

Junyu Zhang contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, Unclaimed author
15 works
0 followers
11 topics
4 close collaborators

Published work

15 published items

preprint, 2026, arXiv

Abundant Population of Broad H$\alpha$ Emitters in the GOODS-N Field Revealed by CONGRESS, FRESCO, and JADES

We present a spectroscopic search for broad H$\alpha$ emitters at z$\approx$3.7-6.5 in the GOODS-N field, utilizing JWST/NIRCam slitless spectroscopy from FRESCO and CONGRESS, complemented by JADES imaging. We identify 19 broad H$\alpha$ emitters with FWHM$>$1000 km/s at z$\approx$4-5.5, including 9 new sources. The broad H$\alpha$ luminosity function (LF) derived from our sample is consistent with those of other JWST-selected broad-line AGN reported in the literature. The black hole masses and AGN bolometric luminosities, inferred from the broad H$\alpha$ components, indicate that most sources are accreting at ~10% of the Eddington limit. We derive their host stellar masses via SED fitting and find higher $M_{BH}/M_{*}$ ratios relative to the local $M_{BH}-M_{*}$ relations, consistent with previous studies. We find that 42% of the sample do not satisfy the widely-used color selection criteria for Little Red Dots (LRDs), with the majority of these sources lacking the characteristic steep rest-optical red slope, indicating that the LRD selection is highly incomplete when selecting AGN galaxies. A comparison of the average SEDs between our sample and LRDs selected in the same field reveals that the steep red slopes observed in some LRDs are likely due to line-boosting effects as previously suggested. Furthermore, we find that 68% of color-selected LRDs with H$\alpha$ detections in the NIRCam/Grism spectra do not exhibit broad-line features. While the limited sensitivity of the grism spectra may hinder the detection of broad-line components in faint sources, our findings still highlight the enigmatic nature of the LRD population.
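To make the "~10% of the Eddington limit" statement concrete, the sketch below computes an Eddington ratio from a black hole mass and a bolometric luminosity. It assumes the standard Eddington luminosity for ionized hydrogen, roughly $1.26\times10^{38}\,(M_{BH}/M_\odot)$ erg/s. The function name and the input values are illustrative assumptions and are not taken from the paper.

```python
# Eddington luminosity per solar mass for ionized hydrogen, in erg/s
# (standard textbook value; an assumption, not a number from the paper).
L_EDD_PER_MSUN = 1.26e38


def eddington_ratio(l_bol_erg_s: float, m_bh_msun: float) -> float:
    """Return L_bol / L_Edd for a black hole of mass m_bh_msun (solar masses)."""
    if m_bh_msun <= 0:
        raise ValueError("black hole mass must be positive")
    return l_bol_erg_s / (L_EDD_PER_MSUN * m_bh_msun)


# Hypothetical source: a 1e7 solar-mass black hole radiating 1.26e44 erg/s.
ratio = eddington_ratio(1.26e44, 1e7)
print(f"Eddington ratio: {ratio:.2f}")
```

A ratio near 0.1 corresponds to the accretion regime the abstract describes for most of the sample.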

preprint, 2026, arXiv

Automated Reformulation of Robust Optimization via Memory-Augmented Large Language Models

Robust optimization (RO) provides a principled framework for decision-making under uncertainty, but its practical use is often limited by the need to manually reformulate uncertain optimization models into tractable deterministic counterparts. Recent large language models (LLMs) have shown promise for automating optimization formulation, yet RO reformulation remains challenging because it requires precise multi-step reasoning and mathematically consistent transformations. To facilitate systematic evaluation of LLM-based reformulation, for which no dedicated benchmark currently exists, we develop AutoRO-Bench, a benchmark featuring an automated data generation pipeline for the core RO reformulation task and a curated dataset for the RO application task. To address the reformulation challenge, we propose Automated Reformulation with Experience Memory (AutoREM), a tuning-free memory-augmented framework that autonomously builds a structured textual experience memory by reflecting on past failed trajectories through a tailored offline adaptation procedure. AutoREM requires neither domain-specific expert knowledge nor parameter updates, and the resulting memory readily transfers across different base LLMs. Experimental results show that AutoREM consistently improves the accuracy and efficiency of RO reformulation across in-distribution datasets, out-of-distribution datasets, and diverse base LLMs.

preprint, 2026, arXiv

Line-search and Adaptive Step Sizes for Nonconvex-strongly-concave Minimax Optimization

In this paper, we propose a novel reformulation of smooth nonconvex-strongly-concave (NC-SC) minimax problems that casts the problem as a joint minimization. We show that our reformulation preserves not only first-order stationarity, but also global and local optimality, second-order stationarity, and the Kurdyka-Łojasiewicz (KL) property of the original NC-SC problem, which is substantially stronger than its nonsmooth counterpart in the literature. With these enhanced structures, we design a versatile parameter-free and nonmonotone line-search framework that does not require evaluating the inner maximization. Under mild conditions, global convergence rates can be obtained, and, with the KL property, full sequence convergence with asymptotic rates is also established. In particular, we show that our framework is compatible with the gradient descent-ascent (GDA) algorithm. By equipping GDA with Barzilai-Borwein (BB) step sizes and nonmonotone line-search, our method exhibits superior numerical performance against the compared benchmarks.
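For readers unfamiliar with the baseline mentioned in this abstract, here is a minimal sketch of plain gradient descent-ascent (GDA) with fixed step sizes on a toy NC-SC problem. It is not the paper's line-search or Barzilai-Borwein method. The objective $f(x, y) = \tfrac{1}{4}x^4 - \tfrac{1}{2}x^2 + xy - y^2$, the step sizes, and the function names are all assumptions chosen for illustration. The objective is nonconvex in $x$ and strongly concave in $y$, and maximizing over $y$ gives the double-well function $\tfrac{1}{4}x^4 - \tfrac{1}{4}x^2$, whose positive stationary point is $x = 1/\sqrt{2}$.

```python
import math


def grad_x(x: float, y: float) -> float:
    # Partial derivative in x of f(x, y) = 0.25*x**4 - 0.5*x**2 + x*y - y**2.
    return x**3 - x + y


def grad_y(x: float, y: float) -> float:
    # Partial derivative in y; f is strongly concave in y (second derivative -2).
    return x - 2.0 * y


def gda(x0: float, y0: float, eta_x: float = 0.05, eta_y: float = 0.2,
        iters: int = 2000) -> tuple[float, float]:
    """Simultaneous GDA: descend in x, ascend in y, with fixed step sizes."""
    x, y = x0, y0
    for _ in range(iters):
        gx, gy = grad_x(x, y), grad_y(x, y)  # both gradients at the current point
        x -= eta_x * gx                      # descent step on the min variable
        y += eta_y * gy                      # ascent step on the max variable
    return x, y


x_star, y_star = gda(1.0, 0.0)
print(x_star, y_star, 1.0 / math.sqrt(2.0))
```

The fixed step sizes here were hand-picked for this toy problem. Removing that kind of manual tuning is what the parameter-free line-search framework in the abstract is aimed at.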

preprint, 2023, arXiv

On the Sample Complexity and Metastability of Heavy-tailed Policy Search in Continuous Control

Reinforcement learning is a framework for interactive decision-making with incentives sequentially revealed across time without a system dynamics model. Due to its scaling to continuous spaces, we focus on policy search where one iteratively improves a parameterized policy with stochastic policy gradient (PG) updates. In tabular Markov Decision Problems (MDPs), under persistent exploration and suitable parameterization, global optimality may be obtained. By contrast, in continuous space, the non-convexity poses a pathological challenge as evidenced by existing convergence results being mostly limited to stationarity or arbitrary local extrema. To close this gap, we step towards persistent exploration in continuous space through policy parameterizations given by heavier-tailed distributions with tail-index parameter alpha, which increases the likelihood of jumping in state space. Doing so invalidates smoothness conditions of the score function common to PG. Thus, we establish how the convergence rate to stationarity depends on the policy's tail index alpha, a Hölder continuity parameter, integrability conditions, and an exploration tolerance parameter introduced here for the first time. Further, we characterize the dependence of the set of local maxima on the tail index through an exit and transition time analysis of a suitably defined Markov chain, identifying that policies associated with Lévy processes of a heavier tail converge to wider peaks. This phenomenon yields improved stability to perturbations in supervised learning, which, as we corroborate, also manifests as improved performance of policy search, especially when myopic and farsighted incentives are misaligned.

preprint, 2022, arXiv

A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP

As an important framework for safe Reinforcement Learning, the Constrained Markov Decision Process (CMDP) has been extensively studied in the recent literature. However, despite the rich results under various on-policy learning settings, an essential understanding of the offline CMDP problem is still lacking, in terms of both the algorithm design and the information-theoretic sample complexity lower bound. In this paper, we focus on solving CMDP problems where only offline data are available. By adopting the concept of the single-policy concentrability coefficient $C^*$, we establish an $\Omega\left(\frac{\min\left\{|\mathcal{S}||\mathcal{A}|,|\mathcal{S}|+I\right\} C^*}{(1-\gamma)^3\epsilon^2}\right)$ sample complexity lower bound for the offline CMDP problem, where $I$ stands for the number of constraints. By introducing a simple but novel deviation control mechanism, we propose a near-optimal primal-dual learning algorithm called DPDL. This algorithm provably guarantees zero constraint violation and its sample complexity matches the above lower bound except for an $\tilde{\mathcal{O}}((1-\gamma)^{-1})$ factor. A comprehensive discussion on how to deal with the unknown constant $C^*$ and the potential asynchronous structure of the offline dataset is also included.

preprint, 2022, arXiv

A Unified Primal-Dual Algorithm Framework for Inequality Constrained Problems

In this paper, we propose a unified primal-dual algorithm framework based on the augmented Lagrangian function for composite convex problems with conic inequality constraints. The new framework is highly versatile. First, it not only covers many existing algorithms such as PDHG, Chambolle-Pock (CP), GDA, OGDA and linearized ALM, but also guides us to design a new efficient algorithm called Simi-OGDA (SOGDA). Second, it enables us to study the role of the augmented penalty term in the convergence analysis. Interestingly, a properly selected penalty not only improves the numerical performance of the above methods, but also theoretically enables the convergence of algorithms like PDHG and SOGDA. Under properly designed step sizes and penalty term, our unified framework preserves the $\mathcal{O}(1/N)$ ergodic convergence while not requiring any prior knowledge about the magnitude of the optimal Lagrangian multiplier. A linear convergence rate for affine equality constrained problems is also obtained under appropriate conditions. Finally, numerical experiments on linear programming, $\ell_1$ minimization, and multi-block basis pursuit problems demonstrate the efficiency of our methods.

preprint, 2021, arXiv

SuperNeurons: FFT-based Gradient Sparsification in the Distributed Training of Deep Neural Networks

The performance and efficiency of distributed training of Deep Neural Networks highly depend on the performance of gradient averaging among all participating nodes, which is bounded by the communication between nodes. There are two major strategies to reduce communication overhead: one is to hide communication by overlapping it with computation, and the other is to reduce message sizes. The first solution works well for linear neural architectures, but the latest networks such as ResNet and Inception offer limited opportunity for this overlapping. Therefore, researchers have paid more attention to minimizing communication. In this paper, we present a novel gradient compression framework that is derived from insights into real gradient distributions and that strikes a balance between compression ratio, accuracy, and computational overhead. Our framework has two major novel components: sparsification of gradients in the frequency domain, and a range-based floating point representation to quantize and further compress gradient frequencies. Both components are dynamic, with tunable parameters that achieve different compression ratios based on the accuracy requirement and the systems' platforms, and achieve very high throughput on GPUs. We prove that our techniques guarantee convergence with a diminishing compression ratio. Our experiments show that the proposed compression framework effectively improves the scalability of the most popular neural networks on a 32 GPU cluster relative to the baseline of no compression, without compromising the accuracy and convergence speed.

preprint, 2021, arXiv

Zero-sum risk-sensitive continuous-time stochastic games with unbounded payoff and transition rates and Borel spaces

We study a finite-horizon two-person zero-sum risk-sensitive stochastic game for continuous-time Markov chains and Borel state and action spaces, in which payoff rates, transition rates and terminal reward functions are allowed to be unbounded from below and from above and the policies can be history-dependent. Under suitable conditions, we establish the existence of a solution to the corresponding Shapley equation (SE) by an approximation technique. Then, by the SE and an extension of Dynkin's formula, we prove the existence of a Nash equilibrium and verify that the value of the stochastic game is the unique solution to the SE. Moreover, we develop a value iteration-type algorithm for approximating the value of the stochastic game. The convergence of the algorithm is proved by a special contraction operator in our risk-sensitive stochastic game. Finally, we demonstrate our main results by two examples.

preprint, 2020, arXiv

Cautious Reinforcement Learning via Distributional Risk in the Dual Domain

We study the estimation of risk-sensitive policies in reinforcement learning problems defined by Markov Decision Processes (MDPs) whose state and action spaces are countably finite. Prior efforts are predominantly afflicted by computational challenges associated with the fact that risk-sensitive MDPs are time-inconsistent. To ameliorate this issue, we propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning. The caution measures the distributional risk of a policy, which is a function of the policy's long-term state occupancy distribution. To solve this problem in an online model-free manner, we propose a stochastic variant of the primal-dual method that uses the Kullback-Leibler (KL) divergence as its proximal term. We establish that the number of iterations/samples required to attain approximately optimal solutions of this scheme matches tight dependencies on the cardinality of the state and action spaces, but differs in its dependence on the infinity norm of the gradient of the risk measure. Experiments demonstrate the merits of this approach for improving the reliability of reward accumulation without additional computational burdens.

preprint, 2020, arXiv

Cubic Regularized Newton Method for Saddle Point Models: a Global and Local Convergence Analysis

In this paper, we propose a cubic regularized Newton (CRN) method for solving convex-concave saddle point problems (SPP). At each iteration, a cubic regularized saddle point subproblem is constructed and solved, which provides a search direction for the iterate. With properly chosen stepsizes, the method is shown to converge to the saddle point with global linear and local superlinear convergence rates, if the saddle point function is gradient Lipschitz and strongly-convex-strongly-concave. In the case that the function is merely convex-concave, we propose a homotopy continuation (or path-following) method. Under a Lipschitz-type error bound condition, we present an iteration complexity bound of $\mathcal{O}\left(\ln \left(1/\epsilon\right)\right)$ to reach an $\epsilon$-solution through a homotopy continuation approach, and the iteration complexity bound becomes $\mathcal{O}\left(\left(1/\epsilon\right)^{\frac{1-\theta}{\theta^2}}\right)$ under a Hölderian-type error bound condition involving a parameter $\theta$ ($0<\theta<1$).

preprint, 2020, arXiv

Finite-size analysis of continuous variable source-independent quantum random number generation

We study the impact of the finite-size effect on continuous variable source-independent quantum random number generation. The central-limit theorem and maximum likelihood estimation theorem are used to derive a formula that gives the statistical fluctuations and determines the upper bound of the parameters of practical quantum random number generation. With these results, we can see that the check data length and confidence probability are strongly related to the final randomness and can be adjusted according to the demands of the implementation. Besides, other key parameters, such as sampling range size and sampling resolution, have also been considered in detail. It is found that the distribution of quantified output related to the sampling range size has significant effects on the loss of final randomness due to the finite-size effect. The overall results indicate that the finite-size effect should be taken into consideration when implementing continuous variable source-independent quantum random number generation in practice.

preprint, 2020, arXiv

Generalization Bounds for Stochastic Saddle Point Problems

This paper studies the generalization bounds for the empirical saddle point (ESP) solution to stochastic saddle point (SSP) problems. For SSP with Lipschitz continuous and strongly convex-strongly concave objective functions, we establish an $\mathcal{O}(1/n)$ generalization bound by using a uniform stability argument. We also provide generalization bounds under a variety of assumptions, including the cases without strong convexity and without bounded domains. We illustrate our results in two examples: batch policy learning in Markov decision process, and mixed strategy Nash equilibrium estimation for stochastic games. In each of these examples, we show that a regularized ESP solution enjoys a near-optimal sample complexity. To the best of our knowledge, this is the first set of results on the generalization theory of ESP.

preprint, 2020, arXiv

On the Divergence of Decentralized Non-Convex Optimization

We study a generic class of decentralized algorithms in which $N$ agents jointly optimize the non-convex objective $f(u):=1/N\sum_{i=1}^{N}f_i(u)$, while only communicating with their neighbors. This class of problems has become popular in modeling many signal processing and machine learning applications, and many efficient algorithms have been proposed. However, by constructing some counter-examples, we show that when certain local Lipschitz conditions (LLC) on the local function gradient $\nabla f_i$'s are not satisfied, most of the existing decentralized algorithms diverge, even if the global Lipschitz condition (GLC) is satisfied, where the sum function $f$ has Lipschitz gradient. This observation raises an important open question: How to design decentralized algorithms when the LLC, or even the GLC, is not satisfied? To address the above question, we design a first-order algorithm called Multi-stage gradient tracking algorithm (MAGENTA), which is capable of computing stationary solutions with neither the LLC nor the GLC. In particular, we show that the proposed algorithm converges sublinearly to certain $\epsilon$-stationary solution, where the precise rate depends on various algorithmic and problem parameters. In particular, if the local function $f_i$'s are $Q$th order polynomials, then the rate becomes $\mathcal{O}(1/\epsilon^{Q-1})$. Such a rate is tight for the special case of $Q=2$ where each $f_i$ satisfies LLC. To our knowledge, this is the first attempt that studies decentralized non-convex optimization problems with neither the LLC nor the GLC.

preprint, 2020, arXiv

Variational Policy Gradient Method for Reinforcement Learning with General Utilities

In recent years, reinforcement learning (RL) systems with general goals beyond a cumulative sum of rewards have gained traction, such as in constrained problems, exploration, and acting upon prior experiences. In this paper, we consider policy optimization in Markov Decision Problems, where the objective is a general concave utility function of the state-action occupancy measure, which subsumes several of the aforementioned examples as special cases. Such generality invalidates the Bellman equation. As this means that dynamic programming no longer works, we focus on direct policy search. Analogously to the Policy Gradient Theorem (Sutton et al., 2000) available for RL with cumulative rewards, we derive a new Variational Policy Gradient Theorem for RL with general utilities, which establishes that the parametrized policy gradient may be obtained as the solution of a stochastic saddle point problem involving the Fenchel dual of the utility function. We develop a variational Monte Carlo gradient estimation algorithm to compute the policy gradient based on sample paths. We prove that the variational policy gradient scheme converges globally to the optimal policy for the general objective, though the optimization problem is nonconvex. We also establish its rate of convergence of the order $O(1/t)$ by exploiting the hidden convexity of the problem, and prove that it converges exponentially when the problem admits hidden strong convexity. Our analysis applies to the standard RL problem with cumulative rewards as a special case, in which case our result improves the available convergence rate.

preprint, 2020, arXiv

Wireless Communication Based on Microwave Photon-Level Detection With Superconducting Devices: Achievable Rate Prediction

Future wireless communication systems will require physical-layer signal detection with high sensitivity, especially at the microwave photon level. Currently, receivers primarily adopt signal detection based on semiconductor devices, while this paper introduces high-sensitivity photon-level microwave detection based on superconducting structures. We first review existing works on photon-level communication in the optical spectrum as well as microwave photon-level sensing based on superconducting structures from both theoretical and experimental perspectives, including a microwave detection circuit model based on the Josephson junction, a microwave photon counter based on the Josephson junction, and two reconstruction approaches under background noise. In addition, we characterize channel modeling based on two different microwave photon detection approaches, including the absorption barrier and the dual-path Hanbury Brown-Twiss (HBT) experiments, and predict the corresponding achievable rates. According to the performance prediction, microwave photon-level signal detection can increase the receiver sensitivity compared with the state-of-the-art standardized communication system with waveform signal reception, with a gain of over 10 dB.