Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
21works
0followers
22topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

21 published item(s)

preprint2026arXiv

Towards Differentially Private Reinforcement Learning with General Function Approximation

We present the first theoretical guarantees for differentially private online reinforcement learning (RL) with general function approximation, extending beyond prior work restricted to tabular and linear settings. Our approach combines a batched policy update scheme with the exponential mechanism, together with a novel regret analysis. We show that, even under general function approximation, the regret in the model-free setting under differential privacy matches the state of the art for the linear case, scaling as $\widetilde{O}(K^{3/5})$, where $K$ denotes the number of episodes. As an important by-product, we also establish the first regret bound for online RL with batch update that depends on the standard complexity measure of coverability, complementing existing results based on a newly introduced Eluder-Condition class. In addition, we uncover fundamental gaps in recent results for private RL with linear function approximation, thereby clarifying its landscape.

preprint2025arXiv

Improved Bounds for Private and Robust Alignment

In this paper, we study the private and robust alignment of language models from a theoretical perspective by establishing upper bounds on the suboptimality gap in both offline and online settings. We consider preference labels subject to privacy constraints and/or adversarial corruption, and analyze two distinct interplays between them: privacy-first and corruption-first. For the privacy-only setting, we show that log loss with an MLE-style algorithm achieves near-optimal rates, in contrast to conventional wisdom. For the joint privacy-and-corruption setting, we first demonstrate that existing offline algorithms in fact provide stronger guarantees -- simultaneously in terms of corruption level and privacy parameters -- than previously known, which further yields improved bounds in the corruption-only regime. In addition, we also present the first set of results for private and robust online alignment. Our results are enabled by new uniform convergence guarantees for log loss and square loss under privacy and corruption, which we believe have broad applicability across learning theory and statistics.

preprint2023arXiv

Provably Efficient Model-Free Constrained RL with Linear Function Approximation

We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods accompanied with a `simulator', we aim to develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems. To this end, we consider the episodic constrained Markov decision processes with linear function approximation, where the transition dynamics and the reward function can be represented as a linear function of some known feature mapping. We show that $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ regret and $\tilde{\mathcal{O}}(\sqrt{d^3H^3T})$ constraint violation bounds can be achieved, where $d$ is the dimension of the feature mapping, $H$ is the length of the episode, and $T$ is the total number of steps. Our bounds are attained without explicitly estimating the unknown transition model or requiring a simulator, and they depend on the state space only through the dimension of the feature mapping. Hence our bounds hold even when the number of states goes to infinity. Our main results are achieved via novel adaptations of the standard LSVI-UCB algorithms. In particular, we first introduce primal-dual optimization into the LSVI-UCB algorithm to balance between regret and constraint violation. More importantly, we replace the standard greedy selection with respect to the state-action function in LSVI-UCB with a soft-max policy. This turns out to be key in establishing uniform concentration for the constrained case via its approximation-smoothness trade-off. We also show that one can achieve an even zero constraint violation while still maintaining the same order with respect to $T$.

preprint2022arXiv

Adversarial Attack via Dual-Stage Network Erosion

Deep neural networks are vulnerable to adversarial examples, which can fool deep models by adding subtle perturbations. Although existing attacks have achieved promising results, it still leaves a long way to go for generating transferable adversarial examples under the black-box setting. To this end, this paper proposes to improve the transferability of adversarial examples, and applies dual-stage feature-level perturbations to an existing model to implicitly create a set of diverse models. Then these models are fused by the longitudinal ensemble during the iterations. The proposed method is termed Dual-Stage Network Erosion (DSNE). We conduct comprehensive experiments both on non-residual and residual networks, and obtain more transferable adversarial examples with the computational cost similar to the state-of-the-art method. In particular, for the residual networks, the transferability of the adversarial examples can be significantly improved by biasing the residual block information to the skip connections. Our work provides new insights into the architectural vulnerability of neural networks and presents new challenges to the robustness of neural networks.

preprint2022arXiv

Deep Generative Survival Analysis: Nonparametric Estimation of Conditional Survival Function

We propose a deep generative approach to nonparametric estimation of conditional survival and hazard functions with right-censored data. The key idea of the proposed method is to first learn a conditional generator for the joint conditional distribution of the observed time and censoring indicator given the covariates, and then construct the Kaplan-Meier and Nelson-Aalen estimators based on this conditional generator for the conditional hazard and survival functions. Our method combines ideas from the recently developed deep generative learning and classical nonparametric estimation in survival analysis. We analyze the convergence properties of the proposed method and establish the consistency of the generative nonparametric estimators of the conditional survival and hazard functions. Our numerical experiments validate the proposed method and demonstrate its superior performance in a range of simulated models. We also illustrate the applications of the proposed method in constructing prediction intervals for survival times with the PBC (Primary Biliary Cholangitis) and SUPPORT (Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments) datasets.

preprint2022arXiv

Differentially Private Reinforcement Learning with Linear Function Approximation

Motivated by the wide adoption of reinforcement learning (RL) in real-world personalized services, where users' sensitive and private information needs to be protected, we study regret minimization in finite-horizon Markov decision processes (MDPs) under the constraints of differential privacy (DP). Compared to existing private RL algorithms that work only on tabular finite-state, finite-actions MDPs, we take the first step towards privacy-preserving learning in MDPs with large state and action spaces. Specifically, we consider MDPs with linear function approximation (in particular linear mixture MDPs) under the notion of joint differential privacy (JDP), where the RL agent is responsible for protecting users' sensitive data. We design two private RL algorithms that are based on value iteration and policy optimization, respectively, and show that they enjoy sub-linear regret performance while guaranteeing privacy protection. Moreover, the regret bounds are independent of the number of states, and scale at most logarithmically with the number of actions, making the algorithms suitable for privacy protection in nowadays large-scale personalized services. Our results are achieved via a general procedure for learning in linear mixture MDPs under changing regularizers, which not only generalizes previous results for non-private learning, but also serves as a building block for general private reinforcement learning.

preprint2022arXiv

Interference Constrained Beam Alignment for Time-Varying Channels via Kernelized Bandits

To fully utilize the abundant spectrum resources in millimeter wave (mmWave), Beam Alignment (BA) is necessary for large antenna arrays to achieve large array gains. In practical dynamic wireless environments, channel modeling is challenging due to time-varying and multipath effects. In this paper, we formulate the beam alignment problem as a non-stationary online learning problem with the objective to maximize the received signal strength under interference constraint. In particular, we employ the non-stationary kernelized bandit to leverage the correlation among beams and model the complex beamforming and multipath channel functions. Furthermore, to mitigate interference to other user equipment, we leverage the primal-dual method to design a constrained UCB-type kernelized bandit algorithm. Our theoretical analysis indicates that the proposed algorithm can adaptively adjust the beam in time-varying environments, such that both the cumulative regret of the received signal and constraint violations have sublinear bounds with respect to time. This result is of independent interest for applications such as adaptive pricing and news ranking. In addition, the algorithm assumes the channel is a black-box function and does not require any prior knowledge for dynamic channel modeling, and thus is applicable in a variety of scenarios. We further show that if the information about the channel variation is known, the algorithm will have better theoretical guarantees and performance. Finally, we conduct simulations to highlight the effectiveness of the proposed algorithm.

preprint2022arXiv

Learning Coated Adversarial Camouflages for Object Detectors

An adversary can fool deep neural network object detectors by generating adversarial noises. Most of the existing works focus on learning local visible noises in an adversarial "patch" fashion. However, the 2D patch attached to a 3D object tends to suffer from an inevitable reduction in attack performance as the viewpoint changes. To remedy this issue, this work proposes the Coated Adversarial Camouflage (CAC) to attack the detectors in arbitrary viewpoints. Unlike the patch trained in the 2D space, our camouflage generated by a conceptually different training framework consists of 3D rendering and dense proposals attack. Specifically, we make the camouflage perform 3D spatial transformations according to the pose changes of the object. Based on the multi-view rendering results, the top-n proposals of the region proposal network are fixed, and all the classifications in the fixed dense proposals are attacked simultaneously to output errors. In addition, we build a virtual 3D scene to fairly and reproducibly evaluate different attacks. Extensive experiments demonstrate the superiority of CAC over the existing attacks, and it shows impressive performance both in the virtual scene and the real world. This poses a potential threat to the security-critical computer vision systems.

preprint2022arXiv

Learning to Control under Time-Varying Environment

This paper investigates the problem of regret minimization in linear time-varying (LTV) dynamical systems. Due to the simultaneous presence of uncertainty and non-stationarity, designing online control algorithms for unknown LTV systems remains a challenging task. At a cost of NP-hard offline planning, prior works have introduced online convex optimization algorithms, although they suffer from nonparametric rate of regret. In this paper, we propose the first computationally tractable online algorithm with regret guarantees that avoids offline planning over the state linear feedback policies. Our algorithm is based on the optimism in the face of uncertainty (OFU) principle in which we optimistically select the best model in a high confidence region. Our algorithm is then more explorative when compared to previous approaches. To overcome non-stationarity, we propose either a restarting strategy (R-OFU) or a sliding window (SW-OFU) strategy. With proper configuration, our algorithm is attains sublinear regret $O(T^{2/3})$. These algorithms utilize data from the current phase for tracking variations on the system dynamics. We corroborate our theoretical findings with numerical experiments, which highlight the effectiveness of our methods. To the best of our knowledge, our study establishes the first model-based online algorithm with regret guarantees under LTV dynamical systems.

preprint2022arXiv

Model-Driven Deep Learning-Based MIMO-OFDM Detector: Design, Simulation, and Experimental Results

Multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM), a fundamental transmission scheme, promises high throughput and robustness against multipath fading. However, these benefits rely on the efficient detection strategy at the receiver and come at the expense of the extra bandwidth consumed by the cyclic prefix (CP). We use the iterative orthogonal approximate message passing (OAMP) algorithm in this paper as the prototype of the detector because of its remarkable potential for interference suppression. However, OAMP is computationally expensive for the matrix inversion per iteration. We replace the matrix inversion with the conjugate gradient (CG) method to reduce the complexity of OAMP. We further unfold the CG-based OAMP algorithm into a network and tune the critical parameters through deep learning (DL) to enhance detection performance. Simulation results and complexity analysis show that the proposed scheme has significant gain over other iterative detection methods and exhibits comparable performance to the state-of-the-art DL-based detector at a reduced computational cost. Furthermore, we design a highly efficient CP-free MIMO-OFDM receiver architecture to remove the CP overhead. This architecture first eliminates the intersymbol interference by buffering the previously recovered data and then detects the signal using the proposed detector. Numerical experiments demonstrate that the designed receiver offers a higher spectral efficiency than traditional receivers. Finally, over-the-air tests verify the effectiveness and robustness of the proposed scheme in realistic environments.

preprint2022arXiv

On Kernelized Multi-Armed Bandits with Constraints

We study a stochastic bandit problem with a general unknown reward function and a general unknown constraint function. Both functions can be non-linear (even non-convex) and are assumed to lie in a reproducing kernel Hilbert space (RKHS) with a bounded norm. This kernelized bandit setup strictly generalizes standard multi-armed bandits and linear bandits. In contrast to safety-type hard constraints studied in prior works, we consider soft constraints that may be violated in any round as long as the cumulative violations are small, which is motivated by various practical applications. Our ultimate goal is to study how to utilize the nature of soft constraints to attain a finer complexity-regret-constraint trade-off in the kernelized bandit setting. To this end, leveraging primal-dual optimization, we propose a general framework for both algorithm design and performance analysis. This framework builds upon a novel sufficient condition, which not only is satisfied under general exploration strategies, including \emph{upper confidence bound} (UCB), \emph{Thompson sampling} (TS), and new ones based on \emph{random exploration}, but also enables a unified analysis for showing both sublinear regret and sublinear or even zero constraint violation. We demonstrate the superior performance of our proposed algorithms via numerical experiments based on both synthetic and real-world datasets. Along the way, we also make the first detailed comparison between two popular methods for analyzing constrained bandits and Markov decision processes (MDPs) by discussing the key difference and some subtleties in the analysis, which could be of independent interest to the communities.

preprint2022arXiv

Shuffle Private Linear Contextual Bandits

Differential privacy (DP) has been recently introduced to linear contextual bandits to formally address the privacy concerns in its associated personalized services to participating users (e.g., recommendations). Prior work largely focus on two trust models of DP: the central model, where a central server is responsible for protecting users sensitive data, and the (stronger) local model, where information needs to be protected directly on user side. However, there remains a fundamental gap in the utility achieved by learning algorithms under these two privacy models, e.g., $\tilde{O}(\sqrt{T})$ regret in the central model as compared to $\tilde{O}(T^{3/4})$ regret in the local model, if all users are unique within a learning horizon $T$. In this work, we aim to achieve a stronger model of trust than the central model, while suffering a smaller regret than the local model by considering recently popular shuffle model of privacy. We propose a general algorithmic framework for linear contextual bandits under the shuffle trust model, where there exists a trusted shuffler in between users and the central server, that randomly permutes a batch of users data before sending those to the server. We then instantiate this framework with two specific shuffle protocols: one relying on privacy amplification of local mechanisms, and another incorporating a protocol for summing vectors and matrices of bounded norms. We prove that both these instantiations lead to regret guarantees that significantly improve on that of the local model, and can potentially be of the order $\tilde{O}(T^{3/5})$ if all users are unique. We also verify this regret behavior with simulations on synthetic data. Finally, under the practical scenario of non-unique users, we show that the regret of our shuffle private algorithm scale as $\tilde{O}(T^{2/3})$, which matches that the central model could achieve in this case.

preprint2022arXiv

Temporal and Spatial Chaos of RN-AdS Black Holes Immersed in Perfect Fluid Dark Matter

We investigate the thermodynamic chaos of RN-AdS black holes immersed in Perfect Fluid Dark Matter by considering the dynamical equations of the fluid system evolved in the spinodal region. Based on the Melnikov method, it is shown that there exists a critical amplitude that affects the temporal chaos. And the influence of black holes charge and state parameter on the critical amplitude is investigated with specific initial temperature. Then, for inevitable spatial chaos, three different types of portraits are discssued according to the difference between the phase transition pressure and the ambient pressure. Additionally, we check the local equilibrium near saddle points which shows that spatial chaos always exists regardless of the perturbation intensity.

preprint2022arXiv

The Ages of Optically Bright Sub-Clusters in the Serpens Star-Forming Region

The Serpens Molecular Cloud is one of the most active star-forming regions within 500 pc, with over one thousand of YSOs at different evolutionary stages. The ages of the member stars inform us about the star formation history of the cloud. In this paper, we develop a spectral energy distribution (SED) fitting method for nearby evolved (diskless) young stars from members of the Pleiades to estimate their ages, with a temperature scale adopted from APOGEE spectra. When compared with literature temperatures of selected YSOs in Orion, the SED fits to cool (<5000 K) stars have temperatures that differ by an average of <~ 50 K and have a scatter of ~ 210 K for both disk-hosting and diskless stars. We then apply this method to YSOs in the Serpens Molecular Cloud to estimate ages of optical members previously identified from Gaia DR2 astrometry data. The optical members in Serpens are concentrated in different subgroups with ages from ~4 Myr to ~22 Myr; the youngest clusters, W40 and Serpens South, are dusty regions that lack enough optical members to be included in this analysis. These ages establish that the Serpens Molecular Cloud has been forming stars for much longer than has been inferred from infrared surveys.

preprint2022arXiv

Weighted Gaussian Process Bandits for Non-stationary Environments

In this paper, we consider the Gaussian process (GP) bandit optimization problem in a non-stationary environment. To capture external changes, the black-box function is allowed to be time-varying within a reproducing kernel Hilbert space (RKHS). To this end, we develop WGP-UCB, a novel UCB-type algorithm based on weighted Gaussian process regression. A key challenge is how to cope with infinite-dimensional feature maps. To that end, we leverage kernel approximation techniques to prove a sublinear regret bound, which is the first (frequentist) sublinear regret guarantee on weighted time-varying bandits with general nonlinear rewards. This result generalizes both non-stationary linear bandits and standard GP-UCB algorithms. Further, a novel concentration inequality is achieved for weighted Gaussian process regression with general weights. We also provide universal upper bounds and weight-dependent upper bounds for weighted maximum information gains. These results are of independent interest for applications such as news ranking and adaptive pricing, where weights can be adopted to capture the importance or quality of data. Finally, we conduct experiments to highlight the favorable gains of the proposed algorithm in many cases when compared to existing methods.

preprint2020arXiv

A Note on Load Balancing in Many-Server Heavy-Traffic Regime

In this note, we apply Stein&#39;s method to analyze the performance of general load balancing schemes in the many-server heavy-traffic regime. In particular, consider a load balancing system of $N$ servers and the distance of arrival rate to the capacity region is given by $N^{1-α}$ with $α> 1$. We are interested in the performance as $N$ goes to infinity under a large class of policies. We establish different asymptotics under different scalings and conditions. Specifically, (i) If the second moments linearly increase with $N$ with coefficients $σ_a^2$ and $ν_s^2$, then for any $α> 4$, the distribution of the sum queue length scaled by $N^{-α}$ converges to an exponential random variable with mean $\frac{σ_a^2 + ν_s^2}{2}$. (3) If the second moments quadratically increase with $N$ with coefficients $\tildeσ_a^2$ and $\tildeν_s^2$, then for any $α> 3$, the distribution of the sum queue length scaled by $N^{-α-1}$ converges to an exponential random variable with mean $\frac{\tildeσ_a^2 + \tildeν_s^2}{2}$. Both results are simple applications of our previously developed framework of Stein&#39;s method for heavy-traffic analysis in \cite{zhou2020note}.

preprint2020arXiv

Asymptotically Optimal Load Balancing in Large-scale Heterogeneous Systems with Multiple Dispatchers

We consider the load balancing problem in large-scale heterogeneous systems with multiple dispatchers. We introduce a general framework called Local-Estimation-Driven (LED). Under this framework, each dispatcher keeps local (possibly outdated) estimates of queue lengths for all the servers, and the dispatching decision is made purely based on these local estimates. The local estimates are updated via infrequent communications between dispatchers and servers. We derive sufficient conditions for LED policies to achieve throughput optimality and delay optimality in heavy-traffic, respectively. These conditions directly imply delay optimality for many previous local-memory based policies in heavy traffic. Moreover, the results enable us to design new delay optimal policies for heterogeneous systems with multiple dispatchers. Finally, the heavy-traffic delay optimality of the LED framework directly resolves a recent open problem on how to design optimal load balancing schemes using delayed information.

preprint2020arXiv

Experimental review of the $Υ(1S,2S,3S)$ physics at $e^+e^-$ colliders and the LHC

The three lowest-lying $Υ$ states, i.e. $Υ(1S)$, $Υ(2S)$, and $Υ(3S)$, composed of $b\bar b$ pairs and below the $B\bar B$ threshold, provide a good platform for the researches of hadronic physics and physics beyond the Standard Model. They can be produced directly in $e^+e^-$ colliding experiments, such as CLEO, Babar, and Belle, with low continuum backgrounds. In these experiments, many measurements of the exclusive $Υ(1S)$ and $Υ(2S)$ decays into light hadrons, which shed light on the &#34;80\% rule&#34; for the Okubo-Zweig-Iizuka suppressed decays in the bottomonium sector, were carried out. Meanwhile, many studies of the charmonium and bottomonium productions in $Υ(1S,2S,3S)$ decays were performed, to distinguish different Quantum Chromodynamics (QCD) models. Besides, exotic states and new physics were also extensively explored in $Υ(1S,2S,3S)$ decays at CLEO, BaBar, and Belle. The $Υ(1S,2S,3S)$ states can also be produced in $pp$ collisions and in collisions involving heavy ions. The precision measurements of their cross sections and polarizations at the large hadron collider (LHC), especially in the CMS, ATLAS, and LHCb experiments, help to understand $Υ$ production mechanisms in $pp$ collisions. The observation of the sequential $Υ$ suppression in heavy ion collisions at CMS is of great importance for verifying the quark-gluon plasma predicted by QCD. In this article, we review the experimental results on $Υ(1S,2S,3S)$ at $e^+e^-$ colliders and the LHC, and summarize their prospects at Belle II and the LHC.

preprint2020arXiv

Multi-Armed Bandits with Local Differential Privacy

This paper investigates the problem of regret minimization for multi-armed bandit (MAB) problems with local differential privacy (LDP) guarantee. In stochastic bandit systems, the rewards may refer to the users&#39; activities, which may involve private information and the users may not want the agent to know. However, in many cases, the agent needs to know these activities to provide better services such as recommendations and news feeds. To handle this dilemma, we adopt differential privacy and study the regret upper and lower bounds for MAB algorithms with a given LDP guarantee. In this paper, we prove a lower bound and propose algorithms whose regret upper bounds match the lower bound up to constant factors. Numerical experiments also confirm our conclusions.

preprint2020arXiv

Optimal Load Balancing in Bipartite Graphs

Applications in cloud platforms motivate the study of efficient load balancing under job-server constraints and server heterogeneity. In this paper, we study load balancing on a bipartite graph where left nodes correspond to job types and right nodes correspond to servers, with each edge indicating that a job type can be served by a server. Thus edges represent locality constraints, i.e., each job can only be served at servers which contained certain data and/or machine learning (ML) models. Servers in this system can have heterogeneous service rates. In this setting, we investigate the performance of two policies named Join-the-Fastest-of-the-Shortest-Queue (JFSQ) and Join-the-Fastest-of-the-Idle-Queue (JFIQ), which are simple variants of Join-the-Shortest-Queue and Join-the-Idle-Queue, where ties are broken in favor of the fastest servers. Under a &#34;well-connected&#34; graph condition, we show that JFSQ and JFIQ are asymptotically optimal in the mean response time when the number of servers goes to infinity. In addition to asymptotic optimality, we also obtain upper bounds on the mean response time for finite-size systems. We further show that the well-connectedness condition can be satisfied by a random bipartite graph construction with relatively sparse connectivity.

preprint2020arXiv

TopoAna: A generic tool for the event type analysis of inclusive Monte-Carlo samples in high energy physics experiments

Inclusive Monte-Carlo samples are indispensable for signal selection and background suppression in many high energy physics experiments. A clear knowledge of the physics processes involved in the samples, including the types of processes and the number of processes in each type, is a great help to investigating signals and backgrounds. To help analysts obtain the physics process information from the truth information of the samples, we develop a physics process analysis program, TopoAna, with C++, ROOT, and LaTeX. The program implements the functionalities of component analysis and signal identification with many kinds of fine, customizable classification and matching algorithms. It tags physics processes in individual events accurately in the output root files, and exports the physics process information at the sample level clearly to the output plain text, tex source, and pdf files. Independent of specific software frameworks, the program is applicable to many experiments. At present, it has come into use in three $e^+e^-$ colliding experiments: the BESIII, Belle, and Belle II experiments. The use of the program in other similar experiments is also prospective.