Researcher profile

Lingxiao Wang

Lingxiao Wang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
13 works
0 followers
16 topics
4 close collaborators

Published work

13 published items

preprint · 2023 · arXiv

Exploration in Model-based Reinforcement Learning with Randomized Reward

Model-based Reinforcement Learning (MBRL) has been widely adopted due to its sample efficiency. However, existing worst-case regret analysis typically requires optimistic planning, which is not realistic in general. In contrast, motivated by the theory, empirical studies utilize ensembles of models, which achieve state-of-the-art performance on various testing environments. Such deviation between theory and empirical study leads us to question whether randomized model ensembles guarantee optimism, and hence the optimal worst-case regret. This paper partially answers this question from the perspective of reward randomization, a scarcely explored direction of exploration with MBRL. We show that under the kernelized nonlinear regulator (KNR) model, reward randomization guarantees a partial optimism, which further yields a near-optimal worst-case regret in terms of the number of interactions. We further extend our theory to generalized function approximation and identify conditions for reward randomization to attain provably efficient exploration. Correspondingly, we propose concrete examples of efficient reward randomization. To the best of our knowledge, our analysis establishes the first worst-case regret analysis on randomized MBRL with function approximation.

preprint · 2022 · arXiv

Neural network reconstruction of the dense matter equation of state from neutron star observables

The Equation of State (EoS) of strongly interacting cold and hot ultra-dense QCD matter remains a major challenge in the field of nuclear astrophysics. With the advancements in measurements of neutron star masses, radii, and tidal deformabilities, from electromagnetic and gravitational wave observations, neutron stars play an important role in constraining the ultra-dense QCD matter EoS. In this work, we present a novel method that exploits deep learning techniques to reconstruct the neutron star EoS from mass-radius (M-R) observations. We employ neural networks (NNs) to represent the EoS in a model-independent way, within the range of $\sim$1-7 times the nuclear saturation density. The unsupervised Automatic Differentiation (AD) framework is implemented to optimize the EoS, so as to yield through TOV equations, an M-R curve that best fits the observations. We demonstrate that this method works by rebuilding the EoS on mock data, i.e., mass-radius pairs derived from a randomly generated polytropic EoS. The reconstructed EoS fits the mock data with reasonable accuracy, using just 11 mock M-R pairs observations, close to the current number of actual observations.

preprint · 2022 · arXiv

Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning

Offline Reinforcement Learning (RL) aims to learn policies from previously collected datasets without exploring the environment. Directly applying off-policy algorithms to offline RL usually fails due to the extrapolation error caused by out-of-distribution (OOD) actions. Previous methods tackle this problem by penalizing the Q-values of OOD actions or constraining the trained policy to be close to the behavior policy. Nevertheless, such methods typically prevent the generalization of value functions beyond the offline data and also lack a precise characterization of OOD data. In this paper, we propose Pessimistic Bootstrapping for offline RL (PBRL), a purely uncertainty-driven offline algorithm without explicit policy constraints. Specifically, PBRL conducts uncertainty quantification via the disagreement of bootstrapped Q-functions, and performs pessimistic updates by penalizing the value function based on the estimated uncertainty. To tackle the extrapolation error, we further propose a novel OOD sampling method. We show that such OOD sampling and pessimistic bootstrapping yield a provable uncertainty quantifier in linear MDPs, thus providing the theoretical underpinning for PBRL. Extensive experiments on the D4RL benchmark show that PBRL has better performance compared to the state-of-the-art algorithms.
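
As a rough illustration of the idea in this abstract (uncertainty measured as the disagreement of bootstrapped Q-functions, then used as a penalty), here is a minimal Python sketch. It is a simplification under stated assumptions, not the PBRL algorithm itself: the function names, the use of the population standard deviation as the disagreement measure, and the coefficient `beta` are all hypothetical choices made for this example.

```python
from statistics import mean, pstdev


def ensemble_uncertainty(q_values):
    """Disagreement of an ensemble of Q-estimates for one (state, action) pair.

    Assumption: disagreement is measured as the population standard deviation.
    """
    return pstdev(q_values)


def pessimistic_value(q_values, beta=1.0):
    """Ensemble mean minus beta times the ensemble disagreement.

    `beta` is a hypothetical penalty coefficient chosen for illustration.
    """
    return mean(q_values) - beta * ensemble_uncertainty(q_values)


# The ensemble agrees on a well-covered action, so no penalty is applied.
in_distribution = [1.0, 1.0, 1.0, 1.0]
# The ensemble disagrees on a poorly covered action, so its value is pushed down.
out_of_distribution = [0.0, 2.0, 0.0, 2.0]

print(pessimistic_value(in_distribution))
print(pessimistic_value(out_of_distribution))
```

Both ensembles have the same mean, so only the disagreement term separates them; that is the sense in which the penalty is "purely uncertainty-driven" in this toy setting.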

preprint · 2021 · arXiv

Adjusted Logistic Propensity Weighting Methods for Population Inference using Nonprobability Volunteer-Based Epidemiologic Cohorts

Many epidemiologic studies forgo probability sampling and turn to nonprobability volunteer-based samples because of cost, response burden, and invasiveness of biological samples. However, finite population inference is difficult to make from the nonprobability samples due to the lack of population representativeness. Aiming for making inferences at the population level using nonprobability samples, various inverse propensity score weighting (IPSW) methods have been studied with the propensity defined by the participation rate of population units in the nonprobability sample. In this paper, we propose an adjusted logistic propensity weighting (ALP) method to estimate the participation rates for nonprobability sample units. Compared to existing IPSW methods, the proposed ALP method is easy to implement by ready-to-use software while producing approximately unbiased estimators for population quantities regardless of the nonprobability sample rate. The efficiency of the ALP estimator can be further improved by scaling the survey sample weights in propensity estimation. Taylor linearization variance estimators are proposed for ALP estimators of finite population means that account for all sources of variability. The proposed ALP methods are evaluated numerically via simulation studies and empirically using the naïve unweighted National Health and Nutrition Examination Survey III sample, while taking the 1997 National Health Interview Survey as the reference, to estimate the 15-year mortality rates.
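
For readers unfamiliar with inverse propensity score weighting, the following Python sketch shows a generic weighted-mean estimator in which each sampled unit is weighted by the inverse of its participation rate. It is not the ALP method proposed in the paper: the function name `ipsw_mean` and the assumption that participation rates are already known (rather than estimated by logistic regression) are simplifications made for this example.

```python
def ipsw_mean(outcomes, participation_rates):
    """Inverse-propensity-weighted (Hajek-style) estimate of a population mean.

    outcomes: observed values for units in the nonprobability sample.
    participation_rates: assumed-known probability that each unit joined the sample.
    """
    if len(outcomes) != len(participation_rates):
        raise ValueError("outcomes and participation_rates must have equal length")
    if any(p <= 0.0 or p > 1.0 for p in participation_rates):
        raise ValueError("participation rates must lie in (0, 1]")
    # Units that were unlikely to participate stand in for more of the population.
    weights = [1.0 / p for p in participation_rates]
    weighted_sum = sum(w * y for w, y in zip(weights, outcomes))
    return weighted_sum / sum(weights)


# A unit with rate 0.25 counts twice as much as a unit with rate 0.5.
print(ipsw_mean([1.0, 0.0], [0.5, 0.25]))
```

When all participation rates are equal, the estimator reduces to the unweighted sample mean, which is why the naive estimate is only reasonable if participation does not depend on unit characteristics.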

preprint · 2021 · arXiv

Learning Langevin dynamics with QCD phase transition

In these proceedings, deep convolutional neural networks (CNNs) are deployed to recognize the order of the QCD phase transition and to predict the dynamical parameters in Langevin processes. To overcome the intrinsic randomness of a stochastic process, we treat the final spectra as image-type inputs that preserve sufficient spatiotemporal correlations. As a practical example, we demonstrate this paradigm for the scalar condensation in QCD matter near the critical point, in which the order parameter of the chiral phase transition can be characterized by a $1+1$-dimensional Langevin equation for the $\sigma$ field. The well-trained CNNs accurately classify the first-order phase transition and crossover from $\sigma$ field configurations with fluctuations, and the noise does not impair the performance of the recognition. In reconstructing the dynamics, we demonstrate that the damping coefficients $\eta$ can be robustly extracted from the intricate field configurations.

preprint · 2021 · arXiv

Measuring the rationality in evacuation behavior with deep learning

Bounded rationality is a crucial component of human behavior. It plays a key role in the typical collective behavior of evacuation, in which heterogeneous information leads to deviations from rational choices. In this study, we propose a deep learning framework to extract the quantitative deviation that emerges in a cellular automaton (CA) model describing the evacuation. The well-trained deep convolutional neural networks (CNNs) accurately predict the rational factors from multi-frame images generated by the CA model. In addition, the performance of this machine is robust to incomplete images corresponding to global information loss. Moreover, this framework provides us with a playground in which rationality is measured in evacuation, and the scheme could also be generalized to other well-designed virtual experiments.

preprint · 2021 · arXiv

Revisiting Membership Inference Under Realistic Assumptions

We study membership inference in settings where some of the assumptions typically used in previous research are relaxed. First, we consider skewed priors, to cover cases such as when only a small fraction of the candidate pool targeted by the adversary are actually members and develop a PPV-based metric suitable for this setting. This setting is more realistic than the balanced prior setting typically considered by researchers. Second, we consider adversaries that select inference thresholds according to their attack goals and develop a threshold selection procedure that improves inference attacks. Since previous inference attacks fail in imbalanced prior setting, we develop a new inference attack based on the intuition that inputs corresponding to training set members will be near a local minimum in the loss function, and show that an attack that combines this with thresholds on the per-instance loss can achieve high PPV even in settings where other attacks appear to be ineffective. Code for our experiments can be found here: https://github.com/bargavj/EvaluatingDPML.
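
To make the role of the prior concrete, the Python sketch below computes positive predictive value (PPV) from an attack's true-positive rate, false-positive rate, and the fraction of candidates that are actually members, using the standard Bayes-rule definition. The paper's own PPV-based metric may be formulated differently; the function name and the example rates here are assumptions chosen for illustration only.

```python
def positive_predictive_value(tpr, fpr, prior):
    """Fraction of records flagged as members that really are members.

    tpr: true-positive rate of the attack on members.
    fpr: false-positive rate of the attack on non-members.
    prior: fraction of the candidate pool that are members.
    """
    true_positives = tpr * prior
    false_positives = fpr * (1.0 - prior)
    flagged = true_positives + false_positives
    if flagged == 0.0:
        raise ValueError("the attack flags no records, so PPV is undefined")
    return true_positives / flagged


# Hypothetical attack with the same error rates under two different priors.
print(positive_predictive_value(tpr=0.8, fpr=0.1, prior=0.5))   # balanced prior
print(positive_predictive_value(tpr=0.8, fpr=0.1, prior=0.01))  # skewed prior
```

With identical error rates, the same attack looks far weaker once members are rare, because false positives from the large non-member pool dominate the flagged set.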

preprint · 2020 · arXiv

Breaking the Curse of Many Agents: Provable Mean Embedding Q-Iteration for Mean-Field Reinforcement Learning

Multi-agent reinforcement learning (MARL) achieves significant empirical successes. However, MARL suffers from the curse of many agents. In this paper, we exploit the symmetry of agents in MARL. In the most generic form, we study a mean-field MARL problem. Such a mean-field MARL problem is defined on mean-field states, which are distributions supported on a continuous space. Based on the mean embedding of the distributions, we propose the MF-FQI algorithm, which solves the mean-field MARL problem, and establish a non-asymptotic analysis of MF-FQI. We highlight that MF-FQI enjoys a "blessing of many agents" property in the sense that a larger number of observed agents improves its performance.

preprint · 2020 · arXiv

Mode Decomposed Chiral Magnetic Effect and Rotating Fermions

We present a novel perspective to characterize the chiral magnetic and related effects in terms of angular decomposed modes. We find that the vector current and the chirality density are connected through a surprisingly simple relation for all the modes and any mass, which defines the mode decomposed chiral magnetic effect in such a way free from the chiral chemical potential. The mode decomposed formulation is useful also to investigate properties of rotating fermions. For demonstration we give an intuitive account for a nonzero density emerging from a combination of rotation and magnetic field as well as an approach to the chiral vortical effect at finite density.

preprint · 2020 · arXiv

Neural Network Statistical Mechanics

We propose a general framework to extract microscopic interactions from raw configurations with deep neural networks. The approach replaces the modeling Hamiltonian with neural networks in which the interaction is encoded. It can be trained with data collected from ab initio computations or experiments. The well-trained neural networks give an accurate estimate of the probability distribution of the configurations at fixed external parameters. The approach can be extrapolated to detect phase structures, since classical statistical mechanics serves as prior knowledge here. We apply the approach to a 2D spin system, training at a fixed temperature and reproducing the phase structure. Scaling the configuration on the lattice shows how the interaction changes with the degrees of freedom, which can be naturally applied to experimental measurements. Our approach bridges the gap between the real configurations and the microscopic dynamics with an autoregressive neural network.

preprint · 2020 · arXiv

On the Global Optimality of Model-Agnostic Meta-Learning

Model-agnostic meta-learning (MAML) formulates meta-learning as a bilevel optimization problem, where the inner level solves each subtask based on a shared prior, while the outer level searches for the optimal shared prior by optimizing its aggregated performance over all the subtasks. Despite its empirical success, MAML remains less understood in theory, especially in terms of its global optimality, due to the nonconvexity of the meta-objective (the outer-level objective). To bridge such a gap between theory and practice, we characterize the optimality gap of the stationary points attained by MAML for both reinforcement learning and supervised learning, where the inner-level and outer-level problems are solved via first-order optimization methods. In particular, our characterization connects the optimality gap of such stationary points with (i) the functional geometry of inner-level objectives and (ii) the representation power of function approximators, including linear models and neural networks. To the best of our knowledge, our analysis establishes the global optimality of MAML with nonconvex meta-objectives for the first time.

preprint · 2020 · arXiv

Provably Efficient Causal Reinforcement Learning with Confounded Observational Data

Empowered by expressive function approximators such as neural networks, deep reinforcement learning (DRL) achieves tremendous empirical successes. However, learning expressive function approximators requires collecting a large dataset (interventional data) by interacting with the environment. Such a lack of sample efficiency prohibits the application of DRL to critical scenarios, e.g., autonomous driving and personalized medicine, since trial and error in the online setting is often unsafe and even unethical. In this paper, we study how to incorporate the dataset (observational data) collected offline, which is often abundantly available in practice, to improve the sample efficiency in the online setting. To incorporate the possibly confounded observational data, we propose the deconfounded optimistic value iteration (DOVI) algorithm, which incorporates the confounded observational data in a provably efficient manner. More specifically, DOVI explicitly adjusts for the confounding bias in the observational data, where the confounders are partially observed or unobserved. In both cases, such adjustments allow us to construct the bonus based on a notion of information gain, which takes into account the amount of information acquired from the offline setting. In particular, we prove that the regret of DOVI is smaller than the optimal regret achievable in the pure online setting by a multiplicative factor, which decreases towards zero when the confounded observational data are more informative upon the adjustments. Our algorithm and analysis serve as a step towards causal reinforcement learning.

preprint · 2019 · arXiv

Neural Policy Gradient Methods: Global Optimality and Rates of Convergence

Policy gradient methods with actor-critic schemes demonstrate tremendous empirical successes, especially when the actors and critics are parameterized by neural networks. However, it remains less clear whether such "neural" policy gradient methods converge to globally optimal policies and whether they even converge at all. We answer both the questions affirmatively in the overparameterized regime. In detail, we prove that neural natural policy gradient converges to a globally optimal policy at a sublinear rate. Also, we show that neural vanilla policy gradient converges sublinearly to a stationary point. Meanwhile, by relating the suboptimality of the stationary points to the representation power of neural actor and critic classes, we prove the global optimality of all stationary points under mild regularity conditions. Particularly, we show that a key to the global optimality and convergence is the "compatibility" between the actor and critic, which is ensured by sharing neural architectures and random initializations across the actor and critic. To the best of our knowledge, our analysis establishes the first global optimality and convergence guarantees for neural policy gradient methods.