Researcher profile

Long Yang

Long Yang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
9 works
0 followers
10 topics
4 close collaborators

Published work

9 published items

preprint | 2022 | arXiv

CUP: A Conservative Update Policy Algorithm for Safe Reinforcement Learning

Safe reinforcement learning (RL) is still very challenging since it requires the agent to consider both return maximization and safe exploration. In this paper, we propose CUP, a Conservative Update Policy algorithm with a theoretical safety guarantee. We derive CUP based on newly proposed performance bounds and surrogate functions. Although using bounds as surrogate functions to design safe RL algorithms has appeared in some existing works, we develop the idea in at least three aspects: (i) We provide a rigorous theoretical analysis to extend the surrogate functions to the generalized advantage estimator (GAE). GAE significantly reduces variance empirically while maintaining a tolerable level of bias, which is an efficient step for us to design CUP; (ii) The proposed bounds are tighter than those in existing works, i.e., using the proposed bounds as surrogate functions gives better local approximations to the objective and safety constraints; (iii) The proposed CUP provides a non-convex implementation via first-order optimizers, which does not depend on any convex approximation. Finally, extensive experiments show the effectiveness of CUP, where the agent satisfies safety constraints. We have released the source code of CUP at https://github.com/RL-boxes/Safe-RL.
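
The abstract above relies on the generalized advantage estimator (GAE). As background only, here is a minimal sketch of the standard GAE recursion for a single trajectory. It is not taken from the CUP source code; the function name, argument layout, and default `gamma` and `lam` values are assumptions chosen for illustration.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard GAE for one trajectory (illustrative sketch, not CUP's code).

    rewards: [r_0, ..., r_{T-1}]
    values:  [V(s_0), ..., V(s_T)], i.e. one extra bootstrap value at the end.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step reuses the discounted sum of later TD errors.
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error at time t.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=0` reduces each estimate to the one-step TD error (lower variance, more bias), while `lam=1` sums all discounted TD errors (less bias, more variance), which is the trade-off the abstract refers to.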

preprint | 2022 | arXiv

Optimal Probabilistic Constellation Shaping for Covert Communications

In this paper, we investigate the optimal probabilistic constellation shaping design for covert communication systems from a practical perspective. Different from conventional covert communications with equiprobable constellation modulation, we propose nonequiprobable constellation modulation schemes to further enhance the covert rate. Specifically, we derive covert rate expressions for practical discrete constellation inputs for the first time. Then, we study the covert rate maximization problem by jointly optimizing the constellation distribution and power allocation. In particular, an approximate gradient descent method is proposed for obtaining the optimal probabilistic constellation shaping. To strike a balance between the computational complexity and the transmission performance, we further develop a framework that maximizes a lower bound on the achievable rate, where the optimal probabilistic constellation shaping problem can be solved efficiently using the Frank-Wolfe method. Extensive numerical results show that the optimized probabilistic constellation shaping strategies provide significant gains in the achievable covert rate over the state-of-the-art schemes.
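
The abstract above names the Frank-Wolfe method for optimizing a constellation distribution, which is a probability vector. As a rough illustration of why Frank-Wolfe suits that setting, the sketch below runs the generic algorithm on the probability simplex with a toy quadratic objective. The objective, `target`, `toy_gradient`, and the step-size rule are assumptions for illustration; the paper's actual rate lower bound is not reproduced here.

```python
def frank_wolfe_simplex(grad_fn, dim, iters=2000):
    """Generic Frank-Wolfe minimization over the probability simplex (sketch)."""
    p = [1.0 / dim] * dim  # start from the equiprobable distribution
    for k in range(iters):
        g = grad_fn(p)
        # Linear minimization over the simplex is solved by a single vertex:
        # the coordinate with the smallest gradient entry.
        best = min(range(dim), key=lambda i: g[i])
        gamma = 2.0 / (k + 2.0)  # classic diminishing step size
        # Convex combination keeps p nonnegative and summing to one,
        # so no projection step is needed.
        p = [(1.0 - gamma) * pi for pi in p]
        p[best] += gamma
    return p


# Toy objective (an assumption): f(p) = sum_i (p_i - target_i)^2.
target = [0.5, 0.3, 0.2]


def toy_gradient(p):
    return [2.0 * (pi - ti) for pi, ti in zip(p, target)]


shaped = frank_wolfe_simplex(toy_gradient, dim=3)
```

Each iterate is a convex combination of simplex vertices, so feasibility holds by construction; this is the usual reason to prefer Frank-Wolfe over projected gradient steps on distribution-valued variables.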

preprint | 2022 | arXiv

Penalized Proximal Policy Optimization for Safe Reinforcement Learning

Safe reinforcement learning aims to learn the optimal policy while satisfying safety constraints, which is essential in real-world applications. However, current algorithms still struggle to achieve efficient policy updates with hard constraint satisfaction. In this paper, we propose Penalized Proximal Policy Optimization (P3O), which solves the cumbersome constrained policy iteration via a single minimization of an equivalent unconstrained problem. Specifically, P3O utilizes a simple yet effective penalty function to eliminate cost constraints and removes the trust-region constraint by using the clipped surrogate objective. We theoretically prove the exactness of the proposed method with a finite penalty factor and provide a worst-case analysis of the approximation error when evaluated on sample trajectories. Moreover, we extend P3O to more challenging multi-constraint and multi-agent scenarios, which are less studied in previous work. Extensive experiments show that P3O outperforms state-of-the-art algorithms with respect to both reward improvement and constraint satisfaction on a set of constrained locomotive tasks.
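
The abstract above describes two ingredients: a penalty function that replaces the cost constraint, and a clipped surrogate that replaces the trust region. The sketch below shows one plausible way to combine them into a single unconstrained loss. It is an assumption-laden illustration, not the paper's exact objective: the ReLU-style penalty, the `cost_margin` argument (estimated cost minus its limit), and all names and defaults are hypothetical.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO-style clipped objective for one sample (to be maximized)."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


def clipped_cost_surrogate(ratio, cost_advantage, eps=0.2):
    """Pessimistic clipped cost term: take the larger of the two candidates."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return max(ratio * cost_advantage, clipped_ratio * cost_advantage)


def penalized_loss(ratios, reward_advs, cost_advs, cost_margin, kappa=10.0, eps=0.2):
    """Unconstrained loss (to be minimized): reward term plus a ReLU cost penalty.

    cost_margin is a hypothetical scalar: estimated cost minus the cost limit.
    kappa is the penalty factor.
    """
    n = len(ratios)
    reward_term = sum(
        clipped_surrogate(r, a, eps) for r, a in zip(ratios, reward_advs)
    ) / n
    cost_term = sum(
        clipped_cost_surrogate(r, c, eps) for r, c in zip(ratios, cost_advs)
    ) / n
    # The penalty is active only when the predicted cost exceeds the limit.
    penalty = max(0.0, cost_term + cost_margin)
    return -reward_term + kappa * penalty
```

When the constraint is predicted to hold, the penalty vanishes and the loss reduces to the ordinary clipped reward objective; a single first-order optimizer can then handle both cases.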

preprint | 2022 | arXiv

Policy Optimization with Stochastic Mirror Descent

Improving sample efficiency has been a longstanding goal in reinforcement learning. This paper proposes the $\mathtt{VRMPO}$ algorithm: a sample-efficient policy gradient method with stochastic mirror descent. In $\mathtt{VRMPO}$, a novel variance-reduced policy gradient estimator is presented to improve sample efficiency. We prove that the proposed $\mathtt{VRMPO}$ needs only $\mathcal{O}(\epsilon^{-3})$ sample trajectories to achieve an $\epsilon$-approximate first-order stationary point, which matches the best sample complexity for policy optimization. Extensive experimental results demonstrate that $\mathtt{VRMPO}$ outperforms the state-of-the-art policy gradient methods in various settings.

preprint | 2022 | arXiv

Singularity as a diagnostic for secondary eyewall occurrence in tropical cyclones

Secondary eyewalls occur in 70% of major tropical cyclones (TCs), and are associated with rapid changes in storm intensity and rapid broadening of strong winds. While mechanisms of secondary eyewall formation have been investigated from various perspectives, the explicit conditions under which secondary eyewalls occur in TCs remain unclear, leaving substantial uncertainties in TC intensity forecasts, especially for the most extreme events. In this study, we present a simple diagnostic, in the form of a singularity, for secondary eyewall occurrence in TCs. The diagnostic depends solely on three basic storm characteristics (the maximum wind speed, the radius of maximum wind, and the latitude) and is shown to compare well with satellite observations. It provides a valuable tool to improve the understanding, modeling and risk assessment of secondary eyewall storms.

preprint | 2020 | arXiv

Beetle Swarm Optimization Algorithm: Theory and Application

In this paper, a new meta-heuristic algorithm, called the beetle swarm optimization algorithm, is proposed by enhancing the performance of swarm optimization through beetle foraging principles. Its performance is tested on 23 benchmark functions and compared with widely used algorithms, including the particle swarm optimization algorithm, the genetic algorithm (GA) and the grasshopper optimization algorithm. Numerical experiments show that the beetle swarm optimization algorithm outperforms its counterparts. In addition, to demonstrate the practical impact of the proposed algorithm, two classic engineering design problems, namely the pressure vessel design problem and Himmelblau's optimization problem, are also considered, and the proposed beetle swarm optimization algorithm is shown to be competitive in those applications.

preprint | 2020 | arXiv

FiDi-RL: Incorporating Deep Reinforcement Learning with Finite-Difference Policy Search for Efficient Learning of Continuous Control

In recent years, significant progress has been made in dealing with challenging problems using reinforcement learning. Despite its great success, reinforcement learning still faces challenges in continuous control tasks. Conventional methods always compute the derivatives of the optimization objective at a high computational cost, and are inefficient, unstable and lacking in robustness when dealing with such tasks. Alternatively, derivative-free methods treat the optimization process as a black box and show robustness and stability in learning continuous control tasks, but are not data efficient in learning. Combining both kinds of methods to get the best of both has attracted attention. However, most of the existing combination works adopt complex neural networks (NNs) as the policy for control. Deep NNs are a double-edged sword: they can yield better performance, but they also make parameter tuning and computation difficult. To this end, in this paper we present a novel method called FiDi-RL, which incorporates deep RL with Finite-Difference (FiDi) policy search. FiDi-RL combines Deep Deterministic Policy Gradients (DDPG) with Augmented Random Search (ARS) and aims at improving the data efficiency of ARS. The empirical results show that FiDi-RL can improve the performance and stability of ARS, and provides competitive results against some existing deep reinforcement learning methods.
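
To make the finite-difference policy search idea in the abstract above concrete, the sketch below implements one step of basic random search with symmetric perturbations, the simple scheme that methods such as ARS build on. It is not FiDi-RL and omits ARS refinements such as reward normalization and top-direction selection. The toy reward, function names, and hyperparameters are assumptions for illustration.

```python
import random


def random_search_step(theta, reward_fn, rng, n_dirs=8, noise=0.1, step=0.5):
    """One basic random-search update using symmetric finite differences."""
    update = [0.0] * len(theta)
    for _ in range(n_dirs):
        # Sample a random perturbation direction in parameter space.
        delta = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = [t + noise * d for t, d in zip(theta, delta)]
        minus = [t - noise * d for t, d in zip(theta, delta)]
        # The reward difference along +/- delta needs no derivatives of reward_fn.
        diff = reward_fn(plus) - reward_fn(minus)
        for i, d in enumerate(delta):
            update[i] += diff * d
    return [t + step / n_dirs * u for t, u in zip(theta, update)]


def toy_reward(theta):
    """Toy stand-in for an episode return, maximized at theta = [1, -2]."""
    return -((theta[0] - 1.0) ** 2) - (theta[1] + 2.0) ** 2


def run_demo(steps=100, seed=0):
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        theta = random_search_step(theta, toy_reward, rng)
    return theta
```

Every update costs `2 * n_dirs` reward evaluations, which in a control task means that many full rollouts; this is the data-efficiency cost the abstract attributes to derivative-free methods.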

preprint | 2020 | arXiv

Structure-mining: screening structure models by automated fitting to the atomic pair distribution function over large numbers of models

A new approach is presented to obtain candidate structures from atomic pair distribution function (PDF) data in a highly automated way. It fetches, from web-based structural databases, all the structures meeting the experimenter's search criteria and performs structure refinements on them without human intervention. It supports both x-ray and neutron PDFs. Tests on various material systems show the effectiveness and robustness of the algorithm in finding the correct atomic crystal structure. It works on crystalline and nanocrystalline materials including complex oxide nanoparticles and nanowires, low-symmetry and locally distorted structures, and complicated doped and magnetic materials. This approach could greatly reduce the traditional structure searching work and enable high-throughput, real-time automated analysis of PDF experiments in the future.

preprint | 2020 | arXiv

Two-orbital degeneracy lifted state as a local precursor to a metal-insulator transition

The recent discovery of a local fluctuating t2g orbital-degeneracy-lifted (ODL) state in CuIr2S4 as a high temperature precursor to the metal-insulator transition (MIT) opens the door to a possible widespread presence of precursor states in scarcely studied high-temperature regimes of transition metal based quantum materials. Although in CuIr2S4 the ODL state comprises one orbital per Ir, there is no fundamental reason to exclude multi-orbital ODL states in general. The MgTi2O4 spinel exhibits a MIT on cooling at Ts ~ 250 K, accompanied by Ti t2g orbital ordering (OO) and spin dimerization with the average symmetry reducing to tetragonal. It shares with CuIr2S4 the pyrochlore transition metal sublattice with active t2g orbitals. This, together with its different orbital filling (t2g^1 vs t2g^5.5), makes it a candidate for hosting a multi-orbital ODL precursor state. By combining x-ray and neutron pair distribution function analyses to track the evolution of the local atomic structure across the MIT, we find that local tetragonality already exists in the metallic, globally cubic phase at high temperature. Local distortions exist up to at least 500 K. Significantly, the high temperature local state is not continuously connected to the OO band insulator ground state, and so the transition cannot be characterized as a trivial order-disorder type. The shortest Ti-Ti spin singlet dimer bonds expand abruptly on warming across the transition but remain shorter than those seen in the cubic structure. These seemingly contradictory observations can be understood within the model of a local fluctuating two-orbital t2g ODL precursor state. The ODL state in MgTi2O4 has a correlation length of about 1 nm at high temperature. We discuss how this extended character of the local distortions is consistent with the two-orbital nature of the ODL state imposed by the charge filling and the bond charge repulsion.