Source author record

Baicen Xiao

Baicen Xiao appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Information Theory, math.IT, Artificial Intelligence, Machine Learning, Cryptography and Security, eess.SY, math.OC, Multiagent Systems, Systems and Control

Catalog footprint

What is connected

6 works

9 topics

4 close collaborators

preprint · 2022 · arXiv

Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning

This paper considers multi-agent reinforcement learning (MARL) tasks where agents receive a shared global reward at the end of an episode. The delayed nature of this reward affects the ability of the agents to assess the quality of their actions at intermediate time-steps. This paper focuses on developing methods to learn a temporal redistribution of the episodic reward to obtain a dense reward signal. Solving such MARL problems requires addressing two challenges: identifying (1) relative importance of states along the length of an episode (along time), and (2) relative importance of individual agents' states at any single time-step (among agents). In this paper, we introduce Agent-Temporal Attention for Reward Redistribution in Episodic Multi-Agent Reinforcement Learning (AREL) to address these two challenges. AREL uses attention mechanisms to characterize the influence of actions on state transitions along trajectories (temporal attention), and how each agent is affected by other agents at each time-step (agent attention). The redistributed rewards predicted by AREL are dense, and can be integrated with any given MARL algorithm. We evaluate AREL on challenging tasks from the Particle World environment and the StarCraft Multi-Agent Challenge. AREL results in higher rewards in Particle World, and improved win rates in StarCraft compared to three state-of-the-art reward redistribution methods. Our code is available at https://github.com/baicenxiao/AREL.
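The idea of turning a single episodic reward into a dense signal can be illustrated with a minimal sketch. This is not AREL's attention model: the per-step scores below are hypothetical placeholders for whatever a learned model would produce, and the softmax weighting is an assumption chosen only because it keeps the redistributed rewards summing to the original episodic reward.

```python
import math

def redistribute_reward(scores, episodic_reward):
    """Split one episodic reward across time-steps in proportion to softmax(scores).

    `scores` are hypothetical per-step importance values (an assumption here;
    in a method like AREL they would come from a learned model).
    """
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [episodic_reward * w / total for w in weights]

# Three time-steps with increasing (made-up) importance scores.
dense = redistribute_reward([0.0, 1.0, 2.0], episodic_reward=10.0)
```

Because the weights are normalized, the dense rewards preserve the episode's total return while assigning more credit to steps with higher scores.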

preprint · 2020 · arXiv

FRESH: Interactive Reward Shaping in High-Dimensional State Spaces using Human Feedback

Reinforcement learning has been successful in training autonomous agents to accomplish goals in complex environments. Although this has been adapted to multiple settings, including robotics and computer games, human players often find it easier to obtain higher rewards in some environments than reinforcement learning algorithms. This is especially true of high-dimensional state spaces where the reward obtained by the agent is sparse or extremely delayed. In this paper, we seek to effectively integrate feedback signals supplied by a human operator with deep reinforcement learning algorithms in high-dimensional state spaces. We call this FRESH (Feedback-based REward SHaping). During training, a human operator is presented with trajectories from a replay buffer and then provides feedback on states and actions in the trajectory. In order to generalize feedback signals provided by the human operator to previously unseen states and actions at test-time, we use a feedback neural network. We use an ensemble of neural networks with a shared network architecture to represent model uncertainty and the confidence of the neural network in its output. The output of the feedback neural network is converted to a shaping reward that is augmented to the reward provided by the environment. We evaluate our approach on the Bowling and Skiing Atari games in the arcade learning environment. Although human experts have been able to achieve high scores in these environments, state-of-the-art deep learning algorithms perform poorly. We observe that FRESH is able to achieve much higher scores than state-of-the-art deep learning algorithms in both environments. FRESH also achieves a 21.4% higher score than a human expert in Bowling and does as well as a human expert in Skiing.
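The step in which an ensemble's output becomes a shaping reward can be sketched as follows. The function name, the `scale` and `max_disagreement` parameters, and the rule of dropping the bonus when ensemble members disagree are all assumptions made for illustration; the abstract states only that an ensemble represents uncertainty and that a shaping reward is added to the environment reward.

```python
from statistics import mean, pstdev

def shaped_reward(env_reward, ensemble_predictions, scale=1.0, max_disagreement=0.2):
    """Add a feedback-based bonus to the environment reward when the ensemble agrees.

    `ensemble_predictions` holds one predicted feedback value per ensemble member.
    The threshold rule is a hypothetical confidence test, not FRESH's exact rule.
    """
    disagreement = pstdev(ensemble_predictions)  # spread across ensemble members
    if disagreement > max_disagreement:
        return env_reward  # low confidence: use the environment reward alone
    return env_reward + scale * mean(ensemble_predictions)

confident = shaped_reward(1.0, [0.5, 0.5, 0.5])  # members agree, bonus applied
uncertain = shaped_reward(1.0, [0.0, 1.0])       # members disagree, no bonus
```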

preprint · 2020 · arXiv

Safety-Critical Online Control with Adversarial Disturbances

This paper studies the control of safety-critical dynamical systems in the presence of adversarial disturbances. We seek to synthesize state-feedback controllers to minimize a cost incurred due to the disturbance, while respecting a safety constraint. The safety constraint is given by a bound on an H-inf norm, while the cost is specified as an upper bound on the H-2 norm of the system. We consider an online setting where costs at each time are revealed only after the controller at that time is chosen. We propose an iterative approach to the synthesis of the controller by solving a modified discrete-time Riccati equation. Solutions of this equation enforce the safety constraint. We compare the cost of this controller with that of the optimal controller when one has complete knowledge of disturbances and costs in hindsight. We show that the regret function, which is defined as the difference between these costs, varies logarithmically with the time horizon. We validate our approach on a process control setup that is subject to two kinds of adversarial attacks.

preprint · 2016 · arXiv

On Optimizing Hierarchical Modulation in AWGN channel

Hierarchical modulation (HM) can provide different levels of protection for data streams and achieve a rate region that cannot be realized by traditional orthogonal schemes, such as time division (TD). Nevertheless, criteria and algorithms for general HM design are not available in the existing literature. In this paper, we jointly optimize the constellation positions and binary labels for HM to be used in the additive white Gaussian noise (AWGN) channel. Based on the capacity of bit-interleaved coded modulation (BICM) with successive interference cancellation (SIC), our main purpose is to maximize the rate of one data stream, subject to power constraints and the constraint that the rates of the other data streams be larger than given thresholds. A multi-start interior-point algorithm is used to solve the constellation optimization problems, and methods to reduce optimization complexity are also proposed. Numerical results verify the performance gains of optimized HM compared with optimized quadrature amplitude modulation (QAM) based HM and other orthogonal transmission methods.

preprint · 2015 · arXiv

Iterative detection and decoding for SCMA systems with LDPC codes

Sparse code multiple access (SCMA) is a promising multiplexing approach to achieve high system capacity. In this paper, we develop a novel iterative detection and decoding scheme for SCMA systems combined with low-density parity-check (LDPC) decoding. In particular, we decompose the output of the message passing algorithm (MPA) based SCMA multiuser detection into an intrinsic part and a prior part. We then design a joint detection and decoding scheme that iteratively exchanges the intrinsic information between the detector and the decoder, yielding a satisfactory performance gain. Moreover, the proposed scheme has almost the same complexity as the traditional receiver for LDPC-coded SCMA systems. As numerical results demonstrate, the proposed scheme has a substantial gain over the traditional SCMA receiver on AWGN channels and Rayleigh fading channels.

preprint · 2015 · arXiv

Simplified Multiuser Detection for SCMA with Sum-Product Algorithm

Sparse code multiple access (SCMA) is a novel non-orthogonal multiple access technique that fully exploits the shaping gain of multi-dimensional codewords. However, the lack of a simplified multiuser detection algorithm hinders further implementation because of the inherently high computational complexity. In this paper, general SCMA detection algorithms based on the sum-product algorithm are elaborated. Two improved algorithms are then proposed, which simplify the detection structure and reduce the number of exponential operations in the logarithm domain. Furthermore, to analyze these detection algorithms fairly, we derive a theoretical expression for the average mutual information (AMI) of SCMA (SCMA-AMI), and employ a statistical method to calculate the SCMA-AMI for a specific detection algorithm. Simulation results show that the performance is almost as good as that of the basic message passing algorithm in terms of both BER and AMI, while the complexity is significantly decreased compared to the traditional Max-Log approximation method.
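For context on the Max-Log approximation mentioned above, the sketch below contrasts the exact log-domain sum (the Jacobian logarithm) with its Max-Log counterpart, which drops the correction term and so avoids the exponential. It illustrates only this standard identity, not the paper's two improved algorithms; the function names are hypothetical.

```python
import math

def log_sum_exp_exact(a, b):
    """Exact log(exp(a) + exp(b)), written as max plus a correction term."""
    return max(a, b) + math.log1p(math.exp(-abs(a - b)))

def log_sum_exp_maxlog(a, b):
    """Max-Log approximation: keep the max, drop the correction term."""
    return max(a, b)

# The approximation error is the dropped term, which is at most log(2)
# and is largest when the two log-domain messages are equal.
gap = log_sum_exp_exact(0.0, 0.0) - log_sum_exp_maxlog(0.0, 0.0)
```

The gap shrinks quickly as the two inputs move apart, which is why Max-Log is a common low-complexity choice in message-passing detectors.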
