Source author record

Yunhan Huang

Yunhan Huang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

eess.SY math.OC Systems and Control Machine Learning Artificial Intelligence Cryptography and Security Computer Science and Game Theory econ.TH

Catalog footprint

What is connected

6 works

8 topics

2 close collaborators

Preprint, 2022, arXiv

A Pursuit-Evasion Differential Game with Strategic Information Acquisition

This paper studies a two-person linear-quadratic-Gaussian pursuit-evasion differential game with costly but controlled information. One player can decide when to observe the other player's state. However, each observation of the other player's state comes with two costs: the direct cost of observing and the implicit cost of exposing his own state. We call games of this type Pursuit-Evasion-Exposure-Concealment (PEEC) games. The PEEC game comprises two types of strategies: the control strategies and the observation strategies. We fully characterize the Nash control strategies of the PEEC game using techniques such as completing squares and the calculus of variations. We show that the derivation of the Nash observation strategies and the Nash control strategies can be decoupled. We develop a set of necessary conditions that facilitate the numerical computation of the Nash observation strategies. We show, in theory, that players with less maneuverability prefer concealment to exposure. We also show that when the game's horizon goes to infinity, the Nash observation strategy is to observe periodically, and the expected distance between the pursuer and the evader goes to zero with a bounded second moment. We conduct a series of numerical experiments to study the proposed PEEC game and illustrate the numerical results using both figures and animation. Numerical results show that the pursuer can maintain high-grade performance even when the number of observations is limited. We also show that an evader with low maneuverability can still escape if the evader increases his stealthiness.

Preprint, 2022, arXiv

Reinforcement Learning for Linear Quadratic Control is Vulnerable Under Cost Manipulation

In this work, we study the deception of a Linear-Quadratic-Gaussian (LQG) agent by manipulating the cost signals. We show that a small falsification of the cost parameters will only lead to a bounded change in the optimal policy. The bound is linear in the amount of falsification the attacker can apply to the cost parameters. We propose an attack model where the attacker aims to mislead the agent into learning a 'nefarious' policy by intentionally falsifying the cost parameters. We formulate the attacker's problem as a convex optimization problem and develop necessary and sufficient conditions to check the achievability of the attacker's goal. We showcase the adversarial manipulation on two types of LQG learners: the batch RL learner and the adaptive dynamic programming (ADP) learner. Our results demonstrate that with only 2.296% of falsification on the cost data, the attacker misleads the batch RL learner into learning the 'nefarious' policy that leads the vehicle to a dangerous position. The attacker can also gradually trick the ADP learner into learning the same 'nefarious' policy by consistently feeding the learner a falsified cost signal that stays close to the actual cost signal. The paper aims to raise awareness of the security threats faced by RL-enabled control systems.
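The sensitivity claim in the abstract above (a small falsification of the cost parameters changes the optimal policy by a bounded, roughly proportional amount) can be illustrated with a minimal sketch. The setup here is an assumption chosen for brevity, not the paper's model: a scalar, noise-free, discrete-time system x[k+1] = a*x[k] + b*u[k] with stage cost q*x^2 + r*u^2. The function names `lqr_gain` and `gain_shift` and all parameter values are hypothetical.

```python
def lqr_gain(a, b, q, r, iters=10_000, tol=1e-12):
    """Feedback gain k (u = -k * x) for a scalar discrete-time LQR problem.

    Iterates the scalar Riccati recursion to a fixed point.
    Assumed toy model, not the setup used in the paper.
    """
    p = q
    for _ in range(iters):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        converged = abs(p_next - p) < tol
        p = p_next
        if converged:
            break
    return a * b * p / (r + b * b * p)


def gain_shift(eps, a=1.1, b=1.0, q=1.0, r=1.0):
    """Change in the optimal gain when the state cost q is falsified by a factor (1 + eps)."""
    return abs(lqr_gain(a, b, q * (1 + eps), r) - lqr_gain(a, b, q, r))


if __name__ == "__main__":
    for eps in (0.01, 0.02, 0.04):
        print(f"falsification {eps:.2%}: gain shift {gain_shift(eps):.6f}")
```

In this toy model, doubling the falsification level roughly doubles the shift in the gain, which is consistent with a bound that is linear in the amount of falsification. It says nothing about the attack construction itself, which the paper formulates as a convex optimization problem.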

Preprint, 2022, arXiv

The Inverse Problem of Linear-Quadratic Differential Games: When is a Control Strategies Profile Nash?

This paper aims to formulate and study the inverse problem of non-cooperative linear quadratic games: Given a profile of control strategies, find cost parameters for which this profile of control strategies is Nash. We formulate the problem as a leader-followers problem, where a leader aims to implant a desired profile of control strategies among selfish players. In this paper, we leverage frequency-domain techniques to develop a necessary and sufficient condition on the existence of cost parameters for a given profile of stabilizing control strategies to be Nash under a given linear system. The necessary and sufficient condition includes the circle criterion for each player and a rank condition related to the transfer function of each player. The condition provides an analytical method to check the existence of such cost parameters, while previous studies need to solve a convex feasibility problem numerically to answer the same question. We develop an identity in frequency-domain representation to characterize the cost parameters, which we refer to as the Kalman equation. The Kalman equation reduces redundancy in the time-domain analysis that involves solving a convex feasibility problem. Using the Kalman equation, we also show the leader can enforce the same Nash profile by applying penalties on the shared state instead of penalizing the player for other players' actions to avoid the impression of unfairness.

Preprint, 2021, arXiv

Self-Triggered Markov Decision Processes

In this paper, we study Markov Decision Processes (MDPs) with self-triggered strategies, where the idea of self-triggered control is extended to more generic MDP models. This extension makes self-triggering policies applicable to a broader range of systems. We study the co-design problems of the control policy and the triggering policy to optimize two pre-specified cost criteria. The first cost criterion is introduced by incorporating a pre-specified update penalty into the traditional MDP cost criteria to reduce the use of communication resources. Under this criterion, a novel dynamic programming (DP) equation, called the DP equation with optimized lookahead, is proposed to solve for the self-triggering policy. The second self-triggering policy is to maximize the triggering time while still guaranteeing a pre-specified level of sub-optimality. Theoretical underpinnings are established for the computation and implementation of both policies. Through a gridworld numerical example, we illustrate the two policies' effectiveness in reducing resource consumption and demonstrate the trade-offs between resource consumption and system performance.

Preprint, 2020, arXiv

Infinite-Horizon Linear-Quadratic-Gaussian Control with Costly Measurements

In this paper, we consider an infinite-horizon Linear-Quadratic-Gaussian control problem with controlled and costly measurements. A control strategy and a measurement strategy are co-designed to optimize the trade-off among control performance, actuating costs, and measurement costs. We address the co-design and co-optimization problem by establishing a dynamic programming equation with controlled lookahead. By leveraging the dynamic programming equation, we fully characterize the optimal control strategy and the measurement strategy analytically. The optimal control is linear in the state estimate, which depends on the measurement strategy. We prove that the optimal measurement strategy is independent of the measured state and is periodic, with the optimal period length determined by the cost of measurements and the system parameters. We demonstrate the potential application of the co-design and co-optimization problem in an optimal self-triggered control paradigm. Two examples are provided to show the effectiveness of the optimal measurement strategy in reducing the overhead of measurements while preserving system performance.
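The trade-off behind a periodic measurement strategy, as described in the abstract above, can be sketched with a toy calculation. Everything in it is an assumption made for illustration rather than the paper's derivation: a scalar system x[k+1] = a*x[k] + w[k] with noise variance `noise_var`, a perfect measurement that resets the estimation error to zero, and a per-step penalty equal to `weight` times the estimation error variance. The function names are hypothetical.

```python
def average_cost(period, meas_cost, a=1.05, noise_var=1.0, weight=1.0):
    """Average per-step cost when measuring every `period` steps (assumed toy model)."""
    err_var = 0.0  # a perfect measurement resets the estimation error variance
    total = meas_cost
    for _ in range(period):
        total += weight * err_var
        # Without a new measurement, the error variance propagates open loop.
        err_var = a * a * err_var + noise_var
    return total / period


def best_period(meas_cost, max_period=200, **kwargs):
    """Period in 1..max_period with the smallest average cost (ties go to the shorter period)."""
    return min(range(1, max_period + 1),
               key=lambda t: average_cost(t, meas_cost, **kwargs))


if __name__ == "__main__":
    for cost in (0.0, 0.5, 5.0, 50.0):
        print(f"measurement cost {cost:5.1f}: measure every {best_period(cost)} step(s)")
```

With free measurements the sketch measures at every step, and the chosen period does not shrink as measurements get more expensive. This mirrors, in a much simplified form, the statement that the optimal period length is determined by the measurement cost and the system parameters.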

Preprint, 2020, arXiv

Manipulating Reinforcement Learning: Poisoning Attacks on Cost Signals

This chapter studies emerging cyber-attacks on reinforcement learning (RL) and introduces a quantitative approach to analyze the vulnerabilities of RL. Focusing on adversarial manipulation of the cost signals, we analyze the performance degradation of TD($λ$) and $Q$-learning algorithms under the manipulation. For TD($λ$), the approximation learned from the manipulated costs has an approximation error bound proportional to the magnitude of the attack. The effect of the adversarial attacks on the bound does not depend on the choice of $λ$. For $Q$-learning, we show that the algorithms converge under stealthy attacks and bounded falsifications of the cost signals. We characterize the relation between the falsified cost and the $Q$-factors, as well as the policy learned by the learning agent, which provides fundamental limits for feasible offensive and defensive moves. We propose a robust region in terms of the cost within which the adversary can never achieve the targeted policy. We provide conditions on the falsified cost that can mislead the agent into learning an adversary's favored policy. A case study of TD($λ$) learning is provided to corroborate the results.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint