Source author record

Yunjie Gu

Yunjie Gu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

eess.SY · Machine Learning · Systems and Control · Artificial Intelligence · Multiagent Systems · Computation and Language · eess.SP

Catalog footprint

What is connected

7 works

7 topics

4 close collaborators

preprint · 2023 · arXiv

SHAQ: Incorporating Shapley Value Theory into Multi-Agent Q-Learning

Value factorisation is a useful technique for multi-agent reinforcement learning (MARL) in global reward games; however, its underlying mechanism is not yet fully understood. This paper studies a theoretical framework for value factorisation with interpretability via Shapley value theory. We generalise the Shapley value to Markov convex games, calling the result the Markov Shapley value (MSV), and apply it as a value factorisation method in global reward games, which follows from the equivalence between the two games. Based on the properties of MSV, we derive the Shapley-Bellman optimality equation (SBOE) to evaluate the optimal MSV, which corresponds to an optimal joint deterministic policy. Furthermore, we propose the Shapley-Bellman operator (SBO), which is proved to solve SBOE. With a stochastic approximation and some transformations, a new MARL algorithm called Shapley Q-learning (SHAQ) is established, the implementation of which is guided by the theoretical results of SBO and MSV. We also discuss the relationship between SHAQ and relevant value factorisation methods. In the experiments, SHAQ exhibits not only superior performance on all tasks but also interpretability that agrees with the theoretical analysis. The implementation of this paper is available at https://github.com/hsvgbkhgbv/shapley-q-learning.
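For readers unfamiliar with the underlying idea, the following is a minimal sketch of the classical (static) Shapley value, not the Markov Shapley value or the SHAQ algorithm from the paper. The function names, player labels, weights and the toy characteristic function are all hypothetical and chosen only for illustration.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Classical Shapley value: each player's marginal contribution,
    averaged over every order in which the coalition can be assembled."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            totals[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def toy_value(coalition):
    """Hypothetical convex game: the squared sum of made-up player weights."""
    weights = {"a": 1.0, "b": 2.0, "c": 3.0}
    return sum(weights[p] for p in coalition) ** 2


phi = shapley_values(["a", "b", "c"], toy_value)
grand_coalition_value = toy_value(frozenset(["a", "b", "c"]))
```

In this toy game the shares are 6, 12 and 18, which sum to the grand-coalition value of 36. This efficiency property is what makes the Shapley value a natural way to split a single global reward among agents.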

preprint · 2022 · arXiv

Impedance-based Root-cause Analysis: Comparative Study of Impedance Models and Calculation of Eigenvalue Sensitivity

Impedance models of power systems are useful when state-space models of apparatus such as inverter-based resources (IBRs) have not been made available and only black-box impedance models can be obtained. For tracing the root causes of poor damping and for tuning the modes of the system, the sensitivity of the modes to components and parameters is needed. The so-called critical admittance-eigenvalue sensitivity based on the nodal admittance model has provided a partial solution but omits meaningful directional information. The alternative whole-system impedance model yields participation factors of shunt-connected apparatus with directional information that allows separate tuning for damping and frequency, yet does not cover series-connected components. This paper formalises the relationships between the two forms of impedance models and between the two forms of root-cause analysis. The calculation of system eigenvalue sensitivity in impedance models is further developed, which fills the gaps of previous research and establishes a complete theory of impedance-based root-cause analysis. The theoretical relationships and the tuning of parameters are illustrated with a three-node passive network, a modified IEEE 14-bus network and a modified NETS-NYPS 68-bus network, showing that tools can be developed for tuning IBR-rich power systems where only black-box impedance models are available.
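As background on what "sensitivity of the modes to parameters" means, here is a small sketch using the textbook state-space formula dλ/dp = wᵀ(∂A/∂p)v / (wᵀv), with right eigenvector v and left eigenvector w. It is not the impedance-based calculation developed in the paper. The second-order system, the parameter names `k` and `d`, and the numerical values are assumptions made for illustration.

```python
import cmath


def mode(k, d):
    """Eigenvalue (positive imaginary part when underdamped) of the
    hypothetical state matrix A = [[0, 1], [-k, -d]]."""
    return (-d + cmath.sqrt(d * d - 4.0 * k)) / 2.0


def sensitivity_to_damping(k, d):
    """d(lambda)/d(d) from the left/right eigenvector formula."""
    lam = mode(k, d)
    v = (1.0, lam)        # right eigenvector: A v = lam v
    w = (lam + d, 1.0)    # left eigenvector: w^T A = lam w^T
    dA = ((0.0, 0.0), (0.0, -1.0))  # partial derivative of A w.r.t. d
    num = sum(w[i] * dA[i][j] * v[j] for i in range(2) for j in range(2))
    den = w[0] * v[0] + w[1] * v[1]
    return num / den


k, d, h = 4.0, 0.4, 1e-6
analytic = sensitivity_to_damping(k, d)
# Central finite difference as an independent check of the formula.
numeric = (mode(k, d + h) - mode(k, d - h)) / (2.0 * h)
```

For this underdamped example the real part of the sensitivity is -0.5, matching the fact that the mode's real part equals -d/2. The complex value carries the directional information (damping versus frequency) that a magnitude-only sensitivity would lose.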

preprint · 2022 · arXiv

Multi-Agent Reinforcement Learning for Active Voltage Control on Power Distribution Networks

This paper presents a problem in power networks that creates an exciting yet challenging real-world scenario for the application of multi-agent reinforcement learning (MARL). The emerging trend of decarbonisation is placing excessive stress on power distribution networks. Active voltage control is seen as a promising solution to relieve power congestion and improve voltage quality without extra hardware investment, taking advantage of the controllable apparatuses in the network, such as roof-top photovoltaics (PVs) and static var compensators (SVCs). These controllable apparatuses appear in vast numbers and are distributed over a wide geographic area, making MARL a natural candidate. This paper formulates the active voltage control problem in the framework of Dec-POMDP and establishes an open-source environment. It aims to bridge the gap between the power community and the MARL community and to be a driving force towards real-world applications of MARL algorithms. Finally, we analyse the special characteristics of the active voltage control problem that cause challenges (e.g. interpretability) for state-of-the-art MARL approaches, and summarise the potential directions.

preprint · 2022 · arXiv

Revisiting Grid-Forming and Grid-Following Inverters: A Duality Theory

Power electronic converters for integrating renewable energy resources into power systems can be divided into grid-forming and grid-following inverters. They possess certain similarities but also several important differences, which means that the relationship between them is quite subtle and sometimes obscure. In this article, a new perspective based on duality is proposed to create new insights. It successfully unifies the grid interfacing and synchronization characteristics of the two inverter types in a symmetric, elegant, and technology-neutral form. Analysis shows that the grid-forming and grid-following inverters are duals of each other in several ways, including a) synchronization controllers: frequency droop control and phase-locked loop (PLL); b) grid-interfacing characteristics: current-following voltage-forming and voltage-following current-forming; c) swing characteristics: current-angle swing and voltage-angle swing; d) inner-loop controllers: output impedance shaping and output admittance shaping; and e) grid strength compatibility: strong-grid instability and weak-grid instability. The swing equations are also derived in dual form, which reveal the dynamic interaction between the grid strength, the synchronization controllers, and the inner-loop controllers. Insights are generated into cases of poor stability in both small-signal and transient/large-signal analysis. The theoretical analysis and simulation results are used to illustrate cases for simple single-inverter-infinite-bus systems and a multi-inverter power network.

preprint · 2021 · arXiv

Modelling Hierarchical Structure between Dialogue Policy and Natural Language Generator with Option Framework for Task-oriented Dialogue System

Designing task-oriented dialogue systems is a challenging research topic, since such a system must not only generate utterances that fulfil user requests but also guarantee their comprehensibility. Many previous works trained end-to-end (E2E) models with supervised learning (SL); however, the bias in annotated system utterances remains a bottleneck. Reinforcement learning (RL) deals with the problem by using non-differentiable evaluation metrics (e.g., the success rate) as rewards. Nonetheless, existing works with RL showed that the comprehensibility of generated system utterances could be corrupted when improving the performance on fulfilling user requests. In our work, we (1) propose modelling the hierarchical structure between dialogue policy and natural language generator (NLG) with the option framework, called HDNO, where the latent dialogue act is applied to avoid designing specific dialogue act representations; (2) train HDNO via hierarchical reinforcement learning (HRL), and suggest asynchronous updates between dialogue policy and NLG during training to theoretically guarantee their convergence to a local maximizer; and (3) propose using a discriminator modelled with language models as an additional reward to further improve the comprehensibility. We test HDNO on MultiWoz 2.0 and MultiWoz 2.1, datasets of multi-domain dialogues, in comparison with a word-level E2E model trained with RL, LaRL and HDSA, showing improvements in performance as measured by automatic evaluation metrics and human evaluation. Finally, we demonstrate the semantic meanings of latent dialogue acts to show the explainability of HDNO.

preprint · 2020 · arXiv

Impedance-Based Whole-System Modeling for a Composite Grid via Frame-Dynamics Embedding

The paper establishes a methodology to overcome the difficulty of dynamic frame alignment and system separation in impedance modeling of ac grids, and thereby enables impedance-based whole-system modeling of generator-converter composite power systems. The methodology is based on a frame-dynamics-embedding transformation via an intermediary steady frame between local and global frames, which yields a locally defined impedance model for each generator or converter that does not rely on a global frame but retains all frame dynamics. The individual impedance models can then be readily combined into a whole-system model, even for meshed networks, via the proposed closed-loop formulation without network separation. Compared to state-of-the-art impedance-based models, the proposed method retains both frame dynamics and scalability, and is generally applicable to various network topologies (meshed, radial, etc.) and combinations of machines (generators, motors, converters, etc.). The methodology is used to analyze the dynamic interaction between generators and converters in a composite grid, which yields important findings and potential solutions for unstable oscillation caused by PLL-swing coupling in low-inertia grids.

preprint · 2019 · arXiv

Interpreting Frame Transformations as Diagonalization of Harmonic Transfer Functions

Analysis of ac electrical systems can be performed via frame transformations in the time domain or via harmonic transfer functions (HTFs) in the frequency domain. The two approaches each have unique advantages but are hard to reconcile because the coupling effect in the frequency domain leads to infinite-dimensional HTF matrices that need to be truncated. This paper explores the relation between the two representations and shows that applying a similarity transformation to an HTF matrix creates a direct equivalence to a frame transformation on the input-output signals. Under certain conditions, such similarity transformations have a diagonalizing effect which, essentially, reduces the HTF matrix order from infinity to two or one, making the matrix mathematically tractable without truncation or approximation. This theory is applied to a droop-controlled voltage source inverter as an illustrative example. A stability criterion is derived in the frequency domain which agrees with the conventional state-space model but offers greater insight into the mechanism of instability in terms of the negative damping (non-passivity) under droop control. The paper not only establishes a unified view in theory but also offers an effective practical tool for stability assessment.
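To illustrate the time-domain side of this picture, the sketch below applies a synchronously rotating frame transformation to a balanced sinusoidal signal written as a complex space vector. It shows only the familiar effect that a fundamental-frequency signal becomes constant in the rotating frame; it is not the HTF similarity transformation from the paper. The function name, the 50 Hz frequency, and the amplitude and phase values are assumptions.

```python
import cmath
import math


def to_rotating_frame(x_stationary, omega, t):
    """Rotate a stationary-frame space vector into a frame spinning at omega."""
    return x_stationary * cmath.exp(-1j * omega * t)


omega = 2.0 * math.pi * 50.0   # assumed fundamental frequency in rad/s
amplitude, phase = 1.0, 0.3    # assumed signal parameters

rotating_samples = []
for n in range(5):
    t = n * 1e-3
    # Balanced sinusoid as a complex space vector in the stationary frame.
    x_stationary = amplitude * cmath.exp(1j * (omega * t + phase))
    rotating_samples.append(to_rotating_frame(x_stationary, omega, t))

expected = amplitude * cmath.exp(1j * phase)
```

Every entry of `rotating_samples` equals the same complex constant (up to rounding), so the time-periodic signal has been turned into a time-invariant one. This is the property that makes frame transformations attractive for stability analysis.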