Researcher profile

Zaiyue Yang

Zaiyue Yang contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Quick read

Trust 17 (Unverified) · Verification L1 · Unclaimed author
4 works
0 followers
6 topics
4 close collaborators

Published work

4 published items

Preprint · 2023 · arXiv

Automated deep reinforcement learning for real-time scheduling strategy of multi-energy system integrated with post-carbon and direct-air carbon captured system

The carbon-capturing process aided by carbon dioxide removal technologies (CDRT) has been recognised as a prominent alternative approach to deep decarbonisation. However, the main hindrances are the enormous energy demand and the economic implications of CDRT if they are not effectively managed. Hence, this study proposes a novel deep reinforcement learning (DRL) agent, integrated with an automated hyperparameter selection feature, for the real-time scheduling of a multi-energy system coupled with CDRT. Post-carbon capture systems (PCCS) and direct-air capture systems (DACS) are considered as the CDRT. Various possible configurations are evaluated using real-time multi-energy data of a district in Arizona and CDRT parameters from manufacturers' catalogues and pilot project documentation. The simulation results confirm that an optimised soft actor-critic (SAC) algorithm outperformed the TD3 algorithm owing to its maximum entropy feature. We then trained four SAC agents, one for each of the considered case studies, using the optimised hyperparameter values and deployed them in real time for evaluation. The results show that the proposed DRL agent can meet the prosumers' multi-energy demand and schedule the CDRT energy demand economically without violating the specified constraints. The proposed DRL agent also outperformed rule-based scheduling by 23.65%. Among the configurations, the one with PCCS and solid-sorbent DACS is the most suitable, with a high CO2 captured-released ratio of 38.54, a low CO2 released indicator value of 2.53, and a 36.5% reduction in CDR cost due to waste heat utilisation and the high absorption capacity of the selected sorbent. However, the adoption of CDRT is not economically viable at the current carbon price. Finally, we show that CDRT would become attractive at a carbon price of 400-450 USD/ton, provided that policymakers offer tax incentives.
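The abstract attributes SAC's advantage over TD3 to its maximum entropy feature. As a rough illustration only, the sketch below computes the standard maximum-entropy (soft) return that SAC-style agents optimise, sum_t γ^t (r_t + α H(π(·|s_t))), for a short trajectory. The helper names `entropy` and `soft_return`, the discrete-action setting and the values of α and γ are assumptions for illustration; the paper's agents act in a continuous scheduling space and its exact objective is not given here.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def soft_return(rewards, policy_probs, alpha, gamma):
    """Discounted maximum-entropy return: sum_t gamma**t * (r_t + alpha * H(pi(.|s_t))).

    alpha is the entropy temperature: larger values reward more random (exploratory) policies.
    """
    return sum(
        gamma**t * (r + alpha * entropy(probs))
        for t, (r, probs) in enumerate(zip(rewards, policy_probs))
    )

# Two steps with equal rewards: the even split over two actions earns an entropy bonus,
# while the deterministic choice at the second step earns none.
value = soft_return(rewards=[1.0, 1.0], policy_probs=[[0.5, 0.5], [1.0, 0.0]], alpha=0.2, gamma=0.9)
```

The entropy bonus is what pushes such agents to keep exploring instead of collapsing early onto a single action.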

Preprint · 2022 · arXiv

Incentive-aware Electric Vehicle Routing Problem: a Bi-level Model and a Joint Solution Algorithm

Fixed pickup and delivery times can strongly limit the performance of freight transportation. Against this backdrop, fleet operators can use compensation mechanisms such as monetary incentives to buy delay time from their customers, in order to improve fleet efficiency and ultimately minimize operating costs. To make the most of such an operational model, the fleet activities and the incentives should be jointly optimized, accounting for the customers' reactions. To this end, this paper presents an incentive-aware electric vehicle routing scheme in which the fleet operator actively offers incentives to the customers in exchange for pickup or delivery time flexibility. Specifically, we first devise a bi-level model whereby the fleet operator optimizes the routes and charging schedules of the fleet jointly with an incentive rate to reimburse the delivery delays experienced by the customers. At the same time, the customers choose the admissible delays by minimizing a monetarily weighted combination of the delays minus the reimbursement offered by the operator. Second, we tackle the complexity resulting from the bi-level and nonlinear problem structure with an equivalent transformation method, reformulating the problem as a single-level optimization problem that can be solved with standard mixed-integer linear programming algorithms. We demonstrate the effectiveness of our framework via extensive numerical experiments using VRP-REP data from Belgium. Our results show that by jointly optimizing routes and incentives subject to the customers' preferences, the operational costs can be reduced by up to 5%, whilst customers can save more than 30% in total delivery fees.
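In the bi-level model above, each customer (the lower level) picks a delay that minimizes its own cost for a given incentive rate, and the operator (the upper level) chooses routes, charging and the rate while anticipating that reaction. The abstract does not spell out the customers' cost function, so the sketch below assumes a hypothetical quadratic disutility `a * d**2` for a delay `d` and a linear reimbursement `r * d`. Under that assumption the best response has the closed form d* = min(d_max, max(0, r / (2a))). The names `customer_cost`, `best_response`, `a`, `r` and `d_max` are illustrative and not taken from the paper.

```python
def customer_cost(d, r, a):
    """Hypothetical customer cost: quadratic delay disutility minus linear reimbursement."""
    return a * d**2 - r * d

def best_response(r, a, d_max):
    """Delay d in [0, d_max] minimizing customer_cost (closed form for the convex quadratic)."""
    if a <= 0:
        raise ValueError("the delay weight a must be positive")
    return min(d_max, max(0.0, r / (2 * a)))

# Sanity check of the closed form against a brute-force grid over admissible delays.
r, a, d_max = 3.0, 0.5, 4.0
grid = [i * d_max / 400 for i in range(401)]
d_grid = min(grid, key=lambda d: customer_cost(d, r, a))
d_star = best_response(r, a, d_max)
```

Under this assumed cost, the reaction is a piecewise-linear function of the rate r, which is the kind of lower-level response the upper level must anticipate; the paper's own equivalent transformation is not reproduced here.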

Preprint · 2022 · arXiv

Joint Routing and Charging Problem of Electric Vehicles with Incentive-aware Customers Considering Spatio-temporal Charging Prices

This paper investigates the scheduling problem of a fleet of electric vehicles (EVs) providing mobility as a service to a set of time-specified customers, where the operator needs to solve the routing and charging problem jointly for each EV. Here, we consider incentive-aware customers and propose that the operator offer monetary incentives to customers in exchange for time flexibility. In this way, the fleet operator can achieve a routing and charging schedule with lower costs, whilst the customers receive monetary compensation for their flexibility. Specifically, we first propose a bi-level optimization model whereby the fleet operator optimizes the routing and charging schedule, accounting for the spatio-temporally varying charging price, jointly with a monetary incentive to reimburse the delivery time flexibility experienced by the customers. Concurrently, the customers choose their own time flexibility by minimizing their own cost. Second, we cope with the computational burden of this nonlinear bi-level optimization model through an accurate reformulation approach consisting of the KKT optimality conditions, a Big-M-based linearization method, and the zero duality gap of convex optimization problems. In this way, we convert the proposed problem into a single-level optimization problem, which can be solved by a strengthened generalized Benders decomposition method that converges faster than the generalized Benders decomposition method. To evaluate the effectiveness of the proposed mathematical model, we carry out extensive simulation experiments using the VRP-REP data of Belgium. The numerical results show that the proposed mathematical model can reduce both the delivery fees for the customers and the operating cost incurred by the fleet operator.
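The reformulation described above replaces the lower-level problem with its KKT conditions, whose complementarity constraints (λ ≥ 0, g ≥ 0 and λ·g = 0 for a multiplier λ and a constraint slack g) are nonlinear. A common Big-M linearization introduces a binary z and the linear constraints 0 ≤ λ ≤ M·z and 0 ≤ g ≤ M·(1 − z). The sketch below checks that the two descriptions agree whenever M bounds both λ and g, and shows what goes wrong when it does not. This is the generic textbook construction; the paper's specific constraints and choice of M are not given in the abstract, and the function names are illustrative.

```python
from itertools import product

def complementarity_holds(lam, g, tol=1e-9):
    """Original (nonlinear) KKT complementarity: lam >= 0, g >= 0 and lam * g == 0."""
    return lam >= -tol and g >= -tol and abs(lam * g) <= tol

def big_m_feasible(lam, g, M, tol=1e-9):
    """Big-M form: some binary z gives 0 <= lam <= M*z and 0 <= g <= M*(1 - z)."""
    return any(
        -tol <= lam <= M * z + tol and -tol <= g <= M * (1 - z) + tol
        for z in (0, 1)
    )

M = 10.0
values = [0.0, 0.5, 2.0, 5.0]
# Within [0, M] the linear Big-M system accepts exactly the complementary pairs.
agree = all(
    complementarity_holds(lam, g) == big_m_feasible(lam, g, M)
    for lam, g in product(values, repeat=2)
)
# If M is smaller than the true multiplier, a valid KKT point is wrongly cut off.
cut_off = complementarity_holds(20.0, 0.0) and not big_m_feasible(20.0, 0.0, M)
```

The design trade-off is the size of M: too small and valid solutions are excluded, as in the last check; too large and the mixed-integer relaxation becomes weak and slow to solve.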

Preprint · 2020 · arXiv

Finite-Sample Analysis of Decentralized Temporal-Difference Learning with Linear Function Approximation

Motivated by the emerging use of multi-agent reinforcement learning (MARL) in engineering applications such as networked robotics, swarming drones, and sensor networks, we investigate the policy evaluation problem in a fully decentralized setting, using temporal-difference (TD) learning with linear function approximation to handle the large state spaces encountered in practice. A group of agents aims to collaboratively learn the value function of a given policy from locally private rewards observed in a shared environment, by exchanging local estimates with their neighbors. Despite the simplicity and widespread use of such decentralized TD learning algorithms, our theoretical understanding of them remains limited. Existing results were obtained under i.i.d. data samples, or by imposing an 'additional' projection step to control the 'gradient' bias incurred by the Markovian observations. In this paper, we provide a finite-sample analysis of fully decentralized TD(0) learning under both i.i.d. and Markovian samples, and prove that all local estimates converge linearly to a small neighborhood of the optimum. The resulting error bounds are the first of their kind, in the sense that they hold under the most practical assumptions; this is made possible by a novel multi-step Lyapunov analysis.
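The decentralized TD(0) setting described above combines two steps per sample: each agent averages its parameter vector with its neighbours' (consensus) and then takes a local TD(0) step driven by its private reward. The sketch below is a minimal NumPy version of that consensus-plus-TD(0) recursion on a small synthetic Markov chain, fed with Markovian (not i.i.d.) samples, and compares the agents' estimates with the centralized TD fixed point for the network-average reward. The function names, the ring communication graph, the features, the step size and all numbers are assumptions for illustration; the exact recursion and step-size conditions analysed in the paper may differ.

```python
import numpy as np

def decentralized_td0(P, rewards, Phi, W, gamma, alpha, steps, seed=0):
    """Consensus-plus-TD(0) with linear function approximation (illustrative sketch).

    P       : (S, S) transition matrix of the shared environment under the fixed policy
    rewards : (N, S) private reward of each of the N agents in each state
    Phi     : (S, d) feature matrix; agent n estimates V(s) as Phi[s] @ theta[n]
    W       : (N, N) doubly stochastic mixing matrix of the communication graph
    """
    rng = np.random.default_rng(seed)
    N, S = rewards.shape
    theta = np.zeros((N, Phi.shape[1]))
    s = 0
    for _ in range(steps):
        s_next = rng.choice(S, p=P[s])  # one Markovian (not i.i.d.) sample
        phi, phi_next = Phi[s], Phi[s_next]
        # each agent's TD error uses only its own reward and its own estimate
        td_err = rewards[:, s] + gamma * theta @ phi_next - theta @ phi
        # average with neighbours, then take a local TD(0) step
        theta = W @ theta + alpha * td_err[:, None] * phi[None, :]
        s = s_next
    return theta

def td_fixed_point(P, r_bar, Phi, gamma):
    """Centralized TD fixed point for the network-average reward r_bar."""
    S = P.shape[0]
    # stationary distribution mu of the chain: mu P = mu, sum(mu) = 1
    A = np.vstack([P.T - np.eye(S), np.ones((1, S))])
    mu = np.linalg.lstsq(A, np.r_[np.zeros(S), 1.0], rcond=None)[0]
    D = np.diag(mu)
    # solve Phi^T D (Phi - gamma P Phi) theta = Phi^T D r_bar
    return np.linalg.solve(Phi.T @ D @ (Phi - gamma * P @ Phi), Phi.T @ D @ r_bar)

# Small example: 5 states, 3 features, 4 agents on a ring.
rng = np.random.default_rng(1)
S, N = 5, 4
P = rng.random((S, S))
P /= P.sum(axis=1, keepdims=True)
rewards = rng.random((N, S))
Phi = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1]], dtype=float)
W = np.zeros((N, N))
for n in range(N):
    for m in (n - 1, n, n + 1):
        W[n, m % N] = 1 / 3  # each agent averages itself and its two ring neighbours

theta = decentralized_td0(P, rewards, Phi, W, gamma=0.5, alpha=0.002, steps=50_000)
theta_star = td_fixed_point(P, rewards.mean(axis=0), Phi, gamma=0.5)
print("distance to fixed point:", np.linalg.norm(theta.mean(axis=0) - theta_star))
```

Because the mixing matrix is doubly stochastic, the network average of the estimates follows ordinary TD(0) with the averaged reward, which is why the centralized fixed point is the natural reference. With a constant step size the estimates settle in a small neighbourhood of it rather than converging exactly.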