Researcher profile

Shaoyang Wang

Shaoyang Wang contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 13 - Unverified | Verification L1 | Unclaimed author
2 works
0 followers
4 topics
4 close collaborators

Published work

2 published items

Preprint | 2022 | arXiv

Multi-Agent Deep Reinforcement Learning for Cost- and Delay-Sensitive Virtual Network Function Placement and Routing

This paper proposes an effective and novel multi-agent deep reinforcement learning (MADRL)-based method for solving the joint virtual network function (VNF) placement and routing (P&R) problem, where multiple service requests with differentiated demands are delivered at the same time. The differentiated demands of the service requests are reflected by their delay- and cost-sensitive factors. We first formulate a VNF P&R problem that jointly minimizes a weighted sum of service delay and resource consumption cost; this problem is NP-complete. Then, the joint VNF P&R problem is decoupled into two iterative subtasks: a placement subtask and a routing subtask. Each subtask consists of multiple sequential decision processes that run concurrently. By invoking the deep deterministic policy gradient method and a multi-agent technique, an MADRL-P&R framework is designed to perform the two subtasks. A new joint reward and internal reward mechanism is proposed to match the goals and constraints of the placement and routing subtasks. We also propose a parameter migration-based model-retraining method to deal with changing network topologies. Experiments show that the proposed MADRL-P&R framework is superior to its alternatives in terms of service cost and delay, and offers higher flexibility for personalized service demands. The parameter migration-based model-retraining method can efficiently accelerate convergence under moderate network topology changes.

Preprint | 2020 | arXiv

Learning-Based Multi-Channel Access in 5G and Beyond Networks with Fast Time-Varying Channels

We propose a learning-based scheme to investigate the dynamic multi-channel access (DMCA) problem in fifth generation (5G) and beyond networks with fast time-varying channels whose parameters are unknown. The proposed learning-based scheme can maintain near-optimal performance for a long time, even in sharply changing channels. This scheme greatly reduces processing delay and effectively alleviates the error due to decision lag, which is caused by the non-immediacy of information acquisition and processing. After introducing the network model with unknown channel parameters and the streaming model, we propose a psychology-based personalized quality of service model. Then, two access criteria are presented for the live streaming model and the buffered streaming model, and their corresponding optimization problems are formulated. The optimization problems are solved by a learning-based DMCA scheme, which combines a recurrent neural network with deep reinforcement learning. In the learning-based DMCA scheme, the agent mainly invokes the proposed prediction-based deep deterministic policy gradient algorithm as the learning algorithm. As a novel technical paradigm, our scheme has strong universality, since it can be easily extended to solve other problems in wireless communications. Simulation results based on real channel data show that the performance of the learning-based scheme approaches that of exhaustive search when a decision is made at each time slot, and is superior to that of exhaustive search when a decision is made every few time slots.