Source author record

Yi Ouyang

Yi Ouyang appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher | Unclaimed source record

Systems and Control, math.NT, eess.SY, Machine Learning, Artificial Intelligence, Computer Science and Game Theory, Information Theory, math.IT, math.OC, Multiagent Systems, Networking and Internet Architecture, Neural and Evolutionary Computing, Robotics

Catalog footprint

What is connected

15 works

13 topics

4 close collaborators

Preprint, 2023, arXiv

An Approach to Stochastic Dynamic Games with Asymmetric Information and Hidden Actions

We consider, in discrete time, a general class of sequential stochastic dynamic games with asymmetric information with the following features. The underlying system has Markovian dynamics controlled by the agents' joint actions. Each agent's instantaneous utility depends on the current system state and the agents' joint actions. At each time instant each agent makes a private noisy observation of the current system state and the agents' actions in the previous time instant. In addition, at each time instant all agents have a common noisy observation of the current system state and their actions in the previous time instant. Each agent's actions are part of his private information. The objective is to determine Bayesian Nash Equilibrium (BNE) strategy profiles that are based on a compressed version of the agents' information and can be sequentially computed; such BNE strategy profiles may not always exist. We present an approach/methodology that achieves the above-stated objective, along with an instance of a game where BNE strategy profiles with the above-mentioned characteristics exist. We show that the methodology also works for the case where the agents have no common observations.

Preprint, 2022, arXiv

Grasping Core Rules of Time Series through Pure Models

Time series analysis has undergone the transition from statistical methods to deep learning, as have many other machine learning fields. Although accuracy on a number of publicly available datasets appears to increase as models are updated, each update typically increases the model scale several times over in exchange for only a slight difference in accuracy. Through this experiment, we point out a different line of thinking: time series, especially long-term forecasting, may differ from other fields. It is not necessary to use extensive and complex models to grasp all aspects of time series; instead, pure models can be used to grasp the core rules of time series changes. With this simple but effective idea, we created PureTS, a network with three pure linear layers that achieved state-of-the-art results in 80% of the long sequence prediction tasks while being nearly the lightest model and having the fastest running speed. On this basis, we discuss the potential of pure linear layers in both phenomena and essence. The ability to understand the core law contributes to the high precision of long-distance prediction, and reasonable fluctuation prevents it from distorting the curve in multi-step prediction as mainstream deep learning models do; we summarize this as a pure linear neural network that avoids over-fluctuating. Finally, we suggest fundamental design standards for lightweight long-step time series tasks: input and output should try to have the same dimension, and the structure should avoid fragmentation and complex operations.

Preprint, 2022, arXiv

On abelian $2$-ramification torsion modules of quadratic fields

For a number field $F$ and a prime number $p$, the $\mathbb{Z}_p$-torsion module of the Galois group of the maximal abelian pro-$p$ extension of $F$ unramified outside $p$ over $F$, denoted as $\mathcal{T}_p(F)$, is an important subject in abelian $p$-ramification theory. In this paper we study the group $\mathcal{T}_2(F)=\mathcal{T}_2(m)$ of the quadratic field $F=\mathbb{Q}(\sqrt{ m})$. Firstly, assuming $m>0$, we prove an explicit $4$-rank formula for $\mathcal{T}_2(-m)$. Furthermore, applying this formula, we obtain the $4$-rank density of $\mathcal{T}_2$-groups of imaginary quadratic fields. Secondly, for $l$ an odd prime, we obtain results about the $2$-divisibility of orders of $\mathcal{T}_2(\pm l)$ and $\mathcal{T}_2(\pm 2l)$. In particular we find that $\#\mathcal{T}_2(l)\equiv 2\# \mathcal{T}_2(2l)\equiv h_2(-2l)\bmod{16}$ if $l\equiv 7\bmod{8}$ where $h_2(-2l)$ is the $2$-class number of $\mathbb{Q}(\sqrt{-2l})$. We then obtain density results for $\mathcal{T}_2(\pm l)$ and $\mathcal{T}_2(\pm 2l)$. Finally, based on our density results and numerical data, we propose distribution conjectures about $\mathcal{T}_p(F)$ when $F$ varies over real or imaginary quadratic fields for any prime $p$, and about $\mathcal{T}_2(\pm l)$ and $\mathcal{T}_2(\pm 2 l)$ when $l$ varies, in the spirit of Cohen-Lenstra heuristics. Our conjecture in the $\mathcal{T}_2(l)$ case is closely connected to Shanks-Sime-Washington's speculation on the distributions of the zeros of $2$-adic $L$-functions and to the distributions of the fundamental units.

Preprint, 2022, arXiv

Training a Resilient Q-Network against Observational Interference

Deep reinforcement learning (DRL) has demonstrated impressive performance in various gaming simulators and real-world applications. In practice, however, a DRL agent may receive faulty observations caused by abrupt interferences such as black-out, frozen-screen, and adversarial perturbation. How to design a resilient DRL algorithm against these rare but mission-critical and safety-crucial scenarios is an essential yet challenging task. In this paper, we consider a deep Q-network (DQN) framework trained with an auxiliary task of observational interferences such as artificial noises. Inspired by causal inference for observational interference, we propose a causal inference based DQN algorithm called causal inference Q-network (CIQ). We evaluate the performance of CIQ in several benchmark DQN environments with different types of interferences as auxiliary labels. Our experimental results show that the proposed CIQ method could achieve higher performance and more resilience against observational interferences.

Preprint, 2020, arXiv

Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems

Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL) primarily due to the dynamical environments and the decentralized information among agents. We attempt to solve this challenge in the context of decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems. We begin with a simple setup consisting of two agents and two dynamically decoupled stochastic linear systems, each system controlled by an agent. The systems are coupled through a quadratic cost function. When both systems' dynamics are unknown and there is no communication among the agents, we show that no learning policy can generate regret that is sub-linear in $T$, where $T$ is the time horizon. When only one system's dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem in the proposed MARL algorithm serves as an implicit coordination mechanism between the two learning agents. This allows the agents to achieve a regret within $O(\sqrt{T})$ of the regret of the auxiliary single-agent problem. Consequently, using existing results for single-agent LQ regret, our algorithm provides a $\tilde{O}(\sqrt{T})$ regret bound. (Here $\tilde{O}(\cdot)$ hides constants and logarithmic factors.) Our numerical experiments indicate that this bound is matched in practice. From the two-agent problem, we extend our results to multi-agent LQ systems with certain communication patterns.

Preprint, 2019, arXiv

Large-Scale Traffic Signal Offset Optimization

The offset optimization problem seeks to coordinate and synchronize the timing of traffic signals throughout a network in order to enhance traffic flow and reduce stops and delays. Recently, offset optimization was formulated as a continuous optimization problem without integer variables by modeling traffic flow as sinusoidal. In this paper, we present a novel algorithm to solve this new formulation to near-global optimality at large scale. Specifically, we solve a convex relaxation of the nonconvex problem using a tree decomposition reduction, and use randomized rounding to recover a near-global solution. We prove that the algorithm always delivers solutions of expected value at least 0.785 times the globally optimal value. Moreover, assuming that the topology of the traffic network is "tree-like", we prove that the algorithm has near-linear time complexity with respect to the number of intersections. These theoretical guarantees are experimentally validated on the Berkeley, Manhattan, and Los Angeles traffic networks. In our numerical results, the empirical time complexity of the algorithm is linear, and the solutions have objectives within 0.99 times the globally optimal value.

Preprint, 2016, arXiv

Optimal Local and Remote Controllers with Unreliable Communication

We consider a decentralized optimal control problem for a linear plant controlled by two controllers, a local controller and a remote controller. The local controller directly observes the state of the plant and can inform the remote controller of the plant state through a packet-drop channel. We assume that the remote controller is able to send acknowledgments to the local controller to signal the successful receipt of transmitted packets. The objective of the two controllers is to cooperatively minimize a quadratic performance cost. We provide a dynamic program for this decentralized control problem using the common information approach. Although our problem is not a partially nested LQG problem, we obtain explicit optimal strategies for the two controllers. In the optimal strategies, both controllers compute a common estimate of the plant state based on the common information. The remote controller's action is linear in the common estimated state, and the local controller's action is linear in both the actual state and the common estimated state.

Preprint, 2015, arXiv

A Common Information-Based Multiple Access Protocol Achieving Full Throughput and Linear Delay

We consider a multiple access communication system where multiple users share a common collision channel. Each user observes its local traffic and the feedback from the channel. At each time instant the feedback from the channel is one of three messages: no transmission, successful transmission, collision. The objective is to design a transmission protocol that coordinates the users' transmissions and achieves high throughput and low delay. We present a decentralized Common Information-Based Multiple Access (CIMA) protocol that has the following features: (i) it achieves the full throughput region of the collision channel; (ii) it results in a delay that is linear in the number of users, and is significantly lower than that of CSMA protocols; (iii) it avoids collisions without channel sensing.

Preprint, 2015, arXiv

Dynamic Games with Asymmetric Information: Common Information Based Perfect Bayesian Equilibria and Sequential Decomposition

We formulate and analyze a general class of stochastic dynamic games with asymmetric information arising in dynamic systems. In such games, multiple strategic agents control the system dynamics and have different information about the system over time. Because of the presence of asymmetric information, each agent needs to form beliefs about other agents' private information. Therefore, the specification of the agents' beliefs along with their strategies is necessary to study the dynamic game. We use Perfect Bayesian equilibrium (PBE) as our solution concept. A PBE consists of a pair of strategy profile and belief system. In a PBE, every agent's strategy should be a best response under the belief system, and the belief system depends on agents' strategy profile when there is signaling among agents. Therefore, the circular dependence between strategy profile and belief system makes it difficult to compute PBE. Using the common information among agents, we introduce a subclass of PBE called common information based perfect Bayesian equilibria (CIB-PBE), and provide a sequential decomposition of the dynamic game. Such decomposition leads to a backward induction algorithm to compute CIB-PBE. We illustrate the sequential decomposition with an example of a multiple access broadcast game. We prove the existence of CIB-PBE for a subclass of dynamic games.

Preprint, 2015, arXiv

Newton polygons of $L$-functions of polynomials $x^d+ax^{d-1}$ with $p\equiv-1\bmod d$

For prime $p\equiv-1\bmod d$ and $q$ a power of $p$, we obtain the slopes of the $q$-adic Newton polygons of $L$-functions of $x^d+ax^{d-1}\in \mathbb{F}_q[x]$ with respect to finite characters $\chi$ when $p$ is larger than an explicit bound depending only on $d$ and $\log_p q$. The main tools are Dwork's trace formula and Zhu's rigid transform theorem.

Preprint, 2015, arXiv

On a conjecture of Wan about limiting Newton polygons

We show that for a monic polynomial $f(x)$ over a number field $K$ containing a global permutation polynomial of degree $>1$ as its composition factor, the Newton polygon of $f \bmod \mathfrak{p}$ does not converge for $\mathfrak{p}$ passing through all finite places of $K$. In the rational number field case, our result is the "only if" part of a conjecture of Wan about limiting Newton polygons.

Preprint, 2015, arXiv

Optimal Relay Selection with Non-negligible Probing Time

In this paper an optimal relay selection algorithm with non-negligible probing time is proposed and analyzed for cooperative wireless networks. Relay selection has been introduced to solve the degraded bandwidth efficiency problem in cooperative communication. Yet complete information about relay channels often remains unavailable in complex networks, which makes optimal selection strategies impossible for the transmission source without probing the relay channels. In particular, when the number of relay candidates is large, even though probing all relay channels guarantees finding the best relays at any time instant, the degradation of bandwidth efficiency due to non-negligible probing times, which was often neglected in past literature, is also significant. In this work, a stopping-rule-based relay selection strategy is determined for the source node to decide when to stop the probing process and choose one of the probed relays to cooperate with under the stochastic uncertainties of wireless channels. This relay selection strategy is further shown to have a simple threshold structure. Moreover, full diversity order and high bandwidth efficiency can be achieved simultaneously. Both analytical and simulation results are provided to verify the claims.
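The threshold structure mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single fixed threshold and a scalar quality value per relay; the function name `probe_until_threshold`, the quality metric and the threshold value are all hypothetical and are not taken from the paper, whose actual stopping rule may differ.

```python
def probe_until_threshold(relay_qualities, threshold):
    """Probe relays in order and stop once the best probed quality
    reaches the threshold (illustrative assumption: one fixed threshold).

    Returns (index of the chosen relay, number of relays probed).
    """
    if not relay_qualities:
        raise ValueError("need at least one relay candidate")
    best_index = 0
    probed = 0
    for index, quality in enumerate(relay_qualities):
        probed += 1  # each probe costs time, so fewer probes is better
        if quality > relay_qualities[best_index]:
            best_index = index
        if relay_qualities[best_index] >= threshold:
            break  # good enough: stop probing and use this relay
    return best_index, probed
```

If no probed relay reaches the threshold, the sketch falls back to the best relay seen after probing every candidate.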

Preprint, 2014, arXiv

Signaling for Decentralized Routing in a Queueing Network

A discrete-time decentralized routing problem in a service system consisting of two service stations and two controllers is investigated. Each controller is affiliated with one station. Each station has an infinite size buffer. Exogenous customer arrivals at each station occur with rate $\lambda$. Service times at each station have rate $\mu$. At any time, a controller can route one of the customers waiting in its own station to the other station. Each controller knows perfectly the queue length in its own station and observes the exogenous arrivals to its own station as well as the arrivals of customers sent from the other station. At the beginning, each controller has a probability mass function (PMF) on the number of customers in the other station. These PMFs are common knowledge between the two controllers. At each time a holding cost is incurred at each station due to the customers waiting at that station. The objective is to determine routing policies for the two controllers that minimize either the total expected holding cost over a finite horizon or the average cost per unit time over an infinite horizon. In this problem there is implicit communication between the two controllers; whenever a controller decides to send or not to send a customer from its own station to the other station it communicates information about its queue length to the other station. This implicit communication through control actions is referred to as signaling in decentralized control. Signaling results in complex communication and decision problems. In spite of the complexity of signaling involved, it is shown that an optimal signaling strategy is described by a threshold policy which depends on the common information between the two controllers; this threshold policy is explicitly determined.

Preprint, 2013, arXiv

On The Optimality of Myopic Sensing in Multi-State Channels

We consider the channel sensing problem arising in opportunistic scheduling over fading channels, cognitive radio networks, and resource constrained jamming. The communication system consists of N channels. Each channel is modeled as a multi-state Markov chain (M.C.). At each time instant a user selects one channel to sense and uses it to transmit information. A reward depending on the state of the selected channel is obtained for each transmission. The objective is to design a channel sensing policy that maximizes the expected total reward collected over a finite or infinite horizon. This problem can be viewed as an instance of a restless bandit problem, for which the form of optimal policies is unknown in general. We discover sets of conditions sufficient to guarantee the optimality of a myopic sensing policy; we show that under one particular set of conditions the myopic policy coincides with the Gittins index rule.
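As a rough illustration of what a myopic sensing policy does, the sketch below picks, at each step, the channel with the largest expected immediate reward under the current beliefs. It makes several simplifying assumptions that are not taken from the paper: all channels share one transition matrix `P`, sensing reveals the chosen channel's state exactly, and the names `propagate`, `myopic_choice` and `step` are hypothetical.

```python
def propagate(belief, P):
    """One-step Markov prediction: new[j] = sum_i belief[i] * P[i][j]."""
    n = len(P)
    return [sum(belief[i] * P[i][j] for i in range(n)) for j in range(n)]


def myopic_choice(beliefs, rewards):
    """Index of the channel with the largest expected immediate reward."""
    expected = [sum(b * r for b, r in zip(belief, rewards))
                for belief in beliefs]
    return max(range(len(beliefs)), key=lambda k: expected[k])


def step(beliefs, P, rewards, true_states):
    """Sense the myopic channel, collect its reward, update all beliefs."""
    k = myopic_choice(beliefs, rewards)
    observed = true_states[k]      # assumption: sensing is error-free
    reward = rewards[observed]
    new_beliefs = []
    for index, belief in enumerate(beliefs):
        if index == k:
            # State was observed, so the next belief is that row of P.
            new_beliefs.append(list(P[observed]))
        else:
            # Unobserved channels evolve by the Markov dynamics alone.
            new_beliefs.append(propagate(belief, P))
    return k, reward, new_beliefs
```

The policy is called myopic because `myopic_choice` ignores how the current selection affects future beliefs; the abstract's result concerns conditions under which this greedy rule is nevertheless optimal.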

Preprint, 2012, arXiv

On non-congruent numbers with 1 modulo 4 prime factors

In this paper, we use the 2-descent method to find a series of odd non-congruent numbers $\equiv 1\pmod 8$ whose prime factors are $\equiv 1\pmod 4$ such that the congruent elliptic curves have second lowest Selmer groups, which includes Li and Tian's result (Li and Tian, 2000) as a special case.
