Researcher profile

Yi Ouyang

Yi Ouyang contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2023arXiv

An Approach to Stochastic Dynamic Games with Asymmetric Information and Hidden Actions

We consider in discrete time, a general class of sequential stochastic dynamic games with asymmetric information with the following features. The underlying system has Markovian dynamics controlled by the agents' joint actions. Each agent's instantaneous utility depends on the current system state and the agents' joint actions. At each time instant each agent makes a private noisy observation of the current system state and the agents' actions in the previous time instant. In addition, at each time instant all agents have a common noisy observation of the current system state and their actions in the previous time instant. Each agent's actions are part of his private information. The objective is to determine Bayesian Nash Equilibrium (BNE) strategy profiles that are based on a compressed version of the agents' information and can be sequentially computed; such BNE strategy profiles may not always exist. We present an approach/methodology that achieves the above-stated objective, along with an instance of a game where BNE strategy profiles with the above-mentioned characteristics exist. We show that the methodology also works for the case where the agents have no common observations.

preprint2022arXiv

Grasping Core Rules of Time Series through Pure Models

Time series underwent the transition from statistics to deep learning, as did many other machine learning fields. Although it appears that the accuracy has been increasing as the model is updated in a number of publicly available datasets, it typically only increases the scale by several times in exchange for a slight difference in accuracy. Through this experiment, we point out a different line of thinking, time series, especially long-term forecasting, may differ from other fields. It is not necessary to use extensive and complex models to grasp all aspects of time series, but to use pure models to grasp the core rules of time series changes. With this simple but effective idea, we created PureTS, a network with three pure linear layers that achieved state-of-the-art in 80% of the long sequence prediction tasks while being nearly the lightest model and having the fastest running speed. On this basis, we discuss the potential of pure linear layers in both phenomena and essence. The ability to understand the core law contributes to the high precision of long-distance prediction, and reasonable fluctuation prevents it from distorting the curve in multi-step prediction like mainstream deep learning models, which is summarized as a pure linear neural network that avoids over-fluctuating. Finally, we suggest the fundamental design standards for lightweight long-step time series tasks: input and output should try to have the same dimension, and the structure avoids fragmentation and complex operations.

preprint2022arXiv

On abelian $2$-ramification torsion modules of quadratic fields

For a number field $F$ and a prime number $p$, the $\mathbb{Z}_p$-torsion module of the Galois group of the maximal abelian pro-$p$ extension of $F$ unramified outside $p$ over $F$, denoted as $\mathcal{T}_p(F)$, is an important subject in abelian $p$-ramification theory. In this paper we study the group $\mathcal{T}_2(F)=\mathcal{T}_2(m)$ of the quadratic field $F=\mathbb{Q}(\sqrt{ m})$. Firstly, assuming $m>0$, we prove an explicit $4$-rank formula for $\mathcal{T}_2(-m)$. Furthermore, applying this formula, we obtain the $4$-rank density of $\mathcal{T}_2$-groups of imaginary quadratic fields. Secondly, for $l$ an odd prime, we obtain results about the $2$-divisibility of orders of $\mathcal{T}_2(\pm l)$ and $\mathcal{T}_2(\pm 2l)$. In particular we find that $\#\mathcal{T}_2(l)\equiv 2\# \mathcal{T}_2(2l)\equiv h_2(-2l)\bmod{16}$ if $l\equiv 7\bmod{8}$ where $h_2(-2l)$ is the $2$-class number of $\mathbb{Q}(\sqrt{-2l})$. We then obtain density results for $\mathcal{T}_2(\pm l)$ and $\mathcal{T}_2(\pm 2l)$. Finally, based on our density results and numerical data, we propose distribution conjectures about $\mathcal{T}_p(F)$ when $F$ varies over real or imaginary quadratic fields for any prime $p$, and about $\mathcal{T}_2(\pm l)$ and $\mathcal{T}_2(\pm 2 l)$ when $l$ varies, in the spirit of Cohen-Lenstra heuristics. Our conjecture in the $\mathcal{T}_2(l)$ case is closely connected to Shanks-Sime-Washington's speculation on the distributions of the zeros of $2$-adic $L$-functions and to the distributions of the fundamental units.

preprint2022arXiv

Training a Resilient Q-Network against Observational Interference

Deep reinforcement learning (DRL) has demonstrated impressive performance in various gaming simulators and real-world applications. In practice, however, a DRL agent may receive faulty observation by abrupt interferences such as black-out, frozen-screen, and adversarial perturbation. How to design a resilient DRL algorithm against these rare but mission-critical and safety-crucial scenarios is an essential yet challenging task. In this paper, we consider a deep q-network (DQN) framework training with an auxiliary task of observational interferences such as artificial noises. Inspired by causal inference for observational interference, we propose a causal inference based DQN algorithm called causal inference Q-network (CIQ). We evaluate the performance of CIQ in several benchmark DQN environments with different types of interferences as auxiliary labels. Our experimental results show that the proposed CIQ method could achieve higher performance and more resilience against observational interferences.

preprint2020arXiv

Regret Bounds for Decentralized Learning in Cooperative Multi-Agent Dynamical Systems

Regret analysis is challenging in Multi-Agent Reinforcement Learning (MARL) primarily due to the dynamical environments and the decentralized information among agents. We attempt to solve this challenge in the context of decentralized learning in multi-agent linear-quadratic (LQ) dynamical systems. We begin with a simple setup consisting of two agents and two dynamically decoupled stochastic linear systems, each system controlled by an agent. The systems are coupled through a quadratic cost function. When both systems' dynamics are unknown and there is no communication among the agents, we show that no learning policy can generate sub-linear in $T$ regret, where $T$ is the time horizon. When only one system's dynamics are unknown and there is one-directional communication from the agent controlling the unknown system to the other agent, we propose a MARL algorithm based on the construction of an auxiliary single-agent LQ problem. The auxiliary single-agent problem in the proposed MARL algorithm serves as an implicit coordination mechanism among the two learning agents. This allows the agents to achieve a regret within $O(\sqrt{T})$ of the regret of the auxiliary single-agent problem. Consequently, using existing results for single-agent LQ regret, our algorithm provides a $\tilde{O}(\sqrt{T})$ regret bound. (Here $\tilde{O}(\cdot)$ hides constants and logarithmic factors). Our numerical experiments indicate that this bound is matched in practice. From the two-agent problem, we extend our results to multi-agent LQ systems with certain communication patterns.

preprint2019arXiv

Large-Scale Traffic Signal Offset Optimization

The offset optimization problem seeks to coordinate and synchronize the timing of traffic signals throughout a network in order to enhance traffic flow and reduce stops and delays. Recently, offset optimization was formulated into a continuous optimization problem without integer variables by modeling traffic flow as sinusoidal. In this paper, we present a novel algorithm to solve this new formulation to near-global optimality on a large-scale. Specifically, we solve a convex relaxation of the nonconvex problem using a tree decomposition reduction, and use randomized rounding to recover a near-global solution. We prove that the algorithm always delivers solutions of expected value at least 0.785 times the globally optimal value. Moreover, assuming that the topology of the traffic network is "tree-like", we prove that the algorithm has near-linear time complexity with respect to the number of intersections. These theoretical guarantees are experimentally validated on the Berkeley, Manhattan, and Los Angeles traffic networks. In our numerical results, the empirical time complexity of the algorithm is linear, and the solutions have objectives within 0.99 times the globally optimal value.