Researcher profile

Andrew Patterson

Andrew Patterson contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

Forager: a lightweight testbed for continual learning with partial observability in RL

In continual reinforcement learning (CRL), good performance requires never-ending learning, acting, and exploration in a big, partially observable world. Most CRL experiments have focused on loss of plasticity -- the inability to keep learning -- in one-off experiments where some unobservable non-stationarity is added to classic fully observable MDPs. Further, these experiments rarely consider the role of partial observability and the importance of CRL agents that use memory or recurrence. One potential reason for this focus on mitigating loss of plasticity without considering partial observability is that many partially-observable CRL environments are prohibitively expensive. In this paper, we introduce Forager, a light-weight partially-observable CRL environment with a constant memory footprint. We provide a set of experiments and sample tasks demonstrating that Forager is challenging for current CRL agents and yet also allows for in-depth study of those agents. We demonstrate that agents exhibit loss of plasticity, proposed mitigations can help, but that most useful is to leverage state construction. We conclude with a variant of Forager that generates an unending stream of new tasks to learn that clearly highlights the limitations of current CRL agents.

preprint2022arXiv

A Temporal-Difference Approach to Policy Gradient Estimation

The policy gradient theorem (Sutton et al., 2000) prescribes the usage of a cumulative discounted state distribution under the target policy to approximate the gradient. Most algorithms based on this theorem, in practice, break this assumption, introducing a distribution shift that can cause the convergence to poor solutions. In this paper, we propose a new approach of reconstructing the policy gradient from the start state without requiring a particular sampling strategy. The policy gradient calculation in this form can be simplified in terms of a gradient critic, which can be recursively estimated due to a new Bellman equation of gradients. By using temporal-difference updates of the gradient critic from an off-policy data stream, we develop the first estimator that sidesteps the distribution shift issue in a model-free way. We prove that, under certain realizability conditions, our estimator is unbiased regardless of the sampling strategy. We empirically show that our technique achieves a superior bias-variance trade-off and performance in presence of off-policy samples.

preprint2021arXiv

General Value Function Networks

State construction is important for learning in partially observable environments. A general purpose strategy for state construction is to learn the state update using a Recurrent Neural Network (RNN), which updates the internal state using the current internal state and the most recent observation. This internal state provides a summary of the observed sequence, to facilitate accurate predictions and decision-making. At the same time, specifying and training RNNs is notoriously tricky, particularly as the common strategy to approximate gradients back in time, called truncated Back-prop Through Time (BPTT), can be sensitive to the truncation window. Further, domain-expertise--which can usually help constrain the function class and so improve trainability--can be difficult to incorporate into complex recurrent units used within RNNs. In this work, we explore how to use multi-step predictions to constrain the RNN and incorporate prior knowledge. In particular, we revisit the idea of using predictions to construct state and ask: does constraining (parts of) the state to consist of predictions about the future improve RNN trainability? We formulate a novel RNN architecture, called a General Value Function Network (GVFN), where each internal state component corresponds to a prediction about the future represented as a value function. We first provide an objective for optimizing GVFNs, and derive several algorithms to optimize this objective. We then show that GVFNs are more robust to the truncation level, in many cases only requiring one-step gradient updates.

preprint2020arXiv

$\mathcal{L}_1$-$\mathcal{GP}$: $\mathcal{L}_1$ Adaptive Control with Bayesian Learning

We present $\mathcal{L}_1$-$\mathcal{GP}$, an architecture based on $\mathcal{L}_1$ adaptive control and Gaussian Process Regression (GPR) for safe simultaneous control and learning. On one hand, the $\mathcal{L}_1$ adaptive control provides stability and transient performance guarantees, which allows for GPR to efficiently and safely learn the uncertain dynamics. On the other hand, the learned dynamics can be conveniently incorporated into the $\mathcal{L}_1$ control architecture without sacrificing robustness and tracking performance. Subsequently, the learned dynamics can lead to less conservative designs for performance/robustness tradeoff. We illustrate the efficacy of the proposed architecture via numerical simulations.

preprint2020arXiv

Gradient Temporal-Difference Learning with Regularized Corrections

It is still common to use Q-learning and temporal difference (TD) learning-even though they have divergence issues and sound Gradient TD alternatives exist-because divergence seems rare and they typically perform well. However, recent work with large neural network learning systems reveals that instability is more common than previously thought. Practitioners face a difficult dilemma: choose an easy to use and performant TD method, or a more complex algorithm that is more sound but harder to tune and all but unexplored with non-linear function approximation or control. In this paper, we introduce a new method called TD with Regularized Corrections (TDRC), that attempts to balance ease of use, soundness, and performance. It behaves as well as TD, when TD performs well, but is sound in cases where TD diverges. We empirically investigate TDRC across a range of problems, for both prediction and control, and for both linear and non-linear function approximation, and show, potentially for the first time, that gradient TD methods could be a better alternative to TD and Q-learning.

preprint2020arXiv

Learning Probabilistic Intersection Traffic Models for Trajectory Prediction

Autonomous agents must be able to safely interact with other vehicles to integrate into urban environments. The safety of these agents is dependent on their ability to predict collisions with other vehicles' future trajectories for replanning and collision avoidance. The information needed to predict collisions can be learned from previously observed vehicle trajectories in a specific environment, generating a traffic model. The learned traffic model can then be incorporated as prior knowledge into any trajectory estimation method being used in this environment. This work presents a Gaussian process based probabilistic traffic model that is used to quantify vehicle behaviors in an intersection. The Gaussian process model provides estimates for the average vehicle trajectory, while also capturing the variance between the different paths a vehicle may take in the intersection. The method is demonstrated on a set of time-series position trajectories. These trajectories are reconstructed by removing object recognition errors and missed frames that may occur due to data source processing. To create the intersection traffic model, the reconstructed trajectories are clustered based on their source and destination lanes. For each cluster, a Gaussian process model is created to capture the average behavior and the variance of the cluster. To show the applicability of the Gaussian model, the test trajectories are classified with only partial observations. Performance is quantified by the number of observations required to correctly classify the vehicle trajectory. Both the intersection traffic modeling computations and the classification procedure are timed. These times are presented as results and demonstrate that the model can be constructed in a reasonable amount of time and the classification procedure can be used for online applications.

preprint2020arXiv

Quantum State Discrimination Using Noisy Quantum Neural Networks

Near-term quantum computers are noisy, and therefore must run algorithms with a low circuit depth and qubit count. Here we investigate how noise affects a quantum neural network (QNN) for state discrimination, applicable on near-term quantum devices as it fulfils the above criteria. We find that when simulating gradient calculation on a noisy device, a large number of parameters is disadvantageous. By introducing a new smaller circuit ansatz we overcome this limitation, and find that the QNN performs well at noise levels of current quantum hardware. We also show that networks trained at higher noise levels can still converge to useful parameters. Our findings show that noisy quantum computers can be used in applications for state discrimination and for classifiers of the output of quantum generative adversarial networks.