Researcher profile

Jeff Schneider

Jeff Schneider contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
20 works
0 followers
10 topics
4 close collaborators

Published work

20 published items

preprint | 2026 | arXiv

Offline Reinforcement Learning for Rotation Profile Control in Tokamaks

Tokamaks remain leading candidates for achieving practical fusion energy, yet many important control problems inside these devices are still difficult or unsolved. One such challenge is controlling the plasma rotation profile, which strongly influences stability, confinement, and transport. While the average rotation can be controlled, controlling the full profile is challenging due to its high dimensionality, its response to multiple actuators, and its dependence on plasma conditions. Learning-based control methods, such as reinforcement learning (RL), provide a potential solution to this challenging problem, with the ability to model complex interactions leading to effective multi-input multi-output control. However, learning such policies is challenging due to the lack of accurate simulators that can model the rotation profile dynamics. In this work, we investigate the use of offline RL and offline model-based RL algorithms for rotation profile control, training them solely on historical data from the DIII-D tokamak. Our final method uses probabilistic models of plasma dynamics to generate rollouts for RL training. We deploy this policy on the DIII-D tokamak and observe promising real-world results. We conclude by highlighting key challenges and insights from training and deploying an RL policy on a complex physical device while using only limited past data.

preprint | 2022 | arXiv

Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning

Impressive results in natural language processing (NLP) based on the Transformer neural network architecture have inspired researchers to explore viewing offline reinforcement learning (RL) as a generic sequence modeling problem. Recent works based on this paradigm have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, because these methods jointly model the states and actions as a single sequencing problem, they struggle to disentangle the effects of the policy and world dynamics on the return. Thus, in adversarial or stochastic environments, these methods lead to overly optimistic behavior that can be dangerous in safety-critical systems like autonomous driving. In this work, we propose a method that addresses this optimism bias by explicitly disentangling the policy and world models, which allows us at test time to search for policies that are robust to multiple possible futures in the environment. We demonstrate our method's superior performance on a variety of autonomous driving tasks in simulation.

preprint | 2022 | arXiv

An Experimental Design Perspective on Model-Based Reinforcement Learning

In many practical applications of RL, it is expensive to observe state transitions from the environment. For example, in the problem of plasma control for nuclear fusion, computing the next state for a given state-action pair requires querying an expensive transition function which can lead to many hours of computer simulation or dollars of scientific research. Such expensive data collection prohibits application of standard RL algorithms which usually require a large number of observations to learn. In this work, we address the problem of efficiently learning a policy while making a minimal number of state-action queries to the transition function. In particular, we leverage ideas from Bayesian optimal experimental design to guide the selection of state-action queries for efficient learning. We propose an acquisition function that quantifies how much information a state-action pair would provide about the optimal solution to a Markov decision process. At each iteration, our algorithm maximizes this acquisition function, to choose the most informative state-action pair to be queried, thus yielding a data-efficient RL approach. We experiment with a variety of simulated continuous control problems and show that our approach learns an optimal policy with up to $5$ -- $1,000\times$ less data than model-based RL baselines and $10^3$ -- $10^5\times$ less data than model-free RL baselines. We also provide several ablated comparisons which point to substantial improvements arising from the principled method of obtaining data.

preprint | 2022 | arXiv

BATS: Best Action Trajectory Stitching

The problem of offline reinforcement learning focuses on learning a good policy from a log of environment interactions. Past efforts for developing algorithms in this area have revolved around introducing constraints to online reinforcement learning algorithms to ensure the actions of the learned policy are constrained to the logged data. In this work, we explore an alternative approach by planning on the fixed dataset directly. Specifically, we introduce an algorithm which forms a tabular Markov Decision Process (MDP) over the logged data by adding new transitions to the dataset. We do this by using learned dynamics models to plan short trajectories between states. Since exact value iteration can be performed on this constructed MDP, it becomes easy to identify which trajectories are advantageous to add to the MDP. Crucially, since most transitions in this MDP come from the logged data, trajectories from the MDP can be rolled out for long periods with confidence. We prove that this property allows one to make upper and lower bounds on the value function up to appropriate distance metrics. Finally, we demonstrate empirically how algorithms that uniformly constrain the learned policy to the entire dataset can result in unwanted behavior, and we show an example in which simply behavior cloning the optimal policy of the MDP created by our algorithm avoids this problem.
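
The idea of running exact value iteration on a small tabular MDP built from logged transitions can be sketched as follows. This is a minimal illustration, not the BATS algorithm itself: the states, actions, rewards, discount factor, and the single "stitched" transition (`s1` to `s3`) are all hypothetical and chosen only to show how one added transition can change the optimal values.

```python
# Minimal value-iteration sketch on a hand-made tabular MDP.
# All states, actions, rewards, and the discount are hypothetical;
# transitions are deterministic to keep the example short.

def value_iteration(transitions, gamma=0.9, tol=1e-10):
    """transitions[state][action] = (next_state, reward)."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state: value stays at 0
                continue
            best = max(r + gamma * values[s2] for s2, r in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(transitions, values, gamma=0.9):
    """Pick, in every non-terminal state, the action with the best backup."""
    policy = {}
    for s, actions in transitions.items():
        if actions:
            policy[s] = max(
                actions,
                key=lambda a: actions[a][1] + gamma * values[actions[a][0]],
            )
    return policy


# "step" edges stand in for logged transitions; "jump" stands in for a
# short planned trajectory added between two logged states.
mdp = {
    "s0": {"step": ("s1", 0.0)},
    "s1": {"step": ("s2", 0.0), "jump": ("s3", 0.0)},
    "s2": {"step": ("goal", 1.0)},
    "s3": {"step": ("goal", 5.0)},
    "goal": {},
}

values = value_iteration(mdp)
policy = greedy_policy(mdp, values)
print(values["s0"], policy["s1"])
```

In this toy graph the added `jump` edge lets the greedy policy reach the higher-reward branch, which is the kind of advantage that value iteration makes easy to detect.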

preprint | 2022 | arXiv

How Useful are Gradients for OOD Detection Really?

One critical challenge in deploying highly performant machine learning models in real-life applications is out-of-distribution (OOD) detection. Given a predictive model which is accurate on in-distribution (ID) data, an OOD detection system will further equip the model with the option to defer prediction when the input is novel and the model has little confidence in its prediction. There has been some recent interest in utilizing the gradient information in pre-trained models for OOD detection. While these methods have shown competitive performance, there are misconceptions about the true mechanism underlying them, which conflate their performance with the necessity of gradients. In this work, we provide an in-depth analysis and comparison of gradient-based methods and elucidate the key components that warrant their OOD detection performance. We further propose a general, non-gradient-based method of OOD detection which improves over previous baselines in both performance and computational efficiency.

preprint | 2022 | arXiv

Robust Reinforcement Learning via Genetic Curriculum

Achieving robust performance is crucial when applying deep reinforcement learning (RL) in safety-critical systems. Some state-of-the-art approaches try to address the problem with adversarial agents, but these agents often require expert supervision to fine-tune and to prevent the adversary from becoming too challenging for the trainee agent. While other approaches involve automatically adjusting environment setups during training, they have been limited to simple environments where low-dimensional encodings can be used. Inspired by these approaches, we propose genetic curriculum, an algorithm that automatically identifies scenarios in which the agent currently fails and generates an associated curriculum to help the agent learn to solve the scenarios and acquire more robust behaviors. As a non-parametric optimizer, our approach uses a raw, non-fixed encoding of scenarios, reducing the need for expert supervision and allowing our algorithm to adapt to the changing performance of the agent. Our empirical studies show improvement in robustness over the existing state-of-the-art algorithms, providing training curricula that result in agents being 2-8x less likely to fail without sacrificing cumulative reward. We include an ablation study and share insights on why our algorithm outperforms prior approaches.

preprint | 2022 | arXiv

UGV-UAV Object Geolocation in Unstructured Environments

A robotic system of multiple unmanned ground vehicles (UGVs) and unmanned aerial vehicles (UAVs) has the potential for advancing autonomous object geolocation performance. Much research has focused on algorithmic improvements on individual components, such as navigation, motion planning, and perception. In this paper, we present a UGV-UAV object detection and geolocation system, which performs perception, navigation, and planning autonomously at real scale in unstructured environments. We designed novel sensor pods equipped with a multispectral (visible, near-infrared, thermal), high-resolution (181.6 megapixels), stereo (near-infrared pair), wide field of view (192 degree HFOV) array. We developed a novel on-board software-hardware architecture to process the high-volume sensor data in real time, and we built a custom AI subsystem composed of detection, tracking, navigation, and planning for autonomous object geolocation in real time. This research is the first real-scale demonstration of such high-speed data processing capability. Our novel modular sensor pod can boost relevant computer vision and machine learning research. Our novel hardware-software architecture is a solid foundation for system-level and component-level research. Our system is validated through data-driven offline tests as well as a series of field tests in unstructured environments. We present quantitative results as well as discussions on key robotic system-level challenges which manifest when we build and test the system. This system is the first step toward a UGV-UAV cooperative reconnaissance system in the future.

preprint | 2021 | arXiv

A Machine Learning Approach for Dynamical Mass Measurements of Galaxy Clusters

We present a modern machine learning approach for cluster dynamical mass measurements that is a factor of two improvement over using a conventional scaling relation. Different methods are tested against a mock cluster catalog constructed using halos with mass $\geq 10^{14}\,M_\odot h^{-1}$ from Multidark's publicly-available N-body MDPL halo catalog. In the conventional method, we use a standard $M(\sigma_v)$ power law scaling relation to infer cluster mass, $M$, from line-of-sight (LOS) galaxy velocity dispersion, $\sigma_v$. The resulting fractional mass error distribution is broad, with width = 0.87 (68% scatter), and has extended high-error tails. The standard scaling relation can be simply enhanced by including higher-order moments of the LOS velocity distribution. Applying the kurtosis as a correction term to $\log(\sigma_v)$ reduces the width of the error distribution to 0.74 (16% improvement). Machine learning can be used to take full advantage of all the information in the velocity distribution. We employ the Support Distribution Machines (SDMs) algorithm that learns from distributions of data to predict single values. SDMs trained and tested on the distribution of LOS velocities yield width = 0.46 (47% improvement). Furthermore, the problematic tails of the mass error distribution are effectively eliminated. Decreasing cluster mass errors will improve measurements of the growth of structure and lead to tighter constraints on cosmological parameters.
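
A power-law scaling relation of the kind mentioned above is commonly fitted as a straight line in log space. The sketch below shows this for a relation of the assumed form M = A * sigma_v^alpha, using ordinary least squares. The amplitude, exponent, and velocity dispersions are hypothetical synthetic values, not figures from the paper, and the kurtosis correction and SDM method are not reproduced here.

```python
import math


def fit_power_law(sigmas, masses):
    """Least-squares fit of log10(M) = log10(A) + alpha * log10(sigma_v)."""
    xs = [math.log10(s) for s in sigmas]
    ys = [math.log10(m) for m in masses]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    log_a = y_mean - alpha * x_mean
    return 10 ** log_a, alpha


def predict_mass(sigma_v, amplitude, alpha):
    """Infer a mass from a velocity dispersion with the fitted relation."""
    return amplitude * sigma_v ** alpha


# Synthetic, noise-free data generated from hypothetical parameters.
true_a, true_alpha = 1.0e6, 3.0
sigmas = [400.0, 600.0, 800.0, 1000.0, 1200.0]
masses = [true_a * s ** true_alpha for s in sigmas]

amplitude, alpha = fit_power_law(sigmas, masses)
print(amplitude, alpha)
```

With noise-free inputs the fit recovers the generating parameters up to floating-point error; on real catalogs the scatter around this line is what produces the broad mass error distribution described in the abstract.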

preprint | 2021 | arXiv

Affordance-based Reinforcement Learning for Urban Driving

Traditional autonomous vehicle pipelines that follow a modular approach have been very successful in the past both in academia and industry, which has led to autonomy deployed on road. Though this approach provides ease of interpretation, its generalizability to unseen environments is limited and hand-engineering of numerous parameters is required, especially in the prediction and planning systems. Recently, deep reinforcement learning has been shown to learn complex strategic games and perform challenging robotic tasks, which provides an appealing framework for learning to drive. In this work, we propose a deep reinforcement learning framework to learn an optimal control policy using waypoints and low-dimensional visual representations, also known as affordances. We demonstrate that our agents, when trained from scratch, learn the tasks of lane-following, driving around intersections, and stopping in front of other actors or traffic lights, even in the dense traffic setting. We note that our method achieves comparable or better performance than the baseline methods on the original and NoCrash benchmarks on the CARLA simulator.

preprint | 2021 | arXiv

Deep Mean Maps

The use of distributions and high-level features from deep architectures has become commonplace in modern computer vision. Both of these methodologies have separately achieved a great deal of success in many computer vision tasks. However, there has been little work attempting to leverage the power of these two methodologies jointly. To this end, this paper presents the Deep Mean Maps (DMMs) framework, a novel family of methods to non-parametrically represent distributions of features in convolutional neural network models. DMMs are able both to classify images using the distribution of top-level features and to tune the top-level features for performing this task. We show how to implement DMMs using a special mean map layer composed of typical CNN operations, making both forward and backward propagation simple. We illustrate the efficacy of DMMs at analyzing distributional patterns in image data in a synthetic data experiment. We also show that extending existing deep architectures with DMMs improves the performance of existing CNNs on several challenging real-world datasets.

preprint | 2021 | arXiv

Kernels on Sample Sets via Nonparametric Divergence Estimates

Most machine learning algorithms, such as classification or regression, treat the individual data point as the object of interest. Here we consider extending machine learning algorithms to operate on groups of data points. We suggest treating a group of data points as an i.i.d. sample set from an underlying feature distribution for that group. Our approach employs kernel machines with a kernel on i.i.d. sample sets of vectors. We define certain kernel functions on pairs of distributions, and then use a nonparametric estimator to consistently estimate those functions based on sample sets. The projection of the estimated Gram matrix to the cone of symmetric positive semi-definite matrices enables us to use kernel machines for classification, regression, anomaly detection, and low-dimensional embedding in the space of distributions. We present several numerical experiments both on real and simulated datasets to demonstrate the advantages of our new approach.
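
The projection step mentioned above can be written out explicitly. The abstract does not state which norm is used, so the following assumes the standard Frobenius-norm projection of a symmetric estimated Gram matrix $\hat{K}$, which amounts to clipping negative eigenvalues at zero; the symbols $\hat{K}$, $U$, $\lambda_i$, and $\hat{K}_{+}$ are notation introduced here for illustration.

```latex
% Eigendecomposition of the symmetric estimated Gram matrix
\hat{K} = U \,\operatorname{diag}(\lambda_1, \dots, \lambda_n)\, U^{\top}

% Nearest symmetric positive semi-definite matrix in Frobenius norm:
% keep the eigenvectors, set negative eigenvalues to zero
\hat{K}_{+} = U \,\operatorname{diag}\bigl(\max(\lambda_1, 0), \dots, \max(\lambda_n, 0)\bigr)\, U^{\top}
```

If the estimated matrix is not exactly symmetric, it would first be symmetrized as $(\hat{K} + \hat{K}^{\top})/2$ before the eigendecomposition.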

preprint | 2021 | arXiv

Linear-time Learning on Distributions with Approximate Kernel Embeddings

Many interesting machine learning problems are best posed by considering instances that are distributions, or sample sets drawn from distributions. Previous work devoted to machine learning tasks with distributional inputs has done so through pairwise kernel evaluations between pdfs (or sample sets). While such an approach is fine for smaller datasets, the computation of an $N \times N$ Gram matrix is prohibitive in large datasets. Recent scalable estimators that work over pdfs have done so only with kernels that use Euclidean metrics, like the $L_2$ distance. However, there are a myriad of other useful metrics available, such as total variation, Hellinger distance, and the Jensen-Shannon divergence. This work develops the first random features for pdfs whose dot product approximates kernels using these non-Euclidean metrics, allowing estimators using such kernels to scale to large datasets by working in a primal space, without computing large Gram matrices. We provide an analysis of the approximation error in using our proposed random features and show empirically the quality of our approximation both in estimating a Gram matrix and in solving learning tasks in real-world and synthetic data.
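
The general idea of random features, where a dot product in an explicit feature space approximates a kernel so that no Gram matrix is needed, can be illustrated with the classic random Fourier features for a Gaussian kernel on ordinary vectors. This is a swapped-in, simpler technique and not the paper's features for pdfs with non-Euclidean metrics; the function names, bandwidth, feature count, and test points are assumptions made for this sketch.

```python
import math
import random


def make_random_features(dim, num_features, bandwidth, seed=0):
    """Build a feature map z with z(x) . z(y) ~ exp(-||x - y||^2 / (2 * bandwidth^2))."""
    rng = random.Random(seed)
    # Frequencies are drawn from the Gaussian kernel's spectral density.
    weights = [
        [rng.gauss(0.0, 1.0 / bandwidth) for _ in range(dim)]
        for _ in range(num_features)
    ]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def featurize(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(weights, offsets)
        ]

    return featurize


def gaussian_kernel(x, y, bandwidth):
    """Exact Gaussian kernel, used here only to check the approximation."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


featurize = make_random_features(dim=2, num_features=4000, bandwidth=1.0)
x, y = [0.3, -0.5], [1.0, 0.2]

approx = sum(p * q for p, q in zip(featurize(x), featurize(y)))
exact = gaussian_kernel(x, y, bandwidth=1.0)
print(approx, exact)
```

Because each input is mapped to a fixed-length vector, a linear model trained on these features scales with the number of features rather than with the square of the dataset size.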

preprint | 2021 | arXiv

Neural Dynamical Systems: Balancing Structure and Flexibility in Physical Prediction

We introduce Neural Dynamical Systems (NDS), a method of learning dynamical models in various gray-box settings which incorporates prior knowledge in the form of systems of ordinary differential equations. NDS uses neural networks to estimate free parameters of the system, predicts residual terms, and numerically integrates over time to predict future states. A key insight is that many real dynamical systems of interest are hard to model because the dynamics may vary across rollouts. We mitigate this problem by taking a trajectory of prior states as the input to NDS and train it to dynamically estimate system parameters using the preceding trajectory. We find that NDS learns dynamics with higher accuracy and fewer samples than a variety of deep learning methods that do not incorporate the prior knowledge and methods from the system identification literature which do. We demonstrate these advantages first on synthetic dynamical systems and then on real data captured from deuterium shots from a nuclear fusion reactor. Finally, we demonstrate that these benefits can be utilized for control in small-scale experiments.

preprint | 2020 | arXiv

Asynchronous Multi Agent Active Search

Active search refers to the problem of efficiently locating targets in an unknown environment by actively making data-collection decisions, and has many applications including detecting gas leaks, radiation sources or human survivors of disasters using aerial and/or ground robots (agents). Existing active search methods are in general only amenable to a single agent or, if they extend to multiple agents, they require a central control system to coordinate the actions of all agents. However, such control systems are often impractical in robotics applications. In this paper, we propose two distinct active search algorithms called SPATS (Sparse Parallel Asynchronous Thompson Sampling) and LATSI (LAplace Thompson Sampling with Information gain) that allow for multiple agents to independently make data-collection decisions without a central coordinator. Throughout, we assume that targets are sparsely located around the environment, in keeping with compressive sensing assumptions and their applicability in real-world scenarios. Additionally, while most common search algorithms assume that agents can sense the entire environment (e.g. compressive sensing) or sense point-wise (e.g. Bayesian Optimization) at all times, we make a realistic assumption that each agent can only sense a contiguous region of space at a time. We provide simulation results as well as theoretical analysis to demonstrate the efficacy of our proposed algorithms.

preprint | 2020 | arXiv

Interactive Visualization for Debugging RL

Visualization tools for supervised learning allow users to interpret, introspect, and gain an intuition for the successes and failures of their models. While reinforcement learning practitioners ask many of the same questions, existing tools are not applicable to the RL setting as these tools address challenges typically found in the supervised learning regime. In this work, we design and implement an interactive visualization tool for debugging and interpreting RL algorithms. Our system addresses several gaps in previous tools: (1) tools for supervised learning are often not interactive; (2) when debugging RL policies, researchers use state representations that differ from those seen by the agent; and (3) there is a need for a framework designed to make debugging RL policies easier. We provide an example workflow of how this system could be used, along with ideas for future extensions.

preprint | 2020 | arXiv

Offline Contextual Bayesian Optimization for Nuclear Fusion

Nuclear fusion is regarded as the energy of the future since it presents the possibility of unlimited clean energy. One obstacle in utilizing fusion as a feasible energy source is the stability of the reaction. Ideally, one would have a controller for the reactor that takes actions in response to the current state of the plasma in order to prolong the reaction as long as possible. In this work, we take preliminary steps toward learning such a controller. Since learning on a real-world reactor is infeasible, we tackle this problem by attempting to learn optimal controls offline via a simulator, where the state of the plasma can be explicitly set. In particular, we introduce a theoretically grounded Bayesian optimization algorithm that recommends a state and action pair to evaluate at every iteration and show that this results in more efficient use of the simulator.

preprint | 2020 | arXiv

Predicting Motion of Vulnerable Road Users using High-Definition Maps and Efficient ConvNets

Following detection and tracking of traffic actors, prediction of their future motion is the next critical component of a self-driving vehicle (SDV) technology, allowing the SDV to operate safely and efficiently in its environment. This is particularly important when it comes to vulnerable road users (VRUs), such as pedestrians and bicyclists. These actors need to be handled with special care due to an increased risk of injury, as well as the fact that their behavior is less predictable than that of motorized actors. To address this issue, in the current study we present a deep learning-based method for predicting VRU movement, where we rasterize high-definition maps and actor's surroundings into a bird's-eye view image used as an input to deep convolutional networks. In addition, we propose a fast architecture suitable for real-time inference, and perform an ablation study of various rasterization approaches to find the optimal choice for accurate prediction. The results strongly indicate benefits of using the proposed approach for motion prediction of VRUs, both in terms of accuracy and latency.

preprint | 2020 | arXiv

Tuning Hyperparameters without Grad Students: Scalable and Robust Bayesian Optimisation with Dragonfly

Bayesian Optimisation (BO) refers to a suite of techniques for global optimisation of expensive black box functions, which use introspective Bayesian models of the function to efficiently search for the optimum. While BO has been applied successfully in many applications, modern optimisation tasks usher in new challenges where conventional methods fail spectacularly. In this work, we present Dragonfly, an open source Python library for scalable and robust BO. Dragonfly incorporates multiple recently developed methods that allow BO to be applied in challenging real world settings; these include better methods for handling higher dimensional domains, methods for handling multi-fidelity evaluations when cheap approximations of an expensive function are available, methods for optimising over structured combinatorial spaces, such as the space of neural network architectures, and methods for handling parallel evaluations. Additionally, we develop new methodological improvements in BO for selecting the Bayesian model, selecting the acquisition function, and optimising over complex domains with different variable types and additional constraints. We compare Dragonfly to a suite of other packages and algorithms for global optimisation and demonstrate that when the above methods are integrated, they enable significant improvements in the performance of BO. The Dragonfly library is available at dragonfly.github.io.

preprint | 2020 | arXiv

Uncertainty-aware Short-term Motion Prediction of Traffic Actors for Autonomous Driving

We address one of the crucial aspects necessary for safe and efficient operations of autonomous vehicles, namely predicting the future state of traffic actors in the autonomous vehicle's surroundings. We introduce a deep learning-based approach that takes into account the current world state and produces raster images of each actor's vicinity. The rasters are then used as inputs to deep convolutional models to infer future movement of actors while also accounting for and capturing the inherent uncertainty of the prediction task. Extensive experiments on real-world data strongly suggest benefits of the proposed approach. Moreover, following completion of the offline tests, the system was successfully tested onboard self-driving vehicles.

preprint | 2020 | arXiv

Vizarel: A System to Help Better Understand RL Agents

Visualization tools for supervised learning have allowed users to interpret, introspect, and gain intuition for the successes and failures of their models. While reinforcement learning practitioners ask many of the same questions, existing tools are not applicable to the RL setting. In this work, we describe our initial attempt at constructing a prototype of these ideas, through identifying possible features that such a system should encapsulate. Our design is motivated by envisioning the system to be a platform on which to experiment with interpretable reinforcement learning.