Source author record

Omer Gottesman

Omer Gottesman appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, cond-mat.mtrl-sci, cond-mat.soft, Artificial Intelligence, cond-mat.stat-mech, physics.comp-ph, Populations and Evolution

Catalog footprint

What is connected

7 works

7 topics

4 close collaborators

Preprint · 2022 · arXiv

A Bayesian Approach to Learning Bandit Structure in Markov Decision Processes

In the reinforcement learning literature, there are many algorithms developed for either Contextual Bandit (CB) or Markov Decision Processes (MDP) environments. However, when deploying reinforcement learning algorithms in the real world, even with domain expertise, it is often difficult to know whether it is appropriate to treat a sequential decision making problem as a CB or an MDP. In other words, do actions affect future states, or only the immediate rewards? Making the wrong assumption regarding the nature of the environment can lead to inefficient learning, or even prevent the algorithm from ever learning an optimal policy, even with infinite data. In this work we develop an online algorithm that uses a Bayesian hypothesis testing approach to learn the nature of the environment. Our algorithm allows practitioners to incorporate prior knowledge about whether the environment is that of a CB or an MDP, and effectively interpolate between classical CB and MDP-based algorithms to mitigate against the effects of misspecifying the environment. We perform simulations and demonstrate that in CB settings our algorithm achieves lower regret than MDP-based algorithms, while in non-bandit MDP settings our algorithm is able to learn the optimal policy, often achieving comparable regret to MDP-based algorithms.
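
The hypothesis test described in this abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a small tabular environment and compares two hypothetical models of the observed next states: a bandit model in which next states are drawn from one shared distribution, and an MDP model with a separate distribution per state-action pair. Both use symmetric Dirichlet priors, and the names `log_marginal`, `posterior_bandit`, `prior_bandit` and `alpha` are illustrative assumptions.

```python
from collections import Counter
from math import exp, lgamma, log


def log_marginal(counts, n_states, alpha=1.0):
    """Log marginal likelihood of an observed sequence of categorical draws
    under a symmetric Dirichlet(alpha) prior (Dirichlet-multinomial)."""
    n = sum(counts.values())
    out = lgamma(n_states * alpha) - lgamma(n_states * alpha + n)
    for k in range(n_states):
        out += lgamma(alpha + counts.get(k, 0)) - lgamma(alpha)
    return out


def posterior_bandit(transitions, n_states, prior_bandit=0.5, alpha=1.0):
    """Posterior probability that next states ignore (state, action).

    transitions: iterable of (state, action, next_state) integer triples.
    """
    pooled = Counter()   # bandit hypothesis: one shared next-state distribution
    per_sa = {}          # MDP hypothesis: one distribution per (state, action)
    for s, a, s_next in transitions:
        pooled[s_next] += 1
        per_sa.setdefault((s, a), Counter())[s_next] += 1

    log_cb = log_marginal(pooled, n_states, alpha)
    log_mdp = sum(log_marginal(c, n_states, alpha) for c in per_sa.values())
    log_odds = log(prior_bandit) - log(1.0 - prior_bandit) + log_cb - log_mdp

    # Numerically stable logistic function of the log posterior odds.
    if log_odds >= 0:
        return 1.0 / (1.0 + exp(-log_odds))
    z = exp(log_odds)
    return z / (1.0 + z)


# Next state unrelated to (state, action): evidence favours the bandit model.
iid = [(s, a, k) for s in (0, 1) for a in (0, 1) for k in (0, 1) for _ in range(10)]
# Next state equals the action taken: evidence favours the MDP model.
dep = [(s, a, a) for s in (0, 1) for a in (0, 1) for _ in range(20)]
print(posterior_bandit(iid, n_states=2), posterior_bandit(dep, n_states=2))
```

The `prior_bandit` argument is where prior knowledge about the environment would enter in this sketch. How the posterior is then used to interpolate between CB and MDP algorithms is not covered here.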

Preprint · 2021 · arXiv

Learning to search efficiently for causally near-optimal treatments

Finding an effective medical treatment often requires a search by trial and error. Making this search more efficient by minimizing the number of unnecessary trials could lower both costs and patient suffering. We formalize this problem as learning a policy for finding a near-optimal treatment in a minimum number of trials using a causal inference framework. We give a model-based dynamic programming algorithm which learns from observational data while being robust to unmeasured confounding. To reduce time complexity, we suggest a greedy algorithm which bounds the near-optimality constraint. The methods are evaluated on synthetic and real-world healthcare data and compared to model-free reinforcement learning. We find that our methods compare favorably to the model-free baseline while offering a more transparent trade-off between search time and treatment efficacy.

Preprint · 2020 · arXiv

Interpretable Off-Policy Evaluation in Reinforcement Learning by Highlighting Influential Transitions

Off-policy evaluation in reinforcement learning offers the chance of using observational data to improve future outcomes in domains such as healthcare and education, but safe deployment in high stakes settings requires ways of assessing its validity. Traditional measures such as confidence intervals may be insufficient due to noise, limited data and confounding. In this paper we develop a method that could serve as a hybrid human-AI system, to enable human experts to analyze the validity of policy evaluation estimates. This is accomplished by highlighting observations in the data whose removal will have a large effect on the OPE estimate, and formulating a set of rules for choosing which ones to present to domain experts for validation. We develop methods to compute exactly the influence functions for fitted Q-evaluation with two different function classes: kernel-based and linear least squares, as well as importance sampling methods. Experiments on medical simulations and real-world intensive care unit data demonstrate that our method can be used to identify limitations in the evaluation process and make evaluation more robust.
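
The idea of flagging observations whose removal changes the estimate most can be sketched in a deliberately simplified form. The example below is not the paper's influence-function method for fitted Q-evaluation. It assumes an ordinary importance sampling estimator and measures influence per trajectory rather than per transition, by brute-force leave-one-out. The function names and the toy numbers are illustrative assumptions.

```python
def is_estimate(weights, returns):
    """Ordinary importance sampling estimate: mean of weight * return."""
    return sum(w * g for w, g in zip(weights, returns)) / len(weights)


def leave_one_out_influence(weights, returns):
    """Change in the estimate caused by removing each trajectory in turn."""
    n = len(weights)
    if n < 2:
        raise ValueError("need at least two trajectories")
    total = sum(w * g for w, g in zip(weights, returns))
    full = total / n
    return [(total - w * g) / (n - 1) - full for w, g in zip(weights, returns)]


def most_influential(weights, returns, top_k=1):
    """Indices of the trajectories with the largest absolute influence."""
    influence = leave_one_out_influence(weights, returns)
    order = sorted(range(len(influence)), key=lambda i: abs(influence[i]), reverse=True)
    return order[:top_k]


# Toy data: the third trajectory carries a large importance weight.
weights = [1.0, 1.0, 10.0]
returns = [1.0, 1.0, 1.0]
print(is_estimate(weights, returns))
print(leave_one_out_influence(weights, returns))
print(most_influential(weights, returns))
```

In this toy case the heavily weighted trajectory dominates the estimate, so it is the one a domain expert would be asked to inspect first.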

Preprint · 2019 · arXiv

Combining Parametric and Nonparametric Models for Off-Policy Evaluation

We consider a model-based approach to perform batch off-policy evaluation in reinforcement learning. Our method takes a mixture-of-experts approach to combine parametric and non-parametric models of the environment such that the final value estimate has the least expected error. We do so by first estimating the local accuracy of each model and then using a planner to select which model to use at every time step as to minimize the return error estimate along entire trajectories. Across a variety of domains, our mixture-based approach outperforms the individual models alone as well as state-of-the-art importance sampling-based estimators.

Preprint · 2016 · arXiv

On the incompressibility of cylindrical origami patterns

The art and science of folding intricate three-dimensional structures out of paper has occupied artists, designers, engineers, and mathematicians for decades, culminating in the design of deployable structures and mechanical metamaterials. Here we investigate the axial compressibility of origami cylinders, i.e., cylindrical structures folded from rectangular sheets of paper. We prove, using geometric arguments, that a general fold pattern only allows for a finite number of isometric cylindrical embeddings. Therefore, compressibility of such structures requires either stretching the material or deforming the folds. Our result considerably restricts the space of constructions that must be searched when designing new types of origami-based rigid-foldable deployable structures and metamaterials.

Preprint · 2013 · arXiv

Furrows in the wake of propagating d-cones

We investigate the formation dynamics of plastic creases in thin elasto-plastic sheets. In contrast to the commonly accepted description of crumpled thin sheets, which asserts that creases form only by elastic interaction between two d-cones, the creases we study in this letter are created by plastic deformations left in the wake of a single propagating d-cone. Upon application of load, a d-cone initially remains stationary and responds by deforming globally. However, above a critical load, the d-cone undergoes a sharpening transition that focuses the stresses at its tip, allowing it to propagate along the sheet, leaving a furrow-like scar in its wake. Our results show that the dynamics of plastic defect creation are important for predicting the final geometry and statistics of a defect network in a crumpled thin sheet.

Preprint · 2012 · arXiv

Multiple extinction routes in stochastic population models

Isolated populations ultimately go extinct because of the intrinsic noise of elementary processes. In multi-population systems extinction of a population may occur via more than one route. We investigate this generic situation in a simple predator-prey (or infected-susceptible) model. The predator and prey populations may coexist for a long time but ultimately both go extinct. In the first extinction route the predators go extinct first, whereas the prey thrive for a long time and then also go extinct. In the second route the prey go extinct first causing a rapid extinction of the predators. Assuming large sub-population sizes in the coexistence state, we compare the probabilities of each of the two extinction routes and predict the most likely path of the sub-populations to extinction. We also suggest an effective three-state master equation for the probabilities to observe the coexistence state, the predator-free state and the empty state.