Source author record

Jason Poulos

Jason Poulos appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Applications, econ.GN, Machine Learning, q-fin.EC, Artificial Intelligence, econ.EM, Methodology, Software Engineering

Catalog footprint

What is connected

6 works

8 topics

4 close collaborators

preprint · 2026 · arXiv

Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces

AI agents may soon become capable of autonomously completing valuable, long-horizon tasks in diverse domains. Current benchmarks either do not measure real-world tasks, or are not sufficiently difficult to meaningfully measure frontier models. To this end, we present Terminal-Bench 2.0: a carefully curated hard benchmark composed of 89 tasks in computer terminal environments inspired by problems from real workflows. Each task features a unique environment, human-written solution, and comprehensive tests for verification. We show that frontier models and agents score less than 65% on the benchmark and conduct an error analysis to identify areas for model and agent improvement. We publish the dataset and evaluation harness to assist developers and researchers in future work at https://www.tbench.ai/ .

preprint · 2023 · arXiv

Gender gaps in frontier entrepreneurship? Evidence from 1901 Oklahoma land lottery winners

The paper investigates gender differences in entrepreneurship by exploiting a large-scale land lottery in Oklahoma at the turn of the 20th century. Lottery winners claimed land in the order in which their names were drawn, so the draw number is an approximate rank ordering of lottery wealth. This mechanism allows for the estimation of a dose-response function, which relates each draw number to the expected outcome under each draw. I estimate dose-response functions on a linked dataset of lottery winners and land patent records, and find the probability of purchasing land from the government to be decreasing as a function of lottery wealth, which is evidence for the presence of liquidity constraints. I find female winners were more effective in leveraging lottery wealth to purchase additional land, as evidenced by significantly higher median dose-responses compared to those of male winners. For a sample of winners linked to the 1910 Census, I find that male winners have higher median dose-responses compared to female winners in terms of farm or home ownership. These results suggest that liquidity constraints may have been more binding for female entrepreneurs in the market economy.

preprint · 2022 · arXiv

Are deep learning models superior for missing data imputation in large surveys? Evidence from an empirical comparison

Multiple imputation (MI) is a popular approach for dealing with missing data arising from non-response in sample surveys. Multiple imputation by chained equations (MICE) is one of the most widely used MI algorithms for multivariate data, but it lacks theoretical foundation and is computationally intensive. Recently, missing data imputation methods based on deep learning models have been developed with encouraging results in small studies. However, there has been limited research on evaluating their performance in realistic settings compared to MICE, particularly in big surveys. We conduct extensive simulation studies based on a subsample of the American Community Survey to compare the repeated sampling properties of four machine learning based MI methods: MICE with classification trees, MICE with random forests, generative adversarial imputation networks, and multiple imputation using denoising autoencoders. We find the deep learning imputation methods are superior to MICE in terms of computational time. However, with the default choice of hyperparameters in the common software packages, MICE with classification trees consistently outperforms, often by a large margin, the deep learning imputation methods in terms of bias, mean squared error, and coverage under a range of realistic settings.
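
The three evaluation criteria named in this abstract (bias, mean squared error, and coverage over repeated samples) can be illustrated with a short sketch. It is not the authors' simulation code: the function name `repeated_sampling_summary` and the toy numbers are hypothetical, and it assumes each simulated replicate yields one point estimate and one confidence interval for a quantity whose true value is known.

```python
from statistics import mean


def repeated_sampling_summary(estimates, intervals, truth):
    """Summarize an estimator over repeated samples (hypothetical helper).

    estimates: one point estimate per simulated replicate.
    intervals: one (lower, upper) confidence interval per replicate.
    truth:     the known population value being estimated.
    """
    bias = mean(estimates) - truth
    mse = mean((e - truth) ** 2 for e in estimates)
    # Coverage: share of intervals that contain the true value.
    coverage = mean(1 if lo <= truth <= hi else 0 for lo, hi in intervals)
    return {"bias": bias, "mse": mse, "coverage": coverage}


# Toy replicates, made up for illustration only.
summary = repeated_sampling_summary(
    estimates=[9, 11, 10, 12],
    intervals=[(8, 10), (10, 12), (9, 11), (11, 13)],
    truth=10,
)
print(summary)
```

In a comparison of imputation methods, the same summary would be computed once per method on the same set of replicates, so that differences reflect the method rather than the samples drawn.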

preprint · 2021 · arXiv

Amnesty Policy and Elite Persistence in the Postbellum South: Evidence from a Regression Discontinuity Design

This paper investigates the impact of Reconstruction-era amnesty policy on the officeholding and wealth of elites in the postbellum South. Amnesty policy restricted the political and economic rights of Southern elites for nearly three years during Reconstruction. I estimate the effect of being excluded from amnesty on elites' future wealth and political power using a regression discontinuity design that compares individuals just above and below a wealth threshold that determined exclusion from amnesty. Results on a sample of Reconstruction convention delegates show that exclusion from amnesty significantly decreased the likelihood of ex-post officeholding. I find no evidence that exclusion impacted later census wealth for Reconstruction delegates or for a larger sample of known slaveholders who lived in the South in 1860. These findings are in line with previous studies evidencing both changes to the identity of the political elite, and the continuity of economic mobility among the planter elite across the Civil War and Reconstruction.
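
The comparison of individuals just above and below a threshold can be sketched as a local linear regression discontinuity estimate. This is a generic textbook illustration, not the paper's specification: the function names, the cutoff, the bandwidth, and the toy data are all hypothetical, and it assumes a sharp design with a uniform kernel.

```python
def _fitted_value_at_cutoff(points, cutoff):
    """OLS of y on (x - cutoff); the intercept is the fitted value at the cutoff."""
    xs = [x - cutoff for x, _ in points]
    ys = [y for _, y in points]
    n = len(points)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        return y_bar  # no variation in x: fall back to the mean outcome
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    return y_bar - slope * x_bar


def rd_estimate(running, outcome, cutoff, bandwidth):
    """Jump in the outcome at the cutoff, using points within the bandwidth."""
    data = list(zip(running, outcome))
    below = [(x, y) for x, y in data if cutoff - bandwidth <= x < cutoff]
    above = [(x, y) for x, y in data if cutoff <= x <= cutoff + bandwidth]
    if not below or not above:
        raise ValueError("need observations on both sides of the cutoff")
    return _fitted_value_at_cutoff(above, cutoff) - _fitted_value_at_cutoff(below, cutoff)


# Toy data: a linear trend with a drop of 2 at a hypothetical cutoff of 10.
x = list(range(20))
y = [1 + 0.5 * xi - (2 if xi >= 10 else 0) for xi in x]
print(rd_estimate(x, y, cutoff=10, bandwidth=5))
```

Fitting a separate line on each side lets the outcome trend with the running variable, so the estimate isolates the jump at the threshold rather than the overall slope.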

preprint · 2020 · arXiv

Estimating population average treatment effects from experiments with noncompliance

Randomized control trials (RCTs) are the gold standard for estimating causal effects, but often use samples that are non-representative of the actual population of interest. We propose a reweighting method for estimating population average treatment effects in settings with noncompliance. Simulations show the proposed compliance-adjusted population estimator outperforms its unadjusted counterpart when compliance is relatively low and can be predicted by observed covariates. We apply the method to evaluate the effect of Medicaid coverage on health care use for a target population of adults who may benefit from expansions to the Medicaid program. We draw RCT data from the Oregon Health Insurance Experiment, where less than one-third of those randomly selected to receive Medicaid benefits actually enrolled.
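
The idea of reweighting experimental estimates toward a target population under noncompliance can be sketched in a deliberately simplified form. This is not the estimator proposed in the paper: it assumes a single discrete covariate (a stratum), computes a Wald ratio (intent-to-treat effect divided by the difference in uptake) within each stratum, and averages those ratios with population stratum shares. All names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def wald_by_stratum(records):
    """Return {stratum: ITT effect / uptake difference} from RCT records.

    Each record is (stratum, assigned, received, outcome), with 0/1 flags
    for assignment and for actually receiving treatment.
    """
    groups = defaultdict(lambda: {0: [], 1: []})
    for stratum, assigned, received, outcome in records:
        groups[stratum][assigned].append((received, outcome))
    effects = {}
    for stratum, arms in groups.items():
        itt = mean(y for _, y in arms[1]) - mean(y for _, y in arms[0])
        uptake = mean(d for d, _ in arms[1]) - mean(d for d, _ in arms[0])
        effects[stratum] = itt / uptake
    return effects


def reweight(effects, shares):
    """Average stratum effects using (possibly unnormalized) stratum shares."""
    total = sum(shares.values())
    return sum(effects[s] * w / total for s, w in shares.items())


# Toy RCT: half of stratum "a" and a quarter of stratum "b" take up treatment.
rct = (
    [("a", 1, 1, 3)] * 2 + [("a", 1, 0, 1)] * 2 + [("a", 0, 0, 1)] * 4
    + [("b", 1, 1, 5)] * 1 + [("b", 1, 0, 1)] * 3 + [("b", 0, 0, 1)] * 4
)
effects = wald_by_stratum(rct)
print(reweight(effects, {"a": 0.5, "b": 0.5}))    # weights matching the RCT sample
print(reweight(effects, {"a": 0.25, "b": 0.75}))  # weights matching a target population
```

The two printed values differ only because the stratum weights differ, which is the sense in which a non-representative sample can misstate a population effect.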

preprint · 2018 · arXiv

Missing Data Imputation for Supervised Learning

Missing data imputation can help improve the performance of prediction models in situations where missing data hide useful information. This paper compares methods for imputing missing categorical data for supervised classification tasks. We experiment on two machine learning benchmark datasets with missing categorical data, comparing classifiers trained on non-imputed (i.e., one-hot encoded) or imputed data with different levels of additional missing-data perturbation. We show imputation methods can increase predictive accuracy in the presence of missing-data perturbation, which can actually improve prediction accuracy by regularizing the classifier. We achieve the state-of-the-art on the Adult dataset with missing-data perturbation and k-nearest-neighbors (k-NN) imputation.
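
A minimal sketch of k-NN imputation for categorical data follows. It is an illustration under stated assumptions rather than the paper's implementation: missing values are encoded as `None`, distance is the number of mismatching columns (a missing value counts as a mismatch), and the imputed value is the majority category among the `k` nearest donor rows. The function names and the toy rows are hypothetical.

```python
from collections import Counter


def mismatch_distance(a, b, skip):
    """Count columns other than `skip` where the rows differ or a value is missing."""
    return sum(
        1
        for col, (u, v) in enumerate(zip(a, b))
        if col != skip and (u is None or v is None or u != v)
    )


def knn_impute(rows, k=3):
    """Fill None entries with the majority category among the k nearest donors."""
    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows in which column j is observed.
            donors = [r for m, r in enumerate(rows) if m != i and r[j] is not None]
            if not donors:
                continue  # nothing to borrow from; leave the gap
            donors.sort(key=lambda r: mismatch_distance(row, r, j))
            votes = Counter(r[j] for r in donors[:k])
            imputed[i][j] = votes.most_common(1)[0][0]
    return imputed


# Toy categorical rows; the last one is missing its third column.
rows = [
    ("a", "x", "hi"),
    ("a", "x", "hi"),
    ("a", "y", "hi"),
    ("b", "y", "lo"),
    ("b", "y", "lo"),
    ("a", "x", None),
]
print(knn_impute(rows, k=3)[-1])
```

Distances are computed on the original rows rather than on partially imputed ones, so the result does not depend on the order in which gaps are filled.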