Researcher profile

David Holtz

David Holtz contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust: 17 (Unverified)
Verification: L1
Unclaimed author
4 works
0 followers
8 topics
4 close collaborators

Published work

4 published items

Preprint, 2026, arXiv

Prompt Adaptation as a Dynamic Complement in Generative AI Systems

As generative AI systems rapidly improve, a key question emerges: how do users adapt to these changes, and when does such adaptation matter for realizing performance gains? Drawing on theories of dynamic capabilities and IT complements, we study prompt adaptation (how users adjust their inputs in response to evolving model behavior) using a common experimental design applied to two preregistered tasks with 3,750 total participants who submitted nearly 37,000 prompts. We show that the importance of prompt adaptation depends critically on task structure. In a task with fixed evaluation criteria and an unambiguous goal, user prompt adaptation accounts for roughly half of the performance gains from a model upgrade. In contrast, in an open-ended creative task where the space of acceptable outputs is effectively unbounded and quality is subjective, performance improvements are driven primarily by model capability; prompt adaptation plays a limited role. We further show that automated prompt rewriting cannot generally substitute for human adaptation: when aligned with task objectives, it can modestly improve performance, but when misaligned, it can actively undermine the gains from model improvements. Together, these findings position prompt adaptation as a dynamic complement whose importance depends on task structure and system design, and suggest that without it, a substantial share of the economic value created by advances in generative models may go unrealized.

Preprint, 2020, arXiv

Limiting Bias from Test-Control Interference in Online Marketplace Experiments

In an A/B test, the typical objective is to measure the total average treatment effect (TATE), which measures the difference between the average outcome if all users were treated and the average outcome if all users were untreated. However, a simple difference-in-means estimator will give a biased estimate of the TATE when outcomes of control units depend on the outcomes of treatment units, an issue we refer to as test-control interference. Using a simulation built on top of data from Airbnb, this paper considers the use of methods from the network interference literature for online marketplace experimentation. We model the marketplace as a network in which an edge exists between two sellers if their goods substitute for one another. We then simulate seller outcomes, specifically considering a "status quo" context and "treatment" context that forces all sellers to lower their prices. We use the same simulation framework to approximate TATE distributions produced by using blocked graph cluster randomization, exposure modeling, and the Hajek estimator for the difference in means. We find that while blocked graph cluster randomization reduces the bias of the naive difference-in-means estimator by as much as 62%, it also significantly increases the variance of the estimator. On the other hand, the use of more sophisticated estimators produces mixed results. While some provide (small) additional reductions in bias and small reductions in variance, others lead to increased bias and variance. Overall, our results suggest that experiment design and analysis techniques from the network experimentation literature are promising tools for reducing bias due to test-control interference in marketplace experiments.
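For readers unfamiliar with the quantities named in this abstract, the standard textbook definitions can be written as follows. The notation is an assumption introduced here, not taken from the paper: Y_i(z) is the outcome of unit i under assignment vector z, Z_i is the treatment indicator for unit i, N_T and N_C are the treatment and control group sizes, E_i is the exposure condition of unit i, and \pi_i^{(e)} is the probability that unit i ends up in exposure condition e under the design.

```latex
% Total average treatment effect: everyone treated vs. no one treated
\tau_{\mathrm{TATE}} = \frac{1}{N} \sum_{i=1}^{N} \bigl[ Y_i(\mathbf{1}) - Y_i(\mathbf{0}) \bigr]

% Naive difference-in-means estimator
\hat{\tau}_{\mathrm{DM}} = \frac{1}{N_T} \sum_{i : Z_i = 1} Y_i \;-\; \frac{1}{N_C} \sum_{i : Z_i = 0} Y_i

% Hajek estimator: inverse-probability weights, normalized within each exposure condition
\hat{\tau}_{\mathrm{H}} =
  \frac{\sum_{i} \mathbb{1}[E_i = 1] \, Y_i / \pi_i^{(1)}}{\sum_{i} \mathbb{1}[E_i = 1] / \pi_i^{(1)}}
  \;-\;
  \frac{\sum_{i} \mathbb{1}[E_i = 0] \, Y_i / \pi_i^{(0)}}{\sum_{i} \mathbb{1}[E_i = 0] / \pi_i^{(0)}}
```

Under interference, the observed control outcomes are not draws of Y_i(\mathbf{0}), which is why the difference in means can miss the TATE even with perfect randomization.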

Preprint, 2020, arXiv

Reducing Interference Bias in Online Marketplace Pricing Experiments

Online marketplace designers frequently run A/B tests to measure the impact of proposed product changes. However, given that marketplaces are inherently connected, total average treatment effect estimates obtained through Bernoulli randomized experiments are often biased due to violations of the stable unit treatment value assumption. This can be particularly problematic for experiments that impact sellers' strategic choices, affect buyers' preferences over items in their consideration set, or change buyers' consideration sets altogether. In this work, we measure and reduce bias due to interference in online marketplace experiments by using observational data to create clusters of similar listings, and then using those clusters to conduct cluster-randomized field experiments. We provide a lower bound on the magnitude of bias due to interference by conducting a meta-experiment that randomizes over two experiment designs: one Bernoulli randomized, one cluster randomized. In both meta-experiment arms, treatment sellers are subject to a different platform fee policy than control sellers, resulting in different prices for buyers. By conducting a joint analysis of the two meta-experiment arms, we find a large and statistically significant difference between the total average treatment effect estimates obtained with the two designs, and estimate that 32.60% of the Bernoulli-randomized treatment effect estimate is due to interference bias. We also find weak evidence that the magnitude and/or direction of interference bias depends on the extent to which a marketplace is supply- or demand-constrained, and analyze a second meta-experiment to highlight the difficulty of detecting interference bias when treatment interventions require intention-to-treat analysis.
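One plausible reading of the "share due to interference bias" figure is the gap between the two design-specific estimates, expressed as a fraction of the Bernoulli-randomized estimate. The sketch below shows only that arithmetic. The function name and the input values are hypothetical and chosen for illustration; they are not the paper's estimates, and the paper's joint analysis may compute the share differently.

```python
def interference_share(bernoulli_estimate: float, cluster_estimate: float) -> float:
    """Fraction of the Bernoulli-randomized estimate attributed to interference.

    Assumes the cluster-randomized estimate is the less biased of the two,
    so the gap between the estimates is treated as interference bias.
    """
    if bernoulli_estimate == 0:
        raise ValueError("Bernoulli estimate must be non-zero to form a ratio")
    return (bernoulli_estimate - cluster_estimate) / bernoulli_estimate


# Hypothetical effect estimates, for illustration only.
share = interference_share(bernoulli_estimate=2.0, cluster_estimate=1.5)
print(f"{share:.2%}")
```

Because cluster randomization reduces interference without necessarily eliminating it, a share computed this way is best read as a lower bound, consistent with the abstract's framing.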

Preprint, 2020, arXiv

The Engagement-Diversity Connection: Evidence from a Field Experiment on Spotify

It remains unknown whether personalized recommendations increase or decrease the diversity of content people consume. We present results from a randomized field experiment on Spotify testing the effect of personalized recommendations on consumption diversity. In the experiment, both control and treatment users were given podcast recommendations, with the sole aim of increasing podcast consumption. Treatment users' recommendations were personalized based on their music listening history, whereas control users were recommended popular podcasts among users in their demographic group. We find that, on average, the treatment increased podcast streams by 28.90%. However, the treatment also decreased the average individual-level diversity of podcast streams by 11.51%, and increased the aggregate diversity of podcast streams by 5.96%, indicating that personalized recommendations have the potential to create patterns of consumption that are homogeneous within and diverse across users, a pattern reflecting Balkanization. Our results provide evidence of an "engagement-diversity trade-off" when recommendations are optimized solely to drive consumption: while personalized recommendations increase user engagement, they also affect the diversity of consumed content. This shift in consumption diversity can affect user retention and lifetime value, and impact the optimal strategy for content producers. We also observe evidence that our treatment affected streams from sections of Spotify's app not directly affected by the experiment, suggesting that exposure to personalized recommendations can affect the content that users consume organically. We believe these findings highlight the need for academics and practitioners to continue investing in personalization methods that explicitly take into account the diversity of content recommended.
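The difference between individual-level and aggregate diversity can be made concrete with a small sketch. The abstract does not say which diversity measure was used, so Shannon entropy over content categories is an assumption here, and the users, categories, and stream counts are made up for illustration.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Shannon entropy (in nats) of the category distribution in `items`."""
    counts = Counter(items)
    total = sum(counts.values())
    return sum((c / total) * math.log(total / c) for c in counts.values())


# Hypothetical streams: each user listens to a single, different category.
streams = {
    "user_a": ["comedy"] * 4,
    "user_b": ["news"] * 4,
}

# Individual-level diversity: entropy within each user, averaged over users.
individual = {user: shannon_entropy(s) for user, s in streams.items()}
mean_individual = sum(individual.values()) / len(individual)

# Aggregate diversity: entropy of all streams pooled across users.
pooled = [category for s in streams.values() for category in s]
aggregate = shannon_entropy(pooled)
```

In this toy case every user has zero individual diversity while the pooled streams split evenly across two categories, so aggregate diversity is positive: consumption that is homogeneous within users and diverse across them.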