Source author record

James R. Wright

James R. Wright appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Science and Game Theory Artificial Intelligence Cryptography and Security econ.TH Machine Learning physics.flu-dyn

Catalog footprint

What is connected

9works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Disinformation, Stochastic Harm, and Costly Effort: A Principal-Agent Analysis of Regulating Social Media Platforms

The spread of disinformation on social platforms is harmful to society. This harm may manifest as a gradual degradation of public discourse; but it can also take the form of sudden dramatic events such as the 2021 insurrection on Capitol Hill. The platforms themselves are in the best position to prevent the spread of disinformation, as they have the best access to relevant data and the expertise to use it. However, mitigating disinformation is costly, not only for implementing detection algorithms or employing manual effort, but also because limiting such highly viral content impacts user engagement and potential advertising revenue. Since the costs of harmful content are borne by other entities, the platform will therefore have no incentive to exercise the socially-optimal level of effort. This problem is similar to that of environmental regulation, in which the costs of adverse events are not directly borne by a firm, the mitigation effort of a firm is not observable, and the causal link between a harmful consequence and a specific failure is difficult to prove. For environmental regulation, one solution is to perform costly monitoring to ensure that the firm takes adequate precautions according to a specified rule. However, a fixed rule for classifying disinformation becomes less effective over time, as bad actors can learn to sequentially and strategically bypass it. Encoding our domain as a Markov decision process, we demonstrate that no penalty based on a static rule, no matter how large, can incentivize optimal effort. Penalties based on an adaptive rule can incentivize optimal effort, but counter-intuitively, only if the regulator sufficiently overreacts to harmful events by requiring a greater-than-optimal level of effort. We offer novel insights for the effective regulation of social platforms, highlight inherent challenges, and discuss promising avenues for future work.

preprint2022arXiv

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret learning dynamics for individual agents with respect to a set of deviations, and further describes jointly rational behavior among multiple agents with mediated equilibria. To develop hindsight rational learning in sequential decision-making settings, we formalize behavioral deviations as a general class of deviations that respect the structure of extensive-form games. Integrating the idea of time selection into counterfactual regret minimization (CFR), we introduce the extensive-form regret minimization (EFR) algorithm that achieves hindsight rationality for any given set of behavioral deviations with computation that scales closely with the complexity of the set. We identify behavioral deviation subsets, the partial sequence deviation types, that subsume previously studied types and lead to efficient EFR instances in games with moderate lengths. In addition, we present a thorough empirical analysis of EFR instantiated with different deviation types in benchmark games, where we find that stronger types typically induce better performance.

preprint2022arXiv

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections

preprint2022arXiv

Game-Theoretic Malware Detection

Malware attacks are costly. To mitigate against such attacks, organizations deploy malware detection tools that help them detect and eventually resolve those threats. While running only the best available tool does not provide enough coverage of the potential attacks, running all available tools is prohibitively expensive in terms of financial cost and computing resources. Therefore, an organization typically runs a set of tools that maximizes their coverage given a limited budget. However, how should an organization choose that set? Attackers are strategic, and will change their behavior to preferentially exploit the gaps left by a deterministic choice of tools. To avoid leaving such easily-exploited gaps, the defender must choose a random set. In this paper, we present an approach to compute an optimal randomization over size-bounded sets of available security analysis tools by modeling the relationship between attackers and security analysts as a leader-follower Stackelberg security game. We estimate the parameters of our model by combining the information from the VirusTotal dataset with the more detailed reports from the National Vulnerability Database. In an empirical comparison, our approach outperforms a set of natural baselines under a wide range of assumptions.

preprint2022arXiv

Hindsight and Sequential Rationality of Correlated Play

Driven by recent successes in two-player, zero-sum game solving and playing, artificial intelligence work on games has increasingly focused on algorithms that produce equilibrium-based strategies. However, this approach has been less effective at producing competent players in general-sum games or those with more than two players than in two-player, zero-sum games. An appealing alternative is to consider adaptive algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior. This approach also leads to a game-theoretic analysis, but in the correlated play that arises from joint learning dynamics rather than factored agent behavior at equilibrium. We develop and advocate for this hindsight rationality framing of learning in general sequential decision-making settings. To this end, we re-examine mediated equilibrium and deviation types in extensive-form games, thereby gaining a more complete understanding and resolving past misconceptions. We present a set of examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature, and prove that no tractable concept subsumes all others. This line of inquiry culminates in the definition of the deviation and equilibrium classes that correspond to algorithms in the counterfactual regret minimization (CFR) family, relating them to all others in the literature. Examining CFR in greater detail further leads to a new recursive definition of rationality in correlated play that extends sequential rationality in a way that naturally applies to hindsight evaluation.

preprint2021arXiv

Direct Numerical Simulation of a Turbulent Boundary Layer on a Flat Plate Using Synthetic Turbulence Generation

The turbulent boundary layer over a flat plate is computed by direct numerical simulation (DNS) of the incompressible Navier-Stokes equations as a test bed for a synthetic turbulence generator (STG) inflow boundary condition. The inlet momentum thickness Reynolds number is approximately 1,000. The study provides validation of the ability of the STG to develop accurate turbulence in 5 to 7 boundary layer thicknesses downstream of the boundary condition. Also tested was the effect of changes in the stabilization scheme on the development of the boundary layer. Moreover, the grid resolution required for both the development region and the downstream flow is investigated when using a stabilized finite element method.

preprint2020arXiv

Alternative Function Approximation Parameterizations for Solving Games: An Analysis of $f$-Regression Counterfactual Regret Minimization

Function approximation is a powerful approach for structuring large decision problems that has facilitated great achievements in the areas of reinforcement learning and game playing. Regression counterfactual regret minimization (RCFR) is a simple algorithm for approximately solving imperfect information games with normalized rectified linear unit (ReLU) parameterized policies. In contrast, the more conventional softmax parameterization is standard in the field of reinforcement learning and yields a regret bound with a better dependence on the number of actions. We derive approximation error-aware regret bounds for $(Φ, f)$-regret matching, which applies to a general class of link functions and regret objectives. These bounds recover a tighter bound for RCFR and provide a theoretical justification for RCFR implementations with alternative policy parameterizations ($f$-RCFR), including softmax. We provide exploitability bounds for $f$-RCFR with the polynomial and exponential link functions in zero-sum imperfect information games and examine empirically how the link function interacts with the severity of the approximation. We find that the previously studied ReLU parameterization performs better when the approximation error is small while the softmax parameterization can perform better when the approximation error is large.

preprint2016arXiv

Incentivizing Evaluation via Limited Access to Ground Truth: Peer-Prediction Makes Things Worse

In many settings, an effective way of evaluating objects of interest is to collect evaluations from dispersed individuals and to aggregate these evaluations together. Some examples are categorizing online content and evaluating student assignments via peer grading. For this data science problem, one challenge is to motivate participants to conduct such evaluations carefully and to report them honestly, particularly when doing so is costly. Existing approaches, notably peer-prediction mechanisms, can incentivize truth telling in equilibrium. However, they also give rise to equilibria in which agents do not pay the costs required to evaluate accurately, and hence fail to elicit useful information. We show that this problem is unavoidable whenever agents are able to coordinate using low-cost signals about the items being evaluated (e.g., text labels or pictures). We then consider ways of circumventing this problem by comparing agents' reports to ground truth, which is available in practice when there exist trusted evaluators---such as teaching assistants in the peer grading scenario---who can perform a limited number of unbiased (but noisy) evaluations. Of course, when such ground truth is available, a simpler approach is also possible: rewarding each agent based on agreement with ground truth with some probability, and unconditionally rewarding the agent otherwise. Surprisingly, we show that the simpler mechanism achieves stronger incentive guarantees given less access to ground truth than a large set of peer-prediction mechanisms.

preprint2016arXiv

Models of Level-0 Behavior for Predicting Human Behavior in Games

Behavioral game theory seeks to describe the way actual people (as compared to idealized, "rational" agents) act in strategic situations. Our own recent work has identified iterative models (such as quantal cognitive hierarchy) as the state of the art for predicting human play in unrepeated, simultaneous-move games (Wright & Leyton-Brown 2012, 2016). Iterative models predict that agents reason iteratively about their opponents, building up from a specification of nonstrategic behavior called level-0. The modeler is in principle free to choose any description of level-0 behavior that makes sense for the setting. However, almost all existing work specifies this behavior as a uniform distribution over actions. In most games it is not plausible that even nonstrategic agents would choose an action uniformly at random, nor that other agents would expect them to do so. A more accurate model for level-0 behavior has the potential to dramatically improve predictions of human behavior, since a substantial fraction of agents may play level-0 strategies directly, and furthermore since iterative models ground all higher-level strategies in responses to the level-0 strategy. Our work considers models of the way in which level-0 agents construct a probability distribution over actions, given an arbitrary game. Using a Bayesian optimization package called SMAC (Hutter, Hoos, & Leyton-Brown, 2010, 2011, 2012), we systematically evaluated a large space of such models, each of which makes its prediction based only on general features that can be computed from any normal form game. In the end, we recommend a model that achieved excellent performance across the board: a linear weighting of features that requires the estimation of four weights. We evaluated the effects of combining this new level-0 model with several iterative models, and observed large improvements in the models' predictive accuracies.

James R. Wright

What is connected

Connect this record

See the researcher in context

Building this map preview

9 published item(s)

Disinformation, Stochastic Harm, and Costly Effort: A Principal-Agent Analysis of Regulating Social Media Platforms

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games

Efficient Deviation Types and Learning for Hindsight Rationality in Extensive-Form Games: Corrections

Game-Theoretic Malware Detection

Hindsight and Sequential Rationality of Correlated Play

Direct Numerical Simulation of a Turbulent Boundary Layer on a Flat Plate Using Synthetic Turbulence Generation

Alternative Function Approximation Parameterizations for Solving Games: An Analysis of $f$-Regression Counterfactual Regret Minimization

Incentivizing Evaluation via Limited Access to Ground Truth: Peer-Prediction Makes Things Worse

Models of Level-0 Behavior for Predicting Human Behavior in Games