Researcher profile

Philip B. Stark

Philip B. Stark's listed publications focus on risk-limiting election audits, testing of ballot-marking devices, and the use of P-values in statistical practice.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Trust 21 - Emerging
Verification L1
Unclaimed author
7 works
0 followers
7 topics
4 close collaborators

Published work

7 published items

preprint, 2022, arXiv

ALPHA: Audit that Learns from Previously Hand-Audited Ballots

BRAVO, the most widely tried method for risk-limiting election audits, cannot accommodate sampling without replacement or stratified sampling, which can improve efficiency and may be required by law. It applies only to ballot-polling audits, which are less efficient than comparison audits. It applies to plurality, majority, super-majority, proportional representation, and ranked-choice voting contests, but not to many social choice functions for which there are RLA methods, such as approval voting, STAR-voting, Borda count, and general scoring rules. And while BRAVO has the smallest expected sample size among sequentially valid ballot-polling-with-replacement methods when reported vote shares are exactly right, it can require arbitrarily large samples when the reported winner(s) really won but the reported vote shares are wrong. ALPHA is a simple generalization of BRAVO that (i) works for sampling with and without replacement and for Bernoulli sampling; (ii) increases power for stratified audits by avoiding the need to use a $P$-value combining function or to maximize $P$-values over nuisance parameters within strata, and by allowing adaptive sampling across strata; (iii) works not only for ballot-polling but also for ballot-level comparison, batch-polling, and batch-level comparison audits, sampling with or without replacement, uniformly or with weights proportional to size; (iv) works for all social choice functions covered by SHANGRLA; and (v) in situations where both ALPHA and BRAVO apply, requires smaller samples than BRAVO when the reported vote shares are wrong but the outcome is correct, by five orders of magnitude in some examples. ALPHA includes the family of betting martingale tests in RiLACS, with a different betting strategy parametrized as an estimator of the population mean and explicit flexibility to accommodate sampling weights and population bounds that vary by draw.
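The ALPHA test statistic can be sketched as a running product of per-draw factors. This is a minimal illustration under stated assumptions, not the paper's reference implementation: the names `alpha_martingale` and `measured_risk` are hypothetical, the alternative mean `eta0` is held fixed (roughly BRAVO's choice) instead of being re-estimated from the sample as ALPHA allows, and `x` is assumed to hold assorter values in $[0, u]$ drawn without replacement from $N$ items. Each factor has conditional expectation at most 1 when the population mean is at most `mu0`, so the product is a nonnegative supermartingale under the null and $1/\max_j T_j$ is a valid measured risk by Ville's inequality.

```python
import math

def alpha_martingale(x, N, mu0=0.5, eta0=0.6, u=1.0):
    """Running values T_1, ..., T_n of an ALPHA-style test supermartingale.

    Assumptions of this sketch: x holds assorter values in [0, u] sampled
    without replacement from N items; the null is "population mean <= mu0";
    eta0 is a fixed (hypothetical) guess at the true mean, not an adaptive one.
    """
    T, total, path = 1.0, 0.0, []
    for i, xi in enumerate(x):            # i = number of items already drawn
        # Mean of the N - i undrawn items if the population mean were exactly mu0.
        mu = (N * mu0 - total) / (N - i)
        if T < math.inf:
            if mu < 0 or (mu == 0 and xi > 0):
                T = math.inf              # the sample already contradicts the null
            elif 0 < mu < u:
                eta = min(u, max(eta0, mu + 1e-9))   # keep eta in (mu, u]
                T *= (xi * eta / mu + (u - xi) * (u - eta) / (u - mu)) / u
            # Otherwise (mu == 0 with xi == 0, or mu >= u) the draw carries no
            # evidence against the null, so T is left unchanged.
        total += xi
        path.append(T)
    return path

def measured_risk(path):
    """Ville's inequality: 1 / max_j T_j, capped at 1."""
    return min(1.0, 1.0 / max(path)) if path else 1.0
```

For example, `measured_risk(alpha_martingale([1.0] * 30, N=1000))` is well below 0.05, while a sample of all zeros leaves the measured risk at 1. Replacing the fixed `eta0` with a running estimate of the mean, as the abstract describes, changes only the line that sets `eta`.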

preprint, 2022, arXiv

Assessing the accuracy of the Australian Senate count: Key steps for a rigorous and transparent audit

This paper explains the main principles and some of the technical details for auditing the scanning and digitisation of the Australian Senate ballot papers. We give a short summary of the motivation for auditing paper ballots, explain the necessary supporting steps for a rigorous and transparent audit, and suggest some statistical methods that would be appropriate for the Australian Senate. Update (22 June 2022): adds analysis of Senate preference data from the 2022 Australian election.

preprint, 2022, arXiv

Comment on "The statistics wars and intellectual conflicts of interest" by D. Mayo

While P-values are widely abused, they are a useful tool for many purposes; banning them is analogous to banning scalpels because most people do not know how to perform surgery. Many reported P-values are not genuine P-values, for a variety of reasons. Perhaps the most widespread and pernicious problem is the Type III error of testing a statistical hypothesis that has little or no connection to the scientific hypothesis.

preprint, 2022, arXiv

Sweeter than SUITE: Supermartingale Stratified Union-Intersection Tests of Elections

Stratified sampling can be useful in risk-limiting audits (RLAs), for instance, to accommodate heterogeneous voting equipment or laws that require jurisdictions to draw their audit samples independently. We combine the union-intersection tests in SUITE, SHANGRLA's reduction of RLAs to testing whether the means of a collection of lists are all $\leq 1/2$, and the nonnegative supermartingale (NNSM) tests in ALPHA to improve the efficiency and flexibility of stratified RLAs. A simple, non-adaptive strategy for combining stratumwise NNSMs decreases the measured risk in the 2018 pilot hybrid audit in Kalamazoo, Michigan, USA by more than an order of magnitude, from 0.037 for SUITE to 0.003 for our method. We give a simple, computationally inexpensive, adaptive rule for deciding which stratum to sample next that reduces audit workload by as much as 74% in examples. We also present NNSM-based tests that are computationally tractable even when there are many strata, illustrated with a simulated audit stratified across California's 58 counties.
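The union-intersection idea can be illustrated for two strata. This sketch is a deliberate simplification, not the method of the paper: it uses a fixed-bet martingale $\prod_i (1 + \lambda(x_i - \theta))$ in each stratum, and it assumes independent samples drawn with replacement, assorter values in $[0, 1]$, a hypothetical stratum weight `w1`, and a P-value computed at the end of sampling. The overall null is the union, over stratum means $(\theta_1, \theta_2)$ with $w_1\theta_1 + w_2\theta_2 \le 1/2$, of the intersection of the stratumwise nulls, so the product of the stratum statistics must be minimized over that set. Each stratum statistic decreases in $\theta$ and its logarithm is concave, so (for this fixed bet only) the minimum sits at an endpoint of the boundary segment and two evaluations suffice.

```python
import math

def log_bet_martingale(x, theta, lam=0.5):
    """log of prod_i (1 + lam * (x_i - theta)) for one stratum.

    For x_i in [0, 1] sampled with replacement and 0 <= lam < 1, the product is
    a nonnegative supermartingale when the stratum mean is <= theta.
    """
    return sum(math.log1p(lam * (xi - theta)) for xi in x)

def stratified_p_value(x1, x2, w1, lam=0.5):
    """P-value for H0: w1*mean1 + (1 - w1)*mean2 <= 1/2, two independent strata."""
    w2 = 1.0 - w1
    # Range of theta1 for which theta2 = (1/2 - w1*theta1)/w2 lies in [0, 1].
    lo = max(0.0, (0.5 - w2) / w1)
    hi = min(1.0, 0.5 / w1)
    logs = []
    for t1 in (lo, hi):
        t2 = (0.5 - w1 * t1) / w2
        # Independent strata: the log of the product is the sum of the logs.
        logs.append(log_bet_martingale(x1, t1, lam) + log_bet_martingale(x2, t2, lam))
    worst = min(logs)                     # least favorable null on the boundary
    # Markov's inequality: P-value = 1 / min T, capped at 1 (avoids exp overflow).
    return 1.0 if worst <= 0 else math.exp(-worst)
```

With more strata the feasible set of $\theta$ becomes a polytope; the same concavity argument puts the minimum at a vertex, but the number of vertices grows quickly, which is one reason the paper discusses tests that stay tractable with many strata.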

preprint, 2022, arXiv

They may look and look, yet not see: BMDs cannot be tested adequately

Bugs, misconfiguration, and malware can cause ballot-marking devices (BMDs) to print incorrect votes. Several approaches to testing BMDs have been proposed. In logic and accuracy testing (LAT) and parallel or live testing, auditors input known test votes into the BMD and check the printout. Passive testing monitors the rate of "spoiled" BMD printout, on the theory that if BMDs malfunction, the rate will increase noticeably. We show that these approaches cannot reliably detect outcome-altering problems, because: (i) The number of possible interactions with BMDs is enormous, so testing interactions uniformly at random is hopeless. (ii) To probe the space of interactions intelligently requires an accurate model of voter behavior, but because the space of interactions is so large, building an accurate model requires observing a huge number of voters in every jurisdiction in every election, more voters than there are in most jurisdictions. (iii) Even with a perfect model of voter behavior, the number of tests needed exceeds the number of voters in most jurisdictions. (iv) An attacker can target interactions that are expensive to test, e.g., because they involve voting slowly; or interactions for which tampering is less likely to be noticed, e.g., because the voter uses the audio interface. (v) Whether BMDs misbehave or not, the distribution of spoiled ballots is unknown and varies by election and possibly by ballot style: historical data do not help much. Hence, there is no way to calibrate a threshold for passive testing, e.g., to guarantee at least a 95% chance of noticing that 5% of the votes were altered, with at most a 5% false alarm rate. (vi) Even if the distribution of spoiled ballots were known to be Poisson, the vast majority of jurisdictions do not have enough voters for passive testing to have a large chance of detecting problems but only a small chance of false alarms.
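Point (vi) can be made concrete with a small calculation. The spoil rates used here are hypothetical placeholders, not figures from the paper: `r0` is the spoil rate when BMDs behave and `r1` the rate when they alter votes, and spoiled-ballot counts among `n` voters are modeled as Poisson. The sketch chooses the smallest alarm threshold whose false-alarm rate is at most `alpha`, then reports the resulting detection power for a few jurisdiction sizes.

```python
import math

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), lam > 0, summing the pmf in log space."""
    if k <= 0:
        return 1.0
    cdf = sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1)) for j in range(k))
    return max(0.0, 1.0 - cdf)

def passive_test(n, r0, r1, alpha=0.05):
    """Return (threshold, false-alarm rate, power) of a spoiled-ballot count test.

    Hypothetical model: spoiled ballots ~ Poisson(r0 * n) when BMDs behave and
    ~ Poisson(r1 * n) when they alter votes, with r1 > r0 > 0.
    """
    lam0, lam1 = r0 * n, r1 * n
    t = 0
    while poisson_sf(t, lam0) > alpha:    # smallest threshold with false alarm <= alpha
        t += 1
    return t, poisson_sf(t, lam0), poisson_sf(t, lam1)

for n in (1_000, 10_000, 100_000):
    t, fa, power = passive_test(n, r0=0.01, r1=0.0135)
    print(f"n={n:>7}: alarm at >= {t} spoiled, false alarm {fa:.3f}, power {power:.3f}")
```

Under these made-up rates, the power for about a thousand voters is well below one half, and it gets close to one only for much larger jurisdictions. In practice `r0` and `r1` would have to be known in advance, which point (v) says is not possible.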

preprint, 2020, arXiv

Sets of Half-Average Nulls Generate Risk-Limiting Audits: SHANGRLA

Risk-limiting audits (RLAs) for many social choice functions can be reduced to testing sets of null hypotheses of the form "the average of this list is not greater than 1/2" for a collection of finite lists of nonnegative numbers. Such social choice functions include majority, super-majority, plurality, multi-winner plurality, Instant Runoff Voting (IRV), Borda count, approval voting, and STAR-Voting, among others. The audit stops without a full hand count if and only if all the null hypotheses are rejected. The nulls can be tested in many ways. Ballot-polling is particularly simple; two new ballot-polling risk-measuring functions for sampling without replacement are given. Ballot-level comparison audits transform each null into an equivalent assertion that the mean of re-scaled tabulation errors is not greater than 1/2. In turn, that null can be tested using the same statistical methods used for ballot polling, but applied to different finite lists of nonnegative numbers. SHANGRLA comparison audits are more efficient than previous comparison audits for two reasons: (i) for most social choice functions, the conditions tested are both necessary and sufficient for the reported outcome to be correct, while previous methods tested conditions that were sufficient but not necessary, and (ii) the tests avoid a conservative approximation. The SHANGRLA abstraction simplifies stratified audits, including audits that combine ballot polling with ballot-level comparisons, producing sharper audits than the "SUITE" approach. SHANGRLA works with the "phantoms to evil zombies" strategy to treat missing ballot cards and missing or redacted cast vote records. That also facilitates sampling from "ballot-style manifests," which can dramatically improve efficiency when the audited contests do not appear on every ballot card. Open-source software implementing SHANGRLA ballot-level comparison audits is available.
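The reduction to half-average nulls can be illustrated for a single plurality assertion, "the reported winner beat the reported loser." This is a minimal sketch under stated assumptions: `plurality_assorter` and `comparison_assorter` are hypothetical names, each cast vote record (CVR) and ballot is represented as a single-choice string, and the assorter scores a card 1 for the winner, 0 for the loser, and 1/2 otherwise, so its upper bound is $u = 1$. In the comparison form, $\omega_i$ is the overstatement of card $i$ and $v$ is the reported assorter margin. The mean of $B_i = (1 - \omega_i/u)/(2 - v/u)$ exceeds 1/2 exactly when the mean assorter value of the actual ballots does, so the same half-average test applies to the $B_i$.

```python
def plurality_assorter(vote, winner, loser):
    """Assorter for 'winner beat loser': 1 for winner, 0 for loser, 1/2 otherwise."""
    if vote == winner:
        return 1.0
    if vote == loser:
        return 0.0
    return 0.5

def comparison_assorter(cvrs, ballots, winner, loser, u=1.0):
    """Rescaled overstatements B_i = (1 - omega_i/u) / (2 - v/u).

    omega_i = A(CVR_i) - A(ballot_i) is the overstatement for card i, and
    v = 2 * mean(A(CVR)) - 1 is the reported assorter margin.
    """
    a_cvr = [plurality_assorter(c, winner, loser) for c in cvrs]
    a_ballot = [plurality_assorter(b, winner, loser) for b in ballots]
    v = 2 * sum(a_cvr) / len(a_cvr) - 1
    return [(1 - (ac - ab) / u) / (2 - v / u) for ac, ab in zip(a_cvr, a_ballot)]
```

For example, with CVRs `["A", "A", "B", "C"]` but actual ballots `["B", "A", "B", "C"]`, the first card overstates the margin, its $B_i$ is 0, and the mean of the $B_i$ is $3/7 < 1/2$, matching the fact that the reported winner A did not actually win this toy contest.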

preprint, 2020, arXiv

You can do RLAs for IRV

The City and County of San Francisco, CA, has used Instant Runoff Voting (IRV) for some elections since 2004. This report describes the first-ever process pilot of risk-limiting audits for IRV, for the San Francisco District Attorney's race in November 2019. We found that the vote-by-mail outcome could be efficiently audited to well under the 0.05 risk limit given a sample of only 200 ballots. All the software we developed for the pilot is open source.