Source author record

Guy N. Rothblum

Guy N. Rothblum appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computational Complexity Cryptography and Security Data Structures and Algorithms

Catalog footprint

What is connected

4works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Decision-Making under Miscalibration

ML-based predictions are used to inform consequential decisions about individuals. How should we use predictions (e.g., risk of heart attack) to inform downstream binary classification decisions (e.g., undergoing a medical procedure)? When the risk estimates are perfectly calibrated, the answer is well understood: a classification problem's cost structure induces an optimal treatment threshold $j^{\star}$. In practice, however, some amount of miscalibration is unavoidable, raising a fundamental question: how should one use potentially miscalibrated predictions to inform binary decisions? We formalize a natural (distribution-free) solution concept: given anticipated miscalibration of $α$, we propose using the threshold $j$ that minimizes the worst-case regret over all $α$-miscalibrated predictors, where the regret is the difference in clinical utility between using the threshold in question and using the optimal threshold in hindsight. We provide closed form expressions for $j$ when miscalibration is measured using both expected and maximum calibration error, which reveal that it indeed differs from $j^{\star}$ (the optimal threshold under perfect calibration). We validate our theoretical findings on real data, demonstrating that there are natural cases in which making decisions using $j$ improves the clinical utility.

preprint2020arXiv

Abstracting Fairness: Oracles, Metrics, and Interpretability

It is well understood that classification algorithms, for example, for deciding on loan applications, cannot be evaluated for fairness without taking context into account. We examine what can be learned from a fairness oracle equipped with an underlying understanding of ``true'' fairness. The oracle takes as input a (context, classifier) pair satisfying an arbitrary fairness definition, and accepts or rejects the pair according to whether the classifier satisfies the underlying fairness truth. Our principal conceptual result is an extraction procedure that learns the underlying truth; moreover, the procedure can learn an approximation to this truth given access to a weak form of the oracle. Since every ``truly fair'' classifier induces a coarse metric, in which those receiving the same decision are at distance zero from one another and those receiving different decisions are at distance one, this extraction process provides the basis for ensuring a rough form of metric fairness, also known as individual fairness. Our principal technical result is a higher fidelity extractor under a mild technical constraint on the weak oracle's conception of fairness. Our framework permits the scenario in which many classifiers, with differing outcomes, may all be considered fair. Our results have implications for interpretablity -- a highly desired but poorly defined property of classification systems that endeavors to permit a human arbiter to reject classifiers deemed to be ``unfair'' or illegitimately derived.

preprint2016arXiv

Concentrated Differential Privacy

We introduce Concentrated Differential Privacy, a relaxation of Differential Privacy enjoying better accuracy than both pure differential privacy and its popular "(epsilon,delta)" relaxation without compromising on cumulative privacy loss over multiple computations.

preprint2011arXiv

Private Data Release via Learning Thresholds

This work considers computationally efficient privacy-preserving data release. We study the task of analyzing a database containing sensitive information about individual participants. Given a set of statistical queries on the data, we want to release approximate answers to the queries while also guaranteeing differential privacy---protecting each participant's sensitive data. Our focus is on computationally efficient data release algorithms; we seek algorithms whose running time is polynomial, or at least sub-exponential, in the data dimensionality. Our primary contribution is a computationally efficient reduction from differentially private data release for a class of counting queries, to learning thresholded sums of predicates from a related class. We instantiate this general reduction with a variety of algorithms for learning thresholds. These instantiations yield several new results for differentially private data release. As two examples, taking {0,1}^d to be the data domain (of dimension d), we obtain differentially private algorithms for: (*) Releasing all k-way conjunctions. For any given k, the resulting data release algorithm has bounded error as long as the database is of size at least d^{O(\sqrt{k\log(k\log d)})}. The running time is polynomial in the database size. (*) Releasing a (1-γ)-fraction of all parity queries. For any γ\geq \poly(1/d), the algorithm has bounded error as long as the database is of size at least \poly(d). The running time is polynomial in the database size. Several other instantiations yield further results for privacy-preserving data release. Of the two results highlighted above, the first learning algorithm uses techniques for representing thresholded sums of predicates as low-degree polynomial threshold functions. The second learning algorithm is based on Jackson's Harmonic Sieve algorithm [Jackson 1997].