Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
16topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2023arXiv

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit

There is mounting evidence of emergent phenomena in the capabilities of deep learning methods as we scale up datasets, model sizes, and training times. While there are some accounts of how these resources modulate statistical capacity, far less is known about their effect on the computational problem of model training. This work conducts such an exploration through the lens of learning a $k$-sparse parity of $n$ bits, a canonical discrete search problem which is statistically easy but computationally hard. Empirically, we find that a variety of neural networks successfully learn sparse parities, with discontinuous phase transitions in the training curves. On small instances, learning abruptly occurs at approximately $n^{O(k)}$ iterations; this nearly matches SQ lower bounds, despite the apparent lack of a sparse prior. Our theoretical analysis shows that these observations are not explained by a Langevin-like mechanism, whereby SGD "stumbles in the dark" until it finds the hidden set of features (a natural algorithm which also runs in $n^{O(k)}$ time). Instead, we show that SGD gradually amplifies the sparse solution via a Fourier gap in the population gradient, making continual progress that is invisible to loss and error metrics.

preprint2023arXiv

Named Tensor Notation

We propose a notation for tensors with named axes, which relieves the author, reader, and future implementers of machine learning models from the burden of keeping track of the order of axes and the purpose of each. The notation makes it easy to lift operations on low-order tensors to higher order ones, for example, from images to minibatches of images, or from an attention mechanism to multiple attention heads. After a brief overview and formal definition of the notation, we illustrate it through several examples from modern machine learning, from building blocks like attention and convolution to full models like Transformers and LeNet. We then discuss differential calculus in our notation and compare with some alternative notations. Our proposals build on ideas from many previous papers and software libraries. We hope that our notation will encourage more authors to use named tensors, resulting in clearer papers and more precise implementations.

preprint2022arXiv

Deconstructing Distributions: A Pointwise Framework of Learning

In machine learning, we traditionally evaluate the performance of a single model, averaged over a collection of test inputs. In this work, we propose a new approach: we measure the performance of a collection of models when evaluated on a $\textit{single input point}$. Specifically, we study a point's $\textit{profile}$: the relationship between models' average performance on the test distribution and their pointwise performance on this individual point. We find that profiles can yield new insights into the structure of both models and data -- in and out-of-distribution. For example, we empirically show that real data distributions consist of points with qualitatively different profiles. On one hand, there are "compatible" points with strong correlation between the pointwise and average performance. On the other hand, there are points with weak and even $\textit{negative}$ correlation: cases where improving overall model accuracy actually $\textit{hurts}$ performance on these inputs. We prove that these experimental observations are inconsistent with the predictions of several simplified models of learning proposed in prior work. As an application, we use profiles to construct a dataset we call CIFAR-10-NEG: a subset of CINIC-10 such that for standard models, accuracy on CIFAR-10-NEG is $\textit{negatively correlated}$ with accuracy on CIFAR-10 test. This illustrates, for the first time, an OOD dataset that completely inverts "accuracy-on-the-line" (Miller, Taori, Raghunathan, Sagawa, Koh, Shankar, Liang, Carmon, and Schmidt 2021)

preprint2022arXiv

Quantum Optimization of Maximum Independent Set using Rydberg Atom Arrays

Realizing quantum speedup for practically relevant, computationally hard problems is a central challenge in quantum information science. Using Rydberg atom arrays with up to 289 qubits in two spatial dimensions, we experimentally investigate quantum algorithms for solving the Maximum Independent Set problem. We use a hardware-efficient encoding associated with Rydberg blockade, realize closed-loop optimization to test several variational algorithms, and subsequently apply them to systematically explore a class of graphs with programmable connectivity. We find the problem hardness is controlled by the solution degeneracy and number of local minima, and experimentally benchmark the quantum algorithm's performance against classical simulated annealing. On the hardest graphs, we observe a superlinear quantum speedup in finding exact solutions in the deep circuit regime and analyze its origins.

preprint2021arXiv

Classical algorithms and quantum limitations for maximum cut on high-girth graphs

We study the performance of local quantum algorithms such as the Quantum Approximate Optimization Algorithm (QAOA) for the maximum cut problem, and their relationship to that of classical algorithms. (1) We prove that every (quantum or classical) one-local algorithm achieves on $D$-regular graphs of girth $> 5$ a maximum cut of at most $1/2 + C/\sqrt{D}$ for $C=1/\sqrt{2} \approx 0.7071$. This is the first such result showing that one-local algorithms achieve a value bounded away from the true optimum for random graphs, which is $1/2 + P_*/\sqrt{D} + o(1/\sqrt{D})$ for $P_* \approx 0.7632$. (2) We show that there is a classical $k$-local algorithm that achieves a value of $1/2 + C/\sqrt{D} - O(1/\sqrt{k})$ for $D$-regular graphs of girth $> 2k+1$, where $C = 2/π\approx 0.6366$. This is an algorithmic version of the existential bound of Lyons and is related to the algorithm of Aizenman, Lebowitz, and Ruelle (ALR) for the Sherrington-Kirkpatrick model. This bound is better than that achieved by the one-local and two-local versions of QAOA on high-girth graphs. (3) Through computational experiments, we give evidence that the ALR algorithm achieves better performance than constant-locality QAOA for random $D$-regular graphs, as well as other natural instances, including graphs that do have short cycles. Our experimental work suggests that it could be possible to extend beyond our theoretical constraints. This points at the tantalizing possibility that $O(1)$-local quantum maximum-cut algorithms might be *pointwise dominated* by polynomial-time classical algorithms, in the sense that there is a classical algorithm outputting cuts of equal or better quality *on every possible instance*. This is in contrast to the evidence that polynomial-time algorithms cannot simulate the probability distributions induced by local quantum algorithms.

preprint2021arXiv

Optimizing testing policies for detecting COVID-19 outbreaks

The COVID-19 pandemic poses challenges for continuing economic activity while reducing health risks. While these challenges can be mitigated through testing, testing budget is often limited. Here we study how institutions, such as nursing homes, should utilize a fixed test budget for early detection of an outbreak. Using an extended network-SEIR model, we show that given a certain budget of tests, it is generally better to test smaller subgroups of the population frequently than to test larger groups but less frequently. The numerical results are consistent with an analytical expression we derive for the size of the outbreak at detection in an exponential spread model. Our work provides a simple guideline for institutions: distribute your total tests over several batches instead of using them all at once. We expect that in the appropriate scenarios, this easy-to-implement policy recommendation will lead to earlier detection and better mitigation of local COVID-19 outbreaks.

preprint2020arXiv

On Higher-Order Cryptography (Long Version)

Type-two constructions abound in cryptography: adversaries for encryption and authentication schemes, if active, are modeled as algorithms having access to oracles, i.e. as second-order algorithms. But how about making cryptographic schemes themselves higher-order? This paper gives an answer to this question, by first describing why higher-order cryptography is interesting as an object of study, then showing how the concept of probabilistic polynomial time algorithm can be generalized so as to encompass algorithms of order strictly higher than two, and finally proving some positive and negative results about the existence of higher-order cryptographic primitives, namely authentication schemes and pseudorandom functions.

preprint2020arXiv

Spoofing Linear Cross-Entropy Benchmarking in Shallow Quantum Circuits

The linear cross-entropy benchmark (Linear XEB) has been used as a test for procedures simulating quantum circuits. Given a quantum circuit $C$ with $n$ inputs and outputs and purported simulator whose output is distributed according to a distribution $p$ over $\{0,1\}^n$, the linear XEB fidelity of the simulator is $\mathcal{F}_{C}(p) = 2^n \mathbb{E}_{x \sim p} q_C(x) -1$ where $q_C(x)$ is the probability that $x$ is output from the distribution $C|0^n\rangle$. A trivial simulator (e.g., the uniform distribution) satisfies $\mathcal{F}_C(p)=0$, while Google's noisy quantum simulation of a 53 qubit circuit $C$ achieved a fidelity value of $(2.24\pm0.21)\times10^{-3}$ (Arute et. al., Nature'19). In this work we give a classical randomized algorithm that for a given circuit $C$ of depth $d$ with Haar random 2-qubit gates achieves in expectation a fidelity value of $Ω(\tfrac{n}{L} \cdot 15^{-d})$ in running time $\textsf{poly}(n,2^L)$. Here $L$ is the size of the \emph{light cone} of $C$: the maximum number of input bits that each output bit depends on. In particular, we obtain a polynomial-time algorithm that achieves large fidelity of $ω(1)$ for depth $O(\sqrt{\log n})$ two-dimensional circuits. To our knowledge, this is the first such result for two dimensional circuits of super-constant depth. Our results can be considered as an evidence that fooling the linear XEB test might be easier than achieving a full simulation of the quantum circuit.