Source author record

Daniel M. Kane

Daniel M. Kane appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

50 works

22 topics

4 close collaborators

preprint 2026 arXiv

Rigorous Implications of the Low-Degree Heuristic

Over the past decade, the low-degree heuristic has been used to estimate the algorithmic thresholds for a wide range of average-case planted vs null distinguishing problems. Such results rely on the hypothesis that if the low-degree moments of the planted and null distributions are sufficiently close, then no efficient (noise-tolerant) algorithm can distinguish between them. This hypothesis is appealing due to the simplicity of calculating the low-degree likelihood ratio (LDLR) -- a quantity that measures the similarity between low-degree moments. However, despite sustained interest in the area, it remains unclear whether low-degree indistinguishability actually rules out any interesting class of algorithms. In this work, we initiate the study of, and develop technical tools for, translating LDLR upper bounds into rigorous lower bounds against concrete algorithms. As a consequence, we prove the following for any permutation-invariant distribution $\mathsf{P}$:

1. If $\mathsf{P}$ is over $\{0,1\}^n$ and is low-degree indistinguishable from $U = \mathrm{Unif}(\{0,1\}^n)$, then a noisy version of $\mathsf{P}$ is statistically indistinguishable from $U$.
2. If $\mathsf{P}$ is over $\mathbb{R}^n$ and is low-degree indistinguishable from the standard Gaussian ${N}(0, 1)^n$, then no statistic based on symmetric polynomials of degree at most $O(\log n/\log \log n)$ can distinguish a noisy version of $\mathsf{P}$ from ${N}(0, 1)^n$.
3. If $\mathsf{P}$ is over $\mathbb{R}^{n\times n}$ and is low-degree indistinguishable from ${N}(0,1)^{n\times n}$, then no constant-sized subgraph statistic can distinguish a noisy version of $\mathsf{P}$ from ${N}(0, 1)^{n\times n}$.

preprint 2022 arXiv

Cryptographic Hardness of Learning Halfspaces with Massart Noise

We study the complexity of PAC learning halfspaces in the presence of Massart noise. In this problem, we are given i.i.d. labeled examples $(\mathbf{x}, y) \in \mathbb{R}^N \times \{ \pm 1\}$, where the distribution of $\mathbf{x}$ is arbitrary and the label $y$ is a Massart corruption of $f(\mathbf{x})$, for an unknown halfspace $f: \mathbb{R}^N \to \{ \pm 1\}$, with flipping probability $η(\mathbf{x}) \leq η< 1/2$. The goal of the learner is to compute a hypothesis with small 0-1 error. Our main result is the first computational hardness result for this learning problem. Specifically, assuming the (widely believed) subexponential-time hardness of the Learning with Errors (LWE) problem, we show that no polynomial-time Massart halfspace learner can achieve error better than $Ω(η)$, even if the optimal 0-1 error is small, namely $\mathrm{OPT} = 2^{-\log^{c} (N)}$ for any universal constant $c \in (0, 1)$. Prior work had provided qualitatively similar evidence of hardness in the Statistical Query model. Our computational hardness result essentially resolves the polynomial PAC learnability of Massart halfspaces, by showing that known efficient learning algorithms for the problem are nearly best possible.

preprint 2022 arXiv

Near-Optimal Bounds for Testing Histogram Distributions

We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, $k$-histograms over $[n]$, are probability distributions that are piecewise constant over a set of $k$ intervals. The histogram testing problem is the following: Given samples from an unknown distribution $\mathbf{p}$ on $[n]$, we want to distinguish between the cases that $\mathbf{p}$ is a $k$-histogram versus $\varepsilon$-far from any $k$-histogram, in total variation distance. Our main result is a sample near-optimal and computationally efficient algorithm for this testing problem, and a nearly-matching (within logarithmic factors) sample complexity lower bound. Specifically, we show that the histogram testing problem has sample complexity $\widetilde Θ(\sqrt{nk} / \varepsilon + k / \varepsilon^2 + \sqrt{n} / \varepsilon^2)$.
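To get a feel for how the three terms in the stated sample complexity trade off, the bound can be evaluated numerically. The sketch below is only an illustration: it drops the logarithmic factors and constants hidden by the $\widetilde Θ$ notation, and the function name `histogram_testing_bound` is hypothetical, not taken from the paper.

```python
import math


def histogram_testing_bound(n, k, eps):
    """Evaluate sqrt(n*k)/eps + k/eps^2 + sqrt(n)/eps^2.

    Assumption: constants and logarithmic factors are ignored, so the
    result indicates scaling only, not an actual sample count.
    """
    if n <= 0 or k <= 0 or not 0 < eps <= 1:
        raise ValueError("need n > 0, k > 0 and 0 < eps <= 1")
    return math.sqrt(n * k) / eps + k / eps**2 + math.sqrt(n) / eps**2


if __name__ == "__main__":
    # Compare the bound for a few bin counts on a fixed domain size.
    for bins in (1, 10, 100):
        print(bins, histogram_testing_bound(10_000, bins, 0.1))
```

For small $k$ the $\sqrt{n}/\varepsilon^2$ term dominates, while for $k$ close to $n$ the $k/\varepsilon^2$ term takes over.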

preprint 2022 arXiv

Optimal SQ Lower Bounds for Robustly Learning Discrete Product Distributions and Ising Models

We establish optimal Statistical Query (SQ) lower bounds for robustly learning certain families of discrete high-dimensional distributions. In particular, we show that no efficient SQ algorithm with access to an $ε$-corrupted binary product distribution can learn its mean within $\ell_2$-error $o(ε\sqrt{\log(1/ε)})$. Similarly, we show that no efficient SQ algorithm with access to an $ε$-corrupted ferromagnetic high-temperature Ising model can learn the model to total variation distance $o(ε\log(1/ε))$. Our SQ lower bounds match the error guarantees of known algorithms for these problems, providing evidence that current upper bounds for these tasks are best possible. At the technical level, we develop a generic SQ lower bound for discrete high-dimensional distributions starting from low dimensional moment matching constructions that we believe will find other applications. Additionally, we introduce new ideas to analyze these moment-matching constructions for discrete univariate distributions.

preprint 2021 arXiv

Agnostic Proper Learning of Halfspaces under Gaussian Marginals

We study the problem of agnostically learning halfspaces under the Gaussian distribution. Our main result is the {\em first proper} learning algorithm for this problem whose sample complexity and computational complexity qualitatively match those of the best known improper agnostic learner. Building on this result, we also obtain the first proper polynomial-time approximation scheme (PTAS) for agnostically learning homogeneous halfspaces. Our techniques naturally extend to agnostically learning linear models with respect to other non-linear activations, yielding in particular the first proper agnostic algorithm for ReLU regression.

preprint 2021 arXiv

Outlier-Robust Learning of Ising Models Under Dobrushin's Condition

We study the problem of learning Ising models satisfying Dobrushin's condition in the outlier-robust setting where a constant fraction of the samples are adversarially corrupted. Our main result is to provide the first computationally efficient robust learning algorithm for this problem with near-optimal error guarantees. Our algorithm can be seen as a special case of an algorithm for robustly learning a distribution from a general exponential family. To prove its correctness for Ising models, we establish new anti-concentration results for degree-$2$ polynomials of Ising models that may be of independent interest.

preprint 2021 arXiv

The Optimality of Polynomial Regression for Agnostic Learning under Gaussian Marginals

We study the problem of agnostic learning under the Gaussian distribution. We develop a method for finding hard families of examples for a wide class of problems by using LP duality. For Boolean-valued concept classes, we show that the $L^1$-regression algorithm is essentially best possible, and therefore that the computational difficulty of agnostically learning a concept class is closely related to the polynomial degree required to approximate any function from the class in $L^1$-norm. Using this characterization along with additional analytic tools, we obtain optimal SQ lower bounds for agnostically learning linear threshold functions and the first non-trivial SQ lower bounds for polynomial threshold functions and intersections of halfspaces. We also develop an analogous theory for agnostically learning real-valued functions, and as an application prove near-optimal SQ lower bounds for agnostically learning ReLUs and sigmoids.

preprint 2020 arXiv

Algorithms and SQ Lower Bounds for PAC Learning One-Hidden-Layer ReLU Networks

We study the problem of PAC learning one-hidden-layer ReLU networks with $k$ hidden units on $\mathbb{R}^d$ under Gaussian marginals in the presence of additive label noise. For the case of positive coefficients, we give the first polynomial-time algorithm for this learning problem for $k$ up to $\tilde{O}(\sqrt{\log d})$. Previously, no polynomial time algorithm was known, even for $k=3$. This answers an open question posed by~\cite{Kliv17}. Importantly, our algorithm does not require any assumptions about the rank of the weight matrix and its complexity is independent of its condition number. On the negative side, for the more general task of PAC learning one-hidden-layer ReLU networks with arbitrary real coefficients, we prove a Statistical Query lower bound of $d^{Ω(k)}$. Thus, we provide a separation between the two classes in terms of efficient learnability. Our upper and lower bounds are general, extending to broader families of activation functions.

preprint 2020 arXiv

List-Decodable Mean Estimation via Iterative Multi-Filtering

We study the problem of {\em list-decodable mean estimation} for bounded covariance distributions. Specifically, we are given a set $T$ of points in $\mathbb{R}^d$ with the promise that an unknown $α$-fraction of points in $T$, where $0< α< 1/2$, are drawn from an unknown mean and bounded covariance distribution $D$, and no assumptions are made on the remaining points. The goal is to output a small list of hypothesis vectors such that at least one of them is close to the mean of $D$. We give the first practically viable estimator for this problem. In more detail, our algorithm is sample and computationally efficient, and achieves information-theoretically near-optimal error. While the only prior algorithm for this setting inherently relied on the ellipsoid method, our algorithm is iterative and only uses spectral techniques. Our main technical innovation is the design of a soft outlier removal procedure for high-dimensional heavy-tailed datasets with a majority of outliers.

preprint 2020 arXiv

Near-Optimal SQ Lower Bounds for Agnostically Learning Halfspaces and ReLUs under Gaussian Marginals

We study the fundamental problems of agnostically learning halfspaces and ReLUs under Gaussian marginals. In the former problem, given labeled examples $(\mathbf{x}, y)$ from an unknown distribution on $\mathbb{R}^d \times \{ \pm 1\}$, whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with 0-1 loss $\mathrm{OPT}+ε$, where $\mathrm{OPT}$ is the 0-1 loss of the best-fitting halfspace. In the latter problem, given labeled examples $(\mathbf{x}, y)$ from an unknown distribution on $\mathbb{R}^d \times \mathbb{R}$, whose marginal distribution on $\mathbf{x}$ is the standard Gaussian and the labels $y$ can be arbitrary, the goal is to output a hypothesis with square loss $\mathrm{OPT}+ε$, where $\mathrm{OPT}$ is the square loss of the best-fitting ReLU. We prove Statistical Query (SQ) lower bounds of $d^{\mathrm{poly}(1/ε)}$ for both of these problems. Our SQ lower bounds provide strong evidence that current upper bounds for these tasks are essentially best possible.

preprint 2020 arXiv

Optimal Testing of Discrete Distributions with High Probability

We study the problem of testing discrete distributions with a focus on the high probability regime. Specifically, given samples from one or more discrete distributions, a property $\mathcal{P}$, and parameters $0< ε, δ<1$, we want to distinguish {\em with probability at least $1-δ$} whether these distributions satisfy $\mathcal{P}$ or are $ε$-far from $\mathcal{P}$ in total variation distance. Most prior work in distribution testing studied the constant confidence case (corresponding to $δ= Ω(1)$), and provided sample-optimal testers for a range of properties. While one can always boost the confidence probability of any such tester by black-box amplification, this generic boosting method typically leads to sub-optimal sample bounds. Here we study the following broad question: For a given property $\mathcal{P}$, can we {\em characterize} the sample complexity of testing $\mathcal{P}$ as a function of all relevant problem parameters, including the error probability $δ$? Prior to this work, uniformity testing was the only statistical task whose sample complexity had been characterized in this setting. As our main results, we provide the first algorithms for closeness and independence testing that are sample-optimal, within constant factors, as a function of all relevant parameters. We also show matching information-theoretic lower bounds on the sample complexity of these problems. Our techniques naturally extend to give optimal testers for related problems. To illustrate the generality of our methods, we give optimal algorithms for testing collections of distributions and testing closeness with unequal sized samples.
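The black-box amplification mentioned in the abstract can be sketched as a majority vote over independent runs. This is a generic illustration, not the paper's method: it assumes a base tester that answers correctly with probability at least $2/3$, and the repetition count $18\ln(1/δ)$ comes from a standard Hoeffding bound (deviation $1/6$ from the mean). The names `repetitions_for` and `amplified_tester` are hypothetical.

```python
import math


def repetitions_for(delta):
    """Number of runs so that a majority vote errs with probability <= delta.

    Assumption: each run is independently correct with probability >= 2/3.
    Hoeffding gives error <= exp(-m / 18), so m >= 18 * ln(1/delta) suffices.
    """
    if not 0 < delta < 1:
        raise ValueError("delta must lie in (0, 1)")
    return max(1, math.ceil(18 * math.log(1 / delta)))


def amplified_tester(tester, delta):
    """Run `tester` (a zero-argument callable returning bool) and take the majority."""
    runs = repetitions_for(delta)
    if runs % 2 == 0:
        runs += 1  # an odd number of runs avoids ties
    accepts = sum(1 for _ in range(runs) if tester())
    return 2 * accepts > runs
```

The multiplicative $\log(1/δ)$ overhead of this generic approach is exactly what the abstract describes as typically sub-optimal.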

preprint 2020 arXiv

Point Location and Active Learning: Learning Halfspaces Almost Optimally

Given a finite set $X \subset \mathbb{R}^d$ and a binary linear classifier $c: \mathbb{R}^d \to \{0,1\}$, how many queries of the form $c(x)$ are required to learn the label of every point in $X$? Known as \textit{point location}, this problem has inspired over 35 years of research in the pursuit of an optimal algorithm. Building on the prior work of Kane, Lovett, and Moran (ICALP 2018), we provide the first nearly optimal solution, a randomized linear decision tree of depth $\tilde{O}(d\log(|X|))$, improving on the previous best of $\tilde{O}(d^2\log(|X|))$ from Ezra and Sharir (Discrete and Computational Geometry, 2019). As a corollary, we also provide the first nearly optimal algorithm for actively learning halfspaces in the membership query model. En route to these results, we prove a novel characterization of Barthe's Theorem (Inventiones Mathematicae, 1998) of independent interest. In particular, we show that $X$ may be transformed into approximate isotropic position if and only if there exists no $k$-dimensional subspace with more than a $k/d$-fraction of $X$, and provide a similar characterization for exact isotropic position.

preprint 2020 arXiv

Prisoners, Rooms, and Lightswitches

We examine a new variant of the classic prisoners and lightswitches puzzle: A warden leads his $n$ prisoners in and out of $r$ rooms, one at a time, in some order, with each prisoner eventually visiting every room an arbitrarily large number of times. The rooms are indistinguishable, except that each one has $s$ lightswitches; the prisoners win their freedom if at some point a prisoner can correctly declare that each prisoner has been in every room at least once. What is the minimum number of switches per room, $s$, such that the prisoners can manage this? We show that if the prisoners do not know the switches' starting configuration, then they have no chance of escape -- but if the prisoners do know the starting configuration, then the minimum sufficient $s$ is surprisingly small. The analysis gives rise to a number of puzzling open questions, as well.

preprint 2020 arXiv

Robust Learning of Mixtures of Gaussians

We resolve one of the major outstanding problems in robust statistics. In particular, if $X$ is an evenly weighted mixture of two arbitrary $d$-dimensional Gaussians, we devise a polynomial-time algorithm that, given access to samples from $X$, an $\varepsilon$-fraction of which have been adversarially corrupted, learns $X$ to error $\mathrm{poly}(\varepsilon)$ in total variation distance.

preprint 2020 arXiv

The Complexity of Adversarially Robust Proper Learning of Halfspaces with Agnostic Noise

We study the computational complexity of adversarially robust proper learning of halfspaces in the distribution-independent agnostic PAC model, with a focus on $L_p$ perturbations. We give a computationally efficient learning algorithm and a nearly matching computational hardness result for this problem. An interesting implication of our findings is that the $L_{\infty}$ perturbations case is provably computationally harder than the case $2 \leq p < \infty$.

preprint 2020 arXiv

The Power of Comparisons for Actively Learning Linear Classifiers

In the world of big data, large but costly-to-label datasets dominate many fields. Active learning, a semi-supervised alternative to the standard PAC-learning model, was introduced to explore whether adaptive labeling could learn concepts with exponentially fewer labeled samples. While previous results show that active learning performs no better than its supervised alternative for important concept classes such as linear separators, we show that by adding weak distributional assumptions and allowing comparison queries, active learning requires exponentially fewer samples. Further, we show that these results hold as well for a stronger model of learning called Reliable and Probably Useful (RPU) learning. In this model, our learner is not allowed to make mistakes, but may instead answer "I don't know." While previous negative results showed this model to have intractably large sample complexity for label queries, we show that comparison queries make RPU-learning at worst logarithmically more expensive in both the passive and active regimes.

preprint 2020 arXiv

The Sample Complexity of Robust Covariance Testing

We study the problem of testing the covariance matrix of a high-dimensional Gaussian in a robust setting, where the input distribution has been corrupted in Huber's contamination model. Specifically, we are given i.i.d. samples from a distribution of the form $Z = (1-ε) X + εB$, where $X$ is a zero-mean and unknown covariance Gaussian $\mathcal{N}(0, Σ)$, $B$ is a fixed but unknown noise distribution, and $ε>0$ is an arbitrarily small constant representing the proportion of contamination. We want to distinguish between the cases that $Σ$ is the identity matrix versus $γ$-far from the identity in Frobenius norm. In the absence of contamination, prior work gave a simple tester for this hypothesis testing task that uses $O(d)$ samples. Moreover, this sample upper bound was shown to be best possible, within constant factors. Our main result is that the sample complexity of covariance testing dramatically increases in the contaminated setting. In particular, we prove a sample complexity lower bound of $Ω(d^2)$ for $ε$ an arbitrarily small constant and $γ= 1/2$. This lower bound is best possible, as $O(d^2)$ samples suffice to even robustly {\em learn} the covariance. The conceptual implication of our result is that, for the natural setting we consider, robust hypothesis testing is at least as hard as robust estimation.
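The mixture notation $Z = (1-ε) X + εB$ means that each sample comes from $B$ with probability $ε$ and from $X$ otherwise. A minimal sampling sketch follows, under stated assumptions: the clean distribution is taken to be the identity-covariance Gaussian (the null hypothesis), and `noise_sampler` is a hypothetical callable standing in for the unknown noise distribution $B$.

```python
import random


def sample_huber(num_samples, dim, eps, noise_sampler, rng):
    """Draw samples from the Huber contamination model (1 - eps) X + eps B.

    Assumptions: X is N(0, I_dim); `noise_sampler(rng)` returns one
    dim-dimensional point from the (hypothetical) noise distribution B.
    """
    samples = []
    for _ in range(num_samples):
        if rng.random() < eps:
            # Contaminated sample, drawn from B.
            samples.append(noise_sampler(rng))
        else:
            # Clean sample, drawn coordinate-wise from N(0, 1).
            samples.append([rng.gauss(0.0, 1.0) for _ in range(dim)])
    return samples


if __name__ == "__main__":
    rng = random.Random(0)
    far_point = lambda r: [10.0, 10.0, 10.0]  # illustrative choice of B
    data = sample_huber(1000, 3, 0.05, far_point, rng)
    print(sum(1 for point in data if point == [10.0, 10.0, 10.0]), "contaminated")
```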

preprint 2016 arXiv

A New Approach for Testing Properties of Discrete Distributions

In this work, we give a novel general approach for distribution testing. We describe two techniques: our first technique gives sample-optimal testers, while our second technique gives matching sample lower bounds. As a consequence, we resolve the sample complexity of a wide variety of testing problems. Our upper bounds are obtained via a modular reduction-based approach. Our approach yields optimal testers for numerous problems by using a standard $\ell_2$-identity tester as a black-box. Using this recipe, we obtain simple estimators for a wide range of problems, encompassing most problems previously studied in the TCS literature, namely: (1) identity testing to a fixed distribution, (2) closeness testing between two unknown distributions (with equal/unequal sample sizes), (3) independence testing (in any number of dimensions), (4) closeness testing for collections of distributions, and (5) testing histograms. For all of these problems, our testers are sample-optimal, up to constant factors. With the exception of (1), ours are the {\em first sample-optimal testers for the corresponding problems.} Moreover, our estimators are significantly simpler to state and analyze compared to previous results. As an application of our reduction-based technique, we obtain the first {\em nearly instance-optimal} algorithm for testing equivalence between two {\em unknown} distributions. Moreover, our technique naturally generalizes to other metrics beyond the $\ell_1$-distance. Our lower bounds are obtained via a direct information-theoretic approach: Given a candidate hard instance, our proof proceeds by bounding the mutual information between appropriate random variables. While this is a classical method in information theory, prior to our work, it had not been used in distribution property testing.

preprint 2016 arXiv

Fourier-sparse interpolation without a frequency gap

We consider the problem of estimating a Fourier-sparse signal from noisy samples, where the sampling is done over some interval $[0, T]$ and the frequencies can be "off-grid". Previous methods for this problem required the gap between frequencies to be above $1/T$, the threshold required to robustly identify individual frequencies. We show the frequency gap is not necessary to estimate the signal as a whole: for arbitrary $k$-Fourier-sparse signals under $\ell_2$ bounded noise, we show how to estimate the signal with a constant factor growth of the noise and sample complexity polynomial in $k$ and logarithmic in the bandwidth and signal-to-noise ratio. As a special case, we get an algorithm to interpolate degree $d$ polynomials from noisy measurements, using $O(d)$ samples and increasing the noise by a constant factor in $\ell_2$.

preprint 2016 arXiv

Minimal models of compact symplectic semitoric manifolds

A symplectic semitoric manifold is a symplectic $4$-manifold endowed with a Hamiltonian $(S^1 \times \mathbb{R})$-action satisfying certain conditions. The goal of this paper is to construct a new symplectic invariant of symplectic semitoric manifolds, the helix, and give applications. The helix is a symplectic analogue of the fan of a nonsingular complete toric variety in algebraic geometry, that takes into account the effects of the monodromy near focus-focus singularities. We give two applications of the helix: first, we use it to give a classification of the minimal models of symplectic semitoric manifolds, where "minimal" is in the sense of not admitting any blowdowns. The second application is an extension to the compact case of a well-known result of Vũ Ngoc about the constraints posed on a symplectic semitoric manifold by the existence of focus-focus singularities. The helix allows one to translate a symplectic geometric problem into an algebraic problem, and the paper describes a method to solve this type of algebraic problem.

preprint 2016 arXiv

The Fourier Transform of Poisson Multinomial Distributions and its Algorithmic Applications

An $(n, k)$-Poisson Multinomial Distribution (PMD) is a random variable of the form $X = \sum_{i=1}^n X_i$, where the $X_i$'s are independent random vectors supported on the set of standard basis vectors in $\mathbb{R}^k.$ In this paper, we obtain a refined structural understanding of PMDs by analyzing their Fourier transform. As our core structural result, we prove that the Fourier transform of PMDs is {\em approximately sparse}, i.e., roughly speaking, its $L_1$-norm is small outside a small set. By building on this result, we obtain the following applications: {\bf Learning Theory.} We design the first computationally efficient learning algorithm for PMDs with respect to the total variation distance. Our algorithm learns an arbitrary $(n, k)$-PMD within variation distance $ε$ using a near-optimal sample size of $\widetilde{O}_k(1/ε^2),$ and runs in time $\widetilde{O}_k(1/ε^2) \cdot \log n.$ Previously, no algorithm with a $\mathrm{poly}(1/ε)$ runtime was known, even for $k=3.$ {\bf Game Theory.} We give the first efficient polynomial-time approximation scheme (EPTAS) for computing Nash equilibria in anonymous games. For normalized anonymous games with $n$ players and $k$ strategies, our algorithm computes a well-supported $ε$-Nash equilibrium in time $n^{O(k^3)} \cdot (k/ε)^{O(k^3\log(k/ε)/\log\log(k/ε))^{k-1}}.$ The best previous algorithm for this problem had running time $n^{(f(k)/ε)^k},$ where $f(k) = Ω(k^{k^2})$, for any $k>2.$ {\bf Statistics.} We prove a multivariate central limit theorem (CLT) that relates an arbitrary PMD to a discretized multivariate Gaussian with the same mean and covariance, in total variation distance. Our new CLT strengthens the CLT of Valiant and Valiant by completely removing the dependence on $n$ in the error bound.

preprint 2015 arXiv

Central Limit Theorems for some Set Partition Statistics

We prove the conjectured limiting normality for the number of crossings of a uniformly chosen set partition of [n] = {1,2,...,n}. The arguments use a novel stochastic representation and are also used to prove central limit theorems for the dimension index and the number of levels.
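The crossing statistic can be computed directly for small examples. The sketch below assumes the usual arc-diagram definition, which the abstract does not spell out: each block contributes arcs between consecutive elements, and two arcs $(i, j)$ and $(k, l)$ cross when $i < k < j < l$. The function names are hypothetical.

```python
from itertools import combinations


def arcs(partition):
    """Arcs joining consecutive elements within each block of a set partition."""
    result = []
    for block in partition:
        ordered = sorted(block)
        result.extend(zip(ordered, ordered[1:]))
    return result


def count_crossings(partition):
    """Count pairs of arcs (i, j), (k, l) with i < k < j < l.

    Assumption: this is the arc-diagram notion of a crossing; the
    abstract itself does not state the definition.
    """
    ordered_arcs = sorted(arcs(partition))
    return sum(
        1
        for (i, j), (k, l) in combinations(ordered_arcs, 2)
        if i < k < j < l
    )


if __name__ == "__main__":
    print(count_crossings([[1, 3], [2, 4]]))  # blocks {1,3} and {2,4}
```

Sorting the arcs first guarantees that the arc with the smaller left endpoint comes first in each pair, so a single inequality chain covers every crossing.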

preprint 2015 arXiv

Optimal Algorithms and Lower Bounds for Testing Closeness of Structured Distributions

We give a general unified method that can be used for $L_1$ {\em closeness testing} of a wide range of univariate structured distribution families. More specifically, we design a sample optimal and computationally efficient algorithm for testing the equivalence of two unknown (potentially arbitrary) univariate distributions under the $\mathcal{A}_k$-distance metric: Given sample access to distributions with density functions $p, q: I \to \mathbb{R}$, we want to distinguish between the cases that $p=q$ and $\|p-q\|_{\mathcal{A}_k} \ge ε$ with probability at least $2/3$. We show that for any $k \ge 2, ε>0$, the {\em optimal} sample complexity of the $\mathcal{A}_k$-closeness testing problem is $Θ(\max\{ k^{4/5}/ε^{6/5}, k^{1/2}/ε^2 \})$. This is the first $o(k)$ sample algorithm for this problem, and yields new, simple $L_1$ closeness testers, in most cases with optimal sample complexity, for broad classes of structured distributions.

preprint 2015 arXiv

Optimal Learning via the Fourier Transform for Sums of Independent Integer Random Variables

We study the structure and learnability of sums of independent integer random variables (SIIRVs). For $k \in \mathbb{Z}_{+}$, a $k$-SIIRV of order $n \in \mathbb{Z}_{+}$ is the probability distribution of the sum of $n$ independent random variables each supported on $\{0, 1, \dots, k-1\}$. We denote by ${\cal S}_{n,k}$ the set of all $k$-SIIRVs of order $n$. In this paper, we tightly characterize the sample and computational complexity of learning $k$-SIIRVs. More precisely, we design a computationally efficient algorithm that uses $\widetilde{O}(k/ε^2)$ samples, and learns an arbitrary $k$-SIIRV within error $ε,$ in total variation distance. Moreover, we show that the {\em optimal} sample complexity of this learning problem is $Θ((k/ε^2)\sqrt{\log(1/ε)}).$ Our algorithm proceeds by learning the Fourier transform of the target $k$-SIIRV in its effective support. Its correctness relies on the {\em approximate sparsity} of the Fourier transform of $k$-SIIRVs -- a structural property that we establish, roughly stating that the Fourier transform of $k$-SIIRVs has small magnitude outside a small set. Along the way we prove several new structural results about $k$-SIIRVs. As one of our main structural contributions, we give an efficient algorithm to construct a sparse {\em proper} $ε$-cover for ${\cal S}_{n,k},$ in total variation distance. We also obtain a novel geometric characterization of the space of $k$-SIIRVs. Our characterization allows us to prove a tight lower bound on the size of $ε$-covers for ${\cal S}_{n,k}$, and is the key ingredient in our tight sample complexity lower bound. Our approach of exploiting the sparsity of the Fourier transform in distribution learning is general, and has recently found additional applications.
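To make the definition of a $k$-SIIRV concrete, its exact probability mass function can be obtained by repeatedly convolving the component distributions. This sketch illustrates the object being learned, not the paper's Fourier-based learning algorithm; the function names are hypothetical.

```python
def convolve(p, q):
    """Distribution of the sum of two independent integer random variables."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def siirv_pmf(components):
    """Exact pmf of a k-SIIRV of order n.

    `components` is a list of n probability vectors of length k, where
    components[i][v] is the probability that the i-th variable equals v.
    The result has length n * (k - 1) + 1.
    """
    pmf = [1.0]  # point mass at 0, the empty sum
    for component in components:
        pmf = convolve(pmf, component)
    return pmf


if __name__ == "__main__":
    # A 3-SIIRV of order 2 with two different components.
    print(siirv_pmf([[0.5, 0.25, 0.25], [0.1, 0.1, 0.8]]))
```

With $k = 2$ the same routine yields a Poisson binomial distribution, the sum of $n$ independent Bernoulli variables.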

preprint 2015 arXiv

Properly Learning Poisson Binomial Distributions in Almost Polynomial Time

We give an algorithm for properly learning Poisson binomial distributions. A Poisson binomial distribution (PBD) of order $n$ is the discrete probability distribution of the sum of $n$ mutually independent Bernoulli random variables. Given $\widetilde{O}(1/ε^2)$ samples from an unknown PBD $\mathbf{p}$, our algorithm runs in time $(1/ε)^{O(\log \log (1/ε))}$, and outputs a hypothesis PBD that is $ε$-close to $\mathbf{p}$ in total variation distance. The previously best known running time for properly learning PBDs was $(1/ε)^{O(\log(1/ε))}$. As one of our main contributions, we provide a novel structural characterization of PBDs. We prove that, for all $ε>0,$ there exists an explicit collection $\cal{M}$ of $(1/ε)^{O(\log \log (1/ε))}$ vectors of multiplicities, such that for any PBD $\mathbf{p}$ there exists a PBD $\mathbf{q}$ with $O(\log(1/ε))$ distinct parameters whose multiplicities are given by some element of ${\cal M}$, such that $\mathbf{q}$ is $ε$-close to $\mathbf{p}$. Our proof combines tools from Fourier analysis and algebraic geometry. Our approach to the proper learning problem is as follows: Starting with an accurate non-proper hypothesis, we fit a PBD to this hypothesis. More specifically, we essentially start with the hypothesis computed by the computationally efficient non-proper learning algorithm in our recent work~\cite{DKS15}. Our aforementioned structural characterization allows us to reduce the corresponding fitting problem to a collection of $(1/ε)^{O(\log \log(1/ε))}$ systems of low-degree polynomial inequalities. We show that each such system can be solved in time $(1/ε)^{O(\log \log(1/ε))}$, which yields the overall running time of our algorithm.

preprint 2015 arXiv

The Average Sensitivity of an Intersection of Half Spaces

We prove new bounds on the average sensitivity of the indicator function of an intersection of $k$ halfspaces. In particular, we prove the optimal bound of $O(\sqrt{n\log(k)})$. This generalizes a result of Nazarov, who proved the analogous result in the Gaussian case, and improves upon a result of Harsha, Klivans and Meka. Furthermore, our result has implications for the runtime required to learn intersections of halfspaces.
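For small $n$, average sensitivity can be checked by brute force. The sketch below assumes the standard definition (the expected number of coordinates whose flip changes the function value, over a uniform point of the hypercube $\{-1,1\}^n$). The helper names and the convention that a halfspace is the set where $\langle w, x\rangle \ge θ$ are illustrative choices, not taken from the paper.

```python
from itertools import product


def average_sensitivity(f, n):
    """Average number of sensitive coordinates of f over {-1, 1}^n (brute force)."""
    total = 0
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            flipped = x[:i] + (-x[i],) + x[i + 1:]
            if f(flipped) != fx:
                total += 1
    return total / 2**n


def halfspace(weights, threshold):
    """Indicator of {x : <weights, x> >= threshold}."""
    return lambda x: sum(w * xi for w, xi in zip(weights, x)) >= threshold


def intersection(halfspaces):
    """Indicator of the intersection of the given halfspaces."""
    return lambda x: all(h(x) for h in halfspaces)


if __name__ == "__main__":
    majority = halfspace((1, 1, 1), 0)
    print(average_sensitivity(majority, 3))
```

The enumeration takes time $n \cdot 2^n$, so it is only useful for sanity checks on toy instances.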

preprint 2014 arXiv

A Polylogarithmic PRG for Degree $2$ Threshold Functions in the Gaussian Setting

We devise a new pseudorandom generator against degree 2 polynomial threshold functions in the Gaussian setting. We manage to achieve $ε$ error with seed length polylogarithmic in $ε$ and the dimension, an exponential improvement over previous constructions.

preprint 2014 arXiv

Asymptotic Improvements of Lower Bounds for the Least Common Multiples of Arithmetic Progressions

For relatively prime positive integers $u_0$ and $r$, we consider the least common multiple $L_n:=\mathrm{lcm}(u_0,u_1,\ldots, u_n)$ of the finite arithmetic progression $\{u_k:=u_0+kr\}_{k=0}^n$. We derive new lower bounds on $L_n$ which improve upon those obtained previously when either $u_0$ or $n$ is large. When $r$ is prime, our best bound is sharp up to a factor of $n+1$ for $u_0$ properly chosen, and is also nearly sharp as $n\to\infty$.
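The quantity $L_n$ is easy to compute exactly for small parameters, which is handy for comparing against any lower bound. A minimal sketch, with a hypothetical function name:

```python
from math import gcd


def lcm_progression(u0, r, n):
    """L_n = lcm(u_0, u_1, ..., u_n) for the progression u_k = u0 + k * r."""
    if u0 <= 0 or r <= 0 or n < 0:
        raise ValueError("need positive u0 and r, and n >= 0")
    if gcd(u0, r) != 1:
        raise ValueError("u0 and r must be relatively prime")
    result = 1
    for k in range(n + 1):
        term = u0 + k * r
        # lcm(a, b) = a * b / gcd(a, b), kept in exact integer arithmetic.
        result = result * term // gcd(result, term)
    return result


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, lcm_progression(1, 2, n))
```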

preprint 2014 arXiv

On the Number of ABC Solutions with Restricted Radical Sizes

We consider a variant of the ABC Conjecture, attempting to count the number of solutions to $A+B+C=0$, in relatively prime integers $A,B,C$ each of absolute value less than $N$ with $r(A)<|A|^a, r(B)<|B|^b, r(C)<|C|^c.$ The ABC Conjecture is equivalent to the statement that for $a+b+c<1$, the number of solutions is bounded independently of $N$. If $a+b+c \geq 1$, it is conjectured that the number of solutions is asymptotically $N^{a+b+c-1 \pm ε}.$ We prove this conjecture as long as $a+b+c \geq 2.$
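The conditions on a single triple can be checked mechanically. The sketch below assumes that $r(\cdot)$ denotes the radical (the product of the distinct prime factors), as is standard for the ABC Conjecture; the function names are hypothetical, and the exponent comparison uses floating-point powers, so it is only reliable away from exact ties.

```python
from math import gcd


def radical(n):
    """Product of the distinct primes dividing n (radical(1) == 1)."""
    n = abs(n)
    if n == 0:
        raise ValueError("radical is undefined for 0")
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            result *= p
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        result *= n  # whatever remains is a single prime factor
    return result


def is_restricted_solution(A, B, C, a, b, c):
    """Check A + B + C = 0, gcd(A, B, C) = 1 and the three radical bounds."""
    if A + B + C != 0 or gcd(gcd(A, B), C) != 1:
        return False
    return (
        radical(A) < abs(A) ** a
        and radical(B) < abs(B) ** b
        and radical(C) < abs(C) ** c
    )
```

For instance, $5 + 27 - 32 = 0$ has radicals $5$, $3$ and $2$, so it qualifies only when $a > 1$.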

preprint2014arXiv

Sparser Johnson-Lindenstrauss Transforms

We give two different and simple constructions for dimensionality reduction in $\ell_2$ via linear mappings that are sparse: only an $O(\varepsilon)$-fraction of entries in each column of our embedding matrices are non-zero to achieve distortion $1+\varepsilon$ with high probability, while still achieving the asymptotically optimal number of rows. These are the first constructions to provide subconstant sparsity for all values of parameters, improving upon previous works of Achlioptas (JCSS 2003) and Dasgupta, Kumar, and Sarlós (STOC 2010). Such distributions can be used to speed up applications where $\ell_2$ dimensionality reduction is used.

preprint2014arXiv

Testing Identity of Structured Distributions

We study the question of identity testing for structured distributions. More precisely, given samples from a structured distribution $q$ over $[n]$ and an explicit distribution $p$ over $[n]$, we wish to distinguish whether $q=p$ versus $q$ is at least $ε$-far from $p$, in $L_1$ distance. In this work, we present a unified approach that yields new, simple testers, with sample complexity that is information-theoretically optimal, for broad classes of structured distributions, including $t$-flat distributions, $t$-modal distributions, log-concave distributions, monotone hazard rate (MHR) distributions, and mixtures thereof.

preprint2013arXiv

A Low-Depth Monotone Function that is not an Approximate Junta

We provide an example of a monotone Boolean function on the hypercube given by a low depth decision tree that is not well approximated by any k-junta for small k.

preprint2013arXiv

Closed expressions for averages of set partition statistics

In studying the enumerative theory of super characters of the group of upper triangular matrices over a finite field, we found that the moments (mean, variance and higher moments) of novel statistics on set partitions have simple closed expressions as linear combinations of shifted Bell numbers. It is shown here that families of other statistics have similar moments. The coefficients in the linear combinations are polynomials in $n$. This allows exact enumeration of the moments for small $n$ to determine exact formulae for all $n$.

preprint2013arXiv

On the Crossing Number of Complete Graphs with an Uncrossed Hamiltonian Cycle

We prove new lower bounds on the crossing number of a complete graph, assuming that it is drawn in such a way that it contains a Hamiltonian cycle with no crossings.

preprint2013arXiv

On the Ranks of the 2-Selmer Groups of Twists of a Given Elliptic Curve

We extend work of Swinnerton-Dyer on the density of the number of twists of a given elliptic curve that have 2-Selmer group of a particular rank.

preprint2012arXiv

A proof of Andrews' conjecture on Partitions with no short sequences

Holroyd, Liggett, and Romik introduced the following probability model. Let $C_1, C_2,\ldots$ be independent events with probabilities $\mathbb{P}_s(C_n)= 1-e^{-ns}$ under a probability measure $\mathbb{P}_s$ with $0<s<1$. Let $A_k$ be the event that there is no sequence of $k$ consecutive $C_i$ that do not occur. We give an asymptotic for $\mathbb{P}_s(A_k)$ with a relative error term that goes to 0 as $s\to 0$. This establishes a conjecture of Andrews.

preprint2012arXiv

A Pseudorandom Generator for Polynomial Threshold Functions of Gaussian with Subpolynomial Seed Length

We develop a pseudorandom generator that fools degree-$d$ polynomial threshold functions in $n$ variables with respect to the Gaussian distribution and has seed length $O_{c,d}(\log(n) ε^{-c})$.

preprint2012arXiv

A Structure Theorem for Poorly Anticoncentrated Gaussian Chaoses and Applications to the Study of Polynomial Threshold Functions

We prove a structural result for degree-$d$ polynomials. In particular, we show that any degree-$d$ polynomial $p$ can be approximated by another polynomial $p_0$, which can be decomposed as some function of polynomials $q_1,\ldots,q_m$ with $q_i$ normalized and $m=O_d(1)$, so that if $X$ is a Gaussian random variable, the probability distribution on $(q_1(X),\ldots,q_m(X))$ does not have too much mass in any small box. Using this result, we prove improved versions of a number of results about polynomial threshold functions, including producing better pseudorandom generators, obtaining a better invariance principle, and proving improved bounds on noise sensitivity.

preprint2012arXiv

An Asymptotic for the Number of Solutions to Linear Equations in Prime Numbers from Specified Chebotarev Classes

We extend known results on the number of solutions to a linear equation in at least three prime numbers when the primes involved are required to lie in specified Chebotarev classes. We prove asymptotic results similar to previous ones, only now taking into account corrections coming from the Chebotarev Density Theorem and Global Class Field Theory. We then apply these results to find elliptic curves whose discriminants split completely over a given number field.

preprint2012arXiv

Small Designs for Path Connected Spaces and Path Connected Homogeneous Spaces

We prove the existence of designs of small size in a number of contexts. In particular our techniques can be applied to prove the existence of $n$-designs on $S^{d}$ of size $O_d(n^{d}\log(n)^{d-1})$.

preprint2012arXiv

The Correct Exponent for the Gotsman-Linial Conjecture

We prove a new bound on the average sensitivity of polynomial threshold functions. In particular we show that a polynomial threshold function of degree $d$ in at most $n$ variables has average sensitivity at most $\sqrt{n}(\log(n))^{O(d\log(d))}2^{O(d^2\log(d))}$. For fixed $d$ the exponent in terms of $n$ in this bound is known to be optimal. This bound makes significant progress towards the Gotsman-Linial Conjecture which would put the correct bound at $Θ(d\sqrt{n})$.

preprint2011arXiv

$k$-Independent Gaussians Fool Polynomial Threshold Functions

We show that any $O_d(ε^{-4d 7^d})$-independent family of Gaussians $ε$-fools any degree-$d$ polynomial threshold function.

preprint2011arXiv

A Small PRG for Polynomial Threshold Functions of Gaussians

We develop a pseudo-random generator to fool degree-$d$ polynomial threshold functions with respect to the Gaussian distribution. For $c>0$ any constant, we construct a pseudo-random generator that fools such functions to within $ε$ and has seed length $\log(n) 2^{O(d)} ε^{-4-c}$.

preprint2011arXiv

Minimal S-universality criteria may vary in size

In this note, we give simple examples of sets S of quadratic forms that have minimal S-universality criteria of multiple cardinalities. This answers a question of Kim, Kim, and Oh in the negative.

preprint2010arXiv

A Derandomized Sparse Johnson-Lindenstrauss Transform

Recent work of [Dasgupta-Kumar-Sarlos, STOC 2010] gave a sparse Johnson-Lindenstrauss transform and left as a main open question whether their construction could be efficiently derandomized. We answer their question affirmatively by giving an alternative proof of their result requiring only bounded independence hash functions. Furthermore, the sparsity bound obtained in our proof is improved. The main ingredient in our proof is a spectral moment bound for quadratic forms that was recently used in [Diakonikolas-Kane-Nelson, FOCS 2010].

preprint2010arXiv

Bounded Independence Fools Degree-2 Threshold Functions

Let x be a random vector coming from any k-wise independent distribution over {-1,1}^n. For an n-variate degree-2 polynomial p, we prove that E[sgn(p(x))] is determined up to an additive epsilon for k = poly(1/epsilon). This answers an open question of Diakonikolas et al. (FOCS 2009). Using standard constructions of k-wise independent distributions, we obtain a broad class of explicit generators that epsilon-fool the class of degree-2 threshold functions with seed length log(n)*poly(1/epsilon). Our approach is quite robust: it easily extends to yield that the intersection of any constant number of degree-2 threshold functions is epsilon-fooled by poly(1/epsilon)-wise independence. Our results also hold if the entries of x are k-wise independent standard normals, implying for example that bounded independence derandomizes the Goemans-Williamson hyperplane rounding scheme. To achieve our results, we introduce a technique we dub multivariate FT-mollification, a generalization of the univariate form introduced by Kane et al. (SODA 2010) in the context of streaming algorithms. Along the way we prove a generalized hypercontractive inequality for quadratic forms which takes the operator norm of the associated matrix into account. These techniques may be of independent interest.

preprint2010arXiv

Canonical Projective Embeddings of the Deligne-Lusztig Curves Associated to $2A2$, $2B2$ and $2G2$

The Deligne-Lusztig varieties associated to the Coxeter classes of the algebraic groups 2A2, 2B2 and 2G2 are affine algebraic curves. We produce explicit projective models of the closures of these curves. Furthermore, for $d$ the Coxeter number of these groups, we find polynomials for each of these models that cut out the $\mathbb{F}_q$-points, the $\mathbb{F}_{q^d}$-points and the $\mathbb{F}_{q^{d+1}}$-points, and demonstrate a relation satisfied by these polynomials.

preprint2010arXiv

Fast Moment Estimation in Data Streams in Optimal Space

We give a space-optimal algorithm with update time O(log^2(1/eps)loglog(1/eps)) for (1+eps)-approximating the pth frequency moment, 0 < p < 2, of a length-n vector updated in a data stream. This provides a nearly exponential improvement in the update time complexity over the previous space-optimal algorithm of [Kane-Nelson-Woodruff, SODA 2010], which had update time Omega(1/eps^2).

preprint2010arXiv

Quantum interpolation of polynomials

We consider quantum interpolation of polynomials. We imagine a quantum computer with black-box access to input/output pairs (x_i, f(x_i)), where f is a degree-d polynomial, and we wish to compute f(0). We give asymptotically tight quantum lower bounds for this problem, even in the case where 0 is among the possible values of x_i.

preprint2008arXiv

Ergodic Properties of a Class of Discrete Abelian Group Extensions of Rank-One Transformations

We define a class of discrete abelian group extensions of rank-one transformations and establish necessary and sufficient conditions for these extensions to be power weakly mixing. We show that all members of this class are multiply recurrent. We then study conditions sufficient for showing that cartesian products of transformations are conservative for a class of invertible infinite measure-preserving transformations and provide examples of these transformations.
