Source author record

Clément L. Canonne

Clément L. Canonne appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Discrete Mathematics math.ST Statistics Theory Machine Learning math.PR Information Theory math.IT Computational Complexity Cryptography and Security

Catalog footprint

What is connected

14works

10topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

Independence Testing for Bounded Degree Bayesian Network

We study the following independence testing problem: given access to samples from a distribution $P$ over $\{0,1\}^n$, decide whether $P$ is a product distribution or whether it is $\varepsilon$-far in total variation distance from any product distribution. For arbitrary distributions, this problem requires $\exp(n)$ samples. We show in this work that if $P$ has a sparse structure, then in fact only linearly many samples are required. Specifically, if $P$ is Markov with respect to a Bayesian network whose underlying DAG has in-degree bounded by $d$, then $\tildeΘ(2^{d/2}\cdot n/\varepsilon^2)$ samples are necessary and sufficient for independence testing.

preprint2022arXiv

Near-Optimal Bounds for Testing Histogram Distributions

We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, $k$-histograms over $[n]$, are probability distributions that are piecewise constant over a set of $k$ intervals. The histogram testing problem is the following: Given samples from an unknown distribution $\mathbf{p}$ on $[n]$, we want to distinguish between the cases that $\mathbf{p}$ is a $k$-histogram versus $\varepsilon$-far from any $k$-histogram, in total variation distance. Our main result is a sample near-optimal and computationally efficient algorithm for this testing problem, and a nearly-matching (within logarithmic factors) sample complexity lower bound. Specifically, we show that the histogram testing problem has sample complexity $\widetilde Θ(\sqrt{nk} / \varepsilon + k / \varepsilon^2 + \sqrt{n} / \varepsilon^2)$.

preprint2022arXiv

Optimal Closeness Testing of Discrete Distributions Made (Complex) Simple

In this note, we revisit the recent work of Diakonikolas, Gouleakis, Kane, Peebles, and Price (2021), and provide an alternative proof of their main result. Our argument does not rely on any specific property of Poisson random variables (such as stability and divisibility) nor on any "clever trick," but instead on an identity relating the expectation of the absolute value of any random variable to the integral of its characteristic function: \[ \mathbb{E}[|X|] = \frac{2}π\int_0^\infty \frac{1-\Re(\mathbb{E}[e^{i tX}])}{t^2}\, dt \] Our argument, while not devoid of technical aspects, is arguably conceptually simpler and more general; and we hope this technique can find additional applications in distribution testing.

preprint2022arXiv

Private Identity Testing for High-Dimensional Distributions

In this work we present novel differentially private identity (goodness-of-fit) testers for natural and widely studied classes of multivariate product distributions: Gaussians in $\mathbb{R}^d$ with known covariance and product distributions over $\{\pm 1\}^{d}$. Our testers have improved sample complexity compared to those derived from previous techniques, and are the first testers whose sample complexity matches the order-optimal minimax sample complexity of $O(d^{1/2}/α^2)$ in many parameter regimes. We construct two types of testers, exhibiting tradeoffs between sample complexity and computational complexity. Finally, we provide a two-way reduction between testing a subclass of multivariate product distributions and testing univariate distributions, and thereby obtain upper and lower bounds for testing this subclass of product distributions.

preprint2022arXiv

The Role of Interactivity in Structured Estimation

We study high-dimensional sparse estimation under three natural constraints: communication constraints, local privacy constraints, and linear measurements (compressive sensing). Without sparsity assumptions, it has been established that interactivity cannot improve the minimax rates of estimation under these information constraints. The question of whether interactivity helps with natural inference tasks has been a topic of active research. We settle this question in the affirmative for the prototypical problems of high-dimensional sparse mean estimation and compressive sensing, by demonstrating a gap between interactive and noninteractive protocols. We further establish that the gap increases when we have more structured sparsity: for block sparsity this gap can be as large as polynomial in the dimensionality. Thus, the more structured the sparsity is, the greater is the advantage of interaction. Proving the lower bounds requires a careful breaking of a sum of correlated random variables into independent components using Baranyai's theorem on decomposition of hypergraphs, which might be of independent interest.

preprint2021arXiv

Inference under Information Constraints III: Local Privacy Constraints

We study goodness-of-fit and independence testing of discrete distributions in a setting where samples are distributed across multiple users. The users wish to preserve the privacy of their data while enabling a central server to perform the tests. Under the notion of local differential privacy, we propose simple, sample-optimal, and communication-efficient protocols for these two questions in the noninteractive setting, where in addition users may or may not share a common random seed. In particular, we show that the availability of shared (public) randomness greatly reduces the sample complexity. Underlying our public-coin protocols are privacy-preserving mappings which, when applied to the samples, minimally contract the distance between their respective probability distributions.

preprint2021arXiv

Random Restrictions of High-Dimensional Distributions and Uniformity Testing with Subcube Conditioning

We give a nearly-optimal algorithm for testing uniformity of distributions supported on $\{-1,1\}^n$, which makes $\tilde O (\sqrt{n}/\varepsilon^2)$ queries to a subcube conditional sampling oracle (Bhattacharyya and Chakraborty (2018)). The key technical component is a natural notion of random restriction for distributions on $\{-1,1\}^n$, and a quantitative analysis of how such a restriction affects the mean vector of the distribution. Along the way, we consider the problem of mean testing with independent samples and provide a nearly-optimal algorithm.

preprint2020arXiv

A short note on learning discrete distributions

The goal of this short note is to provide simple proofs for the "folklore facts" on the sample complexity of learning a discrete probability distribution over a known domain of size $k$ to various distances $\varepsilon$, with error probability $δ$.

preprint2020arXiv

Testing Data Binnings

Motivated by the question of data quantization and "binning," we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) $\mathbf{q}$ and samples from an unknown distribution $\mathbf{p}$, both over $[n]=\{1,2,\dots,n\}$, whether $\mathbf{p}$ equals $\mathbf{q}$, or is significantly different from it. In this paper, we introduce the related question of 'identity up to binning,' where the reference distribution $\mathbf{q}$ is over $k \ll n$ elements: the question is then whether there exists a suitable binning of the domain $[n]$ into $k$ intervals such that, once "binned," $\mathbf{p}$ is equal to $\mathbf{q}$. We provide nearly tight upper and lower bounds on the sample complexity of this new question, showing both a quantitative and qualitative difference with the vanilla identity testing one, and answering an open question of Canonne (2019). Finally, we discuss several extensions and related research directions.

preprint2016arXiv

Testing $k$-Monotonicity

A Boolean $k$-monotone function defined over a finite poset domain ${\cal D}$ alternates between the values $0$ and $1$ at most $k$ times on any ascending chain in ${\cal D}$. Therefore, $k$-monotone functions are natural generalizations of the classical monotone functions, which are the $1$-monotone functions. Motivated by the recent interest in $k$-monotone functions in the context of circuit complexity and learning theory, and by the central role that monotonicity testing plays in the context of property testing, we initiate a systematic study of $k$-monotone functions, in the property testing model. In this model, the goal is to distinguish functions that are $k$-monotone (or are close to being $k$-monotone) from functions that are far from being $k$-monotone. Our results include the following: - We demonstrate a separation between testing $k$-monotonicity and testing monotonicity, on the hypercube domain $\{0,1\}^d$, for $k\geq 3$; - We demonstrate a separation between testing and learning on $\{0,1\}^d$, for $k=ω(\log d)$: testing $k$-monotonicity can be performed with $2^{O(\sqrt d \cdot \log d\cdot \log{1/\varepsilon})}$ queries, while learning $k$-monotone functions requires $2^{Ω(k\cdot \sqrt d\cdot{1/\varepsilon})}$ queries (Blais et al. (RANDOM 2015)). - We present a tolerant test for functions $f\colon[n]^d\to \{0,1\}$ with complexity independent of $n$, which makes progress on a problem left open by Berman et al. (STOC 2014). Our techniques exploit the testing-by-learning paradigm, use novel applications of Fourier analysis on the grid $[n]^d$, and draw connections to distribution testing techniques.

preprint2016arXiv

Testing Shape Restrictions of Discrete Distributions

We study the question of testing structured properties (classes) of discrete distributions. Specifically, given sample access to an arbitrary distribution $D$ over $[n]$ and a property $\mathcal{P}$, the goal is to distinguish between $D\in\mathcal{P}$ and $\ell_1(D,\mathcal{P})>\varepsilon$. We develop a general algorithm for this question, which applies to a large range of "shape-constrained" properties, including monotone, log-concave, $t$-modal, piecewise-polynomial, and Poisson Binomial distributions. Moreover, for all cases considered, our algorithm has near-optimal sample complexity with regard to the domain size and is computationally efficient. For most of these classes, we provide the first non-trivial tester in the literature. In addition, we also describe a generic method to prove lower bounds for this problem, and use it to show our upper bounds are nearly tight. Finally, we extend some of our techniques to tolerant testing, deriving nearly-tight upper and lower bounds for the corresponding questions.

preprint2016arXiv

Tolerant Junta Testing and the Connection to Submodular Optimization and Function Isomorphism

A function $f\colon \{-1,1\}^n \to \{-1,1\}$ is a $k$-junta if it depends on at most $k$ of its variables. We consider the problem of tolerant testing of $k$-juntas, where the testing algorithm must accept any function that is $ε$-close to some $k$-junta and reject any function that is $ε'$-far from every $k'$-junta for some $ε'= O(ε)$ and $k' = O(k)$. Our first result is an algorithm that solves this problem with query complexity polynomial in $k$ and $1/ε$. This result is obtained via a new polynomial-time approximation algorithm for submodular function minimization (SFM) under large cardinality constraints, which holds even when only given an approximate oracle access to the function. Our second result considers the case where $k'=k$. We show how to obtain a smooth tradeoff between the amount of tolerance and the query complexity in this setting. Specifically, we design an algorithm that given $ρ\in(0,1/2)$ accepts any function that is $\frac{ερ}{16}$-close to some $k$-junta and rejects any function that is $ε$-far from every $k$-junta. The query complexity of the algorithm is $O\big( \frac{k\log k}{ερ(1-ρ)^k} \big)$. Finally, we show how to apply the second result to the problem of tolerant isomorphism testing between two unknown Boolean functions $f$ and $g$. We give an algorithm for this problem whose query complexity only depends on the (unknown) smallest $k$ such that either $f$ or $g$ is close to being a $k$-junta.

preprint2015arXiv

Big Data on the Rise: Testing monotonicity of distributions

The field of property testing of probability distributions, or distribution testing, aims to provide fast and (most likely) correct answers to questions pertaining to specific aspects of very large datasets. In this work, we consider a property of particular interest, monotonicity of distributions. We focus on the complexity of monotonicity testing across different models of access to the distributions; and obtain results in these new settings that differ significantly from the known bounds in the standard sampling model.

preprint2014arXiv

Learning circuits with few negations

Monotone Boolean functions, and the monotone Boolean circuits that compute them, have been intensively studied in complexity theory. In this paper we study the structure of Boolean functions in terms of the minimum number of negations in any circuit computing them, a complexity measure that interpolates between monotone functions and the class of all functions. We study this generalization of monotonicity from the vantage point of learning theory, giving near-matching upper and lower bounds on the uniform-distribution learnability of circuits in terms of the number of negations they contain. Our upper bounds are based on a new structural characterization of negation-limited circuits that extends a classical result of A. A. Markov. Our lower bounds, which employ Fourier-analytic tools from hardness amplification, give new results even for circuits with no negations (i.e. monotone functions).

Clément L. Canonne

What is connected

Connect this record

See the researcher in context

Building this map preview

14 published item(s)

Independence Testing for Bounded Degree Bayesian Network

Near-Optimal Bounds for Testing Histogram Distributions

Optimal Closeness Testing of Discrete Distributions Made (Complex) Simple

Private Identity Testing for High-Dimensional Distributions

The Role of Interactivity in Structured Estimation

Inference under Information Constraints III: Local Privacy Constraints

Random Restrictions of High-Dimensional Distributions and Uniformity Testing with Subcube Conditioning

A short note on learning discrete distributions

Testing Data Binnings

Testing $k$-Monotonicity

Testing Shape Restrictions of Discrete Distributions

Tolerant Junta Testing and the Connection to Submodular Optimization and Function Isomorphism

Big Data on the Rise: Testing monotonicity of distributions

Learning circuits with few negations