Graph explorer

Testing Data Binnings

Motivated by the question of data quantization and "binning," we revisit the problem of identity testing of discrete probability distributions. Identity testing (a.k.a. one-sample testing), a fundamental and by now well-understood problem in distribution testing, asks, given a reference distribution (model) $\mathbf{q}$ and samples from an unknown distribution $\mathbf{p}$, both over $[n]=\{1,2,\dots,n\}$, whether $\mathbf{p}$ equals $\mathbf{q}$, or is significantly different from it. In this paper, we introduce the related question of 'identity up to binning,' where the reference distribution $\mathbf{q}$ is over $k \ll n$ elements: the question is then whether there exists a suitable binning of the domain $[n]$ into $k$ intervals such that, once "binned," $\mathbf{p}$ is equal to $\mathbf{q}$. We provide nearly tight upper and lower bounds on the sample complexity of this new question, showing both a quantitative and qualitative difference with the vanilla identity testing one, and answering an open question of Canonne (2019). Finally, we discuss several extensions and related research directions.

5 nodes4 linksoverview previewTesting Data Binnings
5 nodes4 links
Testing Data Binnings5 visible / 5 total nodes / 5 links
Co-authorshipAuthorshipAuthorshipTopic signalTopic signalWTesting Data Binningspreprint / 2020AClément L. CanonneResearcherAKarl WimmerResearcherTData Structures and Alg...3564 worksTDiscrete Mathematics1775 works
PaperSignal 104 links

Testing Data Binnings

preprint / 2020

Open