Source author record

Omri Ben-Eliezer

Omri Ben-Eliezer appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Cryptography and Security, Data Structures and Algorithms, math.ST, Statistics Theory, Computer Vision, Databases, Graphics, Information Retrieval, Information Theory, Machine Learning, math.CO, math.IT, math.MG

Catalog footprint

What is connected

6 works

13 topics

4 close collaborators


preprint · 2022 · arXiv

Archimedes Meets Privacy: On Privately Estimating Quantiles in High Dimensions Under Minimal Assumptions

The last few years have seen a surge of work on high-dimensional statistics under privacy constraints, mostly following two main lines of work: the "worst case" line, which does not make any distributional assumptions on the input data; and the "strong assumptions" line, which assumes that the data is generated from specific families, e.g., subgaussian distributions. In this work we take a middle ground, obtaining new differentially private algorithms with polynomial sample complexity for estimating quantiles in high dimensions, as well as estimating and sampling points of high Tukey depth, all working under very mild distributional assumptions. From the technical perspective, our work relies upon deep robustness results in the convex geometry literature, demonstrating how such results can be used in a private context. Our main object of interest is the (convex) floating body (FB), a notion going back to Archimedes, which is a robust and well-studied high-dimensional analogue of the interquantile range. We show how one can privately, and with polynomially many samples, (a) output an approximate interior point of the FB -- e.g., "a typical user" in a high-dimensional database -- by leveraging the robustness of the Steiner point of the FB; and at the expense of polynomially many more samples, (b) produce an approximate uniform sample from the FB, by constructing a private noisy projection oracle.
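To make the notion of Tukey depth in the abstract above concrete, here is a minimal, non-private sketch in Python. It is not the paper's algorithm: it only illustrates the definition in the plane, where the depth of a point is the smallest number of sample points in any closed halfplane whose boundary passes through that point. The function name `tukey_depth_2d` and the parameter `num_directions` are illustrative assumptions, and because only a finite grid of directions is scanned, the returned value is an upper bound on the true depth.

```python
import math


def tukey_depth_2d(point, sample, num_directions=360):
    """Approximate the Tukey (halfspace) depth of `point` relative to `sample`.

    Scans a finite grid of directions (an assumption of this sketch), so the
    result can only overestimate the true depth, never underestimate it.
    """
    px, py = point
    best = len(sample)
    for k in range(num_directions):
        theta = 2.0 * math.pi * k / num_directions
        ux, uy = math.cos(theta), math.sin(theta)
        # Count the sample points in the closed halfplane {z : <z - point, u> >= 0}.
        count = sum(
            1 for (x, y) in sample if (x - px) * ux + (y - py) * uy >= 0.0
        )
        best = min(best, count)
    return best


if __name__ == "__main__":
    corners = [(1.0, 1.0), (1.0, -1.0), (-1.0, 1.0), (-1.0, -1.0)]
    print(tukey_depth_2d((0.0, 0.0), corners))  # a central point
    print(tukey_depth_2d((5.0, 5.0), corners))  # a point outside the sample
```

A point deep inside the data has high depth, while a point outside the convex hull of the sample has depth zero, which is why high-depth points serve as a notion of a "typical" data point.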

preprint · 2022 · arXiv

Bounded Space Differentially Private Quantiles

Estimating the quantiles of a large dataset is a fundamental problem in both the streaming algorithms literature and the differential privacy literature. However, all existing private mechanisms for distribution-independent quantile computation require space at least linear in the input size $n$. In this work, we devise a differentially private algorithm for the quantile estimation problem, with strongly sublinear space complexity, in the one-shot and continual observation settings. Our basic mechanism estimates any $\alpha$-approximate quantile of a length-$n$ stream over a data universe $\mathcal{X}$ with probability $1-\beta$ using $O\left( \frac{\log (|\mathcal{X}|/\beta) \log (\alpha\varepsilon n)}{\alpha\varepsilon} \right)$ space while satisfying $\varepsilon$-differential privacy at a single time point. Our approach builds upon deterministic streaming algorithms for non-private quantile estimation, instantiating the exponential mechanism using a utility function defined on sketch items, while (privately) sampling from intervals defined by the sketch. We also present another algorithm based on histograms that is especially suited to the multiple quantiles case. We implement our algorithms and experimentally evaluate them on synthetic and real-world datasets.
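For orientation, the following Python sketch shows the textbook exponential-mechanism approach to a single private quantile when the full dataset is held in memory, i.e. the linear-space setting the abstract above contrasts itself with. It is not the paper's sketch-based mechanism. The function name `dp_quantile`, the clipping bounds `lower` and `upper`, and the continuous output range are assumptions made for illustration.

```python
import math
import random


def dp_quantile(data, q, epsilon, lower, upper, rng=None):
    """Sample a q-quantile estimate via the exponential mechanism (full-data baseline)."""
    if not lower < upper:
        raise ValueError("lower must be strictly less than upper")
    rng = rng or random.Random()
    # Clip to the public range [lower, upper] and sort.
    xs = sorted(min(max(x, lower), upper) for x in data)
    n = len(xs)
    edges = [lower] + xs + [upper]  # n + 1 intervals between consecutive edges
    log_weights = []
    for i in range(n + 1):
        length = edges[i + 1] - edges[i]
        if length <= 0.0:
            log_weights.append(-math.inf)  # empty interval, never selected
        else:
            # Interval i has exactly i data points to its left; the utility
            # -|i - q*n| changes by at most 1 when one data point changes.
            utility = -abs(i - q * n)
            log_weights.append(math.log(length) + epsilon * utility / 2.0)
    peak = max(log_weights)
    weights = [math.exp(lw - peak) for lw in log_weights]  # stabilised weights
    i = rng.choices(range(n + 1), weights=weights, k=1)[0]
    # Release a uniformly random point from the selected interval.
    return rng.uniform(edges[i], edges[i + 1])


if __name__ == "__main__":
    values = list(range(101))
    print(dp_quantile(values, 0.5, epsilon=1.0, lower=0.0, upper=100.0))
```

Storing and sorting all of `data` is what makes this baseline linear in space; the abstract's contribution is to run the same kind of selection over the items of a small streaming sketch instead.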

preprint · 2021 · arXiv

A Framework for Adversarially Robust Streaming Algorithms

We investigate the adversarial robustness of streaming algorithms. In this context, an algorithm is considered robust if its performance guarantees hold even if the stream is chosen adaptively by an adversary that observes the outputs of the algorithm along the stream and can react in an online manner. While deterministic streaming algorithms are inherently robust, many central problems in the streaming literature do not admit sublinear-space deterministic algorithms; on the other hand, classical space-efficient randomized algorithms for these problems are generally not adversarially robust. This raises the natural question of whether there exist efficient adversarially robust (randomized) streaming algorithms for these problems. In this work, we show that the answer is positive for various important streaming problems in the insertion-only model, including distinct elements and more generally $F_p$-estimation, $F_p$-heavy hitters, entropy estimation, and others. For all of these problems, we develop adversarially robust $(1+\varepsilon)$-approximation algorithms whose required space matches that of the best known non-robust algorithms up to a $\text{poly}(\log n, 1/\varepsilon)$ multiplicative factor (and in some cases even up to a constant factor). Towards this end, we develop several generic tools allowing one to efficiently transform a non-robust streaming algorithm into a robust one in various scenarios.

preprint · 2021 · arXiv

Adversarial Laws of Large Numbers and Optimal Regret in Online Classification

Laws of large numbers guarantee that given a large enough sample from some population, the measure of any fixed sub-population is well-estimated by its frequency in the sample. We study laws of large numbers in sampling processes that can affect the environment they are acting upon and interact with it. Specifically, we consider the sequential sampling model proposed by Ben-Eliezer and Yogev (2020), and characterize the classes which admit a uniform law of large numbers in this model: these are exactly the classes that are online learnable. Our characterization may be interpreted as an online analogue to the equivalence between learnability and uniform convergence in statistical (PAC) learning. The sample-complexity bounds we obtain are tight for many parameter regimes, and as an application, we determine the optimal regret bounds in online learning, stated in terms of Littlestone's dimension, thus resolving the main open question from Ben-David, Pál, and Shalev-Shwartz (2009), which was also posed by Rakhlin, Sridharan, and Tewari (2015).

preprint · 2020 · arXiv

READ: Recursive Autoencoders for Document Layout Generation

Layout is a fundamental component of any graphic design. Creating large varieties of plausible document layouts can be a tedious task, requiring numerous constraints to be satisfied, including local ones relating different semantic elements and global constraints on the general appearance and spacing. In this paper, we present a novel framework, coined READ, for REcursive Autoencoders for Document layout generation, to generate plausible 2D layouts of documents in large quantities and varieties. First, we devise an exploratory recursive method to extract a structural decomposition of a single document. Leveraging a dataset of documents annotated with labeled bounding boxes, our recursive neural network learns to map the structural representation, given in the form of a simple hierarchy, to a compact code, the space of which is approximated by a Gaussian distribution. Novel hierarchies can be sampled from this space, obtaining new document layouts. Moreover, we introduce a combinatorial metric to measure structural similarity among document layouts. We deploy it to show that our method is able to generate highly variable and realistic layouts. We further demonstrate the utility of our generated layouts in the context of standard detection tasks on documents, showing that detection performance improves when the training data is augmented with generated documents whose layouts are produced by READ.

preprint · 2020 · arXiv

The hat guessing number of graphs

Consider the following hat guessing game: $n$ players are placed on $n$ vertices of a graph, each wearing a hat whose color is arbitrarily chosen from a set of $q$ possible colors. Each player can see the hat colors of his neighbors, but not his own hat color. All of the players are asked to guess their own hat colors simultaneously, according to a predetermined guessing strategy and the hat colors they see, where no communication between them is allowed. Given a graph $G$, its hat guessing number ${\rm{HG}}(G)$ is the largest integer $q$ such that there exists a guessing strategy guaranteeing at least one correct guess for any hat assignment of $q$ possible colors. In 2008, Butler et al. asked whether the hat guessing number of the complete bipartite graph $K_{n,n}$ is at least some fixed positive (fractional) power of $n$. We answer this question affirmatively, showing that for sufficiently large $n$, the complete $r$-partite graph $K_{n,\ldots,n}$ satisfies ${\rm{HG}}(K_{n,\ldots,n})=\Omega(n^{\frac{r-1}{r}-o(1)})$. Our guessing strategy is based on a probabilistic construction and other combinatorial ideas, and can be extended to show that ${\rm{HG}}(\vec{C}_{n,\ldots,n})=\Omega(n^{\frac{1}{r}-o(1)})$, where $\vec{C}_{n,\ldots,n}$ is the blow-up of a directed $r$-cycle, and where for directed graphs each player sees only the hat colors of his outneighbors.
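The definition of the hat guessing game above can be checked by brute force on very small graphs. The Python sketch below is an illustration only, not the paper's probabilistic construction: it enumerates every hat assignment and tests whether a given deterministic strategy always yields at least one correct guess. The names `strategy_always_wins` and `make_sum_strategy` are hypothetical, and the modular-sum strategy for complete graphs is a classical folklore strategy rather than anything stated in the abstract.

```python
from itertools import product


def strategy_always_wins(neighbors, strategies, q):
    """Return True if some player guesses correctly under every assignment of q colors.

    neighbors[v] lists the vertices whose hats player v can see;
    strategies[v] maps the tuple of seen colors to player v's guess.
    """
    n = len(neighbors)
    for hats in product(range(q), repeat=n):
        someone_correct = any(
            strategies[v](tuple(hats[u] for u in neighbors[v])) == hats[v]
            for v in range(n)
        )
        if not someone_correct:
            return False
    return True


def make_sum_strategy(v, q):
    """Player v guesses the color that would make the total sum equal v modulo q."""
    return lambda seen: (v - sum(seen)) % q


if __name__ == "__main__":
    # Complete graph on 3 vertices with 3 colors: every player sees the other two.
    k3 = [[1, 2], [0, 2], [0, 1]]
    winning = [make_sum_strategy(v, 3) for v in range(3)]
    print(strategy_always_wins(k3, winning, 3))

    # A poor strategy on the same graph: everyone always guesses color 0.
    constant = [lambda seen: 0 for _ in range(3)]
    print(strategy_always_wins(k3, constant, 3))
```

The exhaustive check runs over $q^n$ assignments, so it is only practical for tiny instances; it verifies one given strategy and does not search over strategies.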
