Source author record

Shay Moran

Shay Moran appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher; unclaimed source record

Machine Learning; math.CO; Discrete Mathematics; Data Structures and Algorithms; Computational Geometry; Computer Science and Game Theory; Computational Complexity; Cryptography and Security; Distributed, Parallel, and Cluster Computing; Information Theory; Logic in Computer Science; math.AG; math.GM; math.LO; math.ST; Statistics Theory

Catalog footprint

What is connected

31 works

16 topics

4 close collaborators

preprint, 2026, arXiv

On the Learning Curves of Revenue Maximization

Learning curves are a fundamental primitive in supervised learning, describing how an algorithm's performance improves with more data and providing a quantitative measure of its generalization ability. Formally, a learning curve plots the decay of an algorithm's error for a fixed underlying distribution as a function of the number of training samples. Prior work on revenue-maximizing learning algorithms, starting with the seminal work of Cole and Roughgarden [STOC, 2014], adopts a distribution-free perspective, which parallels the PAC learning framework in learning theory. This approach evaluates performance against the hardest possible sequence of valuation distributions, one for each sample size, effectively defining the upper envelope of learning curves over all possible distributions, thus leading to error bounds that do not capture the shape of the learning curves. In this work we initiate the study of learning curves for revenue maximization and provide a near-complete characterization of their rate of decay in the basic setting of a single item and a single buyer. In the absence of any restriction on the valuation distribution, we show that there exists a Bayes-consistent algorithm, meaning that its learning curve converges to zero for any arbitrary valuation distribution as the number of samples $n \to \infty$. However, this convergence must be arbitrarily slow, even if the optimal revenue is finite. In contrast, if the optimal revenue is achieved by a finite price, then the optimal rate of decay is roughly $1/\sqrt{n}$. Finally, for distributions supported on discrete sets of values, we show that learning curves decay almost exponentially fast, a rate unattainable under the PAC framework.
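A minimal sketch of the single-item, single-buyer setting described above, assuming the learner simply posts the price that maximizes revenue on the observed samples. This empirical-revenue baseline is a common illustration and is not claimed to be the algorithm analyzed in the paper; the function names are hypothetical.

```python
from typing import Sequence


def empirical_revenue(price: float, samples: Sequence[float]) -> float:
    """Average revenue of posting `price` against the sampled valuations."""
    sold = sum(1 for v in samples if v >= price)
    return price * sold / len(samples)


def best_empirical_price(samples: Sequence[float]) -> float:
    """Price with the highest empirical revenue.

    An optimal price can always be taken among the sample values, so only
    those are searched. Ties are broken toward the lower price (an
    arbitrary choice made for this sketch).
    """
    return max(set(samples), key=lambda p: (empirical_revenue(p, samples), -p))
```

Plotting the gap between the optimal revenue and the revenue of `best_empirical_price` on fresh draws, as the sample size grows for one fixed distribution, would trace one learning curve in the sense of the abstract.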

preprint, 2026, arXiv

Online Set Learning from Precision and Recall Feedback

We consider the problem of learning an unknown subset $N_\text{target}$ of a domain in an online setting. In each round $t$, the learner predicts a set of items ${N}_t$ and receives one of two types of feedback, each with equal probability: precision feedback, in which a randomly chosen item from the predicted set $N_t$ is revealed and the learner is told whether it belongs to $N_\text{target}$ (incurring a reward if it does), or recall feedback, in which a randomly chosen item from the target set $N_\text{target}$ is revealed and the learner is told whether it belongs to $N_t$ (incurring a reward if it does). The goal is to maximize the cumulative reward over time. This simple online set learning problem abstracts a variety of learning scenarios with precision- and recall-type feedback. We show that a hypothesis class (a family of subsets of the domain) is learnable in this setting if and only if it has finite Vapnik-Chervonenkis (VC) dimension, mirroring the classical PAC characterization. However, the resulting algorithmic structure is markedly more intricate: in contrast to standard Probably Approximately Correct (PAC) learning -- where the algorithmic landscape is governed by the simple principle of Empirical Risk Minimization (ERM) -- our partial feedback model can invalidate ERM and even all proper learning rules. We develop algorithms to address the dependencies induced by the feedback, obtaining regret guarantees in both the realizable and agnostic settings. Our results provide a qualitative characterization of learnability in this model, addressing its most basic question, while pointing to a range of natural and intriguing open questions, including the determination of optimal regret rates.
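The feedback protocol above can be simulated with a short sketch. It assumes a finite domain of integers and non-empty predicted and target sets; `feedback_round` and `expected_reward` are hypothetical names, and the code models only the feedback, not any learning rule from the paper.

```python
import random
from typing import Set, Tuple


def feedback_round(predicted: Set[int], target: Set[int],
                   rng: random.Random) -> Tuple[str, int, int]:
    """Simulate one round; returns (feedback type, revealed item, reward)."""
    if rng.random() < 0.5:
        # Precision feedback: a random item of the prediction is checked
        # against the target set.
        item = rng.choice(sorted(predicted))
        return "precision", item, int(item in target)
    # Recall feedback: a random item of the target is checked against
    # the prediction.
    item = rng.choice(sorted(target))
    return "recall", item, int(item in predicted)


def expected_reward(predicted: Set[int], target: Set[int]) -> float:
    """Expected one-round reward: the average of precision and recall."""
    overlap = len(predicted & target)
    return 0.5 * overlap / len(predicted) + 0.5 * overlap / len(target)
```

Under this model the expected reward of a round is the average of the precision and the recall of the predicted set, which is why both kinds of error matter to the learner.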

preprint, 2026, arXiv

Optimal Reconstruction from Linear Queries

We study the problem of reconstructing an unknown point in $\mathbb{R}^d$ from approximate linear queries. This setting arises naturally in applications ranging from low-dimensional remote sensing and signal recovery to high-dimensional data analysis and privacy-sensitive inference. Our main goal is to characterize the optimal reconstruction error as a function of the number of queries $T$, the ambient dimension $d$, and the noise parameter $δ$. We first analyze the limit $T \to \infty$ and show that the optimal reconstruction error converges to the explicit value $\sqrt{2d/(d+1)} δ$, which plays a role analogous to the Bayes optimal error in supervised learning. When the dimension is fixed, we show that the excess error above this limit decays doubly exponentially fast as $T \to \infty$, a rate that is significantly faster than those typically encountered in learning curves. When the dimension grows, we show that a number of queries on the order of $\exp(d)$ is necessary and sufficient to achieve vanishing excess error. Finally, we introduce and analyze an improper variant of the reconstruction problem. From a technical perspective, our main contribution is a generalization of Jung's theorem (1901). The classical theorem bounds the maximum possible radius of a set of diameter 1 and characterizes extremal bodies. Our generalization provides a robust variant that characterizes near-extremal bodies and is proved via geometric and dynamical arguments exploiting symmetry and Lie group actions.
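As a numerical aside, the classical Jung theorem says that a set of diameter $D$ in $\mathbb{R}^d$ fits in a ball of radius $D\sqrt{d/(2(d+1))}$. The sketch below checks that this radius, evaluated at diameter $2δ$, equals the limiting error $\sqrt{2d/(d+1)}\,δ$ quoted in the abstract. Reading the limit as a Jung radius of a diameter-$2δ$ set is an assumption made here for illustration; the abstract does not spell out that link.

```python
import math


def jung_radius(d: int, diameter: float = 1.0) -> float:
    """Jung's bound on the enclosing-ball radius of a set in R^d."""
    return diameter * math.sqrt(d / (2 * (d + 1)))


def limiting_error(d: int, delta: float) -> float:
    """The limiting reconstruction error stated in the abstract."""
    return math.sqrt(2 * d / (d + 1)) * delta
```

For example, `jung_radius(2)` is the circumradius of an equilateral triangle with unit sides, the extremal planar case.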

preprint, 2026, arXiv

Scale-Sensitive Shattering: Learnability and Evaluability at Optimal Scale

We study the optimal scale at which real-valued function classes exhibit uniform convergence and learnability. Our main result establishes a scale-sensitive generalization of the fundamental theorem of PAC learning: for every bounded real-valued class and every $γ>0$, uniform convergence at scale $γ$, agnostic learnability at scale $γ/2$, and finiteness of the fat-shattering dimension at every scale $γ'>γ$ are equivalent. This resolves a question by Anthony and Bartlett (Cambridge Univ. Press 1999) on the precise scales governing learnability, refuting a conjecture attributed there to Phil Long that a multiplicative 2-factor gap is unavoidable, and improves the upper bounds of Bartlett and Long (JCSS 1998), which incur such a loss. The key technical ingredient is a direct bound on empirical $\ell_\infty$ covering numbers, avoiding the standard detour through packing numbers. As a consequence, we obtain sharp asymptotic metric-entropy bounds in terms of the fat-shattering scale $γ$: an $O(\log^2 n)$ bound holds already at scale $γ/2$, while an $O(\log n)$ bound holds at scale $2γ$. We further show that the $O(\log^2 n)$ bound is sometimes tight. These results resolve open questions by Alon et al. (JACM 1997) and Rudelson and Vershynin (Ann. of Math. 2006). As an application, we establish a sharp dichotomy for bounded integral probability metrics: every such IPM is either estimable or cannot be weakly evaluated within any multiplicative factor $c<3$, while $3$-weak evaluability always holds, resolving an open question from Aiyer et al. (ICML 2026). We also highlight several open questions on quantitative sample complexity and evaluability.

preprint, 2026, arXiv

Strategic PAC Learnability via Geometric Definability

Strategic classification studies learning settings in which individuals can modify their features, at a cost, in order to influence the classifier's decision. A central question is how the sample complexity of the induced (strategic) hypothesis class depends on the complexities of the underlying hypothesis class and the cost structure governing feasible manipulations. Prior work has shown that in several natural settings, such as linear classifiers with norm costs, the induced complexity can be controlled. We begin by showing that such guarantees fail in general - even in simple cases: there exist hypothesis classes of VC dimension $1$ on the real line such that, even under the simplest interval neighborhoods, the induced class has infinite VC dimension. Thus, strategic behavior can turn an easy learning problem into a non-learnable one. To overcome this, we introduce structure via a geometric definability assumption: both the hypothesis class and the cost-induced neighborhood relation can be defined by first-order formulas over $\mathbb{R}_{\mathtt{exp}}$. Intuitively, this means that hypotheses and costs can be described using arithmetic operations, exponentiation, logarithms, and comparisons. This captures a broad range of natural classes and cost functions, including $\ell_p$ distances, Wasserstein distance, and information-theoretic divergences. Under this assumption, we prove that learnability is preserved, with sample complexity controlled by the complexity of the defining formulas.

preprint, 2022, arXiv

Active Learning with Label Comparisons

Supervised learning typically relies on manual annotation of the true labels. When there are many potential classes, searching for the best one can be prohibitive for a human annotator. On the other hand, comparing two candidate labels is often much easier. We focus on this type of pairwise supervision and ask how it can be used effectively in learning, and in particular in active learning. We obtain several insightful results in this context. In principle, finding the best of $k$ labels can be done with $k-1$ active queries. We show that there is a natural class where this approach is sub-optimal, and that there is a more comparison-efficient active learning scheme. A key element in our analysis is the "label neighborhood graph" of the true distribution, which has an edge between two classes if they share a decision boundary. We also show that in the PAC setting, pairwise comparisons cannot provide improved sample complexity in the worst case. We complement our theoretical results with experiments, clearly demonstrating the effect of the neighborhood graph on sample complexity.
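The baseline mentioned above, finding the best of $k$ labels with $k-1$ comparison queries, can be sketched as a single sequential pass. The `prefer` callback stands in for the human annotator and is a hypothetical interface; this is the naive scheme the abstract compares against, not the paper's more comparison-efficient method.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def best_label(labels: Sequence[T],
               prefer: Callable[[T, T], T]) -> Tuple[T, int]:
    """Return the preferred label and the number of comparisons used.

    `prefer(a, b)` must return whichever of the two labels the annotator
    judges better. Exactly len(labels) - 1 queries are issued.
    """
    best = labels[0]
    queries = 0
    for candidate in labels[1:]:
        best = prefer(best, candidate)
        queries += 1
    return best, queries
```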

preprint, 2022, arXiv

How Expressive Are Friendly School Partitions?

A natural procedure for assigning students to classes in the beginning of the school-year is to let each student write down a list of $d$ other students with whom she/he wants to be in the same class (typically $d=3$). The teachers then gather all the lists and try to assign the students to classes in a way that each student is assigned to the same class with at least one student from her/his list. We refer to such partitions as friendly. In realistic scenarios, the teachers may also consider other constraints when picking the friendly partition: e.g. there may be a group of students whom the teachers wish to avoid assigning to the same class; alternatively, there may be two close friends whom the teachers want to put together; etc. Inspired by such challenges, we explore questions concerning the expressiveness of friendly partitions. For example: Does there always exist a friendly partition? More generally, how many friendly partitions are there? Can every student $u$ be separated from any other student $v$? Does there exist a student $u$ that can be separated from any other student $v$? We show that when $d\geq 3$ there always exist at least $2$ friendly partitions and when $d\geq 15$ there always exists a student $u$ which can be separated from any other student $v$. The question regarding separability of each pair of students is left open, but we give a positive answer under the additional assumption that each student appears in at most roughly $\exp(d)$ lists. We further suggest several open questions and present some preliminary findings towards resolving them.
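The definition of a friendly partition translates directly into a check. The sketch below assumes every student, and every student named on a list, appears in exactly one class; the function and argument names are hypothetical.

```python
from typing import Dict, List, Set


def is_friendly(wish_lists: Dict[str, Set[str]],
                classes: List[Set[str]]) -> bool:
    """True if each student shares a class with someone on their list."""
    # Map each student to the index of the class they were assigned to.
    class_of = {s: i for i, group in enumerate(classes) for s in group}
    return all(
        any(class_of[friend] == class_of[student] for friend in friends)
        for student, friends in wish_lists.items()
    )
```

Counting friendly partitions for a small instance then amounts to enumerating partitions and filtering with `is_friendly`, which is feasible only for a handful of students.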

preprint, 2022, arXiv

Integral Probability Metrics PAC-Bayes Bounds

We present a PAC-Bayes-style generalization bound which enables the replacement of the KL-divergence with a variety of Integral Probability Metrics (IPM). We provide instances of this bound with the IPM being the total variation metric and the Wasserstein distance. A notable feature of the obtained bounds is that they naturally interpolate between classical uniform convergence bounds in the worst case (when the prior and posterior are far away from each other), and improved bounds in favorable cases (when the posterior and prior are close). This illustrates the possibility of reinforcing classical generalization bounds with algorithm- and data-dependent components, thus making them more suitable to analyze algorithms that use a large hypothesis space.

preprint, 2022, arXiv

Understanding Generalization via Leave-One-Out Conditional Mutual Information

We study the mutual information between (certain summaries of) the output of a learning algorithm and its $n$ training data, conditional on a supersample of $n+1$ i.i.d. data from which the training data is chosen at random without replacement. These leave-one-out variants of the conditional mutual information (CMI) of an algorithm (Steinke and Zakynthinou, 2020) are also seen to control the mean generalization error of learning algorithms with bounded loss functions. For learning algorithms achieving zero empirical risk under 0-1 loss (i.e., interpolating algorithms), we provide an explicit connection between leave-one-out CMI and the classical leave-one-out error estimate of the risk. Using this connection, we obtain upper and lower bounds on risk in terms of the (evaluated) leave-one-out CMI. When the limiting risk is constant or decays polynomially, the bounds converge to within a constant factor of two. As an application, we analyze the population risk of the one-inclusion graph algorithm, a general-purpose transductive learning algorithm for VC classes in the realizable setting. Using leave-one-out CMI, we match the optimal bound for learning VC classes in the realizable setting, answering an open challenge raised by Steinke and Zakynthinou (2020). Finally, in order to understand the role of leave-one-out CMI in studying generalization, we place leave-one-out CMI in a hierarchy of measures, with a novel unconditional mutual information at the root. For 0-1 loss and interpolating learning algorithms, this mutual information is observed to be precisely the risk.

preprint, 2022, arXiv

Unlabeled sample compression schemes and corner peelings for ample and maximum classes

We examine connections between combinatorial notions that arise in machine learning and topological notions in cubical/simplicial geometry. These connections make it possible to export results from geometry to machine learning. Our first main result is based on a geometric construction by Tracy Hall (2004) of a partial shelling of the cross-polytope which cannot be extended. We use it to derive a maximum class of VC dimension 3 that has no corners. This refutes several previous works in machine learning from the past 11 years. In particular, it implies that all previous constructions of optimal unlabeled sample compression schemes for maximum classes are erroneous. On the positive side we present a new construction of an unlabeled sample compression scheme for maximum classes. We leave open whether our unlabeled sample compression scheme extends to ample (a.k.a. lopsided or extremal) classes, which represent a natural and far-reaching generalization of maximum classes. Towards resolving this question, we provide a geometric characterization in terms of unique sink orientations of the 1-skeletons of associated cubical complexes.

preprint, 2021, arXiv

Adversarial Laws of Large Numbers and Optimal Regret in Online Classification

Laws of large numbers guarantee that given a large enough sample from some population, the measure of any fixed sub-population is well-estimated by its frequency in the sample. We study laws of large numbers in sampling processes that can affect the environment they are acting upon and interact with it. Specifically, we consider the sequential sampling model proposed by Ben-Eliezer and Yogev (2020), and characterize the classes which admit a uniform law of large numbers in this model: these are exactly the classes that are online learnable. Our characterization may be interpreted as an online analogue to the equivalence between learnability and uniform convergence in statistical (PAC) learning. The sample-complexity bounds we obtain are tight for many parameter regimes, and as an application, we determine the optimal regret bounds in online learning, stated in terms of Littlestone's dimension, thus resolving the main open question from Ben-David, Pál, and Shalev-Shwartz (2009), which was also posed by Rakhlin, Sridharan, and Tewari (2015).

preprint, 2021, arXiv

Online Learning with Simple Predictors and a Combinatorial Characterization of Minimax in 0/1 Games

Which classes can be learned properly in the online model? -- that is, by an algorithm that at each round uses a predictor from the concept class. While there are simple and natural cases where improper learning is necessary, it is natural to ask how complex must the improper predictors be in such cases. Can one always achieve nearly optimal mistake/regret bounds using "simple" predictors? In this work, we give a complete characterization of when this is possible, thus settling an open problem which has been studied since the pioneering works of Angluin (1987) and Littlestone (1988). More precisely, given any concept class C and any hypothesis class H, we provide nearly tight bounds (up to a log factor) on the optimal mistake bounds for online learning C using predictors from H. Our bound yields an exponential improvement over the previously best known bound by Chase and Freitag (2020). As applications, we give constructive proofs showing that (i) in the realizable setting, a near-optimal mistake bound (up to a constant factor) can be attained by a sparse majority-vote of proper predictors, and (ii) in the agnostic setting, a near-optimal regret bound (up to a log factor) can be attained by a randomized proper algorithm. A technical ingredient of our proof which may be of independent interest is a generalization of the celebrated Minimax Theorem (von Neumann, 1928) for binary zero-sum games. A simple game which fails to satisfy Minimax is "Guess the Larger Number", where each player picks a number and the larger number wins. The payoff matrix is infinite triangular. We show this is the only obstruction: if a game does not contain triangular submatrices of unbounded sizes then the Minimax Theorem holds. This generalizes von Neumann's Minimax Theorem by removing requirements of finiteness (or compactness), and captures precisely the games of interest in online learning.

preprint, 2020, arXiv

A Sauer-Shelah-Perles Lemma for Lattices

We study lattice-theoretical extensions of the celebrated Sauer-Shelah-Perles Lemma. We conjecture that a general Sauer-Shelah-Perles Lemma holds for a lattice $L$ if and only if $L$ is relatively complemented, and prove partial results towards this conjecture.

preprint, 2020, arXiv

Closure Properties for Private Classification and Online Prediction

Let $\mathcal{H}$ be a class of boolean functions and consider a composed class $\mathcal{H}'$ that is derived from $\mathcal{H}$ using some arbitrary aggregation rule (for example, $\mathcal{H}'$ may be the class of all 3-wise majority-votes of functions in $\mathcal{H}$). We upper bound the Littlestone dimension of $\mathcal{H}'$ in terms of that of $\mathcal{H}$. As a corollary, we derive closure properties for online learning and private PAC learning. The derived bounds on the Littlestone dimension exhibit an undesirable exponential dependence. For private learning, we prove close to optimal bounds that circumvent this suboptimal dependency. The improved bounds on the sample complexity of private learning are derived algorithmically via transforming a private learner for the original class $\mathcal{H}$ to a private learner for the composed class $\mathcal{H}'$. Using the same ideas we show that any (proper or improper) private algorithm that learns a class of functions $\mathcal{H}$ in the realizable case (i.e., when the examples are labeled by some function in the class) can be transformed to a private algorithm that learns the class $\mathcal{H}$ in the agnostic case.

preprint, 2020, arXiv

Elementary Derivations of the Real Composition Algebras

"Real Normed Algebras Revisited," the last paper of the late Gadi Moran, attempts to reconstruct the discovery of the complex numbers, the quaternions and the octonions, as well as proofs of their properties, using only what was known to 19th century mathematicians. In his research, Gadi had discovered simple and elegant proofs of the above-mentioned classical results using only basic properties of the geometry of Euclidean spaces and tools from high school geometry. His reconstructions underline an interesting connection between Euclidean geometry and these algebras, and avoid the advanced machinery used in previous derivations of these results. The goal of this article is to present Gadi's derivations in a way that is accessible to a wide audience of readers.

preprint, 2020, arXiv

Learning from Mixtures of Private and Public Populations

We initiate the study of a new model of supervised learning under privacy constraints. Imagine a medical study where a dataset is sampled from a population of both healthy and unhealthy individuals. Suppose healthy individuals have no privacy concerns (in such case, we call their data "public") while the unhealthy individuals desire stringent privacy protection for their data. In this example, the population (data distribution) is a mixture of private (unhealthy) and public (healthy) sub-populations that could be very different. Inspired by the above example, we consider a model in which the population $\mathcal{D}$ is a mixture of two sub-populations: a private sub-population $\mathcal{D}_{\sf priv}$ of private and sensitive data, and a public sub-population $\mathcal{D}_{\sf pub}$ of data with no privacy concerns. Each example drawn from $\mathcal{D}$ is assumed to contain a privacy-status bit that indicates whether the example is private or public. The goal is to design a learning algorithm that satisfies differential privacy only with respect to the private examples. Prior works in this context assumed a homogeneous population where private and public data arise from the same distribution, and in particular designed solutions which exploit this assumption. We demonstrate how to circumvent this assumption by considering, as a case study, the problem of learning linear classifiers in $\mathbb{R}^d$. We show that in the case where the privacy status is correlated with the target label (as in the above example), linear classifiers in $\mathbb{R}^d$ can be learned, in the agnostic as well as the realizable setting, with sample complexity which is comparable to that of the classical (non-private) PAC-learning. It is known that this task is impossible if all the data is considered private.

preprint, 2020, arXiv

Online Agnostic Boosting via Regret Minimization

Boosting is a widely used machine learning approach based on the idea of aggregating weak learning rules. While in statistical learning numerous boosting methods exist both in the realizable and agnostic settings, in online learning they exist only in the realizable case. In this work we provide the first agnostic online boosting algorithm; that is, given a weak learner with only marginally-better-than-trivial regret guarantees, our algorithm boosts it to a strong learner with sublinear regret. Our algorithm is based on an abstract (and simple) reduction to online convex optimization, which efficiently converts an arbitrary online convex optimizer to an online booster. Moreover, this reduction extends to the statistical as well as the online realizable settings, thus unifying the 4 cases of statistical/online and agnostic/realizable boosting.

preprint, 2020, arXiv

Private Query Release Assisted by Public Data

We study the problem of differentially private query release assisted by access to public data. In this problem, the goal is to answer a large class $\mathcal{H}$ of statistical queries with error no more than $α$ using a combination of public and private samples. The algorithm is required to satisfy differential privacy only with respect to the private samples. We study the limits of this task in terms of the private and public sample complexities. First, we show that we can solve the problem for any query class $\mathcal{H}$ of finite VC-dimension using only $d/α$ public samples and $\sqrt{p}d^{3/2}/α^2$ private samples, where $d$ and $p$ are the VC-dimension and dual VC-dimension of $\mathcal{H}$, respectively. In comparison, with only private samples, this problem cannot be solved even for simple query classes with VC-dimension one, and without any private samples, a larger public sample of size $d/α^2$ is needed. Next, we give sample complexity lower bounds that exhibit tight dependence on $p$ and $α$. For the class of decision stumps, we give a lower bound of $\sqrt{p}/α$ on the private sample complexity whenever the public sample size is less than $1/α^2$. Given our upper bounds, this shows that the dependence on $\sqrt{p}$ is necessary in the private sample complexity. We also give a lower bound of $1/α$ on the public sample complexity for a broad family of query classes, which by our upper bound, is tight in $α$.

preprint, 2020, arXiv

Proper Learning, Helly Number, and an Optimal SVM Bound

The classical PAC sample complexity bounds are stated for any Empirical Risk Minimizer (ERM) and contain an extra logarithmic factor $\log(1/ε)$ which is known to be necessary for ERM in general. It has been recently shown by Hanneke (2016) that the optimal sample complexity of PAC learning for any VC class C is achieved by a particular improper learning algorithm, which outputs a specific majority-vote of hypotheses in C. This leaves the question of when this bound can be achieved by proper learning algorithms, which are restricted to always output a hypothesis from C. In this paper we aim to characterize the classes for which the optimal sample complexity can be achieved by a proper learning algorithm. We identify that these classes can be characterized by the dual Helly number, which is a combinatorial parameter that arises in discrete geometry and abstract convexity. In particular, under general conditions on C, we show that the dual Helly number is bounded if and only if there is a proper learner that obtains the optimal joint dependence on $ε$ and $δ$. As further implications of our techniques we resolve a long-standing open problem posed by Vapnik and Chervonenkis (1974) on the performance of the Support Vector Machine by proving that the sample complexity of SVM in the realizable case is $Θ((n/ε)+(1/ε)\log(1/δ))$, where $n$ is the dimension. This gives the first optimal PAC bound for Halfspaces achieved by a proper learning algorithm, and moreover is computationally efficient.

preprint, 2020, arXiv

Shattered Sets and the Hilbert Function

We study complexity measures on subsets of the boolean hypercube and exhibit connections between algebra (the Hilbert function) and combinatorics (VC theory). These connections yield results in both directions. Our main complexity-theoretic result proves that most linear program feasibility problems cannot be computed by polynomial-sized constant-depth circuits. Moreover, our result applies to a stronger regime in which the hyperplanes are fixed and only the directions of the inequalities are given as input to the circuit. We derive this result by proving that a rich class of extremal functions in VC theory cannot be approximated by low-degree polynomials. We also present applications of algebra to combinatorics. We provide a new algebraic proof of the Sandwich Theorem, which is a generalization of the well-known Sauer-Perles-Shelah Lemma. Finally, we prove a structural result about downward-closed sets, related to the Chvátal conjecture in extremal combinatorics.
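For small set systems, shattered sets can be enumerated by brute force, which gives a concrete handle on the combinatorial side of the abstract. The sketch below lists the shattered sets of a family and reads off its VC dimension. It assumes a finite ground set with sortable elements, and the helper names are hypothetical. On any finite family the number of shattered sets is at least the size of the family, which is the upper-bound half of the Sandwich Theorem.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set


def shattered_sets(family: Iterable[Set[int]],
                   ground: Set[int]) -> List[FrozenSet[int]]:
    """All subsets of `ground` shattered by `family` (exponential time)."""
    members = {frozenset(s) for s in family}
    result = []
    for size in range(len(ground) + 1):
        for subset in combinations(sorted(ground), size):
            s = frozenset(subset)
            # S is shattered when every subset of S arises as a trace.
            traces = {s & member for member in members}
            if len(traces) == 2 ** len(s):
                result.append(s)
    return result


def vc_dimension(family: Iterable[Set[int]], ground: Set[int]) -> int:
    """Size of the largest shattered set; -1 for an empty family
    (a convention chosen for this sketch)."""
    return max((len(s) for s in shattered_sets(family, ground)), default=-1)
```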

preprint, 2016, arXiv

Hitting Set for hypergraphs of low VC-dimension

We study the complexity of the Hitting Set problem in set systems (hypergraphs) that avoid certain sub-structures. In particular, we characterize the classical and parameterized complexity of the problem when the Vapnik-Chervonenkis dimension (VC-dimension) of the input is small. VC-dimension is a natural measure of complexity of set systems. Several tractable instances of Hitting Set with a geometric or graph-theoretical flavor are known to have low VC-dimension. In set systems of bounded VC-dimension, Hitting Set is known to admit efficient and almost optimal approximation algorithms (Brönnimann and Goodrich, 1995; Even, Rawitz, and Shahar, 2005; Agarwal and Pan, 2014). In contrast to these approximation-results, a low VC-dimension does not necessarily imply tractability in the parameterized sense. In fact, we show that Hitting Set is W[1]-hard already on inputs with VC-dimension 2, even if the VC-dimension of the dual set system is also 2. Thus, Hitting Set is very unlikely to be fixed-parameter tractable even in this arguably simple case. This answers an open question raised by King in 2010. For set systems whose (primal or dual) VC-dimension is 1, we show that Hitting Set is solvable in polynomial time. To bridge the gap in complexity between the classes of inputs with VC-dimension 1 and 2, we use a measure that is more fine-grained than VC-dimension. In terms of this measure, we identify a sharp threshold where the complexity of Hitting Set transitions from polynomial-time-solvable to NP-hard. The tractable class that lies just under the threshold is a generalization of Edge Cover, and thus extends the domain of polynomial-time tractability of Hitting Set.
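To make the problem statement concrete, here is a brute-force Hitting Set solver for tiny instances. It tries candidate sets in order of increasing size, so its running time is exponential and it says nothing about the tractability results in the abstract; the function name and interface are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Set


def min_hitting_set(ground: Set[int],
                    sets: List[Set[int]]) -> Optional[Set[int]]:
    """Smallest subset of `ground` meeting every set, or None if some
    set cannot be hit (for example, an empty set in the input)."""
    elements = sorted(ground)
    for size in range(len(elements) + 1):
        for candidate in combinations(elements, size):
            chosen = set(candidate)
            # A hitting set intersects every set of the system.
            if all(chosen & s for s in sets):
                return chosen
    return None
```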

preprint, 2016, arXiv

Labeled compression schemes for extremal classes

It is a long-standing open problem whether there always exists a compression scheme whose size is of the order of the Vapnik-Chervonenkis (VC) dimension $d$. Recently compression schemes of size exponential in $d$ have been found for any concept class of VC dimension $d$. Previously, compression schemes of size $d$ have been given for maximum classes, which are special concept classes whose size equals an upper bound due to Sauer-Shelah. We consider a generalization of maximum classes called extremal classes. Their definition is based on a powerful generalization of the Sauer-Shelah bound called the Sandwich Theorem, which has been studied in several areas of combinatorics and computer science. The key result of the paper is a construction of a sample compression scheme for extremal classes of size equal to their VC dimension. We also give a number of open problems concerning the combinatorial structure of extremal classes and the existence of unlabeled compression schemes for them.

preprint, 2016, arXiv

On statistical learning via the lens of compression

This work continues the study of the relationship between sample compression schemes and statistical learning, which has been mostly investigated within the framework of binary classification. The central theme of this work is establishing equivalences between learnability and compressibility, and utilizing these equivalences in the study of statistical learning theory. We begin with the setting of multiclass categorization (zero/one loss). We prove that in this case learnability is equivalent to compression of logarithmic sample size, and that uniform convergence implies compression of constant size. We then consider Vapnik's general learning setting: we show that in order to extend the compressibility-learnability equivalence to this case, it is necessary to consider an approximate variant of compression. Finally, we provide some applications of the compressibility-learnability equivalences: (i) Agnostic-case learnability and realizable-case learnability are equivalent in multiclass categorization problems (in terms of sample complexity). (ii) This equivalence between agnostic-case learnability and realizable-case learnability does not hold for general learning problems: There exists a learning problem whose loss function takes just three values, under which agnostic-case and realizable-case learnability are not equivalent. (iii) Uniform convergence implies compression of constant size in multiclass categorization problems. Part of the argument includes an analysis of the uniform convergence rate in terms of the graph dimension, in which we improve upon previous bounds. (iv) A dichotomy for sample compression in multiclass categorization problems: If a non-trivial compression exists then a compression of logarithmic size exists. (v) A compactness theorem for multiclass categorization problems.

preprint, 2016, arXiv

Sign rank versus VC dimension

This work studies the maximum possible sign rank of $N \times N$ sign matrices with a given VC dimension $d$. For $d=1$, this maximum is three. For $d=2$, this maximum is $\tilde{\Theta}(N^{1/2})$. For $d>2$, similar but slightly less accurate statements hold. The lower bounds improve over previous ones by Ben-David et al., and the upper bounds are novel. The lower bounds are obtained by probabilistic constructions, using a theorem of Warren in real algebraic topology. The upper bounds are obtained using a result of Welzl about spanning trees with low stabbing number, and using the moment curve. The upper bound technique is also used to: (i) provide estimates on the number of classes of a given VC dimension, and the number of maximum classes of a given VC dimension, answering a question of Frankl from '89, and (ii) design an efficient algorithm that provides an $O(N/\log(N))$ multiplicative approximation for the sign rank. We also observe a general connection between sign rank and spectral gaps which is based on Forster's argument. Consider the $N \times N$ adjacency matrix of a $\Delta$-regular graph with a second eigenvalue of absolute value $\lambda$ and $\Delta \leq N/2$. We show that the sign rank of the signed version of this matrix is at least $\Delta/\lambda$. We use this connection to prove the existence of a maximum class $C\subseteq\{\pm 1\}^N$ with VC dimension $2$ and sign rank $\tilde{\Theta}(N^{1/2})$. This answers a question of Ben-David et al. regarding the sign rank of large VC classes. We also describe limitations of this approach, in the spirit of the Alon-Boppana theorem. We further describe connections to communication complexity, geometry, learning theory, and combinatorics.

preprint, 2016, arXiv

Teaching and compressing for low VC-dimension

In this work we study the quantitative relation between VC-dimension and two other basic parameters related to learning and teaching, namely the quality of sample compression schemes and of teaching sets for classes of low VC-dimension. Let $C$ be a binary concept class of size $m$ and VC-dimension $d$. Prior to this work, the best known upper bounds for both parameters were $\log(m)$, while the best lower bounds are linear in $d$. We present significantly better upper bounds on both as follows. Set $k = O(d 2^d \log \log |C|)$. We show that there always exists a concept $c$ in $C$ with a teaching set (i.e. a list of $c$-labeled examples uniquely identifying $c$ in $C$) of size $k$. This problem was studied by Kuhlmann (1999). Our construction implies that the recursive teaching (RT) dimension of $C$ is at most $k$ as well. The RT-dimension was suggested by Zilles et al. and Doliwa et al. (2010). The same notion (under the name partial-ID width) was independently studied by Wigderson and Yehudayoff (2013). An upper bound on this parameter that depends only on $d$ is known just for the very simple case $d=1$, and is open even for $d=2$. We also make small progress towards this seemingly modest goal. We further construct sample compression schemes of size $k$ for $C$, with additional information of $k \log(k)$ bits. Roughly speaking, given any list of $C$-labeled examples of arbitrary length, we can retain only $k$ labeled examples in a way that allows one to recover the labels of all other examples in the list, using an additional $k\log (k)$ information bits. This problem was first suggested by Littlestone and Warmuth (1986).

preprint, 2015, arXiv

Matchings vs hitting sets among half-spaces in low dimensional euclidean spaces

Let $\mathcal{F}$ be any collection of linearly separable sets of a set $P$ of $n$ points either in $\mathbb{R}^2$, or in $\mathbb{R}^3$. We show that for every natural number $k$ either one can find $k$ pairwise disjoint sets in $\mathcal{F}$, or there are $O(k)$ points in $P$ that together hit all sets in $\mathcal{F}$. The proof is based on showing a similar result for families $\mathcal{F}$ of sets separable by pseudo-discs in $\mathbb{R}^2$. We complement these statements by showing that the analogous result fails to hold for collections of linearly separable sets in $\mathbb{R}^4$ and higher-dimensional Euclidean spaces.

preprint, 2015, arXiv

Node-balancing by edge-increments

Suppose you are given a graph $G=(V,E)$ with a weight assignment $w:V\rightarrow\mathbb{Z}$ and that your objective is to modify $w$ using legal steps such that all vertices will have the same weight, where in each legal step you are allowed to choose an edge and increment the weights of its end points by $1$. In this paper we study several variants of this problem for graphs and hypergraphs. On the combinatorial side we show connections with fundamental results from matching theory such as Hall's Theorem and Tutte's Theorem. On the algorithmic side we study the computational complexity of associated decision problems. Our main results are a characterization of the graphs for which any initial assignment can be balanced by edge-increments and a strongly polynomial-time algorithm that computes a balancing sequence of increments if one exists.
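The legal step described in this abstract can be made concrete with a minimal Python sketch. It is not taken from the paper: the helper names `apply_increments` and `is_balanced` and the two small example graphs are illustrative assumptions. The sketch only replays a given sequence of edge-increments and checks whether all weights end up equal; it does not implement the paper's algorithm for finding such a sequence.

```python
from typing import Dict, Hashable, Iterable, Tuple

Vertex = Hashable
Edge = Tuple[Vertex, Vertex]


def apply_increments(weights: Dict[Vertex, int], steps: Iterable[Edge]) -> Dict[Vertex, int]:
    """Replay legal steps: each step adds 1 to both endpoints of the chosen edge.

    Assumption: every pair in `steps` is an edge of the graph; this is not checked.
    The input dictionary is left unmodified.
    """
    result = dict(weights)
    for u, v in steps:
        result[u] += 1
        result[v] += 1
    return result


def is_balanced(weights: Dict[Vertex, int]) -> bool:
    """True when all vertices carry the same weight."""
    return len(set(weights.values())) <= 1


# Hypothetical example 1: a triangle on vertices a, b, c.
triangle_weights = {"a": 0, "b": 1, "c": 1}
balanced = apply_increments(triangle_weights, [("a", "b"), ("a", "c")])

# Hypothetical example 2: a single edge u-v with unequal starting weights.
single_edge = apply_increments({"u": 0, "v": 1}, [("u", "v")] * 5)
```

On the triangle, two increments equalize the weights. On the single edge, every step raises both endpoints together, so the initial difference of 1 never disappears.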

preprint, 2015, arXiv

Sample compression schemes for VC classes

Sample compression schemes were defined by Littlestone and Warmuth (1986) as an abstraction of the structure underlying many learning algorithms. Roughly speaking, a sample compression scheme of size $k$ means that given an arbitrary list of labeled examples, one can retain only $k$ of them in a way that allows one to recover the labels of all other examples in the list. They showed that compression implies PAC learnability for binary-labeled classes, and asked whether the other direction holds. We answer their question and show that every concept class $C$ with VC dimension $d$ has a sample compression scheme of size exponential in $d$. The proof uses an approximate minimax phenomenon for binary matrices of low VC dimension, which may be of interest in the context of game theory.
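As a small illustration of what a sample compression scheme is, the following Python sketch gives a scheme of size 1 for one-dimensional threshold classifiers (concepts of the form "label 1 iff x >= t"). This is a textbook-style toy example chosen here as an assumption, not the construction from the paper; the function names `compress` and `reconstruct` are hypothetical, and the sample is assumed to be consistent with some threshold.

```python
from typing import Callable, List, Tuple

Example = Tuple[float, int]  # (point, label) with label in {0, 1}


def compress(sample: List[Example]) -> List[Example]:
    """Keep at most one example: the smallest positively labeled point, if any."""
    positives = [x for x, y in sample if y == 1]
    if not positives:
        return []
    return [(min(positives), 1)]


def reconstruct(kept: List[Example]) -> Callable[[float], int]:
    """Build a hypothesis from the retained examples alone."""
    if not kept:
        # No positive example was retained: predict 0 everywhere.
        return lambda z: 0
    threshold = kept[0][0]
    return lambda z: 1 if z >= threshold else 0


# Hypothetical sample, consistent with any threshold t in (1.1, 1.7].
sample = [(0.3, 0), (2.5, 1), (1.1, 0), (1.7, 1), (4.0, 1)]
kept = compress(sample)
hypothesis = reconstruct(kept)
recovered = [hypothesis(x) for x, _ in sample]
```

Here `recovered` matches the original labels because every negative point lies below the true threshold and therefore below the retained point, while every positive point is at least as large as it.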

preprint, 2015, arXiv

Simple and Optimal Randomized Fault-Tolerant Rumor Spreading

We revisit the classic problem of spreading a piece of information in a group of $n$ fully connected processors. By suitably adding a small dose of randomness to the protocol of Gasieniec and Pelc (1996), we derive for the first time protocols that (i) use a linear number of messages, (ii) are correct even when an arbitrary number of adversarially chosen processors do not participate in the process, and (iii) with high probability have the asymptotically optimal runtime of $O(\log n)$ when at least an arbitrarily small constant fraction of the processors are working. In addition, our protocols do not require that the system is synchronized nor that all processors are simultaneously woken up at time zero, they are fully based on push-operations, and they do not need an a priori estimate on the number of failed nodes. Our protocols thus overcome the typical disadvantages of the two known approaches, algorithms based on random gossip (typically needing a large number of messages due to their unorganized nature) and algorithms based on fair workload splitting (which are either not time-efficient or require intricate preprocessing steps plus synchronization).

preprint, 2012, arXiv

Shattering-Extremal Systems

The Shatters relation and the VC dimension have been investigated since the early seventies. These concepts have found numerous applications in statistics, combinatorics, learning theory and computational geometry. Shattering-extremal systems are set-systems with a very rich structure and many different characterizations. The goal of this thesis is to elaborate on the structure of these systems.

preprint, 2012, arXiv

Shattering, Graph Orientations, and Connectivity

We present a connection between two seemingly disparate fields: VC-theory and graph theory. This connection yields natural correspondences between fundamental concepts in VC-theory, such as shattering and VC-dimension, and well-studied concepts of graph theory related to connectivity, combinatorial optimization, forbidden subgraphs, and others. In one direction, we use this connection to derive results in graph theory. Our main tool is a generalization of the Sauer-Shelah Lemma. Using this tool we obtain a series of inequalities and equalities related to properties of orientations of a graph. Some of these results appear to be new, for others we give new and simple proofs. In the other direction, we present new illustrative examples of shattering-extremal systems - a class of set-systems in VC-theory whose understanding is considered by some authors to be incomplete. These examples are derived from properties of orientations related to distances and flows in networks.

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint

Fields this researcher appears in

Source provenance

Where this author record came from

arxiv, confidence 95%

external id: arxiv:2605.09565:author:3:shay-moran

Imported May 20, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2605.13684:author:3:shay-moran

Imported May 20, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2605.13426:author:2:shay-moran

Imported May 20, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2605.19625:author:2:shay-moran

Imported May 20, 2026. Synced May 21, 2026.

arxiv, confidence 95%

external id: arxiv:2604.26922:author:3:shay-moran

Imported May 20, 2026. Synced May 20, 2026.

4 works

Amir Yehudayoff

Researcher

Amir Yehudayoff contributes to research discovery and scholarly infrastructure.

3 works

Noga Alon

Researcher

Noga Alon contributes to research discovery and scholarly infrastructure.

3 works

Steve Hanneke

Researcher

Steve Hanneke contributes to research discovery and scholarly infrastructure.

3 works

Yuval Filmus

Researcher

Yuval Filmus contributes to research discovery and scholarly infrastructure.

Shay Moran

31 published items

On the Learning Curves of Revenue Maximization

Online Set Learning from Precision and Recall Feedback

Optimal Reconstruction from Linear Queries

Scale-Sensitive Shattering: Learnability and Evaluability at Optimal Scale

Strategic PAC Learnability via Geometric Definability

Active Learning with Label Comparisons

How Expressive Are Friendly School Partitions?

Integral Probability Metrics PAC-Bayes Bounds

Understanding Generalization via Leave-One-Out Conditional Mutual Information

Unlabeled sample compression schemes and corner peelings for ample and maximum classes

Adversarial Laws of Large Numbers and Optimal Regret in Online Classification

Online Learning with Simple Predictors and a Combinatorial Characterization of Minimax in 0/1 Games

A Sauer-Shelah-Perles Lemma for Lattices

Closure Properties for Private Classification and Online Prediction

Elementary Derivations of the Real Composition Algebras

Learning from Mixtures of Private and Public Populations

Online Agnostic Boosting via Regret Minimization

Private Query Release Assisted by Public Data

Proper Learning, Helly Number, and an Optimal SVM Bound

Shattered Sets and the Hilbert Function

Hitting Set for hypergraphs of low VC-dimension

Labeled compression schemes for extremal classes

On statistical learning via the lens of compression

Sign rank versus VC dimension

Teaching and compressing for low VC-dimension

Matchings vs hitting sets among half-spaces in low dimensional euclidean spaces

Node-balancing by edge-increments

Sample compression schemes for VC classes

Simple and Optimal Randomized Fault-Tolerant Rumor Spreading

Shattering-Extremal Systems

Shattering, Graph Orientations, and Connectivity