Source author record

Amit Daniely

Amit Daniely appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, Computational Complexity, Artificial Intelligence, Computer Science and Game Theory, Data Structures and Algorithms, Discrete Mathematics, astro-ph.EP, astro-ph.IM, astro-ph.SR, math.CO, math.OC, physics.chem-ph

Catalog footprint

What is connected

27 works

12 topics

4 close collaborators


preprint 2026 arXiv

Deep Networks Learn Deep Hierarchical Models

We consider supervised learning with $n$ labels and show that layerwise SGD on residual networks can efficiently learn a class of hierarchical models. This model class assumes the existence of an (unknown) label hierarchy $L_1 \subseteq L_2 \subseteq \dots \subseteq L_r = [n]$, where labels in $L_1$ are simple functions of the input, while for $i > 1$, labels in $L_i$ are simple functions of simpler labels. Our class surpasses models that were previously shown to be learnable by deep learning algorithms, in the sense that it reaches the depth limit of efficient learnability. That is, there are models in this class that require polynomial depth to express, whereas previous models can be computed by log-depth circuits. Furthermore, we suggest that learnability of such hierarchical models might eventually form a basis for understanding deep learning. Beyond their natural fit for domains where deep learning excels, we argue that the mere existence of human ``teachers'' supports the hypothesis that hierarchical structures are inherently available. By providing granular labels, teachers effectively reveal ``hints'' or ``snippets'' of the internal algorithms used by the brain. We formalize this intuition, showing that in a simplified model where a teacher is partially aware of their internal logic, a hierarchical structure emerges that facilitates efficient learnability.

preprint 2026 arXiv

Functionalization of Benzene Ices by Atomic Oxygen

Small aromatic molecules, including functionalized derivatives of benzene, are known to be present throughout the different stages of star and planet formation. In particular, oxygen-bearing monosubstituted aromatics, likely including phenol, have been identified in the coma of comet 67P. This suggests that, earlier in the star and planet formation evolution, icy grains may act as both reservoirs and sites of functionalization for these small aromatics. We investigate the ice-phase reactivity of singlet oxygen atoms (O($^1$D)) with benzene, using ozone as a precursor that is readily photodissociated by relatively low-energy photons. Our experiments show that O($^1$D) efficiently reacts with benzene, forming phenol, benzene oxide, and oxepine as the main products. Phenol formation is temperature-independent, consistent with a barrierless insertion mechanism. In contrast, the formation of benzene oxide/oxepine shows a slight temperature dependence, suggesting that additional reaction pathways involving either ground-state or excited-state oxygen atoms may contribute. In H$_2$O and CO$_2$ ice matrices we find that dilution does not suppress formation of phenol. We extrapolate an experimental upper limit for the benzene-to-phenol conversion fraction of 27-44$\%$ during the lifetime of an interstellar cloud, assuming O($^1$D) production rates based on CO$_2$ ice abundances and a cosmic-ray induced UV field. We compare these estimates with a new analysis of data from comet 67P, where the C$_6$H$_6$O/C$_6$H$_6$ ratio is 20$\pm$6$\%$. This value lies within our estimated range, suggesting that O($^1$D)-mediated chemistry is a viable pathway for producing oxygenated aromatics in cold astrophysical ices, potentially enriching icy planetesimals with phenol and other biorelevant compounds.

preprint 2021 arXiv

Planning and Learning with Stochastic Action Sets

In many practical uses of reinforcement learning (RL) the set of actions available at a given state is a random variable, with realizations governed by an exogenous stochastic process. Somewhat surprisingly, the foundations for such sequential decision processes have been unaddressed. In this work, we formalize and investigate MDPs with stochastic action sets (SAS-MDPs) to provide these foundations. We show that optimal policies and value functions in this model have a structure that admits a compact representation. From an RL perspective, we show that Q-learning with sampled action sets is sound. In model-based settings, we consider two important special cases: when individual actions are available with independent probabilities; and a sampling-based model for unknown distributions. We develop poly-time value and policy iteration methods for both cases; and in the first, we offer a poly-time linear programming solution.
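One way to read the Q-learning statement above is that the bootstrap target maximizes only over the actions that were actually sampled as available in the next state. The following is a minimal tabular sketch under that assumption; the function name `q_update`, the state and action names, and the step-size and discount values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, available_next, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step in which the bootstrap term maximizes
    only over the actions observed to be available in the next state."""
    # If no action is available, fall back to a value of 0 (an assumption).
    best_next = max((Q[(s_next, b)] for b in available_next), default=0.0)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
    return Q[(s, a)]

# Hypothetical example: "right" has the higher value in s1 but is unavailable.
Q = defaultdict(float)
Q[("s1", "left")] = 1.0
Q[("s1", "right")] = 5.0
value = q_update(Q, "s0", "go", r=0.0, s_next="s1", available_next={"left"})
```

Because only `left` was sampled as available, the update bootstraps from its value rather than from the larger value of `right`.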

preprint 2020 arXiv

Learning Parities with Neural Networks

Recent years have seen a rapidly growing line of research showing learnability of various models via common neural network algorithms. Yet, besides very few outliers, these results show learnability of models that can be learned using linear methods. Namely, such results show that learning neural networks with gradient descent is competitive with learning a linear classifier on top of a data-independent representation of the examples. This leaves much to be desired, as neural networks are far more successful than linear methods. Furthermore, on the more conceptual level, linear models don't seem to capture the "deepness" of deep networks. In this paper we make a step towards showing learnability of models that are inherently non-linear. We show that under certain distributions, sparse parities are learnable via gradient descent on a depth-two network. On the other hand, under the same distributions, these parities cannot be learned efficiently by linear methods.
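To make the target class concrete, here is a minimal sketch of a sparse parity on $\{\pm 1\}^n$: the label is the product of a small subset of coordinates. The function name `sparse_parity`, the choice $n = 4$ and the support `(0, 2)` are assumptions for illustration only; this shows the function being learned, not the paper's distributions or training procedure.

```python
from itertools import product
from math import prod

def sparse_parity(x, support):
    """Parity of x restricted to `support`; entries of x are in {-1, +1}."""
    return prod(x[i] for i in support)

# Hypothetical example: n = 4 inputs, parity over coordinates 0 and 2.
support = (0, 2)
table = {x: sparse_parity(x, support) for x in product((-1, 1), repeat=4)}

# Correlation of the label with each single coordinate over all 16 inputs.
correlations = [sum(y * x[i] for x, y in table.items()) for i in range(4)]
```

In this toy uniform setting every entry of `correlations` is zero, so no single input coordinate carries information about the label on its own.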

preprint 2020 arXiv

Memorizing Gaussians with no over-parameterizaion via gradient decent on neural networks

We prove that a single step of gradient descent over a depth-two network, with $q$ hidden neurons, starting from orthogonal initialization, can memorize $Ω\left(\frac{dq}{\log^4(d)}\right)$ independent and randomly labeled Gaussians in $\mathbb{R}^d$. The result is valid for a large class of activation functions, which includes the absolute value.

preprint 2020 arXiv

On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions

Recent advances in randomized incremental methods for minimizing $L$-smooth $μ$-strongly convex finite sums have culminated in tight complexity of $\tilde{O}((n+\sqrt{n L/μ})\log(1/ε))$ and $O(n+\sqrt{nL/ε})$, where $μ>0$ and $μ=0$, respectively, and $n$ denotes the number of individual functions. Unlike incremental methods, stochastic methods for finite sums do not rely on an explicit knowledge of which individual function is being addressed at each iteration, and as such, must perform at least $Ω(n^2)$ iterations to obtain $O(1/n^2)$-optimal solutions. In this work, we exploit the finite noise structure of finite sums to derive a matching $O(n^2)$-upper bound under the global oracle model, showing that this lower bound is indeed tight. Following a similar approach, we propose a novel adaptation of SVRG which is both \emph{compatible with stochastic oracles}, and achieves complexity bounds of $\tilde{O}((n^2+n\sqrt{L/μ})\log(1/ε))$ and $O(n\sqrt{L/ε})$, for $μ>0$ and $μ=0$, respectively. Our bounds hold w.h.p. and match in part existing lower bounds of $\tildeΩ(n^2+\sqrt{nL/μ}\log(1/ε))$ and $\tildeΩ(n^2+\sqrt{nL/ε})$, for $μ>0$ and $μ=0$, respectively.

preprint 2020 arXiv

On the Optimality of Trees Generated by ID3

Since its inception in the 1980s, ID3 has become one of the most successful and widely used algorithms for learning decision trees. However, its theoretical properties remain poorly understood. In this work, we introduce a novel metric of a decision tree algorithm's performance, called mean iteration statistical consistency (MIC), which measures optimality of trees generated by ID3. As opposed to previous metrics, MIC can differentiate between different decision tree algorithms and compare their performance. We provide theoretical and empirical evidence that the TopDown variant of ID3, introduced by Kearns and Mansour (1996), has near-optimal MIC in various settings for learning read-once DNFs under product distributions. In contrast, another widely used variant of ID3 has MIC which is not near-optimal. We show that the MIC analysis predicts well the performance of these algorithms in practice. Our results present a novel view of decision tree algorithms which may lead to better and more practical guarantees for these algorithms.
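ID3-style algorithms grow a tree top-down, splitting at each node on the attribute with the largest information gain. The sketch below shows only that splitting criterion, assuming Shannon entropy as the impurity measure; the helper names and the toy data are hypothetical, and it implements neither the MIC metric nor the TopDown variant discussed in the abstract.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on rows[i][feature]."""
    n = len(labels)
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[feature], []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder

def best_feature(rows, labels):
    """Index of the feature with the largest information gain."""
    return max(range(len(rows[0])),
               key=lambda f: information_gain(rows, labels, f))

# Hypothetical toy data: the label equals feature 1.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 0, 1]
```

On this toy data, splitting on feature 1 removes all label uncertainty while splitting on feature 0 removes none, so `best_feature(rows, labels)` selects feature 1.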

preprint 2019 arXiv

The Implicit Bias of Depth: How Incremental Learning Drives Generalization

A leading hypothesis for the surprising generalization of neural networks is that the dynamics of gradient descent bias the model towards simple solutions, by searching through the solution space in an incremental order of complexity. We formally define the notion of incremental learning dynamics and derive the conditions on depth and initialization for which this phenomenon arises in deep linear models. Our main theoretical contribution is a dynamical depth separation result, proving that while shallow models can exhibit incremental learning dynamics, they require the initialization to be exponentially small for these dynamics to present themselves. However, once the model becomes deeper, the dependence becomes polynomial and incremental learning can arise in more natural settings. We complement our theoretical findings by experimenting with deep matrix sensing, quadratic neural networks and with binary classification using diagonal and convolutional linear networks, showing all of these models exhibit incremental learning.

preprint 2016 arXiv

Behavior-Based Machine-Learning: A Hybrid Approach for Predicting Human Decision Making

A large body of work in behavioral fields attempts to develop models that describe the way people, as opposed to rational agents, make decisions. A recent Choice Prediction Competition (2015) challenged researchers to suggest a model that captures 14 classic choice biases and can predict human decisions under risk and ambiguity. The competition focused on simple decision problems, in which human subjects were asked to repeatedly choose between two gamble options. In this paper we present our approach for predicting human decision behavior: we suggest using machine learning algorithms with features that are based on well-established behavioral theories. The basic idea is that these psychological features are essential for the representation of the data and are important for the success of the learning process. We implement a vanilla model in which we train SVM models using behavioral features that rely on the psychological properties underlying the competition baseline model. We show that this basic model captures the 14 choice biases and outperforms all the other learning-based models in the competition. The preliminary results suggest that such hybrid models can significantly improve the prediction of human decision making, and are a promising direction for future research.

preprint 2016 arXiv

Complexity Theoretic Limitations on Learning Halfspaces

We study the problem of agnostically learning halfspaces, which is defined by a fixed but unknown distribution $\mathcal{D}$ on $\mathbb{Q}^n\times \{\pm 1\}$. We define $\mathrm{Err}_{\mathrm{HALF}}(\mathcal{D})$ as the least error of a halfspace classifier for $\mathcal{D}$. A learner who can access $\mathcal{D}$ has to return a hypothesis whose error is small compared to $\mathrm{Err}_{\mathrm{HALF}}(\mathcal{D})$. Using the recently developed method of the author, Linial and Shalev-Shwartz, we prove hardness of learning results under a natural assumption on the complexity of refuting random $K$-$\mathrm{XOR}$ formulas. We show that no efficient learning algorithm has non-trivial worst-case performance even under the guarantees that $\mathrm{Err}_{\mathrm{HALF}}(\mathcal{D}) \le η$ for arbitrarily small constant $η>0$, and that $\mathcal{D}$ is supported in $\{\pm 1\}^n\times \{\pm 1\}$. Namely, even under these favorable conditions its error must be $\ge \frac{1}{2}-\frac{1}{n^c}$ for every $c>0$. In particular, no efficient algorithm can achieve a constant approximation ratio. Under a stronger version of the assumption (where $K$ can be poly-logarithmic in $n$), we can take $η= 2^{-\log^{1-ν}(n)}$ for arbitrarily small $ν>0$. Interestingly, this is even stronger than the best known lower bounds (Arora et al. 1993, Feldman et al. 2006, Guruswami and Raghavendra 2006) for the case that the learner is restricted to return a halfspace classifier (i.e. proper learning).

preprint 2016 arXiv

Distribution Free Learning with Local Queries

The model of learning with \emph{local membership queries} interpolates between the PAC model and the membership queries model by allowing the learner to query the label of any example that is similar to an example in the training set. This model, recently proposed and studied by Awasthi, Feldman and Kanade, aims to facilitate practical use of membership queries. We continue this line of work, proving both positive and negative results in the {\em distribution free} setting. We restrict to the boolean cube $\{-1, 1\}^n$, and say that a query is $q$-local if it is at Hamming distance $\le q$ from some training example. On the positive side, we show that $1$-local queries already give additional strength, and allow learning a certain type of DNF formulas. On the negative side, we show that even $\left(n^{0.99}\right)$-local queries cannot help to learn various classes including Automata, DNFs and more. Likewise, $q$-local queries for any constant $q$ cannot help to learn Juntas, Decision Trees, Sparse Polynomials and more. Moreover, for these classes, an algorithm that uses $\left(\log^{0.99}(n)\right)$-local queries would lead to a breakthrough in the best known running times.
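The locality condition in this abstract is easy to state in code: a query on the boolean cube is $q$-local if it lies within Hamming distance $q$ of at least one training example. A minimal sketch follows; the function names and the toy training set are hypothetical.

```python
def hamming_distance(x, y):
    """Number of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))

def is_q_local(query, training_examples, q):
    """True if `query` is within Hamming distance q of some training example."""
    return any(hamming_distance(query, x) <= q for x in training_examples)

# Hypothetical training set on the cube {-1, 1}^4.
train = [(1, 1, -1, -1), (-1, 1, 1, 1)]
near = is_q_local((1, 1, -1, 1), train, q=1)   # one flip away from train[0]
far = is_q_local((1, -1, 1, -1), train, q=1)   # at least two flips from both
```

A learner in this model would be allowed to ask for the label of the first query but not the second when $q = 1$.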

preprint 2016 arXiv

Sketching and Neural Networks

High-dimensional sparse data present computational and statistical challenges for supervised learning. We propose compact linear sketches for reducing the dimensionality of the input, followed by a single layer neural network. We show that any sparse polynomial function can be computed, on nearly all sparse binary vectors, by a single layer neural network that takes a compact sketch of the vector as input. Consequently, when a set of sparse binary vectors is approximately separable using a sparse polynomial, there exists a single-layer neural network that takes a short sketch as input and correctly classifies nearly all the points. Previous work has proposed using sketches to reduce dimensionality while preserving the hypothesis class. However, the sketch size has an exponential dependence on the degree in the case of polynomial classifiers. In stark contrast, our approach of improper learning, which uses a larger hypothesis class, allows the sketch size to have a logarithmic dependence on the degree. Even in the linear case, our approach allows us to improve on the pesky $O({1}/{γ^2})$ dependence of random projections on the margin $γ$. We empirically show that our approach leads to more compact neural networks than related methods such as feature hashing at equal or better performance.

preprint 2015 arXiv

A PTAS for Agnostically Learning Halfspaces

We present a PTAS for agnostically learning halfspaces w.r.t. the uniform distribution on the $d$-dimensional sphere. Namely, we show that for every $μ>0$ there is an algorithm that runs in time $\mathrm{poly}(d,\frac{1}ε)$, and is guaranteed to return a classifier with error at most $(1+μ)\mathrm{opt}+ε$, where $\mathrm{opt}$ is the error of the best halfspace classifier. This improves on Awasthi, Balcan and Long [ABL14], who showed an algorithm with an (unspecified) constant approximation ratio. Our algorithm combines the classical technique of polynomial regression (e.g. [LMN89, KKMS05]) with the new localization technique of [ABL14].

preprint 2015 arXiv

Inapproximability of Truthful Mechanisms via Generalizations of the VC Dimension

Algorithmic mechanism design (AMD) studies the delicate interplay between computational efficiency, truthfulness, and optimality. We focus on AMD's paradigmatic problem: combinatorial auctions. We present a new generalization of the VC dimension to multivalued collections of functions, which encompasses the classical VC dimension, Natarajan dimension, and Steele dimension. We present a corresponding generalization of the Sauer-Shelah Lemma and harness this VC machinery to establish inapproximability results for deterministic truthful mechanisms. Our results essentially unify all inapproximability results for deterministic truthful mechanisms for combinatorial auctions to date and establish new separation gaps between truthful and non-truthful algorithms.

preprint 2015 arXiv

Strongly Adaptive Online Learning

Strongly adaptive algorithms are algorithms whose performance on every time interval is close to optimal. We present a reduction that can transform standard low-regret algorithms into strongly adaptive ones. As a consequence, we derive simple, yet efficient, strongly adaptive algorithms for a handful of problems.

preprint 2014 arXiv

Complexity theoretic limitations on learning DNF's

Using the recently developed framework of [Daniely et al, 2014], we show that under a natural assumption on the complexity of refuting random K-SAT formulas, learning DNF formulas is hard. Furthermore, the same assumption implies the hardness of learning intersections of $ω(\log(n))$ halfspaces, agnostically learning conjunctions, as well as virtually all (distribution free) learning problems that were previously shown hard (under complexity assumptions).

preprint 2014 arXiv

From average case complexity to improper learning complexity

The basic problem in the PAC model of computational learning theory is to determine which hypothesis classes are efficiently learnable. There is presently a dearth of results showing hardness of learning problems. Moreover, the existing lower bounds fall short of the best known algorithms. The biggest challenge in proving complexity results is to establish hardness of {\em improper learning} (a.k.a. representation independent learning). The difficulty in proving lower bounds for improper learning is that the standard reductions from $\mathbf{NP}$-hard problems do not seem to apply in this context. There is essentially only one known approach to proving lower bounds on improper learning. It was initiated in (Kearns and Valiant 89) and relies on cryptographic assumptions. We introduce a new technique for proving hardness of improper learning, based on reductions from problems that are hard on average. We put forward a (fairly strong) generalization of Feige's assumption (Feige 02) about the complexity of refuting random constraint satisfaction problems. Combining this assumption with our new technique yields far reaching implications. In particular, 1. Learning $\mathrm{DNF}$'s is hard. 2. Agnostically learning halfspaces with a constant approximation ratio is hard. 3. Learning an intersection of $ω(1)$ halfspaces is hard.

preprint 2014 arXiv

Learning Economic Parameters from Revealed Preferences

A recent line of work, starting with Beigman and Vohra (2006) and Zadimoghaddam and Roth (2012), has addressed the problem of {\em learning} a utility function from revealed preference data. The goal here is to make use of past data describing the purchases of a utility maximizing agent when faced with certain prices and budget constraints in order to produce a hypothesis function that can accurately forecast the {\em future} behavior of the agent. In this work we advance this line of work by providing sample complexity guarantees and efficient algorithms for a number of important classes. By drawing a connection to recent advances in multi-class learning, we provide a computationally efficient algorithm with tight sample complexity guarantees ($Θ(d/ε)$ for the case of $d$ goods) for learning linear utility functions under a linear price model. This solves an open question in Zadimoghaddam and Roth (2012). Our technique yields numerous generalizations including the ability to learn other well-studied classes of utility functions, to deal with a misspecified model, and with non-linear prices.

preprint 2014 arXiv

Multiclass learnability and the ERM principle

We study the sample complexity of multiclass prediction in several learning settings. For the PAC setting our analysis reveals a surprising phenomenon: In sharp contrast to binary classification, we show that there exist multiclass hypothesis classes for which some Empirical Risk Minimizers (ERM learners) have lower sample complexity than others. Furthermore, there are classes that are learnable by some ERM learners, while other ERM learners will fail to learn them. We propose a principle for designing good ERM learners, and use this principle to prove tight bounds on the sample complexity of learning {\em symmetric} multiclass hypothesis classes---classes that are invariant under permutations of label names. We further provide a characterization of mistake and regret bounds for multiclass learning in the online setting and the bandit setting, using new generalizations of Littlestone's dimension.

preprint 2014 arXiv

Optimal Learners for Multiclass Problems

The fundamental theorem of statistical learning states that for binary classification problems, any Empirical Risk Minimization (ERM) learning rule has close to optimal sample complexity. In this paper we seek a generic optimal learner for multiclass prediction. We start by proving a surprising result: a generic optimal multiclass learner must be improper, namely, it must have the ability to output hypotheses which do not belong to the hypothesis class, even though it knows that all the labels are generated by some hypothesis from the class. In particular, no ERM learner is optimal. This brings back the fundamental question of "how to learn"? We give a complete answer to this question by giving a new analysis of the one-inclusion multiclass learner of Rubinstein et al (2006) showing that its sample complexity is essentially optimal. Then, we turn to study the popular hypothesis class of generalized linear classifiers. We derive optimal learners that, unlike the one-inclusion algorithm, are computationally efficient. Furthermore, we show that the sample complexity of these learners is better than the sample complexity of the ERM rule, thus settling in the negative an open question due to Collins (2005).

preprint 2014 arXiv

The complexity of learning halfspaces using generalized linear methods

Many popular learning algorithms (e.g. Regression, Fourier-Transform based algorithms, Kernel SVM and Kernel ridge regression) operate by reducing the problem to a convex optimization problem over a vector space of functions. These methods offer the currently best approach to several central problems such as learning halfspaces and learning DNF's. In addition they are widely used in numerous application domains. Despite their importance, there are still very few proof techniques to show limits on the power of these algorithms. We study the performance of this approach in the problem of (agnostically and improperly) learning halfspaces with margin $γ$. Let $\mathcal{D}$ be a distribution over labeled examples. The $γ$-margin error of a hyperplane $h$ is the probability of an example to fall on the wrong side of $h$ or at a distance $\le γ$ from it. The $γ$-margin error of the best $h$ is denoted $\mathrm{Err}_γ(\mathcal{D})$. An $α(γ)$-approximation algorithm receives $γ,ε$ as input and, using i.i.d. samples of $\mathcal{D}$, outputs a classifier with error rate $\le α(γ)\mathrm{Err}_γ(\mathcal{D}) + ε$. Such an algorithm is efficient if it uses $\mathrm{poly}(\frac{1}γ,\frac{1}ε)$ samples and runs in time polynomial in the sample size. The best approximation ratio achievable by an efficient algorithm is $O\left(\frac{1/γ}{\sqrt{\log(1/γ)}}\right)$ and is achieved using an algorithm from the above class. Our main result shows that the approximation ratio of every efficient algorithm from this family must be $\ge Ω\left(\frac{1/γ}{\mathrm{poly}\left(\log\left(1/γ\right)\right)}\right)$, essentially matching the best known upper bound.
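The $γ$-margin error defined in this abstract has a direct empirical counterpart: the fraction of sample points that are either misclassified or within distance $γ$ of the hyperplane. The sketch below computes it for a hyperplane through the origin, which is a simplifying assumption; the function name `margin_error` and the sample are hypothetical.

```python
from math import sqrt

def margin_error(w, examples, gamma):
    """Fraction of (x, y) pairs that fall on the wrong side of the
    homogeneous hyperplane with normal w, or within distance gamma of it."""
    norm = sqrt(sum(wi * wi for wi in w))
    bad = 0
    for x, y in examples:
        # Signed distance to the hyperplane, positive when correctly classified.
        signed_dist = y * sum(wi * xi for wi, xi in zip(w, x)) / norm
        if signed_dist <= gamma:
            bad += 1
    return bad / len(examples)

# Hypothetical sample in R^2 with labels in {-1, +1}.
examples = [((1.0, 0.0), 1), ((0.05, 1.0), 1), ((-1.0, 0.0), -1), ((0.5, 0.0), -1)]
w = (1.0, 0.0)
err_with_margin = margin_error(w, examples, gamma=0.1)
err_no_margin = margin_error(w, examples, gamma=0.0)
```

With $γ = 0.1$ the second point counts as an error because it is correctly classified but too close to the hyperplane; with $γ = 0$ only the misclassified fourth point does.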

preprint 2013 arXiv

More data speeds up training time in learning halfspaces over sparse vectors

The increased availability of data in recent years has led several authors to ask whether it is possible to use data as a {\em computational} resource. That is, if more data is available, beyond the sample complexity limit, is it possible to use the extra examples to speed up the computation time required to perform the learning task? We give the first positive answer to this question for a {\em natural supervised learning problem} --- we consider agnostic PAC learning of halfspaces over $3$-sparse vectors in $\{-1,1,0\}^n$. This class is inefficiently learnable using $O\left(n/ε^2\right)$ examples. Our main contribution is a novel, non-cryptographic, methodology for establishing computational-statistical gaps, which allows us to show that, under a widely believed assumption that refuting random $\mathrm{3CNF}$ formulas is hard, it is impossible to efficiently learn this class using only $O\left(n/ε^2\right)$ examples. We further show that under stronger hardness assumptions, even $O\left(n^{1.499}/ε^2\right)$ examples do not suffice. On the other hand, we show a new algorithm that learns this class efficiently using $\tildeΩ\left(n^2/ε^2\right)$ examples. This formally establishes the tradeoff between sample and computational complexity for a natural supervised learning problem.

preprint 2013 arXiv

The price of bandit information in multiclass online classification

We consider two scenarios of multiclass online learning of a hypothesis class $H\subseteq Y^X$. In the {\em full information} scenario, the learner is exposed to instances together with their labels. In the {\em bandit} scenario, the true label is not exposed, but rather an indication whether the learner's prediction is correct or not. We show that the ratio between the error rates in the two scenarios is at most $8\cdot|Y|\cdot \log(|Y|)$ in the realizable case, and $\tilde{O}(\sqrt{|Y|})$ in the agnostic case. The results are tight up to a logarithmic factor and essentially answer an open question from (Daniely et al. - Multiclass learnability and the ERM principle). We apply these results to the class of $γ$-margin multiclass linear classifiers in $\mathbb{R}^d$. We show that the bandit error rate of this class is $\tildeΘ(\frac{|Y|}{γ^2})$ in the realizable case and $\tildeΘ(\frac{1}γ\sqrt{|Y|T})$ in the agnostic case. This resolves an open question from (Kakade et al. - Efficient bandit algorithms for online multiclass prediction).

preprint 2012 arXiv

Clustering is difficult only when it does not matter

Numerous papers ask how difficult it is to cluster data. We suggest that the more relevant and interesting question is how difficult it is to cluster data sets {\em that can be clustered well}. More generally, despite the ubiquity and the great importance of clustering, we still do not have a satisfactory mathematical theory of clustering. In order to properly understand clustering, it is clearly necessary to develop a solid theoretical basis for the area. For example, from the perspective of computational complexity theory the clustering problem seems very hard. Numerous papers introduce various criteria and numerical measures to quantify the quality of a given clustering. The resulting conclusions are pessimistic, since it is computationally difficult to find an optimal clustering of a given data set, if we go by any of these popular criteria. In contrast, the practitioners' perspective is much more optimistic. Our explanation for this disparity of opinions is that complexity theory concentrates on the worst case, whereas in reality we only care for data sets that can be clustered well. We introduce a theoretical framework of clustering in metric spaces that revolves around a notion of "good clustering". We show that if a good clustering exists, then in many cases it can be efficiently found. Our conclusion is that contrary to popular belief, clustering should not be considered a hard task.

preprint 2012 arXiv

Multiclass Learning Approaches: A Theoretical Comparison with Implications

We theoretically analyze and compare the following five popular multiclass classification methods: One vs. All, All Pairs, Tree-based classifiers, Error Correcting Output Codes (ECOC) with randomly generated code matrices, and Multiclass SVM. In the first four methods, the classification is based on a reduction to binary classification. We consider the case where the binary classifier comes from a class of VC dimension $d$, and in particular from the class of halfspaces over $\mathbb{R}^d$. We analyze both the estimation error and the approximation error of these methods. Our analysis reveals interesting conclusions of practical relevance, regarding the success of the different approaches under various conditions. Our proof technique employs tools from VC theory to analyze the \emph{approximation error} of hypothesis classes. This is in sharp contrast to most, if not all, previous uses of VC theory, which only deal with estimation error.
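Two of the reductions named above differ in how many binary problems they create: One vs. All builds one problem per class, while All Pairs builds one per unordered pair of classes. The following sketch only enumerates those binary tasks for a hypothetical label set; the function names are illustrative, and no classifier is trained.

```python
from itertools import combinations

def one_vs_all_tasks(classes):
    """One binary task per class: class c (positive) versus all the rest."""
    return [(c, tuple(o for o in classes if o != c)) for c in classes]

def all_pairs_tasks(classes):
    """One binary task per unordered pair of distinct classes."""
    return list(combinations(classes, 2))

# Hypothetical label set with k = 4 classes.
classes = ["a", "b", "c", "d"]
ova = one_vs_all_tasks(classes)   # k tasks
pairs = all_pairs_tasks(classes)  # k * (k - 1) / 2 tasks
```

For $k = 4$ classes this gives 4 binary tasks for One vs. All and 6 for All Pairs, which is the kind of structural difference the estimation and approximation analysis has to account for.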

preprint 2012 arXiv

On the practically interesting instances of MAXCUT

The complexity of a computational problem is traditionally quantified based on the hardness of its worst case. This approach has many advantages and has led to a deep and beautiful theory. However, from the practical perspective, this leaves much to be desired. In application areas, practically interesting instances very often occupy just a tiny part of an algorithm's space of instances, and the vast majority of instances are simply irrelevant. Addressing these issues is a major challenge for theoretical computer science which may make theory more relevant to the practice of computer science. Following Bilu and Linial, we apply this perspective to MAXCUT, viewed as a clustering problem. Using a variety of techniques, we investigate practically interesting instances of this problem. Specifically, we show how to solve in polynomial time distinguished, metric, expanding and dense instances of MAXCUT under mild stability assumptions. In particular, $(1+ε)$-stability (which is optimal) suffices for metric and dense MAXCUT. We also show how to solve in polynomial time $Ω(\sqrt{n})$-stable instances of MAXCUT, substantially improving the best previously known result.

preprint 2012 arXiv

Tight products and Expansion

In this paper we study a new product of graphs called {\em tight product}. A graph $H$ is said to be a tight product of two (undirected multi) graphs $G_1$ and $G_2$, if $V(H)=V(G_1)\times V(G_2)$ and both projection maps $V(H)\to V(G_1)$ and $V(H)\to V(G_2)$ are covering maps. It is not a priori clear when two given graphs have a tight product (in fact, it is $NP$-hard to decide). We investigate the conditions under which this is possible. This perspective yields a new characterization of class-1 $(2k+1)$-regular graphs. We also obtain a new model of random $d$-regular graphs whose second eigenvalue is almost surely at most $O(d^{3/4})$. This construction resembles random graph lifts, but requires fewer random bits.
