Source author record

Daniel Kane

Daniel Kane appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computational Complexity Data Structures and Algorithms math.ST Statistics Theory Artificial Intelligence Computational Geometry Computer Science and Game Theory Cryptography and Security Discrete Mathematics Distributed, Parallel, and Cluster Computing Information Theory math.FA math.IT math.OC Multiagent Systems

Catalog footprint

What is connected

13works

16topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Computational-Statistical Gaps in Reinforcement Learning

Reinforcement learning with function approximation has recently achieved tremendous results in applications with large state spaces. This empirical success has motivated a growing body of theoretical work proposing necessary and sufficient conditions under which efficient reinforcement learning is possible. From this line of work, a remarkably simple minimal sufficient condition has emerged for sample efficient reinforcement learning: MDPs with optimal value function $V^*$ and $Q^*$ linear in some known low-dimensional features. In this setting, recent works have designed sample efficient algorithms which require a number of samples polynomial in the feature dimension and independent of the size of state space. They however leave finding computationally efficient algorithms as future work and this is considered a major open problem in the community. In this work, we make progress on this open problem by presenting the first computational lower bound for RL with linear function approximation: unless NP=RP, no randomized polynomial time algorithm exists for deterministic transition MDPs with a constant number of actions and linear optimal value functions. To prove this, we show a reduction from Unique-Sat, where we convert a CNF formula into an MDP with deterministic transitions, constant number of actions and low dimensional linear optimal value functions. This result also exhibits the first computational-statistical gap in reinforcement learning with linear function approximation, as the underlying statistical problem is information-theoretically solvable with a polynomial number of queries, but no computationally efficient algorithm exists unless NP=RP. Finally, we also prove a quasi-polynomial time lower bound under the Randomized Exponential Time Hypothesis.

preprint2022arXiv

Coresets for Data Discretization and Sine Wave Fitting

In the \emph{monitoring} problem, the input is an unbounded stream $P={p_1,p_2\cdots}$ of integers in $[N]:=\{1,\cdots,N\}$, that are obtained from a sensor (such as GPS or heart beats of a human). The goal (e.g., for anomaly detection) is to approximate the $n$ points received so far in $P$ by a single frequency $\sin$, e.g. $\min_{c\in C}cost(P,c)+λ(c)$, where $cost(P,c)=\sum_{i=1}^n \sin^2(\frac{2π}{N} p_ic)$, $C\subseteq [N]$ is a feasible set of solutions, and $λ$ is a given regularization function. For any approximation error $\varepsilon>0$, we prove that \emph{every} set $P$ of $n$ integers has a weighted subset $S\subseteq P$ (sometimes called core-set) of cardinality $|S|\in O(\log(N)^{O(1)})$ that approximates $cost(P,c)$ (for every $c\in [N]$) up to a multiplicative factor of $1\pm\varepsilon$. Using known coreset techniques, this implies streaming algorithms using only $O((\log(N)\log(n))^{O(1)})$ memory. Our results hold for a large family of functions. Experimental results and open source code are provided.

preprint2022arXiv

Fooling Gaussian PTFs via Local Hyperconcentration

We give a pseudorandom generator that fools degree-$d$ polynomial threshold functions over $n$-dimensional Gaussian space with seed length $\mathrm{poly}(d)\cdot \log n$. All previous generators had a seed length with at least a $2^d$ dependence on $d$. The key new ingredient is a Local Hyperconcentration Theorem, which shows that every degree-$d$ Gaussian polynomial is hyperconcentrated almost everywhere at scale $d^{-O(1)}$.

preprint2022arXiv

Sampling Equilibria: Fast No-Regret Learning in Structured Games

Learning and equilibrium computation in games are fundamental problems across computer science and economics, with applications ranging from politics to machine learning. Much of the work in this area revolves around a simple algorithm termed \emph{randomized weighted majority} (RWM), also known as "Hedge" or "Multiplicative Weights Update," which is well known to achieve statistically optimal rates in adversarial settings (Littlestone and Warmuth '94, Freund and Schapire '99). Unfortunately, RWM comes with an inherent computational barrier: it requires maintaining and sampling from a distribution over all possible actions. In typical settings of interest the action space is exponentially large, seemingly rendering RWM useless in practice. In this work, we refute this notion for a broad variety of \emph{structured} games, showing it is possible to efficiently (approximately) sample the action space in RWM in \emph{polylogarithmic} time. This gives the first efficient no-regret algorithms for problems such as the \emph{(discrete) Colonel Blotto game}, \emph{matroid congestion}, \emph{matroid security}, and basic \emph{dueling games}. As an immediate corollary, we give a polylogarithmic time meta-algorithm to compute approximate Nash Equilibria for these games that is exponentially faster than prior methods in several important settings. Further, our algorithm is the first to efficiently compute equilibria for more involved variants of these games with general sums, more than two players, and, for Colonel Blotto, multiple resource types.

preprint2021arXiv

Bounded Memory Active Learning through Enriched Queries

The explosive growth of easily-accessible unlabeled data has lead to growing interest in active learning, a paradigm in which data-hungry learning algorithms adaptively select informative examples in order to lower prohibitively expensive labeling costs. Unfortunately, in standard worst-case models of learning, the active setting often provides no improvement over non-adaptive algorithms. To combat this, a series of recent works have considered a model in which the learner may ask enriched queries beyond labels. While such models have seen success in drastically lowering label costs, they tend to come at the expense of requiring large amounts of memory. In this work, we study what families of classifiers can be learned in bounded memory. To this end, we introduce a novel streaming-variant of enriched-query active learning along with a natural combinatorial parameter called lossless sample compression that is sufficient for learning not only with bounded memory, but in a query-optimal and computationally efficient manner as well. Finally, we give three fundamental examples of classifier families with small, easy to compute lossless compression schemes when given access to basic enriched queries: axis-aligned rectangles, decision trees, and halfspaces in two dimensions.

preprint2021arXiv

Highway: Efficient Consensus with Flexible Finality

There has been recently a lot of progress in designing efficient partially synchronous BFT consensus protocols that are meant to serve as core consensus engines for Proof of Stake blockchain systems. While the state-of-the-art solutions attain virtually optimal performance under this theoretical model, there is still room for improvement, as several practical aspects of such systems are not captured by this model. Most notably, during regular execution, due to financial incentives in such systems, one expects an overwhelming fraction of nodes to honestly follow the protocol rules and only few of them to be faulty, most likely due to temporary network issues. Intuitively, the fact that almost all nodes behave honestly should result in stronger confidence in blocks finalized in such periods, however it is not the case under the classical model, where finality is binary. We propose Highway, a new consensus protocol that is safe and live in the classical partially synchronous BFT model, while at the same time offering practical improvements over existing solutions. Specifically, block finality in Highway is not binary but is expressed by fraction of nodes that would need to break the protocol rules in order for a block to be reverted. During periods of honest participation finality of blocks might reach well beyond 1/3 (as what would be the maximum for classical protocols), up to even 1 (complete certainty). Having finality defined this way, Highway offers flexibility with respect to the configuration of security thresholds among nodes running the protocol, allowing nodes with lower thresholds to reach finality faster than the ones requiring higher levels of confidence.

preprint2020arXiv

Noise-tolerant, Reliable Active Classification with Comparison Queries

With the explosion of massive, widely available unlabeled data in the past years, finding label and time efficient, robust learning algorithms has become ever more important in theory and in practice. We study the paradigm of active learning, in which algorithms with access to large pools of data may adaptively choose what samples to label in the hope of exponentially increasing efficiency. By introducing comparisons, an additional type of query comparing two points, we provide the first time and query efficient algorithms for learning non-homogeneous linear separators robust to bounded (Massart) noise. We further provide algorithms for a generalization of the popular Tsybakov low noise condition, and show how comparisons provide a strong reliability guarantee that is often impractical or impossible with only labels - returning a classifier that makes no errors with high probability.

preprint2020arXiv

Robustly Learning any Clusterable Mixture of Gaussians

We study the efficient learnability of high-dimensional Gaussian mixtures in the outlier-robust setting, where a small constant fraction of the data is adversarially corrupted. We resolve the polynomial learnability of this problem when the components are pairwise separated in total variation distance. Specifically, we provide an algorithm that, for any constant number of components $k$, runs in polynomial time and learns the components of an $ε$-corrupted $k$-mixture within information theoretically near-optimal error of $\tilde{O}(ε)$, under the assumption that the overlap between any pair of components $P_i, P_j$ (i.e., the quantity $1-TV(P_i, P_j)$) is bounded by $\mathrm{poly}(ε)$. Our separation condition is the qualitatively weakest assumption under which accurate clustering of the samples is possible. In particular, it allows for components with arbitrary covariances and for components with identical means, as long as their covariances differ sufficiently. Ours is the first polynomial time algorithm for this problem, even for $k=2$. Our algorithm follows the Sum-of-Squares based proofs to algorithms approach. Our main technical contribution is a new robust identifiability proof of clusters from a Gaussian mixture, which can be captured by the constant-degree Sum of Squares proof system. The key ingredients of this proof are a novel use of SoS-certifiable anti-concentration and a new characterization of pairs of Gaussians with small (dimension-independent) overlap in terms of their parameter distance.

preprint2020arXiv

Testing Bayesian Networks

This work initiates a systematic investigation of testing high-dimensional structured distributions by focusing on testing Bayesian networks -- the prototypical family of directed graphical models. A Bayesian network is defined by a directed acyclic graph, where we associate a random variable with each node. The value at any particular node is conditionally independent of all the other non-descendant nodes once its parents are fixed. Specifically, we study the properties of identity testing and closeness testing of Bayesian networks. Our main contribution is the first non-trivial efficient testing algorithms for these problems and corresponding information-theoretic lower bounds. For a wide range of parameter settings, our testing algorithms have sample complexity sublinear in the dimension and are sample-optimal, up to constant factors.

preprint2015arXiv

Pseudorandomness via the discrete Fourier transform

We present a new approach to constructing unconditional pseudorandom generators against classes of functions that involve computing a linear function of the inputs. We give an explicit construction of a pseudorandom generator that fools the discrete Fourier transforms of linear functions with seed-length that is nearly logarithmic (up to polyloglog factors) in the input size and the desired error parameter. Our result gives a single pseudorandom generator that fools several important classes of tests computable in logspace that have been considered in the literature, including halfspaces (over general domains), modular tests and combinatorial shapes. For all these classes, our generator is the first that achieves near logarithmic seed-length in both the input length and the error parameter. Getting such a seed-length is a natural challenge in its own right, which needs to be overcome in order to derandomize RL - a central question in complexity theory. Our construction combines ideas from a large body of prior work, ranging from a classical construction of [NN93] to the recent gradually increasing independence paradigm of [KMN11, CRSW13, GMRTV12], while also introducing some novel analytic machinery which might find other applications.

preprint2014arXiv

Pseudorandomness for concentration bounds and signed majorities

The problem of constructing pseudorandom generators that fool halfspaces has been studied intensively in recent times. For fooling halfspaces over the hypercube with polynomially small error, the best construction known requires seed-length O(log^2 n) (MekaZ13). Getting the seed-length down to O(log(n)) is a natural challenge in its own right, which needs to be overcome in order to derandomize RL. In this work we make progress towards this goal by obtaining near-optimal generators for two important special cases: 1) We give a near optimal derandomization of the Chernoff bound for independent, uniformly random bits. Specifically, we show how to generate a x in {1,-1}^n using $\tilde{O}(\log (n/ε))$ random bits such that for any unit vector u, <u,x> matches the sub-Gaussian tail behaviour predicted by the Chernoff bound up to error eps. 2) We construct a generator which fools halfspaces with {0,1,-1} coefficients with error eps with a seed-length of $\tilde{O}(\log(n/ε))$. This includes the important special case of majorities. In both cases, the best previous results required seed-length of $O(\log n + \log^2(1/ε))$. Technically, our work combines new Fourier-analytic tools with the iterative dimension reduction techniques and the gradually increasing independence paradigm of previous works (KaneMN11, CelisRSW13, GopalanMRTV12).

preprint2012arXiv

A PRG for Lipschitz Functions of Polynomials with Applications to Sparsest Cut

We give improved pseudorandom generators (PRGs) for Lipschitz functions of low-degree polynomials over the hypercube. These are functions of the form psi(P(x)), where P is a low-degree polynomial and psi is a function with small Lipschitz constant. PRGs for smooth functions of low-degree polynomials have received a lot of attention recently and play an important role in constructing PRGs for the natural class of polynomial threshold functions. In spite of the recent progress, no nontrivial PRGs were known for fooling Lipschitz functions of degree O(log n) polynomials even for constant error rate. In this work, we give the first such generator obtaining a seed-length of (log n)\tilde{O}(d^2/eps^2) for fooling degree d polynomials with error eps. Previous generators had an exponential dependence on the degree. We use our PRG to get better integrality gap instances for sparsest cut, a fundamental problem in graph theory with many applications in graph optimization. We give an instance of uniform sparsest cut for which a powerful semi-definite relaxation (SDP) first introduced by Goemans and Linial and studied in the seminal work of Arora, Rao and Vazirani has an integrality gap of exp(Ω((log log n)^{1/2})). Understanding the performance of the Goemans-Linial SDP for uniform sparsest cut is an important open problem in approximation algorithms and metric embeddings and our work gives a near-exponential improvement over previous lower bounds which achieved a gap of Ω(log log n).

preprint2010arXiv

A Pseudopolynomial Algorithm for Alexandrov's Theorem

Alexandrov's Theorem states that every metric with the global topology and local geometry required of a convex polyhedron is in fact the intrinsic metric of a unique convex polyhedron. Recent work by Bobenko and Izmestiev describes a differential equation whose solution leads to the polyhedron corresponding to a given metric. We describe an algorithm based on this differential equation to compute the polyhedron to arbitrary precision given the metric, and prove a pseudopolynomial bound on its running time. Along the way, we develop pseudopolynomial algorithms for computing shortest paths and weighted Delaunay triangulations on a polyhedral surface, even when the surface edges are not shortest paths.

Daniel Kane

What is connected

Connect this record

See the researcher in context

Building this map preview

13 published item(s)

Computational-Statistical Gaps in Reinforcement Learning

Coresets for Data Discretization and Sine Wave Fitting

Fooling Gaussian PTFs via Local Hyperconcentration

Sampling Equilibria: Fast No-Regret Learning in Structured Games

Bounded Memory Active Learning through Enriched Queries

Highway: Efficient Consensus with Flexible Finality

Noise-tolerant, Reliable Active Classification with Comparison Queries

Robustly Learning any Clusterable Mixture of Gaussians

Testing Bayesian Networks

Pseudorandomness via the discrete Fourier transform

Pseudorandomness for concentration bounds and signed majorities

A PRG for Lipschitz Functions of Polynomials with Applications to Sparsest Cut

A Pseudopolynomial Algorithm for Alexandrov's Theorem