Source author record

Alexandr Andoni

Alexandr Andoni appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Data Structures and Algorithms; Computational Geometry; Computational Complexity; Machine Learning; Information Theory; math.IT; math.ST; Statistics Theory; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Information Retrieval; math.CO; math.FA; math.MG; math.PR; Populations and Evolution; Quantitative Methods

Catalog footprint

What is connected

21 works

17 topics

4 close collaborators


preprint 2026 arXiv

Efficient Algorithms for Adversarially Robust Approximate Nearest Neighbor Search

We study the Approximate Nearest Neighbor (ANN) problem under a powerful adaptive adversary that controls both the dataset and a sequence of $Q$ queries. Primarily, for the high-dimensional regime of $d = ω(\sqrt{Q})$, we introduce a sequence of algorithms with progressively stronger guarantees. We first establish a novel connection between adaptive security and \textit{fairness}, leveraging fair ANN search to hide internal randomness from the adversary with information-theoretic guarantees. To achieve data-independent performance, we then reduce the search problem to a robust decision primitive, solved using a differentially private mechanism on a Locality-Sensitive Hashing (LSH) data structure. This approach, however, faces an inherent $\sqrt{n}$ query time barrier. To break the barrier, we propose a novel concentric-annuli LSH construction that synthesizes these fairness and differential privacy techniques. The analysis introduces a new method for robustly releasing timing information from the underlying algorithm instances and, as a corollary, also improves existing results for fair ANN. In addition, for the low-dimensional regime $d = O(\sqrt{Q})$, we propose specialized algorithms that provide a strong ``for-all'' guarantee: correctness on \textit{every} possible query with high probability. We introduce novel metric covering constructions that simplify and improve prior approaches for ANN in Hamming and $\ell_p$ spaces.

preprint 2026 arXiv

Nearly Optimal Attention Coresets

We consider the problem of estimating the Attention mechanism in small space, and prove the existence of coresets for it of nearly optimal size. Specifically, we show that for any set of unit-norm keys and values $(K,V)$ in $\mathbb{R}^d$, there exists a subset $(K',V')$ of size at most $O({\sqrt{d} e^{ρ+o(ρ)}/\varepsilon})$ such that \[ \left\| \operatorname{Attn}(q,K,V)- \operatorname{Attn}(q,K',V') \right\| \le \varepsilon \] simultaneously for all queries whose norm is bounded by $ρ$. This outperforms the best known results for this problem. We also offer an improved lower bound showing that $\varepsilon$-coresets must have size $Ω({\sqrt{d} e^ρ/ε})$.
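The quantity being approximated can be made concrete with a small sketch. The abstract does not spell out the definition of $\operatorname{Attn}$, so the code below assumes the standard softmax form, $\operatorname{Attn}(q,K,V)=\sum_i e^{\langle q,k_i\rangle} v_i / \sum_i e^{\langle q,k_i\rangle}$. The function names `attention` and `coreset_error` are hypothetical, and the code only measures the error of a given subset; it does not construct a coreset.

```python
import math


def attention(q, keys, values):
    """Softmax attention for one query (assumed definition, see lead-in)."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    shift = max(scores)  # subtracting the max leaves the ratio unchanged
    weights = [math.exp(s - shift) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [
        sum(w * v[j] for w, v in zip(weights, values)) / total
        for j in range(dim)
    ]


def coreset_error(q, keys, values, subset):
    """Euclidean distance between attention on all pairs and on `subset`.

    `subset` is a list of indices into keys/values; a candidate
    epsilon-coreset should keep this value at most epsilon for every
    query of norm at most rho.
    """
    full = attention(q, keys, values)
    sub = attention(q, [keys[i] for i in subset], [values[i] for i in subset])
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(full, sub)))
```

Checking a candidate subset then amounts to evaluating `coreset_error` over the queries of interest and comparing the largest value against $\varepsilon$.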

preprint 2022 arXiv

Edit Distance in Near-Linear Time: it's a Constant Factor

We present an algorithm for approximating the edit distance between two strings of length $n$ in time $n^{1+\varepsilon}$ up to a constant factor, for any $\varepsilon>0$. Our result completes a research direction set forth in the recent breakthrough paper [Chakraborty-Das-Goldenberg-Koucky-Saks, FOCS'18], which showed the first constant-factor approximation algorithm with a (strongly) sub-quadratic running time. The recent results of [Koucky-Saks, STOC'20] and [Brakensiek-Rubinstein, STOC'20] have shown near-linear time algorithms that obtain an additive approximation, near-linear in $n$ (equivalently, constant-factor approximation when the edit distance value is close to $n$). In contrast, our algorithm obtains a constant-factor approximation in near-linear time for any input strings. In contrast to prior algorithms, which are mostly recursing over smaller substrings, our algorithm gradually smoothes out the local contribution to the edit distance over progressively larger substrings. To accomplish this, we iteratively construct a distance oracle data structure for the metric of edit distance on all substrings of input strings, of length $n^{i\varepsilon}$ for $i=0,1,\ldots,1/\varepsilon$. The distance oracle approximates the edit distance over these substrings in a certain average sense, just enough to estimate the overall edit distance.
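For reference, the quantity being approximated can be computed exactly with the textbook quadratic-time dynamic program (Wagner-Fischer). This is the classical baseline, not the near-linear-time algorithm described in the abstract; the function name `edit_distance` is illustrative.

```python
def edit_distance(a, b):
    """Exact edit distance (insertions, deletions, substitutions).

    Runs in O(len(a) * len(b)) time and O(len(b)) space.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # prints 3
```

An approximation algorithm of the kind described above would be judged by how close its output stays to this value, up to the stated constant factor.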

preprint 2022 arXiv

Learning to Hash Robustly, Guaranteed

The indexing algorithms for the high-dimensional nearest neighbor search (NNS) with the best worst-case guarantees are based on the randomized Locality Sensitive Hashing (LSH), and its derivatives. In practice, many heuristic approaches exist to "learn" the best indexing method in order to speed up NNS, crucially adapting to the structure of the given dataset. Oftentimes, these heuristics outperform the LSH-based algorithms on real datasets, but, almost always, come at the cost of losing the guarantees of either correctness or robust performance on adversarial queries, or apply to datasets with an assumed extra structure/model. In this paper, we design an NNS algorithm for the Hamming space that has worst-case guarantees essentially matching that of theoretical algorithms, while optimizing the hashing to the structure of the dataset (think instance-optimal algorithms) for performance on the minimum-performing query. We evaluate the algorithm's ability to optimize for a given dataset both theoretically and practically. On the theoretical side, we exhibit a natural setting (dataset model) where our algorithm is much better than the standard theoretical one. On the practical side, we run experiments that show that our algorithm has a 1.8x and 2.1x better recall on the worst-performing queries to the MNIST and ImageNet datasets.

preprint 2020 arXiv

Streaming Complexity of SVMs

We study the space complexity of solving the bias-regularized SVM problem in the streaming model. This is a classic supervised learning problem that has drawn lots of attention, including for developing fast algorithms for solving the problem approximately. One of the most widely used algorithms for approximately optimizing the SVM objective is Stochastic Gradient Descent (SGD), which requires only $O(\frac{1}{λε})$ random samples, and which immediately yields a streaming algorithm that uses $O(\frac{d}{λε})$ space. For related problems, better streaming algorithms are only known for smooth functions, unlike the SVM objective that we focus on in this work. We initiate an investigation of the space complexity for both finding an approximate optimum of this objective, and for the related ``point estimation'' problem of sketching the data set to evaluate the function value $F_λ$ on any query $(θ, b)$. We show that, for both problems, for dimensions $d=1,2$, one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{λε}$, which is the complexity of SGD for strongly convex functions like the bias-regularized SVM, and which is known to be tight in general, even for $d=1$. We also prove polynomial lower bounds for both point estimation and optimization. In particular, for point estimation we obtain a tight bound of $Θ(1/\sqrt{ε})$ for $d=1$ and a nearly tight lower bound of $\widetilde{Ω}(d/ε^2)$ for $d = Ω( \log(1/ε))$. Finally, for optimization, we prove a $Ω(1/\sqrt{ε})$ lower bound for $d = Ω( \log(1/ε))$, and show similar bounds when $d$ is constant.
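As a point of reference for the "point estimation" problem, the sketch below evaluates an SVM objective on a fully stored data set. The abstract does not write out $F_λ$, so the code assumes the common form $F_λ(θ,b)=\frac{λ}{2}(\|θ\|^2+b^2)+\frac{1}{n}\sum_i \max(0,\,1-y_i(\langle θ,x_i\rangle+b))$, where the regularizer also covers the bias $b$. The function name `svm_objective` is hypothetical, and this is the exact, non-streaming computation that a small-space sketch would approximate.

```python
def svm_objective(points, labels, theta, b, lam):
    """Bias-regularized SVM objective (assumed form, see lead-in).

    points: list of d-dimensional vectors; labels: +1 or -1 per point.
    """
    n = len(points)
    # Regularizer on both the weight vector and the bias term.
    reg = 0.5 * lam * (sum(t * t for t in theta) + b * b)
    # Average hinge loss; the max with 0 is what makes the objective non-smooth.
    hinge = 0.0
    for x, y in zip(points, labels):
        margin = y * (sum(t * xi for t, xi in zip(theta, x)) + b)
        hinge += max(0.0, 1.0 - margin)
    return reg + hinge / n
```

Storing every point, as this function requires, takes space linear in the stream length, which is the cost the streaming bounds above aim to avoid.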

preprint 2016 arXiv

Lower Bounds on Time-Space Trade-Offs for Approximate Near Neighbors

We show tight lower bounds for the entire trade-off between space and query time for the Approximate Near Neighbor search problem. Our lower bounds hold in a restricted model of computation, which captures all hashing-based approaches. In particular, our lower bound matches the upper bound recently shown in [Laarhoven 2015] for the random instance on a Euclidean sphere (which we show in fact extends to the entire space $\mathbb{R}^d$ using the techniques from [Andoni, Razenshteyn 2015]). We also show tight, unconditional cell-probe lower bounds for one and two probes, improving upon the best known bounds from [Panigrahy, Talwar, Wieder 2010]. In particular, this is the first space lower bound (for any static data structure) for two probes which is not polynomially smaller than for one probe. To show the result for two probes, we establish and exploit a connection to locally-decodable codes.

preprint 2015 arXiv

Optimal Data-Dependent Hashing for Approximate Near Neighbors

We show an optimal data-dependent hashing scheme for the approximate near neighbor problem. For an $n$-point data set in a $d$-dimensional space our data structure achieves query time $O(d n^{ρ+o(1)})$ and space $O(n^{1+ρ+o(1)} + dn)$, where $ρ=\tfrac{1}{2c^2-1}$ for the Euclidean space and approximation $c>1$. For the Hamming space, we obtain an exponent of $ρ=\tfrac{1}{2c-1}$. Our result completes the direction set forth in [AINR14] who gave a proof-of-concept that data-dependent hashing can outperform classical Locality Sensitive Hashing (LSH). In contrast to [AINR14], the new bound is not only optimal, but in fact improves over the best (optimal) LSH data structures [IM98,AI06] for all approximation factors $c>1$. From the technical perspective, we proceed by decomposing an arbitrary dataset into several subsets that are, in a certain sense, pseudo-random.
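A quick numeric comparison shows the size of the improvement. The data-dependent exponents are the ones stated above; the classical LSH exponents $1/c^2$ (Euclidean, [AI06]) and $1/c$ (Hamming, [IM98]) are the standard values for those schemes, with lower-order $o(1)$ terms dropped throughout. The function names are illustrative only.

```python
def rho_data_dependent_euclidean(c):
    """Exponent 1 / (2c^2 - 1) from the abstract (Euclidean space)."""
    return 1.0 / (2.0 * c * c - 1.0)


def rho_data_dependent_hamming(c):
    """Exponent 1 / (2c - 1) from the abstract (Hamming space)."""
    return 1.0 / (2.0 * c - 1.0)


def rho_classical_euclidean(c):
    """Classical LSH exponent 1 / c^2 (o(1) term dropped)."""
    return 1.0 / (c * c)


def rho_classical_hamming(c):
    """Classical LSH exponent 1 / c."""
    return 1.0 / c


for c in (1.5, 2.0, 3.0):
    print(
        f"c={c}: Euclidean {rho_classical_euclidean(c):.4f} -> "
        f"{rho_data_dependent_euclidean(c):.4f}, "
        f"Hamming {rho_classical_hamming(c):.4f} -> "
        f"{rho_data_dependent_hamming(c):.4f}"
    )
```

For $c=2$, for example, the Euclidean exponent drops from $1/4$ to $1/7$ and the Hamming exponent from $1/2$ to $1/3$, so the query time goes from roughly $n^{0.25}$ to $n^{0.143}$ and from $n^{0.5}$ to $n^{0.333}$ respectively, ignoring the factor of $d$ and lower-order terms.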

preprint 2015 arXiv

Practical and Optimal LSH for Angular Distance

We show the existence of a Locality-Sensitive Hashing (LSH) family for the angular distance that yields an approximate Near Neighbor Search algorithm with the asymptotically optimal running time exponent. Unlike earlier algorithms with this property (e.g., Spherical LSH [Andoni, Indyk, Nguyen, Razenshteyn 2014], [Andoni, Razenshteyn 2015]), our algorithm is also practical, improving upon the well-studied hyperplane LSH [Charikar, 2002] in practice. We also introduce a multiprobe version of this algorithm, and conduct experimental evaluation on real and synthetic data sets. We complement the above positive results with a fine-grained lower bound for the quality of any LSH family for angular distance. Our lower bound implies that the above LSH family exhibits a trade-off between evaluation time and quality that is close to optimal for a natural class of LSH functions.
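The hyperplane LSH baseline mentioned above is simple enough to sketch. The code below is a minimal version of that baseline [Charikar, 2002], not the new LSH family from the paper: each hash bit records on which side of a random hyperplane a vector falls, so two vectors at angle $θ$ agree on a given bit with probability $1-θ/π$. The function name `make_hyperplane_hash` and its parameters are hypothetical.

```python
import random


def make_hyperplane_hash(dim, num_bits, seed=0):
    """Return a hash function mapping a dim-vector to a tuple of num_bits bits.

    Each bit is the sign of the inner product with a random Gaussian
    direction, i.e. the side of a random hyperplane through the origin.
    """
    rng = random.Random(seed)
    planes = [
        [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(num_bits)
    ]

    def hash_vector(x):
        return tuple(
            1 if sum(p_i * x_i for p_i, x_i in zip(plane, x)) >= 0 else 0
            for plane in planes
        )

    return hash_vector


h = make_hyperplane_hash(dim=3, num_bits=8, seed=42)
x = [1.0, 2.0, -0.5]
# The hash depends only on direction, so positive rescaling leaves it unchanged.
print(h(x) == h([2.0 * v for v in x]))  # prints True
```

Concatenating more bits makes collisions between far-apart vectors rarer at the cost of more hash tables, which is the usual trade-off that the running time exponent captures.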

preprint 2015 arXiv

Snowflake universality of Wasserstein spaces

For $p\in (1,\infty)$ let $\mathscr{P}_p(\mathbb{R}^3)$ denote the metric space of all $p$-integrable Borel probability measures on $\mathbb{R}^3$, equipped with the Wasserstein $p$ metric $\mathsf{W}_p$. We prove that for every $\varepsilon>0$, every $θ\in (0,1/p]$ and every finite metric space $(X,d_X)$, the metric space $(X,d_{X}^θ)$ embeds into $\mathscr{P}_p(\mathbb{R}^3)$ with distortion at most $1+\varepsilon$. We show that this is sharp when $p\in (1,2]$ in the sense that the exponent $1/p$ cannot be replaced by any larger number. In fact, for arbitrarily large $n\in \mathbb{N}$ there exists an $n$-point metric space $(X_n,d_n)$ such that for every $α\in (1/p,1]$ any embedding of the metric space $(X_n,d_n^α)$ into $\mathscr{P}_p(\mathbb{R}^3)$ incurs distortion that is at least a constant multiple of $(\log n)^{α-1/p}$. These statements establish that there exists an Alexandrov space of nonnegative curvature, namely $\mathscr{P}_{\! 2}(\mathbb{R}^3)$, with respect to which there does not exist a sequence of bounded degree expander graphs. It also follows that $\mathscr{P}_{\! 2}(\mathbb{R}^3)$ does not admit a uniform, coarse, or quasisymmetric embedding into any Banach space of nontrivial type. Links to several longstanding open questions in metric geometry are discussed, including the characterization of subsets of Alexandrov spaces, existence of expanders, the universality problem for $\mathscr{P}_{\! 2}(\mathbb{R}^k)$, and the metric cotype dichotomy problem.

preprint 2015 arXiv

Tight Lower Bounds for Data-Dependent Locality-Sensitive Hashing

We prove a tight lower bound for the exponent $ρ$ for data-dependent Locality-Sensitive Hashing schemes, recently used to design efficient solutions for the $c$-approximate nearest neighbor search. In particular, our lower bound matches the bound of $ρ\le \frac{1}{2c-1}+o(1)$ for the $\ell_1$ space, obtained via the recent algorithm from [Andoni-Razenshteyn, STOC'15]. In recent years it emerged that data-dependent hashing is strictly superior to the classical Locality-Sensitive Hashing, when the hash function is data-independent. In the latter setting, the best exponent has been already known: for the $\ell_1$ space, the tight bound is $ρ=1/c$, with the upper bound from [Indyk-Motwani, STOC'98] and the matching lower bound from [O'Donnell-Wu-Zhou, ITCS'11]. We prove that, even if the hashing is data-dependent, it must hold that $ρ\ge \frac{1}{2c-1}-o(1)$. To prove the result, we need to formalize the exact notion of data-dependent hashing that also captures the complexity of the hash functions (in addition to their collision properties). Without restricting such complexity, we would allow for obviously infeasible solutions such as the Voronoi diagram of a dataset. To preclude such solutions, we require our hash functions to be succinct. This condition is satisfied by all the known algorithmic results.

preprint 2014 arXiv

Parallel Algorithms for Geometric Graph Problems

We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+ε)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example it yields a new algorithm for computing EMD cost in the plane in near-linear time, $n^{1+o_ε(1)}$. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for $(1+ε)$-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a $(1+ε)$-approximation algorithm with $n^δ$ space in the streaming-with-sorting model with $1/δ^{O(1)}$ passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.

preprint 2014 arXiv

Spectral Approaches to Nearest Neighbor Search

We study spectral algorithms for the high-dimensional Nearest Neighbor Search problem (NNS). In particular, we consider a semi-random setting where a dataset $P$ in $\mathbb{R}^d$ is chosen arbitrarily from an unknown subspace of low dimension $k\ll d$, and then perturbed by fully $d$-dimensional Gaussian noise. We design spectral NNS algorithms whose query time depends polynomially on $d$ and $\log n$ (where $n=|P|$) for large ranges of $k$, $d$ and $n$. Our algorithms use a repeated computation of the top PCA vector/subspace, and are effective even when the random-noise magnitude is {\em much larger} than the interpoint distances in $P$. Our motivation is that in practice, a number of spectral NNS algorithms outperform the random-projection methods that seem otherwise theoretically optimal on worst case datasets. In this paper we aim to provide theoretical justification for this disparity.

preprint 2014 arXiv

The Sketching Complexity of Graph Cuts

We study the problem of sketching an input graph, so that given the sketch, one can estimate the weight of any cut in the graph within factor $1+ε$. We present lower and upper bounds on the size of a randomized sketch, focusing on the dependence on the accuracy parameter $ε>0$. First, we prove that for every $ε> 1/\sqrt n$, every sketch that succeeds (with constant probability) in estimating the weight of all cuts $(S,\bar S)$ in an $n$-vertex graph (simultaneously), must be of size $Ω(n/ε^2)$ bits. In the special case where the sketch is itself a weighted graph (which may or may not be a subgraph) and the estimator is the sum of edge weights across the cut in the sketch, i.e., a cut sparsifier, we show the sketch must have $Ω(n/ε^2)$ edges, which is optimal. Despite the long sequence of work on graph sparsification, no such lower bound was known on the size of a cut sparsifier. We then design a randomized sketch that, given $ε\in(0,1)$ and an edge-weighted $n$-vertex graph, produces a sketch of size $\tilde O(n/ε)$ bits, from which the weight of any cut $(S,\bar S)$ can be reported, with high probability, within factor $1+ε$. The previous upper bound is $\tilde O(n/ε^2)$ bits, which follows by storing a cut sparsifier (Benczúr and Karger, 1996). To obtain this improvement, we critically use both that the sketch need only be correct on each fixed cut with high probability (rather than on all cuts), and that the estimation procedure of the data structure can be arbitrary (rather than a weighted subgraph). We also show a lower bound of $Ω(n/ε)$ bits for the space requirement of any data structure achieving this guarantee.

preprint 2013 arXiv

A Differential Equations Approach to Optimizing Regret Trade-offs

We consider the classical question of predicting binary sequences and study the {\em optimal} algorithms for obtaining the best possible regret and payoff functions for this problem. The question turns out to be also equivalent to the problem of optimal trade-offs between the regrets of two experts in an "experts problem", studied before by \cite{kearns-regret}. While, say, a regret of $Θ(\sqrt{T})$ is known, we argue that it is important to ask what is the provably optimal algorithm for this problem --- both because it leads to natural algorithms, as well as because regret is in fact often comparable in magnitude to the final payoffs and hence is a non-negligible term. In the basic setting, the result essentially follows from a classical result of Cover from '65. Here instead, we focus on another standard setting, of time-discounted payoffs, where the final "stopping time" is not specified. We exhibit an explicit characterization of the optimal regret for this setting. To obtain our main result, we show that the optimal payoff functions have to satisfy the Hermite differential equation, and hence are given by the solutions to this equation. It turns out that characterization of the payoff function is qualitatively different from the classical (non-discounted) setting, and, namely, there's essentially a unique optimal solution.

preprint 2013 arXiv

Beyond Locality-Sensitive Hashing

We present a new data structure for the c-approximate near neighbor problem (ANN) in the Euclidean space. For n points in R^d, our algorithm achieves O(n^ρ + d log n) query time and O(n^{1 + ρ} + d log n) space, where ρ<= 7/(8c^2) + O(1 / c^3) + o(1). This is the first improvement over the result by Andoni and Indyk (FOCS 2006) and the first data structure that bypasses a locality-sensitive hashing lower bound proved by O'Donnell, Wu and Zhou (ICS 2011). By a standard reduction we obtain a data structure for the Hamming space and \ell_1 norm with ρ<= 7/(8c) + O(1/c^{3/2}) + o(1), which is the first improvement over the result of Indyk and Motwani (STOC 1998).

preprint 2013 arXiv

Tight Lower Bound for Linear Sketches of Moments

The problem of estimating frequency moments of a data stream has attracted a lot of attention since the onset of streaming algorithms [AMS99]. While the space complexity for approximately computing the $p^{\rm th}$ moment, for $p\in(0,2]$ has been settled [KNW10], for $p>2$ the exact complexity remains open. For $p>2$ the current best algorithm uses $O(n^{1-2/p}\log n)$ words of space [AKO11,BO10], whereas the lower bound is of $Ω(n^{1-2/p})$ [BJKS04]. In this paper, we show a tight lower bound of $Ω(n^{1-2/p}\log n)$ words for the class of algorithms based on linear sketches, which store only a sketch $Ax$ of input vector $x$ and some (possibly randomized) matrix $A$. We note that all known algorithms for this problem are linear sketches.
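The defining property of a linear sketch can be illustrated in a few lines: because the stored state is $Ax$, a stream update to one coordinate of $x$ touches the sketch through a single column of $A$, and the full vector never needs to be kept. The random sign matrix below is an illustrative choice only, not the matrix used by any of the cited algorithms, and the function names are hypothetical. The code also gives the exact moment $F_p=\sum_i |x_i|^p$ for comparison.

```python
import random


def frequency_moment(x, p):
    """Exact p-th frequency moment: sum of |x_i|^p."""
    return sum(abs(v) ** p for v in x)


def make_sign_matrix(rows, n, seed=0):
    """Random rows-by-n matrix with entries in {-1, +1} (illustrative choice)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(rows)]


def sketch(A, x):
    """Linear sketch Ax, computed from the full vector."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def apply_update(A, sk, i, delta):
    """Process the stream update x[i] += delta using only column i of A."""
    for r, row in enumerate(A):
        sk[r] += row[i] * delta


A = make_sign_matrix(rows=4, n=5, seed=1)
sk = sketch(A, [1, 0, 2, 0, 0])
apply_update(A, sk, 3, 7)
# Linearity: the incrementally updated sketch equals the sketch of the new vector.
print(sk == sketch(A, [1, 0, 2, 7, 0]))  # prints True
```

The lower bound in the abstract concerns how many rows such a matrix $A$ must have, measured in words of space, for the sketch to determine $F_p$ approximately when $p>2$.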

preprint 2013 arXiv

Towards (1+ε)-Approximate Flow Sparsifiers

A useful approach to "compress" a large network $G$ is to represent it with a {\em flow-sparsifier}, i.e., a small network $H$ that supports the same flows as $G$, up to a factor $q \geq 1$ called the quality of sparsifier. Specifically, we assume the network $G$ contains a set of $k$ terminals $T$, shared with the network $H$, i.e., $T\subseteq V(G)\cap V(H)$, and we want $H$ to preserve all multicommodity flows that can be routed between the terminals $T$. The challenge is to construct $H$ that is small. These questions have received a lot of attention in recent years, leading to some known tradeoffs between the sparsifier's quality $q$ and its size $|V(H)|$. Nevertheless, it remains an outstanding question whether every $G$ admits a flow-sparsifier $H$ with quality $q=1+ε$, or even $q=O(1)$, and size $|V(H)|\leq f(k,ε)$ (in particular, independent of $|V(G)|$ and the edge capacities). Making a first step in this direction, we present new constructions for several scenarios:
* Our main result is that for quasi-bipartite networks $G$, one can construct a $(1+ε)$-flow-sparsifier of size $\operatorname{poly}(k/ε)$. In contrast, exact ($q=1$) sparsifiers for this family of networks are known to require size $2^{Ω(k)}$.
* For networks $G$ of bounded treewidth $w$, we construct a flow-sparsifier with quality $q=O(\log w / \log\log w)$ and size $O(w\cdot \operatorname{poly}(k))$.
* For general networks $G$, we construct a {\em sketch} $sk(G)$, that stores all the feasible multicommodity flows up to factor $q=1+ε$, and its size (storage requirement) is $f(k,ε)$.

preprint 2011 arXiv

Approximating Edit Distance in Near-Linear Time

We show how to compute the edit distance between two strings of length n up to a factor of 2^{Õ(sqrt(log n))} in n^(1+o(1)) time. This is the first sub-polynomial approximation algorithm for this problem that runs in near-linear time, improving on the state-of-the-art n^(1/3+o(1)) approximation. Previously, approximation of 2^{Õ(sqrt(log n))} was known only for embedding edit distance into l_1, and it is not known if that embedding can be computed in less than quadratic time.

preprint 2011 arXiv

Streaming Algorithms from Precision Sampling

A technique introduced by Indyk and Woodruff [STOC 2005] has inspired several recent advances in data-stream algorithms. We show that a number of these results follow easily from the application of a single probabilistic method called Precision Sampling. Using this method, we obtain simple data-stream algorithms that maintain a randomized sketch of an input vector $x=(x_1,\ldots,x_n)$, which is useful for the following applications. 1) Estimating the $F_k$-moment of $x$, for $k>2$. 2) Estimating the $\ell_p$-norm of $x$, for $p\in[1,2]$, with small update time. 3) Estimating cascaded norms $\ell_p(\ell_q)$ for all $p,q>0$. 4) $\ell_1$ sampling, where the goal is to produce an element $i$ with probability (approximately) $|x_i|/\|x\|_1$. It extends to similarly defined $\ell_p$-sampling, for $p\in [1,2]$. For all these applications the algorithm is essentially the same: scale the vector x entry-wise by a well-chosen random vector, and run a heavy-hitter estimation algorithm on the resulting vector. Our sketch is a linear function of x, thereby allowing general updates to the vector x. Precision Sampling itself addresses the problem of estimating a sum $\sum_{i=1}^n a_i$ from weak estimates of each real $a_i\in[0,1]$. More precisely, the estimator first chooses a desired precision $u_i\in(0,1]$ for each $i\in[n]$, and then it receives an estimate of every $a_i$ within additive $u_i$. Its goal is to provide a good approximation to $\sum a_i$ while keeping a tab on the "approximation cost" $\sum_i (1/u_i)$. Here we refine previous work [Andoni, Krauthgamer, and Onak, FOCS 2010] which shows that as long as $\sum a_i=Ω(1)$, a good multiplicative approximation can be achieved using total precision of only $O(n\log n)$.

preprint 2010 arXiv

Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity

We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor; specifically, for strings of length n and every fixed epsilon>0, it can compute a (log n)^O(1/epsilon) approximation in n^(1+epsilon) time. This is an exponential improvement over the previously known factor, 2^(O (sqrt(log n))), with a comparable running time (Ostrovsky and Rabani J.ACM 2007; Andoni and Onak STOC 2009). Previously, no efficient polylogarithmic approximation algorithm was known for any computational task involving edit distance (e.g., nearest neighbor search or sketching). This result arises naturally in the study of a new asymmetric query model. In this model, the input consists of two strings x and y, and an algorithm can access y in an unrestricted manner, while being charged for querying every symbol of x. Indeed, we obtain our main result by designing an algorithm that makes a small number of queries in this model. We then provide a nearly-matching lower bound on the number of queries. Our lower bound is the first to expose hardness of edit distance stemming from the input strings being "repetitive", which means that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation between edit distance and Ulam distance, which is edit distance on non-repetitive strings, such as permutations.

preprint 2009 arXiv

Global Alignment of Molecular Sequences via Ancestral State Reconstruction

Molecular phylogenetic techniques do not generally account for such common evolutionary events as site insertions and deletions (known as indels). Instead tree building algorithms and ancestral state inference procedures typically rely on substitution-only models of sequence evolution. In practice these methods are extended beyond this simplified setting with the use of heuristics that produce global alignments of the input sequences--an important problem which has no rigorous model-based solution. In this paper we consider a new version of the multiple sequence alignment in the context of stochastic indel models. More precisely, we introduce the following {\em trace reconstruction problem on a tree} (TRPT): a binary sequence is broadcast through a tree channel where we allow substitutions, deletions, and insertions; we seek to reconstruct the original sequence from the sequences received at the leaves of the tree. We give a recursive procedure for this problem with strong reconstruction guarantees at low mutation rates, providing also an alignment of the sequences at the leaves of the tree. The TRPT problem without indels has been studied in previous work (Mossel 2004, Daskalakis et al. 2006) as a bootstrapping step towards obtaining optimal phylogenetic reconstruction methods. The present work sets up a framework for extending these works to evolutionary models with indels.
