Source author record

Krzysztof Onak

Krzysztof Onak appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Computational Geometry Distributed, Parallel, and Cluster Computing Computational Complexity

Catalog footprint

What is connected

10works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Efficient Algorithms for Adversarially Robust Approximate Nearest Neighbor Search

We study the Approximate Nearest Neighbor (ANN) problem under a powerful adaptive adversary that controls both the dataset and a sequence of $Q$ queries. Primarily, for the high-dimensional regime of $d = ω(\sqrt{Q})$, we introduce a sequence of algorithms with progressively stronger guarantees. We first establish a novel connection between adaptive security and \textit{fairness}, leveraging fair ANN search to hide internal randomness from the adversary with information-theoretic guarantees. To achieve data-independent performance, we then reduce the search problem to a robust decision primitive, solved using a differentially private mechanism on a Locality-Sensitive Hashing (LSH) data structure. This approach, however, faces an inherent $\sqrt{n}$ query time barrier. To break the barrier, we propose a novel concentric-annuli LSH construction that synthesizes these fairness and differential privacy techniques. The analysis introduces a new method for robustly releasing timing information from the underlying algorithm instances and, as a corollary, also improves existing results for fair ANN. In addition, for the low-dimensional regime $d = O(\sqrt{Q})$, we propose specialized algorithms that provide a strong ``for-all'' guarantee: correctness on \textit{every} possible query with high probability. We introduce novel metric covering constructions that simplify and improve prior approaches for ANN in Hamming and $\ell_p$ spaces.

preprint2021arXiv

Dynamic Graph Algorithms with Batch Updates in the Massively Parallel Computation Model

We study dynamic graph algorithms in the Massively Parallel Computation model, which was inspired by practical data processing systems. Our goal is to provide algorithms that can efficiently handle large batches of edge insertions and deletions. We show algorithms that require fewer rounds to update a solution to problems such as Minimum Spanning Forest, 2-Edge Connected Components, and Maximal Matching than would be required by their static counterparts to compute it from scratch. They work in the most restrictive memory regime, in which local memory per machine is strongly sublinear in the number of graph vertices. Improving on the size of the batch they can handle efficiently would improve on the round complexity of known static algorithms on sparse graphs. Our algorithms can process batches of updates of size $Θ(S)$, for Minimum Spanning Forest and 2-Edge Connected Components, and $Θ(S^{1-\varepsilon})$, for Maximal Matching, in $O(1)$ rounds, where $S$ is the local memory of a single machine.

preprint2016arXiv

Superlinear lower bounds for multipass graph processing

We prove $n^{1+Ω(1/p)}/p^{O(1)}$ lower bounds for the space complexity of $p$-pass streaming algorithms solving the following problems on $n$-vertex graphs: * testing if an undirected graph has a perfect matching (this implies lower bounds for computing a maximum matching or even just the maximum matching size), * testing if two specific vertices are at distance at most $2(p+1)$ in an undirected graph, * testing if there is a directed path from $s$ to $t$ for two specific vertices $s$ and $t$ in a directed graph. Prior to our result, it was known that these problems require $Ω(n^2)$ space in one pass, but no $n^{1+Ω(1)}$ lower bound was known for any $p\ge 2$. These streaming results follow from a communication complexity lower bound for a communication game in which the players hold two graphs on the same set of vertices. The task of the players is to find out whether the sets of vertices at distance exactly $p+1$ from a specific vertex intersect. The game requires a significant amount of communication only if the players are forced to speak in a specific difficult order. This is reminiscent of lower bounds for communication problems such as indexing and pointer chasing. Among other things, our line of attack requires proving an information cost lower bound for a decision version of the classic pointer chasing problem and a direct sum type theorem for the disjunction of several instances of this problem.

preprint2014arXiv

Parallel Algorithms for Geometric Graph Problems

We give algorithms for geometric graph problems in the modern parallel models inspired by MapReduce. For example, for the Minimum Spanning Tree (MST) problem over a set of points in the two-dimensional space, our algorithm computes a $(1+ε)$-approximate MST. Our algorithms work in a constant number of rounds of communication, while using total space and communication proportional to the size of the data (linear space and near linear time algorithms). In contrast, for general graphs, achieving the same result for MST (or even connectivity) remains a challenging open problem, despite drawing significant attention in recent years. We develop a general algorithmic framework that, besides MST, also applies to Earth-Mover Distance (EMD) and the transportation cost problem. Our algorithmic framework has implications beyond the MapReduce model. For example it yields a new algorithm for computing EMD cost in the plane in near-linear time, $n^{1+o_ε(1)}$. We note that while recently Sharathkumar and Agarwal developed a near-linear time algorithm for $(1+ε)$-approximating EMD, our algorithm is fundamentally different, and, for example, also solves the transportation (cost) problem, raised as an open question in their work. Furthermore, our algorithm immediately gives a $(1+ε)$-approximation algorithm with $n^δ$ space in the streaming-with-sorting model with $1/δ^{O(1)}$ passes. As such, it is tempting to conjecture that the parallel models may also constitute a concrete playground in the quest for efficient algorithms for EMD (and other similar problems) in the vanilla streaming model, a well-known open problem.

preprint2013arXiv

Fat Polygonal Partitions with Applications to Visualization and Embeddings

Let $\mathcal{T}$ be a rooted and weighted tree, where the weight of any node is equal to the sum of the weights of its children. The popular Treemap algorithm visualizes such a tree as a hierarchical partition of a square into rectangles, where the area of the rectangle corresponding to any node in $\mathcal{T}$ is equal to the weight of that node. The aspect ratio of the rectangles in such a rectangular partition necessarily depends on the weights and can become arbitrarily high. We introduce a new hierarchical partition scheme, called a polygonal partition, which uses convex polygons rather than just rectangles. We present two methods for constructing polygonal partitions, both having guarantees on the worst-case aspect ratio of the constructed polygons; in particular, both methods guarantee a bound on the aspect ratio that is independent of the weights of the nodes. We also consider rectangular partitions with slack, where the areas of the rectangles may differ slightly from the weights of the corresponding nodes. We show that this makes it possible to obtain partitions with constant aspect ratio. This result generalizes to hyper-rectangular partitions in $\mathbb{R}^d$. We use these partitions with slack for embedding ultrametrics into $d$-dimensional Euclidean space: we give a $\mathop{\rm polylog}(Δ)$-approximation algorithm for embedding $n$-point ultrametrics into $\mathbb{R}^d$ with minimum distortion, where $Δ$ denotes the spread of the metric, i.e., the ratio between the largest and the smallest distance between two points. The previously best-known approximation ratio for this problem was polynomial in $n$. This is the first algorithm for embedding a non-trivial family of weighted-graph metrics into a space of constant dimension that achieves polylogarithmic approximation ratio.

preprint2011arXiv

A Near-Optimal Sublinear-Time Algorithm for Approximating the Minimum Vertex Cover Size

We give a nearly optimal sublinear-time algorithm for approximating the size of a minimum vertex cover in a graph G. The algorithm may query the degree deg(v) of any vertex v of its choice, and for each 1 <= i <= deg(v), it may ask for the i-th neighbor of v. Letting VC_opt(G) denote the minimum size of vertex cover in G, the algorithm outputs, with high constant success probability, an estimate VC_estimate(G) such that VC_opt(G) <= VC_estimate(G) <= 2 * VC_opt(G) + epsilon*n, where epsilon is a given additive approximation parameter. We refer to such an estimate as a (2,epsilon)-estimate. The query complexity and running time of the algorithm are ~O(avg_deg * poly(1/epsilon)), where avg_deg denotes the average vertex degree in the graph. The best previously known sublinear algorithm, of Yoshida et al. (STOC 2009), has query complexity and running time O(d^4/epsilon^2), where d is the maximum degree in the graph. Given the lower bound of Omega(avg_deg) (for constant epsilon) for obtaining such an estimate (with any constant multiplicative factor) due to Parnas and Ron (TCS 2007), our result is nearly optimal. In the case that the graph is dense, that is, the number of edges is Theta(n^2), we consider another model, in which the algorithm may ask, for any pair of vertices u and v, whether there is an edge between u and v. We show how to adapt the algorithm that uses neighbor queries to this model and obtain an algorithm that outputs a (2,epsilon)-estimate of the size of a minimum vertex cover whose query complexity and running time are ~O(n) * poly(1/epsilon).

preprint2011arXiv

An Efficient Partitioning Oracle for Bounded-Treewidth Graphs

Partitioning oracles were introduced by Hassidim et al. (FOCS 2009) as a generic tool for constant-time algorithms. For any epsilon > 0, a partitioning oracle provides query access to a fixed partition of the input bounded-degree minor-free graph, in which every component has size poly(1/epsilon), and the number of edges removed is at most epsilon*n, where n is the number of vertices in the graph. However, the oracle of Hassidimet al. makes an exponential number of queries to the input graph to answer every query about the partition. In this paper, we construct an efficient partitioning oracle for graphs with constant treewidth. The oracle makes only O(poly(1/epsilon)) queries to the input graph to answer each query about the partition. Examples of bounded-treewidth graph classes include k-outerplanar graphs for fixed k, series-parallel graphs, cactus graphs, and pseudoforests. Our oracle yields poly(1/epsilon)-time property testing algorithms for membership in these classes of graphs. Another application of the oracle is a poly(1/epsilon)-time algorithm that approximates the maximum matching size, the minimum vertex cover size, and the minimum dominating set size up to an additive epsilon*n in graphs with bounded treewidth. Finally, the oracle can be used to test in poly(1/epsilon) time whether the input bounded-treewidth graph is k-colorable or perfect.

preprint2011arXiv

Approximating Edit Distance in Near-Linear Time

We show how to compute the edit distance between two strings of length n up to a factor of 2^{Õ(sqrt(log n))} in n^(1+o(1)) time. This is the first sub-polynomial approximation algorithm for this problem that runs in near-linear time, improving on the state-of-the-art n^(1/3+o(1)) approximation. Previously, approximation of 2^{Õ(sqrt(log n))} was known only for embedding edit distance into l_1, and it is not known if that embedding can be computed in less than quadratic time.

preprint2011arXiv

Streaming Algorithms from Precision Sampling

A technique introduced by Indyk and Woodruff [STOC 2005] has inspired several recent advances in data-stream algorithms. We show that a number of these results follow easily from the application of a single probabilistic method called Precision Sampling. Using this method, we obtain simple data-stream algorithms that maintain a randomized sketch of an input vector $x=(x_1,...x_n)$, which is useful for the following applications. 1) Estimating the $F_k$-moment of $x$, for $k>2$. 2) Estimating the $\ell_p$-norm of $x$, for $p\in[1,2]$, with small update time. 3) Estimating cascaded norms $\ell_p(\ell_q)$ for all $p,q>0$. 4) $\ell_1$ sampling, where the goal is to produce an element $i$ with probability (approximately) $|x_i|/\|x\|_1$. It extends to similarly defined $\ell_p$-sampling, for $p\in [1,2]$. For all these applications the algorithm is essentially the same: scale the vector x entry-wise by a well-chosen random vector, and run a heavy-hitter estimation algorithm on the resulting vector. Our sketch is a linear function of x, thereby allowing general updates to the vector x. Precision Sampling itself addresses the problem of estimating a sum $\sum_{i=1}^n a_i$ from weak estimates of each real $a_i\in[0,1]$. More precisely, the estimator first chooses a desired precision $u_i\in(0,1]$ for each $i\in[n]$, and then it receives an estimate of every $a_i$ within additive $u_i$. Its goal is to provide a good approximation to $\sum a_i$ while keeping a tab on the "approximation cost" $\sum_i (1/u_i)$. Here we refine previous work [Andoni, Krauthgamer, and Onak, FOCS 2010] which shows that as long as $\sum a_i=Ω(1)$, a good multiplicative approximation can be achieved using total precision of only $O(n\log n)$.

preprint2010arXiv

Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity

We present a near-linear time algorithm that approximates the edit distance between two strings within a polylogarithmic factor; specifically, for strings of length n and every fixed epsilon>0, it can compute a (log n)^O(1/epsilon) approximation in n^(1+epsilon) time. This is an exponential improvement over the previously known factor, 2^(O (sqrt(log n))), with a comparable running time (Ostrovsky and Rabani J.ACM 2007; Andoni and Onak STOC 2009). Previously, no efficient polylogarithmic approximation algorithm was known for any computational task involving edit distance (e.g., nearest neighbor search or sketching). This result arises naturally in the study of a new asymmetric query model. In this model, the input consists of two strings x and y, and an algorithm can access y in an unrestricted manner, while being charged for querying every symbol of x. Indeed, we obtain our main result by designing an algorithm that makes a small number of queries in this model. We then provide a nearly-matching lower bound on the number of queries. Our lower bound is the first to expose hardness of edit distance stemming from the input strings being "repetitive", which means that many of their substrings are approximately identical. Consequently, our lower bound provides the first rigorous separation between edit distance and Ulam distance, which is edit distance on non-repetitive strings, such as permutations.

Krzysztof Onak

What is connected

Connect this record

See the researcher in context

Building this map preview

10 published item(s)

Efficient Algorithms for Adversarially Robust Approximate Nearest Neighbor Search

Dynamic Graph Algorithms with Batch Updates in the Massively Parallel Computation Model

Superlinear lower bounds for multipass graph processing

Parallel Algorithms for Geometric Graph Problems

Fat Polygonal Partitions with Applications to Visualization and Embeddings

A Near-Optimal Sublinear-Time Algorithm for Approximating the Minimum Vertex Cover Size

An Efficient Partitioning Oracle for Bounded-Treewidth Graphs

Approximating Edit Distance in Near-Linear Time

Streaming Algorithms from Precision Sampling

Polylogarithmic Approximation for Edit Distance and the Asymmetric Query Complexity