Source author record

Tsvi Kopelowitz

Tsvi Kopelowitz appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Distributed, Parallel, and Cluster Computing Computational Complexity Computational Geometry

Catalog footprint

What is connected

20works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

An Improved Algorithm for The $k$-Dyck Edit Distance Problem

A Dyck sequence is a sequence of opening and closing parentheses (of various types) that is balanced. The Dyck edit distance of a given sequence of parentheses $S$ is the smallest number of edit operations (insertions, deletions, and substitutions) needed to transform $S$ into a Dyck sequence. We consider the threshold Dyck edit distance problem, where the input is a sequence of parentheses $S$ and a positive integer $k$, and the goal is to compute the Dyck edit distance of $S$ only if the distance is at most $k$, and otherwise report that the distance is larger than $k$. Backurs and Onak [PODS'16] showed that the threshold Dyck edit distance problem can be solved in $O(n+k^{16})$ time. In this work, we design new algorithms for the threshold Dyck edit distance problem which costs $O(n+k^{4.544184})$ time with high probability or $O(n+k^{4.853059})$ deterministically. Our algorithms combine several new structural properties of the Dyck edit distance problem, a refined algorithm for fast $(\min,+)$ matrix product, and a careful modification of ideas used in Valiant's parsing algorithm.

preprint2020arXiv

Approximating Text-to-Pattern Hamming Distances

We revisit a fundamental problem in string matching: given a pattern of length m and a text of length n, both over an alphabet of size $σ$, compute the Hamming distance between the pattern and the text at every location. Several $(1+ε)$-approximation algorithms have been proposed in the literature, with running time of the form $O(ε^{-O(1)}n\log n\log m)$, all using fast Fourier transform (FFT). We describe a simple $(1+ε)$-approximation algorithm that is faster and does not need FFT. Combining our approach with additional ideas leads to numerous new results: - We obtain the first linear-time approximation algorithm; the running time is $O(ε^{-2}n)$. - We obtain a faster exact algorithm computing all Hamming distances up to a given threshold k; its running time improves previous results by logarithmic factors and is linear if $k\le\sqrt m$. - We obtain approximation algorithms with better $ε$-dependence using rectangular matrix multiplication. The time-bound is $Õ(n)$ when the pattern is sufficiently long: $m\ge ε^{-28}$. Previous algorithms require $Õ(ε^{-1}n)$ time. - When k is not too small, we obtain a truly sublinear-time algorithm to find all locations with Hamming distance approximately (up to a constant factor) less than k, in $O((n/k^{Ω(1)}+occ)n^{o(1)})$ time, where occ is the output size. The algorithm leads to a property tester, returning true if an exact match exists and false if the Hamming distance is more than $δm$ at every location, running in $Õ(δ^{-1/3}n^{2/3}+δ^{-1}n/m)$ time. - We obtain a streaming algorithm to report all locations with Hamming distance approximately less than k, using $Õ(ε^{-2}\sqrt k)$ space. Previously, streaming algorithms were known for the exact problem with Õ(k) space or for the approximate problem with $Õ(ε^{-O(1)}\sqrt m)$ space.

preprint2020arXiv

Contention Resolution Without Collision Detection

This paper focuses on the contention resolution problem on a shared communication channel that does not support collision detection. A shared communication channel is a multiple access channel, which consists of a sequence of synchronized time slots. Players on the channel may attempt to broadcast a packet (message) in any time slot. A player's broadcast succeeds if no other player broadcasts during that slot. If two or more players broadcast in the same time slot, then the broadcasts collide and both broadcasts fail. The lack of collision detection means that a player monitoring the channel cannot differentiate between the case of two or more players broadcasting in the same slot (a collision) and zero players broadcasting. In the contention-resolution problem, players arrive on the channel over time, and each player has one packet to transmit. The goal is to coordinate the players so that each player is able to successfully transmit its packet within reasonable time. However, the players can only communicate via the shared channel by choosing to either broadcast or not. A contention-resolution protocol is measured in terms of its throughput (channel utilization). Previous work on contention resolution that achieved constant throughput assumed that either players could detect collisions, or the players' arrival pattern is generated by a memoryless (non-adversarial) process. The foundational question answered by this paper is whether collision detection is a luxury or necessity when the objective is to achieve constant throughput. We show that even without collision detection, one can solve contention resolution, achieving constant throughput, with high probability.

preprint2020arXiv

Improved Circular $k$-Mismatch Sketches

The shift distance $\mathsf{sh}(S_1,S_2)$ between two strings $S_1$ and $S_2$ of the same length is defined as the minimum Hamming distance between $S_1$ and any rotation (cyclic shift) of $S_2$. We study the problem of sketching the shift distance, which is the following communication complexity problem: Strings $S_1$ and $S_2$ of length $n$ are given to two identical players (encoders), who independently compute sketches (summaries) $\mathtt{sk}(S_1)$ and $\mathtt{sk}(S_2)$, respectively, so that upon receiving the two sketches, a third player (decoder) is able to compute (or approximate) $\mathsf{sh}(S_1,S_2)$ with high probability. This paper primarily focuses on the more general $k$-mismatch version of the problem, where the decoder is allowed to declare a failure if $\mathsf{sh}(S_1,S_2)>k$, where $k$ is a parameter known to all parties. Andoni et al. (STOC'13) introduced exact circular $k$-mismatch sketches of size $\widetilde{O}(k+D(n))$, where $D(n)$ is the number of divisors of $n$. Andoni et al. also showed that their sketch size is optimal in the class of linear homomorphic sketches. We circumvent this lower bound by designing a (non-linear) exact circular $k$-mismatch sketch of size $\widetilde{O}(k)$; this size matches communication-complexity lower bounds. We also design $(1\pm \varepsilon)$-approximate circular $k$-mismatch sketch of size $\widetilde{O}(\min(\varepsilon^{-2}\sqrt{k}, \varepsilon^{-1.5}\sqrt{n}))$, which improves upon an $\widetilde{O}(\varepsilon^{-2}\sqrt{n})$-size sketch of Crouch and McGregor (APPROX'11).

preprint2020arXiv

The Streaming k-Mismatch Problem: Tradeoffs between Space and Total Time

We revisit the $k$-mismatch problem in the streaming model on a pattern of length $m$ and a streaming text of length $n$, both over a size-$σ$ alphabet. The current state-of-the-art algorithm for the streaming $k$-mismatch problem, by Clifford et al. [SODA 2019], uses $\tilde O(k)$ space and $\tilde O\big(\sqrt k\big)$ worst-case time per character. The space complexity is known to be (unconditionally) optimal, and the worst-case time per character matches a conditional lower bound. However, there is a gap between the total time cost of the algorithm, which is $\tilde O(n\sqrt k)$, and the fastest known offline algorithm, which costs $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},σn\big)\big)$ time. Moreover, it is not known whether improvements over the $\tilde O(n\sqrt k)$ total time are possible when using more than $O(k)$ space. We address these gaps by designing a randomized streaming algorithm for the $k$-mismatch problem that, given an integer parameter $k\le s \le m$, uses $\tilde O(s)$ space and costs $\tilde O\big(n+\min\big(\frac {nk^2}m,\frac{nk}{\sqrt s},\frac{σnm}s\big)\big)$ total time. For $s=m$, the total runtime becomes $\tilde O\big(n + \min\big(\frac{nk}{\sqrt m},σn\big)\big)$, which matches the time cost of the fastest offline algorithm. Moreover, the worst-case time cost per character is still $\tilde O\big(\sqrt k\big)$.

preprint2017arXiv

Streaming Pattern Matching with d Wildcards

In the pattern matching with $d$ wildcards problem one is given a text $T$ of length $n$ and a pattern $P$ of length $m$ that contains $d$ wildcard characters, each denoted by a special symbol $'?'$. A wildcard character matches any other character. The goal is to establish for each $m$-length substring of $T$ whether it matches $P$. In the streaming model variant of the pattern matching with $d$ wildcards problem the text $T$ arrives one character at a time and the goal is to report, before the next character arrives, if the last $m$ characters match $P$ while using only $o(m)$ words of space. In this paper we introduce two new algorithms for the $d$ wildcard pattern matching problem in the streaming model. The first is a randomized Monte Carlo algorithm that is parameterized by a constant $0\leq δ\leq 1$. This algorithm uses $\tilde{O}(d^{1-δ})$ amortized time per character and $\tilde{O}(d^{1+δ})$ words of space. The second algorithm, which is used as a black box in the first algorithm, is a randomized Monte Carlo algorithm which uses $O(d+\log m)$ worst-case time per character and $O(d\log m)$ words of space.

preprint2016arXiv

An Exponential Separation Between Randomized and Deterministic Complexity in the LOCAL Model

Over the past 30 years numerous algorithms have been designed for symmetry breaking problems in the LOCAL model, such as maximal matching, MIS, vertex coloring, and edge-coloring. For most problems the best randomized algorithm is at least exponentially faster than the best deterministic algorithm. In this paper we prove that these exponential gaps are necessary and establish connections between the deterministic and randomized complexities in the LOCAL model. Each result has a very compelling take-away message: 1. Fast $Δ$-coloring of trees requires random bits: Building on the recent lower bounds of Brandt et al., we prove that the randomized complexity of $Δ$-coloring a tree with maximum degree $Δ\ge 55$ is $Θ(\log_Δ\log n)$, whereas its deterministic complexity is $Θ(\log_Δn)$ for any $Δ\ge 3$. This also establishes a large separation between the deterministic complexity of $Δ$-coloring and $(Δ+1)$-coloring trees. 2. Randomized lower bounds imply deterministic lower bounds: We prove that any deterministic algorithm for a natural class of problems that runs in $O(1)+o(\log_Δn)$ rounds can be transformed to run in $O(\log^*n-\log^*Δ+1)$ rounds. If the transformed algorithm violates a lower bound (even allowing randomization), then one can conclude that the problem requires $Ω(\log_Δn)$ time deterministically. 3. Deterministic lower bounds imply randomized lower bounds: We prove that the randomized complexity of any natural problem on instances of size $n$ is at least its deterministic complexity on instances of size $\sqrt{\log n}$. This shows that a deterministic $Ω(\log_Δn)$ lower bound for any problem implies a randomized $Ω(\log_Δ\log n)$ lower bound. It also illustrates that the graph shattering technique is absolutely essential to the LOCAL model.

preprint2015arXiv

Breaking the Variance: Approximating the Hamming Distance in $\tilde O(1/ε)$ Time Per Alignment

The algorithmic tasks of computing the Hamming distance between a given pattern of length $m$ and each location in a text of length $n$ is one of the most fundamental algorithmic tasks in string algorithms. Unfortunately, there is evidence that for a text $T$ of size $n$ and a pattern $P$ of size $m$, one cannot compute the exact Hamming distance for all locations in $T$ in time which is less than $\tilde O(n\sqrt m)$. However, Karloff~\cite{karloff} showed that if one is willing to suffer a $1\pmε$ approximation, then it is possible to solve the problem with high probability, in $\tilde O(\frac n {ε^2})$ time. Due to related lower bounds for computing the Hamming distance of two strings in the one-way communication complexity model, it is strongly believed that obtaining an algorithm for solving the approximation version cannot be done much faster as a function of $\frac 1 ε$. We show here that this belief is false by introducing a new $\tilde O(\frac{n}ε)$ time algorithm that succeeds with high probability. The main idea behind our algorithm, which is common in sparse recovery problems, is to reduce the variance of a specific randomized experiment by (approximately) separating heavy hitters from non-heavy hitters. However, while known sparse recovery techniques work very well on vectors, they do not seem to apply here, where we are dealing with mismatches between pairs of characters. We introduce two main algorithmic ingredients. The first is a new sparse recovery method that applies for pair inputs (such as in our setting). The second is a new construction of hash/projection functions, which allows to count the number of projections that induce mismatches between two characters exponentially faster than brute force. We expect that these algorithmic techniques will be of independent interest.

preprint2015arXiv

Dynamic Set Intersection

Consider the problem of maintaining a family $F$ of dynamic sets subject to insertions, deletions, and set-intersection reporting queries: given $S,S'\in F$, report every member of $S\cap S'$ in any order. We show that in the word RAM model, where $w$ is the word size, given a cap $d$ on the maximum size of any set, we can support set intersection queries in $O(\frac{d}{w/\log^2 w})$ expected time, and updates in $O(\log w)$ expected time. Using this algorithm we can list all $t$ triangles of a graph $G=(V,E)$ in $O(m+\frac{mα}{w/\log^2 w} +t)$ expected time, where $m=|E|$ and $α$ is the arboricity of $G$. This improves a 30-year old triangle enumeration algorithm of Chiba and Nishizeki running in $O(m α)$ time. We provide an incremental data structure on $F$ that supports intersection {\em witness} queries, where we only need to find {\em one} $e\in S\cap S'$. Both queries and insertions take $O\paren{\sqrt \frac{N}{w/\log^2 w}}$ expected time, where $N=\sum_{S\in F} |S|$. Finally, we provide time/space tradeoffs for the fully dynamic set intersection reporting problem. Using $M$ words of space, each update costs $O(\sqrt {M \log N})$ expected time, each reporting query costs $O(\frac{N\sqrt{\log N}}{\sqrt M}\sqrt{op+1})$ expected time where $op$ is the size of the output, and each witness query costs $O(\frac{N\sqrt{\log N}}{\sqrt M} + \log N)$ expected time.

preprint2015arXiv

Faster Worst Case Deterministic Dynamic Connectivity

We present a deterministic dynamic connectivity data structure for undirected graphs with worst case update time $O\left(\sqrt{\frac{n(\log\log n)^2}{\log n}}\right)$ and constant query time. This improves on the previous best deterministic worst case algorithm of Frederickson (STOC 1983) and Eppstein Galil, Italiano, and Nissenzweig (J. ACM 1997), which had update time $O(\sqrt{n})$. All other algorithms for dynamic connectivity are either randomized (Monte Carlo) or have only amortized performance guarantees.

preprint2015arXiv

Mind the Gap

We examine the complexity of the online Dictionary Matching with One Gap Problem (DMOG) which is the following. Preprocess a dictionary $D$ of $d$ patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, a character at a time, we can report all of the patterns from $D$ that are suffixes of the text that has arrived so far, before the next character arrives. In more general versions the gap symbols are associated with bounds determining the possible lengths of matching strings. Finding efficient algorithmic solutions for (online) DMOG has proven to be a difficult algorithmic challenge. We demonstrate that the difficulty in obtaining efficient solutions for the DMOG problem even, in the offline setting, can be traced back to the infamous 3SUM conjecture. Interestingly, our reduction deviates from the known reduction paths that follow from 3SUM. In particular, most reductions from 3SUM go through the set-disjointness problem, which corresponds to the problem of preprocessing a graph to answer edge-triangles queries. We use a new path of reductions by considering the complementary, although structurally very different, vertex-triangles queries. Using this new path we show a conditional lower bound of $Ω(δ(G_D)+op)$ time per text character, where $G_D$ is a bipartite graph that captures the structure of $D$, $δ(G_D)$ is the degeneracy of this graph, and $op$ is the output size. We also provide matching upper-bounds (up to sub-polynomial factors) for the vertex-triangles problem, and then extend these techniques to the online DMOG problem. In particular, we introduce algorithms whose time cost depends linearly on $δ(G_D)$. Our algorithms make use of graph orientations, together with some additional techniques. Finally, when $δ(G_D)$ is large we are able to obtain even more efficient solutions.

preprint2014arXiv

The Family Holiday Gathering Problem or Fair and Periodic Scheduling of Independent Sets

We introduce and examine the {\em Holiday Gathering Problem} which models the difficulty that couples have when trying to decide with which parents should they spend the holiday. Our goal is to schedule the family gatherings so that the parents that will be {\em happy}, i.e.\ all their children will be home {\em simultaneously} for the holiday festivities, while minimizing the number of consecutive holidays in which parents are not happy. The holiday gathering problem is closely related to several classical problems in computer science, such as the {\em dining philosophers problem} on a general graph and periodic scheduling,and has applications in scheduling of transmissions made by cellular radios. We also show interesting connections between periodic scheduling, coloring, and universal prefix free encodings. The combinatorial definition of the Holiday Gathering Problem is: given a graph $G$, find an infinite sequence of independent-sets of $G$. The objective function is to minimize, for every node $v$, the maximal gap between two appearances of $v$. In good solutions this gap depends on local properties of the node (i.e., its degree) and the the solution should be periodic, i.e.\ a node appears every fixed number of periods. We show a coloring-based construction where the period of each node colored with the $c$ is at most $2^{1+\log^*c}\cdot\prod_{i=0}^{\log^*c} \log^{(i)}c$ (where $\log^{(i)}$ means iterating the $\log$ function $i$ times). This is achieved via a connection with {\it prefix-free encodings}. We prove that this is the best possible for coloring-based solutions. We also show a construction with period at most $2d$ for a node of degree $d$.

preprint2013arXiv

Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

This paper presents a general technique for optimally transforming any dynamic data structure that operates on atomic and indivisible keys by constant-time comparisons, into a data structure that handles unbounded-length keys whose comparison cost is not a constant. Examples of these keys are strings, multi-dimensional points, multiple-precision numbers, multi-key data (e.g.~records), XML paths, URL addresses, etc. The technique is more general than what has been done in previous work as no particular exploitation of the underlying structure of is required. The only requirement is that the insertion of a key must identify its predecessor or its successor. Using the proposed technique, online suffix tree can be constructed in worst case time $O(\log n)$ per input symbol (as opposed to amortized $O(\log n)$ time per symbol, achieved by previously known algorithms). To our knowledge, our algorithm is the first that achieves $O(\log n)$ worst case time per input symbol. Searching for a pattern of length $m$ in the resulting suffix tree takes $O(\min(m\log |Σ|, m + \log n) + tocc)$ time, where $tocc$ is the number of occurrences of the pattern. The paper also describes more applications and show how to obtain alternative methods for dealing with suffix sorting, dynamic lowest common ancestors and order maintenance.

preprint2013arXiv

Orienting Fully Dynamic Graphs with Worst-Case Time Bounds

In edge orientations, the goal is usually to orient (direct) the edges of an undirected $n$-vertex graph $G$ such that all out-degrees are bounded. When the graph $G$ is fully dynamic, i.e., admits edge insertions and deletions, we wish to maintain such an orientation while keeping a tab on the update time. Low out-degree orientations turned out to be a surprisingly useful tool, with several algorithmic applications involving static or dynamic graphs. Brodal and Fagerberg (1999) initiated the study of the edge orientation problem in terms of the graph's arboricity, which is very natural in this context. They provided a solution with constant out-degree and \emph{amortized} logarithmic update time for all graphs with constant arboricity, which include all planar and excluded-minor graphs. However, it remained an open question (first proposed by Brodal and Fagerberg, later by others) to obtain similar bounds with worst-case update time. We resolve this 15 year old question in the affirmative, by providing a simple algorithm with worst-case bounds that nearly match the previous amortized bounds. Our algorithm is based on a new approach of a combinatorial invariant, and achieves a logarithmic out-degree with logarithmic worst-case update times. This result has applications in various dynamic graph problems such as maintaining a maximal matching, where we obtain $O(\log n)$ worst-case update time compared to the $O(\frac{\log n}{\log\log n})$ amortized update time of Neiman and Solomon (2013).

preprint2013arXiv

Suffix Trays and Suffix Trists: Structures for Faster Text Indexing

Suffix trees and suffix arrays are two of the most widely used data structures for text indexing. Each uses linear space and can be constructed in linear time for polynomially sized alphabets. However, when it comes to answering queries with worst-case deterministic time bounds, the prior does so in $O(m\log|Σ|)$ time, where $m$ is the query size, $|Σ|$ is the alphabet size, and the latter does so in $O(m+\log n)$ time, where $n$ is the text size. If one wants to output all appearances of the query, an additive cost of $O(occ)$ time is sufficient, where $occ$ is the size of the output. We propose a novel way of combining the two into, what we call, a {\em suffix tray}. The space and construction time remain linear and the query time improves to $O(m+\log|Σ|)$ for integer alphabets from a linear range, i.e. $Σ\subset \{1,\cdots, cn\}$, for an arbitrary constant $c$. The construction and query are deterministic. Here also an additive $O(occ)$ time is sufficient if one desires to output all appearances of the query. We also consider the online version of indexing, where the text arrives online, one character at a time, and indexing queries are answered in tandem. In this variant we create a cross between a suffix tree and a suffix list (a dynamic variant of suffix array) to be called a {\em suffix trist}; it supports queries in $O(m+\log|Σ|)$ time. The suffix trist also uses linear space. Furthermore, if there exists an online construction for a linear-space suffix tree such that the cost of adding a character is worst-case deterministic $f(n,|Σ|)$ ($n$ is the size of the current text), then one can further update the suffix trist in $O(f(n,|Σ|)+\log |Σ|)$ time. The best currently known worst-case deterministic bound for $f(n,|Σ|)$ is $O(\log n)$ time.

preprint2012arXiv

Faster Clustering via Preprocessing

We examine the efficiency of clustering a set of points, when the encompassing metric space may be preprocessed in advance. In computational problems of this genre, there is a first stage of preprocessing, whose input is a collection of points $M$; the next stage receives as input a query set $Q\subset M$, and should report a clustering of $Q$ according to some objective, such as 1-median, in which case the answer is a point $a\in M$ minimizing $\sum_{q\in Q} d_M(a,q)$. We design fast algorithms that approximately solve such problems under standard clustering objectives like $p$-center and $p$-median, when the metric $M$ has low doubling dimension. By leveraging the preprocessing stage, our algorithms achieve query time that is near-linear in the query size $n=|Q|$, and is (almost) independent of the total number of points $m=|M|$.

preprint2012arXiv

On-line Indexing for General Alphabets via Predecessor Queries on Subsets of an Ordered List

The problem of Text Indexing is a fundamental algorithmic problem in which one wishes to preprocess a text in order to quickly locate pattern queries within the text. In the ever evolving world of dynamic and on-line data, there is also a need for developing solutions to index texts which arrive on-line, i.e. a character at a time, and still be able to quickly locate said patterns. In this paper, a new solution for on-line indexing is presented by providing an on-line suffix tree construction in $O(\log \log n + \log\log |Σ|)$ worst-case expected time per character, where $n$ is the size of the string, and $Σ$ is the alphabet. This improves upon all previously known on-line suffix tree constructions for general alphabets, at the cost of having the run time in expectation. The main idea is to reduce the problem of constructing a suffix tree on-line to an interesting variant of the order maintenance problem, which may be of independent interest. In the famous order maintenance problem, one wishes to maintain a dynamic list $L$ of size $n$ under insertions, deletions, and order queries. In an order query, one is given two nodes from $L$ and must determine which node precedes the other in $L$. In the Predecessor search on Dynamic Subsets of an Ordered Dynamic List problem (POLP) it is also necessary to maintain dynamic subsets of $L$ such that given some $u\in L$ it will be possible to quickly locate the predecessor of $u$ in any subset. This paper provides an efficient data structure capable of solving the POLP with worst-case expected bounds that match the currently best known bounds for predecessor search in the RAM model, improving over a solution which may be implicitly obtained from Dietz [Die89]. Furthermore, this paper improves or simplifies bounds for several additional applications, including fully-persistent arrays and the Order-Maintenance Problem.

preprint2012arXiv

Selection in the Presence of Memory Faults, with Applications to In-place Resilient Sorting

The selection problem, where one wishes to locate the $k^{th}$ smallest element in an unsorted array of size $n$, is one of the basic problems studied in computer science. The main focus of this work is designing algorithms for solving the selection problem in the presence of memory faults. These can happen as the result of cosmic rays, alpha particles, or hardware failures. Specifically, the computational model assumed here is a faulty variant of the RAM model (abbreviated as FRAM), which was introduced by Finocchi and Italiano. In this model, the content of memory cells might get corrupted adversarially during the execution, and the algorithm is given an upper bound $δ$ on the number of corruptions that may occur. The main contribution of this work is a deterministic resilient selection algorithm with optimal O(n) worst-case running time. Interestingly, the running time does not depend on the number of faults, and the algorithm does not need to know $δ$. The aforementioned resilient selection algorithm can be used to improve the complexity bounds for resilient $k$-d trees developed by Gieseke, Moruz and Vahrenhold. Specifically, the time complexity for constructing a $k$-d tree is improved from $O(n\log^2 n + δ^2)$ to $O(n \log n)$. Besides the deterministic algorithm, a randomized resilient selection algorithm is developed, which is simpler than the deterministic one, and has $O(n + α)$ expected time complexity and O(1) space complexity (i.e., is in-place). This algorithm is used to develop the first resilient sorting algorithm that is in-place and achieves optimal $O(n\log n + αδ)$ expected running time.

preprint2012arXiv

Sparse Suffix Tree Construction with Small Space

We consider the problem of constructing a sparse suffix tree (or suffix array) for $b$ suffixes of a given text $T$ of size $n$, using only $O(b)$ words of space during construction time. Breaking the naive bound of $Ω(nb)$ time for this problem has occupied many algorithmic researchers since a different structure, the (evenly spaced) sparse suffix tree, was introduced by K{ä}rkk{ä}inen and Ukkonen in 1996. While in the evenly spaced sparse suffix tree the suffixes considered must be evenly spaced in $T$, here there is no constraint on the locations of the suffixes. We show that the sparse suffix tree can be constructed in $O(n\log^2b)$ time. To achieve this we develop a technique, which may be of independent interest, that allows to efficiently answer $b$ longest common prefix queries on suffixes of $T$, using only $O(b)$ space. We expect that this technique will prove useful in many other applications in which space usage is a concern. Furthermore, additional tradeoffs between the space usage and the construction time are given.

preprint2010arXiv

Fast, precise and dynamic distance queries

We present an approximate distance oracle for a point set S with n points and doubling dimension λ. For every ε>0, the oracle supports (1+ε)-approximate distance queries in (universal) constant time, occupies space [ε^{-O(λ)} + 2^{O(λ log λ)}]n, and can be constructed in [2^{O(λ)} log3 n + ε^{-O(λ)} + 2^{O(λ log λ)}]n expected time. This improves upon the best previously known constructions, presented by Har-Peled and Mendel. Furthermore, the oracle can be made fully dynamic with expected O(1) query time and only 2^{O(λ)} log n + ε^{-O(λ)} + 2^{O(λ log λ)} update time. This is the first fully dynamic (1+ε)-distance oracle.

Tsvi Kopelowitz

What is connected

Connect this record

See the researcher in context

Building this map preview

20 published item(s)

An Improved Algorithm for The $k$-Dyck Edit Distance Problem

Approximating Text-to-Pattern Hamming Distances

Contention Resolution Without Collision Detection

Improved Circular $k$-Mismatch Sketches

The Streaming k-Mismatch Problem: Tradeoffs between Space and Total Time

Streaming Pattern Matching with d Wildcards

An Exponential Separation Between Randomized and Deterministic Complexity in the LOCAL Model

Breaking the Variance: Approximating the Hamming Distance in $\tilde O(1/ε)$ Time Per Alignment

Dynamic Set Intersection

Faster Worst Case Deterministic Dynamic Connectivity

Mind the Gap

The Family Holiday Gathering Problem or Fair and Periodic Scheduling of Independent Sets

Managing Unbounded-Length Keys in Comparison-Driven Data Structures with Applications to On-Line Indexing

Orienting Fully Dynamic Graphs with Worst-Case Time Bounds

Suffix Trays and Suffix Trists: Structures for Faster Text Indexing

Faster Clustering via Preprocessing

On-line Indexing for General Alphabets via Predecessor Queries on Subsets of an Ordered List

Selection in the Presence of Memory Faults, with Applications to In-place Resilient Sorting

Sparse Suffix Tree Construction with Small Space

Fast, precise and dynamic distance queries