Source author record

Seth Pettie

Data Structures and Algorithms; Distributed, Parallel, and Cluster Computing; math.CO; Computational Complexity; Computational Geometry; Discrete Mathematics; Databases; Information Theory; math.IT; math.ST; Statistics Theory

Catalog footprint

What is connected

25 works

11 topics

4 close collaborators

preprint · 2022 · arXiv

Byzantine Agreement in Polynomial Time with Near-Optimal Resilience

It has been known since the early 1980s that Byzantine Agreement in the full information, asynchronous model is impossible to solve deterministically against even one crash fault [FLP85], but that it can be solved with probability 1 [Ben83], even against an adversary that controls the scheduling of all messages and corrupts up to $f<n/3$ players [Bra87]. The main downside of [Ben83, Bra87] is that they terminate in $2^{Θ(n)}$ rounds in expectation whenever $f=Θ(n)$. King and Saia [KS16, KS18(arXiv:1812.10169)] developed a polynomial protocol (polynomial rounds, polynomial computation) that is resilient to $f < (1.14\times 10^{-9})n$ Byzantine faults. The new idea in their protocol is to detect -- and blacklist -- coalitions of likely-bad players by analyzing the deviations of random variables generated by those players over many rounds. In this work we design a simple collective coin-flipping protocol such that if any coalition of faulty players repeatedly does not follow protocol, then they will eventually be detected by one of two simple statistical tests. Using this coin-flipping protocol, we solve Byzantine Agreement in a polynomial number of rounds, even in the presence of up to $f<n/4$ Byzantine faults. This comes close to the $f<n/3$ upper bound on the maximum number of faults [BT85,FLM86,LSP82].

preprint · 2022 · arXiv

Byzantine Agreement with Optimal Resilience via Statistical Fraud Detection

Since the mid-1980s it has been known that Byzantine Agreement can be solved with probability 1 asynchronously, even against an omniscient, computationally unbounded adversary that can adaptively \emph{corrupt} up to $f<n/3$ parties. Moreover, the problem is insoluble with $f\geq n/3$ corruptions. However, Bracha's 1984 protocol achieved $f<n/3$ resilience at the cost of exponential expected latency $2^{Θ(n)}$, a bound that has never been improved in this model with $f=\lfloor (n-1)/3 \rfloor$ corruptions. In this paper we prove that Byzantine Agreement in the asynchronous, full information model can be solved with probability 1 against an adaptive adversary that can corrupt $f<n/3$ parties, while incurring only polynomial latency with high probability. Our protocol follows earlier polynomial latency protocols of King and Saia and Huang, Pettie, and Zhu, which had suboptimal resilience, namely $f \approx n/10^9$ and $f<n/4$, respectively. Resilience $f=(n-1)/3$ is uniquely difficult as this is the point at which the influence of the Byzantine and honest players are of roughly equal strength. The core technical problem we solve is to design a collective coin-flipping protocol that eventually lets us flip a coin with an unambiguous outcome. In the beginning the influence of the Byzantine players is too powerful to overcome and they can essentially fix the coin's behavior at will. We guarantee that after just a polynomial number of executions of the coin-flipping protocol, either (a) the Byzantine players fail to fix the behavior of the coin (thereby ending the game) or (b) we can "blacklist" players such that the blacklisting rate for Byzantine players is at least as large as the blacklisting rate for good players. The blacklisting criterion is based on a simple statistical test of fraud detection.

preprint · 2022 · arXiv

Simpler and Better Cardinality Estimators for HyperLogLog and PCSA

\emph{Cardinality Estimation} (aka \emph{Distinct Elements}) is a classic problem in sketching with many industrial applications. Although sketching \emph{algorithms} are fairly simple, analyzing the cardinality \emph{estimators} is notoriously difficult, and even today the state-of-the-art sketches such as HyperLogLog and (compressed) PCSA are not covered in graduate level Big Data courses. In this paper we define a class of \emph{generalized remaining area} ($τ$-GRA) estimators, and observe that HyperLogLog, LogLog, and some estimators for PCSA are merely instantiations of $τ$-GRA for various integral values of $τ$. We then analyze the limiting relative variance of $τ$-GRA estimators. It turns out that the standard estimators for HyperLogLog and PCSA can be improved by choosing a \emph{fractional} value of $τ$. The resulting estimators come \emph{very} close to the Cramér-Rao lower bounds for HyperLogLog and PCSA derived from their Fisher information. Although the Cramér-Rao lower bound \emph{can} be achieved with the Maximum Likelihood Estimator (MLE), the MLE is cumbersome to compute and dynamically update. In contrast, $τ$-GRA estimators are trivial to update in constant time. Our presentation assumes only basic calculus and probability, not any complex analysis~\cite{FlajoletM85,DurandF03,FlajoletFGM07}.

preprint · 2022 · arXiv

Wake Up and Join Me! An Energy-Efficient Algorithm for Maximal Matching in Radio Networks

We consider networks of small, autonomous devices that communicate with each other wirelessly. Minimizing energy usage is an important consideration in designing algorithms for such networks, as battery life is a crucial and limited resource. Working in a model where both sending and listening for messages deplete energy, we consider the problem of finding a maximal matching of the nodes in a radio network of arbitrary and unknown topology. We present a distributed randomized algorithm that produces, with high probability, a maximal matching. The maximum energy cost per node is $O(\log^2 n)$, where $n$ is the size of the network. The total latency of our algorithm is $O(n \log n)$ time steps. We observe that there exist families of network topologies for which both of these bounds are simultaneously optimal up to polylog factors, so any significant improvement will require additional assumptions about the network topology. We also consider the related problem of assigning, for each node in the network, a neighbor to back up its data in case of node failure. Here, a key goal is to minimize the maximum load, defined as the number of nodes assigned to a single node. We present a decentralized low-energy algorithm that finds a neighbor assignment whose maximum load is at most a polylog($n$) factor bigger than the optimum.

preprint · 2021 · arXiv

Non-Mergeable Sketching for Cardinality Estimation

Cardinality estimation is perhaps the simplest non-trivial statistical problem that can be solved via sketching. Industrially-deployed sketches like HyperLogLog, MinHash, and PCSA are mergeable, which means that large data sets can be sketched in a distributed environment, and then merged into a single sketch of the whole data set. In the last decade a variety of sketches have been developed that are non-mergeable, but attractive for other reasons. They are simpler, their cardinality estimates are strictly unbiased, and they have substantially lower variance. We evaluate sketching schemes on a reasonably level playing field, in terms of their memory-variance product (MVP). E.g., a sketch that occupies $5m$ bits and whose relative variance is $2/m$ (standard error $\sqrt{2/m}$) has an MVP of $10$. Our contributions are as follows. Cohen and Ting independently discovered what we call the Martingale transform for converting a mergeable sketch into a non-mergeable sketch. We present a simpler way to analyze the limiting MVP of Martingale-type sketches. We prove that the Martingale transform is optimal in the non-mergeable world, and that Martingale Fishmonger in particular is optimal among linearizable sketches, with an MVP of $H_0/2 \approx 1.63$. E.g., this is circumstantial evidence that to achieve 1% standard error, we cannot do better than a 2 kilobyte sketch. Martingale Fishmonger is neither simple nor practical. We develop a new mergeable sketch called Curtain that strikes a nice balance between simplicity and efficiency, and prove that Martingale Curtain has limiting MVP $\approx 2.31$. It can be updated with $O(1)$ memory accesses and it has lower empirical variance than Martingale LogLog, a practical non-mergeable version of HyperLogLog.
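The memory-variance arithmetic in this abstract can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes the MVP is simply memory in bits times relative variance, as the abstract's $5m$-bit example suggests.

```python
import math

def memory_variance_product(memory_bits: float, relative_variance: float) -> float:
    """MVP = (memory in bits) * (relative variance of the estimate)."""
    return memory_bits * relative_variance

def bits_for_standard_error(mvp: float, standard_error: float) -> float:
    """Bits needed to reach a target standard error, given a sketch's MVP.

    Relative variance is standard_error**2, so memory = MVP / standard_error**2.
    """
    return mvp / standard_error ** 2

# The abstract's example: 5m bits and relative variance 2/m (any m works).
m = 1000
example_mvp = memory_variance_product(5 * m, 2 / m)

# The abstract's other example: MVP of about 1.63 and a 1% standard error.
bits = bits_for_standard_error(1.63, 0.01)
kilobytes = bits / 8 / 1000
```

Under these assumptions the first value comes out at 10 and the second at roughly 16,300 bits, i.e. about 2 kilobytes, which agrees with the figures quoted in the abstract.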

preprint · 2021 · arXiv

The Structure of Minimum Vertex Cuts

In this paper we continue a long line of work on representing the cut structure of graphs. We classify the types of minimum vertex cuts, and the possible relationships between multiple minimum vertex cuts. As a consequence of these investigations, we exhibit a simple $O(κn)$-space data structure that can quickly answer pairwise $(κ+1)$-connectivity queries in a $κ$-connected graph. We also show how to compute the "closest" $κ$-cut to every vertex in near linear $\tilde{O}(m+poly(κ)n)$ time.

preprint · 2020 · arXiv

Approximate Generalized Matching: $f$-Factors and $f$-Edge Covers

In this paper we present linear time approximation schemes for several generalized matching problems on nonbipartite graphs. Our results include $O_ε(m)$-time algorithms for $(1-ε)$-maximum weight $f$-factor and $(1+ε)$-approximate minimum weight $f$-edge cover. As a byproduct, we also obtain direct algorithms for the exact cardinality versions of these problems running in $O(m\sqrt{f(V)})$ time. The technical contributions of this work include an efficient method for maintaining {\em relaxed complementary slackness} in generalized matching problems and approximation-preserving reductions between the $f$-factor and $f$-edge cover problems.

preprint · 2020 · arXiv

Contention Resolution Without Collision Detection

This paper focuses on the contention resolution problem on a shared communication channel that does not support collision detection. A shared communication channel is a multiple access channel, which consists of a sequence of synchronized time slots. Players on the channel may attempt to broadcast a packet (message) in any time slot. A player's broadcast succeeds if no other player broadcasts during that slot. If two or more players broadcast in the same time slot, then the broadcasts collide and both broadcasts fail. The lack of collision detection means that a player monitoring the channel cannot differentiate between the case of two or more players broadcasting in the same slot (a collision) and zero players broadcasting. In the contention-resolution problem, players arrive on the channel over time, and each player has one packet to transmit. The goal is to coordinate the players so that each player is able to successfully transmit its packet within reasonable time. However, the players can only communicate via the shared channel by choosing to either broadcast or not. A contention-resolution protocol is measured in terms of its throughput (channel utilization). Previous work on contention resolution that achieved constant throughput assumed that either players could detect collisions, or the players' arrival pattern is generated by a memoryless (non-adversarial) process. The foundational question answered by this paper is whether collision detection is a luxury or necessity when the objective is to achieve constant throughput. We show that even without collision detection, one can solve contention resolution, achieving constant throughput, with high probability.
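The channel model described in this abstract can be made concrete with a small Python sketch. It is an assumption-laden illustration of the feedback rule only (the name `channel_feedback` and the tuple encoding are hypothetical), not the paper's protocol.

```python
from typing import Hashable, Optional, Sequence, Tuple

def channel_feedback(broadcasters: Sequence[Hashable]) -> Tuple[str, Optional[Hashable]]:
    """What a listener observes in one time slot without collision detection.

    Exactly one broadcaster: the packet gets through and is heard.
    Zero broadcasters, or two or more (a collision): the listener hears
    nothing, and the two cases are indistinguishable.
    """
    if len(broadcasters) == 1:
        return ("success", broadcasters[0])
    return ("silence", None)

# An empty slot and a collision produce the same observation.
empty_slot = channel_feedback([])
collision = channel_feedback(["player_a", "player_b"])
lone_sender = channel_feedback(["player_a"])
```

The point of the sketch is that `empty_slot` and `collision` are equal, which is exactly the information a protocol in this model has to do without.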

preprint · 2020 · arXiv

Joins on Samples: A Theoretical Guide for Practitioners

Despite decades of research on approximate query processing (AQP), our understanding of sample-based joins has remained limited and, to some extent, even superficial. The common belief in the community is that joining random samples is futile. This belief is largely based on an early result showing that the join of two uniform samples is not an independent sample of the original join, and that it leads to quadratically fewer output tuples. However, unfortunately, this result has little applicability to the key questions practitioners face. For example, the success metric is often the final approximation's accuracy, rather than output cardinality. Moreover, there are many non-uniform sampling strategies that one can employ. Is sampling for joins still futile in all of these settings? If not, what is the best sampling strategy in each case? To the best of our knowledge, there is no formal study answering these questions. This paper aims to improve our understanding of sample-based joins and offer a guideline for practitioners building and using real-world AQP systems. We study limitations of offline samples in approximating join queries: given an offline sampling budget, how well can one approximate the join of two tables? We answer this question for two success metrics: output size and estimator variance. We show that maximizing output size is easy, while there is an information-theoretical lower bound on the lowest variance achievable by any sampling strategy. We then define a hybrid sampling scheme that captures all combinations of stratified, universe, and Bernoulli sampling, and show that this scheme with our optimal parameters achieves the theoretical lower bound within a constant factor. Since computing these optimal parameters requires shuffling statistics across the network, we also propose a decentralized variant where each node acts autonomously using minimal statistics.

preprint · 2020 · arXiv

Planar Distance Oracles with Better Time-Space Tradeoffs

In a recent breakthrough, Charalampopoulos, Gawrychowski, Mozes, and Weimann (STOC 2019) showed that exact distance queries on planar graphs could be answered in $n^{o(1)}$ time by a data structure occupying $n^{1+o(1)}$ space, i.e., up to $o(1)$ terms, optimal exponents in time (0) and space (1) can be achieved simultaneously. Their distance query algorithm is recursive: it makes successive calls to a point-location algorithm for planar Voronoi diagrams, which involves many recursive distance queries. The depth of this recursion is non-constant and the branching factor logarithmic, leading to $(\log n)^{ω(1)} = n^{o(1)}$ query times. In this paper we present a new way to do point-location in planar Voronoi diagrams, which leads to a new exact distance oracle. At the two extremes of our space-time tradeoff curve we can achieve either $n^{1+o(1)}$ space and $\log^{2+o(1)}n$ query time, or $n\log^{2+o(1)}n$ space and $n^{o(1)}$ query time. All previous oracles with $\tilde{O}(1)$ query time occupy space $n^{1+Ω(1)}$, and all previous oracles with space $\tilde{O}(n)$ answer queries in $n^{Ω(1)}$ time.

preprint · 2020 · arXiv

The Communication Complexity of Set Intersection and Multiple Equality Testing

In this paper we explore fundamental problems in randomized communication complexity such as computing Set Intersection on sets of size $k$ and Equality Testing between vectors of length $k$. Sağlam and Tardos and Brody et al. showed that for these types of problems, one can achieve optimal communication volume of $O(k)$ bits, with a randomized protocol that takes $O(\log^* k)$ rounds. Aside from rounds and communication volume, there is a \emph{third} parameter of interest, namely the \emph{error probability} $p_{\mathrm{err}}$. It is straightforward to show that protocols for Set Intersection or Equality Testing need to send $Ω(k + \log p_{\mathrm{err}}^{-1})$ bits. Is it possible to simultaneously achieve optimality in all three parameters, namely $O(k + \log p_{\mathrm{err}}^{-1})$ communication and $O(\log^* k)$ rounds? In this paper we prove that there is no universally optimal algorithm, and complement the existing round-communication tradeoffs with a new tradeoff between rounds, communication, and probability of error. In particular: 1. Any protocol for solving Multiple Equality Testing in $r$ rounds with failure probability $2^{-E}$ has communication volume $Ω(Ek^{1/r})$. 2. There exists a protocol for solving Multiple Equality Testing in $r + \log^*(k/E)$ rounds with $O(k + rEk^{1/r})$ communication, thereby essentially matching our lower bound and that of Sağlam and Tardos. Our original motivation for considering $p_{\mathrm{err}}$ as an independent parameter came from the problem of enumerating triangles in distributed ($\textsf{CONGEST}$) networks having maximum degree $Δ$. We prove that this problem can be solved in $O(Δ/\log n + \log\log Δ)$ time with high probability $1-1/\operatorname{poly}(n)$.

preprint · 2020 · arXiv

The Energy Complexity of BFS in Radio Networks

We consider a model of energy complexity in Radio Networks in which transmitting or listening on the channel costs one unit of energy and computation is free. This simplified model captures key aspects of battery-powered sensors: that battery life is most influenced by transceiver usage, and that at low transmission powers, the actual cost of transmitting and listening are very similar. The energy complexity of tasks in single-hop networks is well understood. Recent work of Chang et al. considered energy complexity in multi-hop networks and showed that $\mathsf{Broadcast}$ admits an energy-efficient protocol, by which we mean each of the $n$ nodes in the network spends $O(\text{polylog}(n))$ energy. This work left open the strange possibility that all natural problems in multi-hop networks might admit such an energy-efficient solution. In this paper we prove that the landscape of energy complexity is rich enough to support a multitude of problem complexities. Whereas $\mathsf{Broadcast}$ can be solved by an energy-efficient protocol, exact computation of $\mathsf{Diameter}$ cannot, requiring $Ω(n)$ energy. Our main result is that $\mathsf{Breadth First Search}$ has sub-polynomial energy complexity at most $2^{O(\sqrt{\log n\log\log n})}=n^{o(1)}$; whether it admits an efficient $O(\text{polylog}(n))$-energy protocol is an open problem. Our main algorithm involves recursively solving a generalized BFS problem on a cluster graph introduced by Miller, Peng, and Xu. In this application, we make crucial use of a close relationship between distances in this cluster graph, and distances in the original network. This relationship is new and may be of independent interest.

preprint · 2016 · arXiv

An Exponential Separation Between Randomized and Deterministic Complexity in the LOCAL Model

Over the past 30 years numerous algorithms have been designed for symmetry breaking problems in the LOCAL model, such as maximal matching, MIS, vertex coloring, and edge-coloring. For most problems the best randomized algorithm is at least exponentially faster than the best deterministic algorithm. In this paper we prove that these exponential gaps are necessary and establish connections between the deterministic and randomized complexities in the LOCAL model. Each result has a very compelling take-away message: 1. Fast $Δ$-coloring of trees requires random bits: Building on the recent lower bounds of Brandt et al., we prove that the randomized complexity of $Δ$-coloring a tree with maximum degree $Δ\ge 55$ is $Θ(\log_Δ\log n)$, whereas its deterministic complexity is $Θ(\log_Δn)$ for any $Δ\ge 3$. This also establishes a large separation between the deterministic complexity of $Δ$-coloring and $(Δ+1)$-coloring trees. 2. Randomized lower bounds imply deterministic lower bounds: We prove that any deterministic algorithm for a natural class of problems that runs in $O(1)+o(\log_Δn)$ rounds can be transformed to run in $O(\log^*n-\log^*Δ+1)$ rounds. If the transformed algorithm violates a lower bound (even allowing randomization), then one can conclude that the problem requires $Ω(\log_Δn)$ time deterministically. 3. Deterministic lower bounds imply randomized lower bounds: We prove that the randomized complexity of any natural problem on instances of size $n$ is at least its deterministic complexity on instances of size $\sqrt{\log n}$. This shows that a deterministic $Ω(\log_Δn)$ lower bound for any problem implies a randomized $Ω(\log_Δ\log n)$ lower bound. It also illustrates that the graph shattering technique is absolutely essential to the LOCAL model.

preprint · 2016 · arXiv

Lower Bounds on Davenport-Schinzel Sequences via Rectangular Zarankiewicz Matrices

An order-$s$ Davenport-Schinzel sequence over an $n$-letter alphabet is one avoiding immediate repetitions and alternating subsequences with length $s+2$. The main problem is to determine the maximum length of such a sequence, as a function of $n$ and $s$. When $s$ is fixed this problem has been settled but when $s$ is a function of $n$, very little is known about the extremal function $λ(s,n)$ of such sequences. In this paper we give a new recursive construction of Davenport-Schinzel sequences that is based on dense 0-1 matrices avoiding large all-1 submatrices (aka Zarankiewicz's Problem.) In particular, we give a simple construction of $n^{2/t} \times n$ matrices containing $n^{1+1/t}$ 1s that avoid $t\times 2$ all-1 submatrices. Our lower bounds on $λ(s,n)$ exhibit three qualitatively different behaviors depending on the size of $s$ relative to $n$. When $s \le \log\log n$ we show that $λ(s,n)/n \ge 2^s$ grows exponentially with $s$. When $s = n^{o(1)}$ we show $λ(s,n)/n \ge (\frac{s}{2\log\log_s n})^{\log\log_s n}$ grows faster than any polynomial in $s$. Finally, when $s=Ω(n^{1/t}(t-1)!)$, $λ(s,n) = Ω(n^2 s/(t-1)!)$ matches the trivial upper bound $O(n^2s)$ asymptotically, whenever $t$ is constant.

preprint · 2015 · arXiv

A Linear-Size Logarithmic Stretch Path-Reporting Distance Oracle for General Graphs

In 2001 Thorup and Zwick devised a distance oracle, which given an $n$-vertex undirected graph and a parameter $k$, has size $O(k n^{1+1/k})$. Upon a query $(u,v)$ their oracle constructs a $(2k-1)$-approximate path $Π$ between $u$ and $v$. The query time of the Thorup-Zwick's oracle is $O(k)$, and it was subsequently improved to $O(1)$ by Chechik. A major drawback of the oracle of Thorup and Zwick is that its space is $Ω(n \cdot \log n)$. Mendel and Naor devised an oracle with space $O(n^{1+1/k})$ and stretch $O(k)$, but their oracle can only report distance estimates and not actual paths. In this paper we devise a path-reporting distance oracle with size $O(n^{1+1/k})$, stretch $O(k)$ and query time $O(n^ε)$, for an arbitrarily small $ε> 0$. In particular, our oracle can provide logarithmic stretch using linear size. Another variant of our oracle has size $O(n \log\log n)$, polylogarithmic stretch, and query time $O(\log\log n)$. For unweighted graphs we devise a distance oracle with multiplicative stretch $O(1)$, additive stretch $O(β(k))$, for a function $β(\cdot)$, space $O(n^{1+1/k} \cdot β)$, and query time $O(n^ε)$, for an arbitrarily small constant $ε>0$. The tradeoff between multiplicative stretch and size in these oracles is far below girth conjecture threshold (which is stretch $2k-1$ and size $O(n^{1+1/k})$). Breaking the girth conjecture tradeoff is achieved by exhibiting a tradeoff of different nature between additive stretch $β(k)$ and size $O(n^{1+1/k})$. A similar type of tradeoff was exhibited by a construction of $(1+ε,β)$-spanners due to Elkin and Peleg. However, so far $(1+ε,β)$-spanners had no counterpart in the distance oracles' world. An important novel tool that we develop on the way to these results is a {distance-preserving path-reporting oracle}.

preprint · 2015 · arXiv

Dynamic Set Intersection

Consider the problem of maintaining a family $F$ of dynamic sets subject to insertions, deletions, and set-intersection reporting queries: given $S,S'\in F$, report every member of $S\cap S'$ in any order. We show that in the word RAM model, where $w$ is the word size, given a cap $d$ on the maximum size of any set, we can support set intersection queries in $O(\frac{d}{w/\log^2 w})$ expected time, and updates in $O(\log w)$ expected time. Using this algorithm we can list all $t$ triangles of a graph $G=(V,E)$ in $O(m+\frac{mα}{w/\log^2 w} +t)$ expected time, where $m=|E|$ and $α$ is the arboricity of $G$. This improves a 30-year-old triangle enumeration algorithm of Chiba and Nishizeki running in $O(m α)$ time. We provide an incremental data structure on $F$ that supports intersection {\em witness} queries, where we only need to find {\em one} $e\in S\cap S'$. Both queries and insertions take $O\left(\sqrt{\frac{N}{w/\log^2 w}}\right)$ expected time, where $N=\sum_{S\in F} |S|$. Finally, we provide time/space tradeoffs for the fully dynamic set intersection reporting problem. Using $M$ words of space, each update costs $O(\sqrt {M \log N})$ expected time, each reporting query costs $O(\frac{N\sqrt{\log N}}{\sqrt M}\sqrt{op+1})$ expected time where $op$ is the size of the output, and each witness query costs $O(\frac{N\sqrt{\log N}}{\sqrt M} + \log N)$ expected time.

preprint · 2015 · arXiv

Faster Worst Case Deterministic Dynamic Connectivity

We present a deterministic dynamic connectivity data structure for undirected graphs with worst case update time $O\left(\sqrt{\frac{n(\log\log n)^2}{\log n}}\right)$ and constant query time. This improves on the previous best deterministic worst case algorithm of Frederickson (STOC 1983) and Eppstein, Galil, Italiano, and Nissenzweig (J. ACM 1997), which had update time $O(\sqrt{n})$. All other algorithms for dynamic connectivity are either randomized (Monte Carlo) or have only amortized performance guarantees.

preprint · 2015 · arXiv

Mind the Gap

We examine the complexity of the online Dictionary Matching with One Gap Problem (DMOG) which is the following. Preprocess a dictionary $D$ of $d$ patterns, where each pattern contains a special gap symbol that can match any string, so that given a text that arrives online, a character at a time, we can report all of the patterns from $D$ that are suffixes of the text that has arrived so far, before the next character arrives. In more general versions the gap symbols are associated with bounds determining the possible lengths of matching strings. Finding efficient algorithmic solutions for (online) DMOG has proven to be a difficult algorithmic challenge. We demonstrate that the difficulty in obtaining efficient solutions for the DMOG problem, even in the offline setting, can be traced back to the infamous 3SUM conjecture. Interestingly, our reduction deviates from the known reduction paths that follow from 3SUM. In particular, most reductions from 3SUM go through the set-disjointness problem, which corresponds to the problem of preprocessing a graph to answer edge-triangles queries. We use a new path of reductions by considering the complementary, although structurally very different, vertex-triangles queries. Using this new path we show a conditional lower bound of $Ω(δ(G_D)+op)$ time per text character, where $G_D$ is a bipartite graph that captures the structure of $D$, $δ(G_D)$ is the degeneracy of this graph, and $op$ is the output size. We also provide matching upper-bounds (up to sub-polynomial factors) for the vertex-triangles problem, and then extend these techniques to the online DMOG problem. In particular, we introduce algorithms whose time cost depends linearly on $δ(G_D)$. Our algorithms make use of graph orientations, together with some additional techniques. Finally, when $δ(G_D)$ is large we are able to obtain even more efficient solutions.

preprint · 2015 · arXiv

The Locality of Distributed Symmetry Breaking

Symmetry breaking problems are among the most well studied in the field of distributed computing and yet the most fundamental questions about their complexity remain open. In this paper we work in the LOCAL model (where the input graph and underlying distributed network are identical) and study the randomized complexity of four fundamental symmetry breaking problems on graphs: computing MISs (maximal independent sets), maximal matchings, vertex colorings, and ruling sets. A small sample of our results includes - An MIS algorithm running in $O(\log^2Δ+ 2^{O(\sqrt{\log\log n})})$ time, where $Δ$ is the maximum degree. This is the first MIS algorithm to improve on the 1986 algorithms of Luby and Alon, Babai, and Itai, when $\log n \ll Δ\ll 2^{\sqrt{\log n}}$, and comes close to the $Ω(\log Δ)$ lower bound of Kuhn, Moscibroda, and Wattenhofer. - A maximal matching algorithm running in $O(\logΔ+ \log^4\log n)$ time. This is the first significant improvement to the 1986 algorithm of Israeli and Itai. Moreover, its dependence on $Δ$ is provably optimal. - A method for reducing symmetry breaking problems in low arboricity/degeneracy graphs to low degree graphs. (Roughly speaking, the arboricity or degeneracy of a graph bounds the density of any subgraph.) Corollaries of this reduction include an $O(\sqrt{\log n})$-time maximal matching algorithm for graphs with arboricity up to $2^{\sqrt{\log n}}$ and an $O(\log^{2/3} n)$-time MIS algorithm for graphs with arboricity up to $2^{(\log n)^{1/3}}$. Each of our algorithms is based on a simple, but powerful technique for reducing a randomized symmetry breaking task to a corresponding deterministic one on a poly$(\log n)$-size graph.

preprint · 2014 · arXiv

Sensitivity Analysis of Minimum Spanning Trees in Sub-Inverse-Ackermann Time

We present a deterministic algorithm for computing the sensitivity of a minimum spanning tree (MST) or shortest path tree in $O(m\logα(m,n))$ time, where $α$ is the inverse-Ackermann function. This improves upon a long standing bound of $O(mα(m,n))$ established by Tarjan. Our algorithms are based on an efficient split-findmin data structure, which maintains a collection of sequences of weighted elements that may be split into smaller subsequences. As far as we are aware, our split-findmin algorithm is the first with superlinear but sub-inverse-Ackermann complexity. We also give a reduction from MST sensitivity to the MST problem itself. Together with the randomized linear time MST algorithm of Karger, Klein, and Tarjan, this gives another randomized linear time MST sensitivity algorithm.

preprint · 2014 · arXiv

Three Generalizations of Davenport-Schinzel Sequences

We present new, and mostly sharp, bounds on the maximum length of certain generalizations of Davenport-Schinzel sequences. Among the results are sharp bounds on order-$s$ {\em double DS} sequences, for all $s$, sharp bounds on sequences avoiding {\em catenated permutations} (aka formation free sequences), and new lower bounds on sequences avoiding {\em zig-zagging} patterns.

preprint · 2014 · arXiv

Threesomes, Degenerates, and Love Triangles

The 3SUM problem is to decide, given a set of $n$ real numbers, whether any three sum to zero. It is widely conjectured that a trivial $O(n^2)$-time algorithm is optimal and over the years the consequences of this conjecture have been revealed. This 3SUM conjecture implies $Ω(n^2)$ lower bounds on numerous problems in computational geometry and a variant of the conjecture implies strong lower bounds on triangle enumeration, dynamic graph algorithms, and string matching data structures. In this paper we refute the 3SUM conjecture. We prove that the decision tree complexity of 3SUM is $O(n^{3/2}\sqrt{\log n})$ and give two subquadratic 3SUM algorithms, a deterministic one running in $O(n^2 / (\log n/\log\log n)^{2/3})$ time and a randomized one running in $O(n^2 (\log\log n)^2 / \log n)$ time with high probability. Our results lead directly to improved bounds for $k$-variate linear degeneracy testing for all odd $k\ge 3$. The problem is to decide, given a linear function $f(x_1,\ldots,x_k) = α_0 + \sum_{1\le i\le k} α_i x_i$ and a set $A \subset \mathbb{R}$, whether $0\in f(A^k)$. We show the decision tree complexity of this problem is $O(n^{k/2}\sqrt{\log n})$. Finally, we give a subcubic algorithm for a generalization of the $(\min,+)$-product over real-valued matrices and apply it to the problem of finding zero-weight triangles in weighted graphs. We give a depth-$O(n^{5/2}\sqrt{\log n})$ decision tree for this problem, as well as an algorithm running in time $O(n^3 (\log\log n)^2/\log n)$.
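For orientation, the "trivial" quadratic baseline that this abstract refers to can be sketched in Python as below. This is the textbook sort-and-two-pointers method, not either of the subquadratic algorithms from the paper; the function name is hypothetical, and the sketch assumes the three numbers must come from three distinct positions of the input.

```python
from typing import Sequence

def has_three_sum(values: Sequence[float]) -> bool:
    """Decide whether three entries (at distinct positions) sum to zero.

    Sorting costs O(n log n); the nested scan below is O(n^2) overall.
    """
    a = sorted(values)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        # Two pointers move inward over the sorted suffix after position i.
        while lo < hi:
            total = a[i] + a[lo] + a[hi]
            if total == 0:
                return True
            if total < 0:
                lo += 1   # sum too small: raise the middle element
            else:
                hi -= 1   # sum too large: lower the largest element
    return False
```

The exact comparison `total == 0` suits integer inputs; with floating-point data a tolerance would be needed, which is a separate design decision outside this sketch.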

preprint 2013 arXiv

Sharp Bounds on Davenport-Schinzel Sequences of Every Order

One of the longest-standing open problems in computational geometry is to bound the lower envelope of $n$ univariate functions, each pair of which crosses at most $s$ times, for some fixed $s$. This problem is known to be equivalent to bounding the length of an order-$s$ Davenport-Schinzel sequence, namely a sequence over an $n$-letter alphabet that avoids alternating subsequences of the form $a \cdots b \cdots a \cdots b \cdots$ with length $s+2$. These sequences were introduced by Davenport and Schinzel in 1965 to model a certain problem in differential equations and have since been applied to bounding the running times of geometric algorithms, data structures, and the combinatorial complexity of geometric arrangements. Let $λ_s(n)$ be the maximum length of an order-$s$ DS sequence over $n$ letters. What is $λ_s$ asymptotically? This question has been answered satisfactorily (by Hart and Sharir, Agarwal, Sharir, and Shor, Klazar, and Nivasch) when $s$ is even or $s\le 3$. However, since the work of Agarwal, Sharir, and Shor in the mid-1980s there has been a persistent gap in our understanding of the odd orders. In this work we effectively close the problem by establishing sharp bounds on Davenport-Schinzel sequences of every order $s$. Our results reveal that, contrary to one's intuition, $λ_s(n)$ behaves essentially like $λ_{s-1}(n)$ when $s$ is odd. This refutes conjectures due to Alon et al. (2008) and Nivasch (2010).
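The definition of an order-$s$ Davenport-Schinzel sequence can be made concrete with a small checker. This is a sketch only: the names `longest_alternation` and `is_ds_sequence` are hypothetical, and the extra requirement that no two adjacent symbols are equal is the standard convention in the literature, assumed here because the abstract does not state it.

```python
from itertools import permutations


def longest_alternation(seq, a, b):
    """Length of the longest subsequence a, b, a, b, ... of seq (starting with a)."""
    length = 0
    want = a
    for x in seq:
        if x == want:  # greedily take the earliest matching symbol
            length += 1
            want = b if want == a else a
    return length


def is_ds_sequence(seq, s):
    """Check the order-s condition: no alternation a..b..a..b.. of length s + 2.

    Also rejects immediately repeated symbols (assumed standard convention).
    """
    if any(x == y for x, y in zip(seq, seq[1:])):
        return False
    letters = set(seq)
    return all(
        longest_alternation(seq, a, b) < s + 2
        for a, b in permutations(letters, 2)
    )
```

For example, `is_ds_sequence("abcba", 2)` holds, since no pair of letters alternates four times, whereas `"abab"` fails at order 2 but passes at order 3.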

preprint 2012 arXiv

Connectivity Oracles for Planar Graphs

We consider dynamic subgraph connectivity problems for planar graphs. In this model there is a fixed underlying planar graph, where each edge and vertex is either "off" (failed) or "on" (recovered). We wish to answer connectivity queries with respect to the "on" subgraph. The model has two natural variants, one in which there are $d$ edge/vertex failures that precede all connectivity queries, and one in which failures/recoveries and queries are intermixed. We present a $d$-failure connectivity oracle for planar graphs that processes any $d$ edge/vertex failures in $sort(d,n)$ time so that connectivity queries can be answered in $pred(d,n)$ time. (Here $sort$ and $pred$ are the time for integer sorting and integer predecessor search over a subset of $[n]$ of size $d$.) Our algorithm has two discrete parts. The first is an algorithm tailored to triconnected planar graphs. It makes use of Barnette's theorem, which states that every triconnected planar graph contains a degree-3 spanning tree. The second part is a generic reduction from general (planar) graphs to triconnected (planar) graphs. Our algorithm is, moreover, provably optimal. An implication of Patrascu and Thorup's lower bound on predecessor search is that no $d$-failure connectivity oracle (even on trees) can beat $pred(d,n)$ query time. We extend our algorithms to the subgraph connectivity model where edge/vertex failures (but no recoveries) are intermixed with connectivity queries. In triconnected planar graphs each failure and query is handled in $O(\log n)$ time (amortized), whereas in general planar graphs both bounds become $O(\log^2 n)$.
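To illustrate the query model only, here is a naive baseline that answers a connectivity query on the "on" subgraph by breadth-first search. It is not the oracle from the paper: it spends time linear in the graph size per query, uses no planarity, and does no preprocessing. The function name and the adjacency-dictionary input format are assumptions made for this sketch.

```python
from collections import deque


def connected_after_failures(adj, failed_vertices, failed_edges, u, v):
    """Naive check: are u and v connected once the given failures are removed?

    adj maps each vertex to a list of neighbours (undirected graph).
    Runs one BFS per query, so it costs O(n + m) time each time.
    """
    failed_vertices = set(failed_vertices)
    failed_edges = {frozenset(e) for e in failed_edges}
    if u in failed_vertices or v in failed_vertices:
        return False
    seen = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if x == v:
            return True
        for y in adj[x]:
            if y in seen or y in failed_vertices:
                continue
            if frozenset((x, y)) in failed_edges:
                continue  # this edge is "off"
            seen.add(y)
            queue.append(y)
    return False
```

The contrast with the abstract is the point: the oracle there pays $sort(d,n)$ once for the $d$ failures and then answers each query in $pred(d,n)$ time, rather than re-exploring the graph.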

preprint 2011 arXiv

Scaling algorithms for approximate and exact maximum weight matching

The maximum cardinality and maximum weight matching problems can be solved in time $\tilde{O}(m\sqrt{n})$, a bound that has resisted improvement despite decades of research. (Here $m$ and $n$ are the number of edges and vertices.) In this article we demonstrate that this "$m\sqrt{n}$ barrier" is extremely fragile, in the following sense. For any $ε>0$, we give an algorithm that computes a $(1-ε)$-approximate maximum weight matching in $O(mε^{-1}\log ε^{-1})$ time, that is, optimal linear time for any fixed $ε$. Our algorithm is dramatically simpler than the best exact maximum weight matching algorithms on general graphs and should be appealing in all applications that can tolerate a negligible relative error. Our second contribution is a new exact maximum weight matching algorithm for integer-weighted bipartite graphs that runs in time $O(m\sqrt{n}\log N)$. This improves on the $O(Nm\sqrt{n})$-time and $O(m\sqrt{n}\log(nN))$-time algorithms known since the mid-1980s, for $1\ll \log N \ll \log n$. Here $N$ is the maximum integer edge weight.

25 published item(s)

Byzantine Agreement in Polynomial Time with Near-Optimal Resilience

Byzantine Agreement with Optimal Resilience via Statistical Fraud Detection

Simpler and Better Cardinality Estimators for HyperLogLog and PCSA

Wake Up and Join Me! An Energy-Efficient Algorithm for Maximal Matching in Radio Networks

Non-Mergeable Sketching for Cardinality Estimation

The Structure of Minimum Vertex Cuts

Approximate Generalized Matching: $f$-Factors and $f$-Edge Covers

Contention Resolution Without Collision Detection

Joins on Samples: A Theoretical Guide for Practitioners

Planar Distance Oracles with Better Time-Space Tradeoffs

The Communication Complexity of Set Intersection and Multiple Equality Testing

The Energy Complexity of BFS in Radio Networks

An Exponential Separation Between Randomized and Deterministic Complexity in the LOCAL Model

Lower Bounds on Davenport-Schinzel Sequences via Rectangular Zarankiewicz Matrices

A Linear-Size Logarithmic Stretch Path-Reporting Distance Oracle for General Graphs

Dynamic Set Intersection

Faster Worst Case Deterministic Dynamic Connectivity

Mind the Gap

The Locality of Distributed Symmetry Breaking

Sensitivity Analysis of Minimum Spanning Trees in Sub-Inverse-Ackermann Time

Three Generalizations of Davenport-Schinzel Sequences

Threesomes, Degenerates, and Love Triangles

Sharp Bounds on Davenport-Schinzel Sequences of Every Order

Connectivity Oracles for Planar Graphs

Scaling algorithms for approximate and exact maximum weight matching