Researcher profile

Seth Pettie

Seth Pettie contributes to research discovery and scholarly infrastructure.

12 works
0 followers
9 topics
4 close collaborators

Published work

12 published items

Preprint, 2022, arXiv

Byzantine Agreement in Polynomial Time with Near-Optimal Resilience

It has been known since the early 1980s that Byzantine Agreement in the full information, asynchronous model is impossible to solve deterministically against even one crash fault [FLP85], but that it can be solved with probability 1 [Ben83], even against an adversary that controls the scheduling of all messages and corrupts up to $f<n/3$ players [Bra87]. The main downside of [Ben83, Bra87] is that they terminate in $2^{Θ(n)}$ rounds in expectation whenever $f=Θ(n)$. King and Saia [KS16, KS18(arXiv:1812.10169)] developed a polynomial protocol (polynomial rounds, polynomial computation) that is resilient to $f < (1.14\times 10^{-9})n$ Byzantine faults. The new idea in their protocol is to detect -- and blacklist -- coalitions of likely-bad players by analyzing the deviations of random variables generated by those players over many rounds. In this work we design a simple collective coin-flipping protocol such that if any coalition of faulty players repeatedly does not follow protocol, then they will eventually be detected by one of two simple statistical tests. Using this coin-flipping protocol, we solve Byzantine Agreement in a polynomial number of rounds, even in the presence of up to $f<n/4$ Byzantine faults. This comes close to the $f<n/3$ upper bound on the maximum number of faults [BT85,FLM86,LSP82].
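The resilience bounds quoted in this abstract are all strict inequalities of the form $f < c\,n$. As a minimal sketch, the following computes the largest number of faults each bound permits for a given $n$. The function name `max_faults` and the constant names are hypothetical and chosen only for illustration; the constants themselves are the ones stated in the abstract.

```python
import math
from fractions import Fraction

# Resilience constants c in the bound f < c * n, as quoted in the abstract.
KING_SAIA = Fraction(114, 10**11)   # 1.14e-9, kept exact to avoid float error
THIS_PAPER = Fraction(1, 4)         # f < n/4
OPTIMAL = Fraction(1, 3)            # f < n/3 (upper bound on tolerable faults)

def max_faults(n: int, c: Fraction) -> int:
    """Largest integer f with f < c * n (hypothetical helper, n >= 1)."""
    bound = c * n
    # ceil(bound) - 1 is the largest integer strictly below bound.
    return math.ceil(bound) - 1

if __name__ == "__main__":
    for n in (10, 10**8, 10**10):
        print(n, max_faults(n, KING_SAIA), max_faults(n, THIS_PAPER),
              max_faults(n, OPTIMAL))
```

Exact fractions are used so that the strict inequality is evaluated without floating-point rounding, which matters when $c\,n$ is an integer.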

Preprint, 2022, arXiv

Byzantine Agreement with Optimal Resilience via Statistical Fraud Detection

Since the mid-1980s it has been known that Byzantine Agreement can be solved with probability 1 asynchronously, even against an omniscient, computationally unbounded adversary that can adaptively corrupt up to $f<n/3$ parties. Moreover, the problem is insoluble with $f\geq n/3$ corruptions. However, Bracha's 1984 protocol achieved $f<n/3$ resilience at the cost of exponential expected latency $2^{Θ(n)}$, a bound that has never been improved in this model with $f=\lfloor (n-1)/3 \rfloor$ corruptions. In this paper we prove that Byzantine Agreement in the asynchronous, full information model can be solved with probability 1 against an adaptive adversary that can corrupt $f<n/3$ parties, while incurring only polynomial latency with high probability. Our protocol follows earlier polynomial latency protocols of King and Saia and Huang, Pettie, and Zhu, which had suboptimal resilience, namely $f \approx n/10^9$ and $f<n/4$, respectively. Resilience $f=(n-1)/3$ is uniquely difficult as this is the point at which the influences of the Byzantine and honest players are of roughly equal strength. The core technical problem we solve is to design a collective coin-flipping protocol that eventually lets us flip a coin with an unambiguous outcome. In the beginning the influence of the Byzantine players is too powerful to overcome and they can essentially fix the coin's behavior at will. We guarantee that after just a polynomial number of executions of the coin-flipping protocol, either (a) the Byzantine players fail to fix the behavior of the coin (thereby ending the game) or (b) we can "blacklist" players such that the blacklisting rate for Byzantine players is at least as large as the blacklisting rate for good players. The blacklisting criterion is based on a simple statistical test of fraud detection.

Preprint, 2022, arXiv

Simpler and Better Cardinality Estimators for HyperLogLog and PCSA

Cardinality Estimation (aka Distinct Elements) is a classic problem in sketching with many industrial applications. Although sketching algorithms are fairly simple, analyzing the cardinality estimators is notoriously difficult, and even today the state-of-the-art sketches such as HyperLogLog and (compressed) PCSA are not covered in graduate-level Big Data courses. In this paper we define a class of generalized remaining area ($τ$-GRA) estimators, and observe that HyperLogLog, LogLog, and some estimators for PCSA are merely instantiations of $τ$-GRA for various integral values of $τ$. We then analyze the limiting relative variance of $τ$-GRA estimators. It turns out that the standard estimators for HyperLogLog and PCSA can be improved by choosing a fractional value of $τ$. The resulting estimators come very close to the Cramér-Rao lower bounds for HyperLogLog and PCSA derived from their Fisher information. Although the Cramér-Rao lower bound can be achieved with the Maximum Likelihood Estimator (MLE), the MLE is cumbersome to compute and dynamically update. In contrast, $τ$-GRA estimators are trivial to update in constant time. Our presentation assumes only basic calculus and probability, not any complex analysis [FlajoletM85, DurandF03, FlajoletFGM07].
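For readers unfamiliar with the sketch being analyzed, here is a minimal illustration of a HyperLogLog sketch with the standard raw estimator. It is not the $τ$-GRA estimator from the paper. The class name, the choice of SHA-256 as the hash function, and the 64-bit hash width are assumptions made for this example, and the small-range and large-range corrections used in practice are omitted.

```python
import hashlib

class HyperLogLog:
    """Toy HyperLogLog sketch with m = 2**b registers (illustrative only)."""

    HASH_BITS = 64  # assumed hash width for this example

    def __init__(self, b: int = 10):
        self.b = b
        self.m = 1 << b
        self.registers = [0] * self.m

    def add(self, item: str) -> None:
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")           # 64-bit hash value
        rest_bits = self.HASH_BITS - self.b
        j = h >> rest_bits                               # first b bits pick a register
        rest = h & ((1 << rest_bits) - 1)                # remaining bits
        # rho = 1-based position of the leftmost 1-bit among the remaining bits
        rho = rest_bits - rest.bit_length() + 1
        # Constant-time update: each register keeps the maximum rho seen so far.
        self.registers[j] = max(self.registers[j], rho)

    def estimate(self) -> float:
        # Standard raw estimator; this alpha approximation is valid for m >= 128.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        z = sum(2.0 ** -r for r in self.registers)
        return alpha * self.m * self.m / z

if __name__ == "__main__":
    sketch = HyperLogLog(b=10)
    for i in range(50_000):
        sketch.add(f"item-{i}")
    print(round(sketch.estimate()))
```

Because each register stores only a running maximum, inserting a duplicate item never changes the sketch, which is why the estimate depends only on the set of distinct elements.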

Preprint, 2022, arXiv

Wake Up and Join Me! An Energy-Efficient Algorithm for Maximal Matching in Radio Networks

We consider networks of small, autonomous devices that communicate with each other wirelessly. Minimizing energy usage is an important consideration in designing algorithms for such networks, as battery life is a crucial and limited resource. Working in a model where both sending and listening for messages deplete energy, we consider the problem of finding a maximal matching of the nodes in a radio network of arbitrary and unknown topology. We present a distributed randomized algorithm that produces, with high probability, a maximal matching. The maximum energy cost per node is $O(\log^2 n)$, where $n$ is the size of the network. The total latency of our algorithm is $O(n \log n)$ time steps. We observe that there exist families of network topologies for which both of these bounds are simultaneously optimal up to polylog factors, so any significant improvement will require additional assumptions about the network topology. We also consider the related problem of assigning, for each node in the network, a neighbor to back up its data in case of node failure. Here, a key goal is to minimize the maximum load, defined as the number of nodes assigned to a single node. We present a decentralized low-energy algorithm that finds a neighbor assignment whose maximum load is at most a polylog($n$) factor larger than the optimum.

Preprint, 2021, arXiv

Non-Mergeable Sketching for Cardinality Estimation

Cardinality estimation is perhaps the simplest non-trivial statistical problem that can be solved via sketching. Industrially-deployed sketches like HyperLogLog, MinHash, and PCSA are mergeable, which means that large data sets can be sketched in a distributed environment, and then merged into a single sketch of the whole data set. In the last decade a variety of sketches have been developed that are non-mergeable, but attractive for other reasons. They are simpler, their cardinality estimates are strictly unbiased, and they have substantially lower variance. We evaluate sketching schemes on a reasonably level playing field, in terms of their memory-variance product (MVP). E.g., a sketch that occupies $5m$ bits and whose relative variance is $2/m$ (standard error $\sqrt{2/m}$) has an MVP of $10$. Our contributions are as follows. Cohen and Ting independently discovered what we call the Martingale transform for converting a mergeable sketch into a non-mergeable sketch. We present a simpler way to analyze the limiting MVP of Martingale-type sketches. We prove that the Martingale transform is optimal in the non-mergeable world, and that Martingale Fishmonger in particular is optimal among linearizable sketches, with an MVP of $H_0/2 \approx 1.63$. E.g., this is circumstantial evidence that to achieve 1% standard error, we cannot do better than a 2 kilobyte sketch. Martingale Fishmonger is neither simple nor practical. We develop a new mergeable sketch called Curtain that strikes a nice balance between simplicity and efficiency, and prove that Martingale Curtain has limiting MVP $\approx 2.31$. It can be updated with $O(1)$ memory accesses and it has lower empirical variance than Martingale LogLog, a practical non-mergeable version of HyperLogLog.
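The memory-variance arithmetic in this abstract can be checked directly. The sketch below assumes only the definition given above (MVP is memory in bits times relative variance); the function names are hypothetical. Since the relative variance is the square of the standard error, a target standard error $\varepsilon$ requires about $\mathrm{MVP}/\varepsilon^2$ bits.

```python
def mvp(memory_bits: float, relative_variance: float) -> float:
    """Memory-variance product: bits of memory times relative variance."""
    return memory_bits * relative_variance

def bits_for_standard_error(mvp_value: float, std_error: float) -> float:
    """Bits needed to reach a target standard error at a given MVP.

    Relative variance equals std_error**2, so bits = MVP / std_error**2.
    """
    return mvp_value / std_error ** 2

if __name__ == "__main__":
    m = 1000
    # A sketch of 5m bits with relative variance 2/m has MVP 10 (up to rounding).
    print(mvp(5 * m, 2 / m))
    # MVP 1.63 at 1% standard error: about 16,300 bits, i.e. roughly 2 kilobytes.
    bits = bits_for_standard_error(1.63, 0.01)
    print(bits, bits / 8)
```

This reproduces the abstract's two examples: an MVP of 10 for the $5m$-bit sketch, and roughly 2 kilobytes for 1% standard error at an MVP of about 1.63.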

Preprint, 2021, arXiv

The Structure of Minimum Vertex Cuts

In this paper we continue a long line of work on representing the cut structure of graphs. We classify the types of minimum vertex cuts and the possible relationships between multiple minimum vertex cuts. As a consequence of these investigations, we exhibit a simple $O(κn)$-space data structure that can quickly answer pairwise $(κ+1)$-connectivity queries in a $κ$-connected graph. We also show how to compute the "closest" $κ$-cut to every vertex in near linear $\tilde{O}(m+\mathrm{poly}(κ)n)$ time.

Preprint, 2020, arXiv

Approximate Generalized Matching: $f$-Factors and $f$-Edge Covers

In this paper we present linear time approximation schemes for several generalized matching problems on nonbipartite graphs. Our results include $O_ε(m)$-time algorithms for $(1-ε)$-maximum weight $f$-factor and $(1+ε)$-approximate minimum weight $f$-edge cover. As a byproduct, we also obtain direct algorithms for the exact cardinality versions of these problems running in $O(m\sqrt{f(V)})$ time. The technical contributions of this work include an efficient method for maintaining relaxed complementary slackness in generalized matching problems and approximation-preserving reductions between the $f$-factor and $f$-edge cover problems.

Preprint, 2020, arXiv

Contention Resolution Without Collision Detection

This paper focuses on the contention resolution problem on a shared communication channel that does not support collision detection. A shared communication channel is a multiple access channel, which consists of a sequence of synchronized time slots. Players on the channel may attempt to broadcast a packet (message) in any time slot. A player's broadcast succeeds if no other player broadcasts during that slot. If two or more players broadcast in the same time slot, then the broadcasts collide and all of them fail. The lack of collision detection means that a player monitoring the channel cannot differentiate between the case of two or more players broadcasting in the same slot (a collision) and zero players broadcasting. In the contention-resolution problem, players arrive on the channel over time, and each player has one packet to transmit. The goal is to coordinate the players so that each player is able to successfully transmit its packet within reasonable time. However, the players can only communicate via the shared channel by choosing to either broadcast or not. A contention-resolution protocol is measured in terms of its throughput (channel utilization). Previous work on contention resolution that achieved constant throughput assumed that either players could detect collisions, or the players' arrival pattern is generated by a memoryless (non-adversarial) process. The foundational question answered by this paper is whether collision detection is a luxury or necessity when the objective is to achieve constant throughput. We show that even without collision detection, one can solve contention resolution, achieving constant throughput, with high probability.
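The channel model described in this abstract can be made concrete with a short sketch. It models only the slot feedback rule, not the paper's protocol. The names `Feedback`, `slot_feedback`, and `throughput` are hypothetical, and measuring throughput as the fraction of slots carrying a successful broadcast is a simplifying assumption for this example.

```python
from enum import Enum
from typing import Sequence

class Feedback(Enum):
    SUCCESS = "success"
    # Without collision detection, silence and collision are indistinguishable,
    # so both map to the same feedback value.
    NO_SUCCESS = "no success"

def slot_feedback(num_broadcasters: int) -> Feedback:
    """Feedback observed on the channel for one time slot."""
    if num_broadcasters == 1:
        return Feedback.SUCCESS
    return Feedback.NO_SUCCESS   # zero broadcasters, or two or more

def throughput(broadcasters_per_slot: Sequence[int]) -> float:
    """Fraction of slots with exactly one broadcaster (assumed definition)."""
    successes = sum(
        1 for k in broadcasters_per_slot
        if slot_feedback(k) is Feedback.SUCCESS
    )
    return successes / len(broadcasters_per_slot)

if __name__ == "__main__":
    # Slots with 1, 0, 2, and 1 broadcasters: two of the four slots succeed.
    print(throughput([1, 0, 2, 1]))
```

A model with collision detection would instead return three distinct feedback values (silence, success, collision); collapsing two of them is what makes coordination harder in the setting studied here.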

Preprint, 2020, arXiv

Joins on Samples: A Theoretical Guide for Practitioners

Despite decades of research on approximate query processing (AQP), our understanding of sample-based joins has remained limited and, to some extent, even superficial. The common belief in the community is that joining random samples is futile. This belief is largely based on an early result showing that the join of two uniform samples is not an independent sample of the original join, and that it leads to quadratically fewer output tuples. Unfortunately, this result has little applicability to the key questions practitioners face. For example, the success metric is often the final approximation's accuracy, rather than output cardinality. Moreover, there are many non-uniform sampling strategies that one can employ. Is sampling for joins still futile in all of these settings? If not, what is the best sampling strategy in each case? To the best of our knowledge, there is no formal study answering these questions. This paper aims to improve our understanding of sample-based joins and offer a guideline for practitioners building and using real-world AQP systems. We study limitations of offline samples in approximating join queries: given an offline sampling budget, how well can one approximate the join of two tables? We answer this question for two success metrics: output size and estimator variance. We show that maximizing output size is easy, while there is an information-theoretical lower bound on the lowest variance achievable by any sampling strategy. We then define a hybrid sampling scheme that captures all combinations of stratified, universe, and Bernoulli sampling, and show that this scheme with our optimal parameters achieves the theoretical lower bound within a constant factor. Since computing these optimal parameters requires shuffling statistics across the network, we also propose a decentralized variant where each node acts autonomously using minimal statistics.
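The "quadratically fewer output tuples" effect mentioned in this abstract can be illustrated with a small calculation. The sketch assumes independent Bernoulli sampling of the two tables at rates `p` and `q`, which is one simple way to model uniform samples; the function names are hypothetical. Each output tuple of the join comes from one row of each table and survives only if both rows are sampled, so by linearity of expectation the expected sampled join size is `p * q` times the full join size.

```python
from collections import Counter
from typing import Hashable, Iterable

def join_size(r_keys: Iterable[Hashable], s_keys: Iterable[Hashable]) -> int:
    """Number of output tuples of an equi-join on the given key columns."""
    r, s = Counter(r_keys), Counter(s_keys)
    # Each key contributes (rows in R with that key) * (rows in S with that key).
    return sum(count * s[key] for key, count in r.items())

def expected_sampled_join_size(r_keys, s_keys, p: float, q: float) -> float:
    """Expected join size after Bernoulli-sampling R at rate p and S at rate q.

    A joined pair survives only if both of its rows are kept, which happens
    with probability p * q under independent sampling (an assumption here).
    """
    return p * q * join_size(r_keys, s_keys)

if __name__ == "__main__":
    r = [1, 1, 2]
    s = [1, 2, 2, 3]
    print(join_size(r, s))                              # full join size
    print(expected_sampled_join_size(r, s, 0.1, 0.1))   # 1% of it in expectation
```

With 10% samples of both tables, only about 1% of the join output is expected to survive, which is the intuition behind the early negative result the abstract refers to.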

Preprint, 2020, arXiv

Planar Distance Oracles with Better Time-Space Tradeoffs

In a recent breakthrough, Charalampopoulos, Gawrychowski, Mozes, and Weimann (STOC 2019) showed that exact distance queries on planar graphs could be answered in $n^{o(1)}$ time by a data structure occupying $n^{1+o(1)}$ space, i.e., up to $o(1)$ terms, optimal exponents in time (0) and space (1) can be achieved simultaneously. Their distance query algorithm is recursive: it makes successive calls to a point-location algorithm for planar Voronoi diagrams, which involves many recursive distance queries. The depth of this recursion is non-constant and the branching factor logarithmic, leading to $(\log n)^{ω(1)} = n^{o(1)}$ query times. In this paper we present a new way to do point-location in planar Voronoi diagrams, which leads to a new exact distance oracle. At the two extremes of our space-time tradeoff curve we can achieve either $n^{1+o(1)}$ space and $\log^{2+o(1)}n$ query time, or $n\log^{2+o(1)}n$ space and $n^{o(1)}$ query time. All previous oracles with $\tilde{O}(1)$ query time occupy space $n^{1+Ω(1)}$, and all previous oracles with space $\tilde{O}(n)$ answer queries in $n^{Ω(1)}$ time.

Preprint, 2020, arXiv

The Communication Complexity of Set Intersection and Multiple Equality Testing

In this paper we explore fundamental problems in randomized communication complexity such as computing Set Intersection on sets of size $k$ and Equality Testing between vectors of length $k$. Sağlam and Tardos and Brody et al. showed that for these types of problems, one can achieve optimal communication volume of $O(k)$ bits, with a randomized protocol that takes $O(\log^* k)$ rounds. Aside from rounds and communication volume, there is a third parameter of interest, namely the error probability $p_{\mathrm{err}}$. It is straightforward to show that protocols for Set Intersection or Equality Testing need to send $Ω(k + \log p_{\mathrm{err}}^{-1})$ bits. Is it possible to simultaneously achieve optimality in all three parameters, namely $O(k + \log p_{\mathrm{err}}^{-1})$ communication and $O(\log^* k)$ rounds? In this paper we prove that there is no universally optimal algorithm, and complement the existing round-communication tradeoffs with a new tradeoff between rounds, communication, and probability of error. In particular: 1. Any protocol for solving Multiple Equality Testing in $r$ rounds with failure probability $2^{-E}$ has communication volume $Ω(Ek^{1/r})$. 2. There exists a protocol for solving Multiple Equality Testing in $r + \log^*(k/E)$ rounds with $O(k + rEk^{1/r})$ communication, thereby essentially matching our lower bound and that of Sağlam and Tardos. Our original motivation for considering $p_{\mathrm{err}}$ as an independent parameter came from the problem of enumerating triangles in distributed ($\textsf{CONGEST}$) networks having maximum degree $Δ$. We prove that this problem can be solved in $O(Δ/\log n + \log\log Δ)$ time with high probability $1-1/\operatorname{poly}(n)$.

Preprint, 2020, arXiv

The Energy Complexity of BFS in Radio Networks

We consider a model of energy complexity in Radio Networks in which transmitting or listening on the channel costs one unit of energy and computation is free. This simplified model captures key aspects of battery-powered sensors: that battery life is most influenced by transceiver usage, and that at low transmission powers, the actual costs of transmitting and listening are very similar. The energy complexity of tasks in single-hop networks is well understood. Recent work of Chang et al. considered energy complexity in multi-hop networks and showed that $\mathsf{Broadcast}$ admits an energy-efficient protocol, by which we mean each of the $n$ nodes in the network spends $O(\text{polylog}(n))$ energy. This work left open the strange possibility that all natural problems in multi-hop networks might admit such an energy-efficient solution. In this paper we prove that the landscape of energy complexity is rich enough to support a multitude of problem complexities. Whereas $\mathsf{Broadcast}$ can be solved by an energy-efficient protocol, exact computation of $\mathsf{Diameter}$ cannot, requiring $Ω(n)$ energy. Our main result is that $\mathsf{Breadth First Search}$ has sub-polynomial energy complexity at most $2^{O(\sqrt{\log n\log\log n})}=n^{o(1)}$; whether it admits an efficient $O(\text{polylog}(n))$-energy protocol is an open problem. Our main algorithm involves recursively solving a generalized BFS problem on a cluster graph introduced by Miller, Peng, and Xu. In this application, we make crucial use of a close relationship between distances in this cluster graph, and distances in the original network. This relationship is new and may be of independent interest.