Researcher profile

Will Rosenbaum

Will Rosenbaum contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
8works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

8 published item(s)

preprint2022arXiv

Bias Reduction for Sum Estimation

In classical statistics and distribution testing, it is often assumed that elements can be sampled from some distribution $P$, and that when an element $x$ is sampled, the probability $P$ of sampling $x$ is also known. Recent work in distribution testing has shown that many algorithms are robust in the sense that they still produce correct output if the elements are drawn from any distribution $Q$ that is sufficiently close to $P$. This phenomenon raises interesting questions: under what conditions is a "noisy" distribution $Q$ sufficient, and what is the algorithmic cost of coping with this noise? We investigate these questions for the problem of estimating the sum of a multiset of $N$ real values $x_1, \ldots, x_N$. This problem is well-studied in the statistical literature in the case $P = Q$, where the Hansen-Hurwitz estimator is frequently used. We assume that for some known distribution $P$, values are sampled from a distribution $Q$ that is pointwise close to $P$. For every positive integer $k$ we define an estimator $ζ_k$ for $μ= \sum_i x_i$ whose bias is proportional to $γ^k$ (where our $ζ_1$ reduces to the classical Hansen-Hurwitz estimator). As a special case, we show that if $Q$ is pointwise $γ$-close to uniform and all $x_i \in \{0, 1\}$, for any $ε> 0$, we can estimate $μ$ to within additive error $εN$ using $m = Θ({N^{1-\frac{1}{k}} / ε^{2/k}})$ samples, where $k = \left\lceil (\log ε)/(\log γ)\right\rceil$. We show that this sample complexity is essentially optimal. Our bounds show that the sample complexity need not vary uniformly with the desired error parameter $ε$: for some values of $ε$, perturbations in its value have no asymptotic effect on the sample complexity, while for other values, any decrease in its value results in an asymptotically larger sample complexity.

preprint2022arXiv

Packet Forwarding with a Locally Bursty Adversary

We consider packet forwarding in the adversarial queueing theory (AQT) model introduced by Borodin et al. We introduce a refinement of the AQT $(ρ, σ)$-bounded adversary, which we call a \emph{locally bursty adversary} (LBA) that parameterizes injection patterns jointly by edge utilization and packet origin. For constant ($O(1)$) parameters, the LBA model is strictly more permissive than the $(ρ, σ)$ model. For example, there are injection patterns in the LBA model with constant parameters that can only be realized as $(ρ, σ)$-bounded injection patterns with $ρ+ σ= Ω(n)$ (where $n$ is the network size). We show that the LBA model (unlike the $(ρ, σ)$ model) is closed under packet bundling and discretization operations. Thus, the LBA model allows one to reduce the study of general (uniform) capacity networks and inhomogenous packet sizes to unit capacity networks with homogeneous packets. On the algorithmic side, we focus on information gathering networks -- i.e., networks in which all packets share a common destination, and the union of packet routes forms a tree. We show that the Odd-Even Downhill (OED) forwarding protocol described independently by Dobrev et al.\ and Patt-Shamir and Rosenbaum achieves buffer space usage of $O(\log n)$ against all LBAs with constant parameters. OED is a local protocol, but we show that the upper bound is tight even when compared to centralized protocols. Our lower bound for the LBA model is in contrast to the $(ρ, σ)$-model, where centralized protocols can achieve worst-case buffer space usage $O(1)$ for $ρ, σ= O(1)$, while the $O(\log n)$ upper bound for OED is optimal only for local protocols.

preprint2022arXiv

Training-Time Attacks against k-Nearest Neighbors

Nearest neighbor-based methods are commonly used for classification tasks and as subroutines of other data-analysis methods. An attacker with the capability of inserting their own data points into the training set can manipulate the inferred nearest neighbor structure. We distill this goal to the task of performing a training-set data insertion attack against $k$-Nearest Neighbor classification ($k$NN). We prove that computing an optimal training-time (a.k.a. poisoning) attack against $k$NN classification is NP-Hard, even when $k = 1$ and the attacker can insert only a single data point. We provide an anytime algorithm to perform such an attack, and a greedy algorithm for general $k$ and attacker budget. We provide theoretical bounds and empirically demonstrate the effectiveness and practicality of our methods on synthetic and real-world datasets. Empirically, we find that $k$NN is vulnerable in practice and that dimensionality reduction is an effective defense. We conclude with a discussion of open problems illuminated by our analysis.

preprint2021arXiv

Stable Matchings with Restricted Preferences: Structure and Complexity

It is well known that every stable matching instance $I$ has a rotation poset $R(I)$ that can be computed efficiently and the downsets of $R(I)$ are in one-to-one correspondence with the stable matchings of $I$. Furthermore, for every poset $P$, an instance $I(P)$ can be constructed efficiently so that the rotation poset of $I(P)$ is isomorphic to $P$. In this case, we say that $I(P)$ realizes $P$. Many researchers exploit the rotation poset of an instance to develop fast algorithms or to establish the hardness of stable matching problems. In order to gain a parameterized understanding of the complexity of sampling stable matchings, Bhatnagar et al. [SODA 2008] introduced stable matching instances whose preference lists are restricted but nevertheless model situations that arise in practice. In this paper, we study four such parameterized restrictions; our goal is to characterize the rotation posets that arise from these models: $k$-bounded, $k$-attribute, $(k_1, k_2)$-list, $k$-range. We prove that there is a constant $k$ so that every rotation poset is realized by some instance in the first three models for some fixed constant $k$. We describe efficient algorithms for constructing such instances given the Hasse diagram of a poset. As a consequence, the fundamental problem of counting stable matchings remains $\#$BIS-complete even for these restricted instances. For $k$-range preferences, we show that a poset $P$ is realizable if and only if the Hasse diagram of $P$ has pathwidth bounded by functions of $k$. Using this characterization, we show that the following problems are fixed parameter tractable when parametrized by the range of the instance: exactly counting and uniformly sampling stable matchings, finding median, sex-equal, and balanced stable matchings.

preprint2020arXiv

PALS: Plesiochronous and Locally Synchronous Systems

Consider an arbitrary network of communicating modules on a chip, each requiring a local signal telling it when to execute a computational step. There are three common solutions to generating such a local clock signal: (i) by deriving it from a single, central clock source, (ii) by local, free-running oscillators, or (iii) by handshaking between neighboring modules. Conceptually, each of these solutions is the result of a perceived dichotomy in which (sub)systems are either clocked or fully asynchronous, suggesting that the designer's choice is limited to deciding where to draw the line between synchronous and asynchronous design. In contrast, we take the view that the better question to ask is how synchronous the system can and should be. Based on a distributed clock synchronization algorithm, we present a novel design providing modules with local clocks whose frequency bounds are almost as good as those of corresponding free-running oscillators, yet neighboring modules are guaranteed to have a phase offset substantially smaller than one clock cycle. Concretely, parameters obtained from a 15nm ASIC implementation running at 2GHz yield mathematical worst-case bounds of 30ps on phase offset for a 32x32 node grid network.

preprint2020arXiv

Seeing Far vs. Seeing Wide: Volume Complexity of Local Graph Problems

Consider a graph problem that is locally checkable but not locally solvable: given a solution we can check that it is feasible by verifying all constant-radius neighborhoods, but to find a solution each node needs to explore the input graph at least up to distance $Ω(\log n)$ in order to produce its output. We consider the complexity of such problems from the perspective of volume: how large a subgraph does a node need to see in order to produce its output. We study locally checkable graph problems on bounded-degree graphs. We give a number of constructions that exhibit tradeoffs between deterministic distance, randomized distance, deterministic volume, and randomized volume: - If the deterministic distance is linear, it is also known that randomized distance is near-linear. In contrast, we show that there are problems with linear deterministic volume but only logarithmic randomized volume. - We prove a volume hierarchy theorem for randomized complexity: among problems with linear deterministic volume complexity, there are infinitely many distinct randomized volume complexity classes between $Ω(\log n)$ and $O(n)$. This hierarchy persists even when restricting to problems whose randomized and deterministic distance complexities are $Θ(\log n)$. - Similar hierarchies exist for polynomial distance complexities: for any $k, \ell \in N$ with $k \leq \ell$, there are problems whose randomized and deterministic distance complexities are $Θ(n^{1/\ell})$, randomized volume complexities are $Θ(n^{1/k})$, and whose deterministic volume complexities are $Θ(n)$. Additionally, we consider connections between our volume model and massively parallel computation (MPC). We give a general simulation argument that any volume-efficient algorithm can be transformed into a space-efficient MPC algorithm.

preprint2020arXiv

Simple Counting and Sampling Algorithms for Graphs with Bounded Pathwidth

In this paper, we consider the problem of counting and sampling structures in graphs. We define a class of "edge universal labeling problems"---which include proper $k$-colorings, independent sets, and downsets---and describe simple algorithms for counting and uniformly sampling valid labelings of graphs, assuming a path decomposition is given. Thus, we show that several well-studied counting and sampling problems are fixed parameter tractable (FPT) when parameterized by the pathwidth of the input graph. We discuss connections to counting and sampling problems for distributive lattices and, in particular, we give a new FPT algorithm for exactly counting and uniformly sampling stable matchings.