Researcher profile

Michael Mitzenmacher

Michael Mitzenmacher contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
14works
0followers
10topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

14 published item(s)

preprint2026arXiv

Quantizing With Randomized Hadamard Transforms: Efficient Heuristic Now Proven

Uniform random rotations (URRs) are a common preprocessing step in modern quantization approaches used for gradient compression, inference acceleration, KV-cache compression, model weight quantization, and approximate nearest-neighbor search in vector databases. In practice, URRs are often replaced by randomized Hadamard transforms (RHTs), which preserve orthogonality while admitting fast implementations. The remaining issue is the performance for worst-case inputs. With a URR, each coordinate is individually distributed as a shifted beta distribution, which converges to a Gaussian distribution in high dimensions. Generally, one RHT is not suitable in the worst case, as individual coordinates can be far from these distributions. We show that after composing two RHTs on any $d$-sized input vector, the marginal distribution of every fixed coordinate of the normalized rotated vector is within $O(d^{-1/2})$ of a standard Gaussian both in Kolmogorov distance and in $1$-Wasserstein distance. We then plug these bounds into the analyses of modern compression schemes, namely DRIVE and QUIC-FL, and show that two RHTs achieve performance that asymptotically matches URRs. However, we show that two RHTs may not be sufficient for Vector Quantization (VQ), which often requires weak correlation across fixed-size blocks of coordinates (as opposed to only marginal distribution convergence for single coordinates). We prove that a composition of three RHTs leads to decaying coordinate covariance. This ensures that any fixed, bounded, multi-dimensional VQ codebook optimized for URRs has the same expected error when using three RHTs, up to an additive term that vanishes with the dimension. Finally, because practical inputs are rarely adversarial, we propose a linear-time ${O}(d)$ check on the input's moments to dynamically adapt the number of RHTs used at runtime to improve performance.

preprint2026arXiv

The Mixed Birth-death/death-Birth Moran Process

We study evolutionary dynamics on graphs in which each step consists of one birth and one death, also known as the Moran processes. There are two types of individuals: residents with fitness $1$ and mutants with fitness $r$. Two standard update rules are used in the literature. In Birth-death (Bd), a vertex is chosen to reproduce proportional to fitness, and one of its neighbors is selected uniformly at random to be replaced by the offspring. In death-Birth (dB), a vertex is chosen uniformly to die, and then one of its neighbors is chosen, proportional to fitness, to place an offspring into the vacancy. We formalize and study a unified model, the $λ$-mixed Moran process, in which each step is independently a Bd step with probability $λ\in [0,1]$ and a dB step otherwise. We analyze this mixed process for undirected, connected graphs. As an interesting special case, we show at $λ=1/2$, for any graph that the fixation probability when $r=1$ with a single mutant initially on the graph is exactly $1/n$, and also at $λ=1/2$ that the absorption time for any $r$ is $O_r(n^4)$. We also show results for graphs that are "almost regular," in a manner defined in the paper. We use this to show that for suitable random graphs from $G \sim G(n,p)$ and fixed $r>1$, with high probability over the choice of graph, the absorption time is $O_r(n^4)$, the fixation probability is $Ω_r(n^{-2})$, and we can approximate the fixation probability in polynomial time. Another special case is when the graph has only two distinct degree values $\{d_1, d_2\}$ with $d_1 \leq d_2$. For those graphs, we give exact formulas for fixation probabilities when $r = 1$ and any $λ$, and establish an absorption time of $O_r(n^4 α^4)$ for all $λ$, where $α= d_2 / d_1$. We also provide explicit formulas for the star and cycle under any $r$ or $λ$.

preprint2023arXiv

Analyzing Generalized Pólya Urn Models using Martingales, with an Application to Viral Evolution

The randomized play-the-winner (RPW) model is a generalized Pólya Urn process with broad applications ranging from clinical trials to molecular evolution. We derive an exact expression for the variance of the RPW model by transforming the Pólya Urn process into a martingale, correcting an earlier result of Matthews and Rosenberger (1997). We then use this result to approximate the full probability mass function of the RPW model for certain parameter values relevant to genetic applications. Finally, we fit our model to genomic sequencing data of SARS-CoV-2, demonstrating a novel method of estimating the viral mutation rate that delivers comparable results to existing scientific literature.

preprint2022arXiv

EDEN: Communication-Efficient and Robust Distributed Mean Estimation for Federated Learning

Distributed Mean Estimation (DME) is a central building block in federated learning, where clients send local gradients to a parameter server for averaging and updating the model. Due to communication constraints, clients often use lossy compression techniques to compress the gradients, resulting in estimation inaccuracies. DME is more challenging when clients have diverse network conditions, such as constrained communication budgets and packet losses. In such settings, DME techniques often incur a significant increase in the estimation error leading to degraded learning performance. In this work, we propose a robust DME technique named EDEN that naturally handles heterogeneous communication budgets and packet losses. We derive appealing theoretical guarantees for EDEN and evaluate it empirically. Our results demonstrate that EDEN consistently improves over state-of-the-art DME techniques.

preprint2022arXiv

Incentive Compatible Queues Without Money

For job scheduling systems, where jobs require some amount of processing and then leave the system, it is natural for each user to provide an estimate of their job's time requirement in order to aid the scheduler. However, if there is no incentive mechanism for truthfulness, each user will be motivated to provide estimates that give their job precedence in the schedule, so that the job completes as early as possible. We examine how to make such scheduling systems incentive compatible, without using monetary charges, under a natural queueing theory framework. In our setup, each user has an estimate of their job's running time, but it is possible for this estimate to be incorrect. We examine scheduling policies where if a job exceeds its estimate, it is with some probability "punished" and re-scheduled after other jobs, to disincentivize underestimates of job times. However, because user estimates may be incorrect (without any malicious intent), excessive punishment may incentivize users to overestimate their job times, which leads to less efficient scheduling. We describe two natural scheduling policies, BlindTrust and MeasuredTrust. We show that, for both of these policies, given the parameters of the system, we can efficiently determine the set of punishment probabilities that are incentive compatible, in that users are incentivized to provide their actual estimate of the job time. Moreover, we prove for MeasuredTrust that in the limit as estimates converge to perfect accuracy, the range of punishment probabilities that are incentive compatible converges to $[0,1]$. Our formalism establishes a framework for studying further queue-based scheduling problems where job time estimates from users are utilized, and the system needs to incentivize truthful reporting of estimates.

preprint2022arXiv

Proteus: A Self-Designing Range Filter

We introduce Proteus, a novel self-designing approximate range filter, which configures itself based on sampled data in order to optimize its false positive rate (FPR) for a given space requirement. Proteus unifies the probabilistic and deterministic design spaces of state-of-the-art range filters to achieve robust performance across a larger variety of use cases. At the core of Proteus lies our Contextual Prefix FPR (CPFPR) model - a formal framework for the FPR of prefix-based filters across their design spaces. We empirically demonstrate the accuracy of our model and Proteus' ability to optimize over both synthetic workloads and real-world datasets. We further evaluate Proteus in RocksDB and show that it is able to improve end-to-end performance by as much as 5.3x over more brittle state-of-the-art methods such as SuRF and Rosetta. Our experiments also indicate that the cost of modeling is not significant compared to the end-to-end performance gains and that Proteus is robust to workload shifts.

preprint2022arXiv

The Supermarket Model with Known and Predicted Service Times

The supermarket model refers to a system with a large number of queues, where new customers choose d queues at random and join the one with the fewest customers. This model demonstrates the power of even small amounts of choice, as compared to simply joining a queue chosen uniformly at random, for load balancing systems. In this work we perform simulation-based studies to consider variations where service times for a customer are predicted, as might be done in modern settings using machine learning techniques or related mechanisms. Our primary takeaway is that using even seemingly weak predictions of service times can yield significant benefits over blind First In First Out queueing in this context. However, some care must be taken when using predicted service time information to both choose a queue and order elements for service within a queue; while in many cases using the information for both choosing and ordering is beneficial, in many of our simulation settings we find that simply using the number of jobs to choose a queue is better when using predicted service times to order jobs in a queue. In our simulations, we evaluate both synthetic and real-world workloads--in the latter, service times are predicted by machine learning. Our results provide practical guidance for the design of real-world systems; moreover, we leave many natural theoretical open questions for future work, validating their relevance to real-world situations.

preprint2022arXiv

Uniform Bounds for Scheduling with Job Size Estimates

We consider the problem of scheduling to minimize mean response time in M/G/1 queues where only estimated job sizes (processing times) are known to the scheduler, where a job of true size $s$ has estimated size in the interval $[βs, αs]$ for some $α\geq β> 0$. We evaluate each scheduling policy by its approximation ratio, which we define to be the ratio between its mean response time and that of Shortest Remaining Processing Time (SRPT), the optimal policy when true sizes are known. Our question: is there a scheduling policy that (a) has approximation ratio near 1 when $α$ and $β$ are near 1, (b) has approximation ratio bounded by some function of $α$ and $β$ even when they are far from 1, and (c) can be implemented without knowledge of $α$ and $β$? We first show that naively running SRPT using estimated sizes in place of true sizes is not such a policy: its approximation ratio can be arbitrarily large for any fixed $β< 1$. We then provide a simple variant of SRPT for estimated sizes that satisfies criteria (a), (b), and (c). In particular, we prove its approximation ratio approaches 1 uniformly as $α$ and $β$ approach 1. This is the first result showing this type of convergence for M/G/1 scheduling. We also study the Preemptive Shortest Job First (PSJF) policy, a cousin of SRPT. We show that, unlike SRPT, naively running PSJF using estimated sizes in place of true sizes satisfies criteria (b) and (c), as well as a weaker version of (a).

preprint2021arXiv

Dynamic Longest Increasing Subsequence and the Erdös-Szekeres Partitioning Problem

In this paper, we provide new approximation algorithms for dynamic variations of the longest increasing subsequence (\textsf{LIS}) problem, and the complementary distance to monotonicity (\textsf{DTM}) problem. In this setting, operations of the following form arrive sequentially: (i) add an element, (ii) remove an element, or (iii) substitute an element for another. At every point in time, the algorithm has an approximation to the longest increasing subsequence (or distance to monotonicity). We present a $(1+ε)$-approximation algorithm for \textsf{DTM} with polylogarithmic worst-case update time and a constant factor approximation algorithm for \textsf{LIS} with worst-case update time $\tilde O(n^ε)$ for any constant $ε> 0$.% $n$ in the runtime denotes the size of the array at the time the operation arrives. Our dynamic algorithm for \textsf{LIS} leads to an almost optimal algorithm for the Erdös-Szekeres partitioning problem. Erdös-Szekeres partitioning problem was introduced by Erdös and Szekeres in 1935 and was known to be solvable in time $O(n^{1.5}\log n)$. Subsequent work improve the runtime to $O(n^{1.5})$ only in 1998. Our dynamic \textsf{LIS} algorithm leads to a solution for Erdös-Szekeres partitioning problem with runtime $\tilde O_ε(n^{1+ε})$ for any constant $ε> 0$.

preprint2021arXiv

SALSA: Self-Adjusting Lean Streaming Analytics

Counters are the fundamental building block of many data sketching schemes, which hash items to a small number of counters and account for collisions to provide good approximations for frequencies and other measures. Most existing methods rely on fixed-size counters, which may be wasteful in terms of space, as counters must be large enough to eliminate any risk of overflow. Instead, some solutions use small, fixed-size counters that may overflow into secondary structures. This paper takes a different approach. We propose a simple and general method called SALSA for dynamic re-sizing of counters and show its effectiveness. SALSA starts with small counters, and overflowing counters simply merge with their neighbors. SALSA can thereby allow more counters for a given space, expanding them as necessary to represent large numbers. Our evaluation demonstrates that, at the cost of a small overhead for its merging logic, SALSA significantly improves the accuracy of popular schemes (such as Count-Min Sketch and Count Sketch) over a variety of tasks. Our code is released as open-source [1].

preprint2020arXiv

Faster and More Accurate Measurement through Additive-Error Counters

Counters are a fundamental building block for networking applications such as load balancing, traffic engineering, and intrusion detection, which require estimating flow sizes and identifying heavy hitter flows. Existing works suggest replacing counters with shorter multiplicative error \emph{estimators} that improve the accuracy by fitting more of them within a given space. However, such estimators impose a computational overhead that degrades the measurement throughput. Instead, we propose \emph{additive} error estimators, which are simpler, faster, and more accurate when used for network measurement. Our solution is rigorously analyzed and empirically evaluated against several other measurement algorithms on real Internet traces. For a given error target, we improve the speed of the uncompressed solutions by $5\times$-$30\times$, and the space by up to $4\times$. Compared with existing state-of-the-art estimators, our solution is $ 9\times$-$35\times$ faster while being considerably more accurate.

preprint2020arXiv

PINT: Probabilistic In-band Network Telemetry

Commodity network devices support adding in-band telemetry measurements into data packets, enabling a wide range of applications, including network troubleshooting, congestion control, and path tracing. However, including such information on packets adds significant overhead that impacts both flow completion times and application-level performance. We introduce PINT, an in-band telemetry framework that bounds the amount of information added to each packet. PINT encodes the requested data on multiple packets, allowing per-packet overhead limits that can be as low as one bit. We analyze PINT and prove performance bounds, including cases when multiple queries are running simultaneously. PINT is implemented in P4 and can be deployed on network devices. Using real topologies and traffic characteristics, we show that PINT concurrently enables applications such as congestion control, path tracing, and computing tail latencies, using only sixteen bits per packet, with performance comparable to the state of the art.

preprint2020arXiv

Queues with Small Advice

Motivated by recent work on scheduling with predicted job sizes, we consider the performance of scheduling algorithms with minimal advice, namely a single bit. Besides demonstrating the power of very limited advice, such schemes are quite natural. In the prediction setting, one bit of advice can be used to model a simple prediction as to whether a job is &#34;large&#34; or &#34;small&#34;; that is, whether a job is above or below a given threshold. Further, one-bit advice schemes can correspond to mechanisms that tell whether to put a job at the front or the back for the queue, a limitation which may be useful in many implementation settings. Finally, queues with a single bit of advice have a simple enough state that they can be analyzed in the limiting mean-field analysis framework for the power of two choices. Our work follows in the path of recent work by showing that even small amounts of even possibly inaccurate information can greatly improve scheduling performance.