Source author record

Rudy Zhou

Rudy Zhou appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms

Catalog footprint

What is connected

5works

1topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Configuration Balancing for Stochastic Requests

The configuration balancing problem with stochastic requests generalizes many well-studied resource allocation problems such as load balancing and virtual circuit routing. In it, we have $m$ resources and $n$ requests. Each request has multiple possible configurations, each of which increases the load of each resource by some amount. The goal is to select one configuration for each request to minimize the makespan: the load of the most-loaded resource. In our work, we focus on a stochastic setting, where we only know the distribution for how each configuration increases the resource loads, learning the realized value only after a configuration is chosen. We develop both offline and online algorithms for configuration balancing with stochastic requests. When the requests are known offline, we give a non-adaptive policy for configuration balancing with stochastic requests that $O(\frac{\log m}{\log \log m})$-approximates the optimal adaptive policy. In particular, this closes the adaptivity gap for this problem as there is an asymptotically matching lower bound even for the very special case of load balancing on identical machines. When requests arrive online in a list, we give a non-adaptive policy that is $O(\log m)$ competitive. Again, this result is asymptotically tight due to information-theoretic lower bounds for very special cases (e.g., for load balancing on unrelated machines). Finally, we show how to leverage adaptivity in the special case of load balancing on related machines to obtain a constant-factor approximation offline and an $O(\log \log m)$-approximation online. A crucial technical ingredient in all of our results is a new structural characterization of the optimal adaptive policy that allows us to limit the correlations between its decisions.

preprint2022arXiv

Minimizing Completion Times for Stochastic Jobs via Batched Free Times

We study the classic problem of minimizing the expected total completion time of jobs on $m$ identical machines in the setting where the sizes of the jobs are stochastic. Specifically, the size of each job is a random variable whose distribution is known to the algorithm, but whose realization is revealed only after the job is scheduled. While minimizing the total completion time is easy in the deterministic setting, the stochastic problem has long been notorious: all known algorithms have approximation ratios that either depend on the variances, or depend linearly on the number of machines. We give an $\widetilde{O}(\sqrt{m})$-approximation for stochastic jobs which have Bernoulli processing times. This is the first approximation for this problem that is both independent of the variance in the job sizes, and is sublinear in the number of machines $m$. Our algorithm is based on a novel reduction from minimizing the total completion time to a natural makespan-like objective, which we call the weighted free time. We hope this free time objective will be useful in further improvements to this problem, as well as other stochastic scheduling problems.

preprint2022arXiv

Online Demand Scheduling with Failovers

Motivated by cloud computing applications, we study the problem of how to optimally deploy new hardware subject to both power and robustness constraints. To model the situation observed in large-scale data centers, we introduce the Online Demand Scheduling with Failover problem. There are $m$ identical devices with capacity constraints. Demands come one-by-one and, to be robust against a device failure, need to be assigned to a pair of devices. When a device fails (in a failover scenario), each demand assigned to it is rerouted to its paired device (which may now run at increased capacity). The goal is to assign demands to the devices to maximize the total utilization subject to both the normal capacity constraints as well as these novel failover constraints. These latter constraints introduce new decision tradeoffs not present in classic assignment problems such as the Multiple Knapsack problem and AdWords. In the worst-case model, we design a deterministic $\approx \frac{1}{2}$-competitive algorithm, and show this is essentially tight. To circumvent this constant-factor loss, which in the context of big cloud providers represents substantial capital losses, we consider the stochastic arrival model, where all demands come i.i.d. from an unknown distribution. In this model we design an algorithm that achieves a sub-linear additive regret (i.e. as OPT or $m$ increases, the multiplicative competitive ratio goes to $1$). This requires a combination of different techniques, including a configuration LP with a non-trivial post-processing step and an online monotone matching procedure introduced by Rhee and Talagrand.

preprint2020arXiv

Fast Noise Removal for $k$-Means Clustering

This paper considers $k$-means clustering in the presence of noise. It is known that $k$-means clustering is highly sensitive to noise, and thus noise should be removed to obtain a quality solution. A popular formulation of this problem is called $k$-means clustering with outliers. The goal of $k$-means clustering with outliers is to discard up to a specified number $z$ of points as noise/outliers and then find a $k$-means solution on the remaining data. The problem has received significant attention, yet current algorithms with theoretical guarantees suffer from either high running time or inherent loss in the solution quality. The main contribution of this paper is two-fold. Firstly, we develop a simple greedy algorithm that has provably strong worst case guarantees. The greedy algorithm adds a simple preprocessing step to remove noise, which can be combined with any $k$-means clustering algorithm. This algorithm gives the first pseudo-approximation-preserving reduction from $k$-means with outliers to $k$-means without outliers. Secondly, we show how to construct a coreset of size $O(k \log n)$. When combined with our greedy algorithm, we obtain a scalable, near linear time algorithm. The theoretical contributions are verified experimentally by demonstrating that the algorithm quickly removes noise and obtains a high-quality clustering.

preprint2020arXiv

Structural Iterative Rounding for Generalized $k$-Median Problems

This paper considers approximation algorithms for generalized $k$-median problems. This class of problems can be informally described as $k$-median with a constant number of extra constraints, and includes $k$-median with outliers, and knapsack median. Our first contribution is a pseudo-approximation algorithm for generalized $k$-median that outputs a $6.387$-approximate solution, with a constant number of fractional variables. The algorithm builds on the iterative rounding framework introduced by Krishnaswamy, Li, and Sandeep for $k$-median with outliers. The main technical innovation is allowing richer constraint sets in the iterative rounding and taking advantage of the structure of the resulting extreme points. Using our pseudo-approximation algorithm, we give improved approximation algorithms for $k$-median with outliers and knapsack median. This involves combining our pseudo-approximation with pre- and post-processing steps to round a constant number of fractional variables at a small increase in cost. Our algorithms achieve approximation ratios $6.994 + ε$ and $6.387 + ε$ for $k$-median with outliers and knapsack median, respectively. These improve on the best-known approximation ratio $7.081 + ε$ for both problems \cite{DBLP:conf/stoc/KrishnaswamyLS18}.

Rudy Zhou

What is connected

Connect this record

See the researcher in context

Building this map preview

5 published item(s)

Configuration Balancing for Stochastic Requests

Minimizing Completion Times for Stochastic Jobs via Batched Free Times

Online Demand Scheduling with Failovers

Fast Noise Removal for $k$-Means Clustering

Structural Iterative Rounding for Generalized $k$-Median Problems