Source author record

Mor Harchol-Balter

Mor Harchol-Balter appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Performance math.PR

Catalog footprint

What is connected

6works

2topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

LookAhead: The Optimal Non-decreasing Index Policy for a Time-Varying Holding Cost problem

In practice, the cost of delaying a job can grow as the job waits. Such behavior is modeled by the Time-Varying Holding Cost (TVHC) problem, where each job's instantaneous holding cost increases with its current age (a job's age is the time since it arrived). The goal of the TVHC problem is to find a scheduling policy that minimizes the time-average total holding cost across all jobs. However, no optimality results are known for the TVHC problem outside of the asymptotic regime. In this paper, we study a simple yet still challenging special case: A two-class M/M/1 queue in which class 1 jobs incur a non-decreasing, time-varying holding cost and class 2 jobs incur a constant holding cost. Our main contribution is deriving the first optimal (non-decreasing) index policy for this special case of the TVHC problem. Our optimal policy, called LookAhead, stems from the following idea: Rather than considering each job's current holding cost when making scheduling decisions, we should look at their cost some $X$ time into the future, where this $X$ is intuitively called the ``lookahead amount." This paper derives that optimal lookahead amount.

preprint2022arXiv

WCFS: A new framework for analyzing multiserver systems

Multiserver queueing systems are found at the core of a wide variety of practical systems. Many important multiserver models have a previously-unexplained similarity: identical mean response time behavior is empirically observed in the heavy traffic limit. We explain this similarity for the first time. We do so by introducing the work-conserving finite-skip (WCFS) framework, which encompasses a broad class of important models. This class includes the heterogeneous M/G/k, the limited processor sharing policy for the M/G/1, the threshold parallelism model, and the multiserver-job model under a novel scheduling algorithm. We prove that for all WCFS models, scaled mean response time $E[T](1-ρ)$ converges to the same value, $E[S^2]/(2E[S])$, in the heavy-traffic limit, which is also the heavy traffic limit for the M/G/1/FCFS. Moreover, we prove additively tight bounds on mean response time for the WCFS class, which hold for all load $ρ$. For each of the four models mentioned above, our bounds are the first known bounds on mean response time.

preprint2021arXiv

Zero Queueing for Multi-Server Jobs

Cloud computing today is dominated by multi-server jobs. These are jobs that request multiple servers simultaneously and hold onto all of these servers for the duration of the job. Multi-server jobs add a lot of complexity to the traditional one-job-per-server model: an arrival might not "fit" into the available servers and might have to queue, blocking later arrivals and leaving servers idle. From a queueing perspective, almost nothing is understood about multi-server job queueing systems; even understanding the exact stability region is a very hard problem. In this paper, we investigate a multi-server job queueing model under scaling regimes where the number of servers in the system grows. Specifically, we consider a system with multiple classes of jobs, where jobs from different classes can request different numbers of servers and have different service time distributions, and jobs are served in first-come-first-served order. The multi-server job model opens up new scaling regimes where both the number of servers that a job needs and the system load scale with the total number of servers. Within these scaling regimes, we derive the first results on stability, queueing probability, and the transient analysis of the number of jobs in the system for each class. In particular we derive sufficient conditions for zero queueing. Our analysis introduces a novel way of extracting information from the Lyapunov drift, which can be applicable to a broader scope of problems in queueing systems.

preprint2020arXiv

Optimal Resource Allocation for Elastic and Inelastic Jobs

Modern data centers are tasked with processing heterogeneous workloads consisting of various classes of jobs. These classes differ in their arrival rates, size distributions, and job parallelizability. With respect to paralellizability, some jobs are elastic, meaning they can parallelize linearly across many servers. Other jobs are inelastic, meaning they can only run on a single server. Although job classes can differ drastically, they are typically forced to share a single cluster. When sharing a cluster among heterogeneous jobs, one must decide how to allocate servers to each job at every moment in time. In this paper, we design and analyze allocation policies which aim to minimize the mean response time across jobs, where a job's response time is the time from when it arrives until it completes. We model this problem in a stochastic setting where each job may be elastic or inelastic. Job sizes are drawn from exponential distributions, but are unknown to the system. We show that, in the common case where elastic jobs are larger on average than inelastic jobs, the optimal allocation policy is Inelastic-First, giving inelastic jobs preemptive priority over elastic jobs. We obtain this result by introducing a novel sample path argument. We also show that there exist cases where Elastic-First (giving priority to elastic jobs) performs better than Inelastic-First. We then provide the first analysis of mean response time under both Elastic-First and Inelastic-First by leveraging recent techniques for solving high-dimensional Markov chains.

preprint2020arXiv

Simple Near-Optimal Scheduling for the M/G/1

We consider the problem of preemptively scheduling jobs to minimize mean response time of an M/G/1 queue. When we know each job's size, the shortest remaining processing time (SRPT) policy is optimal. Unfortunately, in many settings we do not have access to each job's size. Instead, we know only the job size distribution. In this setting the Gittins policy is known to minimize mean response time, but its complex priority structure can be computationally intractable. A much simpler alternative to Gittins is the shortest expected remaining processing time (SERPT) policy. While SERPT is a natural extension of SRPT to unknown job sizes, it is unknown whether or not SERPT is close to optimal for mean response time. We present a new variant of SERPT called monotonic SERPT (M-SERPT) which is as simple as SERPT but has provably near-optimal mean response time at all loads for any job size distribution. Specifically, we prove the mean response time ratio between M-SERPT and Gittins is at most 3 for load $ρ\leq 8/9$ and at most 5 for any load. This makes M-SERPT the only non-Gittins scheduling policy known to have a constant-factor approximation ratio for mean response time.

preprint2015arXiv

Clearing Analysis on Phases: Exact Limiting Probabilities for Skip-free, Unidirectional, Quasi-birth-death Processes

Many problems in computing, service, and manufacturing systems can be modeled via infinite repeating Markov chains with an infinite number of levels and a finite number of phases. Many such chains are quasi-birth-death processes (QBDs) with transitions that are skip-free in level, in that one can only transition between consecutive levels, and unidirectional in phase, in that one can only transition from lower-numbered phases to higher-numbered phases. We present a procedure, which we call Clearing Analysis on Phases (CAP), for determining the limiting probabilities of such Markov chains exactly. The CAP method yields the limiting probability of each state in the repeating portion of the chain as a linear combination of scalar bases raised to a power corresponding to the level of the state. The weights in these linear combinations can be determined by solving a finite system of linear equations.