Source author record

Neal E. Young

Neal E. Young appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Data Structures and Algorithms; Discrete Mathematics; Computational Complexity; Networking and Internet Architecture; Distributed, Parallel, and Cluster Computing; Information Theory; math.IT; Computational Geometry; math.CO

Catalog footprint

What is connected

40 works

9 topics

4 close collaborators

preprint2022arXiv

On Huang and Wong's Algorithm for Generalized Binary Split Trees

Huang and Wong [1984] proposed a polynomial-time dynamic-programming algorithm for computing optimal generalized binary split trees. We show that their algorithm is incorrect. Thus, it remains open whether such trees can be computed in polynomial time. Spuler [1994] proposed modifying Huang and Wong's algorithm to obtain an algorithm for a different problem: computing optimal two-way-comparison search trees. We show that the dynamic program underlying Spuler's algorithm is not valid, in that it does not satisfy the necessary optimal-substructure property and its proposed recurrence relation is incorrect. It remains unknown whether the algorithm is guaranteed to compute a correct overall solution.

preprint2021arXiv

A Simple Algorithm for Optimal Search Trees with Two-Way Comparisons

We present a simple $O(n^4)$-time algorithm for computing optimal search trees with two-way comparisons. The only previous solution to this problem, by Anderson et al., has the same running time, but is significantly more complicated and is restricted to the variant where only successful queries are allowed. Our algorithm extends directly to solve the standard full variant of the problem, which also allows unsuccessful queries and for which no polynomial-time algorithm was previously known. The correctness proof of our algorithm relies on a new structural theorem for two-way-comparison search trees.

preprint2021arXiv

On the Cost of Unsuccessful Searches in Search Trees with Two-way Comparisons

Search trees are commonly used to implement access operations to a set of stored keys. If this set is static and the probabilities of membership queries are known in advance, then one can precompute an optimal search tree, namely one that minimizes the expected access cost. For a non-key query, a search tree can determine its approximate location by returning the inter-key interval containing the query. This is in contrast to other dictionary data structures, like hash tables, that only report a failed search. We address the question "what is the additional cost of determining approximate locations for non-key queries?" We prove that for two-way comparison trees this additional cost is at most 1. Our proof is based on a novel probabilistic argument that involves converting a search tree that does not identify non-key queries into a random tree that does.

preprint2021arXiv

Optimal Search Trees with 2-Way Comparisons

In 1971, Knuth gave an $O(n^2)$-time algorithm for the classic problem of finding an optimal binary search tree. Knuth's algorithm works only for search trees based on 3-way comparisons, while most modern computers support only 2-way comparisons (e.g., $<, \le, =, \ge$, and $>$). Until this paper, the problem of finding an optimal search tree using 2-way comparisons remained open -- poly-time algorithms were known only for restricted variants. We solve the general case, giving (i) an $O(n^4)$-time algorithm and (ii) an $O(n \log n)$-time additive-3 approximation algorithm. Also, for finding optimal binary split trees, we (iii) obtain a linear speedup and (iv) prove some previous work incorrect.

preprint2020arXiv

Algorithmic approaches to selecting control clones in DNA array hybridization experiments

We study the problem of selecting control clones in DNA array hybridization experiments. The problem arises in the OFRG method for analyzing microbial communities. The OFRG method performs classification of rRNA gene clones using binary fingerprints created from a series of hybridization experiments, where each experiment consists of hybridizing a collection of arrayed clones with a single oligonucleotide probe. This experiment produces analog signals, one for each clone, which then need to be classified, that is, converted into binary values 1 and 0 that represent hybridization and non-hybridization events. In addition to the sample rRNA gene clones, the array contains a number of control clones needed to calibrate the classification procedure of the hybridization signals. These control clones must be selected with care to optimize the classification process. We formulate this as a combinatorial optimization problem called Balanced Covering. We prove that the problem is NP-hard, and we show some results on hardness of approximation. We propose approximation algorithms based on randomized rounding and we show that, with high probability, our algorithms approximate well the optimum solution. The experimental results confirm that the algorithms find high quality control clones. The algorithms have been implemented and are publicly available as part of the software package called CloneTools.

preprint2020arXiv

Distributed algorithms for covering, packing and maximum weighted matching

This paper gives poly-logarithmic-round, distributed D-approximation algorithms for covering problems with submodular cost and monotone covering constraints (Submodular-cost Covering). The approximation ratio D is the maximum number of variables in any constraint. Special cases include Covering Mixed Integer Linear Programs (CMIP), and Weighted Vertex Cover (with D=2). Via duality, the paper also gives poly-logarithmic-round, distributed D-approximation algorithms for Fractional Packing linear programs (where D is the maximum number of constraints in which any variable occurs), and for Max Weighted c-Matching in hypergraphs (where D is the maximum size of any of the hyperedges; for graphs D=2). The paper also gives parallel (RNC) 2-approximation algorithms for CMIP with two variables per constraint and Weighted Vertex Cover. The algorithms are randomized. All of the approximation ratios exactly match those of comparable centralized algorithms.

preprint2020arXiv

Incremental Medians via Online Bidding

In the k-median problem we are given sets of facilities and customers, and distances between them. For a given set F of facilities, the cost of serving a customer u is the minimum distance between u and a facility in F. The goal is to find a set F of k facilities that minimizes the sum, over all customers, of their service costs. Following Mettu and Plaxton, we study the incremental medians problem, where k is not known in advance, and the algorithm produces a nested sequence of facility sets where the kth set has size k. The algorithm is c-cost-competitive if the cost of each set is at most c times the cost of the optimum set of size k. We give improved incremental algorithms for the metric version: an 8-cost-competitive deterministic algorithm, a 2e ~ 5.44-cost-competitive randomized algorithm, a (24+epsilon)-cost-competitive, poly-time deterministic algorithm, and a (6e+epsilon ~ 16.31)-cost-competitive, poly-time randomized algorithm. The algorithm is s-size-competitive if the cost of the kth set is at most the minimum cost of any set of size k, and has size at most s k. The optimal size-competitive ratios for this problem are 4 (deterministic) and e (randomized). We present the first poly-time O(log m)-size-approximation algorithm for the offline problem and first poly-time O(log m)-size-competitive algorithm for the incremental problem. Our proofs reduce incremental medians to the following online bidding problem: faced with an unknown threshold T, an algorithm submits "bids" until it submits a bid that is at least the threshold. It pays the sum of all its bids. We prove that folklore algorithms for online bidding are optimally competitive.
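The online bidding problem at the end of this abstract can be illustrated with a doubling strategy. The sketch below is a minimal illustration and not taken from the paper: it assumes the threshold is at least 1 and that the "folklore" deterministic strategy is the usual doubling sequence 1, 2, 4, ...; the function names `doubling_bids` and `bidding_cost` are hypothetical.

```python
def doubling_bids(threshold):
    """Bid 1, 2, 4, ... and stop at the first bid that is at least `threshold`.

    Assumes threshold >= 1 (an assumption of this sketch, not of the paper).
    """
    bids = []
    bid = 1.0
    while True:
        bids.append(bid)
        if bid >= threshold:
            return bids
        bid *= 2.0


def bidding_cost(threshold):
    """Total amount paid: the sum of all submitted bids."""
    return sum(doubling_bids(threshold))


if __name__ == "__main__":
    for t in (1, 5, 100):
        print(t, doubling_bids(t), bidding_cost(t) / t)
```

Because the bids form a geometric series, the total paid stays below 4 times the threshold for any threshold of at least 1, which agrees with the deterministic ratio of 4 quoted in the abstract.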

preprint2015arXiv

Approximation Algorithms for the Joint Replenishment Problem with Deadlines

The Joint Replenishment Problem (JRP) is a fundamental optimization problem in supply-chain management, concerned with optimizing the flow of goods from a supplier to retailers. Over time, in response to demands at the retailers, the supplier ships orders, via a warehouse, to the retailers. The objective is to schedule these orders to minimize the sum of ordering costs and retailers' waiting costs. We study the approximability of JRP-D, the version of JRP with deadlines, where instead of waiting costs the retailers impose strict deadlines. We study the integrality gap of the standard linear-program (LP) relaxation, giving a lower bound of 1.207, a stronger, computer-assisted lower bound of 1.245, as well as an upper bound and approximation ratio of 1.574. The best previous upper bound and approximation ratio was 1.667; no lower bound was previously published. For the special case when all demand periods are of equal length we give an upper bound of 1.5, a lower bound of 1.2, and show APX-hardness.

preprint2015arXiv

On the Number of Iterations for Dantzig-Wolfe Optimization and Packing-Covering Approximation Algorithms

We give a lower bound on the iteration complexity of a natural class of Lagrangean-relaxation algorithms for approximately solving packing/covering linear programs. We show that, given an input with $m$ random 0/1-constraints on $n$ variables, with high probability, any such algorithm requires $Ω(ρ\log(m)/ε^2)$ iterations to compute a $(1+ε)$-approximate solution, where $ρ$ is the width of the input. The bound is tight for a range of the parameters $(m,n,ρ,ε)$. The algorithms in the class include Dantzig-Wolfe decomposition, Benders' decomposition, Lagrangean relaxation as developed by Held and Karp [1971] for lower-bounding TSP, and many others (e.g. by Plotkin, Shmoys, and Tardos [1988] and Grigoriadis and Khachiyan [1996]). To prove the bound, we use a discrepancy argument to show an analogous lower bound on the support size of $(1+ε)$-approximate mixed strategies for random two-player zero-sum 0/1-matrix games.

preprint2014arXiv

Nearly Linear-Work Algorithms for Mixed Packing/Covering and Facility-Location Linear Programs

We describe the first nearly linear-time approximation algorithms for explicitly given mixed packing/covering linear programs, and for (non-metric) fractional facility location. We also describe the first parallel algorithms requiring only near-linear total work and finishing in polylog time. The algorithms compute $(1+ε)$-approximate solutions in time (and work) $O^*(N/ε^2)$, where $N$ is the number of non-zeros in the constraint matrix. For facility location, $N$ is the number of eligible client/facility pairs.

preprint2013arXiv

A Nearly Linear-Time PTAS for Explicit Fractional Packing and Covering Linear Programs

We give an approximation algorithm for packing and covering linear programs (linear programs with non-negative coefficients). Given a constraint matrix with n non-zeros, r rows, and c columns, the algorithm computes feasible primal and dual solutions whose costs are within a factor of 1+eps of the optimal cost in time O((r+c)log(n)/eps^2 + n).

preprint2013arXiv

Approximating 1-dimensional TSP Requires Omega(n log n) Comparisons

We give a short proof that any comparison-based n^(1-epsilon)-approximation algorithm for the 1-dimensional Traveling Salesman Problem (TSP) requires Omega(n log n) comparisons.

preprint2013arXiv

First-Come-First-Served for Online Slot Allocation and Huffman Coding

Can one choose a good Huffman code on the fly, without knowing the underlying distribution? Online Slot Allocation (OSA) models this and similar problems: There are n slots, each with a known cost. There are n items. Requests for items are drawn i.i.d. from a fixed but hidden probability distribution p. After each request, if the item, i, was not previously requested, then the algorithm (knowing the slot costs and the requests so far, but not p) must place the item in some vacant slot j(i). The goal is to minimize the sum, over the items, of the probability of the item times the cost of its assigned slot. The optimal offline algorithm is trivial: put the most probable item in the cheapest slot, the second most probable item in the second cheapest slot, etc. The optimal online algorithm is First Come First Served (FCFS): put the first requested item in the cheapest slot, the second (distinct) requested item in the second cheapest slot, etc. The optimal competitive ratios for any online algorithm are 1+H(n-1) ~ ln n for general costs and 2 for concave costs. For logarithmic costs, the ratio is, asymptotically, 1: FCFS gives cost opt + O(log opt). For Huffman coding, FCFS yields an online algorithm (one that allocates codewords on demand, without knowing the underlying probability distribution) that guarantees asymptotically optimal cost: at most opt + 2 log(1+opt) + 2.
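The First Come First Served rule described above is short enough to sketch directly. This is a minimal illustration under assumptions: the names `fcfs_assign` and `expected_cost` are hypothetical, slots are identified by their index in `slot_costs`, and the request sequence never contains more distinct items than there are slots.

```python
def fcfs_assign(requests, slot_costs):
    """Place each newly requested item in the cheapest slot that is still vacant."""
    # Slot indices ordered from cheapest to most expensive.
    by_cost = sorted(range(len(slot_costs)), key=lambda j: slot_costs[j])
    assignment = {}
    for item in requests:
        if item not in assignment:
            # The number of items placed so far equals the number of used slots.
            assignment[item] = by_cost[len(assignment)]
    return assignment


def expected_cost(assignment, probs, slot_costs):
    """Sum over items of (probability of the item) * (cost of its slot)."""
    return sum(probs[i] * slot_costs[j] for i, j in assignment.items())


if __name__ == "__main__":
    costs = [3, 1, 2]
    placed = fcfs_assign(["b", "a", "b", "c"], costs)
    print(placed)
    print(expected_cost(placed, {"a": 0.5, "b": 0.25, "c": 0.25}, costs))
```

In the example, item "b" arrives first and takes the cheapest slot even though "a" is more probable, which is the kind of loss relative to the offline optimum that the competitive ratios in the abstract measure.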

preprint2013arXiv

Hamming Approximation of NP Witnesses

Given a satisfiable 3-SAT formula, how hard is it to find an assignment to the variables that has Hamming distance at most n/2 to a satisfying assignment? More generally, consider any polynomial-time verifier for any NP-complete language. A d(n)-Hamming-approximation algorithm for the verifier is one that, given any member x of the language, outputs in polynomial time a string a with Hamming distance at most d(n) to some witness w, where (x,w) is accepted by the verifier. Previous results have shown that, if P != NP, then every NP-complete language has a verifier for which there is no (n/2-n^(2/3+d))-Hamming-approximation algorithm, for various constants d > 0. Our main result is that, if P != NP, then every paddable NP-complete language has a verifier that admits no (n/2+O(sqrt(n log n)))-Hamming-approximation algorithm. That is, one cannot get even half the bits right. We also consider natural verifiers for various well-known NP-complete problems. They do have n/2-Hamming-approximation algorithms, but, if P != NP, have no (n/2-n^epsilon)-Hamming-approximation algorithms for any constant epsilon > 0. We show similar results for randomized algorithms.

preprint2013arXiv

On a Linear Program for Minimum-Weight Triangulation

Minimum-weight triangulation (MWT) is NP-hard. It has a polynomial-time constant-factor approximation algorithm, and a variety of effective polynomial-time heuristics that, for many instances, can find the exact MWT. Linear programs (LPs) for MWT are well-studied, but previously no connection was known between any LP and any approximation algorithm or heuristic for MWT. Here we show the first such connections: for an LP formulation due to Dantzig et al. (1985): (i) the integrality gap is bounded by a constant; (ii) given any instance, if the aforementioned heuristics find the MWT, then so does the LP.

preprint2012arXiv

Caching with rental cost and zapping

The file caching problem is defined as follows. Given a cache of size $k$ (a positive integer), the goal is to minimize the total retrieval cost for the given sequence of requests to files. A file $f$ has size $size(f)$ (a positive integer) and retrieval cost $cost(f)$ (a non-negative number) for bringing the file into the cache. A miss or fault occurs when the requested file is not in the cache and the file has to be retrieved into the cache by paying the retrieval cost, and some other file may have to be removed (evicted) from the cache so that the total size of the files in the cache does not exceed $k$. We study the following variants of the online file caching problem. Caching with Rental Cost (or Rental Caching): There is a rental cost $λ$ (a positive number) for each file in the cache at each time unit. The goal is to minimize the sum of the retrieval costs and the rental costs. Caching with Zapping: A file can be zapped by paying a zapping cost $N \ge 1$. Once a file is zapped, all future requests of the file don't incur any cost. The goal is to minimize the sum of the retrieval costs and the zapping costs. We study these two variants and also the variant which combines these two (rental caching with zapping). We present deterministic lower and upper bounds in the competitive-analysis framework. We study and extend the online covering algorithm from \citep{young02online} to give deterministic online algorithms. We also present randomized lower and upper bounds for some of these problems.

preprint2012arXiv

Huffman Coding with Letter Costs: A Linear-Time Approximation Scheme

We give a polynomial-time approximation scheme for the generalization of Huffman Coding in which codeword letters have non-uniform costs (as in Morse code, where the dash is twice as long as the dot). The algorithm computes a (1+epsilon)-approximate solution in time O(n + f(epsilon) log^3 n), where n is the input size.

preprint2011arXiv

Greedy D-Approximation Algorithm for Covering with Arbitrary Constraints and Submodular Cost

This paper describes a simple greedy D-approximation algorithm for any covering problem whose objective function is submodular and non-decreasing, and whose feasible region can be expressed as the intersection of arbitrary (closed upwards) covering constraints, each of which constrains at most D variables of the problem. (A simple example is Vertex Cover, with D = 2.) The algorithm generalizes previous approximation algorithms for fundamental covering problems and online paging and caching problems.

preprint2010arXiv

A Bound on the Sum of Weighted Pairwise Distances of Points Constrained to Balls

We consider the problem of choosing Euclidean points to maximize the sum of their weighted pairwise distances, when each point is constrained to a ball centered at the origin. We derive a dual minimization problem and show strong duality holds (i.e., the resulting upper bound is tight) when some locally optimal configuration of points is affinely independent. We sketch a polynomial time algorithm for finding a near-optimal set of points.

preprint2005arXiv

Approximation Algorithms for Covering/Packing Integer Programs

Given matrices A and B and vectors a, b, c and d, all with non-negative entries, we consider the problem of computing min {c.x: x in Z^n_+, Ax >= a, Bx <= b, x <= d}. We give a bicriteria-approximation algorithm that, given epsilon in (0, 1], finds a solution of cost O(ln(m)/epsilon^2) times optimal, meeting the covering constraints (Ax >= a) and multiplicity constraints (x <= d), and satisfying Bx <= (1 + epsilon)b + beta, where beta is the vector of row sums beta_i = sum_j B_ij. Here m denotes the number of rows of A. This gives an O(ln m)-approximation algorithm for CIP -- minimum-cost covering integer programs with multiplicity constraints, i.e., the special case when there are no packing constraints Bx <= b. The previous best approximation ratio has been O(ln(max_j sum_i A_ij)) since 1982. CIP contains the set cover problem as a special case, so O(ln m)-approximation is the best possible unless P=NP.

preprint2005arXiv

K-Medians, Facility Location, and the Chernoff-Wald Bound

The paper gives approximation algorithms for the k-medians and facility-location problems (both NP-hard). For k-medians, the algorithm returns a solution using at most ln(n+n/epsilon)k medians and having cost at most (1+epsilon) times the cost of the best solution that uses at most k medians. Here epsilon > 0 is an input to the algorithm. In comparison, the best previous algorithm (Jyh-Han Lin and Jeff Vitter, 1992) had a (1+1/epsilon)ln(n) term instead of the ln(n+n/epsilon) term in the performance guarantee. For facility location, the algorithm returns a solution of cost at most d+ln(n) k, provided there exists a solution of cost d+k where d is the assignment cost and k is the facility cost. In comparison, the best previous algorithm (Dorit Hochbaum, 1982) returned a solution of cost at most ln(n)(d+k). For both problems, the algorithms currently provide the best performance guarantee known for the general (non-metric) problems. The paper also introduces a new probabilistic bound (called "Chernoff-Wald bound") for bounding the expectation of the maximum of a collection of sums of random variables, when each sum contains a random number of terms. The bound is used to analyze the randomized rounding scheme that underlies the algorithms.

preprint2005arXiv

The reverse greedy algorithm for the metric k-median problem

The Reverse Greedy algorithm (RGreedy) for the k-median problem works as follows. It starts by placing facilities on all nodes. At each step, it removes a facility to minimize the resulting total distance from the customers to the remaining facilities. It stops when k facilities remain. We prove that, if the distance function is metric, then the approximation ratio of RGreedy is between Ω(log n / log log n) and O(log n).
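The three steps of RGreedy listed in this abstract translate almost line for line into code. The following is a minimal sketch, assuming every node is both a customer and a candidate facility and that distances are given as a full matrix `dist`; the names `service_cost` and `reverse_greedy` are hypothetical, and ties are broken by lowest node index, a choice the abstract does not specify.

```python
def service_cost(dist, facilities):
    """Total distance from every node to its nearest open facility."""
    return sum(min(dist[u][f] for f in facilities) for u in range(len(dist)))


def reverse_greedy(dist, k):
    """Start with a facility on every node and remove them one at a time.

    Each step removes the facility whose removal yields the smallest total
    service cost; ties go to the lowest index (an assumption of this sketch).
    """
    facilities = set(range(len(dist)))
    while len(facilities) > k:
        victim = min(
            sorted(facilities),
            key=lambda f: service_cost(dist, facilities - {f}),
        )
        facilities.remove(victim)
    return facilities


if __name__ == "__main__":
    points = [0, 1, 10, 11]  # four nodes on a line
    dist = [[abs(p - q) for q in points] for p in points]
    chosen = reverse_greedy(dist, 2)
    print(sorted(chosen), service_cost(dist, chosen))
```

This direct version recomputes the service cost for every candidate removal, so it is meant to show the rule rather than to be efficient.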

preprint2003arXiv

Rounding Algorithms for a Geometric Embedding of Minimum Multiway Cut

The multiway-cut problem is, given a weighted graph and k >= 2 terminal nodes, to find a minimum-weight set of edges whose removal separates all the terminals. The problem is NP-hard, and even NP-hard to approximate within 1+delta for some small delta > 0. Calinescu, Karloff, and Rabani (1998) gave an algorithm with performance guarantee 3/2-1/k, based on a geometric relaxation of the problem. In this paper, we give improved randomized rounding schemes for their relaxation, yielding a 12/11-approximation algorithm for k=3 and a 1.3438-approximation algorithm in general. Our approach hinges on the observation that the problem of designing a randomized rounding scheme for a geometric relaxation is itself a linear programming problem. The paper explores computational solutions to this problem, and gives a proof that for a general class of geometric relaxations, there are always randomized rounding schemes that match the integrality gap.

preprint2002arXiv

A Network-Flow Technique for Finding Low-Weight Bounded-Degree Spanning Trees

The problem considered is the following. Given a graph with edge weights satisfying the triangle inequality, and a degree bound for each vertex, compute a low-weight spanning tree such that the degree of each vertex is at most its specified bound. The problem is NP-hard (it generalizes Traveling Salesman (TSP)). This paper describes a network-flow heuristic for modifying a given tree T to meet the constraints. Choosing T to be a minimum spanning tree (MST) yields approximation algorithms with performance guarantee less than 2 for the problem on geometric graphs with L_p-norms. The paper also describes a Euclidean graph whose minimum TSP costs twice the MST, disproving a conjecture made in "Low-Degree Spanning Trees of Small Weight" (1996).

preprint2002arXiv

A New Operation on Sequences: the Boustrophedon Transform

A generalization of the Seidel-Entringer-Arnold method for calculating the alternating permutation numbers (or secant-tangent numbers) leads to a new operation on integer sequences, the Boustrophedon transform.
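The abstract does not spell out the transform, so the sketch below relies on the triangle recurrence that is commonly given for it, which should be read as an assumption: row n starts with a_n, each further entry adds the previous entry of the same row to the entry of the previous row read in the opposite direction, and b_n is the last entry of row n. The function name `boustrophedon` is hypothetical.

```python
def boustrophedon(a):
    """Boustrophedon transform of the finite sequence `a`.

    Assumed recurrence: T[n][0] = a[n], T[n][k] = T[n][k-1] + T[n-1][n-k],
    and the output term is b[n] = T[n][n].
    """
    b = []
    prev = []  # previous row of the triangle (length n when building row n)
    for n, a_n in enumerate(a):
        row = [a_n]
        for k in range(1, n + 1):
            row.append(row[k - 1] + prev[n - k])
        b.append(row[-1])
        prev = row
    return b


if __name__ == "__main__":
    # Input 1, 0, 0, ... yields the alternating permutation numbers.
    print(boustrophedon([1, 0, 0, 0, 0, 0, 0]))
```

Applied to the sequence 1, 0, 0, ..., the recurrence produces 1, 1, 1, 2, 5, 16, 61, the alternating permutation (secant-tangent) numbers mentioned in the abstract.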

preprint2002arXiv

Approximating the Minimum Equivalent Digraph

The MEG (minimum equivalent graph) problem is, given a directed graph, to find a small subset of the edges that maintains all reachability relations between nodes. The problem is NP-hard. This paper gives an approximation algorithm with performance guarantee of pi^2/6 ~ 1.64. The algorithm and its analysis are based on the simple idea of contracting long cycles. (This result is strengthened slightly in "On strongly connected digraphs with bounded cycle length" (1996).) The analysis applies directly to 2-Exchange, a simple "local improvement" algorithm, showing that its performance guarantee is 1.75.

preprint2002arXiv

Balancing Minimum Spanning and Shortest Path Trees

This paper gives a simple linear-time algorithm that, given a weighted digraph, finds a spanning tree that simultaneously approximates a shortest-path tree and a minimum spanning tree. The algorithm provides a continuous trade-off: given the two trees and epsilon > 0, the algorithm returns a spanning tree in which the distance between any vertex and the root of the shortest-path tree is at most 1+epsilon times the shortest-path distance, and yet the total weight of the tree is at most 1+2/epsilon times the weight of a minimum spanning tree. This is the best tradeoff possible. The paper also describes a fast parallel implementation.

preprint2002arXiv

Competitive Paging Algorithms

The paging problem is that of deciding which pages to keep in a memory of k pages in order to minimize the number of page faults. This paper introduces the marking algorithm, a simple randomized on-line algorithm for the paging problem, and gives a proof that its performance guarantee (competitive ratio) is O(log k). In contrast, no deterministic on-line algorithm can have a performance guarantee better than k.
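The abstract names the marking algorithm without describing it. The sketch below follows the usual textbook description, which is an assumption here rather than a quotation from the paper: requested pages are marked, a fault evicts a uniformly random unmarked page, and when every cached page is marked all marks are cleared first. The name `marking_paging` and the seeded `rng` parameter are hypothetical.

```python
import random


def marking_paging(requests, k, rng=None):
    """Count page faults of a randomized marking strategy with k page slots."""
    rng = rng or random.Random(0)  # fixed seed keeps the sketch reproducible
    cache = set()
    marked = set()  # always a subset of `cache`
    faults = 0
    for page in requests:
        if page not in cache:
            faults += 1
            if len(cache) == k:
                if marked == cache:
                    marked.clear()  # every page is marked: start a new phase
                victim = rng.choice(sorted(cache - marked))
                cache.remove(victim)
            cache.add(page)
        marked.add(page)
    return faults


if __name__ == "__main__":
    print(marking_paging([1, 2, 3, 1, 2, 3], k=3))
    print(marking_paging([1, 2, 3, 4, 1, 2, 3, 4], k=3))
```

When the request sequence touches at most k distinct pages, the only faults are the first requests to each page, regardless of the random choices.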

preprint2002arXiv

Designing Multi-Commodity Flow Trees

The traditional multi-commodity flow problem assumes a given flow network in which multiple commodities are to be maximally routed in response to given demands. This paper considers the multi-commodity flow network-design problem: given a set of multi-commodity flow demands, find a network subject to certain constraints such that the commodities can be maximally routed. This paper focuses on the case when the network is required to be a tree. The main result is an approximation algorithm for the case when the tree is required to be of constant degree. The algorithm reduces the problem to the minimum-weight balanced-separator problem; the performance guarantee of the algorithm is within a factor of 4 of the performance guarantee of the balanced-separator procedure. If Leighton and Rao's balanced-separator procedure is used, the performance guarantee is O(log n). This improves the O(log^2 n) approximation factor that is trivial to obtain by a direct application of the balanced-separator method.

preprint2002arXiv

Lecture Notes on Evasiveness of Graph Properties

This report presents notes from the first eight lectures of the class Many Models of Complexity taught by Laszlo Lovasz at Princeton University in the fall of 1990. The topic is evasiveness of graph properties: given a graph property, how many edges of the graph an algorithm must check in the worst case before it knows whether the property holds.

preprint2002arXiv

Low-Degree Spanning Trees of Small Weight

The degree-d spanning tree problem asks for a minimum-weight spanning tree in which the degree of each vertex is at most d. When d=2 the problem is TSP, and in this case, the well-known Christofides algorithm provides a 1.5-approximation algorithm (assuming the edge weights satisfy the triangle inequality). In 1984, Christos Papadimitriou and Umesh Vazirani posed the challenge of finding an algorithm with performance guarantee less than 2 for Euclidean graphs (points in R^n) and d > 2. This paper gives the first answer to that challenge, presenting an algorithm to compute a degree-3 spanning tree of cost at most 5/3 times the MST. For points in the plane, the ratio improves to 3/2 and the algorithm can also find a degree-4 spanning tree of cost at most 5/4 times the MST.

preprint2002arXiv

On-Line End-to-End Congestion Control

Congestion control in the current Internet is accomplished mainly by TCP/IP. To understand the macroscopic network behavior that results from TCP/IP and similar end-to-end protocols, one main analytic technique is to show that the protocol maximizes some global objective function of the network traffic. Here we analyze a particular end-to-end, MIMD (multiplicative-increase, multiplicative-decrease) protocol. We show that if all users of the network use the protocol, and all connections last for at least logarithmically many rounds, then the total weighted throughput (value of all packets received) is near the maximum possible. Our analysis includes round-trip-times, and (in contrast to most previous analyses) gives explicit convergence rates, allows connections to start and stop, and allows capacities to change.

preprint2002arXiv

On-Line File Caching

In the on-line file-caching problem, the input is a sequence of requests for files, given on-line (one at a time). Each file has a non-negative size and a non-negative retrieval cost. The problem is to decide which files to keep in a fixed-size cache so as to minimize the sum of the retrieval costs for files that are not in the cache when requested. The problem arises in web caching by browsers and by proxies. This paper describes a natural generalization of LRU called Landlord and gives an analysis showing that it has an optimal performance guarantee (among deterministic on-line algorithms). The paper also gives an analysis of the algorithm in a so-called "loosely" competitive model, showing that on a "typical" cache size, either the performance guarantee is O(1) or the total retrieval cost is insignificant.
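The abstract does not describe how Landlord works. The sketch below follows the credit-based description usually given for it, and every detail should be treated as an assumption: each cached file holds a credit, a miss charges "rent" in proportion to size until some file runs out of credit and is evicted, and a hit resets the file's credit to its retrieval cost. The name `landlord`, the dictionaries `size` and `cost`, and the choice to evict one file at a time are all hypothetical; sizes are assumed positive and no file is larger than the cache.

```python
def landlord(requests, size, cost, capacity):
    """Return (total retrieval cost, set of cached files) for a credit-based policy."""
    credit = {}  # file -> remaining credit; the keys are exactly the cached files
    used = 0     # total size of the cached files
    total = 0
    for f in requests:
        if f in credit:
            credit[f] = cost[f]  # hit: restore full credit
            continue
        total += cost[f]         # miss: pay to retrieve the file
        while used + size[f] > capacity:
            # The file with the least credit per unit of size runs out first.
            victim = min(credit, key=lambda g: credit[g] / size[g])
            delta = credit[victim] / size[victim]
            for g in credit:
                credit[g] -= delta * size[g]  # charge rent to every cached file
            used -= size[victim]
            del credit[victim]
        credit[f] = cost[f]
        used += size[f]
    return total, set(credit)


if __name__ == "__main__":
    size = {"x": 1, "y": 1, "z": 1}
    cost = {"x": 10, "y": 1, "z": 1}
    print(landlord(["x", "y", "z", "x"], size, cost, capacity=2))
```

In the example the expensive file "x" survives the eviction because the cheap file "y" exhausts its credit first, which is the cost-aware behavior that distinguishes this kind of policy from plain LRU.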

preprint2002arXiv

On-Line Paging against Adversarially Biased Random Inputs

In evaluating an algorithm, worst-case analysis can be overly pessimistic. Average-case analysis can be overly optimistic. An intermediate approach is to show that an algorithm does well on a broad class of input distributions. Koutsoupias and Papadimitriou recently analyzed the least-recently-used (LRU) paging strategy in this manner, analyzing its performance on an input sequence generated by a so-called diffuse adversary -- one that must choose each request probabilistically so that no page is chosen with probability more than some fixed epsilon>0. They showed that LRU achieves the optimal competitive ratio (for deterministic on-line algorithms), but they didn't determine the actual ratio. In this paper we estimate the optimal ratios within roughly a factor of two for both deterministic strategies (e.g. least-recently-used and first-in-first-out) and randomized strategies. Around the threshold epsilon ~ 1/k (where k is the cache size), the optimal ratios are both Theta(ln k). Below the threshold the ratios tend rapidly to O(1). Above the threshold the ratio is unchanged for randomized strategies but tends rapidly to Theta(k) for deterministic ones. We also give an alternate proof of the optimality of LRU.

preprint2002arXiv

Orienting Graphs to Optimize Reachability

The paper focuses on two problems: (i) how to orient the edges of an undirected graph in order to maximize the number of ordered vertex pairs (x,y) such that there is a directed path from x to y, and (ii) how to orient the edges so as to minimize the number of such pairs. The paper describes a quadratic-time algorithm for the first problem, and a proof that the second problem is NP-hard to approximate within some constant 1+epsilon > 1. The latter proof also shows that the second problem is equivalent to "comparability graph completion"; neither problem was previously known to be NP-hard.

preprint2002arXiv

Prefix Codes: Equiprobable Words, Unequal Letter Costs

Describes a near-linear-time algorithm for a variant of Huffman coding, in which the letters may have non-uniform lengths (as in Morse code), but with the restriction that each word to be encoded has equal probability. [See also "Huffman Coding with Unequal Letter Costs" (2002).]

preprint2002arXiv

Randomized Rounding without Solving the Linear Program

Randomized rounding is a standard method, based on the probabilistic method, for designing combinatorial approximation algorithms. In Raghavan's seminal paper introducing the method (1988), he writes: "The time taken to solve the linear program relaxations of the integer programs dominates the net running time theoretically (and, most likely, in practice as well)." This paper explores how this bottleneck can be avoided for randomized rounding algorithms for packing and covering problems (linear programs, or mixed integer linear programs, having no negative coefficients). The resulting algorithms are greedy algorithms, and are faster and simpler to implement than standard randomized-rounding algorithms. This approach can also be used to understand Lagrangian-relaxation algorithms for packing/covering linear programs: such algorithms can be viewed as (derandomized) randomized-rounding schemes.

preprint 2002 arXiv

Sequential and Parallel Algorithms for Mixed Packing and Covering

Mixed packing and covering problems are problems that can be formulated as linear programs using only non-negative coefficients. Examples include multicommodity network flow, the Held-Karp lower bound on TSP, fractional relaxations of set cover, bin-packing, knapsack, scheduling problems, minimum-weight triangulation, etc. This paper gives approximation algorithms for the general class of problems. The sequential algorithm is a simple greedy algorithm that can be implemented to find an epsilon-approximate solution in O(epsilon^-2 log m) linear-time iterations. The parallel algorithm does comparable work but finishes in polylogarithmic time. The results generalize previous work on pure packing and covering (the special case when the constraints are all "less-than" or all "greater-than") by Michael Luby and Noam Nisan (1993) and Naveen Garg and Jochen Konemann (1998).

preprint 2002 arXiv

Simple Strategies for Large Zero-Sum Games with Applications to Complexity Theory

Von Neumann's Min-Max Theorem guarantees that each player of a zero-sum matrix game has an optimal mixed strategy. This paper gives an elementary proof that each player has a near-optimal mixed strategy that chooses uniformly at random from a multiset of pure strategies of size logarithmic in the number of pure strategies available to the opponent. For exponentially large games, for which even representing an optimal mixed strategy can require exponential space, it follows that there are near-optimal, linear-size strategies. These strategies are easy to play and serve as small witnesses to the approximate value of the game. As a corollary, it follows that every language has small "hard" multisets of inputs certifying that small circuits cannot decide the language. For example, if SAT does not have polynomial-size circuits, then, for each n and c, there is a set of n^(O(c)) Boolean formulae of size n such that no circuit of size n^c (or algorithm running in time n^c) classifies more than two-thirds of the formulae successfully.

preprint 2002 arXiv

The K-Server Dual and Loose Competitiveness for Paging

This paper has two results. The first is based on the surprising observation that the well-known "least-recently-used" paging algorithm and the "balance" algorithm for weighted caching are linear-programming primal-dual algorithms. This observation leads to a strategy (called "Greedy-Dual") that generalizes them both and has an optimal performance guarantee for weighted caching. For the second result, the paper presents empirical studies of paging algorithms, documenting that in practice, on "typical" cache sizes and sequences, the performance of paging strategies is much better than their worst-case analyses in the standard model suggest. The paper then presents theoretical results that support and explain this. For example: on any input sequence, with almost all cache sizes, either the performance guarantee of least-recently-used is O(log k) or the fault rate (in an absolute sense) is insignificant. Both of these results are strengthened and generalized in "On-line File Caching" (1998).
