Source author record

Marek Chrobak

Marek Chrobak appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Data Structures and Algorithms · Computational Complexity · Discrete Mathematics

Catalog footprint

What is connected

22 works

3 topics

4 close collaborators

preprint · 2022 · arXiv

On Huang and Wong's Algorithm for Generalized Binary Split Trees

Huang and Wong [1984] proposed a polynomial-time dynamic-programming algorithm for computing optimal generalized binary split trees. We show that their algorithm is incorrect. Thus, it remains open whether such trees can be computed in polynomial time. Spuler [1994] proposed modifying Huang and Wong's algorithm to obtain an algorithm for a different problem: computing optimal two-way-comparison search trees. We show that the dynamic program underlying Spuler's algorithm is not valid, in that it does not satisfy the necessary optimal-substructure property and its proposed recurrence relation is incorrect. It remains unknown whether the algorithm is guaranteed to compute a correct overall solution.

preprint · 2021 · arXiv

A Simple Algorithm for Optimal Search Trees with Two-Way Comparisons

We present a simple $O(n^4)$-time algorithm for computing optimal search trees with two-way comparisons. The only previous solution to this problem, by Anderson et al., has the same running time, but is significantly more complicated and is restricted to the variant where only successful queries are allowed. Our algorithm extends directly to solve the standard full variant of the problem, which also allows unsuccessful queries and for which no polynomial-time algorithm was previously known. The correctness proof of our algorithm relies on a new structural theorem for two-way-comparison search trees.

preprint · 2021 · arXiv

On the Cost of Unsuccessful Searches in Search Trees with Two-way Comparisons

Search trees are commonly used to implement access operations to a set of stored keys. If this set is static and the probabilities of membership queries are known in advance, then one can precompute an optimal search tree, namely one that minimizes the expected access cost. For a non-key query, a search tree can determine its approximate location by returning the inter-key interval containing the query. This is in contrast to other dictionary data structures, like hash tables, that only report a failed search. We address the question "what is the additional cost of determining approximate locations for non-key queries?" We prove that for two-way comparison trees this additional cost is at most 1. Our proof is based on a novel probabilistic argument that involves converting a search tree that does not identify non-key queries into a random tree that does.

preprint · 2021 · arXiv

Optimal Search Trees with 2-Way Comparisons

In 1971, Knuth gave an $O(n^2)$-time algorithm for the classic problem of finding an optimal binary search tree. Knuth's algorithm works only for search trees based on 3-way comparisons, while most modern computers support only 2-way comparisons (e.g., $<, \le, =, \ge$, and $>$). Until this paper, the problem of finding an optimal search tree using 2-way comparisons remained open -- poly-time algorithms were known only for restricted variants. We solve the general case, giving (i) an $O(n^4)$-time algorithm and (ii) an $O(n \log n)$-time additive-3 approximation algorithm. Also, for finding optimal binary split trees, we (iii) obtain a linear speedup and (iv) prove some previous work incorrect.

preprint · 2020 · arXiv

Algorithmic approaches to selecting control clones in DNA array hybridization experiments

We study the problem of selecting control clones in DNA array hybridization experiments. The problem arises in the OFRG method for analyzing microbial communities. The OFRG method performs classification of rRNA gene clones using binary fingerprints created from a series of hybridization experiments, where each experiment consists of hybridizing a collection of arrayed clones with a single oligonucleotide probe. This experiment produces analog signals, one for each clone, which then need to be classified, that is, converted into binary values 1 and 0 that represent hybridization and non-hybridization events. In addition to the sample rRNA gene clones, the array contains a number of control clones needed to calibrate the classification procedure of the hybridization signals. These control clones must be selected with care to optimize the classification process. We formulate this as a combinatorial optimization problem called Balanced Covering. We prove that the problem is NP-hard, and we show some results on hardness of approximation. We propose approximation algorithms based on randomized rounding and we show that, with high probability, our algorithms approximate well the optimum solution. The experimental results confirm that the algorithms find high quality control clones. The algorithms have been implemented and are publicly available as part of the software package called CloneTools.

preprint · 2020 · arXiv

Incremental Medians via Online Bidding

In the k-median problem we are given sets of facilities and customers, and distances between them. For a given set F of facilities, the cost of serving a customer u is the minimum distance between u and a facility in F. The goal is to find a set F of k facilities that minimizes the sum, over all customers, of their service costs. Following Mettu and Plaxton, we study the incremental medians problem, where k is not known in advance, and the algorithm produces a nested sequence of facility sets where the kth set has size k. The algorithm is c-cost-competitive if the cost of each set is at most c times the cost of the optimum set of size k. We give improved incremental algorithms for the metric version: an 8-cost-competitive deterministic algorithm, a 2e ~ 5.44-cost-competitive randomized algorithm, a (24+epsilon)-cost-competitive, poly-time deterministic algorithm, and a (6e+epsilon ~ 16.31+epsilon)-cost-competitive, poly-time randomized algorithm. The algorithm is s-size-competitive if the cost of the kth set is at most the minimum cost of any set of size k, and has size at most s k. The optimal size-competitive ratios for this problem are 4 (deterministic) and e (randomized). We present the first poly-time O(log m)-size-approximation algorithm for the offline problem and first poly-time O(log m)-size-competitive algorithm for the incremental problem. Our proofs reduce incremental medians to the following online bidding problem: faced with an unknown threshold T, an algorithm submits "bids" until it submits a bid that is at least the threshold. It pays the sum of all its bids. We prove that folklore algorithms for online bidding are optimally competitive.
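
The online bidding problem described in this abstract can be illustrated with the well-known doubling strategy. The sketch below is only an illustration, not the paper's construction: the function name `doubling_bids`, the starting bid of 1, and the assumption that the threshold is at least 1 are all choices made here. Under those assumptions the total paid stays below 4 times the threshold.

```python
def doubling_bids(threshold):
    """Submit bids 1, 2, 4, ... until a bid reaches the threshold.

    Assumption (not from the abstract): threshold >= 1 and the first bid is 1.
    Returns the list of bids submitted; the cost paid is their sum.
    """
    bids = []
    bid = 1
    while True:
        bids.append(bid)
        if bid >= threshold:
            return bids
        bid *= 2


if __name__ == "__main__":
    bids = doubling_bids(5)
    # The bidder pays for every bid, including the unsuccessful ones.
    print(bids, "total paid:", sum(bids))
```

For a threshold of 5 the bids are 1, 2, 4, 8, for a total of 15, which is within a factor of 4 of the threshold.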

preprint · 2020 · arXiv

Scheduling with Gaps: New Models and Algorithms

We consider scheduling problems for unit jobs with release times, where the number or size of the gaps in the schedule is taken into consideration, either in the objective function or as a constraint. Except for a few papers on energy minimization, there is no work in the scheduling literature that uses performance metrics depending on the gap structure of a schedule. One of our objectives is to initiate the study of such scheduling problems with gaps. We show that such problems often lead to interesting algorithmic problems, with connections to other areas of algorithmics. We focus on the model with unit jobs. First we examine scheduling problems with deadlines, where we consider variants of minimum-gap scheduling, including maximizing throughput with a budget for gaps or minimizing the number of gaps with a throughput requirement. We then turn to other objective functions. For example, in some scenarios, gaps in a schedule may actually be desirable, leading to the problem of maximizing the number of gaps. Other versions we study include minimizing maximum gap or maximizing minimum gap. The second part of the paper examines the model without deadlines, where we focus on the tradeoff between the number of gaps and the total or maximum flow time. For all these problems we provide polynomial time algorithms, with running times ranging from $O(n \log n)$ for some problems, to $O(n^7)$ for others. The solutions involve a spectrum of algorithmic techniques, including different dynamic programming formulations, speed-up techniques based on searching Monge arrays, searching X + Y matrices, or implicit binary search.

preprint · 2016 · arXiv

Online Algorithms for Multi-Level Aggregation

In the Multi-Level Aggregation Problem (MLAP), requests arrive at the nodes of an edge-weighted tree T, and have to be served eventually. A service is defined as a subtree X of T that contains its root. This subtree X serves all requests that are pending in the nodes of X, and the cost of this service is equal to the total weight of X. Each request also incurs waiting cost between its arrival and service times. The objective is to minimize the total waiting cost of all requests plus the total cost of all service subtrees. MLAP is a generalization of some well-studied optimization problems; for example, for trees of depth 1, MLAP is equivalent to the TCP Acknowledgment Problem, while for trees of depth 2, it is equivalent to the Joint Replenishment Problem. Aggregation problems for trees of arbitrary depth arise in multicasting, sensor networks, communication in organization hierarchies, and in supply-chain management. The instances of MLAP associated with these applications are naturally online, in the sense that aggregation decisions need to be made without information about future requests. Constant-competitive online algorithms are known for MLAP with one or two levels. However, it has been open whether there exist constant-competitive online algorithms for trees of depth more than 2. Addressing this open problem, we give the first constant-competitive online algorithm for networks with an arbitrary (fixed) number of levels. The competitive ratio is O(D^4 2^D), where D is the depth of T. The algorithm works for arbitrary waiting cost functions, including the variant with deadlines. We also show several additional lower and upper bound results for some special cases of MLAP, including the Single-Phase variant and the case when the tree is a path.

preprint · 2016 · arXiv

Online Clique Clustering

Clique clustering is the problem of partitioning the vertices of a graph into disjoint clusters, where each cluster forms a clique in the graph, while optimizing some objective function. In online clustering, the input graph is given one vertex at a time, and any vertices that have previously been clustered together are not allowed to be separated. The goal is to maintain a clustering with an objective value close to the optimal solution. For the variant where we want to maximize the number of edges in the clusters, we propose an online strategy based on the doubling technique. It has an asymptotic competitive ratio at most 15.646 and an absolute competitive ratio at most 22.641. We also show that no deterministic strategy can have an asymptotic competitive ratio better than 6. For the variant where we want to minimize the number of edges between clusters, we show that the deterministic competitive ratio of the problem is $n-ω(1)$, where n is the number of vertices in the graph.

preprint · 2016 · arXiv

Online Packet Scheduling with Bounded Delay and Lookahead

We study the online bounded-delay packet scheduling problem (BDPS), where packets of unit size arrive at a router over time and need to be transmitted over a network link. Each packet has two attributes: a non-negative weight and a deadline for its transmission. The objective is to maximize the total weight of the transmitted packets. This problem has been well studied in the literature, yet its optimal competitive ratio remains unknown: the best upper bound is $1.828$, still quite far from the best lower bound of $ϕ\approx 1.618$. In the variant of BDPS with $s$-bounded instances, each packet can be scheduled in at most $s$ consecutive slots, starting at its release time. The lower bound of $ϕ$ applies even to the special case of $2$-bounded instances, and a $ϕ$-competitive algorithm for $3$-bounded instances was given in Chin et al. Improving that result, and addressing a question posed by Goldwasser, we present a $ϕ$-competitive algorithm for $4$-bounded instances. We also study a variant of BDPS where an online algorithm has the additional power of $1$-lookahead, knowing at time $t$ which packets will arrive at time $t+1$. For BDPS with $1$-lookahead restricted to $2$-bounded instances, we present an online algorithm with competitive ratio $(\sqrt{13} - 1)/2 \approx 1.303$ and we prove a nearly tight lower bound of $(1 + \sqrt{17})/4 \approx 1.281$.

preprint · 2015 · arXiv

Approximation Algorithms for the Joint Replenishment Problem with Deadlines

The Joint Replenishment Problem (JRP) is a fundamental optimization problem in supply-chain management, concerned with optimizing the flow of goods from a supplier to retailers. Over time, in response to demands at the retailers, the supplier ships orders, via a warehouse, to the retailers. The objective is to schedule these orders to minimize the sum of ordering costs and retailers' waiting costs. We study the approximability of JRP-D, the version of JRP with deadlines, where instead of waiting costs the retailers impose strict deadlines. We study the integrality gap of the standard linear-program (LP) relaxation, giving a lower bound of 1.207, a stronger, computer-assisted lower bound of 1.245, as well as an upper bound and approximation ratio of 1.574. The best previous upper bound and approximation ratio was 1.667; no lower bound was previously published. For the special case when all demand periods are of equal length we give an upper bound of 1.5, a lower bound of 1.2, and show APX-hardness.

preprint · 2015 · arXiv

Faster Information Gathering in Ad-Hoc Radio Tree Networks

We study information gathering in ad-hoc radio networks. Initially, each node of the network has a piece of information called a rumor, and the overall objective is to gather all these rumors in the designated target node. The ad-hoc property refers to the fact that the topology of the network is unknown when the computation starts. Aggregation of rumors is not allowed, which means that each node may transmit at most one rumor in one step. We focus on networks with tree topologies, that is we assume that the network is a tree with all edges directed towards the root, but, being ad-hoc, its actual topology is not known. We provide two deterministic algorithms for this problem. For the model that does not assume any collision detection nor acknowledgement mechanisms, we give an $O(n\log\log n)$-time algorithm, improving the previous upper bound of $O(n\log n)$. We also show that this running time can be further reduced to $O(n)$ if the model allows for acknowledgements of successful transmissions.

preprint · 2014 · arXiv

A Note on NP-Hardness of Preemptive Mean Flow-Time Scheduling for Parallel Machines

In the paper "The complexity of mean flow time scheduling problems with release times", by Baptiste, Brucker, Chrobak, Dürr, Kravchenko and Sourd, the authors claimed to prove strong NP-hardness of the scheduling problem $P|pmtn,r_j|\sum C_j$, namely multiprocessor preemptive scheduling where the objective is to minimize the mean flow time. We point out a serious error in their proof and give a new proof of strong NP-hardness for this problem.

preprint · 2014 · arXiv

Information Gathering in Ad-Hoc Radio Networks with Tree Topology

We study the problem of information gathering in ad-hoc radio networks without collision detection, focussing on the case when the network forms a tree, with edges directed towards the root. Initially, each node has a piece of information that we refer to as a rumor. Our goal is to design protocols that deliver all rumors to the root of the tree as quickly as possible. The protocol must complete this task within its allotted time even though the actual tree topology is unknown when the computation starts. In the deterministic case, assuming that the nodes are labeled with small integers, we give an O(n)-time protocol that uses unbounded messages, and an O(n log n)-time protocol using bounded messages, where any message can include only one rumor. We also consider fire-and-forward protocols, in which a node can only transmit its own rumor or the rumor received in the previous step. We give a deterministic fire-and-forward protocol with running time O(n^1.5), and we show that it is asymptotically optimal. We then study randomized algorithms where the nodes are not labelled. In this model, we give an O(n log n)-time protocol and we prove that this bound is asymptotically optimal.

preprint · 2013 · arXiv

Better Approximation Bounds for the Joint Replenishment Problem

The Joint Replenishment Problem (JRP) deals with optimizing shipments of goods from a supplier to retailers through a shared warehouse. Each shipment involves transporting goods from the supplier to the warehouse, at a fixed cost C, followed by a redistribution of these goods from the warehouse to the retailers that ordered them, where transporting goods to a retailer $ρ$ has a fixed cost $c_ρ$. In addition, retailers incur waiting costs for each order. The objective is to minimize the overall cost of satisfying all orders, namely the sum of all shipping and waiting costs. JRP has been well studied in Operations Research and, more recently, in the area of approximation algorithms. For arbitrary waiting cost functions, the best known approximation ratio is 1.8. This ratio can be reduced to 1.574 for the JRP-D model, where there is no cost for waiting but orders have deadlines. As for hardness results, it is known that the problem is APX-hard and that the natural linear program for JRP has integrality gap at least 1.245. Both results hold even for JRP-D. In the online scenario, the best lower and upper bounds on the competitive ratio are 2.64 and 3, respectively. The lower bound of 2.64 applies even to the restricted version of JRP, denoted JRP-L, where the waiting cost function is linear. We provide several new approximation results for JRP. In the offline case, we give an algorithm with ratio 1.791, breaking the barrier of 1.8. In the online case, we show a lower bound of 2.754 on the competitive ratio for JRP-L (and thus JRP as well), improving the previous bound of 2.64. We also study the online version of JRP-D, for which we prove that the optimal competitive ratio is 2.

preprint · 2013 · arXiv

LP-rounding Algorithms for the Fault-Tolerant Facility Placement Problem

The Fault-Tolerant Facility Placement problem (FTFP) is a generalization of the classic Uncapacitated Facility Location Problem (UFL). In FTFP we are given a set of facility sites and a set of clients. Opening a facility at site $i$ costs $f_i$ and connecting client $j$ to a facility at site $i$ costs $d_{ij}$. We assume that the connection costs (distances) $d_{ij}$ satisfy the triangle inequality. Multiple facilities can be opened at any site. Each client $j$ has a demand $r_j$, which means that it needs to be connected to $r_j$ different facilities (some of which could be located on the same site). The goal is to minimize the sum of facility opening cost and connection cost. The main result of this paper is a 1.575-approximation algorithm for FTFP, based on LP-rounding. The algorithm first reduces the demands to values polynomial in the number of sites. Then it uses a technique that we call adaptive partitioning, which partitions the instance by splitting clients into unit demands and creating a number of (not yet opened) facilities at each site. It also partitions the optimal fractional solution to produce a fractional solution for this new instance. The partitioned instance satisfies a number of properties that allow us to exploit existing LP-rounding methods for UFL to round our partitioned solution to an integral solution, preserving the approximation ratio. In particular, our 1.575-approximation algorithm is based on the ideas from the 1.575-approximation algorithm for UFL by Byrka et al., with changes necessary to satisfy the fault-tolerance requirement.

preprint · 2011 · arXiv

Better Bounds for Incremental Frequency Allocation in Bipartite Graphs

We study frequency allocation in wireless networks. A wireless network is modeled by an undirected graph, with vertices corresponding to cells. In each vertex we have a certain number of requests, and each of those requests must be assigned a different frequency. Edges represent conflicts between cells, meaning that frequencies in adjacent vertices must be different as well. The objective is to minimize the total number of used frequencies. The offline version of the problem is known to be NP-hard. In the incremental version, requests for frequencies arrive over time and the algorithm is required to assign a frequency to a request as soon as it arrives. Competitive incremental algorithms have been studied for several classes of graphs. For paths, the optimal (asymptotic) ratio is known to be 4/3, while for hexagonal-cell graphs it is between 1.5 and 1.9126. For k-colorable graphs, the ratio of (k+1)/2 can be achieved. In this paper, we prove nearly tight bounds on the asymptotic competitive ratio for bipartite graphs, showing that it is between 1.428 and 1.433. This improves the previous lower bound of 4/3 and upper bound of 1.5. Our proofs are based on reducing the incremental problem to a purely combinatorial (equivalent) problem of constructing set families with certain intersection properties.

preprint · 2011 · arXiv

New Results on the Fault-Tolerant Facility Placement Problem

We studied the Fault-Tolerant Facility Placement problem (FTFP), which generalizes the uncapacitated facility location problem (UFL). In FTFP, we are given a set F of sites at which facilities can be built, and a set C of clients with some demands that need to be satisfied by different facilities. A client $j$ has demand $r_j$. Building one facility at a site $i$ incurs a cost $f_i$, and connecting one unit of demand from client $j$ to a facility at site $i\in F$ costs $d_{ij}$. The costs $d_{ij}$ are assumed to form a metric. A feasible solution specifies the number of facilities to be built at each site and the way to connect demands from clients to facilities, with the restriction that demands from the same client must go to different facilities. Facilities at the same site are considered different. The goal is to find a solution with minimum total cost. We gave a 1.7245-approximation algorithm for the FTFP problem. Our technique is via a reduction to the Fault-Tolerant Facility Location problem, in which each client has demand $r_j$ but each site can have at most one facility built.

preprint · 2010 · arXiv

Polynomial Time Algorithms for Minimum Energy Scheduling

The aim of power management policies is to reduce the amount of energy consumed by computer systems while maintaining a satisfactory level of performance. One common method for saving energy is to simply suspend the system during the idle times. No energy is consumed in the suspend mode. However, the process of waking up the system itself requires a certain fixed amount of energy, and thus suspending the system is beneficial only if the idle time is long enough to compensate for this additional energy expenditure. In the specific problem studied in the paper, we have a set of jobs with release times and deadlines that need to be executed on a single processor. Preemptions are allowed. The processor requires energy L to be woken up and, when it is on, it uses one unit of energy per one unit of time. It has been an open problem whether a schedule minimizing the overall energy consumption can be computed in polynomial time. We solve this problem in the positive, by providing an O(n^5)-time algorithm. In addition we provide an O(n^4)-time algorithm for computing the minimum energy schedule when all jobs have unit length.
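
The energy model in this abstract (wake-up cost L, one unit of energy per unit of time while on, zero in suspend mode) can be made concrete by evaluating the energy of a given schedule. The sketch below does only that evaluation; it is not the paper's optimization algorithm. The function name `schedule_energy`, the representation of a schedule as sorted busy intervals, and the choice to charge one initial wake-up are assumptions made here for illustration.

```python
def schedule_energy(busy, wake_cost):
    """Energy used by a schedule given as sorted, disjoint (start, end) busy intervals.

    Assumptions (not from the abstract): the processor starts suspended, so one
    wake-up is charged up front, and during each idle gap it either stays on
    (paying the gap length) or suspends and wakes again (paying wake_cost),
    whichever is cheaper.
    """
    if not busy:
        return 0
    energy = wake_cost  # initial wake-up (assumption)
    prev_end = None
    for start, end in busy:
        if prev_end is not None:
            gap = start - prev_end
            # Suspending pays off only when the gap is longer than the wake-up cost.
            energy += min(gap, wake_cost)
        energy += end - start  # one unit of energy per unit of busy time
        prev_end = end
    return energy


if __name__ == "__main__":
    print(schedule_energy([(0, 2), (3, 4), (10, 12)], wake_cost=3))
```

With a wake-up cost of 3, the short gap of length 1 is bridged by staying on, while the gap of length 6 is cheaper to cover by suspending and waking again.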

preprint · 2010 · arXiv

Tile Packing Tomography is NP-hard

Discrete tomography deals with reconstructing finite spatial objects from lower dimensional projections and has applications for example in timetable design. In this paper we consider the problem of reconstructing a tile packing from its row and column projections. It consists of disjoint copies of a fixed tile, all contained in some rectangular grid. The projections tell how many cells are covered by a tile in each row and column. How difficult is it to construct a tile packing satisfying given projections? It was known to be solvable by a greedy algorithm for bars (tiles of width or height 1), and NP-hardness results were known for some specific tiles. This paper shows that the problem is NP-hard whenever the tile is not a bar.

preprint · 2008 · arXiv

Generalized Whac-a-Mole

We consider online competitive algorithms for the problem of collecting weighted items from a dynamic set S, when items are added to or deleted from S over time. The objective is to maximize the total weight of collected items. We study the general version, as well as variants with various restrictions, including the following: the uniform case, when all items have the same weight, the decremental sets, when all items are present at the beginning and only deletion operations are allowed, and dynamic queues, where the dynamic set is ordered and only its prefixes can be deleted (with no restriction on insertions). The dynamic queue case is a generalization of bounded-delay packet scheduling (also referred to as buffer management). We present several upper and lower bounds on the competitive ratio for these variants.

preprint · 2005 · arXiv

The reverse greedy algorithm for the metric k-median problem

The Reverse Greedy algorithm (RGreedy) for the k-median problem works as follows. It starts by placing facilities on all nodes. At each step, it removes a facility to minimize the resulting total distance from the customers to the remaining facilities. It stops when k facilities remain. We prove that, if the distance function is metric, then the approximation ratio of RGreedy is between Ω(log n / log log n) and O(log n).
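
The RGreedy procedure described in this abstract can be sketched directly. The version below is a minimal illustration under assumptions made here: every node is both a customer and a candidate facility, distances are given as a full matrix `dist`, and ties are broken by the smallest node index. The names `total_cost` and `reverse_greedy` are hypothetical, and the code makes no attempt at efficiency.

```python
def total_cost(dist, facilities):
    """Sum over all customers of the distance to the nearest open facility."""
    return sum(min(row[f] for f in facilities) for row in dist)


def reverse_greedy(dist, k):
    """Start with a facility on every node and remove facilities until k remain.

    At each step, remove the facility whose removal yields the smallest total
    cost. Ties go to the smallest index (an assumption of this sketch).
    """
    facilities = set(range(len(dist)))
    while len(facilities) > k:
        best = min(
            sorted(facilities),
            key=lambda f: total_cost(dist, facilities - {f}),
        )
        facilities.remove(best)
    return facilities


if __name__ == "__main__":
    points = [0, 1, 10, 11]  # nodes on a line; distance is the absolute difference
    dist = [[abs(p - q) for q in points] for p in points]
    chosen = reverse_greedy(dist, 2)
    print(sorted(chosen), total_cost(dist, chosen))
```

On the four points 0, 1, 10, 11 with k = 2, the sketch keeps one facility in each cluster (nodes 1 and 3, at positions 1 and 11), for a total cost of 2.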
