Source author record

Shi Li

Shi Li appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Data Structures and Algorithms; Computational Complexity; Databases; Distributed, Parallel, and Cluster Computing; Artificial Intelligence; cond-mat.mtrl-sci; Discrete Mathematics; hep-th; Machine Learning; math.CO; math.OC; Networking and Internet Architecture

Catalog footprint

What is connected

26 works

12 topics

4 close collaborators


preprint, 2026, arXiv

From Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

Large language models (LLMs) are rapidly changing how researchers in materials science and chemistry discover, organize, and act on scientific knowledge. This paper analyzes a broad set of community-developed LLM applications in an effort to identify emerging patterns in how these systems can be used across the scientific research lifecycle. We organize the projects into two complementary categories: Knowledge Infrastructure, systems that structure, retrieve, synthesize, and validate scientific information; and Action Systems, systems that execute, coordinate, or automate scientific work across computational and experimental environments. The submissions reveal a shift from single-purpose LLM tools toward integrated, multi-agent workflows that combine retrieval, reasoning, tool use, and domain-specific validation. Prominent themes include retrieval-augmented generation as grounding infrastructure, persistent structured knowledge representations, multimodal and multilingual scientific inputs, and early progress toward laboratory-integrated closed-loop systems. Together, these results suggest that LLMs are evolving from general-purpose assistants into composable infrastructure for scientific reasoning and action. This work provides a community snapshot of that transition and a practical taxonomy for understanding emerging LLM-enabled workflows in materials science and chemistry.

preprint, 2022, arXiv

Nested Active-Time Scheduling

The active-time scheduling problem considers the problem of scheduling preemptible jobs with windows (release times and deadlines) on a parallel machine that can schedule up to $g$ jobs during each timestep. The goal in the active-time problem is to minimize the number of active steps, i.e., timesteps in which at least one job is scheduled. In this way, the active time models parallel scheduling when there is a fixed cost for turning the machine on at each discrete step. This paper presents a 9/5-approximation algorithm for a special case of the active-time scheduling problem in which job windows are laminar (nested). This result improves on the previous best 2-approximation for the general case.
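To make the objective concrete, the following Python sketch validates a candidate schedule and counts its active steps. It is illustrative only and is not the paper's 9/5-approximation algorithm. The function name `active_time`, the half-open window convention `[release, deadline)`, and the dictionary layout are assumptions made for this example.

```python
from collections import Counter


def active_time(schedule, jobs, g):
    """Return the number of active timesteps of a feasible schedule.

    jobs:     dict job -> (release, deadline, length); window is [release, deadline)
    schedule: dict job -> set of integer timesteps in which the job runs
    g:        maximum number of jobs the machine can run in one timestep
    Raises ValueError if the schedule is infeasible.
    """
    if set(schedule) != set(jobs):
        raise ValueError("every job must be scheduled exactly once")
    load = Counter()
    for job, slots in schedule.items():
        release, deadline, length = jobs[job]
        if len(slots) != length:
            raise ValueError(f"job {job} does not receive {length} steps")
        if any(t < release or t >= deadline for t in slots):
            raise ValueError(f"job {job} runs outside its window")
        for t in slots:
            load[t] += 1
    if any(count > g for count in load.values()):
        raise ValueError("more than g jobs share a timestep")
    # A timestep is active if at least one job runs in it.
    return len(load)


# Laminar (nested) windows: the window of "b" lies inside that of "a" and "c".
jobs = {"a": (0, 4, 2), "b": (1, 3, 2), "c": (0, 4, 1)}
schedule = {"a": {1, 2}, "b": {1, 2}, "c": {0}}
steps = active_time(schedule, jobs, g=2)  # active steps are 0, 1 and 2
```

With `g=2` the jobs "a" and "b" can share steps 1 and 2, but "c" needs a third active step; with `g=3` all three jobs would fit into two active steps.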

preprint, 2020, arXiv

Consistent $k$-Median: Simpler, Better and Robust

In this paper we introduce and study the online consistent $k$-clustering with outliers problem, generalizing the non-outlier version of the problem studied in [Lattanzi-Vassilvitskii, ICML17]. We show that a simple local-search based online algorithm can give a bicriteria constant approximation for the problem with $O(k^2 \log^2 (nD))$ swaps of medians (recourse) in total, where $D$ is the diameter of the metric. When restricted to the problem without outliers, our algorithm is simpler, deterministic and gives better approximation ratio and recourse, compared to that of [Lattanzi-Vassilvitskii, ICML17].

preprint, 2020, arXiv

Covering the Relational Join

In this paper, we initiate a theoretical study of what we call the join covering problem. We are given a natural join query instance $Q$ on $n$ attributes and $m$ relations $(R_i)_{i \in [m]}$. Let $J_{Q} = \Join_{i=1}^m R_i$ denote the join output of $Q$. In addition to $Q$, we are given a parameter $Δ: 1\le Δ\le n$ and our goal is to compute the smallest subset $\mathcal{T}_{Q, Δ} \subseteq J_{Q}$ such that every tuple in $J_{Q}$ is within Hamming distance $Δ- 1$ from some tuple in $\mathcal{T}_{Q, Δ}$. The join covering problem captures both computing the natural join from database theory and constructing a covering code with covering radius $Δ- 1$ from coding theory, as special cases. We consider the combinatorial version of the join covering problem, where our goal is to determine the worst-case $|\mathcal{T}_{Q, Δ}|$ in terms of the structure of $Q$ and value of $Δ$. One obvious approach to upper bound $|\mathcal{T}_{Q, Δ}|$ is to exploit a distance property (of Hamming distance) from coding theory and combine it with the worst-case bounds on output size of natural joins (AGM bound hereon) due to Atserias, Grohe and Marx [SIAM J. of Computing'13]. Somewhat surprisingly, this approach is not tight even for the case when the input relations have arity at most two. Instead, we show that using the polymatroid degree-based bound of Abo Khamis, Ngo and Suciu [PODS'17] in place of the AGM bound gives us a tight bound (up to constant factors) on $|\mathcal{T}_{Q, Δ}|$ for the arity two case. We prove lower bounds for $|\mathcal{T}_{Q, Δ}|$ using well-known classes of error-correcting codes, e.g., Reed-Solomon codes. We can extend our results for the arity two case to general arity with a polynomial gap between our upper and lower bounds.
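The covering condition can be illustrated with a small brute-force Python sketch. It assumes the join output is already materialized as a set of equal-length tuples; the names `hamming`, `is_cover`, and `smallest_cover` are hypothetical, and the exhaustive search is only meant for tiny inputs, not as the technique studied in the paper.

```python
from itertools import combinations


def hamming(u, v):
    """Number of positions in which two equal-length tuples differ."""
    return sum(a != b for a, b in zip(u, v))


def is_cover(join_output, cover, delta):
    """True if cover is a subset of the join output and every output tuple
    is within Hamming distance delta - 1 of some tuple in cover."""
    if not set(cover) <= set(join_output):
        return False
    return all(
        any(hamming(t, c) <= delta - 1 for c in cover) for t in join_output
    )


def smallest_cover(join_output, delta):
    """Exhaustively search for a minimum-size cover (exponential time)."""
    tuples = sorted(join_output)
    for size in range(len(tuples) + 1):
        for candidate in combinations(tuples, size):
            if is_cover(tuples, candidate, delta):
                return set(candidate)


# Toy join output over two attributes.
J = {(0, 0), (0, 1), (1, 1)}
cover_radius_1 = smallest_cover(J, delta=2)  # one tuple suffices
cover_radius_0 = smallest_cover(J, delta=1)  # must be the whole join output
```

Setting `delta=1` forces the cover to equal the join output, which matches the abstract's remark that computing the natural join is a special case.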

preprint, 2020, arXiv

Hierarchy-Based Algorithms for Minimizing Makespan under Precedence and Communication Constraints

We consider the classic problem of scheduling jobs with precedence constraints on a set of identical machines to minimize the makespan objective function. Understanding the exact approximability of the problem when the number of machines is a constant is a well-known question in scheduling theory. Indeed, an outstanding open problem from the classic book of Garey and Johnson asks whether this problem is NP-hard even in the case of 3 machines and unit-length jobs. In a recent breakthrough, Levey and Rothvoss gave a $(1+ε)$-approximation algorithm, which runs in nearly quasi-polynomial time, for the case when jobs have unit lengths. However, a substantially more difficult case where jobs have arbitrary processing lengths has remained open. We make progress on this more general problem. We show that there exists a $(1+ε)$-approximation algorithm (with running time similar to that of Levey and Rothvoss) for the non-migratory setting: when every job has to be scheduled entirely on a single machine, but within a machine the job need not be scheduled during consecutive time steps. Further, we also show that our algorithmic framework generalizes to another classic scenario where, along with the precedence constraints, the jobs also have communication delay constraints. Both of these fundamental problems are highly relevant to the practice of datacenter scheduling.

preprint, 2020, arXiv

On Approximating Degree-Bounded Network Design Problems

Directed Steiner Tree (DST) is a central problem in combinatorial optimization and theoretical computer science: Given a directed graph $G=(V, E)$ with edge costs $c \in \mathbb{R}_{\geq 0}^E$, a root $r \in V$ and $k$ terminals $K\subseteq V$, we need to output the minimum-cost arborescence in $G$ that contains an $r \to t$ path for every $t \in K$. Recently, Grandoni, Laekhanukit and Li, and independently Ghuge and Nagarajan, gave quasi-polynomial time $O(\log^2k/\log \log k)$-approximation algorithms for the problem, which are tight under popular complexity assumptions. In this paper, we consider the more general Degree-Bounded Directed Steiner Tree (DB-DST) problem, where we are additionally given a degree bound $d_v$ on each vertex $v \in V$, and we require that every vertex $v$ in the output tree has at most $d_v$ children. We give a quasi-polynomial time $(O(\log n \log k), O(\log^2 n))$-bicriteria approximation: The algorithm produces a solution with cost at most $O(\log n\log k)$ times the cost of the optimum solution that violates the degree constraints by at most a factor of $O(\log^2n)$. This is the first non-trivial result for the problem. While our cost-guarantee is nearly optimal, the degree violation factor of $O(\log^2n)$ is an $O(\log n)$-factor away from the approximation lower bound of $Ω(\log n)$ from the set-cover hardness. The hardness result holds even on the special case of the Degree-Bounded Group Steiner Tree problem on trees (DB-GST-T). With the hope of closing the gap, we study the question of whether the degree violation factor can be made tight for this special case. We answer the question in the affirmative by giving an $(O(\log n\log k), O(\log n))$-bicriteria approximation algorithm for DB-GST-T.

preprint, 2020, arXiv

The Power of Recourse: Better Algorithms for Facility Location in Online and Dynamic Models

In this paper we study the facility location problem in the online with recourse and dynamic algorithm models. In the online with recourse model, clients arrive one by one and our algorithm needs to maintain good solutions at all time steps with only a few changes to the previously made decisions (called recourse). We show that the classic local search technique can lead to a $(1+\sqrt{2}+ε)$-competitive online algorithm for facility location with only $O\left(\frac{\log n}{ε}\log\frac{1}{ε}\right)$ amortized facility and client recourse. We then turn to the dynamic algorithm model for the problem, where the main goal is to design fast algorithms that maintain good solutions at all time steps. We show that the result for online facility location, combined with the randomized local search technique of Charikar and Guha [10], leads to an $O(1+\sqrt{2}+ε)$ approximation dynamic algorithm with amortized update time of $\tilde O(n)$ in the incremental setting. Notice that the running time is almost optimal, since in a general metric space it takes $Ω(n)$ time to specify a new client's position. The approximation factor of our algorithm also matches the best offline analysis of the classic local search algorithm. Finally, we study the fully dynamic model for facility location, where clients can both arrive and depart. Our main result is an $O(1)$-approximation algorithm in this model with $O(|F|)$ preprocessing time and $O(\log^3 D)$ amortized update time for the HST metric spaces. Using the seminal results of Bartal [4] and Fakcharoenphol, Rao and Talwar [17], which show that any arbitrary $N$-point metric space can be embedded into a distribution over HSTs such that the expected distortion is at most $O(\log N)$, we obtain an $O(\log |F|)$ approximation with preprocessing time of $O(|F|^2\log |F|)$ and $O(\log^3 D)$ amortized update time.

preprint, 2020, arXiv

Topology Dependent Bounds For FAQs

In this paper, we prove topology dependent bounds on the number of rounds needed to compute Functional Aggregate Queries (FAQs) studied by Abo Khamis et al. [PODS 2016] in a synchronous distributed network under the model considered by Chattopadhyay et al. [FOCS 2014, SODA 2017]. Unlike the recent work on computing database queries in the Massively Parallel Computation model, in the model of Chattopadhyay et al., nodes can communicate only via private point-to-point channels and we are interested in bounds that work over an arbitrary communication topology. This is the first work to consider more practically motivated problems in this distributed model. For the sake of exposition, we focus on two special problems in this paper: Boolean Conjunctive Query (BCQ) and computing variable/factor marginals in Probabilistic Graphical Models (PGMs). We obtain tight bounds on the number of rounds needed to compute such queries as long as the underlying hypergraph of the query is $O(1)$-degenerate and has $O(1)$-arity. In particular, the $O(1)$-degeneracy condition covers most well-studied queries that are efficiently computable in the centralized computation model like queries with constant treewidth. These tight bounds depend on a new notion of 'width' (namely internal-node-width) for Generalized Hypertree Decompositions (GHDs) of acyclic hypergraphs, which minimizes the number of internal nodes in a sub-class of GHDs. To the best of our knowledge, this width has not been studied explicitly in the theoretical database literature. Finally, we consider the problem of computing the product of a vector with a chain of matrices and prove tight bounds on its round complexity (over the finite field of two elements) using a novel min-entropy based argument.

preprint, 2020, arXiv

Towards PTAS for Precedence Constrained Scheduling via Combinatorial Algorithms

We study the classic problem of scheduling $n$ precedence constrained unit-size jobs on $m = O(1)$ machines so as to minimize the makespan. In a recent breakthrough, Levey and Rothvoss [LR16] developed a $(1+ε)$-approximation for the problem with running time $\exp\Big(\exp\Big(O\big(\frac{m^2}{ε^2}\log^2\log n\big)\Big)\Big)$, via the Sherali-Adams lift of the basic linear programming relaxation for the problem by $\exp\Big(O\big(\frac{m^2}{ε^2}\log^2\log n\big)\Big)$ levels. Garg [Garg18] recently improved the number of levels to $\log ^{O(m^2/ε^2)}n$, and thus the running time to $\exp\big(\log ^{O(m^2/ε^2)}n\big)$, which is quasi-polynomial for constant $m$ and $ε$. In this paper we present an algorithm that achieves $(1+ε)$-approximation for the problem with running time $n^{O\left(\frac{m^4}{ε^3}\log^3\log n\right)}$, which is very close to a polynomial for constant $m$ and $ε$. Unlike the algorithms of Levey-Rothvoss and Garg, which are based on linear-programming hierarchies, our algorithm is purely combinatorial. For this problem, we show that the conditioning operations on the lifted LP solution can be replaced by making guesses about the optimum schedule.

preprint, 2016, arXiv

Better Unrelated Machine Scheduling for Weighted Completion Time via Random Offsets from Non-Uniform Distributions

In this paper we consider the classic scheduling problem of minimizing total weighted completion time on unrelated machines when jobs have release times, i.e., $R | r_{ij} | \sum_j w_j C_j$ using the three-field notation. For this problem, a 2-approximation is known based on a novel convex programming relaxation (Skutella, J. ACM 2001). It has been a long-standing open problem whether one can improve upon this 2-approximation (Open Problem 8 in Schuurman and Woeginger, J. of Sched. 1999). We answer this question in the affirmative by giving a 1.8786-approximation. We achieve this via a surprisingly simple linear program, but a novel rounding algorithm and analysis. A key ingredient of our algorithm is the use of random offsets sampled from non-uniform distributions. We also consider the preemptive version of the problem, i.e., $R | r_{ij},pmtn | \sum_j w_j C_j$. We again use the idea of sampling offsets from non-uniform distributions to give the first better-than-2 approximation for this problem. This improvement also requires use of a configuration LP with variables for each job's complete schedules along with more careful analysis. For both non-preemptive and preemptive versions, we break the approximation barrier of 2 for the first time.

preprint, 2016, arXiv

Constant Approximation Algorithm for Non-Uniform Capacitated Multi-Item Lot-Sizing via Strong Covering Inequalities

We study the non-uniform capacitated multi-item lot-sizing problem. In this problem, there is a set of demands over a planning horizon of $T$ time periods and all demands must be satisfied on time. We can place an order at the beginning of each period $s$, incurring an ordering cost $K_s$. The total quantity of all products ordered at time $s$ cannot exceed a given capacity $C_s$. On the other hand, carrying inventory from one period to the next incurs an inventory holding cost. The goal of the problem is to find a feasible solution that minimizes the sum of ordering and holding costs. Levi et al. (Levi, Lodi and Sviridenko, Mathematics of Operations Research 33(2), 2008) gave a 2-approximation for the problem when the capacities $C_s$ are the same. In this paper, we extend their result to the case of non-uniform capacities. That is, we give a constant approximation algorithm for the capacitated multi-item lot-sizing problem with general capacities. The constant approximation is achieved by adding an exponentially large set of new covering inequalities to the natural facility-location type linear programming relaxation for the problem. Along the way, we reduce the lot-sizing problem to two generalizations of the classic knapsack covering problem. We give LP-based constant approximation algorithms for both generalizations, via the iterative rounding technique.

preprint, 2016, arXiv

Constant Approximation for Capacitated $k$-Median with $(1 + ε)$-Capacity Violation

We study the Capacitated k-Median problem for which existing constant-factor approximation algorithms are all pseudo-approximations that violate either the capacities or the upper bound k on the number of open facilities. Using the natural LP relaxation for the problem, one can only hope to get the violation factor down to 2. Li [SODA'16] introduced a novel LP to go beyond the limit of 2 and gave a constant-factor approximation algorithm that opens $(1 + ε)k$ facilities. We use the configuration LP of Li [SODA'16] to give a constant-factor approximation for the Capacitated k-Median problem in a seemingly harder configuration: we violate only the capacities by $1 + ε$. This result settles the problem as far as pseudo-approximation algorithms are concerned.

preprint, 2016, arXiv

Improved Approximation for Node-Disjoint Paths in Planar Graphs

We study the classical Node-Disjoint Paths (NDP) problem: given an $n$-vertex graph $G$ and a collection $M=\{(s_1,t_1),\ldots,(s_k,t_k)\}$ of pairs of vertices of $G$ called demand pairs, find a maximum-cardinality set of node-disjoint paths connecting the demand pairs. NDP is one of the most basic routing problems and has been studied extensively. Despite this, there are still wide gaps in our understanding of its approximability: the best currently known upper bound of $O(\sqrt n)$ on its approximation ratio is achieved via a simple greedy algorithm, while the best current negative result shows that the problem does not have a better than $Ω(\log^{1/2-δ}n)$-approximation for any constant $δ$, under standard complexity assumptions. Even for planar graphs no better approximation algorithms are known, and to the best of our knowledge, the best negative bound is APX-hardness. Perhaps the biggest obstacle to obtaining better approximation algorithms for NDP is that most currently known approximation algorithms for this type of problem rely on the standard multicommodity flow relaxation, whose integrality gap is $Ω(\sqrt n)$ for NDP, even in planar graphs. In this paper, we break the barrier of $O(\sqrt n)$ on the approximability of the NDP problem in planar graphs and obtain an $\tilde O(n^{9/19})$-approximation. We introduce a new linear programming relaxation of the problem, and a number of new techniques, that we hope will be helpful in designing more powerful algorithms for this and related problems.

preprint, 2016, arXiv

On the computational complexity of minimum-concave-cost flow in a two-dimensional grid

We study the minimum-concave-cost flow problem on a two-dimensional grid. We characterize the computational complexity of this problem based on the number of rows and columns of the grid, the number of different capacities over all arcs, and the location of sources and sinks. The concave cost over each arc is assumed to be evaluated through an oracle machine, i.e., the oracle machine returns the cost over an arc in a single computational step, given the flow value and the arc index. We propose an algorithm whose running time is polynomial in the number of columns of the grid, for the following cases: (1) the grid has a constant number of rows, a constant number of different capacities over all arcs, and sources and sinks in at most two rows; (2) the grid has two rows and a constant number of different capacities over all arcs connecting rows; (3) the grid has a constant number of rows and all sources in one row, with infinite capacity over each arc. These are presumably the most general polynomially solvable cases, since we show the problem becomes NP-hard when any condition in these cases is removed. Our results apply to abundant variants and generalizations of the dynamic lot sizing model, and answer several questions raised in serial supply chain optimization.

preprint, 2016, arXiv

Tight Network Topology Dependent Bounds on Rounds of Communication

We prove tight network topology dependent bounds on the round complexity of computing well-studied $k$-party functions such as set disjointness and element distinctness. Unlike the usual case in the CONGEST model in distributed computing, we fix the function and then vary the underlying network topology. This complements the recent such results on total communication that have received some attention. We also present some applications to distributed graph computation problems. Our main contribution is a proof technique that allows us to reduce the problem on a general graph topology to a relevant two-party communication complexity problem. However, unlike many previous works that also used the same high-level strategy, we do not reason about a two-party communication problem that is induced by a cut in the graph. To 'stitch' back the various lower bounds from the two-party communication problems, we use the notion of timed graph that has seen prior use in network coding. Our reductions use some tools from Steiner tree packing and multi-commodity flow problems that have a delay constraint.

preprint, 2015, arXiv

Approximating capacitated $k$-median with $(1+ε)k$ open facilities

In the capacitated $k$-median (CKM) problem, we are given a set $F$ of facilities, each facility $i \in F$ with a capacity $u_i$, a set $C$ of clients, a metric $d$ over $F \cup C$ and an integer $k$. The goal is to open $k$ facilities in $F$ and connect the clients $C$ to the open facilities such that each facility $i$ is connected to at most $u_i$ clients, so as to minimize the total connection cost. In this paper, we give the first constant approximation for CKM that only violates the cardinality constraint by a factor of $1+ε$. This generalizes the result of [Li15], which only works for the uniform capacitated case. Moreover, the approximation ratio we obtain is $O\big(\frac{1}{ε^2}\log\frac{1}{ε}\big)$, which is an exponential improvement over the ratio of $\exp(O(1/ε^2))$ in [Li15]. The natural LP relaxation for the problem, which almost all previous algorithms for CKM are based on, has unbounded integrality gap even if $(2-ε)k$ facilities can be opened. We introduce a novel configuration LP for the problem that overcomes this integrality gap.

preprint, 2014, arXiv

On $(1,ε)$-Restricted Assignment Makespan Minimization

Makespan minimization on unrelated machines is a classic problem in approximation algorithms. No polynomial time $(2-δ)$-approximation algorithm is known for the problem for constant $δ> 0$. This is true even for certain special cases, most notably the restricted assignment problem where each job has the same load on any machine but can be assigned to one from a specified subset. Recently in a breakthrough result, Svensson [Svensson, 2011] proved that the integrality gap of a certain configuration LP relaxation is upper bounded by $1.95$ for the restricted assignment problem; however, the rounding algorithm is not known to run in polynomial time. In this paper we consider the $(1,\varepsilon)$-restricted assignment problem where each job is either heavy ($p_j = 1$) or light ($p_j = \varepsilon$), for some parameter $\varepsilon > 0$. Our main result is a $(2-δ)$-approximate polynomial time algorithm for the $(1,ε)$-restricted assignment problem for a fixed constant $δ> 0$. Even for this special case, the best polynomial-time approximation factor known so far is 2. We obtain this result by rounding the configuration LP relaxation for this problem. A simple reduction from vertex cover shows that this special case remains NP-hard to approximate to within a factor better than 7/6.

preprint, 2014, arXiv

On Uniform Capacitated $k$-Median Beyond the Natural LP Relaxation

In this paper, we study the uniform capacitated $k$-median problem. Obtaining a constant approximation algorithm for this problem is a notorious open problem; most previous works gave constant approximations by either violating the capacity constraints or the cardinality constraint. Notably, all these algorithms are based on the natural LP-relaxation for the problem. The LP-relaxation has unbounded integrality gap, even when we are allowed to violate the capacity constraints or the cardinality constraint by a factor of $2-ε$. Our result is an $\exp(O(1/ε^2))$-approximation algorithm for the problem that violates the cardinality constraint by a factor of $1+ε$. This is already beyond the capability of the natural LP relaxation, as it has unbounded integrality gap even if we are allowed to open $(2-ε)k$ facilities. Indeed, our result is based on a novel LP for this problem. The version we described is the hard-capacitated version of the problem, as we can only open one facility at each location. This is as opposed to the soft-capacitated version, in which we are allowed to open more than one facility at each location. We give a simple proof that in the uniform capacitated case, the soft-capacitated version and the hard-capacitated version are actually equivalent, up to a small constant loss in the approximation ratio.

preprint, 2013, arXiv

A Constant Factor Approximation Algorithm for Fault-Tolerant k-Median

In this paper, we consider the fault-tolerant $k$-median problem and give the first constant factor approximation algorithm for it. In the fault-tolerant generalization of the classical $k$-median problem, each client $j$ needs to be assigned to at least $r_j \ge 1$ distinct open facilities. The service cost of $j$ is the sum of its distances to the $r_j$ facilities, and the $k$-median constraint restricts the number of open facilities to at most $k$. Previously, a constant factor was known only for the special case when all $r_j$s are the same, and a logarithmic approximation ratio for the general case. In addition, we present the first polynomial time algorithm for the fault-tolerant $k$-median problem on a path or an HST by showing that the corresponding LP always has an integral optimal solution. We also consider the fault-tolerant facility location problem, where the service cost of $j$ can be a weighted sum of its distances to the $r_j$ facilities. We give a simple constant factor approximation algorithm, generalizing several previous results which only work for nonincreasing weight vectors.

preprint, 2013, arXiv

Traffic Congestion in Expanders, $(p,δ)$-Hyperbolic Spaces and Product of Trees

In this paper we define the notion of $(p,δ)$-Gromov hyperbolic space where we relax Gromov's slimness condition to allow that not all but a positive fraction of all triangles are $δ$-slim. Furthermore, we study maximum vertex congestion under geodesic routing and show that it scales as $Ω(p^2n^2/D_n^2)$ where $D_n$ is the diameter of the graph. We also construct a constant degree family of expanders with congestion $Θ(n^2)$ in contrast with random regular graphs that have congestion $O(n\log^{3}(n))$. Finally, we study traffic congestion on graphs defined as product of trees.

preprint, 2012, arXiv

A Polylogarithmic Approximation Algorithm for Edge-Disjoint Paths with Congestion 2

In the Edge-Disjoint Paths with Congestion problem (EDPwC), we are given an undirected $n$-vertex graph $G$, a collection $M=\{(s_1,t_1),\ldots,(s_k,t_k)\}$ of demand pairs and an integer $c$. The goal is to connect the maximum possible number of the demand pairs by paths, so that the maximum edge congestion (the number of paths sharing any edge) is bounded by $c$. When the maximum allowed congestion is $c=1$, this is the classical Edge-Disjoint Paths problem (EDP). The best current approximation algorithm for EDP achieves an $O(\sqrt n)$-approximation, by rounding the standard multi-commodity flow relaxation of the problem. This matches the $Ω(\sqrt n)$ lower bound on the integrality gap of this relaxation. We show an $O(\mathrm{poly}\log k)$-approximation algorithm for EDPwC with congestion $c=2$, by rounding the same multi-commodity flow relaxation. This gives the best possible congestion for a sub-polynomial approximation of EDPwC via this relaxation. Our results are also close to optimal in terms of the number of pairs routed, since EDPwC is known to be hard to approximate to within a factor of $\tilde{Ω}((\log n)^{1/(c+1)})$ for any constant congestion $c$. Prior to our work, the best approximation factor for EDPwC with congestion 2 was $\tilde O(n^{3/7})$, and the best algorithm achieving a polylogarithmic approximation required congestion 14.
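The congestion measure in this abstract can be computed directly from a set of routed paths. The Python sketch below is only an illustration of that definition, not of the rounding algorithm; the function name `max_edge_congestion` and the representation of a path as a list of vertices are assumptions for this example.

```python
from collections import Counter


def max_edge_congestion(paths):
    """Return the largest number of paths that share a single undirected edge.

    paths: iterable of vertex lists, e.g. [s, v1, v2, t].
    """
    usage = Counter()
    for path in paths:
        # Consecutive vertices on a path form its edges; frozenset makes
        # the edge undirected, so (u, v) and (v, u) are counted together.
        for u, v in zip(path, path[1:]):
            usage[frozenset((u, v))] += 1
    return max(usage.values(), default=0)


# Two paths share the edge {2, 3}; the third path is disjoint from both.
routed = [[1, 2, 3], [4, 2, 3], [5, 6]]
congestion = max_edge_congestion(routed)  # 2
```

A routing is a feasible EDP solution exactly when this value is at most 1, and a feasible EDPwC solution for congestion $c$ when it is at most $c$.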

preprint, 2012, arXiv

Approximating $k$-Median via Pseudo-Approximation

We present a novel approximation algorithm for $k$-median that achieves an approximation guarantee of $1+\sqrt{3}+ε$, improving upon the decade-old ratio of $3+ε$. Our approach is based on two components, each of which, we believe, is of independent interest. First, we show that in order to give an $α$-approximation algorithm for $k$-median, it is sufficient to give a pseudo-approximation algorithm that finds an $α$-approximate solution by opening $k+O(1)$ facilities. This is a rather surprising result as there exist instances for which opening $k+1$ facilities may lead to a significantly smaller cost than if only $k$ facilities were opened. Second, we give such a pseudo-approximation algorithm with $α= 1+\sqrt{3}+ε$. Prior to our work, it was not even known whether opening $k + o(k)$ facilities would help improve the approximation ratio.
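The remark that one extra facility can change the cost substantially is easy to see on a toy instance. The following Python sketch evaluates the $k$-median objective and finds an optimum by exhaustive search; it is not the paper's algorithm, and the names `kmedian_cost` and `brute_force_kmedian`, as well as the line-metric instance, are assumptions made for illustration.

```python
from itertools import combinations


def kmedian_cost(clients, open_facilities, dist):
    """Sum over clients of the distance to the nearest open facility."""
    return sum(min(dist(c, f) for f in open_facilities) for c in clients)


def brute_force_kmedian(clients, facilities, k, dist):
    """Try every set of k facilities and return (best set, its cost)."""
    best = min(
        combinations(facilities, k),
        key=lambda chosen: kmedian_cost(clients, chosen, dist),
    )
    return set(best), kmedian_cost(clients, best, dist)


def line(a, b):
    """Metric on the real line, used as the toy distance function."""
    return abs(a - b)


clients = [0, 0, 10]
facilities = [0, 10]
one_open = brute_force_kmedian(clients, facilities, 1, line)
two_open = brute_force_kmedian(clients, facilities, 2, line)
```

On this instance the best single facility is at position 0 with cost 10, while opening both facilities drives the cost to 0, so the ratio between the $k$ and $k+1$ optima is unbounded.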

preprint, 2011, arXiv

On the Integrality Gap of the Directed-Component Relaxation for Steiner Tree

In this note, we show that the integrality gap of the $k$-Directed-Component-Relaxation ($k$-DCR) LP for the Steiner tree problem, introduced by Byrka, Grandoni, Rothvoß and Sanità (STOC 2010), is at most $\ln(4)<1.39$. The proof is constructive: we can efficiently find a Steiner tree whose cost is at most $\ln(4)$ times the cost of the optimal fractional $k$-restricted Steiner tree given by the $k$-DCR LP.

preprint, 2010, arXiv

Hawking Radiation of Fermionic Field and Anomaly in 2+1 Dimensional Black Holes

The method of anomaly cancellation to derive Hawking radiation initiated by Robinson and Wilczek is applied to 2+1 dimensional stationary black holes. Using the dimensional reduction technique, we find that the near-horizon physics for the fermionic field in the background of the general 2+1 dimensional stationary black hole can be approximated by an infinite collection of two-component fermionic fields in a 1+1 dimensional spacetime background coupled with a dilaton field and a U(1) gauge field. By restoring the gauge invariance and the general coordinate covariance for the reduced two-dimensional theory, the Hawking flux and temperature of the black hole are obtained. We apply this method to two types of black holes in three-dimensional spacetime: the BTZ black hole in Einstein gravity and a rotating black hole in Bergshoeff-Hohm-Townsend (BHT) massive gravity.

preprint, 2010, arXiv

On constant factor approximation for earth mover distance over doubling metrics

Given a metric space $(X,d_X)$, the earth mover distance between two distributions over $X$ is defined as the minimum cost of a bipartite matching between the two distributions. The doubling dimension of a metric $(X, d_X)$ is the smallest value $α$ such that every ball in $X$ can be covered by $2^α$ balls of half the radius. We study efficient algorithms for approximating earth mover distance over metrics with bounded doubling dimension. Given a metric $(X, d_X)$, with $|X| = n$, we can use $\tilde O(n^2)$ preprocessing time to create a data structure of size $\tilde O(n^{1 + ε})$, such that subsequently queried EMDs can be $O(α_X/ε)$-approximated in $\tilde O(n)$ time. We also show a weaker form of sketching scheme, which we call "encoding scheme". Given $(X, d_X)$, by using $\tilde O(n^2)$ preprocessing time, every subsequent distribution $μ$ over $X$ can be encoded into $F(μ)$ in $\tilde O(n^{1 + ε})$ time. Given $F(μ)$ and $F(ν)$, the EMD between $μ$ and $ν$ can be $O(α_X/ε)$-approximated in $\tilde O(n^{ε})$ time.
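The matching-based definition of earth mover distance can be spelled out for tiny inputs. The Python sketch below computes the exact value by trying every perfect matching between two equal-size multisets of points; it is an exponential-time illustration of the definition, not the approximation scheme from the paper, and the name `emd_exact` and the equal-size, unit-mass assumption are introduced here for the example.

```python
from itertools import permutations


def emd_exact(a, b, dist):
    """Exact earth mover distance between two equal-size multisets of points,
    each point carrying unit mass, as the minimum-cost perfect matching."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes multisets of equal size")
    # Each permutation of b is one perfect matching with the points of a.
    return min(
        sum(dist(x, y) for x, y in zip(a, perm)) for perm in permutations(b)
    )


def line(x, y):
    """Metric on the real line, used as the toy ground distance."""
    return abs(x - y)


mu = [0, 5]
nu = [1, 7]
distance = emd_exact(mu, nu, line)  # matches 0 with 1 and 5 with 7
```

Here the matching (0, 1), (5, 7) costs 3 while the alternative (0, 7), (5, 1) costs 11, so the earth mover distance is 3.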

preprint2010arXiv

Vertex Sparsifiers and Abstract Rounding Algorithms

The notion of vertex sparsification was introduced in \cite{M}, where it was shown that for any graph $G = (V, E)$ and a subset of $k$ terminals $K \subset V$, there is a polynomial time algorithm to construct a graph $H = (K, E_H)$ on just the terminal set so that simultaneously for all cuts $(A, K-A)$, the value of the minimum cut in $G$ separating $A$ from $K-A$ is approximately the same as the value of the corresponding cut in $H$. We give the first super-constant lower bounds for how well a cut-sparsifier $H$ can simultaneously approximate all minimum cuts in $G$. We prove a lower bound of $Ω(\log^{1/4} k)$ -- this is polynomially related to the known upper bound of $O(\log k/\log \log k)$. This is an exponential improvement on the $Ω(\log \log k)$ bound given in \cite{LM}, which in fact was for a stronger vertex sparsification guarantee and did not apply to cut sparsifiers. Despite this negative result, we show that for many natural problems, we do not need to incur a multiplicative penalty for our reduction. We obtain optimal $O(\log k)$-competitive Steiner oblivious routing schemes, which generalize the results in \cite{R}. We also demonstrate that for a wide range of graph packing problems (which include maximum concurrent flow, maximum multiflow and multicast routing, among others, as special cases), the integrality gap of the linear program is always at most $O(\log k)$ times the integrality gap restricted to trees. This result helps to explain the ubiquity of the $O(\log k)$ guarantees for such problems. Lastly, we use our ideas to give an efficient construction for vertex-sparsifiers that match the current best existential results -- this was previously open. Our algorithm makes novel use of Earth-mover constraints.
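
The phrase "approximately the same" in the abstract above can be made concrete as follows. This is a sketch of one common formalization, stated here as an assumption rather than quoted from the abstract; the symbols $h_K$, $h'$ and $q$ are introduced only for illustration.

```latex
% h_K(A): value of the minimum cut in G separating A from K - A.
% h'(A):  value of the cut (A, K - A) in the sparsifier H = (K, E_H).
% H is a cut-sparsifier of quality q >= 1 if, for every A \subset K,
\[
  h_K(A) \;\le\; h'(A) \;\le\; q \cdot h_K(A).
\]
```

Under this reading, the $Ω(\log^{1/4} k)$ lower bound and the $O(\log k/\log \log k)$ upper bound are both statements about the best achievable $q$ as a function of the number of terminals $k$.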

26 published item(s)

From Knowledge to Action: Outcomes of the 2025 Large Language Model (LLM) Hackathon for Applications in Materials Science and Chemistry

Nested Active-Time Scheduling

Consistent $k$-Median: Simpler, Better and Robust

Covering the Relational Join

Hierarchy-Based Algorithms for Minimizing Makespan under Precedence and Communication Constraints

On Approximating Degree-Bounded Network Design Problems

The Power of Recourse: Better Algorithms for Facility Location in Online and Dynamic Models

Topology Dependent Bounds For FAQs

Towards PTAS for Precedence Constrained Scheduling via Combinatorial Algorithms

Better Unrelated Machine Scheduling for Weighted Completion Time via Random Offsets from Non-Uniform Distributions

Constant Approximation Algorithm for Non-Uniform Capacitated Multi-Item Lot-Sizing via Strong Covering Inequalities

Constant Approximation for Capacitated $k$-Median with $(1 + ε)$-Capacity Violation

Improved Approximation for Node-Disjoint Paths in Planar Graphs

On the computational complexity of minimum-concave-cost flow in a two-dimensional grid

Tight Network Topology Dependent Bounds on Rounds of Communication

Approximating capacitated $k$-median with $(1+ε)k$ open facilities

On $(1,ε)$-Restricted Assignment Makespan Minimization

On Uniform Capacitated $k$-Median Beyond the Natural LP Relaxation

A Constant Factor Approximation Algorithm for Fault-Tolerant k-Median

Traffic Congestion in Expanders, $(p,δ)$--Hyperbolic Spaces and Product of Trees

A Polylogarithimic Approximation Algorithm for Edge-Disjoint Paths with Congestion 2

Approximating $k$-Median via Pseudo-Approximation

On the Integrality Gap of the Directed-Component Relaxation for Steiner Tree

Hawking Radiation of Fermionic Field and Anomaly in 2+1 Dimensional Black Holes

On constant factor approximation for earth mover distance over doubling metrics

Vertex Sparsifiers and Abstract Rounding Algorithms