Source author record

Ravishankar Krishnaswamy

Ravishankar Krishnaswamy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Data Structures and Algorithms, Machine Learning, Computational Complexity, Databases, math.ST, Performance, Statistics Theory

Catalog footprint

What is connected

11 works

7 topics

4 close collaborators


preprint, 2022, arXiv

Results of the NeurIPS'21 Challenge on Billion-Scale Approximate Nearest Neighbor Search

Despite the broad range of algorithms for Approximate Nearest Neighbor Search, most empirical evaluations of algorithms have focused on smaller datasets, typically of 1 million points~\citep{Benchmark}. However, deploying recent advances in embedding-based techniques for search, recommendation and ranking at scale requires ANNS indices at billion, trillion or larger scale. Barring a few recent papers, there is limited consensus on which algorithms are effective at this scale vis-à-vis their hardware cost. This competition compares ANNS algorithms at billion-scale by hardware cost, accuracy and performance. We set up an open source evaluation framework and leaderboards for both standardized and specialized hardware. The competition involves three tracks. The standard hardware track T1 evaluates algorithms on an Azure VM with limited DRAM, often the bottleneck in serving billion-scale indices, where the embedding data can be hundreds of gigabytes in size. It uses FAISS~\citep{Faiss17} as the baseline. The standard hardware track T2 additionally allows inexpensive SSDs alongside the limited DRAM and uses DiskANN~\citep{DiskANN19} as the baseline. The specialized hardware track T3 allows any hardware configuration, and again uses FAISS as the baseline. We compiled six diverse billion-scale datasets, four newly released for this competition, that span a variety of modalities, data types, dimensions, deep learning models, distance functions and sources. The outcome of the competition was ranked leaderboards of algorithms in each track based on recall at a query throughput threshold. Additionally, for track T3, separate leaderboards were created based on recall as well as cost-normalized and power-normalized query throughput.
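The abstract ranks algorithms by recall. As a rough illustration only, the sketch below computes a commonly used recall@k measure against exact brute-force neighbors under Euclidean distance. The function names `brute_force_knn` and `recall_at_k` are hypothetical, and the competition's exact metric definition and distance functions may differ from what is assumed here.

```python
import numpy as np

def brute_force_knn(data, queries, k):
    """Exact k nearest neighbors under Euclidean distance (ground truth).

    Assumption: small inputs only; this materializes all query-point pairs.
    """
    diffs = queries[:, None, :] - data[None, :, :]
    sq_dists = np.einsum("qnd,qnd->qn", diffs, diffs)
    return np.argsort(sq_dists, axis=1)[:, :k]

def recall_at_k(approx_ids, true_ids):
    """Average fraction of the true k neighbors present in the approximate result."""
    hits = sum(len(set(a) & set(t)) for a, t in zip(approx_ids, true_ids))
    return hits / (true_ids.shape[0] * true_ids.shape[1])

# Toy usage: two queries, k = 2, with a hand-made "approximate" answer.
data = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
queries = np.array([[0.1, 0.0], [5.4, 5.0]])
truth = brute_force_knn(data, queries, k=2)
approx = np.array([[0, 2], [3, 2]])  # one wrong neighbor for the first query
score = recall_at_k(approx, truth)
```

At billion scale the ground truth is normally precomputed once, since brute force over the full dataset per query is exactly the cost an ANNS index is meant to avoid.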

preprint, 2016, arXiv

Online and Dynamic Algorithms for Set Cover

In this paper, we study the set cover problem in the fully dynamic model. In this model, the set of active elements, i.e., those that must be covered at any given time, can change due to element arrivals and departures. The goal is to maintain an algorithmic solution that is competitive with respect to the current optimal solution. This model is popular in both the dynamic algorithms and online algorithms communities. The difference is in the restriction placed on the algorithm: in dynamic algorithms, the running time of the algorithm making updates (called update time) is bounded, while in online algorithms, the number of updates made to the solution (called recourse) is limited. In this paper we show the following results: In the update time setting, we obtain O(log n)-competitiveness with O(f log n) amortized update time, and O(f^3)-competitiveness with O(f^2) update time. The O(log n)-competitive algorithm is the first one to achieve a competitive ratio independent of f in this setting. In the recourse setting, we show a competitive ratio of O(min{log n,f}) with constant amortized recourse. Note that this matches the best offline bounds with just constant recourse, something that is impossible in the classical online model. Our results are based on two algorithmic frameworks in the fully-dynamic model that are inspired by the classic greedy and primal-dual algorithms for offline set cover. We show that both frameworks can be used for obtaining both recourse and update time bounds, thereby demonstrating algorithmic techniques common to these strands of research.
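For orientation, the classic offline greedy algorithm that the abstract cites as inspiration can be sketched as follows. This is the textbook static algorithm, not the paper's fully dynamic framework; the function name `greedy_set_cover` and the input representation (a list of Python sets) are assumptions made for illustration.

```python
def greedy_set_cover(universe, sets):
    """Textbook offline greedy: repeatedly pick the set that covers
    the most still-uncovered elements. Returns indices into `sets`."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Ties are broken in favor of the lowest index.
        best = max(range(len(sets)), key=lambda i: len(sets[i] & uncovered))
        if not sets[best] & uncovered:
            raise ValueError("some elements cannot be covered by any set")
        chosen.append(best)
        uncovered -= sets[best]
    return chosen

# Toy usage: five elements, four candidate sets.
cover = greedy_set_cover({1, 2, 3, 4, 5}, [{1, 2, 3}, {2, 4}, {3, 4}, {4, 5}])
```

The offline greedy rule is known to give a logarithmic approximation; the dynamic setting in the abstract additionally has to bound update time or recourse as elements arrive and depart, which this sketch does not attempt.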

preprint, 2016, arXiv

The Heterogeneous Capacitated $k$-Center Problem

In this paper we initiate the study of the heterogeneous capacitated $k$-center problem: we are given a metric space $X = (F \cup C, d)$ and a collection of capacities. The goal is to open each capacity at a unique facility location in $F$, and also to assign clients to facilities so that the number of clients assigned to any facility is at most the capacity installed; the objective is then to minimize the maximum distance between a client and its assigned facility. If all the capacities $c_i$ are identical, the problem becomes the well-studied uniform capacitated $k$-center problem for which constant-factor approximations are known. The additional choice of determining which capacity should be installed in which location makes our problem considerably different from this problem, as well as from the non-uniform generalizations studied thus far in the literature. In fact, one of our contributions is in relating the heterogeneous problem to special cases of the classical Santa Claus problem. Using this connection, and by designing new algorithms for these special cases, we get the following results: (a) a quasi-polynomial time $O(\log n/\varepsilon)$-approximation where every capacity is violated by a factor of $1+\varepsilon$, and (b) a polynomial time $O(1)$-approximation where every capacity is violated by an $O(\log n)$ factor. We get improved results for the soft-capacities version where we can place multiple facilities in the same location.

preprint, 2016, arXiv

The Non-Uniform k-Center Problem

In this paper, we introduce and study the Non-Uniform k-Center problem (NUkC). Given a finite metric space $(X,d)$ and a collection of balls of radii $\{r_1\geq \cdots \geq r_k\}$, the NUkC problem is to find a placement of their centers on the metric space and find the minimum dilation $\alpha$, such that the union of balls of radius $\alpha\cdot r_i$ around the $i$th center covers all the points in $X$. This problem naturally arises as a min-max vehicle routing problem with fleets of different speeds. The NUkC problem generalizes the classic $k$-center problem when all the $k$ radii are the same (which can be assumed to be $1$ after scaling). It also generalizes the $k$-center with outliers (kCwO) problem when there are $k$ balls of radius $1$ and $\ell$ balls of radius $0$. There are $2$-approximation and $3$-approximation algorithms known for these problems respectively; the former is best possible unless P=NP and the latter has remained unimproved for 15 years. We first observe that no $O(1)$-approximation to the optimal dilation is possible unless P=NP, implying that the NUkC problem is harder to approximate than the above two problems. Our main algorithmic result is an $(O(1),O(1))$-bi-criteria approximation result: we give an $O(1)$-approximation to the optimal dilation; however, we may open $\Theta(1)$ centers of each radius. Our techniques also allow us to prove a simple (uni-criteria), optimal $2$-approximation to the kCwO problem, improving upon the long-standing $3$-factor. Our main technical contribution is a connection between the NUkC problem and the so-called firefighter problems on trees which have been studied recently in the TCS community.
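The $2$-approximation for classic $k$-center mentioned above is often presented as a farthest-first traversal. The sketch below shows that textbook greedy for points in Euclidean space; it covers only the uniform special case, not the NUkC algorithm of the paper. The name `farthest_first_centers` and the choice of the first point as the initial center are illustrative assumptions.

```python
import math

def farthest_first_centers(points, k):
    """Farthest-first traversal for uniform k-center.

    Returns (indices of chosen centers, resulting covering radius).
    Assumption: points are coordinate tuples and distance is Euclidean.
    """
    centers = [0]  # arbitrary first center
    # dist[i] = distance from point i to its nearest chosen center so far
    dist = [math.dist(points[0], p) for p in points]
    while len(centers) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: dist[i])
        centers.append(nxt)
        dist = [min(d, math.dist(points[nxt], p)) for d, p in zip(dist, points)]
    return centers, max(dist)

# Toy usage: two well-separated pairs on a line, k = 2.
centers, radius = farthest_first_centers([(0, 0), (1, 0), (10, 0), (11, 0)], 2)
```

In NUkC the balls have different radii, so a single covering radius like the one returned here is replaced by the dilation $\alpha$ applied to each $r_i$.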

preprint, 2015, arXiv

Online Buy-at-Bulk Network Design

We present the first non-trivial online algorithms for the non-uniform, multicommodity buy-at-bulk (MC-BB) network design problem in undirected and directed graphs. Our competitive ratios qualitatively match the best known approximation factors for the corresponding offline problems. The main engine for our results is an online reduction theorem of MC-BB problems to their single-sink (SS-BB) counterparts. We use the concept of junction-tree solutions (Chekuri et al., FOCS 2006) that play an important role in solving the offline versions of the problem via a greedy subroutine -- an inherently offline procedure. Our main technical contribution is in designing an online algorithm using only the existence of good junction-trees to reduce an MC-BB instance to multiple SS-BB sub-instances. Along the way, we also give the first non-trivial online node-weighted/directed single-sink buy-at-bulk algorithms. In addition to the new results, our generic reduction also yields new proofs of recent results for the online node-weighted Steiner forest and online group Steiner forest problems.

preprint, 2015, arXiv

Relax, no need to round: integrality of clustering formulations

We study exact recovery conditions for convex relaxations of point cloud clustering problems, focusing on two of the most common optimization problems for unsupervised clustering: $k$-means and $k$-median clustering. Motivations for focusing on convex relaxations are: (a) they come with a certificate of optimality, and (b) they are generic tools which are relatively parameter-free, not tailored to specific assumptions over the input. More precisely, we consider the distributional setting where there are $k$ clusters in $\mathbb{R}^m$ and data from each cluster consists of $n$ points sampled from a symmetric distribution within a ball of unit radius. We ask: what is the minimal separation distance between cluster centers needed for convex relaxations to exactly recover these $k$ clusters as the optimal integral solution? For the $k$-median linear programming relaxation we show a tight bound: exact recovery is obtained given arbitrarily small pairwise separation $\varepsilon > 0$ between the balls. In other words, the pairwise center separation is $\Delta > 2+\varepsilon$. Under the same distributional model, the $k$-means LP relaxation fails to recover such clusters at separation as large as $\Delta = 4$. Yet, if we enforce PSD constraints on the $k$-means LP, we get exact cluster recovery at center separation $\Delta > 2\sqrt{2}(1+\sqrt{1/m})$. In contrast, common heuristics such as Lloyd's algorithm (a.k.a. the $k$-means algorithm) can fail to recover clusters in this setting; even with arbitrarily large cluster separation, $k$-means++ with overseeding by any constant factor fails with high probability at exact cluster recovery. To complement the theoretical analysis, we provide an experimental study of the recovery guarantees for these various methods, and discuss several open problems which these experiments suggest.

preprint, 2015, arXiv

The Hardness of Approximation of Euclidean k-means

The Euclidean $k$-means problem is a classical problem that has been extensively studied in the theoretical computer science, machine learning and computational geometry communities. In this problem, we are given a set of $n$ points in Euclidean space $\mathbb{R}^d$, and the goal is to choose $k$ centers in $\mathbb{R}^d$ so that the sum of squared distances of each point to its nearest center is minimized. The best approximation algorithms for this problem include a polynomial time constant factor approximation for general $k$ and a $(1+\varepsilon)$-approximation which runs in time $\mathrm{poly}(n)\, 2^{O(k/\varepsilon)}$. At the other extreme, the only known computational complexity result for this problem is NP-hardness [ADHP'09]. The main difficulty in obtaining hardness results stems from the Euclidean nature of the problem, and the fact that any point in $\mathbb{R}^d$ can be a potential center. This gap in understanding left open the intriguing possibility that the problem might admit a PTAS for all $k,d$. In this paper we provide the first hardness of approximation for the Euclidean $k$-means problem. Concretely, we show that there exists a constant $\varepsilon > 0$ such that it is NP-hard to approximate the $k$-means objective to within a factor of $(1+\varepsilon)$. We show this via an efficient reduction from the vertex cover problem on triangle-free graphs: given a triangle-free graph, the goal is to choose the fewest vertices which are incident on all the edges. Additionally, we give a proof that the current best hardness results for vertex cover can be carried over to triangle-free graphs. To show this we transform $G$, a known hard vertex cover instance, by taking a graph product with a suitably chosen graph $H$, and showing that the size of the (normalized) maximum independent set is almost exactly preserved in the product graph using a spectral analysis, which might be of independent interest.
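The objective described in the abstract, the sum of squared distances from each point to its nearest center, can be evaluated directly for a given set of centers. The helper below is a minimal sketch; the name `kmeans_cost` is hypothetical, and it only evaluates a candidate solution, it does not search for centers.

```python
import numpy as np

def kmeans_cost(points, centers):
    """Sum over points of the squared Euclidean distance to the nearest center."""
    diffs = points[:, None, :] - centers[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)  # shape: (num_points, num_centers)
    return float(sq_dists.min(axis=1).sum())

# Toy usage: three points on a line, two candidate centers.
pts = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
ctrs = np.array([[1.0, 0.0], [10.0, 0.0]])
cost = kmeans_cost(pts, ctrs)
```

Note that the centers here are arbitrary points of the space rather than input points, which is the feature of the Euclidean problem that the abstract identifies as the obstacle to hardness proofs.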

preprint, 2011, arXiv

Approximation Algorithms for Correlated Knapsacks and Non-Martingale Bandits

In the stochastic knapsack problem, we are given a knapsack of size B, and a set of jobs whose sizes and rewards are drawn from a known probability distribution. However, we know the actual size and reward only when the job completes. How should we schedule jobs to maximize the expected total reward? We know O(1)-approximations when we assume that (i) rewards and sizes are independent random variables, and (ii) we cannot prematurely cancel jobs. What can we say when either or both of these assumptions are changed? The stochastic knapsack problem is of interest in its own right, but techniques developed for it are applicable to other stochastic packing problems. Indeed, ideas for this problem have been useful for budgeted learning problems, where one is given several arms which evolve in a specified stochastic fashion with each pull, and the goal is to pull the arms a total of B times to maximize the reward obtained. Much recent work on this problem focuses on the case when the evolution of the arms follows a martingale, i.e., when the expected reward from the future is the same as the reward at the current state. What can we say when the rewards do not form a martingale? In this paper, we give constant-factor approximation algorithms for the stochastic knapsack problem with correlations and/or cancellations, and also for budgeted learning problems where the martingale condition is not satisfied. Indeed, we can show that previously proposed LP relaxations have large integrality gaps. We propose new time-indexed LP relaxations, and convert the fractional solutions into distributions over strategies, and then use the LP values and the time ordering information from these strategies to devise a randomized adaptive scheduling algorithm. We hope our LP formulation and decomposition methods may provide a new way to address other correlated bandit problems with more general contexts.

preprint, 2011, arXiv

Online Primal-Dual For Non-linear Optimization with Applications to Speed Scaling

We reinterpret some online greedy algorithms for a class of nonlinear "load-balancing" problems as solving a mathematical program online. For example, we consider the problem of assigning jobs to (unrelated) machines to minimize the sum of the $\alpha$-th powers of the loads plus assignment costs (the online Generalized Assignment Problem); or choosing paths to connect terminal pairs to minimize the $\alpha$-th powers of the edge loads (online routing with speed-scalable routers). We give analyses of these online algorithms using the dual of the primal program as a lower bound for the optimal algorithm, much in the spirit of online primal-dual results for linear problems. We then observe that a wide class of uni-processor speed scaling problems (with essentially arbitrary scheduling objectives) can be viewed as such load balancing problems with linear assignment costs. This connection gives new algorithms for problems that had resisted solutions using the dominant potential function approaches used in the speed scaling literature, as well as alternate, cleaner proofs for other known results.
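To make the load-balancing objective concrete, here is a sketch of one natural online greedy rule: each arriving job goes to the machine with the smallest marginal increase in (load to the power alpha) plus assignment cost. This is an illustrative assumption about what a greedy rule could look like; the algorithms analyzed in the paper may scale or modify the rule. The name `greedy_assign` and the job representation are hypothetical.

```python
def greedy_assign(jobs, num_machines, alpha):
    """Online greedy for: minimize sum_i load_i**alpha + total assignment cost.

    jobs: iterable of (sizes, costs), where sizes[i] and costs[i] are the
    job's size and assignment cost on machine i (unrelated machines).
    Returns (list of chosen machines, final objective value).
    """
    loads = [0.0] * num_machines
    assignment = []
    total_cost = 0.0
    for sizes, costs in jobs:
        def marginal(i):
            # Increase in the objective if this job is placed on machine i.
            return (loads[i] + sizes[i]) ** alpha - loads[i] ** alpha + costs[i]
        best = min(range(num_machines), key=marginal)
        loads[best] += sizes[best]
        total_cost += costs[best]
        assignment.append(best)
    objective = sum(load ** alpha for load in loads) + total_cost
    return assignment, objective

# Toy usage: three jobs, two machines, alpha = 2.
jobs = [([1, 1], [0, 0]), ([1, 1], [0, 0]), ([2, 2], [0, 5])]
assignment, objective = greedy_assign(jobs, num_machines=2, alpha=2)
```

Decisions are irrevocable and made one job at a time, which is what makes the rule online; the abstract's contribution is the dual-based analysis of such rules, which this sketch does not reproduce.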

preprint, 2011, arXiv

Scalably Scheduling Power-Heterogeneous Processors

We show that a natural online algorithm for scheduling jobs on a heterogeneous multiprocessor, with arbitrary power functions, is scalable for the objective function of weighted flow plus energy.

preprint, 2009, arXiv

Scheduling with Outliers

In classical scheduling problems, we are given jobs and machines, and have to schedule all the jobs to minimize some objective function. What if each job has a specified profit, and we are no longer required to process all jobs -- we can schedule any subset of jobs whose total profit is at least a (hard) target profit requirement, while still approximately minimizing the objective function? We refer to this class of problems as scheduling with outliers. This model was initiated by Charikar and Khuller (SODA'06) on the minimum max-response time in broadcast scheduling. We consider three other well-studied scheduling objectives: the generalized assignment problem, average weighted completion time, and average flow time, and provide LP-based approximation algorithms for them. For the minimum average flow time problem on identical machines, we give a logarithmic approximation algorithm for the case of unit profits based on rounding an LP relaxation; we also show a matching integrality gap. For the average weighted completion time problem on unrelated machines, we give a constant factor approximation. The algorithm is based on randomized rounding of the time-indexed LP relaxation strengthened by the knapsack-cover inequalities. For the generalized assignment problem with outliers, we give a simple reduction to GAP without outliers to obtain an algorithm whose makespan is within 3 times the optimum makespan, and whose cost is at most (1 + ε) times the optimal cost.
