Source author record

Warren Schudy

Warren Schudy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Data Structures and Algorithms Discrete Mathematics math.PR Machine Learning math.OC

Catalog footprint

What is connected

7works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

Stars: Tera-Scale Graph Building for Clustering and Graph Learning

A fundamental procedure in the analysis of massive datasets is the construction of similarity graphs. Such graphs play a key role for many downstream tasks, including clustering, classification, graph learning, and nearest neighbor search. For these tasks, it is critical to build graphs which are sparse yet still representative of the underlying data. The benefits of sparsity are twofold: firstly, constructing dense graphs is infeasible in practice for large datasets, and secondly, the runtime of downstream tasks is directly influenced by the sparsity of the similarity graph. In this work, we present $\textit{Stars}$: a highly scalable method for building extremely sparse graphs via two-hop spanners, which are graphs where similar points are connected by a path of length at most two. Stars can construct two-hop spanners with significantly fewer similarity comparisons, which are a major bottleneck for learning based models where comparisons are expensive to evaluate. Theoretically, we demonstrate that Stars builds a graph in nearly-linear time, where approximate nearest neighbors are contained within two-hop neighborhoods. In practice, we have deployed Stars for multiple data sets allowing for graph building at the $\textit{Tera-Scale}$, i.e., for graphs with tens of trillions of edges. We evaluate the performance of Stars for clustering and graph learning, and demonstrate 10~1000-fold improvements in pairwise similarity comparisons compared to different baselines, and 2~10-fold improvement in running time without quality loss.

preprint2022arXiv

Practical Large-Scale Linear Programming using Primal-Dual Hybrid Gradient

We present PDLP, a practical first-order method for linear programming (LP) that can solve to the high levels of accuracy that are expected in traditional LP applications. In addition, it can scale to very large problems because its core operation is matrix-vector multiplications. PDLP is derived by applying the primal-dual hybrid gradient (PDHG) method, popularized by Chambolle and Pock (2011), to a saddle-point formulation of LP. PDLP enhances PDHG for LP by combining several new techniques with older tricks from the literature; the enhancements include diagonal preconditioning, presolving, adaptive step sizes, and adaptive restarting. PDLP improves the state of the art for first-order methods applied to LP. We compare PDLP with SCS, an ADMM-based solver, on a set of 383 LP instances derived from MIPLIB 2017. With a target of $10^{-8}$ relative accuracy and 1 hour time limit, PDLP achieves a 6.3x reduction in the geometric mean of solve times and a 4.6x reduction in the number of instances unsolved (from 227 to 49). Furthermore, we highlight standard benchmark instances and a large-scale application (PageRank) where our open-source prototype of PDLP, written in Julia, outperforms a commercial LP solver.

preprint2012arXiv

Bernstein-like Concentration and Moment Inequalities for Polynomials of Independent Random Variables: Multilinear Case

We show that the probability that a multilinear polynomial $f$ of independent random variables exceeds its mean by $λ$ is at most $e^{-λ^2 / (R^q Var(f))}$ for sufficiently small $λ$, where $R$ is an absolute constant. This matches (up to constants in the exponent) what one would expect from the central limit theorem. Our methods handle a variety of types of random variables including Gaussian, Boolean, exponential, and Poisson. Previous work by Kim-Vu and Schudy-Sviridenko gave bounds of the same form that involved less natural parameters in place of the variance.

preprint2012arXiv

Concentration and Moment Inequalities for Polynomials of Independent Random Variables

In this work we design a general method for proving moment inequalities for polynomials of independent random variables. Our method works for a wide range of random variables including Gaussian, Boolean, exponential, Poisson and many others. We apply our method to derive general concentration inequalities for polynomials of independent random variables. We show that our method implies concentration inequalities for some previously open problems, e.g. permanent of a random symmetric matrices. We show that our concentration inequality is stronger than the well-known concentration inequality due to Kim and Vu. The main advantage of our method in comparison with the existing ones is a wide range of random variables we can handle and bounds for previously intractable regimes of high degree polynomials and small expectations. On the negative side we show that even for boolean random variables each term in our concentration inequality is tight.

preprint2010arXiv

Approximation Schemes for the Betweenness Problem in Tournaments and Related Ranking Problems

We design the first polynomial time approximation schemes (PTASs) for the Minimum Betweenness problem in tournaments and some related higher arity ranking problems. This settles the approximation status of the Betweenness problem in tournaments along with other ranking problems which were open for some time now. The results depend on a new technique of dealing with fragile ranking constraints and could be of independent interest.

preprint2010arXiv

Faster Algorithms for Feedback Arc Set Tournament, Kemeny Rank Aggregation and Betweenness Tournament

We study fixed parameter algorithms for three problems: Kemeny rank aggregation, feedback arc set tournament, and betweenness tournament. For Kemeny rank aggregation we give an algorithm with runtime O*(2^O(sqrt{OPT})), where n is the number of candidates, OPT is the cost of the optimal ranking, and O* hides polynomial factors. This is a dramatic improvement on the previously best known runtime of O*(2^O(OPT)). For feedback arc set tournament we give an algorithm with runtime O*(2^O(sqrt{OPT})), an improvement on the previously best known O*(OPT^O(sqrt{OPT})) (Alon, Lokshtanov and Saurabh 2009). For betweenness tournament we give an algorithm with runtime O*(2^O(sqrt{OPT/n})), where n is the number of vertices and OPT is the optimal cost. This improves on the previously known O*(OPT^O(OPT^{1/3}))$ (Saurabh 2009), especially when OPT is small. Unusually we can solve instances with OPT as large as n (log n)^2 in polynomial time!

preprint2010arXiv

Online Correlation Clustering

We study the online clustering problem where data items arrive in an online fashion. The algorithm maintains a clustering of data items into similarity classes. Upon arrival of v, the relation between v and previously arrived items is revealed, so that for each u we are told whether v is similar to u. The algorithm can create a new cluster for v and merge existing clusters. When the objective is to minimize disagreements between the clustering and the input, we prove that a natural greedy algorithm is O(n)-competitive, and this is optimal. When the objective is to maximize agreements between the clustering and the input, we prove that the greedy algorithm is .5-competitive; that no online algorithm can be better than .834-competitive; we prove that it is possible to get better than 1/2, by exhibiting a randomized algorithm with competitive ratio .5+c for a small positive fixed constant c.

Warren Schudy

What is connected

Connect this record

See the researcher in context

Building this map preview

7 published item(s)

Stars: Tera-Scale Graph Building for Clustering and Graph Learning

Practical Large-Scale Linear Programming using Primal-Dual Hybrid Gradient

Bernstein-like Concentration and Moment Inequalities for Polynomials of Independent Random Variables: Multilinear Case

Concentration and Moment Inequalities for Polynomials of Independent Random Variables

Approximation Schemes for the Betweenness Problem in Tournaments and Related Ranking Problems

Faster Algorithms for Feedback Arc Set Tournament, Kemeny Rank Aggregation and Betweenness Tournament

Online Correlation Clustering