Researcher profile

Xiaorui Sun

Xiaorui Sun contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
2topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2022arXiv

Fully Dynamic s-t Edge Connectivity in Subpolynomial Time

We present a deterministic fully dynamic algorithm to answer $c$-edge connectivity queries on pairs of vertices in $n^{o(1)}$ worst case update and query time for any positive integer $c = (\log n)^{o(1)}$ for a graph with $n$ vertices. Previously, only polylogarithmic and $O(\sqrt{n})$ worst case update time fully dynamic algorithms were known for answering $1$, $2$ and $3$-edge connectivity queries respectively [Henzinger and King 1995, Frederikson 1997, Galil and Italiano 1991]. Our result extends the $c$-edge connectivity vertex sparsifier [Chalermsook et al. 2021] to a multi-level sparsification framework. As our main technical contribution, we present a novel update algorithm for the multi-level $c$-edge connectivity vertex sparsifier with subpolynomial update time.

preprint2022arXiv

Minor Sparsifiers and the Distributed Laplacian Paradigm

We study distributed algorithms built around minor-based vertex sparsifiers, and give the first algorithm in the CONGEST model for solving linear systems in graph Laplacian matrices to high accuracy. Our Laplacian solver has a round complexity of $O(n^{o(1)}(\sqrt{n}+D))$, and thus almost matches the lower bound of $\widetildeΩ(\sqrt{n}+D)$, where $n$ is the number of nodes in the network and $D$ is its diameter. We show that our distributed solver yields new sublinear round algorithms for several cornerstone problems in combinatorial optimization. This is achieved by leveraging the powerful algorithmic framework of Interior Point Methods (IPMs) and the Laplacian paradigm in the context of distributed graph algorithms, which entails numerically solving optimization problems on graphs via a series of Laplacian systems. Problems that benefit from our distributed algorithmic paradigm include exact mincost flow, negative weight shortest paths, maxflow, and bipartite matching on sparse directed graphs. For the maxflow problem, this is the first exact distributed algorithm that applies to directed graphs, while the previous work by [Ghaffari et al. SICOMP'18] considered the approximate setting and works only for undirected graphs. For the mincost flow and the negative weight shortest path problems, our results constitute the first exact distributed algorithms running in a sublinear number of rounds. Given that the hybrid between IPMs and the Laplacian paradigm has proven useful for tackling numerous optimization problems in the centralized setting, we believe that our distributed solver will find future applications.

preprint2020arXiv

Approximating LCS in Linear Time: Beating the $\sqrt{n}$ Barrier

Longest common subsequence (LCS) is one of the most fundamental problems in combinatorial optimization. Apart from theoretical importance, LCS has enormous applications in bioinformatics, revision control systems, and data comparison programs. Although a simple dynamic program computes LCS in quadratic time, it has been recently proven that the problem admits a conditional lower bound and may not be solved in truly subquadratic time. In addition to this, LCS is notoriously hard with respect to approximation algorithms. Apart from a trivial sampling technique that obtains a $n^{x}$ approximation solution in time $O(n^{2-2x})$ nothing else is known for LCS. This is in sharp contrast to its dual problem edit distance for which several linear time solutions are obtained in the past two decades.

preprint2020arXiv

Fast Noise Removal for $k$-Means Clustering

This paper considers $k$-means clustering in the presence of noise. It is known that $k$-means clustering is highly sensitive to noise, and thus noise should be removed to obtain a quality solution. A popular formulation of this problem is called $k$-means clustering with outliers. The goal of $k$-means clustering with outliers is to discard up to a specified number $z$ of points as noise/outliers and then find a $k$-means solution on the remaining data. The problem has received significant attention, yet current algorithms with theoretical guarantees suffer from either high running time or inherent loss in the solution quality. The main contribution of this paper is two-fold. Firstly, we develop a simple greedy algorithm that has provably strong worst case guarantees. The greedy algorithm adds a simple preprocessing step to remove noise, which can be combined with any $k$-means clustering algorithm. This algorithm gives the first pseudo-approximation-preserving reduction from $k$-means with outliers to $k$-means without outliers. Secondly, we show how to construct a coreset of size $O(k \log n)$. When combined with our greedy algorithm, we obtain a scalable, near linear time algorithm. The theoretical contributions are verified experimentally by demonstrating that the algorithm quickly removes noise and obtains a high-quality clustering.

preprint2020arXiv

On the Hardness of Massively Parallel Computation

We investigate whether there are inherent limits of parallelization in the (randomized) massively parallel computation (MPC) model by comparing it with the (sequential) RAM model. As our main result, we show the existence of hard functions that are essentially not parallelizable in the MPC model. Based on the widely-used random oracle methodology in cryptography with a cryptographic hash function $h:\{0,1\}^n \rightarrow \{0,1\}^n$ computable in time $t_h$, we show that there exists a function that can be computed in time $O(T\cdot t_h)$ and space $S$ by a RAM algorithm, but any MPC algorithm with local memory size $s < S/c$ for some $c>1$ requires at least $\tildeΩ(T)$ rounds to compute the function, even in the average case, for a wide range of parameters $n \leq S \leq T \leq 2^{n^{1/4}}$. Our result is almost optimal in the sense that by taking $T$ to be much larger than $t_h$, \textit{e.g.}, $T$ to be sub-exponential in $t_h$, to compute the function, the round complexity of any MPC algorithm with small local memory size is asymptotically the same (up to a polylogarithmic factor) as the time complexity of the RAM algorithm. Our result is obtained by adapting the so-called compression argument from the data structure lower bounds and cryptography literature to the context of massively parallel computation.