Researcher profile

Ronitt Rubinfeld

Ronitt Rubinfeld contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
4topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2022arXiv

Improved Massively Parallel Computation Algorithms for MIS, Matching, and Vertex Cover

We present $O(\log\log n)$-round algorithms in the Massively Parallel Computation (MPC) model, with $\tilde{O}(n)$ memory per machine, that compute a maximal independent set, a $1+ε$ approximation of maximum matching, and a $2+ε$ approximation of minimum vertex cover, for any $n$-vertex graph and any constant $ε>0$. These improve the state of the art as follows: - Our MIS algorithm leads to a simple $O(\log\log Δ)$-round MIS algorithm in the Congested Clique model of distributed computing, which improves on the $\tilde{O}(\sqrt{\log Δ})$-round algorithm of Ghaffari [PODC'17]. - Our $O(\log\log n)$-round $(1+ε)$-approximate maximum matching algorithm simplifies or improves on the following prior work: $O(\log^2\log n)$-round $(1+ε)$-approximation algorithm of Czumaj et al. [STOC'18] and $O(\log\log n)$-round $(1+ε)$-approximation algorithm of Assadi et al. [SODA'19]. - Our $O(\log\log n)$-round $(2+ε)$-approximate minimum vertex cover algorithm improves on an $O(\log\log n)$-round $O(1)$-approximation of Assadi et al. [arXiv'17].

preprint2022arXiv

Massively Parallel Algorithms for Small Subgraph Counting

Over the last two decades, frameworks for distributed-memory parallel computation, such as MapReduce, Hadoop, Spark and Dryad, have gained significant popularity with the growing prevalence of large network datasets. The Massively Parallel Computation (MPC) model is the de-facto standard for studying graph algorithms in these frameworks theoretically. Subgraph counting is one such fundamental problem in analyzing massive graphs, with the main algorithmic challenges centering on designing methods which are both scalable and accurate. Given a graph $G=(V, E)$ with $n$ vertices, $m$ edges and $T$ triangles, our first result is an algorithm that outputs a $(1+\varepsilon)$-approximation to $T$, with asymptotically \emph{optimal round and total space complexity} provided any $S \geq \max{(\sqrt m, n^2/m)}$ space per machine and assuming $T=Ω(\sqrt{m/n})$. Our result gives a quadratic improvement on the bound on $T$ over previous works. We also provide a simple extension of our result to counting \emph{any} subgraph of $k$ size for constant $k \geq 1$. Our second result is an $O_{\varepsilon}(\log \log n)$-round algorithm for exactly counting the number of triangles, whose total space usage is parametrized by the \emph{arboricity} $α$ of the input graph. We extend this result to exactly counting $k$-cliques for any constant $k$. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most $5$ can be implemented in the MPC model in total space.

preprint2022arXiv

Triangle and Four Cycle Counting with Predictions in Graph Streams

We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature. Recently, (Hsu 2018) and (Jiang 2020) applied machine learning techniques in other data stream problems, using a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms that did not use oracles. In this paper, we explore the power of a "heavy edge" oracle in multiple graph edge streaming models. In the adjacency list model, we present a one-pass triangle counting algorithm improving upon the previous space upper bounds without such an oracle. In the arbitrary order model, we present algorithms for both triangle and four cycle estimation with fewer passes and the same space complexity as in previous algorithms, and we show several of these bounds are optimal. We analyze our algorithms under several noise models, showing that the algorithms perform well even when the oracle errs. Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases of our algorithms, where the first pass or random order was used to implement the heavy edge oracle. Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms.

preprint2021arXiv

Local Access to Random Walks

For a graph $G$ on $n$ vertices, naively sampling the position of a random walk of at time $t$ requires work $Ω(t)$. We desire local access algorithms supporting $\text{position}(G,s,t)$ queries, which return the position of a random walk from some start vertex $s$ at time $t$, where the joint distribution of returned positions is $1/\text{poly}(n)$ close to the uniform distribution over such walks in $\ell_1$ distance. We first give an algorithm for local access to walks on undirected regular graphs with $\widetilde{O}(\frac{1}{1-λ}\sqrt{n})$ runtime per query, where $λ$ is the second-largest eigenvalue in absolute value. Since random $d$-regular graphs are expanders with high probability, this gives an $\widetilde{O}(\sqrt{n})$ algorithm for $G(n,d)$, which improves on the naive method for small numbers of queries. We then prove that no that algorithm with sub-constant error given probe access to random $d$-regular graphs can have runtime better than $Ω(\sqrt{n}/\log(n))$ per query in expectation, obtaining a nearly matching lower bound. We further show an $Ω(n^{1/4})$ runtime per query lower bound even with an oblivious adversary (i.e. when the query sequence is fixed in advance). We then show that for families of graphs with additional group theoretic structure, dramatically better results can be achieved. We give local access to walks on small-degree abelian Cayley graphs, including cycles and hypercubes, with runtime $\text{polylog}(n)$ per query. This also allows for efficient local access to walks on $\text{polylog}$ degree expanders. We extend our results to graphs constructed using the tensor product (giving local access to walks on degree $n^ε$ graphs for any $ε\in (0,1]$) and Cartesian product.

preprint2020arXiv

Monotone probability distributions over the Boolean cube can be learned with sublinear samples

A probability distribution over the Boolean cube is monotone if flipping the value of a coordinate from zero to one can only increase the probability of an element. Given samples of an unknown monotone distribution over the Boolean cube, we give (to our knowledge) the first algorithm that learns an approximation of the distribution in statistical distance using a number of samples that is sublinear in the domain. To do this, we develop a structural lemma describing monotone probability distributions. The structural lemma has further implications to the sample complexity of basic testing tasks for analyzing monotone probability distributions over the Boolean cube: We use it to give nontrivial upper bounds on the tasks of estimating the distance of a monotone distribution to uniform and of estimating the support size of a monotone distribution. In the setting of monotone probability distributions over the Boolean cube, our algorithms are the first to have sample complexity lower than known lower bounds for the same testing tasks on arbitrary (not necessarily monotone) probability distributions. One further consequence of our learning algorithm is an improved sample complexity for the task of testing whether a distribution on the Boolean cube is monotone.

preprint2020arXiv

Online Page Migration with ML Advice

We consider online algorithms for the {\em page migration problem} that use predictions, potentially imperfect, to improve their performance. The best known online algorithms for this problem, due to Westbrook'94 and Bienkowski et al'17, have competitive ratios strictly bounded away from 1. In contrast, we show that if the algorithm is given a prediction of the input sequence, then it can achieve a competitive ratio that tends to $1$ as the prediction error rate tends to $0$. Specifically, the competitive ratio is equal to $1+O(q)$, where $q$ is the prediction error rate. We also design a ``fallback option'' that ensures that the competitive ratio of the algorithm for {\em any} input sequence is at most $O(1/q)$. Our result adds to the recent body of work that uses machine learning to improve the performance of ``classic'' algorithms.

preprint2020arXiv

Rapid Approximate Aggregation with Distribution-Sensitive Interval Guarantees

Aggregating data is fundamental to data analytics, data exploration, and OLAP. Approximate query processing (AQP) techniques are often used to accelerate computation of aggregates using samples, for which confidence intervals (CIs) are widely used to quantify the associated error. CIs used in practice fall into two categories: techniques that are tight but not correct, i.e., they yield tight intervals but only offer asymptotic guarantees, making them unreliable, or techniques that are correct but not tight, i.e., they offer rigorous guarantees, but are overly conservative, leading to confidence intervals that are too loose to be useful. In this paper, we develop a CI technique that is both correct and tighter than traditional approaches. Starting from conservative CIs, we identify two issues they often face: pessimistic mass allocation (PMA) and phantom outlier sensitivity (PHOS). By developing a novel range-trimming technique for eliminating PHOS and pairing it with known CI techniques without PMA, we develop a technique for computing CIs with strong guarantees that requires fewer samples for the same width. We implement our techniques underneath a sampling-optimized in-memory column store and show how to accelerate queries involving aggregates on a real dataset with speedups of up to 124x over traditional AQP-with-guarantees and more than 1000x over exact methods.