Source author record

Talya Eden

Talya Eden appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Data Structures and Algorithms; Computational Complexity; Discrete Mathematics; Distributed, Parallel, and Cluster Computing; Machine Learning; math.ST; Statistics Theory

Catalog footprint

What is connected

5 works

7 topics

4 close collaborators


Preprint, 2022, arXiv

Bias Reduction for Sum Estimation

In classical statistics and distribution testing, it is often assumed that elements can be sampled from some distribution $P$, and that when an element $x$ is sampled, the probability $P(x)$ of sampling $x$ is also known. Recent work in distribution testing has shown that many algorithms are robust in the sense that they still produce correct output if the elements are drawn from any distribution $Q$ that is sufficiently close to $P$. This phenomenon raises interesting questions: under what conditions is a "noisy" distribution $Q$ sufficient, and what is the algorithmic cost of coping with this noise? We investigate these questions for the problem of estimating the sum of a multiset of $N$ real values $x_1, \ldots, x_N$. This problem is well-studied in the statistical literature in the case $P = Q$, where the Hansen-Hurwitz estimator is frequently used. We assume that for some known distribution $P$, values are sampled from a distribution $Q$ that is pointwise $γ$-close to $P$. For every positive integer $k$ we define an estimator $ζ_k$ for $μ = \sum_i x_i$ whose bias is proportional to $γ^k$ (where our $ζ_1$ reduces to the classical Hansen-Hurwitz estimator). As a special case, we show that if $Q$ is pointwise $γ$-close to uniform and all $x_i \in \{0, 1\}$, then for any $ε > 0$ we can estimate $μ$ to within additive error $εN$ using $m = Θ(N^{1-1/k} / ε^{2/k})$ samples, where $k = \lceil (\log ε)/(\log γ) \rceil$. We show that this sample complexity is essentially optimal. Our bounds show that the sample complexity need not vary uniformly with the desired error parameter $ε$: for some values of $ε$, perturbations in its value have no asymptotic effect on the sample complexity, while for other values, any decrease in its value results in an asymptotically larger sample complexity.
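The classical Hansen-Hurwitz estimator, which the abstract identifies with $ζ_1$, averages $x_i / P(i)$ over the sampled indices $i$. The sketch below is a minimal illustration under stated assumptions: $P$ and $Q$ are given as Python lists aligned with the values, and all helper names are hypothetical. It does not implement the higher-order estimators $ζ_k$. `expected_hh` computes the exact expectation of one term when indices are drawn from $Q$ but weighted by $P$, which exposes the bias the paper targets, and `choose_k` evaluates $k = \lceil (\log ε)/(\log γ) \rceil$ from the sample bound.

```python
import math
import random


def hansen_hurwitz(values, p, sample_indices):
    """Hansen-Hurwitz estimate of sum(values): the mean of x_i / P(i) over sampled indices."""
    return sum(values[i] / p[i] for i in sample_indices) / len(sample_indices)


def expected_hh(values, p, q):
    """Exact expectation of one Hansen-Hurwitz term when indices are drawn from q
    but weighted by the assumed-known p.

    Equals sum(values) when q == p; otherwise the gap is the bias caused by noise.
    """
    return sum(qi * xi / pi for xi, pi, qi in zip(values, p, q))


def choose_k(eps, gamma):
    """k = ceil(log(eps) / log(gamma)), the order appearing in the sample bound."""
    return math.ceil(math.log(eps) / math.log(gamma))


if __name__ == "__main__":
    rng = random.Random(0)
    n = 1000
    values = [rng.randint(0, 1) for _ in range(n)]
    p = [1.0 / n] * n                                  # uniform P, known to the estimator
    samples = [rng.randrange(n) for _ in range(5000)]  # here Q = P, so no noise
    print(sum(values), hansen_hurwitz(values, p, samples))
```

For instance, with values `[1, 1, 0, 0]`, uniform $P$ and `q = [0.3, 0.3, 0.2, 0.2]`, `expected_hh` returns about 2.4 instead of the true sum 2, because the indices holding the ones are oversampled relative to $P$.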

Preprint, 2022, arXiv

Massively Parallel Algorithms for Small Subgraph Counting

Over the last two decades, frameworks for distributed-memory parallel computation, such as MapReduce, Hadoop, Spark and Dryad, have gained significant popularity with the growing prevalence of large network datasets. The Massively Parallel Computation (MPC) model is the de facto standard for theoretically studying graph algorithms in these frameworks. Subgraph counting is one such fundamental problem in analyzing massive graphs, with the main algorithmic challenges centering on designing methods that are both scalable and accurate. Given a graph $G=(V, E)$ with $n$ vertices, $m$ edges and $T$ triangles, our first result is an algorithm that outputs a $(1+\varepsilon)$-approximation to $T$, with asymptotically optimal round and total space complexity, provided any $S \geq \max(\sqrt{m}, n^2/m)$ space per machine and assuming $T=Ω(\sqrt{m/n})$. Our result gives a quadratic improvement on the bound on $T$ over previous works. We also provide a simple extension of our result to counting any subgraph of size $k$ for constant $k \geq 1$. Our second result is an $O_{\varepsilon}(\log \log n)$-round algorithm for exactly counting the number of triangles, whose total space usage is parametrized by the arboricity $α$ of the input graph. We extend this result to exactly counting $k$-cliques for any constant $k$. Finally, we prove that a recent result of Bera, Pashanasangi and Seshadhri (ITCS 2020) for exactly counting all subgraphs of size at most $5$ can be implemented in the MPC model in total space.
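For intuition about exact triangle counting, here is a sequential sketch of the standard degree-ordering technique: each edge is oriented toward the endpoint of higher (degree, label) rank, and each triangle is then counted exactly once at its lowest-ranked vertex. This is not the paper's MPC algorithm; it shows only a single-machine counting step, and it assumes vertex labels are mutually comparable (for example, integers).

```python
from collections import defaultdict


def count_triangles(edges):
    """Exactly count triangles in a simple undirected graph given as an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops; duplicate edges collapse in the sets
            adj[u].add(v)
            adj[v].add(u)

    # Rank vertices by (degree, label) so every edge gets a unique orientation.
    rank = {v: (len(nbrs), v) for v, nbrs in adj.items()}
    out = {v: {w for w in nbrs if rank[w] > rank[v]} for v, nbrs in adj.items()}

    # A triangle a < b < c (by rank) is found only from a, via the out-edge a -> b,
    # as the common out-neighbor c, so each triangle is counted exactly once.
    total = 0
    for v, higher in out.items():
        for w in higher:
            total += len(higher & out[w])
    return total


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(count_triangles(k4))  # K4 has 4 triangles, so this prints 4
```

Orienting toward higher degree keeps every out-set small: all out-neighbors of $v$ have degree at least that of $v$, so the out-degree is at most $\sqrt{2m}$.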

Preprint, 2022, arXiv

Triangle and Four Cycle Counting with Predictions in Graph Streams

We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature. Recently, Hsu (2018) and Jiang (2020) applied machine learning techniques to other data stream problems, using a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms that did not use oracles. In this paper, we explore the power of a "heavy edge" oracle in multiple graph edge streaming models. In the adjacency list model, we present a one-pass triangle counting algorithm that improves upon the previous space upper bounds achieved without such an oracle. In the arbitrary order model, we present algorithms for both triangle and four cycle estimation with fewer passes and the same space complexity as in previous algorithms, and we show that several of these bounds are optimal. We analyze our algorithms under several noise models, showing that the algorithms perform well even when the oracle errs. Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases of our algorithms, where the first pass or random order was used to implement the heavy edge oracle. Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms.
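To make the role of a heavy edge oracle concrete, the following is a simplified one-pass sketch for the arbitrary order model; it is not the algorithm from the paper. It assumes a hypothetical predicate `is_heavy(u, v)` standing in for the trained oracle, always stores predicted-heavy edges, and stores every other edge independently with probability `p`. When an arriving edge closes a triangle with two stored edges, the triangle is weighted by the inverse of their storage probabilities, so the estimate is unbiased as long as the stream has no duplicate edges; in this sketch, oracle mistakes change only the space usage and variance, not the expectation.

```python
import random
from collections import defaultdict


def streaming_triangle_estimate(stream, is_heavy, p, seed=0):
    """One-pass triangle estimate over an arbitrary-order stream of distinct undirected edges.

    is_heavy: hypothetical oracle predicate; predicted-heavy edges are always stored.
    p: probability of storing an edge that the oracle does not flag as heavy.
    """
    rng = random.Random(seed)
    stored = defaultdict(dict)  # stored[u][w] = probability with which edge {u, w} was kept
    estimate = 0.0
    for u, v in stream:
        if u == v:
            continue  # skip self-loops
        # Triangles {u, v, w} whose other two edges arrived earlier and were stored.
        nu, nv = stored[u], stored[v]
        if len(nu) > len(nv):
            nu, nv = nv, nu  # scan the smaller stored neighborhood
        for w, q1 in nu.items():
            q2 = nv.get(w)
            if q2 is not None:
                estimate += 1.0 / (q1 * q2)  # inverse-probability weight
        # Decide whether to keep the current edge for future triangles.
        q = 1.0 if is_heavy(u, v) else p
        if rng.random() < q:
            stored[u][v] = q
            stored[v][u] = q
    return estimate


if __name__ == "__main__":
    k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
    print(streaming_triangle_estimate(k4, lambda u, v: False, p=1.0))  # exact when p = 1
```

With `p = 1.0` every edge is stored and the sketch returns the exact count (4.0 for $K_4$); smaller `p` trades space for variance.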

Preprint, 2016, arXiv

Tolerant Junta Testing and the Connection to Submodular Optimization and Function Isomorphism

A function $f\colon \{-1,1\}^n \to \{-1,1\}$ is a $k$-junta if it depends on at most $k$ of its variables. We consider the problem of tolerant testing of $k$-juntas, where the testing algorithm must accept any function that is $ε$-close to some $k$-junta and reject any function that is $ε'$-far from every $k'$-junta for some $ε' = O(ε)$ and $k' = O(k)$. Our first result is an algorithm that solves this problem with query complexity polynomial in $k$ and $1/ε$. This result is obtained via a new polynomial-time approximation algorithm for submodular function minimization (SFM) under large cardinality constraints, which holds even when given only approximate oracle access to the function. Our second result considers the case where $k' = k$. We show how to obtain a smooth tradeoff between the amount of tolerance and the query complexity in this setting. Specifically, we design an algorithm that, given $ρ \in (0,1/2)$, accepts any function that is $\frac{ερ}{16}$-close to some $k$-junta and rejects any function that is $ε$-far from every $k$-junta. The query complexity of the algorithm is $O\big( \frac{k\log k}{ερ(1-ρ)^k} \big)$. Finally, we show how to apply the second result to the problem of tolerant isomorphism testing between two unknown Boolean functions $f$ and $g$. We give an algorithm for this problem whose query complexity depends only on the (unknown) smallest $k$ such that either $f$ or $g$ is close to being a $k$-junta.
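The definitions of a $k$-junta and of $ε$-closeness can be checked by brute force for small $n$. This exhaustive sketch runs in time exponential in $n$, so it is only for toy examples; it is not a tester and makes no query-complexity claims. It assumes distance is measured under the uniform distribution on $\{-1,1\}^n$, and all helper names are hypothetical. The closest function depending only on a coordinate set $S$ takes, for each setting of the coordinates in $S$, the majority value of $f$ over the remaining coordinates.

```python
from itertools import combinations, product


def relevant_variables(f, n):
    """Indices i such that flipping x_i changes f(x) for some x in {-1,1}^n."""
    relevant = set()
    for x in product((-1, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            y = list(x)
            y[i] = -y[i]
            if f(tuple(y)) != fx:
                relevant.add(i)
    return relevant


def is_k_junta(f, n, k):
    """True if f depends on at most k of its n variables."""
    return len(relevant_variables(f, n)) <= k


def distance_to_juntas_on(f, n, S):
    """Fraction of inputs where f disagrees with the closest function depending only on S."""
    counts = {}  # restriction of x to S -> [number of +1 outputs, number of -1 outputs]
    for x in product((-1, 1), repeat=n):
        key = tuple(x[i] for i in S)
        c = counts.setdefault(key, [0, 0])
        c[0 if f(x) == 1 else 1] += 1
    # Using the majority value on each restriction leaves only the minority as errors.
    return sum(min(c) for c in counts.values()) / 2 ** n


def distance_to_k_juntas(f, n, k):
    """Distance from f to the nearest k-junta (supersets of S can only help, so |S| = k suffices)."""
    k = min(k, n)
    return min(distance_to_juntas_on(f, n, S) for S in combinations(range(n), k))


if __name__ == "__main__":
    maj3 = lambda x: 1 if sum(x) > 0 else -1
    print(distance_to_k_juntas(maj3, 3, 1))  # 0.25: a dictator agrees with majority on 6 of 8 inputs
```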

Preprint, 2015, arXiv

Approximately Counting Triangles in Sublinear Time

We consider the problem of estimating the number of triangles in a graph. This problem has been extensively studied in both theory and practice, but all existing algorithms read the entire graph. In this work we design a sublinear-time algorithm for approximating the number of triangles in a graph, where the algorithm is given query access to the graph. The allowed queries are degree queries, vertex-pair queries and neighbor queries. We show that for any given approximation parameter $0<ε<1$, the algorithm provides an estimate $\widehat{t}$ such that with high constant probability, $(1-ε)\cdot t< \widehat{t}<(1+ε)\cdot t$, where $t$ is the number of triangles in the graph $G$. The expected query complexity of the algorithm is $\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right)\cdot \mathrm{poly}(\log n, 1/ε)$, where $n$ is the number of vertices in the graph and $m$ is the number of edges, and the expected running time is $\left(\frac{n}{t^{1/3}} + \frac{m^{3/2}}{t}\right)\cdot \mathrm{poly}(\log n, 1/ε)$. We also prove that $Ω\!\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right)$ queries are necessary, thus establishing that the query complexity of this algorithm is optimal up to polylogarithmic factors in $n$ (and the dependence on $1/ε$).
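The query model above can be illustrated with a hypothetical `GraphOracle` that exposes exactly the three allowed query types, together with a deliberately naive unbiased estimator built on it. This is not the paper's algorithm: the naive estimator samples a uniform vertex $v$ and two uniform neighbor slots, and its variance can be far larger than what the sublinear algorithm achieves, but it shows how degree, neighbor and pair queries combine. It assumes vertices are labeled $0, \ldots, n-1$ and that every vertex appears in the adjacency mapping.

```python
import random


class GraphOracle:
    """Hypothetical query access to a graph: degree, neighbor and vertex-pair queries."""

    def __init__(self, adjacency):
        self._nbrs = {v: sorted(ns) for v, ns in adjacency.items()}
        self._sets = {v: set(ns) for v, ns in adjacency.items()}
        self.queries = 0  # total number of queries answered so far

    def degree(self, v):
        self.queries += 1
        return len(self._nbrs[v])

    def neighbor(self, v, i):
        self.queries += 1
        return self._nbrs[v][i]  # the i-th neighbor of v (0-based)

    def pair(self, u, w):
        self.queries += 1
        return w in self._sets[u]


def naive_triangle_estimate(oracle, n, samples, seed=0):
    """Unbiased but high-variance estimate of the triangle count t."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        v = rng.randrange(n)
        d = oracle.degree(v)
        if d < 2:
            continue  # v lies on no triangle
        u = oracle.neighbor(v, rng.randrange(d))
        w = oracle.neighbor(v, rng.randrange(d))
        # Each triangle at v is hit by 2 of the d * d ordered neighbor pairs, and each
        # triangle has 3 vertices, so the weight n * d * d / 6 gives expectation t.
        if u != w and oracle.pair(u, w):
            total += n * d * d / 6
    return total / samples


if __name__ == "__main__":
    k4 = {v: [w for w in range(4) if w != v] for v in range(4)}
    oracle = GraphOracle(k4)
    print(naive_triangle_estimate(oracle, 4, 20000), oracle.queries)
```

Each sample costs at most four queries, so the query count grows linearly with `samples`; the paper's contribution is reaching the stated accuracy with far fewer queries than a naive scheme like this one may need.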
