Source author record

Shyam Narayanan

Shyam Narayanan appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Data Structures and Algorithms · Machine Learning · Cryptography and Security · math.ST · Statistics Theory · Discrete Mathematics · Computational Complexity · Computational Geometry · Emerging Technologies · Information Theory · math.CO · math.IT · math.PR

Catalog footprint

What is connected

9 works

13 topics

4 close collaborators


Preprint · 2024 · arXiv

Better and Simpler Lower Bounds for Differentially Private Statistical Estimation

We provide optimal lower bounds for two well-known parameter estimation (also known as statistical estimation) tasks in high dimensions with approximate differential privacy. First, we prove that for any $\alpha \le O(1)$, estimating the covariance of a Gaussian up to spectral error $\alpha$ requires $\tilde{\Omega}\left(\frac{d^{3/2}}{\alpha\varepsilon} + \frac{d}{\alpha^2}\right)$ samples, which is tight up to logarithmic factors. This result improves over previous work, which established this for $\alpha \le O\left(\frac{1}{\sqrt{d}}\right)$, and is also simpler than previous work. Next, we prove that estimating the mean of a heavy-tailed distribution with bounded $k$th moments requires $\tilde{\Omega}\left(\frac{d}{\alpha^{k/(k-1)}\varepsilon} + \frac{d}{\alpha^2}\right)$ samples. Previous work for this problem was only able to establish this lower bound against pure differential privacy, or in the special case of $k = 2$. Our techniques follow the method of fingerprinting and are generally quite simple. Our lower bound for heavy-tailed estimation is based on a black-box reduction from privately estimating identity-covariance Gaussians. Our lower bound for covariance estimation uses a Bayesian approach to show that, under an Inverse Wishart prior distribution for the covariance matrix, no private estimator can be accurate even in expectation without sufficiently many samples.
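A quick way to read the two bounds is to compare each privacy term with the non-private term $d/\alpha^2$. This ignores constants and logarithmic factors, and the arithmetic is our own rather than a statement from the paper:

```latex
\frac{d^{3/2}/(\alpha\varepsilon)}{d/\alpha^{2}} = \frac{\alpha\sqrt{d}}{\varepsilon},
\qquad
\frac{d/(\alpha^{k/(k-1)}\varepsilon)}{d/\alpha^{2}}
  = \frac{\alpha^{2 - k/(k-1)}}{\varepsilon}
  = \frac{\alpha^{(k-2)/(k-1)}}{\varepsilon}.
```

For covariance estimation, the privacy term therefore dominates when $\alpha\sqrt{d} > \varepsilon$. For heavy-tailed mean estimation with $k = 2$, the ratio is $1/\varepsilon$, so the privacy term dominates whenever $\varepsilon < 1$.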

Preprint · 2022 · arXiv

All-Pairs Shortest Path Distances with Differential Privacy: Improved Algorithms for Bounded and Unbounded Weights

We revisit the problem of privately releasing the all-pairs shortest path distances of a weighted undirected graph up to low additive error, which was first studied by Sealfon [Sea16]. In this paper, we improve significantly on Sealfon's results, both for arbitrary weighted graphs and for bounded-weight graphs on $n$ nodes. Specifically, we provide an approximate-DP algorithm that outputs all-pairs shortest path distances up to maximum additive error $\tilde{O}(\sqrt{n})$, and a pure-DP algorithm that outputs all-pairs shortest path distances up to maximum additive error $\tilde{O}(n^{2/3})$ (where we ignore dependencies on $\varepsilon, \delta$). This improves over the previous best result of $\tilde{O}(n)$ additive error for both approximate-DP and pure-DP [Sea16], and partially resolves an open question posed by Sealfon [Sea16, Sea20]. We also show that if the graph is promised to have reasonably bounded weights, one can improve the error further to roughly $n^{\sqrt{2}-1+o(1)}$ in the approximate-DP setting and roughly $n^{(\sqrt{17}-3)/2 + o(1)}$ in the pure-DP setting. Previously, it was only known how to obtain $\tilde{O}(n^{1/2})$ additive error in the approximate-DP setting and $\tilde{O}(n^{2/3})$ additive error in the pure-DP setting for bounded-weight graphs [Sea16].
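To make the setting concrete, here is a minimal sketch of a naive input-perturbation baseline. It is not the paper's algorithm. It assumes that neighbouring graphs share the same edge set and that their weight vectors differ by at most 1 in total ($\ell_1$ distance). Under that assumption, adding Laplace($1/\varepsilon$) noise to every weight is $\varepsilon$-DP, and running Floyd-Warshall on the noisy weights is post-processing. The helper names `laplace` and `noisy_apsp` are hypothetical.

```python
import math
import random


def laplace(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_apsp(n, edges, eps, seed=None):
    """Input-perturbation baseline for private APSP (not the paper's algorithm).

    n: number of vertices, labelled 0..n-1.
    edges: dict {(u, v): weight} for an undirected graph with non-negative weights.
    Assumption: neighbouring graphs have the same edges and weight vectors at
    l1 distance <= 1, so Laplace(1/eps) noise on each weight gives eps-DP.
    """
    rng = random.Random(seed)
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for (u, v), w in edges.items():
        # Clamping at zero is post-processing and rules out negative cycles.
        noisy = max(0.0, w + laplace(1.0 / eps, rng))
        dist[u][v] = dist[v][u] = min(dist[u][v], noisy)
    # Floyd-Warshall: from here on we only post-process the noisy weights.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```

For example, `noisy_apsp(3, {(0, 1): 1.0, (1, 2): 1.0, (0, 2): 5.0}, eps=1000.0, seed=0)` reports a distance close to 2 between vertices 0 and 2. Each reported distance accumulates the noise of every edge on the chosen path, which is why baselines of this kind have error that grows with path length.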

Preprint · 2022 · arXiv

Bias Reduction for Sum Estimation

In classical statistics and distribution testing, it is often assumed that elements can be sampled from some distribution $P$, and that when an element $x$ is sampled, the probability $P(x)$ of sampling $x$ is also known. Recent work in distribution testing has shown that many algorithms are robust in the sense that they still produce correct output if the elements are drawn from any distribution $Q$ that is sufficiently close to $P$. This phenomenon raises interesting questions: under what conditions is a "noisy" distribution $Q$ sufficient, and what is the algorithmic cost of coping with this noise? We investigate these questions for the problem of estimating the sum of a multiset of $N$ real values $x_1, \ldots, x_N$. This problem is well studied in the statistical literature in the case $P = Q$, where the Hansen-Hurwitz estimator is frequently used. We assume that for some known distribution $P$, values are sampled from a distribution $Q$ that is pointwise $\gamma$-close to $P$. For every positive integer $k$, we define an estimator $\zeta_k$ for $\mu = \sum_i x_i$ whose bias is proportional to $\gamma^k$ (our $\zeta_1$ reduces to the classical Hansen-Hurwitz estimator). As a special case, we show that if $Q$ is pointwise $\gamma$-close to uniform and all $x_i \in \{0, 1\}$, then for any $\varepsilon > 0$ we can estimate $\mu$ to within additive error $\varepsilon N$ using $m = \Theta\left(N^{1-1/k} / \varepsilon^{2/k}\right)$ samples, where $k = \left\lceil (\log \varepsilon)/(\log \gamma)\right\rceil$. We show that this sample complexity is essentially optimal. Our bounds show that the sample complexity need not vary uniformly with the desired error parameter $\varepsilon$: for some values of $\varepsilon$, perturbations in its value have no asymptotic effect on the sample complexity, while for other values, any decrease in its value results in an asymptotically larger sample complexity.
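The classical Hansen-Hurwitz estimator (the $\zeta_1$ case above) averages $x_i / P(i)$ over the sampled indices. If indices are actually drawn from $Q$, its expectation is $\sum_i Q(i)\, x_i / P(i)$, so its bias is $\sum_i x_i \left(Q(i)/P(i) - 1\right)$. We assume that "pointwise $\gamma$-close" means $(1-\gamma)P(i) \le Q(i) \le (1+\gamma)P(i)$. Under that assumed definition, the bias is at most $\gamma \sum_i |x_i|$ in absolute value, matching the first-order ($k = 1$) behaviour. The sketch below covers only this baseline; the paper's higher-order estimators $\zeta_k$ are not reproduced. Function names are hypothetical.

```python
def hansen_hurwitz(values, p, draws):
    """Hansen-Hurwitz estimate of mu = sum(values): mean of x_i / P(i) over sampled indices."""
    return sum(values[i] / p[i] for i in draws) / len(draws)


def expected_estimate(values, p, q):
    """Exact expectation of hansen_hurwitz when indices are really drawn from q, not p."""
    return sum(q[i] * values[i] / p[i] for i in range(len(values)))
```

With `values = [1, 0, 1, 1]` and uniform `p = [0.25] * 4`, the true sum is 3. If the sampler actually follows `q = [0.3, 0.2, 0.25, 0.25]`, `expected_estimate` returns about 3.2, a bias of 0.2 caused entirely by the mismatch on index 0.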

Preprint · 2022 · arXiv

Bounds on expected propagation time of probabilistic zero forcing

Probabilistic zero forcing is a coloring game played on a graph, where the goal is to color every vertex blue starting from an initial set of blue vertices. As long as the graph is connected, if at least one vertex is blue, then eventually all of the vertices will be colored blue. The most studied parameter in probabilistic zero forcing is the expected propagation time starting from a given vertex of a graph $G$. In this paper, we improve on the upper bounds for the expected propagation time given by Geneson and Hogben and by Chan et al. in terms of a graph's order and radius. In particular, for a connected graph $G$ of order $n$ and radius $r$, we prove the bound $\text{ept}(G) = O(r\log(n/r))$. We also show, using Doob's Optional Stopping Theorem and a combinatorial object known as a cornerstone, that $\text{ept}(G) \le n/2 + O(\log n)$. Finally, we derive an explicit lower bound $\text{ept}(G) \ge \log_2 \log_2 n$.
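A Monte Carlo simulation helps build intuition for the expected propagation time. This sketch assumes the forcing rule commonly used in the probabilistic zero forcing literature: in each synchronous round, every blue vertex $u$ forces each white neighbour independently with probability $|N[u] \cap B| / \deg(u)$. Here $N[u]$ is the closed neighbourhood of $u$ and $B$ is the blue set at the start of the round. The graph is an adjacency dict of sets and is assumed connected. The function names are hypothetical.

```python
import random


def pzf_round(adj, blue, rng):
    """One synchronous round of probabilistic zero forcing; returns the new blue set."""
    new_blue = set(blue)
    for u in blue:
        white = [w for w in adj[u] if w not in blue]
        if not white:
            continue
        # |N[u] ∩ B| counts u itself (it is blue) plus its blue neighbours.
        p = (1 + sum(1 for w in adj[u] if w in blue)) / len(adj[u])
        for w in white:
            if rng.random() < p:
                new_blue.add(w)
    return new_blue


def estimate_ept(adj, start, trials=1000, seed=0):
    """Average number of rounds until every vertex is blue, starting from `start`."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        blue, rounds = {start}, 0
        while len(blue) < len(adj):
            blue = pzf_round(adj, blue, rng)
            rounds += 1
        total += rounds
    return total / trials
```

On a path of three vertices started from an endpoint, every force happens with probability 1, so `estimate_ept` returns exactly 2.0. On denser graphs such as stars, the estimate varies with the random seed.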

Preprint · 2022 · arXiv

Improved Approximations for Euclidean $k$-means and $k$-median, via Nested Quasi-Independent Sets

Motivated by data analysis and machine learning applications, we consider the popular high-dimensional Euclidean $k$-median and $k$-means problems. We propose a new primal-dual algorithm, inspired by the classic algorithm of Jain and Vazirani and the recent algorithm of Ahmadian, Norouzi-Fard, Svensson, and Ward. Our algorithm achieves approximation ratios of $2.406$ and $5.912$ for Euclidean $k$-median and $k$-means, respectively, improving upon the 2.633 approximation ratio of Ahmadian et al. and the 6.1291 approximation ratio of Grandoni, Ostrovsky, Rabani, Schulman, and Venkat. Our techniques exploit the Euclidean metric much more strongly than previous work on Euclidean clustering. In addition, we introduce a new method of removing excess centers using a variant of independent sets over graphs that we dub a "nested quasi-independent set". This technique may also be of interest for other optimization problems in Euclidean and $\ell_p$ metric spaces.
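The approximation ratios above compare the cost of the returned centers with the optimal cost. The two objectives differ only in whether the distance to the nearest center is squared. This is a minimal sketch of the objectives themselves, not of the primal-dual algorithm; the function names are hypothetical.

```python
import math


def kmedian_cost(points, centers):
    """Sum over points of the Euclidean distance to the nearest center."""
    return sum(min(math.dist(p, c) for c in centers) for p in points)


def kmeans_cost(points, centers):
    """Sum over points of the squared Euclidean distance to the nearest center."""
    return sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
```

A 2.406-approximation for $k$-median returns $k$ centers whose `kmedian_cost` is at most 2.406 times the smallest achievable value over all choices of $k$ centers. The same reading applies to the 5.912 ratio for `kmeans_cost`.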

Preprint · 2022 · arXiv

Stochastic dendrites enable online learning in mixed-signal neuromorphic processing systems

The stringent memory and power constraints of edge-computing sensory-processing applications have made event-driven neuromorphic systems a promising technology. On-chip online learning gives such systems the ability to learn the statistics of the incoming data and to adapt to changes in them. Implementing online learning on event-driven neuromorphic systems requires (i) a spike-based learning algorithm that calculates the weight updates using only local information from streaming data, (ii) mapping these weight updates onto limited-bit-precision memory, and (iii) doing so robustly, without unnecessary updates as the system approaches its optimal output. Recent neuroscience studies have shown how dendritic compartments of cortical neurons can solve these problems in biological neural networks. Inspired by these studies, we propose spike-based learning circuits that implement stochastic dendritic online learning. The circuits are embedded in a prototype spiking neural network fabricated in a 180 nm process. Following an algorithm-circuit co-design approach, we present circuit and behavioral simulation results that demonstrate the features of the learning rule. We validate the proposed method using behavioral simulations of a single-layer network with 4-bit precision weights applied to the MNIST benchmark, reaching accuracy levels above 85%.
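Requirement (ii), mapping real-valued updates onto low-precision memory, is often handled in software with stochastic rounding. The sketch below shows that generic technique on an assumed unsigned 4-bit weight range of 0 to 15. It is not the dendritic circuit proposed in the paper. The update is rounded down or up at random, with probability of rounding up equal to its fractional part, so the expected change equals the requested update (before clipping). The names `stochastic_update`, `W_MIN`, and `W_MAX` are hypothetical.

```python
import math
import random

W_MIN, W_MAX = 0, 15  # assumed unsigned 4-bit weight range


def stochastic_update(weight, delta, rng):
    """Apply a real-valued update to an integer weight using stochastic rounding.

    delta is rounded down to floor(delta) or up to floor(delta) + 1, rounding up
    with probability equal to its fractional part, so E[step] == delta.
    The result is clipped to [W_MIN, W_MAX].
    """
    base = math.floor(delta)
    frac = delta - base
    step = base + (1 if rng.random() < frac else 0)
    return min(W_MAX, max(W_MIN, weight + step))
```

Small updates such as 0.25 then change the stored weight only about a quarter of the time, which keeps the average drift correct without needing fractional storage.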

Preprint · 2022 · arXiv

Tight and Robust Private Mean Estimation with Few Users

In this work, we study high-dimensional mean estimation under user-level differential privacy, and design an $(\varepsilon,\delta)$-differentially private mechanism using as few users as possible. In particular, we provide a nearly optimal trade-off between the number of users and the number of samples per user required for private mean estimation, even when the number of users is as low as $O(\frac{1}{\varepsilon}\log\frac{1}{\delta})$. Interestingly, this bound on the number of users is independent of the dimension (though the number of samples per user is allowed to depend polynomially on the dimension), unlike previous work, which requires the number of users to depend polynomially on the dimension. This resolves a problem first proposed by Amin et al. Moreover, our mechanism is robust against corruptions in up to $49\%$ of the users. Finally, our results also apply to optimal algorithms for privately learning discrete distributions with few users, answering a question of Liu et al., and to a broader range of problems, such as stochastic convex optimization and a variant of stochastic gradient descent, via a reduction to differentially private mean estimation.
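For contrast with the result above, here is a naive user-level baseline, not the paper's mechanism. Each user's samples are averaged, each user mean is clipped to $\ell_2$ norm at most an assumed radius $R$, and the Gaussian mechanism is applied to the average of user means. Replacing one user moves that average by at most $2R/n$ in $\ell_2$, and the classic calibration $\sigma = \Delta_2\sqrt{2\ln(1.25/\delta)}/\varepsilon$ (valid for $\varepsilon < 1$) then gives $(\varepsilon, \delta)$-DP. The noise has $\ell_2$ norm roughly $\sigma\sqrt{d}$, so this baseline needs a number of users that grows with the dimension. The function name and the radius parameter are hypothetical.

```python
import math
import random


def user_level_private_mean(users, radius, eps, delta, seed=None):
    """Naive (eps, delta)-DP mean under user-level privacy (not the paper's mechanism).

    users: list of users, each a non-empty list of d-dimensional samples.
    Each user's average is clipped to L2 norm <= radius; Gaussian noise is
    calibrated to the L2 sensitivity 2 * radius / n of the average of user means.
    """
    assert 0 < eps < 1, "the classic Gaussian-mechanism calibration assumes eps < 1"
    rng = random.Random(seed)
    n, d = len(users), len(users[0][0])
    total = [0.0] * d
    for samples in users:
        mean = [sum(s[j] for s in samples) / len(samples) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in mean))
        scale = min(1.0, radius / norm) if norm > 0 else 1.0  # L2 clipping
        for j in range(d):
            total[j] += mean[j] * scale
    sensitivity = 2 * radius / n
    sigma = sensitivity * math.sqrt(2 * math.log(1.25 / delta)) / eps
    return [t / n + rng.gauss(0.0, sigma) for t in total]
```

With 1000 users whose samples all equal `[1.0, 2.0]`, radius 10, $\varepsilon = 0.5$ and $\delta = 10^{-5}$, the noise standard deviation is about 0.19 per coordinate.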

Preprint · 2022 · arXiv

Triangle and Four Cycle Counting with Predictions in Graph Streams

We propose data-driven one-pass streaming algorithms for estimating the number of triangles and four cycles, two fundamental problems in graph analytics that are widely studied in the graph data stream literature. Recently, (Hsu 2018) and (Jiang 2020) applied machine learning techniques in other data stream problems, using a trained oracle that can predict certain properties of the stream elements to improve on prior "classical" algorithms that did not use oracles. In this paper, we explore the power of a "heavy edge" oracle in multiple graph edge streaming models. In the adjacency list model, we present a one-pass triangle counting algorithm improving upon the previous space upper bounds without such an oracle. In the arbitrary order model, we present algorithms for both triangle and four cycle estimation with fewer passes and the same space complexity as in previous algorithms, and we show several of these bounds are optimal. We analyze our algorithms under several noise models, showing that the algorithms perform well even when the oracle errs. Our methodology expands upon prior work on "classical" streaming algorithms, as previous multi-pass and random order streaming algorithms can be seen as special cases of our algorithms, where the first pass or random order was used to implement the heavy edge oracle. Lastly, our experiments demonstrate advantages of the proposed method compared to state-of-the-art streaming algorithms.
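As an illustration of what a "heavy edge" oracle predicts, the sketch computes, offline, the number of triangles containing each edge. It then labels edges above a threshold as heavy. This is an idealised, error-free oracle built from exact counts, which a one-pass streaming algorithm cannot compute; the oracles in the paper are learned and may err. The graph is assumed simple (no duplicate edges or self-loops), and the function names are hypothetical.

```python
def edge_triangle_counts(edges):
    """Return {edge: number of triangles containing it} for a simple undirected graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # A triangle through edge (u, v) corresponds to a common neighbour of u and v.
    return {(u, v): len(adj[u] & adj[v]) for u, v in edges}


def heavy_edge_oracle(edge_counts, threshold):
    """Idealised oracle: an edge is 'heavy' if it lies in at least `threshold` triangles."""
    return {e for e, c in edge_counts.items() if c >= threshold}


def triangle_count(edge_counts):
    """Each triangle has three edges, so summing per-edge counts counts it three times."""
    return sum(edge_counts.values()) // 3
```

On the complete graph $K_4$ plus one pendant edge, every $K_4$ edge lies in two triangles and the pendant edge in none. The total is 4 triangles, and a threshold of 2 marks exactly the six $K_4$ edges as heavy.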

Preprint · 2020 · arXiv

3-wise Independent Random Walks can be Slightly Unbounded

Recently, many streaming algorithms have utilized generalizations of the fact that the expected maximum distance of any $4$-wise independent random walk on a line over $n$ steps is $O(\sqrt{n})$. In this paper, we show that $4$-wise independence is required for all of these algorithms, by constructing a $3$-wise independent random walk with expected maximum distance $Ω(\sqrt{n} \lg n)$ from the origin. We prove that this bound is tight for the first and second moment, and also extract a surprising matrix inequality from these results. Next, we consider a generalization where the steps $X_i$ are $k$-wise independent random variables with bounded $p$th moments. For general $k, p$, we determine the (asymptotically) maximum possible $p$th moment of the supremum of $X_1 + \dots + X_i$ over $1 \le i \le n$. We highlight the case $k = 4, p = 2$: here, we prove that the second moment of the furthest distance traveled is $O(\sum X_i^2)$. For this case, we only need the $X_i$'s to have bounded second moments and do not even need the $X_i$'s to be identically distributed. This implies an asymptotically stronger statement than Kolmogorov's maximal inequality that requires only $4$-wise independent random variables, and generalizes a recent result of Błasiok.
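The walks discussed above are usually generated from a small random seed. A standard construction evaluates a uniformly random polynomial of degree less than $k$ modulo a prime $P$ at distinct points, which yields $k$-wise independent values uniform on $\mathbb{Z}_P$. The sketch maps each value to a $\pm 1$ step by parity; because $P$ is odd, this carries a bias of order $1/P$, which is our own simplification. It also computes the statistic the paper studies: the maximum distance from the origin over all prefixes. This generic family is not the paper's lower-bound construction, and a walk built this way need not exhibit the $\Omega(\sqrt{n}\lg n)$ behaviour. Function names are hypothetical.

```python
import random

P = 2_147_483_647  # the Mersenne prime 2^31 - 1


def kwise_signs(n, k, seed=None):
    """n steps in {-1, +1} from a random degree-(k-1) polynomial mod P (assumes n < P).

    Values at distinct points are k-wise independent on Z_P; mapping by parity
    gives +/-1 steps that are k-wise independent up to a bias of order 1/P.
    """
    rng = random.Random(seed)
    coeffs = [rng.randrange(P) for _ in range(k)]
    signs = []
    for i in range(n):
        h = 0
        for a in coeffs:  # Horner's rule evaluates the polynomial at i
            h = (h * i + a) % P
        signs.append(1 if h % 2 == 0 else -1)
    return signs


def max_distance(steps):
    """max over 1 <= i <= n of |X_1 + ... + X_i| (0 for an empty walk)."""
    pos, best = 0, 0
    for x in steps:
        pos += x
        best = max(best, abs(pos))
    return best
```

For instance, `max_distance(kwise_signs(10_000, 4, seed=1))` gives one sample of the quantity whose expectation is $O(\sqrt{n})$ under 4-wise independence.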
