Source author record

Sivakanth Gopi

Sivakanth Gopi appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Cryptography and Security, Machine Learning, Data Structures and Algorithms, Computational Complexity, Information Theory, math.IT, Computation and Language, Computational Geometry, Information Retrieval, math.CO, math.OC, math.PR, math.RA

Catalog footprint

What is connected

10 works

13 topics

4 close collaborators


preprint, 2022, arXiv

Differentially Private Fine-tuning of Language Models

We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on many standard NLP tasks. We propose a meta-framework for this problem, inspired by the recent success of highly parameter-efficient methods for fine-tuning. Our experiments show that differentially private adaptations of these approaches outperform previous private algorithms in three important dimensions: utility, privacy, and the computational and memory cost of private training. On many commonly studied datasets, the utility of private models approaches that of non-private models. For example, on the MNLI dataset we achieve an accuracy of $87.8\%$ using RoBERTa-Large and $83.5\%$ using RoBERTa-Base with a privacy budget of $ε = 6.7$. In comparison, absent privacy constraints, RoBERTa-Large achieves an accuracy of $90.2\%$. Our findings are similar for natural language generation tasks. On the DART dataset, privately fine-tuned GPT-2-Small, GPT-2-Medium, GPT-2-Large, and GPT-2-XL achieve BLEU scores of 38.5, 42.0, 43.1, and 43.8, respectively (privacy budget of $ε = 6.8$, $δ = 10^{-5}$), whereas the non-private baseline is $48.1$. All our experiments suggest that larger models are better suited for private fine-tuning: while they are well known to achieve superior accuracy non-privately, we find that they also better maintain their accuracy when privacy is introduced.

preprint, 2022, arXiv

Differentially Private Set Union

We study the basic operation of set union in the global model of differential privacy. In this problem, we are given a universe $U$ of items, possibly of infinite size, and a database $D$ of users. Each user $i$ contributes a subset $W_i \subseteq U$ of items. We want an ($ε$,$δ$)-differentially private algorithm which outputs a subset $S \subset \cup_i W_i$ such that the size of $S$ is as large as possible. The problem arises in countless real-world applications; it is particularly ubiquitous in natural language processing (NLP) applications as vocabulary extraction. For example, discovering words, sentences, $n$-grams etc., from private text data belonging to users is an instance of the set union problem. Known algorithms for this problem proceed by collecting a subset of items from each user, taking the union of such subsets, and disclosing the items whose noisy counts fall above a certain threshold. Crucially, in the above process, the contribution of each individual user is always independent of the items held by other users, resulting in a wasteful aggregation process, where some item counts happen to be way above the threshold. We deviate from the above paradigm by allowing users to contribute their items in a $\textit{dependent fashion}$, guided by a $\textit{policy}$. In this new setting, ensuring privacy is considerably more delicate. We prove that any policy which has certain $\textit{contractive}$ properties would result in a differentially private algorithm. We design two new algorithms, one using Laplace noise and the other using Gaussian noise, as specific instances of policies satisfying the contractive properties. Our experiments show that the new algorithms significantly outperform previously known mechanisms for the problem.
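The baseline described in the abstract (each user contributes a bounded subset, counts are noised, and items above a threshold are released) can be sketched as follows. This is a minimal illustration of that baseline only, not the policy-based algorithms of the paper. The function names and the parameters `max_items`, `noise_scale` and `threshold` are hypothetical, and calibrating them to a target ($ε$,$δ$) guarantee is deliberately left out.

```python
import random
from collections import Counter


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def baseline_private_union(user_sets, max_items, noise_scale, threshold, rng):
    """Release the items whose noisy count exceeds `threshold`.

    All numeric parameters are caller-supplied assumptions; this sketch does
    not derive them from a privacy budget.
    """
    counts = Counter()
    for items in user_sets:
        # Each user contributes at most `max_items` items, chosen without
        # looking at what other users hold (the "independent" paradigm).
        pool = sorted(items)
        counts.update(rng.sample(pool, min(max_items, len(pool))))

    released = set()
    for item in sorted(counts):
        if counts[item] + laplace_noise(noise_scale, rng) > threshold:
            released.add(item)
    return released
```

Because every user adds to the same popular items regardless of how far above the threshold those counts already are, much of the contribution budget is spent where it is not needed, which is the waste the abstract points to.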

preprint, 2022, arXiv

Improved Maximally Recoverable LRCs using Skew Polynomials

An $(n,r,h,a,q)$-Local Reconstruction Code (LRC) is a linear code over $\mathbb{F}_q$ of length $n$, whose codeword symbols are partitioned into $n/r$ local groups each of size $r$. Each local group satisfies $a$ local parity checks to recover from $a$ erasures in that local group and there are further $h$ global parity checks to provide fault tolerance from more global erasure patterns. Such an LRC is Maximally Recoverable (MR), if it offers the best blend of locality and global erasure resilience -- namely it can correct all erasure patterns whose recovery is information-theoretically feasible given the locality structure (these are precisely patterns with up to $a$ erasures in each local group and an additional $h$ erasures anywhere in the codeword). Random constructions can easily show the existence of MR LRCs over very large fields, but a major algebraic challenge is to construct MR LRCs, or even show their existence, over smaller fields, as well as understand inherent lower bounds on their field size. We give an explicit construction of $(n,r,h,a,q)$-MR LRCs with field size $q$ bounded by $\left(O\left(\max\{r,n/r\}\right)\right)^{\min\{h,r-a\}}$. This improves upon known constructions in many relevant parameter ranges. Moreover, it matches the lower bound from Gopi et al. (2020) in an interesting range of parameters where $r=Θ(\sqrt{n})$, $r-a=Θ(\sqrt{n})$ and $h$ is a fixed constant with $h\le a+2$, achieving the optimal field size of $Θ_{h}(n^{h/2})$. Our construction is based on the theory of skew polynomials. We believe skew polynomials should have further applications in coding and complexity theory; as a small illustration we show how to capture algebraic results underlying list decoding folded Reed-Solomon and multiplicity codes in a unified way within this theory.
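The class of erasure patterns an MR LRC must correct, as stated in the abstract, can be checked with a few lines of code. The sketch below only tests whether a pattern belongs to that class; it does not construct or decode any code. The function name is hypothetical, and it assumes (for illustration) that local group $g$ occupies the consecutive positions $gr, \ldots, gr+r-1$.

```python
def is_mr_correctable(erased, n, r, a, h):
    """Return True if the erasure pattern has at most `a` erasures per local
    group plus at most `h` additional erasures anywhere.

    Assumption: positions 0..n-1 are split into consecutive groups of size r.
    """
    if n % r != 0:
        raise ValueError("r must divide n")
    per_group = [0] * (n // r)
    for pos in erased:
        per_group[pos // r] += 1
    # Erasures beyond the first `a` in a group must be covered by the h
    # global parity checks.
    excess = sum(max(0, count - a) for count in per_group)
    return excess <= h
```

For example, with $n=12$, $r=4$, $a=1$, $h=2$, erasing positions $\{0,1,2\}$ uses one local check and both global checks, so the pattern is in the correctable class, while erasing the whole first group $\{0,1,2,3\}$ is not.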

preprint, 2022, arXiv

Private Convex Optimization via Exponential Mechanism

In this paper, we study private optimization problems for non-smooth convex functions $F(x)=\mathbb{E}_i f_i(x)$ on $\mathbb{R}^d$. We show that modifying the exponential mechanism by adding an $\ell_2^2$ regularizer to $F(x)$ and sampling from $π(x)\propto \exp(-k(F(x)+μ\|x\|_2^2/2))$ recovers both the known optimal empirical risk and population loss under $(ε,δ)$-DP. Furthermore, we show how to implement this mechanism using $\widetilde{O}(n \min(d, n))$ queries to $f_i(x)$ for the DP-SCO where $n$ is the number of samples/users and $d$ is the ambient dimension. We also give a (nearly) matching lower bound $\widetilde{Ω}(n \min(d, n))$ on the number of evaluation queries. Our results utilize the following tools that are of independent interest: (1) We prove Gaussian Differential Privacy (GDP) of the exponential mechanism if the loss function is strongly convex and the perturbation is Lipschitz. Our privacy bound is optimal as it includes the privacy of Gaussian mechanism as a special case and is proved using the isoperimetric inequality for strongly log-concave measures. (2) We show how to sample from $\exp(-F(x)-μ\|x\|^2_2/2)$ for $G$-Lipschitz $F$ with $η$ error in total variation (TV) distance using $\widetilde{O}((G^2/μ) \log^2(d/η))$ unbiased queries to $F(x)$. This is the first sampler whose query complexity has polylogarithmic dependence on both dimension $d$ and accuracy $η$.
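To make the target distribution $π(x)\propto \exp(-k(F(x)+μ\|x\|_2^2/2))$ concrete, here is a toy one-dimensional sketch that discretizes $x$ onto a grid and samples from the resulting weights. It is not the sampler from the paper: the grid, the choice of $f_i(x)=|x-a_i|$ as a non-smooth convex loss, and the function names are all assumptions made for illustration, and no privacy accounting is performed.

```python
import math
import random


def regularized_weights(grid, data, k, mu):
    """Unnormalized weights exp(-k * (F(x) + mu * x^2 / 2)) on a 1-D grid,
    with F(x) the average of |x - a_i| over the data (an illustrative loss)."""
    def objective(x):
        loss = sum(abs(x - a) for a in data) / len(data)
        return loss + mu * x * x / 2.0

    values = [objective(x) for x in grid]
    lowest = min(values)
    # Subtract the minimum before exponentiating for numerical stability;
    # this rescales all weights by the same constant.
    return [math.exp(-k * (v - lowest)) for v in values]


def sample_point(grid, data, k, mu, rng):
    """Draw one grid point with probability proportional to its weight."""
    weights = regularized_weights(grid, data, k, mu)
    return rng.choices(grid, weights=weights, k=1)[0]
```

Larger $k$ concentrates the samples near the minimizer of the regularized objective, while smaller $k$ spreads them out; the paper's analysis concerns how this trade-off relates to privacy and risk.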

preprint, 2021, arXiv

Fast and Memory Efficient Differentially Private-SGD via JL Projections

Differentially Private-SGD (DP-SGD) of Abadi et al. (2016) and its variations are the only known algorithms for private training of large scale neural networks. This algorithm requires computation of per-sample gradient norms, which is extremely slow and memory intensive in practice. In this paper, we present a new framework to design differentially private optimizers called DP-SGD-JL and DP-Adam-JL. Our approach uses Johnson-Lindenstrauss (JL) projections to quickly approximate the per-sample gradient norms without exactly computing them, thus making the training time and memory requirements of our optimizers closer to those of their non-DP versions. Unlike previous attempts to make DP-SGD faster, which work only on a subset of network architectures or use compiler techniques, we propose an algorithmic solution which works for any network in a black-box manner; this is the main contribution of this paper. To illustrate this, on the IMDb dataset, we train a Recurrent Neural Network (RNN) to achieve a good privacy-vs-accuracy tradeoff, while being significantly faster than DP-SGD and with a similar memory footprint as non-private SGD. The privacy analysis of our algorithms is more involved than that of DP-SGD; we use the recently proposed f-DP framework of Dong et al. (2019) to prove privacy.
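The idea of approximating a norm from a few random projections can be sketched as below. For a standard Gaussian vector $v$, the expectation of $(v \cdot g)^2$ equals $\|g\|_2^2$, so averaging a handful of squared projections estimates the squared norm. This is only the estimator applied to an explicit vector; how the projections are obtained without materializing per-sample gradients is not shown, and the function name is hypothetical.

```python
import math
import random


def jl_norm_estimate(vector, num_projections, rng):
    """Estimate the Euclidean norm of `vector` from random Gaussian projections."""
    total = 0.0
    for _ in range(num_projections):
        # Project onto a fresh standard Gaussian direction.
        dot = sum(rng.gauss(0.0, 1.0) * x for x in vector)
        total += dot * dot
    # The mean of the squared projections estimates the squared norm.
    return math.sqrt(total / num_projections)
```

More projections give a tighter estimate at higher cost, which is the accuracy-versus-speed knob such an approach exposes.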

preprint, 2020, arXiv

Locally Private Hypothesis Selection

We initiate the study of hypothesis selection under local differential privacy. Given samples from an unknown probability distribution $p$ and a set of $k$ probability distributions $\mathcal{Q}$, we aim to output, under the constraints of $\varepsilon$-local differential privacy, a distribution from $\mathcal{Q}$ whose total variation distance to $p$ is comparable to the best such distribution. This is a generalization of the classic problem of $k$-wise simple hypothesis testing, which corresponds to when $p \in \mathcal{Q}$, and we wish to identify $p$. Absent privacy constraints, this problem requires $O(\log k)$ samples from $p$, and it was recently shown that the same complexity is achievable under (central) differential privacy. However, the naive approach to this problem under local differential privacy would require $\tilde O(k^2)$ samples. We first show that the constraint of local differential privacy incurs an exponential increase in cost: any algorithm for this problem requires at least $Ω(k)$ samples. Second, for the special case of $k$-wise simple hypothesis testing, we provide a non-interactive algorithm which nearly matches this bound, requiring $\tilde O(k)$ samples. Finally, we provide sequentially interactive algorithms for the general case, requiring $\tilde O(k)$ samples and only $O(\log \log k)$ rounds of interactivity. Our algorithms are achieved through a reduction to maximum selection with adversarial comparators, a problem of independent interest for which we initiate study in the parallel setting. For this problem, we provide a family of algorithms for each number of allowed rounds of interaction $t$, as well as lower bounds showing that they are near-optimal for every $t$. Notably, our algorithms result in exponential improvements on the round complexity of previous methods.

preprint, 2016, arXiv

Competitive analysis of the top-K ranking problem

Motivated by applications in recommender systems, web search, social choice and crowdsourcing, we consider the problem of identifying the set of top $K$ items from noisy pairwise comparisons. In our setting, we are non-actively given $r$ pairwise comparisons between each pair of $n$ items, where each comparison has noise constrained by a very general noise model called the strong stochastic transitivity (SST) model. We analyze the competitive ratio of algorithms for the top-$K$ problem. In particular, we present a linear time algorithm for the top-$K$ problem which has a competitive ratio of $\tilde{O}(\sqrt{n})$; i.e., to solve any instance of top-$K$, our algorithm needs at most $\tilde{O}(\sqrt{n})$ times as many samples as the best possible algorithm for that instance (in contrast, all previously known algorithms for the top-$K$ problem have competitive ratios of $\tilde{Ω}(n)$ or worse). We further show that this is tight: any algorithm for the top-$K$ problem has competitive ratio at least $\tilde{Ω}(\sqrt{n})$.

preprint, 2015, arXiv

Lower bounds for constant query affine-invariant LCCs and LTCs

Affine-invariant codes are codes whose coordinates form a vector space over a finite field and which are invariant under affine transformations of the coordinate space. They form a natural, well-studied class of codes; they include popular codes such as Reed-Muller and Reed-Solomon. A particularly appealing feature of affine-invariant codes is that they seem well-suited to admit local correctors and testers. In this work, we give lower bounds on the length of locally correctable and locally testable affine-invariant codes with constant query complexity. We show that if a code $\mathcal{C} \subset Σ^{\mathbb{K}^n}$ is an $r$-query locally correctable code (LCC), where $\mathbb{K}$ is a finite field and $Σ$ is a finite alphabet, then the number of codewords in $\mathcal{C}$ is at most $\exp(O_{\mathbb{K}, r, |Σ|}(n^{r-1}))$. Also, we show that if $\mathcal{C} \subset Σ^{\mathbb{K}^n}$ is an $r$-query locally testable code (LTC), then the number of codewords in $\mathcal{C}$ is at most $\exp(O_{\mathbb{K}, r, |Σ|}(n^{r-2}))$. The dependence on $n$ in these bounds is tight for constant-query LCCs/LTCs, since Guo, Kopparty and Sudan (ITCS '13) construct affine-invariant codes via lifting that have the same asymptotic tradeoffs. Note that our result holds for non-linear codes, whereas previously, Ben-Sasson and Sudan (RANDOM '11) assumed linearity to derive similar results. Our analysis uses higher-order Fourier analysis. In particular, we show that the codewords corresponding to an affine-invariant LCC/LTC must be far from each other with respect to the Gowers norm of an appropriate order. This then allows us to bound the number of codewords, using known decomposition theorems which approximate any bounded function in terms of a finite number of low-degree non-classical polynomials, up to a small error in the Gowers norm.

preprint, 2014, arXiv

2-Server PIR with sub-polynomial communication

A 2-server Private Information Retrieval (PIR) scheme allows a user to retrieve the $i$th bit of an $n$-bit database replicated among two servers (which do not communicate) while not revealing any information about $i$ to either server. In this work we construct a 1-round 2-server PIR with total communication cost $n^{O({\sqrt{\log\log n/\log n}})}$. This improves over the currently known 2-server protocols which require $O(n^{1/3})$ communication and matches the communication cost of known 3-server PIR schemes. Our improvement comes from reducing the number of servers in existing protocols, based on Matching Vector Codes, from 3 or 4 servers to 2. This is achieved by viewing these protocols in an algebraic way (using polynomial interpolation) and extending them using partial derivatives.
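To illustrate the 2-server PIR setting itself, the sketch below implements the classic XOR-based scheme with communication linear in $n$. It is not the sub-polynomial construction of the paper (which builds on Matching Vector Codes and partial derivatives); it only shows how two non-communicating servers can jointly answer a query while each one sees a uniformly random subset of indices. The function names are hypothetical.

```python
import random


def make_queries(n, index, rng):
    """Return two index sets that differ exactly in `index`.

    Each set on its own is a uniformly random subset of range(n), so it
    reveals nothing about `index`.
    """
    first = {j for j in range(n) if rng.random() < 0.5}
    second = first ^ {index}  # symmetric difference toggles membership of `index`
    return first, second


def server_answer(database, query):
    """A server returns the XOR of the database bits at the queried indices."""
    bit = 0
    for j in query:
        bit ^= database[j]
    return bit


def retrieve(database, index, rng):
    """Recover database[index]: every other bit cancels in the XOR of the answers."""
    q1, q2 = make_queries(len(database), index, rng)
    return server_answer(database, q1) ^ server_answer(database, q2)
```

In a real deployment the two `server_answer` calls would run on separate servers holding replicas of the database; here they are ordinary function calls for the sake of a self-contained example.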

preprint, 2014, arXiv

On the number of rich lines in truly high dimensional sets

We prove a new upper bound on the number of $r$-rich lines (lines with at least $r$ points) in a 'truly' $d$-dimensional configuration of points $v_1,\ldots,v_n \in \mathbb{C}^d$. More formally, we show that, if the number of $r$-rich lines is significantly larger than $n^2/r^d$ then there must exist a large subset of the points contained in a hyperplane. We conjecture that the factor $r^d$ can be replaced with a tight $r^{d+1}$. If true, this would generalize the classic Szemerédi-Trotter theorem which gives a bound of $n^2/r^3$ on the number of $r$-rich lines in a planar configuration. This conjecture was shown to hold in $\mathbb{R}^3$ in the seminal work of Guth and Katz [GK10] and was also recently proved over $\mathbb{R}^4$ (under some additional restrictions) [SS14]. For the special case of arithmetic progressions ($r$ collinear points that are evenly spaced) we give a bound that is tight up to low order terms, showing that a $d$-dimensional grid achieves the largest number of $r$-term progressions. The main ingredient in the proof is a new method to find a low degree polynomial that vanishes on many of the rich lines. Unlike previous applications of the polynomial method, we do not find this polynomial by interpolation. The starting observation is that the degree $r-2$ Veronese embedding takes $r$ collinear points to $r$ linearly dependent images. Hence, each collinear $r$-tuple of points gives us a dependent $r$-tuple of images. We then use the design-matrix method of [BDWY12] to convert these 'local' linear dependencies into a global one, showing that all the images lie in a hyperplane. This then translates into a low degree polynomial vanishing on the original set.
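The starting observation about the Veronese embedding can be checked on a small example. The sketch below works in the rational plane rather than $\mathbb{C}^d$ and uses all monomials of degree at most $r-2$ in the affine coordinates; both choices, and the function names, are assumptions made to keep the example short. It computes the rank of the embedded points exactly with `fractions.Fraction`.

```python
from fractions import Fraction


def veronese(point, degree):
    """Map (x, y) to the vector of all monomials x^i * y^j with i + j <= degree."""
    x, y = Fraction(point[0]), Fraction(point[1])
    return [x ** i * y ** j
            for i in range(degree + 1)
            for j in range(degree + 1 - i)]


def rank(rows):
    """Exact rank of a matrix of Fractions via Gauss-Jordan elimination."""
    rows = [list(row) for row in rows]
    found = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(found, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[found], rows[pivot] = rows[pivot], rows[found]
        for i in range(len(rows)):
            if i != found and rows[i][col] != 0:
                factor = rows[i][col] / rows[found][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[found])]
        found += 1
    return found
```

With $r=4$, the four points $(0,1),(1,3),(2,5),(3,7)$ on the line $y=2x+1$ map under the degree-2 embedding to four vectors of rank 3, so they are linearly dependent, whereas the four corners of the unit square (no three collinear) map to vectors of full rank 4.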
