Source author record

Pritish Kamath

Pritish Kamath appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computational Complexity Cryptography and Security Data Structures and Algorithms Information Theory Logic in Computer Science math.IT Artificial Intelligence math.LO

Catalog footprint

What is connected

12works

9topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

Connect the Dots: Tighter Discrete Approximations of Privacy Loss Distributions

The privacy loss distribution (PLD) provides a tight characterization of the privacy loss of a mechanism in the context of differential privacy (DP). Recent work has shown that PLD-based accounting allows for tighter $(\varepsilon, δ)$-DP guarantees for many popular mechanisms compared to other known methods. A key question in PLD-based accounting is how to approximate any (potentially continuous) PLD with a PLD over any specified discrete support. We present a novel approach to this problem. Our approach supports both pessimistic estimation, which overestimates the hockey-stick divergence (i.e., $δ$) for any value of $\varepsilon$, and optimistic estimation, which underestimates the hockey-stick divergence. Moreover, we show that our pessimistic estimate is the best possible among all pessimistic estimates. Experimental evaluation shows that our approach can work with much larger discretization intervals while keeping a similar error bound compared to previous approaches and yet give a better approximation than existing methods.

preprint2022arXiv

Do More Negative Samples Necessarily Hurt in Contrastive Learning?

Recent investigations in noise contrastive estimation suggest, both empirically as well as theoretically, that while having more "negative samples" in the contrastive loss improves downstream classification performance initially, beyond a threshold, it hurts downstream performance due to a "collision-coverage" trade-off. But is such a phenomenon inherent in contrastive learning? We show in a simple theoretical setting, where positive pairs are generated by sampling from the underlying latent class (introduced by Saunshi et al. (ICML 2019)), that the downstream performance of the representation optimizing the (population) contrastive loss in fact does not degrade with the number of negative samples. Along the way, we give a structural characterization of the optimal representation in our framework, for noise contrastive estimation. We also provide empirical support for our theoretical results on CIFAR-10 and CIFAR-100 datasets.

preprint2022arXiv

Faster Privacy Accounting via Evolving Discretization

We introduce a new algorithm for numerical composition of privacy random variables, useful for computing the accurate differential privacy parameters for composition of mechanisms. Our algorithm achieves a running time and memory usage of $\mathrm{polylog}(k)$ for the task of self-composing a mechanism, from a broad class of mechanisms, $k$ times; this class, e.g., includes the sub-sampled Gaussian mechanism, that appears in the analysis of differentially private stochastic gradient descent. By comparison, recent work by Gopi et al. (NeurIPS 2021) has obtained a running time of $\widetilde{O}(\sqrt{k})$ for the same task. Our approach extends to the case of composing $k$ different mechanisms in the same class, improving upon their running time and memory usage from $\widetilde{O}(k^{1.5})$ to $\widetilde{O}(k)$.

preprint2022arXiv

On the Power of Differentiable Learning versus PAC and SQ Learning

We study the power of learning via mini-batch stochastic gradient descent (SGD) on the population loss, and batch Gradient Descent (GD) on the empirical loss, of a differentiable model or neural network, and ask what learning problems can be learnt using these paradigms. We show that SGD and GD can always simulate learning with statistical queries (SQ), but their ability to go beyond that depends on the precision $ρ$ of the gradient calculations relative to the minibatch size $b$ (for SGD) and sample size $m$ (for GD). With fine enough precision relative to minibatch size, namely when $b ρ$ is small enough, SGD can go beyond SQ learning and simulate any sample-based learning algorithm and thus its learning power is equivalent to that of PAC learning; this extends prior work that achieved this result for $b=1$. Similarly, with fine enough precision relative to the sample size $m$, GD can also simulate any sample-based learning algorithm based on $m$ samples. In particular, with polynomially many bits of precision (i.e. when $ρ$ is exponentially small), SGD and GD can both simulate PAC learning regardless of the mini-batch size. On the other hand, when $b ρ^2$ is large enough, the power of SGD is equivalent to that of SQ learning.

preprint2021arXiv

Does Invariant Risk Minimization Capture Invariance?

We show that the Invariant Risk Minimization (IRM) formulation of Arjovsky et al. (2019) can fail to capture "natural" invariances, at least when used in its practical "linear" form, and even on very simple problems which directly follow the motivating examples for IRM. This can lead to worse generalization on new environments, even when compared to unconstrained ERM. The issue stems from a significant gap between the linear variant (as in their concrete method IRMv1) and the full non-linear IRM formulation. Additionally, even when capturing the "right" invariances, we show that it is possible for IRM to learn a sub-optimal predictor, due to the loss function not being invariant across environments. The issues arise even when measuring invariance on the population distributions, but are exacerbated by the fact that IRM is extremely fragile to sampling.

preprint2021arXiv

Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

We study the relative power of learning with gradient descent on differentiable models, such as neural networks, versus using the corresponding tangent kernels. We show that under certain conditions, gradient descent achieves small error only if a related tangent kernel method achieves a non-trivial advantage over random guessing (a.k.a. weak learning), though this advantage might be very small even when gradient descent can achieve arbitrarily high accuracy. Complementing this, we show that without these conditions, gradient descent can in fact learn with small error even when no kernel method, in particular using the tangent kernel, can achieve a non-trivial advantage over random guessing.

preprint2020arXiv

Approximate is Good Enough: Probabilistic Variants of Dimensional and Margin Complexity

We present and study approximate notions of dimensional and margin complexity, which correspond to the minimal dimension or norm of an embedding required to approximate, rather then exactly represent, a given hypothesis class. We show that such notions are not only sufficient for learning using linear predictors or a kernel, but unlike the exact variants, are also necessary. Thus they are better suited for discussing limitations of linear or kernel methods.

preprint2020arXiv

On the Complexity of Modulo-q Arguments and the Chevalley-Warning Theorem

We study the search problem class $\mathrm{PPA}_q$ defined as a modulo-$q$ analog of the well-known $\textit{polynomial parity argument}$ class $\mathrm{PPA}$ introduced by Papadimitriou '94. Our first result shows that this class can be characterized in terms of $\mathrm{PPA}_p$ for prime $p$. Our main result is to establish that an $\textit{explicit}$ version of a search problem associated to the Chevalley--Warning theorem is complete for $\mathrm{PPA}_p$ for prime $p$. This problem is $\textit{natural}$ in that it does not explicitly involve circuits as part of the input. It is the first such complete problem for $\mathrm{PPA}_p$ when $p \ge 3$. Finally we discuss connections between Chevalley-Warning theorem and the well-studied $\textit{short integer solution}$ problem and survey the structural properties of $\mathrm{PPA}_q$.

preprint2016arXiv

Decidability of Non-Interactive Simulation of Joint Distributions

We present decidability results for a sub-class of "non-interactive" simulation problems, a well-studied class of problems in information theory. A non-interactive simulation problem is specified by two distributions $P(x,y)$ and $Q(u,v)$: The goal is to determine if two players, Alice and Bob, that observe sequences $X^n$ and $Y^n$ respectively where $\{(X_i, Y_i)\}_{i=1}^n$ are drawn i.i.d. from $P(x,y)$ can generate pairs $U$ and $V$ respectively (without communicating with each other) with a joint distribution that is arbitrarily close in total variation to $Q(u,v)$. Even when $P$ and $Q$ are extremely simple: e.g., $P$ is uniform on the triples $\{(0,0), (0,1), (1,0)\}$ and $Q$ is a "doubly symmetric binary source", i.e., $U$ and $V$ are uniform $\pm 1$ variables with correlation say $0.49$, it is open if $P$ can simulate $Q$. In this work, we show that whenever $P$ is a distribution on a finite domain and $Q$ is a $2 \times 2$ distribution, then the non-interactive simulation problem is decidable: specifically, given $δ> 0$ the algorithm runs in time bounded by some function of $P$ and $δ$ and either gives a non-interactive simulation protocol that is $δ$-close to $Q$ or asserts that no protocol gets $O(δ)$-close to $Q$. The main challenge to such a result is determining explicit (computable) convergence bounds on the number $n$ of samples that need to be drawn from $P(x,y)$ to get $δ$-close to $Q$. We invoke contemporary results from the analysis of Boolean functions such as the invariance principle and a regularity lemma to obtain such explicit bounds.

preprint2015arXiv

Communication Complexity of Permutation-Invariant Functions

Motivated by the quest for a broader understanding of communication complexity of simple functions, we introduce the class of "permutation-invariant" functions. A partial function $f:\{0,1\}^n \times \{0,1\}^n\to \{0,1,?\}$ is permutation-invariant if for every bijection $π:\{1,\ldots,n\} \to \{1,\ldots,n\}$ and every $\mathbf{x}, \mathbf{y} \in \{0,1\}^n$, it is the case that $f(\mathbf{x}, \mathbf{y}) = f(\mathbf{x}^π, \mathbf{y}^π)$. Most of the commonly studied functions in communication complexity are permutation-invariant. For such functions, we present a simple complexity measure (computable in time polynomial in $n$ given an implicit description of $f$) that describes their communication complexity up to polynomial factors and up to an additive error that is logarithmic in the input size. This gives a coarse taxonomy of the communication complexity of simple functions. Our work highlights the role of the well-known lower bounds of functions such as 'Set-Disjointness' and 'Indexing', while complementing them with the relatively lesser-known upper bounds for 'Gap-Inner-Product' (from the sketching literature) and 'Sparse-Gap-Inner-Product' (from the recent work of Canonne et al. [ITCS 2015]). We also present consequences to the study of communication complexity with imperfectly shared randomness where we show that for total permutation-invariant functions, imperfectly shared randomness results in only a polynomial blow-up in communication complexity after an additive $O(\log \log n)$ overhead.

preprint2012arXiv

Faster Algorithms for Alternating Refinement Relations

One central issue in the formal design and analysis of reactive systems is the notion of refinement that asks whether all behaviors of the implementation is allowed by the specification. The local interpretation of behavior leads to the notion of simulation. Alternating transition systems (ATSs) provide a general model for composite reactive systems, and the simulation relation for ATSs is known as alternating simulation. The simulation relation for fair transition systems is called fair simulation. In this work our main contributions are as follows: (1) We present an improved algorithm for fair simulation with Büchi fairness constraints; our algorithm requires $O(n^3 \cdot m)$ time as compared to the previous known $O(n^6)$-time algorithm, where $n$ is the number of states and $m$ is the number of transitions. (2) We present a game based algorithm for alternating simulation that requires $O(m^2)$-time as compared to the previous known $O((n \cdot m)^2)$-time algorithm, where $n$ is the number of states and $m$ is the size of transition relation. (3) We present an iterative algorithm for alternating simulation that matches the time complexity of the game based algorithm, but is more space efficient than the game based algorithm.

preprint2012arXiv

Preservation under Substructures modulo Bounded Cores

We investigate a model-theoretic property that generalizes the classical notion of "preservation under substructures". We call this property \emph{preservation under substructures modulo bounded cores}, and present a syntactic characterization via $Σ_2^0$ sentences for properties of arbitrary structures definable by FO sentences. As a sharper characterization, we further show that the count of existential quantifiers in the $Σ_2^0$ sentence equals the size of the smallest bounded core. We also present our results on the sharper characterization for special fragments of FO and also over special classes of structures. We present a (not FO-definable) class of finite structures for which the sharper characterization fails, but for which the classical Łoś-Tarski preservation theorem holds. As a fallout of our studies, we obtain combinatorial proofs of the Łoś-Tarski theorem for some of the aforementioned cases.

Pritish Kamath

What is connected

Connect this record

See the researcher in context

Building this map preview

12 published item(s)

Connect the Dots: Tighter Discrete Approximations of Privacy Loss Distributions

Do More Negative Samples Necessarily Hurt in Contrastive Learning?

Faster Privacy Accounting via Evolving Discretization

On the Power of Differentiable Learning versus PAC and SQ Learning

Does Invariant Risk Minimization Capture Invariance?

Quantifying the Benefit of Using Differentiable Learning over Tangent Kernels

Approximate is Good Enough: Probabilistic Variants of Dimensional and Margin Complexity

On the Complexity of Modulo-q Arguments and the Chevalley-Warning Theorem

Decidability of Non-Interactive Simulation of Joint Distributions

Communication Complexity of Permutation-Invariant Functions

Faster Algorithms for Alternating Refinement Relations

Preservation under Substructures modulo Bounded Cores