Source author record

Bharath K. Sriperumbudur

Bharath K. Sriperumbudur appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning, math.ST, Statistics Theory, Applications, Information Theory, math.FA, math.IT, math.PR, Methodology

Catalog footprint

What is connected

11 works

9 topics

4 close collaborators


preprint · 2026 · arXiv

Sobolev Regularized MMD Gradient Flow

We propose Sobolev-regularized Maximum Mean Discrepancy (SrMMD) gradient flow, a regularized variant of maximum mean discrepancy (MMD) gradient flow based on a gradient penalty on the witness function. The proposed regularization mitigates the non-convexity of the MMD objective and yields provable \emph{global} convergence guarantees in MMD in both continuous and discrete time. More surprisingly, our convergence analysis does not rely on isoperimetric assumptions on the target distribution. Instead, it is based on a regularity condition on the difference between kernel mean embeddings. A key highlight of the proposed flow is that it is applicable in both sampling (from an unnormalized target distribution, using Stein kernels) and generative modeling settings, unlike previous works, where a gradient flow is suitable for only generative modeling or sampling but not both. The effectiveness of the proposed flow is empirically verified on a broad range of tasks in both generative modeling and sampling.

preprint · 2022 · arXiv

Shrinkage Estimation for the Diagonal Multivariate Exponential Families

We study shrinkage estimation of the mean parameters of a class of multivariate distributions for which the diagonal entries of the corresponding covariance matrix are certain quadratic functions of the mean parameter. This class of distributions includes the diagonal multivariate natural exponential families. We propose two classes of semi-parametric shrinkage estimators for the mean and construct unbiased estimators of the corresponding risk. We establish the asymptotic consistency and convergence rates for these shrinkage estimators under squared error loss as both $n$, the sample size, and $p$, the dimension, tend to infinity. Next, we specialize these results to the diagonal multivariate natural exponential families, which have been classified as consisting of the normal, Poisson, gamma, multinomial, negative multinomial, and hybrid classes of distributions. We establish the consistency of our estimators in the normal, gamma, and negative multinomial cases subject to the condition that $p n^{-1/3} (\log{n})^{4/3} \to 0$, and in the Poisson and multinomial cases if $p n^{-1/2} \to 0$, as $n,p \to \infty$. Simulation studies are provided to evaluate the performance of our estimators and we illustrate that, in the gamma and Poisson cases, our estimators achieve lower risk than the maximum likelihood estimator, thereby demonstrating the superiority of our estimators over the maximum likelihood estimator.

preprint · 2022 · arXiv

Shrinkage Estimation of Higher Order Bochner Integrals

We consider shrinkage estimation of higher order Hilbert space valued Bochner integrals in a non-parametric setting. We propose estimators that shrink the $U$-statistic estimator of the Bochner integral towards a pre-specified target element in the Hilbert space. Depending on the degeneracy of the kernel of the $U$-statistic, we construct consistent shrinkage estimators with fast rates of convergence, and develop oracle inequalities comparing the risks of the $U$-statistic estimator and its shrinkage version. Surprisingly, we show that the shrinkage estimator designed by assuming complete degeneracy of the kernel of the $U$-statistic is a consistent estimator even when the kernel is not completely degenerate. This work subsumes and improves upon Krikamol et al., 2016, JMLR and Zhou et al., 2019, JMVA, which only handle mean element and covariance operator estimation in a reproducing kernel Hilbert space. We also specialize our results to normal mean estimation and show that for $d\ge 3$, the proposed estimator strictly improves upon the sample mean in terms of the mean squared error.
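The normal-mean claim for $d\ge 3$ can be illustrated with a small simulation. The sketch below uses the classical positive-part James-Stein estimator as a stand-in; it is not the estimator proposed in the paper, and the dimension, sample size, true mean, and shrinkage target (the origin) are assumptions chosen only for illustration.

```python
import numpy as np

def james_stein(sample_mean, n, sigma2=1.0):
    """Positive-part James-Stein estimator shrinking the sample mean toward zero."""
    d = sample_mean.shape[-1]
    norm_sq = np.sum(sample_mean**2, axis=-1, keepdims=True)
    # The sample mean of n i.i.d. N(theta, sigma2 * I) draws has covariance (sigma2 / n) * I.
    shrink = 1.0 - (d - 2) * (sigma2 / n) / norm_sq
    return np.maximum(shrink, 0.0) * sample_mean

rng = np.random.default_rng(0)
d, n, trials = 10, 20, 2000          # illustrative values, not taken from the paper
theta = np.full(d, 0.3)              # hypothetical true mean

# Draw the sample means directly: X_bar ~ N(theta, I / n).
means = theta + rng.normal(size=(trials, d)) / np.sqrt(n)

# Monte Carlo estimates of the squared-error risk of each estimator.
risk_mean = np.mean(np.sum((means - theta) ** 2, axis=1))
risk_js = np.mean(np.sum((james_stein(means, n) - theta) ** 2, axis=1))
print(f"sample mean risk: {risk_mean:.3f}, James-Stein risk: {risk_js:.3f}")
```

In this setup the shrinkage estimate should show a lower Monte Carlo risk than the sample mean, which is the qualitative behaviour the abstract describes for $d\ge 3$.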

preprint · 2021 · arXiv

Local minimax rates for closeness testing of discrete distributions

We consider the closeness testing problem for discrete distributions. The goal is to distinguish whether two samples are drawn from the same unspecified distribution, or whether their respective distributions are separated in $L_1$-norm. In this paper, we focus on adapting the rate to the shape of the underlying distributions, i.e. we consider \textit{a local minimax setting}. We provide, to the best of our knowledge, the first local minimax rate for the separation distance up to logarithmic factors, together with a test that achieves it. In view of the rate, closeness testing turns out to be substantially harder than the related one-sample testing problem over a wide range of cases.

preprint · 2020 · arXiv

Gaussian Sketching yields a J-L Lemma in RKHS

The main contribution of the paper is to show that Gaussian sketching of a kernel-Gram matrix $\boldsymbol K$ yields an operator whose counterpart in an RKHS $\mathcal H$ is a \emph{random projection} operator, in the spirit of the Johnson-Lindenstrauss (J-L) lemma. To be precise, given a random matrix $Z$ with i.i.d. Gaussian entries, we show that a sketch $Z\boldsymbol{K}$ corresponds to a particular random operator in (infinite-dimensional) Hilbert space $\mathcal H$ that maps functions $f \in \mathcal H$ to a low-dimensional space $\mathbb R^d$, while preserving a weighted RKHS inner-product of the form $\langle f, g \rangle_Σ \doteq \langle f, Σ^3 g \rangle_{\mathcal H}$, where $Σ$ is the \emph{covariance} operator induced by the data distribution. In particular, under similar assumptions as in kernel PCA (KPCA), or kernel $k$-means (K-$k$-means), well-separated subsets of feature-space $\{K(\cdot, x): x \in \mathcal X\}$ remain well-separated after such operation, which suggests similar benefits as in KPCA and/or K-$k$-means, albeit at the much cheaper cost of a random projection. Moreover, our convergence rates suggest that, given a large dataset $\{X_i\}_{i=1}^N$ of size $N$, we can build the Gram matrix $\boldsymbol K$ on a much smaller subsample of size $n\ll N$, so that the sketch $Z\boldsymbol K$ is very cheap to obtain and subsequently apply as a projection operator on the original data $\{X_i\}_{i=1}^N$. We verify these insights empirically on synthetic data, and on real-world clustering applications.

preprint · 2020 · arXiv

On Distance and Kernel Measures of Conditional Independence

Measuring conditional independence is one of the important tasks in statistical inference and is fundamental in causal discovery, feature selection, dimensionality reduction, Bayesian network learning, and others. In this work, we explore the connection between conditional independence measures induced by distances on a metric space and reproducing kernels associated with a reproducing kernel Hilbert space (RKHS). For certain distance and kernel pairs, we show the distance-based conditional independence measures to be equivalent to the kernel-based measures. On the other hand, we also show that some kernel conditional independence measures that are popular in machine learning, based on the Hilbert-Schmidt norm of a certain cross-conditional covariance operator, do not have a simple distance representation, except in some limiting cases. This paper, therefore, shows that the distance and kernel measures of conditional independence are not quite equivalent, unlike in the case of joint independence as shown by Sejdinovic et al. (2013).

preprint · 2016 · arXiv

Convergence guarantees for kernel-based quadrature rules in misspecified settings

Kernel-based quadrature rules are becoming important in machine learning and statistics, as they achieve super-$\sqrt{n}$ convergence rates in numerical integration, and thus provide alternatives to Monte Carlo integration in challenging settings where integrands are expensive to evaluate or where integrands are high dimensional. These rules are based on the assumption that the integrand has a certain degree of smoothness, expressed as membership of the integrand in a certain reproducing kernel Hilbert space (RKHS). However, this assumption can be violated in practice (e.g., when the integrand is a black box function), and no general theory has been established for the convergence of kernel quadratures in such misspecified settings. Our contribution is in proving that kernel quadratures can be consistent even when the integrand does not belong to the assumed RKHS, i.e., when the integrand is less smooth than assumed. Specifically, we derive convergence rates that depend on the (unknown) lesser smoothness of the integrand, where the degree of smoothness is expressed via powers of RKHSs or via Sobolev spaces.

preprint · 2015 · arXiv

Optimal Rates for Random Fourier Features

Kernel methods represent one of the most powerful tools in machine learning to tackle problems expressed in terms of function values and derivatives due to their capability to represent and model complex relations. While these methods show good versatility, they are computationally intensive and have poor scalability to large data as they require operations on Gram matrices. In order to mitigate this serious computational limitation, recently randomized constructions have been proposed in the literature, which allow the application of fast linear algorithms. Random Fourier features (RFF) are among the most popular and widely applied constructions: they provide an easily computable, low-dimensional feature representation for shift-invariant kernels. Despite the popularity of RFFs, very little is understood theoretically about their approximation quality. In this paper, we provide a detailed finite-sample theoretical analysis about the approximation quality of RFFs by (i) establishing optimal (in terms of the RFF dimension, and growing set size) performance guarantees in uniform norm, and (ii) presenting guarantees in $L^r$ ($1\le r<\infty$) norms. We also propose an RFF approximation to derivatives of a kernel with a theoretical study on its approximation quality.
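As a rough illustration of the construction analysed in this abstract, the following sketch builds random Fourier features for a Gaussian kernel and measures the largest entrywise gap between the exact and approximate Gram matrices on a small sample. The kernel choice, bandwidth, sample size, and feature count are assumptions made for the example, and the function names are hypothetical; nothing here reproduces the paper's rates.

```python
import numpy as np

def rff_features(X, num_features, bandwidth, rng):
    """Random Fourier features for the kernel exp(-||x - y||^2 / (2 * bandwidth^2))."""
    d = X.shape[1]
    # Frequencies sampled from the kernel's spectral measure, N(0, I / bandwidth^2).
    W = rng.normal(scale=1.0 / bandwidth, size=(d, num_features))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

def gaussian_kernel(X, Y, bandwidth):
    """Exact Gaussian kernel Gram matrix between the rows of X and Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                      # illustrative data
K_exact = gaussian_kernel(X, X, bandwidth=1.0)
Z = rff_features(X, num_features=5000, bandwidth=1.0, rng=rng)
K_approx = Z @ Z.T                                # inner products of features approximate k(x, y)
max_error = np.max(np.abs(K_exact - K_approx))    # empirical uniform error on this sample
print(f"max entrywise error: {max_error:.4f}")
```

Increasing `num_features` should shrink `max_error`, since each entry of `K_approx` is an average over independent random features.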

preprint · 2010 · arXiv

Discussion of: Brownian distance covariance

Discussion on "Brownian distance covariance" by Gábor J. Székely and Maria L. Rizzo [arXiv:1010.0297]

preprint · 2010 · arXiv

Hilbert space embeddings and metrics on probability measures

A Hilbert space embedding for probability measures has recently been proposed, with applications including dimensionality reduction, homogeneity testing, and independence testing. This embedding represents any probability measure as a mean element in a reproducing kernel Hilbert space (RKHS). A pseudometric on the space of probability measures can be defined as the distance between distribution embeddings: we denote this as $γ_k$, indexed by the kernel function $k$ that defines the inner product in the RKHS. We present three theoretical properties of $γ_k$. First, we consider the question of determining the conditions on the kernel $k$ for which $γ_k$ is a metric: such $k$ are denoted \emph{characteristic kernels}. Unlike pseudometrics, a metric is zero only when two distributions coincide, thus ensuring the RKHS embedding maps all distributions uniquely (i.e., the embedding is injective). While previously published conditions may apply only in restricted circumstances (e.g. on compact domains), and are difficult to check, our conditions are straightforward and intuitive: bounded continuous strictly positive definite kernels are characteristic. Alternatively, if a bounded continuous kernel is translation-invariant on $\mathbb{R}^d$, then it is characteristic if and only if the support of its Fourier transform is the entire $\mathbb{R}^d$. Second, we show that there exist distinct distributions that are arbitrarily close in $γ_k$. Third, to understand the nature of the topology induced by $γ_k$, we relate $γ_k$ to other popular metrics on probability measures, and present conditions on the kernel $k$ under which $γ_k$ metrizes the weak topology.
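The distance $γ_k$ between mean embeddings can be estimated from samples by replacing each embedding with an empirical average of kernel evaluations. The sketch below is a minimal plug-in estimate of $γ_k^2$, assuming a Gaussian kernel with unit bandwidth and synthetic Gaussian data; the function names and all numerical settings are illustrative choices, not taken from the paper.

```python
import numpy as np

def gaussian_gram(X, Y, bandwidth=1.0):
    """Gaussian kernel Gram matrix between the rows of X and Y."""
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd_squared(X, Y, bandwidth=1.0):
    """Plug-in (V-statistic) estimate of gamma_k(P, Q)^2 from samples X ~ P and Y ~ Q."""
    # ||mu_P - mu_Q||^2 = E k(x, x') + E k(y, y') - 2 E k(x, y), with expectations
    # replaced by averages over the samples.
    return (gaussian_gram(X, X, bandwidth).mean()
            + gaussian_gram(Y, Y, bandwidth).mean()
            - 2.0 * gaussian_gram(X, Y, bandwidth).mean())

rng = np.random.default_rng(0)
same_a = rng.normal(size=(500, 2))            # two samples from the same distribution
same_b = rng.normal(size=(500, 2))
shifted = rng.normal(loc=1.0, size=(500, 2))  # sample from a mean-shifted distribution

mmd_same = mmd_squared(same_a, same_b)
mmd_shifted = mmd_squared(same_a, shifted)
print(f"same distribution: {mmd_same:.4f}, shifted distribution: {mmd_shifted:.4f}")
```

Because the Gaussian kernel is characteristic, the population value of $γ_k$ is zero only when the two distributions coincide, so the estimate for the shifted sample should be clearly larger than the one for two samples from the same distribution.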

preprint · 2010 · arXiv

Universality, Characteristic Kernels and RKHS Embedding of Measures

A Hilbert space embedding for probability measures has recently been proposed, wherein any probability measure is represented as a mean element in a reproducing kernel Hilbert space (RKHS). Such an embedding has found applications in homogeneity testing, independence testing, dimensionality reduction, etc., with the requirement that the reproducing kernel is characteristic, i.e., the embedding is injective. In this paper, we generalize this embedding to finite signed Borel measures, wherein any finite signed Borel measure is represented as a mean element in an RKHS. We show that the proposed embedding is injective if and only if the kernel is universal. This, therefore, provides a novel characterization of universal kernels, which are proposed in the context of achieving the Bayes risk by kernel-based classification/regression algorithms. By exploiting this relation between universality and the embedding of finite signed Borel measures into an RKHS, we establish the relation between universal and characteristic kernels.
