Researcher profile

Kshitij Khare

Kshitij Khare contributes to research discovery and scholarly infrastructure.

Researcher
Affiliation not imported
Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging
Verification L1
Unclaimed author
7 works
0 followers
5 topics
4 close collaborators

Published work

7 published items

preprint, 2026, arXiv

Optimality of Sub-network Laplace Approximations: New Results and Methods

Although the Laplace approximation offers a simple route to uncertainty quantification in deep neural networks, its reliance on inverting large Hessian matrices has motivated a range of computationally feasible low-dimensional or sparse approximations. A prominent class of such methods, sub-network Laplace approximations, constructs surrogates by restricting attention to a small subset of parameters. Existing approaches in this family typically rely on diagonal, layer-wise, or other architectural heuristics for subset selection, which ignore cross-parameter interactions and lack formal optimality guarantees. In this paper, we provide a rigorous theoretical analysis of the sub-network Laplace paradigm. We prove that all sub-network Laplace methods systematically underestimate the predictive variance of the full Laplace posterior, and that this bias decreases monotonically as the retained sub-matrix expands. Leveraging this insight, we propose two principled, analytically grounded sub-network Hessian approximations: Gradient-Laplace selects parameters with the largest average squared gradients of the model output with respect to the parameters over a reference dataset, while Greedy-Laplace iteratively refines this selection by accounting for off-diagonal interactions in the precision matrix. We establish theoretical guarantees characterizing their optimality properties and show that Gradient-Laplace provably outperforms existing heuristic approaches. Extensive numerical studies across diverse settings indicate that these methods perform strongly relative to existing benchmarks.
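The variance claim and the gradient-based selection rule in this abstract can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the paper's implementation: it assumes a linearized setting in which the predictive variance at a test input is g^T H^{-1} g for a positive-definite precision matrix H and output gradient g, it uses a randomly generated H and random reference gradients, and the function names subnetwork_variance and gradient_selection are hypothetical.

```python
import numpy as np

def subnetwork_variance(H, g, idx):
    """Variance g_S^T (H_SS)^{-1} g_S using only the parameters in idx."""
    idx = np.asarray(idx)
    H_ss = H[np.ix_(idx, idx)]          # retained sub-matrix of the precision
    g_s = g[idx]                        # matching entries of the output gradient
    return float(g_s @ np.linalg.solve(H_ss, g_s))

def gradient_selection(ref_grads, k):
    """Indices of the k parameters with the largest mean squared output gradient."""
    scores = np.mean(ref_grads ** 2, axis=0)
    return np.argsort(scores)[::-1][:k]

rng = np.random.default_rng(0)
p = 30
A = rng.standard_normal((p, p))
H = A @ A.T + p * np.eye(p)             # assumed positive-definite precision matrix
scales = rng.uniform(0.1, 2.0, size=p)  # assumed per-parameter gradient scales
ref_grads = rng.standard_normal((200, p)) * scales   # stand-in reference gradients
g = rng.standard_normal(p) * scales                  # stand-in test-point gradient

full_variance = float(g @ np.linalg.solve(H, g))
order = gradient_selection(ref_grads, p)             # all parameters, ranked
variances = [subnetwork_variance(H, g, order[:k]) for k in range(1, p + 1)]

print("full variance:", full_variance)
print("sub-network variance with 5 parameters:", variances[4])
```

In this toy setting each sub-network variance stays below the full value and grows as parameters are added, because restricting to a subset restricts the maximization that defines g^T H^{-1} g.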

preprint, 2022, arXiv

A generalized likelihood based Bayesian approach for scalable joint regression and covariance selection in high dimensions

The paper addresses joint sparsity selection in the regression coefficient matrix and the error precision (inverse covariance) matrix for high-dimensional multivariate regression models in the Bayesian paradigm. The selected sparsity patterns are crucial to help understand the network of relationships between the predictor and response variables, as well as the conditional relationships among the latter. While Bayesian methods have the advantage of providing natural uncertainty quantification through posterior inclusion probabilities and credible intervals, current Bayesian approaches either restrict to specific sub-classes of sparsity patterns and/or are not scalable to settings with hundreds of responses and predictors. Bayesian approaches which only focus on estimating the posterior mode are scalable, but do not generate samples from the posterior distribution for uncertainty quantification. Using a bi-convex regression based generalized likelihood and spike-and-slab priors, we develop an algorithm called Joint Regression Network Selector (JRNS) for joint regression and covariance selection which (a) can accommodate general sparsity patterns, (b) provides posterior samples for uncertainty quantification, and (c) is scalable and orders of magnitude faster than the state-of-the-art Bayesian approaches providing uncertainty quantification. We demonstrate the statistical and computational efficacy of the proposed approach on synthetic data and through the analysis of selected cancer data sets. We also establish high-dimensional posterior consistency for one of the developed algorithms.

preprint, 2021, arXiv

Geometric ergodicity of Gibbs samplers for the Horseshoe and its regularized variants

The Horseshoe is a widely used and popular continuous shrinkage prior for high-dimensional Bayesian linear regression. Recently, regularized versions of the Horseshoe prior have also been introduced in the literature. Various Gibbs sampling Markov chains have been developed in the literature to generate approximate samples from the corresponding intractable posterior densities. Establishing geometric ergodicity of these Markov chains provides crucial technical justification for the accuracy of asymptotic standard errors for Markov chain based estimates of posterior quantities. In this paper, we establish geometric ergodicity for various Gibbs samplers corresponding to the Horseshoe prior and its regularized variants in the context of linear regression. First, we establish geometric ergodicity of a Gibbs sampler for the original Horseshoe posterior under strictly weaker conditions than existing analyses in the literature. Second, we consider the regularized Horseshoe prior introduced in Piironen and Vehtari (2017), and prove geometric ergodicity for a Gibbs sampling Markov chain to sample from the corresponding posterior without any truncation constraint on the global and local shrinkage parameters. Finally, we consider a variant of this regularized Horseshoe prior introduced in Nishimura and Suchard (2020), and again establish geometric ergodicity for a Gibbs sampling Markov chain to sample from the corresponding posterior.

preprint, 2020, arXiv

B-CONCORD -- A scalable Bayesian high-dimensional precision matrix estimation procedure

Sparse estimation of the precision matrix under high-dimensional scaling constitutes a canonical problem in statistics and machine learning. Numerous regression and likelihood based approaches, many frequentist and some Bayesian in nature, have been developed. Bayesian methods provide direct uncertainty quantification of the model parameters through the posterior distribution and thus do not require a second round of computations for obtaining debiased estimates of the model parameters and their confidence intervals. However, they are computationally expensive for settings involving more than 500 variables. To that end, we develop B-CONCORD for the problem at hand, a Bayesian analogue of the CONvex CORrelation selection methoD (CONCORD) introduced by Khare et al. (2015). B-CONCORD leverages the CONCORD generalized likelihood function together with a spike-and-slab prior distribution to induce sparsity in the precision matrix parameters. We establish model selection and estimation consistency under high-dimensional scaling; further, we develop a procedure that refits only the non-zero parameters of the precision matrix, leading to significant improvements in the estimates in finite samples. Extensive numerical work illustrates the computational scalability of the proposed approach vis-a-vis competing Bayesian methods, as well as its accuracy.

preprint, 2019, arXiv

A Hybrid Scan Gibbs Sampler for Bayesian Models with Latent Variables

Gibbs sampling is a widely popular Markov chain Monte Carlo algorithm that can be used to analyze intractable posterior distributions associated with Bayesian hierarchical models. There are two standard versions of the Gibbs sampler: the systematic scan (SS) version, where all variables are updated at each iteration, and the random scan (RS) version, where a single, randomly selected variable is updated at each iteration. The literature comparing the theoretical properties of SS and RS Gibbs samplers is reviewed, and an alternative hybrid scan Gibbs sampler is introduced, which is particularly well suited to Bayesian models with latent variables. The word "hybrid" reflects the fact that the scan used within this algorithm has both systematic and random elements. Indeed, at each iteration, one updates the entire set of latent variables, along with a randomly chosen block of the remaining variables. The hybrid scan (HS) Gibbs sampler has important advantages over the two standard scan Gibbs samplers. Firstly, the HS algorithm is often easier to analyze from a theoretical standpoint. In particular, it can be much easier to establish the geometric ergodicity of an HS Gibbs Markov chain than to do the same for the corresponding SS and RS versions. Secondly, the sandwich methodology developed in Hobert and Marchev (2008), which is also reviewed, can be applied to the HS Gibbs algorithm (but not to the standard scan Gibbs samplers). It is shown that, under weak regularity conditions, adding sandwich steps to the HS Gibbs sampler always results in a theoretically superior algorithm. Three specific Bayesian hierarchical models of varying complexity are used to illustrate the results. One is a simple location-scale model for data from the Student's $t$ distribution, which is used as a pedagogical tool. The other two are sophisticated, yet practical Bayesian regression models.
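The hybrid scan described in this abstract (update every latent variable, then one randomly chosen block of the remaining variables) can be sketched for a Student's t location-scale model. This is a minimal sketch under assumptions that are not taken from the paper: the t likelihood is written as a scale mixture of normals with latent weights lam_i ~ Gamma(nu/2, rate nu/2), the prior on (mu, sigma2) is assumed proportional to 1/sigma2, the degrees of freedom nu are fixed, and the selection probability r and the function name hybrid_scan_gibbs are hypothetical. No sandwich step is included.

```python
import numpy as np

def hybrid_scan_gibbs(y, nu=4.0, n_iter=5000, r=0.5, seed=0):
    """Hybrid scan Gibbs sketch: all latent weights, then mu OR sigma2 at random."""
    rng = np.random.default_rng(seed)
    n = y.size
    mu, sigma2 = np.median(y), np.var(y)     # crude starting values
    draws = np.empty((n_iter, 2))
    for t in range(n_iter):
        # Systematic part: refresh the entire set of latent weights.
        rate = 0.5 * (nu + (y - mu) ** 2 / sigma2)
        lam = rng.gamma(shape=0.5 * (nu + 1.0), scale=1.0 / rate)
        # Random part: update exactly one of the two remaining blocks.
        if rng.random() < r:
            w = np.sum(lam)
            mu = rng.normal(np.sum(lam * y) / w, np.sqrt(sigma2 / w))
        else:
            s = 0.5 * np.sum(lam * (y - mu) ** 2)
            # Inverse-gamma(n/2, s) draw, written as s / Gamma(n/2, 1).
            sigma2 = s / rng.gamma(shape=0.5 * n)
        draws[t] = mu, sigma2
    return draws

# Usage on simulated data (location 2.0, scale 1.5, 4 degrees of freedom).
data_rng = np.random.default_rng(1)
y = 2.0 + 1.5 * data_rng.standard_t(4, size=200)
draws = hybrid_scan_gibbs(y)
print("posterior mean of mu after burn-in:", draws[1000:, 0].mean())
```

Setting r closer to 0 or 1 changes how often each non-latent block is refreshed; the latent weights are refreshed at every iteration regardless.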

preprint, 2014, arXiv

A convex pseudo-likelihood framework for high dimensional partial correlation estimation with convergence guarantees

Sparse high dimensional graphical model selection is a topic of much interest in modern day statistics. A popular approach is to apply $\ell_1$-penalties to either (1) parametric likelihoods or (2) regularized regression/pseudo-likelihoods, with the latter having the distinct advantage that they do not explicitly assume Gaussianity. As none of the popular methods proposed for solving pseudo-likelihood based objective functions have provable convergence guarantees, it is not clear if corresponding estimators exist or are even computable, or if they actually yield correct partial correlation graphs. This paper proposes a new pseudo-likelihood based graphical model selection method that aims to overcome some of the shortcomings of current methods, but at the same time retain all their respective strengths. In particular, we introduce a novel framework that leads to a convex formulation of the partial covariance regression graph problem, resulting in an objective function composed of quadratic forms. The objective is then optimized via a coordinate-wise approach. The specific functional form of the objective function facilitates rigorous convergence analysis leading to convergence guarantees, an important property that cannot be established using standard results when the dimension is larger than the sample size, as is often the case in high dimensional applications. These convergence guarantees ensure that estimators are well-defined under very general conditions, and are always computable. In addition, the approach yields estimators that have good large sample properties and also respect symmetry. Furthermore, application to simulated/real data, timing comparisons and numerical convergence are demonstrated. We also present a novel unifying framework that places all graphical pseudo-likelihood methods as special cases of a more general formulation, leading to important insights.

preprint, 2014, arXiv

Optimization Methods for Sparse Pseudo-Likelihood Graphical Model Selection

Sparse high dimensional graphical model selection is a popular topic in contemporary machine learning. To this end, various useful approaches have been proposed in the context of $\ell_1$-penalized estimation in the Gaussian framework. Though many of these inverse covariance estimation approaches are demonstrably scalable and have leveraged recent advances in convex optimization, they still depend on the Gaussian functional form. To address this gap, a convex pseudo-likelihood based partial correlation graph estimation method (CONCORD) has been recently proposed. This method uses coordinate-wise minimization of a regression based pseudo-likelihood, and has been shown to have robust model selection properties in comparison with the Gaussian approach. In direct contrast to the parallel work in the Gaussian setting, however, this new convex pseudo-likelihood framework has not leveraged the extensive array of methods that have been proposed in the machine learning literature for convex optimization. In this paper, we address this crucial gap by proposing two proximal gradient methods (CONCORD-ISTA and CONCORD-FISTA) for performing $\ell_1$-regularized inverse covariance matrix estimation in the pseudo-likelihood framework. We present timing comparisons with coordinate-wise minimization and demonstrate that our approach yields tremendous payoffs for $\ell_1$-penalized partial correlation graph estimation outside the Gaussian setting, thus yielding the fastest and most scalable approach for such problems. We undertake a theoretical analysis of our approach and rigorously demonstrate convergence, and also derive rates thereof.
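The proximal gradient idea named in this abstract can be sketched as a gradient step on the smooth part of the objective followed by soft-thresholding of the off-diagonal entries. The sketch below is not the authors' code. It assumes an objective of the CONCORD type, -sum_i log(omega_ii) + (1/2) tr(Omega S Omega) + lam * sum_{i != j} |omega_ij|, whose scaling constants are an assumption, and it uses a simple backtracking rule for the step size. All function names are hypothetical, and only the plain (ISTA-style) iteration is shown, without acceleration.

```python
import numpy as np

def smooth_part(omega, S):
    """Assumed smooth term: -sum log(diag) + 0.5 * tr(Omega S Omega)."""
    return -np.sum(np.log(np.diag(omega))) + 0.5 * np.trace(omega @ S @ omega)

def smooth_grad(omega, S):
    """Gradient of smooth_part with respect to Omega."""
    return -np.diag(1.0 / np.diag(omega)) + 0.5 * (S @ omega + omega @ S)

def penalty(omega, lam):
    """l1 penalty on the off-diagonal entries only."""
    return lam * (np.sum(np.abs(omega)) - np.sum(np.abs(np.diag(omega))))

def soft_threshold_offdiag(M, thresh):
    """Soft-threshold off-diagonal entries; leave the diagonal untouched."""
    out = np.sign(M) * np.maximum(np.abs(M) - thresh, 0.0)
    np.fill_diagonal(out, np.diag(M))
    return out

def concord_style_ista(S, lam, n_iter=200, step=1.0, shrink=0.5):
    omega = np.eye(S.shape[0])
    history = [smooth_part(omega, S) + penalty(omega, lam)]
    for _ in range(n_iter):
        grad = smooth_grad(omega, S)
        h_old = smooth_part(omega, S)
        t = step
        while True:  # backtracking: shrink t until the quadratic bound holds
            cand = soft_threshold_offdiag(omega - t * grad, t * lam)
            diff = cand - omega
            if np.all(np.diag(cand) > 0):
                bound = h_old + np.sum(grad * diff) + np.sum(diff ** 2) / (2 * t)
                if smooth_part(cand, S) <= bound:
                    break
            t *= shrink
        omega = cand
        history.append(smooth_part(omega, S) + penalty(omega, lam))
    return omega, history

# Usage on data simulated from a tridiagonal precision matrix.
rng = np.random.default_rng(0)
p, n = 10, 500
omega_true = np.eye(p) + 0.3 * (np.eye(p, k=1) + np.eye(p, k=-1))
X = rng.multivariate_normal(np.zeros(p), np.linalg.inv(omega_true), size=n)
S = X.T @ X / n
omega_hat, history = concord_style_ista(S, lam=0.1)
print("objective: start", history[0], "end", history[-1])
```

The backtracking condition is the standard sufficient-decrease check for proximal gradient steps, so each accepted step does not increase the assumed objective.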