Source author record

Kshitij Khare

Kshitij Khare appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Methodology, math.ST, Statistics Theory, Computation, Machine Learning, math.CO, math.OC, math.PR, Populations and Evolution, Quantitative Methods

Catalog footprint

What is connected

18 works

10 topics

4 close collaborators

preprint, 2026, arXiv

Optimality of Sub-network Laplace Approximations: New Results and Methods

Although the Laplace approximation offers a simple route to uncertainty quantification in deep neural networks, its reliance on inverting large Hessian matrices has motivated a range of computationally feasible low-dimensional or sparse approximations. A prominent class of such methods, sub-network Laplace approximations, constructs surrogates by restricting attention to a small subset of parameters. Existing approaches in this family typically rely on diagonal, layer-wise, or other architectural heuristics for subset selection, which ignore cross-parameter interactions and lack formal optimality guarantees. In this paper, we provide a rigorous theoretical analysis of the sub-network Laplace paradigm. We prove that all sub-network Laplace methods systematically underestimate the predictive variance of the full Laplace posterior, and that this bias decreases monotonically as the retained sub-matrix expands. Leveraging this insight, we propose two principled, analytically grounded sub-network Hessian approximations: Gradient-Laplace selects parameters with the largest average squared gradients of the model output with respect to the parameters over a reference dataset, while Greedy-Laplace iteratively refines this selection by accounting for off-diagonal interactions in the precision matrix. We establish theoretical guarantees characterizing their optimality properties and show that Gradient-Laplace provably outperforms existing heuristic approaches. Extensive numerical studies across diverse settings indicate that these methods perform strongly relative to existing benchmarks.
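As a rough illustration of the selection rule the abstract attributes to Gradient-Laplace, the sketch below ranks parameters by their average squared output gradient over a reference set and keeps the top k. The function name, the input layout (one gradient vector per reference point) and the tie-breaking rule are assumptions made for illustration; this is not the authors' implementation, and the Greedy-Laplace refinement is not shown.

```python
def select_by_mean_squared_gradient(gradients, k):
    """Return indices of the k parameters with the largest mean squared gradient.

    gradients: list of per-example gradient vectors (hypothetical layout),
               each a list of floats of the same length.
    """
    if not gradients:
        raise ValueError("need at least one gradient vector")
    n_params = len(gradients[0])
    scores = [0.0] * n_params
    for g in gradients:
        if len(g) != n_params:
            raise ValueError("gradient vectors must share one length")
        for j, value in enumerate(g):
            scores[j] += value * value
    scores = [s / len(gradients) for s in scores]
    # Sort by descending score; ties go to the smaller index (an arbitrary choice).
    ranked = sorted(range(n_params), key=lambda j: (-scores[j], j))
    return ranked[:k]


# Two reference points, three parameters (made-up numbers).
reference_gradients = [[1.0, 0.0, 3.0], [1.0, 0.0, 1.0]]
selected = select_by_mean_squared_gradient(reference_gradients, 2)
```

The selected indices would then define the sub-matrix of the Hessian that is retained. How that sub-matrix is used downstream is outside this sketch.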

preprint, 2022, arXiv

A generalized likelihood based Bayesian approach for scalable joint regression and covariance selection in high dimensions

The paper addresses joint sparsity selection in the regression coefficient matrix and the error precision (inverse covariance) matrix for high-dimensional multivariate regression models in the Bayesian paradigm. The selected sparsity patterns are crucial to help understand the network of relationships between the predictor and response variables, as well as the conditional relationships among the latter. While Bayesian methods have the advantage of providing natural uncertainty quantification through posterior inclusion probabilities and credible intervals, current Bayesian approaches either restrict to specific sub-classes of sparsity patterns and/or are not scalable to settings with hundreds of responses and predictors. Bayesian approaches which only focus on estimating the posterior mode are scalable, but do not generate samples from the posterior distribution for uncertainty quantification. Using a bi-convex regression based generalized likelihood and spike-and-slab priors, we develop an algorithm called Joint Regression Network Selector (JRNS) for joint regression and covariance selection which (a) can accommodate general sparsity patterns, (b) provides posterior samples for uncertainty quantification, and (c) is scalable and orders of magnitude faster than the state-of-the-art Bayesian approaches providing uncertainty quantification. We demonstrate the statistical and computational efficacy of the proposed approach on synthetic data and through the analysis of selected cancer data sets. We also establish high-dimensional posterior consistency for one of the developed algorithms.

preprint, 2021, arXiv

Geometric ergodicity of Gibbs samplers for the Horseshoe and its regularized variants

The Horseshoe is a widely used and popular continuous shrinkage prior for high-dimensional Bayesian linear regression. Recently, regularized versions of the Horseshoe prior have also been introduced in the literature. Various Gibbs sampling Markov chains have been developed in the literature to generate approximate samples from the corresponding intractable posterior densities. Establishing geometric ergodicity of these Markov chains provides crucial technical justification for the accuracy of asymptotic standard errors for Markov chain based estimates of posterior quantities. In this paper, we establish geometric ergodicity for various Gibbs samplers corresponding to the Horseshoe prior and its regularized variants in the context of linear regression. First, we establish geometric ergodicity of a Gibbs sampler for the original Horseshoe posterior under strictly weaker conditions than existing analyses in the literature. Second, we consider the regularized Horseshoe prior introduced in Piironen and Vehtari (2017), and prove geometric ergodicity for a Gibbs sampling Markov chain to sample from the corresponding posterior without any truncation constraint on the global and local shrinkage parameters. Finally, we consider a variant of this regularized Horseshoe prior introduced in Nishimura and Suchard (2020), and again establish geometric ergodicity for a Gibbs sampling Markov chain to sample from the corresponding posterior.

preprint, 2020, arXiv

B-CONCORD -- A scalable Bayesian high-dimensional precision matrix estimation procedure

Sparse estimation of the precision matrix under high-dimensional scaling constitutes a canonical problem in statistics and machine learning. Numerous regression and likelihood based approaches, many frequentist and some Bayesian in nature, have been developed. Bayesian methods provide direct uncertainty quantification of the model parameters through the posterior distribution and thus do not require a second round of computations for obtaining debiased estimates of the model parameters and their confidence intervals. However, they are computationally expensive for settings involving more than 500 variables. To that end, we develop B-CONCORD for the problem at hand, a Bayesian analogue of the CONvex CORrelation selection methoD (CONCORD) introduced by Khare et al. (2015). B-CONCORD leverages the CONCORD generalized likelihood function together with a spike-and-slab prior distribution to induce sparsity in the precision matrix parameters. We establish model selection and estimation consistency under high-dimensional scaling; further, we develop a procedure that refits only the non-zero parameters of the precision matrix, leading to significant improvements in the estimates in finite samples. Extensive numerical work illustrates the computational scalability of the proposed approach vis-a-vis competing Bayesian methods, as well as its accuracy.

preprint, 2019, arXiv

A Hybrid Scan Gibbs Sampler for Bayesian Models with Latent Variables

Gibbs sampling is a widely popular Markov chain Monte Carlo algorithm that can be used to analyze intractable posterior distributions associated with Bayesian hierarchical models. There are two standard versions of the Gibbs sampler: The systematic scan (SS) version, where all variables are updated at each iteration, and the random scan (RS) version, where a single, randomly selected variable is updated at each iteration. The literature comparing the theoretical properties of SS and RS Gibbs samplers is reviewed, and an alternative hybrid scan Gibbs sampler is introduced, which is particularly well suited to Bayesian models with latent variables. The word "hybrid" reflects the fact that the scan used within this algorithm has both systematic and random elements. Indeed, at each iteration, one updates the entire set of latent variables, along with a randomly chosen block of the remaining variables. The hybrid scan (HS) Gibbs sampler has important advantages over the two standard scan Gibbs samplers. Firstly, the HS algorithm is often easier to analyze from a theoretical standpoint. In particular, it can be much easier to establish the geometric ergodicity of a HS Gibbs Markov chain than to do the same for the corresponding SS and RS versions. Secondly, the sandwich methodology developed in Hobert and Marchev (2008), which is also reviewed, can be applied to the HS Gibbs algorithm (but not to the standard scan Gibbs samplers). It is shown that, under weak regularity conditions, adding sandwich steps to the HS Gibbs sampler always results in a theoretically superior algorithm. Three specific Bayesian hierarchical models of varying complexity are used to illustrate the results. One is a simple location-scale model for data from the Student's $t$ distribution, which is used as a pedagogical tool. The other two are sophisticated, yet practical Bayesian regression models.
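The scan described above (update every latent variable, then one randomly chosen block of the remaining variables) can be sketched as a generic step function. Everything named here is hypothetical: the state layout, the callback signatures, and the order of the two updates are assumptions. The toy usage below substitutes counters for real full-conditional draws, so it shows only the control flow, not a working sampler for any of the paper's models.

```python
import random


def hybrid_scan_step(state, update_latent, block_updates, block_probs, rng):
    """One hybrid scan iteration: refresh all latent variables, then one random block.

    state: dict holding the current values (hypothetical layout).
    update_latent: function (state, rng) -> None that redraws every latent
                   variable from its full conditional.
    block_updates: list of functions (state, rng) -> None, one per block of
                   the remaining variables.
    block_probs: selection probabilities for the blocks.
    """
    update_latent(state, rng)  # systematic part: always executed
    index = rng.choices(range(len(block_updates)), weights=block_probs)[0]
    block_updates[index](state, rng)  # random part: exactly one block
    return index


def bump(name):
    """Stand-in for a conditional draw: just counts how often it is called."""
    def update(state, rng):
        state[name] += 1
    return update


counts = {"latent": 0, "block_a": 0, "block_b": 0}
rng = random.Random(0)
for _ in range(50):
    hybrid_scan_step(counts, bump("latent"),
                     [bump("block_a"), bump("block_b")], [0.5, 0.5], rng)
```

After 50 iterations the latent update has run 50 times, while the two block updates share the 50 random selections between them.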

preprint, 2016, arXiv

A convex framework for high-dimensional sparse Cholesky based covariance estimation

Covariance estimation for high-dimensional datasets is a fundamental problem in modern day statistics with numerous applications. In these high dimensional datasets, the number of variables p is typically larger than the sample size n. A popular way of tackling this challenge is to induce sparsity in the covariance matrix, its inverse or a relevant transformation. In particular, methods inducing sparsity in the Cholesky parameter of the inverse covariance matrix can be useful as they are guaranteed to give a positive definite estimate of the covariance matrix. Also, the estimated sparsity pattern corresponds to a Directed Acyclic Graph (DAG) model for Gaussian data. In recent years, two useful penalized likelihood methods for sparse estimation of this Cholesky parameter (with no restrictions on the sparsity pattern) have been developed. However, these methods either consider a non-convex optimization problem which can lead to convergence issues and singular estimates of the covariance matrix when p > n, or achieve a convex formulation by placing a strict constraint on the conditional variance parameters. In this paper, we propose a new penalized likelihood method for sparse estimation of the inverse covariance Cholesky parameter that aims to overcome some of the shortcomings of current methods, but retains their respective strengths. We obtain a jointly convex formulation for our objective function, which leads to convergence guarantees, even when p > n. The approach always leads to a positive definite and symmetric estimator of the covariance matrix. We establish high-dimensional estimation and graph selection consistency, and also demonstrate finite sample performance on simulated/real data.

preprint, 2016, arXiv

Convergence Analysis of the Data Augmentation Algorithm for Bayesian Linear Regression with Non-Gaussian Errors

Gaussian errors are sometimes inappropriate in a multivariate linear regression setting because, for example, the data contain outliers. In such situations, it is often assumed that the error density is a scale mixture of multivariate normal densities that takes the form $f(\varepsilon) = \int_0^\infty |\Sigma|^{-\frac{1}{2}} u^{\frac{d}{2}} \, \phi_d \big( \Sigma^{-\frac{1}{2}} \sqrt{u} \, \varepsilon \big) \, h(u) \, du$, where $d$ is the dimension of the response, $\phi_d(\cdot)$ is the standard $d$-variate normal density, $\Sigma$ is an unknown $d \times d$ positive definite scale matrix, and $h(\cdot)$ is some fixed mixing density. Combining this alternative regression model with a default prior on the unknown parameters results in a highly intractable posterior density. Fortunately, there is a simple data augmentation (DA) algorithm and a corresponding Haar PX-DA algorithm that can be used to explore this posterior. This paper provides conditions (on $h$) for geometric ergodicity of the Markov chains underlying these Markov chain Monte Carlo (MCMC) algorithms. These results are extremely important from a practical standpoint because geometric ergodicity guarantees the existence of the central limit theorems that form the basis of all the standard methods of calculating valid asymptotic standard errors for MCMC-based estimators. The main result is that, if $h$ converges to 0 at the origin at an appropriate rate, and $\int_0^\infty u^{\frac{d}{2}} \, h(u) \, du < \infty$, then the DA and Haar PX-DA Markov chains are both geometrically ergodic. This result is quite far-reaching. For example, it implies the geometric ergodicity of the DA and Haar PX-DA Markov chains whenever $h$ is generalized inverse Gaussian, log-normal, inverted gamma (with shape parameter larger than $d/2$), or Fréchet (with shape parameter larger than $d/2$).

preprint, 2016, arXiv

Generalized Pseudolikelihood Methods for Inverse Covariance Estimation

We introduce PseudoNet, a new pseudolikelihood-based estimator of the inverse covariance matrix, that has a number of useful statistical and computational properties. We show, through detailed experiments with synthetic and also real-world finance as well as wind power data, that PseudoNet outperforms related methods in terms of estimation error and support recovery, making it well-suited for use in a downstream application, where obtaining low estimation error can be important. We also show, under regularity conditions, that PseudoNet is consistent. Our proof assumes the existence of accurate estimates of the diagonal entries of the underlying inverse covariance matrix; we additionally provide a two-step method to obtain these estimates, even in a high-dimensional setting, going beyond the proofs for related methods. Unlike other pseudolikelihood-based methods, we also show that PseudoNet does not saturate, i.e., in high dimensions, there is no hard limit on the number of nonzero entries in the PseudoNet estimate. We present a fast algorithm as well as screening rules that make computing the PseudoNet estimate over a range of tuning parameters tractable.

preprint, 2015, arXiv

Bayesian inference for Gaussian graphical models beyond decomposable graphs

Bayesian inference for graphical models has received much attention in the literature in recent years. It is well known that when the graph G is decomposable, Bayesian inference is significantly more tractable than in the general non-decomposable setting. Penalized likelihood inference on the other hand has made tremendous gains in the past few years in terms of scalability and tractability. Bayesian inference, however, has not had the same level of success, though a scalable Bayesian approach has its respective strengths, especially in terms of quantifying uncertainty. To address this gap, we propose a scalable and flexible novel Bayesian approach for estimation and model selection in Gaussian undirected graphical models. We first develop a class of generalized G-Wishart distributions with multiple shape parameters for an arbitrary underlying graph. This class contains the G-Wishart distribution as a special case. We then introduce the class of Generalized Bartlett (GB) graphs, and derive an efficient Gibbs sampling algorithm to obtain posterior draws from generalized G-Wishart distributions corresponding to a GB graph. The class of Generalized Bartlett graphs contains the class of decomposable graphs as a special case, but is substantially larger than the class of decomposable graphs. We proceed to derive theoretical properties of the proposed Gibbs sampler. We then demonstrate that the proposed Gibbs sampler is scalable to significantly higher dimensional problems as compared to using an accept-reject or a Metropolis-Hastings algorithm. Finally, we show the efficacy of the proposed approach on simulated and real data.

preprint, 2015, arXiv

Convergence of cyclic coordinatewise l1 minimization

We consider the general problem of minimizing an objective function which is the sum of a convex function (not strictly convex) and absolute values of a subset of variables (or equivalently the l1-norm of the variables). This problem appears extensively in modern statistical applications associated with high-dimensional data or "big data", and corresponds to optimizing l1-regularized likelihoods in the context of model selection. In such applications, cyclic coordinatewise minimization (CCM), where the objective function is sequentially minimized with respect to each individual coordinate, is often employed as it offers a computationally cheap and effective optimization method. Consequently, it is crucial to obtain theoretical guarantees of convergence for the sequence of iterates produced by the cyclic coordinatewise minimization in this setting. Moreover, as the objective corresponds to l1-regularized likelihoods of many variables, it is important to obtain convergence of the iterates themselves, and not just the function values. Previous results in the literature only establish either (i) that every limit point of the sequence of iterates is a stationary point of the objective function, or (ii) convergence under special assumptions, or (iii) convergence for a different minimization approach (which uses quadratic approximation based gradient descent followed by an inexact line search), or (iv) convergence of only the function values of the sequence of iterates produced by random coordinatewise minimization (a variant of CCM). In this paper, a rigorous general proof of convergence for the cyclic coordinatewise minimization algorithm is provided. We demonstrate the usefulness of our general results in contemporary applications.
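To make the cyclic coordinatewise scheme concrete, here is a minimal sketch for one familiar instance of the problem class, l1-penalized least squares. The objective 0.5 * ||y - X b||^2 + lam * ||b||_1, the function names and the fixed number of sweeps are choices made for illustration; the paper treats a more general convex term, and nothing below reproduces its analysis.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the scalar l1 proximal map)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def cyclic_coordinate_lasso(X, y, lam, sweeps=100):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 one coordinate at a time."""
    n, p = len(X), len(X[0])
    b = [0.0] * p
    residual = list(y)  # equals y - X b, since b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):  # fixed cyclic order 0, 1, ..., p-1
            if col_sq[j] == 0.0:
                continue
            # Inner product of column j with the residual that excludes b[j].
            rho = sum(X[i][j] * residual[i] for i in range(n)) + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


# With an identity design each coordinate is soft-thresholded independently.
coef = cyclic_coordinate_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

Each inner step solves the one-dimensional problem exactly, which is what makes the per-iteration cost low.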

preprint, 2015, arXiv

Necessary and Sufficient Conditions for High-Dimensional Posterior Consistency under $g$-Priors

We examine necessary and sufficient conditions for posterior consistency under $g$-priors, including extensions to hierarchical and empirical Bayesian models. The key features of this article are that we allow the number of regressors to grow at the same rate as the sample size and define posterior consistency under the sup vector norm instead of the more conventional Euclidean norm. We consider in particular the empirical Bayesian model of George and Foster (2000), the hyper-$g$-prior of Liang et al. (2008), and the prior considered by Zellner and Siow (1980).

preprint, 2015, arXiv

On the Bayesness, minimaxity, and admissibility of point estimators of allelic frequencies

In this paper, decision theory was used to derive Bayes and minimax decision rules to estimate allelic frequencies and to explore their admissibility. Decision rules with uniformly smallest risk usually do not exist and one approach to solve this problem is to use the Bayes principle and the minimax principle to find decision rules satisfying some general optimality criterion based on their risk functions. Two cases were considered, the simpler case of biallelic loci and the more complex case of multiallelic loci. For each locus, the sampling model was a multinomial distribution and the prior was a Beta (biallelic case) or a Dirichlet (multiallelic case) distribution. Three loss functions were considered: squared error loss (SEL), Kullback-Leibler loss (KLL) and quadratic error loss (QEL). Bayes estimators were derived under these three loss functions and were subsequently used to find minimax estimators using results from decision theory. The Bayes estimators obtained from SEL and KLL turned out to be the same. Under certain conditions, the Bayes estimator derived from QEL led to an admissible minimax estimator (which was also equal to the maximum likelihood estimator). The SEL also allowed finding admissible minimax estimators. Some estimators had uniformly smaller variance than the MLE and under suitable conditions the remaining estimators also satisfied this property. In addition to their statistical properties, the estimators derived here allow variation in allelic frequencies, which is closer to the reality of finite populations exposed to evolutionary forces.

preprint, 2014, arXiv

A convex pseudo-likelihood framework for high dimensional partial correlation estimation with convergence guarantees

Sparse high dimensional graphical model selection is a topic of much interest in modern day statistics. A popular approach is to apply l1-penalties to either (1) parametric likelihoods, or, (2) regularized regression/pseudo-likelihoods, with the latter having the distinct advantage that they do not explicitly assume Gaussianity. As none of the popular methods proposed for solving pseudo-likelihood based objective functions have provable convergence guarantees, it is not clear if corresponding estimators exist or are even computable, or if they actually yield correct partial correlation graphs. This paper proposes a new pseudo-likelihood based graphical model selection method that aims to overcome some of the shortcomings of current methods, but at the same time retain all their respective strengths. In particular, we introduce a novel framework that leads to a convex formulation of the partial covariance regression graph problem, resulting in an objective function comprised of quadratic forms. The objective is then optimized via a coordinate-wise approach. The specific functional form of the objective function facilitates rigorous convergence analysis leading to convergence guarantees; an important property that cannot be established using standard results, when the dimension is larger than the sample size, as is often the case in high dimensional applications. These convergence guarantees ensure that estimators are well-defined under very general conditions, and are always computable. In addition, the approach yields estimators that have good large sample properties and also respect symmetry. Furthermore, application to simulated/real data, timing comparisons and numerical convergence is demonstrated. We also present a novel unifying framework that places all graphical pseudo-likelihood methods as special cases of a more general formulation, leading to important insights.

preprint, 2014, arXiv

Optimization Methods for Sparse Pseudo-Likelihood Graphical Model Selection

Sparse high dimensional graphical model selection is a popular topic in contemporary machine learning. To this end, various useful approaches have been proposed in the context of $\ell_1$-penalized estimation in the Gaussian framework. Though many of these inverse covariance estimation approaches are demonstrably scalable and have leveraged recent advances in convex optimization, they still depend on the Gaussian functional form. To address this gap, a convex pseudo-likelihood based partial correlation graph estimation method (CONCORD) has been recently proposed. This method uses coordinate-wise minimization of a regression based pseudo-likelihood, and has been shown to have robust model selection properties in comparison with the Gaussian approach. In direct contrast to the parallel work in the Gaussian setting however, this new convex pseudo-likelihood framework has not leveraged the extensive array of methods that have been proposed in the machine learning literature for convex optimization. In this paper, we address this crucial gap by proposing two proximal gradient methods (CONCORD-ISTA and CONCORD-FISTA) for performing $\ell_1$-regularized inverse covariance matrix estimation in the pseudo-likelihood framework. We present timing comparisons with coordinate-wise minimization and demonstrate that our approach yields tremendous payoffs for $\ell_1$-penalized partial correlation graph estimation outside the Gaussian setting, thus yielding the fastest and most scalable approach for such problems. We undertake a theoretical analysis of our approach and rigorously demonstrate convergence, and also derive rates thereof.
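For readers unfamiliar with proximal gradient methods, the sketch below shows a generic ISTA iteration on an l1-penalized least-squares problem. This is not CONCORD-ISTA: the objective, the function names and the conservative step size (the reciprocal of the squared Frobenius norm of X, which bounds the gradient's Lipschitz constant) are stand-ins chosen so the example stays short and self-contained.

```python
def prox_l1(value, threshold):
    """Proximal map of threshold * |.|, i.e. soft-thresholding."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def ista_lasso(X, y, lam, iterations=200):
    """Proximal gradient (ISTA) for 0.5 * ||y - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    # The squared Frobenius norm is at least the largest eigenvalue of X^T X,
    # so its reciprocal is a safe, if conservative, step size.
    frob_sq = sum(X[i][j] ** 2 for i in range(n) for j in range(p))
    if frob_sq == 0.0:
        return [0.0] * p
    step = 1.0 / frob_sq
    b = [0.0] * p
    for _ in range(iterations):
        residual = [sum(X[i][j] * b[j] for j in range(p)) - y[i] for i in range(n)]
        gradient = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(p)]
        # Gradient step on the smooth part, then the l1 proximal map.
        b = [prox_l1(b[j] - step * gradient[j], step * lam) for j in range(p)]
    return b


estimate = ista_lasso([[1.0, 0.0], [0.0, 1.0]], [3.0, 0.5], lam=1.0)
```

Unlike a coordinate-wise scheme, every iteration updates all coordinates at once from a single gradient evaluation. FISTA adds a momentum term on top of the same step and is not shown here.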

preprint, 2013, arXiv

Convergence analysis of some multivariate Markov chains using stochastic monotonicity

We provide a nonasymptotic analysis of convergence to stationarity for a collection of Markov chains on multivariate state spaces, from arbitrary starting points, thereby generalizing results in [Khare and Zhou Ann. Appl. Probab. 19 (2009) 737-777]. Our examples include the multi-allele Moran model in population genetics and its variants in community ecology, a generalized Ehrenfest urn model and variants of the Polya urn model. It is shown that all these Markov chains are stochastically monotone with respect to an appropriate partial ordering. Then, using a generalization of the results in [Diaconis, Khare and Saloff-Coste Sankhya 72 (2010) 45-76] and [Wilson Ann. Appl. Probab. 14 (2004) 274-325] (for univariate totally ordered spaces) to multivariate partially ordered spaces, we obtain explicit nonasymptotic bounds for the distance to stationarity from arbitrary starting points. In previous literature, bounds, if any, were available only from special starting points. The analysis also works for nonreversible Markov chains, and allows us to analyze cases of the multi-allele Moran model not considered in [Khare and Zhou Ann. Appl. Probab. 19 (2009) 737-777].

preprint, 2012, arXiv

A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants

The data augmentation (DA) algorithm is a widely used Markov chain Monte Carlo algorithm that is easy to implement but often suffers from slow convergence. The sandwich algorithm is an alternative that can converge much faster while requiring roughly the same computational effort per iteration. Theoretically, the sandwich algorithm always converges at least as fast as the corresponding DA algorithm in the sense that $\Vert {K^*}\Vert \le \Vert {K}\Vert$, where $K$ and $K^*$ are the Markov operators associated with the DA and sandwich algorithms, respectively, and $\Vert\cdot\Vert$ denotes operator norm. In this paper, a substantial refinement of this operator norm inequality is developed. In particular, under regularity conditions implying that $K$ is a trace-class operator, it is shown that $K^*$ is also a positive, trace-class operator, and that the spectrum of $K^*$ dominates that of $K$ in the sense that the ordered elements of the former are all less than or equal to the corresponding elements of the latter. Furthermore, if the sandwich algorithm is constructed using a group action, as described by Liu and Wu [J. Amer. Statist. Assoc. 94 (1999) 1264--1274] and Hobert and Marchev [Ann. Statist. 36 (2008) 532--554], then there is strict inequality between at least one pair of eigenvalues. These results are applied to a new DA algorithm for Bayesian quantile regression introduced by Kozumi and Kobayashi [J. Stat. Comput. Simul. 81 (2011) 1565--1578].

preprint, 2012, arXiv

Sparse Matrix Decompositions and Graph Characterizations

The question of when zeros (i.e., sparsity) in a positive definite matrix $A$ are preserved in its Cholesky decomposition, and vice versa, was addressed by Paulsen et al. in the Journal of Functional Analysis (85, pp. 151-178). In particular, they prove that for the pattern of zeros in $A$ to be retained in the Cholesky decomposition of $A$, the pattern of zeros in $A$ has to necessarily correspond to a chordal (or decomposable) graph associated with a specific type of vertex ordering. This result therefore yields a characterization of chordal graphs in terms of sparse positive definite matrices. It has also proved to be extremely useful in probabilistic and statistical analysis of Markov random fields where zeros in positive definite correlation matrices are intimately related to the notion of stochastic independence. Now, consider a positive definite matrix $A$ and its Cholesky decomposition given by $A = LDL^T$, where $L$ is lower triangular with unit diagonal entries, and $D$ a diagonal matrix with positive entries. In this paper, we prove that a necessary and sufficient condition for zeros (i.e., sparsity) in a positive definite matrix $A$ to be preserved in its associated Cholesky matrix $L$, and in addition also preserved in the inverse of the Cholesky matrix $L^{-1}$, is that the pattern of zeros corresponds to a co-chordal or homogeneous graph associated with a specific type of vertex ordering. We proceed to provide a second characterization of this class of graphs in terms of determinants of submatrices that correspond to cliques in the graph. These results add to the growing body of literature in the field of sparse matrix decompositions, and also prove to be critical ingredients in the probabilistic analysis of an important class of Markov random fields.
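The role of the vertex ordering can be seen on a small example. The sketch below computes $A = LDL^T$ for two orderings of the same three-vertex star graph; the matrices and the function name `ldl_decompose` are made up for illustration. It only demonstrates fill-in in $L$, not the paper's results about $L^{-1}$.

```python
def ldl_decompose(A):
    """Return (L, D) with A = L D L^T, L unit lower triangular, D a list of pivots.

    Assumes A is symmetric positive definite; no pivoting is performed.
    """
    n = len(A)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = A[j][j] - sum(L[j][k] ** 2 * D[k] for k in range(j))
        for i in range(j + 1, n):
            partial = sum(L[i][k] * L[j][k] * D[k] for k in range(j))
            L[i][j] = (A[i][j] - partial) / D[j]
    return L, D


# Star graph with the hub ordered last: vertices 0 and 1 are not adjacent.
hub_last = [[2.0, 0.0, 1.0],
            [0.0, 2.0, 1.0],
            [1.0, 1.0, 2.0]]
# Same graph with the hub ordered first: vertices 1 and 2 are not adjacent.
hub_first = [[2.0, 1.0, 1.0],
             [1.0, 2.0, 0.0],
             [1.0, 0.0, 2.0]]

L_last, D_last = ldl_decompose(hub_last)
L_first, D_first = ldl_decompose(hub_first)
```

With the hub last, the zero at position (1, 0) of the matrix survives in `L_last`. With the hub first, the zero at position (2, 1) is filled in, so `L_first[2][1]` is nonzero even though the underlying graph is the same.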

preprint, 2011, arXiv

Wishart distributions for decomposable covariance graph models

Gaussian covariance graph models encode marginal independence among the components of a multivariate random vector by means of a graph $G$. These models are distinctly different from the traditional concentration graph models (often also referred to as Gaussian graphical models or covariance selection models) since the zeros in the parameter are now reflected in the covariance matrix $\Sigma$, as compared to the concentration matrix $\Omega=\Sigma^{-1}$. The parameter space of interest for covariance graph models is the cone $P_G$ of positive definite matrices with fixed zeros corresponding to the missing edges of $G$. As in Letac and Massam [Ann. Statist. 35 (2007) 1278--1323], we consider the case where $G$ is decomposable. In this paper, we construct on the cone $P_G$ a family of Wishart distributions which serve a similar purpose in the covariance graph setting as those constructed by Letac and Massam [Ann. Statist. 35 (2007) 1278--1323] and Dawid and Lauritzen [Ann. Statist. 21 (1993) 1272--1317] do in the concentration graph setting. We proceed to undertake a rigorous study of these "covariance" Wishart distributions and derive several deep and useful properties of this class.
