Source author record

Bala Rajaratnam

Bala Rajaratnam appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

math.ST, Statistics Theory, Machine Learning, Methodology, math.CA, math.CO, Computation, math.FA, math.PR, Applications, math.GR, math.OC, math.RA, math.SP, stat.OT

Catalog footprint

What is connected

42 works

15 topics

4 close collaborators

preprint 2022 arXiv

A unified framework for correlation mining in ultra-high dimension

Many applications benefit from theory relevant to the identification of variables having large correlations or partial correlations in high dimension. Recently there has been progress in the ultra-high dimensional setting when the sample size $n$ is fixed and the dimension $p$ tends to infinity. Despite these advances, the correlation screening framework suffers from practical, methodological and theoretical deficiencies. For instance, previous correlation screening theory requires that the population covariance matrix be sparse and block diagonal. This block sparsity assumption is however restrictive in practical applications. As a second example, correlation and partial correlation screening requires the estimation of dependence measures, which can be computationally prohibitive. In this paper, we propose a unifying approach to correlation and partial correlation mining that is not restricted to block diagonal correlation structure, thus yielding a methodology that is suitable for modern applications. By making connections to random geometric graphs, the number of highly correlated or partially correlated variables is shown to have compound Poisson finite-sample characterizations, which hold for both the finite $p$ case and when $p$ tends to infinity. The unifying framework also demonstrates a duality between correlation and partial correlation screening with theoretical and practical consequences.

preprint 2020 arXiv

Probability inequalities and tail estimates for metric semigroups

We study probability inequalities leading to tail estimates in a general semigroup $\mathscr{G}$ with a translation-invariant metric $d_{\mathscr{G}}$. (An important and central example of this in the functional analysis literature is that of $\mathscr{G}$ a Banach space.) Using our prior work [Ann. Prob. 2017] that extends the Hoffmann-Jorgensen inequality to all metric semigroups, we obtain tail estimates and approximate bounds for sums of independent semigroup-valued random variables, their moments, and decreasing rearrangements. In particular, we obtain the "correct" universal constants in several cases, extending results in the Banach space literature by Johnson-Schechtman-Zinn [Ann. Prob. 1985], Hitczenko [Ann. Prob. 1994], and Hitczenko and Montgomery-Smith [Ann. Prob. 2001]. Our results also hold more generally, in a very primitive mathematical framework required to state them: metric semigroups $\mathscr{G}$. This includes all compact, discrete, or (connected) abelian Lie groups.

preprint 2016 arXiv

A convex framework for high-dimensional sparse Cholesky based covariance estimation

Covariance estimation for high-dimensional datasets is a fundamental problem in modern day statistics with numerous applications. In these high dimensional datasets, the number of variables p is typically larger than the sample size n. A popular way of tackling this challenge is to induce sparsity in the covariance matrix, its inverse or a relevant transformation. In particular, methods inducing sparsity in the Cholesky parameter of the inverse covariance matrix can be useful as they are guaranteed to give a positive definite estimate of the covariance matrix. Also, the estimated sparsity pattern corresponds to a Directed Acyclic Graph (DAG) model for Gaussian data. In recent years, two useful penalized likelihood methods for sparse estimation of this Cholesky parameter (with no restrictions on the sparsity pattern) have been developed. However, these methods either consider a non-convex optimization problem which can lead to convergence issues and singular estimates of the covariance matrix when p > n, or achieve a convex formulation by placing a strict constraint on the conditional variance parameters. In this paper, we propose a new penalized likelihood method for sparse estimation of the inverse covariance Cholesky parameter that aims to overcome some of the shortcomings of current methods, but retains their respective strengths. We obtain a jointly convex formulation for our objective function, which leads to convergence guarantees, even when p > n. The approach always leads to a positive definite and symmetric estimator of the covariance matrix. We establish high-dimensional estimation and graph selection consistency, and also demonstrate finite sample performance on simulated/real data.

preprint 2016 arXiv

Generalized Pseudolikelihood Methods for Inverse Covariance Estimation

We introduce PseudoNet, a new pseudolikelihood-based estimator of the inverse covariance matrix, that has a number of useful statistical and computational properties. We show, through detailed experiments with synthetic and also real-world finance as well as wind power data, that PseudoNet outperforms related methods in terms of estimation error and support recovery, making it well-suited for use in a downstream application, where obtaining low estimation error can be important. We also show, under regularity conditions, that PseudoNet is consistent. Our proof assumes the existence of accurate estimates of the diagonal entries of the underlying inverse covariance matrix; we additionally provide a two-step method to obtain these estimates, even in a high-dimensional setting, going beyond the proofs for related methods. Unlike other pseudolikelihood-based methods, we also show that PseudoNet does not saturate, i.e., in high dimensions, there is no hard limit on the number of nonzero entries in the PseudoNet estimate. We present a fast algorithm as well as screening rules that make computing the PseudoNet estimate over a range of tuning parameters tractable.

preprint 2016 arXiv

Model-free consistency of graph partitioning

In this paper, we exploit the theory of dense graph limits to provide a new framework to study the stability of graph partitioning methods, which we call structural consistency. Both stability under perturbation as well as asymptotic consistency (i.e., convergence with probability $1$ as the sample size goes to infinity under a fixed probability model) follow from our notion of structural consistency. By formulating structural consistency as a continuity result on the graphon space, we obtain robust results that are completely independent of the data generating mechanism. In particular, our results apply in settings where observations are not independent, thereby significantly generalizing the common probabilistic approach where data are assumed to be i.i.d. In order to make precise the notion of structural consistency of graph partitioning, we begin by extending the theory of graph limits to include vertex colored graphons. We then define continuous node-level statistics and prove that graph partitioning based on such statistics is consistent. Finally, we derive the structural consistency of commonly used clustering algorithms in a general model-free setting. These include clustering based on local graph statistics such as homomorphism densities, as well as the popular spectral clustering using the normalized Laplacian. We posit that proving the continuity of clustering algorithms in the graph limit topology can stand on its own as a more robust form of model-free consistency. We also believe that the mathematical framework developed in this paper goes beyond the study of clustering algorithms, and will guide the development of similar model-free frameworks to analyze other procedures in the broader mathematical sciences.

preprint 2016 arXiv

Preserving positivity for matrices with sparsity constraints

Functions preserving Loewner positivity when applied entrywise to positive semidefinite matrices have been widely studied in the literature. Following the work of Schoenberg [Duke Math. J. 9], Rudin [Duke Math. J. 26], and others, it is well-known that functions preserving positivity for matrices of all dimensions are absolutely monotonic (i.e., analytic with nonnegative Taylor coefficients). In this paper, we study functions preserving positivity when applied entrywise to sparse matrices, with zeros encoded by a graph $G$ or a family of graphs $G_n$. Our results generalize Schoenberg and Rudin's results to a modern setting, where functions are often applied entrywise to sparse matrices in order to improve their properties (e.g. better conditioning). The only such result known in the literature is for the complete graph $K_2$. We provide the first such characterization result for a large family of non-complete graphs. Specifically, we characterize functions preserving Loewner positivity on matrices with zeros according to a tree. These functions are multiplicatively midpoint-convex and super-additive. Leveraging the underlying sparsity in matrices thus admits the use of functions which are not necessarily analytic nor absolutely monotonic. We further show that analytic functions preserving positivity on matrices with zeros according to trees can contain arbitrarily long sequences of negative coefficients, thus obviating the need for absolute monotonicity in a very strong sense. This result leads to the question of exactly when absolute monotonicity is necessary when preserving positivity for an arbitrary class of graphs. We then provide a stronger condition in terms of the numerical range of all symmetric matrices, such that functions satisfying this condition on matrices with zeros according to any family of graphs with unbounded degrees are necessarily absolutely monotonic.

preprint 2016 arXiv

Towards a sparse, scalable, and stably positive definite (inverse) covariance estimator

High dimensional covariance estimation and graphical models is a contemporary topic in statistics and machine learning having widespread applications. An important line of research in this regard is to shrink the extreme spectrum of the covariance matrix estimators. A separate line of research in the literature has considered sparse inverse covariance estimation which in turn gives rise to graphical models. In practice, however, a sparse covariance or inverse covariance matrix which is simultaneously well-conditioned and at the same time computationally tractable is desired. There has been little research at the confluence of these three topics. In this paper we consider imposing a condition number constraint to various types of losses used in covariance and inverse covariance matrix estimation. When the loss function can be decomposed as a sum of an orthogonally invariant function of the estimate and its inner product with a function of the sample covariance matrix, we show that a solution path algorithm can be derived, involving a series of ordinary differential equations. The path algorithm is attractive because it provides the entire family of estimates for all possible values of the condition number bound, at the same computational cost of a single estimate with a fixed upper bound. An important finding is that the proximal operator for the condition number constraint, which turns out to be very useful in regularizing loss functions that are not orthogonally invariant and may yield non-positive-definite estimates, can be efficiently computed by this path algorithm. As a concrete illustration of its practical importance, we develop an operator-splitting algorithm that imposes a guarantee of well-conditioning as well as positive definiteness to recently proposed convex pseudo-likelihood based graphical model selection methods.

preprint 2016 arXiv

Two-stage Sampling, Prediction and Adaptive Regression via Correlation Screening (SPARCS)

This paper proposes a general adaptive procedure for budget-limited predictor design in high dimensions called two-stage Sampling, Prediction and Adaptive Regression via Correlation Screening (SPARCS). SPARCS can be applied to high dimensional prediction problems in experimental science, medicine, finance, and engineering, as illustrated by the following. Suppose one wishes to run a sequence of experiments to learn a sparse multivariate predictor of a dependent variable $Y$ (disease prognosis for instance) based on a $p$ dimensional set of independent variables $\mathbf X=[X_1,\ldots, X_p]^T$ (assayed biomarkers). Assume that the cost of acquiring the full set of variables $\mathbf X$ increases linearly in its dimension. SPARCS breaks the data collection into two stages in order to achieve an optimal tradeoff between sampling cost and predictor performance. In the first stage we collect a few ($n$) expensive samples $\{y_i,\mathbf x_i\}_{i=1}^n$, at the full dimension $p\gg n$ of $\mathbf X$, winnowing the number of variables down to a smaller dimension $l < p$ using a type of cross-correlation or regression coefficient screening. In the second stage we collect a larger number $(t-n)$ of cheaper samples of the $l$ variables that passed the screening of the first stage. At the second stage, a low dimensional predictor is constructed by solving the standard regression problem using all $t$ samples of the selected variables. SPARCS is an adaptive online algorithm that implements false positive control on the selected variables, is well suited to small sample sizes, and is scalable to high dimensions. We establish asymptotic bounds for the Familywise Error Rate (FWER), specify high dimensional convergence rates for support recovery, and establish optimal sample allocation rules to the first and second stages.
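The two-stage idea in the abstract above can be sketched roughly as follows. This is a simplified illustration, not the authors' SPARCS procedure: the helper names `screen_by_correlation` and `fit_selected`, the simulated data, and the choice to keep the top $l$ variables by absolute sample correlation are all assumptions made for the example, and no false positive control is implemented.

```python
import numpy as np

def screen_by_correlation(X, y, l):
    """Stage 1 (hypothetical helper): indices of the l columns of X with the
    largest absolute sample correlation with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.sort(np.argsort(-np.abs(corr))[:l])

def fit_selected(X_sel, y):
    """Stage 2 (hypothetical helper): ordinary least squares with an intercept
    on the selected variables only."""
    design = np.column_stack([np.ones(len(y)), X_sel])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Simulated data (an assumption for illustration): p variables, only two active.
rng = np.random.default_rng(0)
p, n, t, l = 200, 60, 500, 5
beta = np.zeros(p)
beta[[0, 1]] = 3.0
X_all = rng.standard_normal((t, p))
y_all = X_all @ beta + rng.standard_normal(t)

# Stage 1: screen using only the first n "expensive" full-dimensional samples.
selected = screen_by_correlation(X_all[:n], y_all[:n], l)

# Stage 2: regress on all t samples, but only on the l selected variables.
# (In a real study the remaining t - n samples would measure only these columns.)
coef = fit_selected(X_all[:, selected], y_all)
```

Here `coef[0]` is the intercept and `coef[1:]` lines up with the sorted indices in `selected`.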

preprint 2015 arXiv

Bayesian inference for Gaussian graphical models beyond decomposable graphs

Bayesian inference for graphical models has received much attention in the literature in recent years. It is well known that when the graph G is decomposable, Bayesian inference is significantly more tractable than in the general non-decomposable setting. Penalized likelihood inference on the other hand has made tremendous gains in the past few years in terms of scalability and tractability. Bayesian inference, however, has not had the same level of success, though a scalable Bayesian approach has its respective strengths, especially in terms of quantifying uncertainty. To address this gap, we propose a scalable and flexible novel Bayesian approach for estimation and model selection in Gaussian undirected graphical models. We first develop a class of generalized G-Wishart distributions with multiple shape parameters for an arbitrary underlying graph. This class contains the G-Wishart distribution as a special case. We then introduce the class of Generalized Bartlett (GB) graphs, and derive an efficient Gibbs sampling algorithm to obtain posterior draws from generalized G-Wishart distributions corresponding to a GB graph. The class of Generalized Bartlett graphs contains the class of decomposable graphs as a special case, but is substantially larger than the class of decomposable graphs. We proceed to derive theoretical properties of the proposed Gibbs sampler. We then demonstrate that the proposed Gibbs sampler is scalable to significantly higher dimensional problems as compared to using an accept-reject or a Metropolis-Hastings algorithm. Finally, we show the efficacy of the proposed approach on simulated and real data.

preprint 2015 arXiv

Convergence of cyclic coordinatewise l1 minimization

We consider the general problem of minimizing an objective function which is the sum of a convex function (not strictly convex) and absolute values of a subset of variables (or equivalently the l1-norm of the variables). This problem appears extensively in modern statistical applications associated with high-dimensional data or "big data", and corresponds to optimizing l1-regularized likelihoods in the context of model selection. In such applications, cyclic coordinatewise minimization (CCM), where the objective function is sequentially minimized with respect to each individual coordinate, is often employed as it offers a computationally cheap and effective optimization method. Consequently, it is crucial to obtain theoretical guarantees of convergence for the sequence of iterates produced by the cyclic coordinatewise minimization in this setting. Moreover, as the objective corresponds to l1-regularized likelihoods of many variables, it is important to obtain convergence of the iterates themselves, and not just the function values. Previous results in the literature only (i) establish that every limit point of the sequence of iterates is a stationary point of the objective function, or (ii) establish convergence under special assumptions, or (iii) establish convergence for a different minimization approach (which uses quadratic approximation based gradient descent followed by an inexact line search), or (iv) establish convergence of only the function values of the sequence of iterates produced by random coordinatewise minimization (a variant of CCM). In this paper, a rigorous general proof of convergence for the cyclic coordinatewise minimization algorithm is provided. We demonstrate the usefulness of our general results in contemporary applications.
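As a concrete instance of the cyclic coordinatewise minimization described above, here is a minimal sketch for the lasso objective $\tfrac12\|y - Xb\|_2^2 + \lambda\|b\|_1$. The choice of the lasso as the example objective, the function names, and the stopping rule are assumptions for illustration; the paper's results concern a more general class of objectives.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def cyclic_coordinate_lasso(X, y, lam, n_sweeps=200, tol=1e-10):
    """Minimize 0.5 * ||y - X b||^2 + lam * ||b||_1 by sweeping through the
    coordinates in a fixed cyclic order and minimizing exactly over each one."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0)   # squared norm of each column
    r = y - X @ b                   # current residual
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue            # an all-zero column never enters the fit
            # Inner product of column j with the partial residual (b_j removed).
            rho = X[:, j] @ r + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                r -= delta * X[:, j]  # keep the residual in sync with b
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:        # stop once a full sweep changes nothing
            break
    return b
```

With an identity design matrix each coordinate update reduces to soft-thresholding the corresponding entry of `y`, which gives a quick sanity check of the update rule.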

preprint 2015 arXiv

Critical exponents of graphs

The study of entrywise powers of matrices was originated by Loewner in the pursuit of the Bieberbach conjecture. Since the work of FitzGerald and Horn (1977), it is known that $A^{\circ α} := (a_{ij}^α)$ is positive semidefinite for every entrywise nonnegative $n \times n$ positive semidefinite matrix $A = (a_{ij})$ if and only if $α$ is a positive integer or $α\geq n-2$. This surprising result naturally extends the Schur product theorem, and demonstrates the existence of a sharp phase transition in preserving positivity. In this paper, we study when entrywise powers preserve positivity for matrices with structure of zeros encoded by graphs. To each graph is associated an invariant called its "critical exponent", beyond which every power preserves positivity. In our main result, we determine the critical exponents of all chordal/decomposable graphs, and relate them to the geometry of the underlying graphs. We then examine the critical exponent of important families of non-chordal graphs such as cycles and bipartite graphs. Surprisingly, large families of dense graphs have small critical exponents that do not depend on the number of vertices of the graphs.
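The phase transition mentioned above can be checked numerically on a small case. The sketch below is only an illustration with a hand-picked $3 \times 3$ matrix (an assumption, not an example from the paper): for $n = 3$ the threshold is $n - 2 = 1$, so the entrywise power $α = 2$ should preserve positive semidefiniteness while $α = 1/2$ need not.

```python
import numpy as np

def entrywise_power(A, alpha):
    """Entrywise (Hadamard) power A^{∘alpha} = (a_ij ** alpha); A must be entrywise nonnegative."""
    return np.power(A, alpha)

def min_eigenvalue(A):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(A).min())

# Gram matrix of the vectors (1, 0), (1, 1), (0, 1): positive semidefinite
# with nonnegative entries (hand-picked for illustration).
A = np.array([[1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 1.0]])

eig_square = min_eigenvalue(entrywise_power(A, 2.0))  # integer power: stays positive
eig_root = min_eigenvalue(entrywise_power(A, 0.5))    # alpha < n - 2: becomes negative
```

For this matrix the determinant of the entrywise square root is $\sqrt{2} - 2 < 0$, so that matrix cannot be positive semidefinite.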

preprint 2015 arXiv

Differential Calculus on Graphon Space

Recently, the theory of dense graph limits has received attention from multiple disciplines including graph theory, computer science, statistical physics, probability, statistics, and group theory. In this paper we initiate the study of the general structure of differentiable graphon parameters $F$. We derive consistency conditions among the higher Gâteaux derivatives of $F$ when restricted to the subspace of edge weighted graphs $\mathcal{W}_{\bf p}$. Surprisingly, these constraints are rigid enough to imply that the multilinear functionals $Λ: \mathcal{W}_{\bf p}^n \to \mathbb{R}$ satisfying the constraints are determined by a finite set of constants indexed by isomorphism classes of multigraphs with $n$ edges and no isolated vertices. Using this structure theory, we explain the central role that homomorphism densities play in the analysis of graphons, by way of a new combinatorial interpretation of their derivatives. In particular, homomorphism densities serve as the monomials in a polynomial algebra that can be used to approximate differential graphon parameters as Taylor polynomials. These ideas are summarized by our main theorem, which asserts that homomorphism densities $t(H,-)$ where $H$ has at most $N$ edges form a basis for the space of smooth graphon parameters whose $(N+1)$st derivatives vanish. As a consequence of this theory, we also extend and derive new proofs of linear independence of multigraph homomorphism densities, and characterize homomorphism densities. In addition, we develop a theory of series expansions, including Taylor's theorem for graph parameters and a uniqueness principle for series. We use this theory to analyze questions raised by Lovász, including studying infinite quantum algebras and the connection between right- and left-homomorphism densities.

preprint 2015 arXiv

Extracting Common Time Trends from Concurrent Time Series: Maximum Autocorrelation Factors with Application to Tree Ring Time Series Data

Concurrent time series commonly arise in various applications, including when monitoring the environment such as in air quality measurement networks, weather stations, oceanographic buoys, or in paleo form such as lake sediments, tree rings, ice cores, or coral isotopes, with each monitoring or sampling site providing one of the time series. The goal in such applications is to extract a common time trend or signal in the observed data. Other examples where the goal is to extract a common time trend for multiple time series are in stock price time series, neurological time series, and quality control time series. For this purpose we develop properties of MAF [Maximum Autocorrelation Factors] that linearly combines time series in order to maximize the resulting SNR [signal-to-noise-ratio] where there are multiple smooth signals present in the data. Equivalence is established in a regression setting between MAF and CCA [Canonical Correlation Analysis] even though MAF does not require specific signal knowledge as opposed to CCA. We proceed to derive the theoretical properties of MAF and quantify the SNR advantages of MAF in comparison with PCA [Principal Components Analysis], a commonly used method for linearly combining time series, and compare their statistical sample properties. MAF and PCA are then applied to real and simulated data sets to illustrate MAF's efficacy.

preprint 2015 arXiv

Foundational principles for large scale inference: Illustrations through correlation mining

When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number $n$ of acquired samples (statistical replicates) is far fewer than the number $p$ of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity however has received relatively less attention, especially in the setting when the sample size $n$ is fixed, and the dimension $p$ grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the last regime applies to exa-scale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that is of interest. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.

preprint 2015 arXiv

Graphical Markov models for infinitely many variables

Representing the conditional independences present in a multivariate random vector via graphs has found widespread use in applications, and such representations are popularly known as graphical models or Markov random fields. These models have many useful properties, but their fundamental attractive feature is their ability to reflect conditional independences between blocks of variables through graph separation, a consequence of the equivalence of the pairwise, local and global Markov properties demonstrated by Pearl and Paz (1985). Modern day applications often necessitate working with either an infinite collection of variables (such as in a spatial-temporal field) or approximating a large high-dimensional finite stochastic system with an infinite-dimensional system. However, it is unclear whether the conditional independences present in an infinite-dimensional random vector or stochastic process can still be represented by separation criteria in an infinite graph. In light of the advantages of using graphs as tools to represent stochastic relationships, we undertake in this paper a general study of infinite graphical models. First, we demonstrate that naive extensions of the assumptions required for the finite case results do not yield equivalence of the Markov properties in the infinite-dimensional setting, thus calling for a more in-depth analysis. To this end, we proceed to derive general conditions which do allow representing the conditional independence in an infinite-dimensional random system by means of graphs, and our results render the result of Pearl and Paz as a special case of a more general phenomenon. We conclude by demonstrating the applicability of our theory through concrete examples of infinite-dimensional graphical models.

preprint 2015 arXiv

High dimensional Bayesian inference for Gaussian directed acyclic graph models

In this paper, we consider Gaussian models Markov with respect to an arbitrary DAG. We first construct a family of conjugate priors for the Cholesky parametrization of the covariance matrix of such models. This family has as many shape parameters as the DAG has vertices, and naturally extends the work of Geiger and Heckerman [8]. From these distributions, we derive prior distributions for the covariance and precision parameters of the Gaussian DAG Markov models. Our work thus extends the work of Dawid and Lauritzen [5] and Letac and Massam [16] for Gaussian models Markov with respect to a decomposable graph to arbitrary DAGs. For this reason, we call our distributions DAG-Wishart distributions. An advantage of these distributions is that they possess strong hyper Markov properties and thus allow for explicit estimation of the covariance and precision parameters, regardless of the dimension of the problem. They also allow us to develop methodology for model selection and covariance estimation in the space of DAG-Markov models. We demonstrate via several numerical examples that the proposed method scales well to high dimensions.

preprint 2015 arXiv

Integration and measures on the space of countable labelled graphs

In this paper we develop a rigorous foundation for the study of integration and measures on the space $\mathscr{G}(V)$ of all graphs defined on a countable labelled vertex set $V$. We first study several interrelated $σ$-algebras and a large family of probability measures on graph space. We then focus on a "dyadic" Hamming distance function $\left\| \cdot \right\|_{ψ,2}$, which was very useful in the study of differentiation on $\mathscr{G}(V)$. The function $\left\| \cdot \right\|_{ψ,2}$ is shown to be a Haar measure-preserving bijection from the subset of infinite graphs to the circle (with the Haar/Lebesgue measure), thereby naturally identifying the two spaces. As a consequence, we establish a "change of variables" formula that enables the transfer of the Riemann-Lebesgue theory on $\mathbb{R}$ to graph space $\mathscr{G}(V)$. This also complements previous work in which a theory of Newton-Leibnitz differentiation was transferred from the real line to $\mathscr{G}(V)$ for countable $V$. Finally, we identify the Pontryagin dual of $\mathscr{G}(V)$, and characterize the positive definite functions on $\mathscr{G}(V)$.

preprint 2015 arXiv

Lasso Regression: Estimation and Shrinkage via Limit of Gibbs Sampling

The application of the lasso is espoused in high-dimensional settings where only a small number of the regression coefficients are believed to be nonzero. Moreover, statistical properties of high-dimensional lasso estimators are often proved under the assumption that the correlation between the predictors is bounded. In this vein, coordinatewise methods, the most common means of computing the lasso solution, work well in the presence of low to moderate multicollinearity. The computational speed of coordinatewise algorithms degrades however as sparsity decreases and multicollinearity increases. Motivated by these limitations, we propose the novel "Deterministic Bayesian Lasso" algorithm for computing the lasso solution. This algorithm is developed by considering a limiting version of the Bayesian lasso. The performance of the Deterministic Bayesian Lasso improves as sparsity decreases and multicollinearity increases, and can offer substantial increases in computational speed. A rigorous theoretical analysis demonstrates that (1) the Deterministic Bayesian Lasso algorithm converges to the lasso solution, and (2) it leads to a representation of the lasso estimator which shows how it achieves both $\ell_1$ and $\ell_2$ types of shrinkage simultaneously. Connections to other algorithms are also provided. The benefits of the Deterministic Bayesian Lasso algorithm are then illustrated on simulated and real data.

preprint 2015 arXiv

MCMC-Based Inference in the Era of Big Data: A Fundamental Analysis of the Convergence Complexity of High-Dimensional Chains

Markov chain Monte Carlo (MCMC) lies at the core of modern Bayesian methodology, much of which would be impossible without it. Thus, the convergence properties of MCMCs have received significant attention, and in particular, proving (geometric) ergodicity is of critical interest. Trust in the ability of MCMCs to sample from modern-day high-dimensional posteriors, however, has been limited by a widespread perception that these chains typically experience serious convergence problems. In this paper, we first demonstrate that contemporary methods for obtaining convergence rates have serious limitations when the dimension grows. We then propose a framework for rigorously establishing the convergence behavior of commonly used high-dimensional MCMCs. In particular, we demonstrate theoretically the precise nature and severity of the convergence problems of popular MCMCs when implemented in high dimensions, including phase transitions in the convergence rates in various $n$ and $p$ regimes, and a universality result across an entire spectrum of models. We also show that convergence problems effectively eliminate the apparent safeguard of geometric ergodicity. We then demonstrate theoretical principles by which MCMCs can be constructed and analyzed to yield bounded geometric convergence rates even as the dimension $p$ grows without bound. Additionally, we propose a diagnostic tool for establishing convergence.

preprint 2015 arXiv

Statistical paleoclimate reconstructions via Markov random fields

Understanding centennial scale climate variability requires data sets that are accurate, long, continuous and of broad spatial coverage. Since instrumental measurements are generally only available after 1850, temperature fields must be reconstructed using paleoclimate archives, known as proxies. Various climate field reconstruction (CFR) methods have been proposed to relate past temperature to such proxy networks. In this work, we propose a new CFR method, called GraphEM, based on Gaussian Markov random fields embedded within an EM algorithm. Gaussian Markov random fields provide a natural and flexible framework for modeling high-dimensional spatial fields. At the same time, they provide the parameter reduction necessary for obtaining precise and well-conditioned estimates of the covariance structure, even in the sample-starved setting common in paleoclimate applications. In this paper, we propose and compare the performance of different methods to estimate the graphical structure of climate fields, and demonstrate how the GraphEM algorithm can be used to reconstruct past climate variations. The performance of GraphEM is compared to the widely used CFR method RegEM with regularization via truncated total least squares, using synthetic data. Our results show that GraphEM can yield significant improvements, with uniform gains over space, and far better risk properties. We demonstrate that the spatial structure of temperature fields can be well estimated by graphs where each neighbor is only connected to a few geographically close neighbors, and that the increase in performance is directly related to recovering the underlying sparsity in the covariance of the spatial field. Our work demonstrates how significant improvements can be made in climate reconstruction methods by better modeling the covariance structure of the climate field.

preprint2014arXiv

A convex pseudo-likelihood framework for high dimensional partial correlation estimation with convergence guarantees

Sparse high dimensional graphical model selection is a topic of much interest in modern day statistics. A popular approach is to apply l1-penalties to either (1) parametric likelihoods, or, (2) regularized regression/pseudo-likelihoods, with the latter having the distinct advantage that they do not explicitly assume Gaussianity. As none of the popular methods proposed for solving pseudo-likelihood based objective functions have provable convergence guarantees, it is not clear if corresponding estimators exist or are even computable, or if they actually yield correct partial correlation graphs. This paper proposes a new pseudo-likelihood based graphical model selection method that aims to overcome some of the shortcomings of current methods, but at the same time retain all their respective strengths. In particular, we introduce a novel framework that leads to a convex formulation of the partial covariance regression graph problem, resulting in an objective function comprised of quadratic forms. The objective is then optimized via a coordinate-wise approach. The specific functional form of the objective function facilitates rigorous convergence analysis leading to convergence guarantees; an important property that cannot be established using standard results, when the dimension is larger than the sample size, as is often the case in high dimensional applications. These convergence guarantees ensure that estimators are well-defined under very general conditions, and are always computable. In addition, the approach yields estimators that have good large sample properties and also respect symmetry. Furthermore, application to simulated/real data, timing comparisons and numerical convergence is demonstrated. We also present a novel unifying framework that places all graphical pseudo-likelihood methods as special cases of a more general formulation, leading to important insights.

preprint2014arXiv

G-AMA: Sparse Gaussian graphical model estimation via alternating minimization

Several methods have been recently proposed for estimating sparse Gaussian graphical models using $\ell_{1}$ regularization on the inverse covariance matrix. Despite recent advances, contemporary applications require methods that are even faster in order to handle ill-conditioned high dimensional modern day datasets. In this paper, we propose a new method, G-AMA, to solve the sparse inverse covariance estimation problem using Alternating Minimization Algorithm (AMA), that effectively works as a proximal gradient algorithm on the dual problem. Our approach has several novel advantages over existing methods. First, we demonstrate that G-AMA is faster than the previous best algorithms by many orders of magnitude and is thus an ideal approach for modern high throughput applications. Second, global linear convergence of G-AMA is demonstrated rigorously, underscoring its good theoretical properties. Third, the dual algorithm operates on the covariance matrix, and thus easily facilitates incorporating additional constraints on pairwise/marginal relationships between feature pairs based on domain specific knowledge. Over and above estimating a sparse inverse covariance matrix, we also illustrate how to (1) incorporate constraints on the (bivariate) correlations and, (2) incorporate equality (equisparsity) or linear constraints between individual inverse covariance elements. Fourth, we also show that G-AMA is better adept at handling extremely ill-conditioned problems, as is often the case with real data. The methodology is demonstrated on both simulated and real datasets to illustrate its superior performance over recently proposed methods.

preprint2014arXiv

Loewner positive entrywise functions, and classification of measurable solutions of Cauchy's functional equations

Entrywise functions preserving Loewner positivity have been studied by many authors, most notably Schoenberg and Rudin. Following their work, it is known that functions preserving positivity when applied entrywise to positive semidefinite matrices of all dimensions are necessarily analytic with nonnegative Taylor coefficients. When the dimension is fixed, it has been shown by Vasudeva and Horn that such functions are automatically continuous and sufficiently differentiable. A natural refinement of the aforementioned problem consists of characterizing functions preserving positivity under rank constraints. In this paper, we begin this study by characterizing entrywise functions which preserve the cone of positive semidefinite real matrices of rank $1$ with entries in a general interval. Classifying such functions is intimately connected to the classical problem of solving Cauchy's functional equations, which have non-measurable solutions. We demonstrate that under mild local measurability assumptions, such functions are automatically smooth and can be completely characterized. We then extend our results by classifying functions preserving positivity on rank $1$ Hermitian complex matrices.

preprint2014arXiv

On fractional Hadamard powers of positive block matrices

Entrywise powers of matrices have been well-studied in the literature, and have recently received renewed attention in the regularization of high-dimensional correlation matrices. In this paper, we study powers of positive semidefinite block matrices $(H_{st})_{s,t=1}^n$ with complex entries. We first characterize the powers $α\in\mathbb{R}$ such that the blockwise power map $(H_{st}) \mapsto (H_{st}^α)$ preserves Loewner positivity. The characterization is obtained by exploiting connections with the theory of matrix monotone functions developed by Loewner. Second, we revisit previous work by Choudhury [Proc. AMS 108] who had provided a lower bound on $α$ for preserving positivity when the blocks $H_{st}$ pairwise commute. We completely settle this problem by characterizing the full set of powers preserving positivity in this setting. Our characterizations generalize previous work by FitzGerald-Horn, Bhatia-Elsner, and Hiai from scalars to arbitrary block size, and in particular, generalize the Schur Product Theorem. Finally, a natural and unifying framework for studying the case of diagonalizable blocks consists of replacing real powers by general characters of the complex plane. We thus classify such characters, and generalize our results to this more general setting. In the course of our work, given $β\in\mathbb{Z}$, we provide lower and upper bounds for the threshold power $α>0$ above which the complex characters $re^{iθ}\mapsto r^αe^{iβθ}$ preserve positivity when applied entrywise to positive semidefinite matrices. In particular, we completely resolve the $n=3$ case of a question raised in 2001 by Xingzhi Zhan. As an application, we extend previous work by de Pillis [Duke Math. J. 36] by classifying the characters $K$ of the complex plane for which the map $(H_{st})_{s,t=1}^n \mapsto (K({\rm tr}(H_{st})))_{s,t=1}^n$ preserves positivity.

preprint2014arXiv

Optimization Methods for Sparse Pseudo-Likelihood Graphical Model Selection

Sparse high dimensional graphical model selection is a popular topic in contemporary machine learning. To this end, various useful approaches have been proposed in the context of $\ell_1$-penalized estimation in the Gaussian framework. Though many of these inverse covariance estimation approaches are demonstrably scalable and have leveraged recent advances in convex optimization, they still depend on the Gaussian functional form. To address this gap, a convex pseudo-likelihood based partial correlation graph estimation method (CONCORD) has been recently proposed. This method uses coordinate-wise minimization of a regression based pseudo-likelihood, and has been shown to have robust model selection properties in comparison with the Gaussian approach. In direct contrast to the parallel work in the Gaussian setting however, this new convex pseudo-likelihood framework has not leveraged the extensive array of methods that have been proposed in the machine learning literature for convex optimization. In this paper, we address this crucial gap by proposing two proximal gradient methods (CONCORD-ISTA and CONCORD-FISTA) for performing $\ell_1$-regularized inverse covariance matrix estimation in the pseudo-likelihood framework. We present timing comparisons with coordinate-wise minimization and demonstrate that our approach yields tremendous payoffs for $\ell_1$-penalized partial correlation graph estimation outside the Gaussian setting, thus yielding the fastest and most scalable approach for such problems. We undertake a theoretical analysis of our approach and rigorously demonstrate convergence, and also derive rates thereof.

preprint2014arXiv

The Letac-Massam conjecture and existence of high dimensional Bayes estimators for Graphical Models

In recent years, a variety of useful extensions of the Wishart have been proposed in the literature for the purposes of studying Markov random fields/graphical models. In particular, generalizations of the Wishart, referred to as Type I and Type II Wishart distributions, have been introduced by Letac and Massam (Annals of Statistics 2006) and play important roles in both frequentist and Bayesian inference for Gaussian graphical models. These distributions have been especially useful in high-dimensional settings due to the flexibility offered by their multiple shape parameters. In this paper we resolve a long-standing conjecture of Letac and Massam (LM) concerning the domains of the multi-parameters of graphical Wishart type distributions. This conjecture, posed in Annals of Statistics, also relates fundamentally to the existence of Bayes estimators corresponding to these high dimensional priors. To achieve our goal, we first develop novel theory in the context of probabilistic analysis of graphical models. Using these tools, and a recently introduced class of Wishart distributions for directed acyclic graph (DAG) models, we proceed to give counterexamples to the LM conjecture, thus completely resolving the problem. Our analysis also proceeds to give useful insights on graphical Wishart distributions with implications for Bayesian inference for such models.

preprint2013arXiv

A Methodology for Robust Multiproxy Paleoclimate Reconstructions and Modeling of Temperature Conditional Quantiles

Great strides have been made in the field of reconstructing past temperatures based on models relating temperature to temperature-sensitive paleoclimate proxies. One of the goals of such reconstructions is to assess if current climate is anomalous in a millennial context. These regression based approaches model the conditional mean of the temperature distribution as a function of paleoclimate proxies (or vice versa). Some of the recent focus in the area has considered methods which help reduce the uncertainty inherent in such statistical paleoclimate reconstructions, with the ultimate goal of improving the confidence that can be attached to such endeavors. A second important scientific focus in the subject area is the area of forward models for proxies, the goal of which is to understand the way paleoclimate proxies are driven by temperature and other environmental variables. In this paper we introduce novel statistical methodology for (1) quantile regression with autoregressive residual structure, (2) estimation of corresponding model parameters, (3) development of a rigorous framework for specifying uncertainty estimates of quantities of interest, yielding (4) statistical byproducts that address the two scientific foci discussed above. Our statistical methodology demonstrably produces a more robust reconstruction than is possible by using conditional-mean-fitting methods. Our reconstruction shares some of the common features of past reconstructions, but also gains useful insights. More importantly, we are able to demonstrate a significantly smaller uncertainty than that from previous regression methods. In addition, the quantile regression component allows us to model, in a more complete and flexible way than least squares, the conditional distribution of temperature given proxies. This relationship can be used to inform forward models relating how proxies are driven by temperature.

preprint2013arXiv

Duality in Graphical Models

Graphical models have proven to be powerful tools for representing high-dimensional systems of random variables. One example of such a model is the undirected graph, in which lack of an edge represents conditional independence between two random variables given the rest. Another example is the bidirected graph, in which absence of edges encodes pairwise marginal independence. Both of these classes of graphical models have been extensively studied, and while they are considered to be dual to one another, except in a few instances this duality has not been thoroughly investigated. In this paper, we demonstrate how duality between undirected and bidirected models can be used to transport results for one class of graphical models to the dual model in a transparent manner. We proceed to apply this technique to extend previously existing results as well as to prove new ones, in three important domains. First, we discuss the pairwise and global Markov properties for undirected and bidirected models, using the pseudographoid and reverse-pseudographoid rules which are weaker conditions than the typically used intersection and composition rules. Second, we investigate these pseudographoid and reverse pseudographoid rules in the context of probability distributions, using the concept of duality in the process. Duality allows us to quickly relate them to the more familiar intersection and composition properties. Third and finally, we apply the dualization method to understand the implications of faithfulness, which in turn leads to a more general form of an existing result.

preprint2013arXiv

Functions preserving positive definiteness for sparse matrices

We consider the problem of characterizing entrywise functions that preserve the cone of positive definite matrices when applied to every off-diagonal element. Our results extend theorems of Schoenberg [Duke Math. J. 9], Rudin [Duke Math. J. 26], Christensen and Ressel [Trans. Amer. Math. Soc., 243], and others, where similar problems were studied when the function is applied to all elements, including the diagonal ones. It is shown that functions that are guaranteed to preserve positive definiteness cannot at the same time induce sparsity, i.e., set elements to zero. These results have important implications for the regularization of positive definite matrices, where functions are often applied to only the off-diagonal elements to obtain sparse matrices with better properties (e.g., Markov random field/graphical model structure, better condition number). As a particular case, it is shown that soft-thresholding, a commonly used operation in modern high-dimensional probability and statistics, is not guaranteed to maintain positive definiteness, even if the original matrix is sparse. This result has a deep connection to graphs, and in particular, to the class of trees. We then proceed to fully characterize functions which do preserve positive definiteness. This characterization is in terms of absolutely monotonic functions and turns out to be quite different from the case when the function is also applied to diagonal elements. We conclude by giving bounds on the condition number of a matrix which guarantee that the regularized matrix is positive definite.

preprint2013arXiv

Predictive Correlation Screening: Application to Two-stage Predictor Design in High Dimension

We introduce a new approach to variable selection, called Predictive Correlation Screening, for predictor design. Predictive Correlation Screening (PCS) implements false positive control on the selected variables, is well suited to small sample sizes, and is scalable to high dimensions. We establish asymptotic bounds for Familywise Error Rate (FWER), and resultant mean square error of a linear predictor on the selected variables. We apply Predictive Correlation Screening to the following two-stage predictor design problem. An experimenter wants to learn a multivariate predictor of gene expressions based on successive biological samples assayed on mRNA arrays. She assays the whole genome on a few samples and from these assays she selects a small number of variables using Predictive Correlation Screening. To reduce assay cost, she subsequently assays only the selected variables on the remaining samples, to learn the predictor coefficients. We show superiority of Predictive Correlation Screening relative to LASSO and correlation learning (sometimes popularly referred to in the literature as marginal regression or simple thresholding) in terms of performance and computational complexity.

preprint2013arXiv

Successive normalization of rectangular arrays

Standard statistical techniques often require transforming data to have mean $0$ and standard deviation $1$. Typically, this process of "standardization" or "normalization" is applied across subjects when each subject produces a single number. High throughput genomic and financial data often come as rectangular arrays where each coordinate in one direction concerns subjects who might have different status (case or control, say), and each coordinate in the other designates "outcome" for a specific feature, for example, "gene," "polymorphic site" or some aspect of financial profile. It may happen, when analyzing data that arrive as a rectangular array, that one requires BOTH the subjects and the features to be "on the same footing." Thus there may be a need to standardize across rows and columns of the rectangular matrix. There arises the question as to how to achieve this double normalization. We propose and investigate the convergence of what seems to us a natural approach to successive normalization which we learned from our colleague Bradley Efron. We also study the implementation of the method on simulated data and also on data that arose from scientific experimentation.

preprint2013arXiv

The critical exponent conjecture for powers of doubly nonnegative matrices

Doubly non-negative matrices arise naturally in many settings including Markov random fields (positively banded graphical models) and in the convergence analysis of Markov chains. In this short note, we settle a recent conjecture by C.R. Johnson et al. [Linear Algebra Appl. 435 (2011)] by proving that the critical exponent beyond which all continuous conventional powers of $n$-by-$n$ doubly nonnegative matrices are doubly nonnegative is exactly $n-2$. We show that the conjecture follows immediately by applying a general characterization from the literature. We prove a stronger form of the conjecture by classifying all powers preserving doubly nonnegative matrices, and proceed to generalize the conjecture for broad classes of functions. We also provide different approaches for settling the original conjecture.

preprint2012arXiv

A note on the lack of symmetry in the graphical lasso

The graphical lasso (glasso) is a widely-used fast algorithm for estimating sparse inverse covariance matrices. The glasso solves an L1 penalized maximum likelihood problem and is available as an R library on CRAN. The outputs from the glasso, a regularized covariance matrix estimate and a sparse inverse covariance matrix estimate, not only identify a graphical model but can also serve as intermediate inputs into multivariate procedures such as PCA, LDA, MANOVA, and others. The glasso indeed produces a covariance matrix estimate which solves the L1 penalized optimization problem in a dual sense; however, the method for producing the inverse covariance matrix estimator after this optimization is inexact and may produce asymmetric estimates. This problem is exacerbated when the amount of L1 regularization that is applied is small, which in turn is more likely to occur if the true underlying inverse covariance matrix is not sparse. The lack of symmetry can potentially have consequences. First, it implies that the covariance and inverse covariance estimates are not numerical inverses of one another, and second, asymmetry can possibly lead to negative or complex eigenvalues, rendering many multivariate procedures which may depend on the inverse covariance estimator unusable. We demonstrate this problem, explain its causes, and propose possible remedies.

preprint2012arXiv

Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

The L1-regularized maximum likelihood estimation problem has recently become a topic of great interest within the machine learning, statistics, and optimization communities as a method for producing sparse inverse covariance estimators. In this paper, a proximal gradient method (G-ISTA) for performing L1-regularized covariance matrix estimation is presented. Although numerous algorithms have been proposed for solving this problem, this simple proximal gradient method is found to have attractive theoretical and numerical properties. G-ISTA has a linear rate of convergence, resulting in an $O(\log \varepsilon)$ iteration complexity to reach a tolerance of $\varepsilon$. This paper gives eigenvalue bounds for the G-ISTA iterates, providing a closed-form linear convergence rate. The rate is shown to be closely related to the condition number of the optimal point. Numerical convergence results and timing comparisons for the proposed method are presented. G-ISTA is shown to perform very well, especially when the optimal point is well-conditioned.
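
The proximal gradient idea in this abstract can be sketched as follows. This is an illustrative sketch only, not the authors' G-ISTA implementation: it assumes the objective $-\log\det\Theta + \mathrm{tr}(S\Theta) + \rho\|\Theta\|_1$ with the penalty applied to every entry, and it uses a simple halving backtracking rule in place of the paper's step-size selection. The function names and default parameters are hypothetical.

```python
import numpy as np

def soft_threshold(M, t):
    # Entrywise proximal operator of t * ||.||_1.
    return np.sign(M) * np.maximum(np.abs(M) - t, 0.0)

def smooth_part(theta, S):
    # Smooth part of the objective: -log det(theta) + tr(S theta).
    _, logdet = np.linalg.slogdet(theta)
    return -logdet + np.trace(S @ theta)

def is_pd(M):
    # Positive definiteness check via Cholesky factorization.
    try:
        np.linalg.cholesky(M)
        return True
    except np.linalg.LinAlgError:
        return False

def prox_grad_inverse_covariance(S, rho, step=1.0, max_iter=500, tol=1e-8):
    # Assumed starting point: a positive definite diagonal matrix.
    theta = np.diag(1.0 / (np.diag(S) + rho))
    for _ in range(max_iter):
        grad = S - np.linalg.inv(theta)       # gradient of the smooth part
        f_theta = smooth_part(theta, S)
        t = step
        while True:
            cand = soft_threshold(theta - t * grad, t * rho)
            diff = cand - theta
            # Standard sufficient-decrease bound for proximal gradient steps.
            bound = f_theta + np.sum(grad * diff) + np.sum(diff ** 2) / (2 * t)
            if is_pd(cand) and smooth_part(cand, S) <= bound + 1e-12:
                break
            t *= 0.5                          # halve the step and retry
            if t < 1e-12:
                return theta                  # no acceptable step; stop here
        if np.max(np.abs(cand - theta)) < tol:
            return cand
        theta = cand
    return theta

S = np.array([[1.0, 0.5], [0.5, 1.0]])
theta_hat = prox_grad_inverse_covariance(S, rho=0.1)
```

Each candidate is accepted only if it stays positive definite, which keeps the log-determinant well defined throughout the iteration.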

preprint2012arXiv

Positive definite completion problems for directed acyclic graphs

A positive definite completion problem pertains to determining whether the unspecified positions of a partial (or incomplete) matrix can be completed in a desired subclass of positive definite matrices. In this paper we study an important and new class of positive definite completion problems where the desired subclasses are the spaces of covariance and inverse-covariance matrices of probabilistic models corresponding to directed acyclic graph models (also known as Bayesian networks). We provide fast procedures that determine whether a partial matrix can be completed in either of these spaces and thereafter proceed to construct the completed matrices. We prove an analog of the positive definite completion result for undirected graphs in the context of directed acyclic graphs, and thus proceed to characterize the class of DAGs which can always be completed. We also proceed to give closed form expressions for the inverse and the determinant of a completed matrix as a function of only the elements of the corresponding partial matrix.

preprint2012arXiv

Sparse Matrix Decompositions and Graph Characterizations

The question of when zeros (i.e., sparsity) in a positive definite matrix $A$ are preserved in its Cholesky decomposition, and vice versa, was addressed by Paulsen et al. in the Journal of Functional Analysis (85, pp. 151-178). In particular, they prove that for the pattern of zeros in $A$ to be retained in the Cholesky decomposition of $A$, the pattern of zeros in $A$ has to necessarily correspond to a chordal (or decomposable) graph associated with a specific type of vertex ordering. This result therefore yields a characterization of chordal graphs in terms of sparse positive definite matrices. It has also proved to be extremely useful in probabilistic and statistical analysis of Markov random fields where zeros in positive definite correlation matrices are intimately related to the notion of stochastic independence. Now, consider a positive definite matrix $A$ and its Cholesky decomposition given by $A = LDL^T$, where $L$ is lower triangular with unit diagonal entries, and $D$ a diagonal matrix with positive entries. In this paper, we prove that a necessary and sufficient condition for zeros (i.e., sparsity) in a positive definite matrix $A$ to be preserved in its associated Cholesky matrix $L$, and in addition also preserved in the inverse of the Cholesky matrix $L^{-1}$, is that the pattern of zeros corresponds to a co-chordal or homogeneous graph associated with a specific type of vertex ordering. We proceed to provide a second characterization of this class of graphs in terms of determinants of submatrices that correspond to cliques in the graph. These results add to the growing body of literature in the field of sparse matrix decompositions, and also prove to be critical ingredients in the probabilistic analysis of an important class of Markov random fields.
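
A small numerical illustration of the distinction drawn in this abstract, under assumptions of my own choosing: the matrix below is a hypothetical 4-by-4 tridiagonal example (a path-graph zero pattern, which is chordal), and `unit_ldl` is an illustrative helper built on NumPy's Cholesky routine. In this example the zeros of $A$ survive in $L$ but not in $L^{-1}$.

```python
import numpy as np

def unit_ldl(A):
    # Derive A = L D L^T from the standard Cholesky factor C (A = C C^T).
    C = np.linalg.cholesky(A)
    d = np.diag(C)
    L = C / d                 # divide column j by C[j, j] to get a unit diagonal
    D = np.diag(d ** 2)
    return L, D

# Hypothetical example: tridiagonal positive definite matrix.
A = np.array([
    [ 2.0, -1.0,  0.0,  0.0],
    [-1.0,  2.0, -1.0,  0.0],
    [ 0.0, -1.0,  2.0, -1.0],
    [ 0.0,  0.0, -1.0,  2.0],
])

L, D = unit_ldl(A)
L_inv = np.linalg.inv(L)

# Boolean zero patterns of the strictly lower triangles, for comparison.
lower = np.tril_indices(4, k=-1)
zeros_in_A = np.isclose(A[lower], 0.0)
zeros_in_L = np.isclose(L[lower], 0.0)
zeros_in_L_inv = np.isclose(L_inv[lower], 0.0)
```

Comparing `zeros_in_A`, `zeros_in_L` and `zeros_in_L_inv` shows which zero positions are retained by each factor for this particular ordering.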

preprint2012arXiv

Successive Standardization of Rectangular Arrays

In this note we illustrate and develop further with mathematics and examples, the work on successive standardization (or normalization) that was studied earlier by the same authors in Olshen and Rajaratnam (2010) and Olshen and Rajaratnam (2011). Thus, we deal with successive iterations applied to rectangular arrays of numbers, where to avoid technical difficulties an array has at least three rows and at least three columns. Without loss, an iteration begins with operations on columns: first subtract the mean of each column; then divide by its standard deviation. The iteration continues with the same two operations done successively for rows. These four operations applied in sequence complete one iteration. One then iterates again, and again, and again. In Olshen and Rajaratnam (2010) it was argued that if arrays are made up of real numbers, then the set for which convergence of these successive iterations fails has Lebesgue measure 0. The limiting array has row and column means 0, row and column standard deviations 1. A basic result on convergence given in Olshen and Rajaratnam (2010) is true, though the argument in Olshen and Rajaratnam (2010) is faulty. The result is stated in the form of a theorem here, and the argument for the theorem is correct. Moreover, many graphics given in Olshen and Rajaratnam (2010) suggest that but for a set of entries of any array with Lebesgue measure 0, convergence is very rapid, eventually exponentially fast in the number of iterations. Because we learned this set of rules from Bradley Efron, we call it "Efron's algorithm". More importantly, the rapidity of convergence is illustrated by numerical examples.
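
The four-step iteration described above (center columns, scale columns, center rows, scale rows) can be sketched as follows. This is a minimal sketch, not the authors' code: the stopping rule, the iteration cap and the use of the population standard deviation (NumPy's default `ddof=0`) are assumptions, and the function name is hypothetical.

```python
import numpy as np

def successive_normalization(X, max_iter=5000, tol=1e-10):
    # Work on a float copy so the caller's array is left untouched.
    X = np.asarray(X, dtype=float).copy()
    for _ in range(max_iter):
        prev = X.copy()
        # Column step: subtract each column mean, divide by its std.
        X = (X - X.mean(axis=0)) / X.std(axis=0)
        # Row step: subtract each row mean, divide by its std.
        X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
        # Assumed stopping rule: largest entrywise change below tol.
        if np.max(np.abs(X - prev)) < tol:
            break
    return X

rng = np.random.default_rng(0)
Z = successive_normalization(rng.normal(size=(5, 6)))
```

After the loop stops, the rows of `Z` are standardized exactly by the last step, and the columns are standardized up to the tolerance.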

preprint2011arXiv

Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?

Discussion of "A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?" by B.B. McShane and A.J. Wyner [arXiv:1104.4002]

preprint2011arXiv

Hub discovery in partial correlation graphical models

This paper treats the problem of screening a p-variate sample for strongly and multiply connected vertices in the partial correlation graph associated with the partial correlation matrix of the sample. This problem, called hub screening, is important in many applications ranging from network security to computational biology to finance to social networks. In the area of network security, a node that becomes a hub of high correlation with neighboring nodes might signal anomalous activity such as a coordinated flooding attack. In the area of computational biology the set of hubs of a gene expression correlation graph can serve as potential targets for drug treatment to block a pathway or modulate host response. In the area of finance a hub might indicate a vulnerable financial instrument or sector whose collapse might have major repercussions on the market. In the area of social networks a hub of observed interactions between criminal suspects could be an influential ringleader. The techniques and theory presented in this paper permit scalable and reliable screening for such hubs. This paper extends our previous work on correlation screening [arXiv:1102.1204] to the more challenging problem of partial correlation screening for variables with a high degree of connectivity. In particular we consider 1) extension to the more difficult problem of screening for partial correlations exceeding a specified magnitude; 2) extension to screening variables whose vertex degree in the associated partial correlation graph, often called the concentration graph, exceeds a specified degree.

preprint2011arXiv

Large Scale Correlation Screening

This paper treats the problem of screening for variables with high correlations in high dimensional data in which there can be many fewer samples than variables. We focus on threshold-based correlation screening methods for three related applications: screening for variables with large correlations within a single treatment (autocorrelation screening); screening for variables with large cross-correlations over two treatments (cross-correlation screening); screening for variables that have persistently large auto-correlations over two treatments (persistent-correlation screening). The novelty of correlation screening is that it identifies a smaller number of variables which are highly correlated with others, as compared to identifying a number of correlation parameters. Correlation screening suffers from a phase transition phenomenon: as the correlation threshold decreases the number of discoveries increases abruptly. We obtain asymptotic expressions for the mean number of discoveries and the phase transition thresholds as a function of the number of samples, the number of variables, and the joint sample distribution. We also show that under a weak dependency condition the number of discoveries is dominated by a Poisson random variable giving an asymptotic expression for the false positive rate. The correlation screening approach bears tremendous dividends in terms of the type and strength of the asymptotic results that can be obtained. It also overcomes some of the major hurdles faced by existing methods in the literature as correlation screening is naturally scalable to high dimension. Numerical results strongly validate the theory that is presented in this paper. We illustrate the application of the correlation screening methodology on a large scale gene-expression dataset, revealing a few influential variables that exhibit a significant amount of correlation over multiple treatments.
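
The threshold-based screening within a single treatment can be sketched as follows. This is a sketch under assumptions: a variable counts as a discovery when its absolute sample correlation with at least one other variable reaches the threshold, and the function name and toy data are hypothetical. The paper's asymptotic theory for choosing the threshold is not reproduced.

```python
import numpy as np

def correlation_screen(X, threshold):
    # X has shape (n_samples, p_variables); R is the p x p sample correlation.
    R = np.corrcoef(X, rowvar=False)
    np.fill_diagonal(R, 0.0)               # ignore self-correlations
    exceed = np.abs(R) >= threshold
    degree = exceed.sum(axis=1)            # number of strong partners per variable
    discoveries = np.flatnonzero(degree > 0)
    return discoveries, degree

# Toy data with n = 5 samples and p = 4 variables (fewer samples than a
# realistic p would have); column 1 is an exact linear function of column 0.
x0 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
X = np.column_stack([
    x0,
    2.0 * x0 + 1.0,
    np.array([1.0, -1.0, 1.0, -1.0, 1.0]),
    np.array([2.0, -1.0, -2.0, -1.0, 2.0]),
])
discoveries, degree = correlation_screen(X, threshold=0.9)
```

Lowering `threshold` increases the number of discoveries, which is the quantity whose abrupt growth the abstract describes as a phase transition.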

preprint2011arXiv

Retaining positive definiteness in thresholded matrices

Positive definite (p.d.) matrices arise naturally in many areas within mathematics and also feature extensively in scientific applications. In modern high-dimensional applications, a common approach to finding sparse positive definite matrices is to threshold their small off-diagonal elements. This thresholding, sometimes referred to as hard-thresholding, sets small elements to zero. Thresholding has the attractive property that the resulting matrices are sparse, and are thus easier to interpret and work with. In many applications, it is often required, and thus implicitly assumed, that thresholded matrices retain positive definiteness. In this paper we formally investigate the algebraic properties of p.d. matrices which are thresholded. We demonstrate that for positive definiteness to be preserved, the pattern of elements to be set to zero has to necessarily correspond to a graph which is a union of disconnected complete components. This result rigorously demonstrates that, except in special cases, positive definiteness can be easily lost. We then proceed to demonstrate that the class of diagonally dominant matrices is not maximal in terms of retaining positive definiteness when thresholded. Consequently, we derive characterizations of matrices which retain positive definiteness when thresholded with respect to important classes of graphs. In particular, we demonstrate that retaining positive definiteness upon thresholding is governed by complex algebraic conditions.
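
A small numerical example of positive definiteness being lost under hard-thresholding. The 3-by-3 matrix and the threshold are hypothetical values chosen for illustration, and the helper name is an assumption. Zeroing the single small off-diagonal entry leaves a path-graph pattern, which is not a union of disconnected complete components.

```python
import numpy as np

def hard_threshold_offdiag(A, eps):
    # Set entries with magnitude below eps to zero, keeping the diagonal as is.
    T = np.where(np.abs(A) >= eps, A, 0.0)
    np.fill_diagonal(T, np.diag(A))
    return T

# Hypothetical positive definite matrix with one small off-diagonal entry.
A = np.array([
    [1.00, 0.75, 0.30],
    [0.75, 1.00, 0.75],
    [0.30, 0.75, 1.00],
])
T = hard_threshold_offdiag(A, eps=0.5)

min_eig_before = np.linalg.eigvalsh(A).min()   # positive: A is p.d.
min_eig_after = np.linalg.eigvalsh(T).min()    # negative: T is not p.d.
```

Here the thresholded matrix has smallest eigenvalue $1 - 0.75\sqrt{2} < 0$, so it is indefinite even though `A` itself is positive definite.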

preprint 2011 arXiv

Wishart distributions for decomposable covariance graph models

Gaussian covariance graph models encode marginal independence among the components of a multivariate random vector by means of a graph $G$. These models are distinctly different from the traditional concentration graph models (often also referred to as Gaussian graphical models or covariance selection models) since the zeros in the parameter are now reflected in the covariance matrix $Σ$, as compared to the concentration matrix $Ω=Σ^{-1}$. The parameter space of interest for covariance graph models is the cone $P_G$ of positive definite matrices with fixed zeros corresponding to the missing edges of $G$. As in Letac and Massam [Ann. Statist. 35 (2007) 1278--1323], we consider the case where $G$ is decomposable. In this paper, we construct on the cone $P_G$ a family of Wishart distributions which serve a similar purpose in the covariance graph setting as those constructed by Letac and Massam [Ann. Statist. 35 (2007) 1278--1323] and Dawid and Lauritzen [Ann. Statist. 21 (1993) 1272--1317] do in the concentration graph setting. We proceed to undertake a rigorous study of these "covariance" Wishart distributions and derive several deep and useful properties of this class.

42 published items

A unified framework for correlation mining in ultra-high dimension

Probability inequalities and tail estimates for metric semigroups

A convex framework for high-dimensional sparse Cholesky based covariance estimation

Generalized Pseudolikelihood Methods for Inverse Covariance Estimation

Model-free consistency of graph partitioning

Preserving positivity for matrices with sparsity constraints

Towards a sparse, scalable, and stably positive definite (inverse) covariance estimator

Two-stage Sampling, Prediction and Adaptive Regression via Correlation Screening (SPARCS)

Bayesian inference for Gaussian graphical models beyond decomposable graphs

Convergence of cyclic coordinatewise l1 minimization

Critical exponents of graphs

Differential Calculus on Graphon Space

Extracting Common Time Trends from Concurrent Time Series: Maximum Autocorrelation Factors with Application to Tree Ring Time Series Data

Foundational principles for large scale inference: Illustrations through correlation mining

Graphical Markov models for infinitely many variables

High dimensional Bayesian inference for Gaussian directed acyclic graph models

Integration and measures on the space of countable labelled graphs

Lasso Regression: Estimation and Shrinkage via Limit of Gibbs Sampling

MCMC-Based Inference in the Era of Big Data: A Fundamental Analysis of the Convergence Complexity of High-Dimensional Chains

Statistical paleoclimate reconstructions via Markov random fields

A convex pseudo-likelihood framework for high dimensional partial correlation estimation with convergence guarantees

G-AMA: Sparse Gaussian graphical model estimation via alternating minimization

Loewner positive entrywise functions, and classification of measurable solutions of Cauchy's functional equations

On fractional Hadamard powers of positive block matrices

Optimization Methods for Sparse Pseudo-Likelihood Graphical Model Selection

The Letac-Massam conjecture and existence of high dimensional Bayes estimators for Graphical Models

A Methodology for Robust Multiproxy Paleoclimate Reconstructions and Modeling of Temperature Conditional Quantiles

Duality in Graphical Models

Functions preserving positive definiteness for sparse matrices

Predictive Correlation Screening: Application to Two-stage Predictor Design in High Dimension

Successive normalization of rectangular arrays

The critical exponent conjecture for powers of doubly nonnegative matrices

A note on the lack of symmetry in the graphical lasso

Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

Positive definite completion problems for directed acyclic graphs

Sparse Matrix Decompositions and Graph Characterizations

Successive Standardization of Rectangular Arrays

Discussion of: A statistical analysis of multiple temperature proxies: Are reconstructions of surface temperatures over the last 1000 years reliable?

Hub discovery in partial correlation graphical models

Large Scale Correlation Screening

Retaining positive definiteness in thresholded matrices

Wishart distributions for decomposable covariance graph models