Source author record

Alfred O. Hero III

Alfred O. Hero III appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Catalog footprint

What is connected

63 works

31 topics

4 close collaborators

preprint, 2022, arXiv

Orthonormal Sketches for Secure Coded Regression

In this work, we propose a method for speeding up linear regression distributively, while ensuring security. We leverage randomized sketching techniques, and improve straggler resilience in asynchronous systems. Specifically, we apply a random orthonormal matrix and then subsample in blocks, to simultaneously secure the information and reduce the dimension of the regression problem. In our setup, the transformation corresponds to an encoded encryption in an approximate gradient coding scheme, and the subsampling corresponds to the responses of the non-straggling workers in a centralized coded computing network. We focus on the special case of the Subsampled Randomized Hadamard Transform, which we generalize to block sampling, and discuss how it can be used to secure the data. We illustrate the performance through numerical experiments.
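As a rough illustration of the transform described in this abstract, the sketch below applies random signs and a normalized Hadamard transform to the rows of a data matrix, then keeps whole blocks of rows. It is not the authors' implementation: the function names, the sampling of blocks without replacement, and the requirements that the row count is a power of two and divisible by the block size are all assumptions made for the example.

```python
import math
import random

def fwht(vec):
    """Normalized fast Walsh-Hadamard transform; len(vec) must be a power of two."""
    out = list(vec)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # makes the transform orthonormal
    return [v * scale for v in out]

def block_srht(rows, block_size, num_blocks, rng):
    """Random sign flips, Hadamard mixing, then keep `num_blocks` whole blocks of rows.

    Hypothetical helper: assumes len(rows) is a power of two and a multiple of block_size.
    Returns the fully mixed matrix and the subsampled, rescaled sketch.
    """
    n, d = len(rows), len(rows[0])
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    # Transform each column of the sign-flipped matrix.
    cols = [fwht([signs[i] * rows[i][j] for i in range(n)]) for j in range(d)]
    mixed = [[cols[j][i] for j in range(d)] for i in range(n)]
    total_blocks = n // block_size
    kept = sorted(rng.sample(range(total_blocks), num_blocks))
    scale = math.sqrt(total_blocks / num_blocks)  # keeps the Gram matrix unbiased
    sketch = [[scale * v for v in mixed[i]]
              for b in kept
              for i in range(b * block_size, (b + 1) * block_size)]
    return mixed, sketch
```

In this toy setting each kept block plays the role of one responding worker's share of the mixed data, and the blocks that are dropped stand in for stragglers.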

preprint, 2022, arXiv

SOLBP: Second-Order Loopy Belief Propagation for Inference in Uncertain Bayesian Networks

In second-order uncertain Bayesian networks, the conditional probabilities are only known within distributions, i.e., probabilities over probabilities. The delta-method has been applied to extend exact first-order inference methods to propagate both means and variances through sum-product networks derived from Bayesian networks, thereby characterizing epistemic uncertainty, or the uncertainty in the model itself. Alternatively, second-order belief propagation has been demonstrated for polytrees but not for general directed acyclic graph structures. In this work, we extend Loopy Belief Propagation to the setting of second-order Bayesian networks, giving rise to Second-Order Loopy Belief Propagation (SOLBP). For second-order Bayesian networks, SOLBP generates inferences consistent with those generated by sum-product networks, while being more computationally efficient and scalable.

preprint, 2022, arXiv

Straggler Robust Distributed Matrix Inverse Approximation

Inverting large full-rank matrices is a cumbersome operation that appears in numerical analysis, linear algebra, optimization, machine learning and engineering algorithms. It raises both numerical stability and complexity issues, and has a high expected time to compute. We address the latter issue by proposing an algorithm which uses a black-box least squares optimization solver as a subroutine to give an estimate of the inverse (and pseudoinverse) of real nonsingular matrices by estimating its columns. This also gives it the flexibility to be performed in a distributed manner, so the estimate can be obtained a lot faster and can be made robust to stragglers. Furthermore, we assume a centralized network with no message passing between the computing nodes, and do not require a matrix factorization (e.g. LU, SVD or QR decomposition) beforehand.
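The column-by-column idea in this abstract can be sketched as follows: column i of the inverse is the least squares solution of A x = e_i, so the columns are independent tasks that could be handed to different workers. The solver below (conjugate gradient on the normal equations) is only a stand-in for the "black-box" solver, chosen to keep the example dependency-free; the function names are hypothetical and nothing here is taken from the paper's code.

```python
def matvec(A, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def least_squares(A, b, iters=50, tol=1e-12):
    """Stand-in black-box solver: conjugate gradient on A^T A x = A^T b."""
    At = transpose(A)
    x = [0.0] * len(A[0])
    r = matvec(At, b)            # residual of the normal equations at x = 0
    p = list(r)
    rs = sum(v * v for v in r)
    for _ in range(iters):
        if rs < tol:
            break
        Ap = matvec(At, matvec(A, p))
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(v * v for v in r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

def approximate_inverse(A):
    """Estimate the inverse one column at a time; each column is an independent task."""
    n = len(A)
    columns = []
    for i in range(n):
        e_i = [1.0 if k == i else 0.0 for k in range(n)]
        columns.append(least_squares(A, e_i))
    return transpose(columns)

inverse = approximate_inverse([[4.0, 1.0], [2.0, 3.0]])
```

Because no column depends on another, a coordinator could assign the loop iterations to separate nodes and collect the columns as they arrive.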

preprint, 2022, arXiv

Uncertain Bayesian Networks: Learning from Incomplete Data

When the historical data are limited, the conditional probabilities associated with the nodes of Bayesian networks are uncertain and can be empirically estimated. Second order estimation methods provide a framework for both estimating the probabilities and quantifying the uncertainty in these estimates. We refer to these cases as uncertain or second-order Bayesian networks. When such data are complete, i.e., all variable values are observed for each instantiation, the conditional probabilities are known to be Dirichlet-distributed. This paper improves the current state-of-the-art approaches for handling uncertain Bayesian networks by enabling them to learn distributions for their parameters, i.e., conditional probabilities, with incomplete data. We extensively evaluate various methods to learn the posterior of the parameters through the desired and empirically derived strength of confidence bounds for various queries.

preprint, 2020, arXiv

Fundamental Limits of Deep Graph Convolutional Networks

Graph convolutional networks (GCNs) are a widely used method for graph representation learning. To elucidate the capabilities and limitations of GCNs, we investigate their power, as a function of their number of layers, to distinguish between different random graph models (corresponding to different class-conditional distributions in a classification problem) on the basis of the embeddings of their sample graphs. In particular, the graph models that we consider arise from graphons, which are the most general possible parameterizations of infinite exchangeable graph models and which are the central objects of study in the theory of dense graph limits. We give a precise characterization of the set of pairs of graphons that are indistinguishable by a GCN with nonlinear activation functions coming from a certain broad class if its depth is at least logarithmic in the size of the sample graph. This characterization is in terms of a degree profile closeness property. Outside this class, a very simple GCN architecture suffices for distinguishability. We then exhibit a concrete, infinite class of graphons arising from stochastic block models that are well-separated in terms of cut distance and are indistinguishable by a GCN. These results theoretically match empirical observations of several prior works. To prove our results, we exploit a connection to random walks on graphs. Finally, we give empirical results on synthetic and real graph classification datasets, indicating that indistinguishable graph distributions arise in practice.

preprint, 2020, arXiv

Numerically Stable Binary Gradient Coding

A major hurdle in machine learning is scalability to massive datasets. One approach to overcoming this is to distribute the computational tasks among several workers. Gradient coding has been recently proposed in distributed optimization to compute the gradient of an objective function using multiple, possibly unreliable, worker nodes. By designing distributed coded schemes, gradient coded computations can be made resilient to stragglers, nodes with longer response time compared to other nodes in a distributed network. Most such schemes rely on operations over the real or complex numbers and are inherently numerically unstable. We present a binary scheme which avoids such operations, thereby enabling numerically stable distributed computation of the gradient. Also, some restricting assumptions in prior work are dropped, and a more efficient decoding is given.
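To make the notion of a binary (0/1 coefficient) gradient code concrete, here is a minimal sketch of the classical fractional repetition idea, under the assumption that s + 1 divides the number of workers, where s is the number of tolerated stragglers. It is not the scheme proposed in the paper, which drops restrictions of this kind; all function names are hypothetical.

```python
def assign_groups(num_workers, num_stragglers):
    """Split workers into groups of size s + 1; workers in a group hold the same data block."""
    group_size = num_stragglers + 1
    if num_workers % group_size:
        raise ValueError("this sketch assumes (s + 1) divides the number of workers")
    return [w // group_size for w in range(num_workers)]

def worker_message(partial_gradients, group, num_groups):
    """Plain sum (coefficients 0 or 1 only) of the partial gradients in this group's block."""
    block = [g for k, g in enumerate(partial_gradients) if k % num_groups == group]
    return [sum(vals) for vals in zip(*block)]

def decode(responses, groups, num_groups):
    """Add one message per group; `responses` maps worker index to its message."""
    chosen = {}
    for worker, message in responses.items():
        chosen.setdefault(groups[worker], message)
    if len(chosen) < num_groups:
        raise ValueError("too many stragglers: some group did not respond")
    return [sum(vals) for vals in zip(*chosen.values())]

# Toy run: 4 workers, 1 straggler tolerated, 4 data partitions with 2-dimensional gradients.
groups = assign_groups(4, 1)
partials = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
responses = {w: worker_message(partials, groups[w], 2) for w in (1, 2, 3)}  # worker 0 straggles
full_gradient = decode(responses, groups, 2)
```

Decoding uses only additions, so no real-valued linear system has to be solved at the coordinator.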

preprint, 2020, arXiv

The Power of Graph Convolutional Networks to Distinguish Random Graph Models: Short Version

Graph convolutional networks (GCNs) are a widely used method for graph representation learning. We investigate the power of GCNs, as a function of their number of layers, to distinguish between different random graph models on the basis of the embeddings of their sample graphs. In particular, the graph models that we consider arise from graphons, which are the most general possible parameterizations of infinite exchangeable graph models and which are the central objects of study in the theory of dense graph limits. We exhibit an infinite class of graphons that are well-separated in terms of cut distance and are indistinguishable by a GCN with nonlinear activation functions coming from a certain broad class if its depth is at least logarithmic in the size of the sample graph. These results theoretically match empirical observations of several prior works. Finally, we show a converse result that for pairs of graphons satisfying a degree profile separation property, a very simple GCN architecture suffices for distinguishability. To prove our results, we exploit a connection to random walks on graphs.

preprint, 2020, arXiv

Weighted Gradient Coding with Leverage Score Sampling

A major hurdle in machine learning is scalability to massive datasets. Approaches to overcome this hurdle include compression of the data matrix and distributing the computations. Leverage score sampling provides a compressed approximation of a data matrix using an importance weighted subset. Gradient coding has been recently proposed in distributed optimization to compute the gradient using multiple unreliable worker nodes. By designing coding matrices, gradient coded computations can be made resilient to stragglers, which are nodes in a distributed network that degrade system performance. We present a novel weighted leverage score approach that achieves improved performance for distributed gradient coding by utilizing importance sampling.

preprint, 2019, arXiv

Semi-supervised Learning in Network-Structured Data via Total Variation Minimization

We propose and analyze a method for semi-supervised learning from partially-labeled network-structured data. Our approach is based on a graph signal recovery interpretation under a clustering hypothesis that labels of data points belonging to the same well-connected subset (cluster) are similar valued. This lends itself naturally to learning the labels by total variation (TV) minimization, which we solve by applying a recently proposed primal-dual method for non-smooth convex optimization. The resulting algorithm allows for a highly scalable implementation using message passing over the underlying empirical graph, which renders the algorithm suitable for big data applications. By applying tools of compressed sensing, we derive a sufficient condition on the underlying network structure such that TV minimization recovers clusters in the empirical graph of the data. In particular, we show that the proposed primal-dual method amounts to maximizing network flows over the empirical graph of the dataset. Moreover, the learning accuracy of the proposed algorithm is linked to the set of network flows between data points having known labels. The effectiveness and scalability of our approach are verified by numerical experiments.

preprint, 2016, arXiv

AMOS: An Automated Model Order Selection Algorithm for Spectral Graph Clustering

One of the longstanding problems in spectral graph clustering (SGC) is the so-called model order selection problem: automated selection of the correct number of clusters. This is equivalent to the problem of finding the number of connected components or communities in an undirected graph. In this paper, we propose AMOS, an automated model order selection algorithm for SGC. Based on a recent analysis of clustering reliability for SGC under the random interconnection model, AMOS works by incrementally increasing the number of clusters, estimating the quality of identified clusters, and providing a series of clustering reliability tests. Consequently, AMOS outputs clusters of minimal model order with statistical clustering reliability guarantees. Compared to three other automated graph clustering methods on real-world datasets, AMOS shows superior performance in terms of multiple external and internal clustering metrics.

preprint, 2016, arXiv

Scalable Semi-Supervised Learning over Networks using Nonsmooth Convex Optimization

We propose a scalable method for semi-supervised (transductive) learning from massive network-structured datasets. Our approach to semi-supervised learning is based on representing the underlying hypothesis as a graph signal with small total variation. Requiring a small total variation of the graph signal representing the underlying hypothesis corresponds to the central smoothness assumption that forms the basis for semi-supervised learning, i.e., input points forming clusters have similar output values or labels. We formulate the learning problem as a nonsmooth convex optimization problem which we solve by appealing to Nesterov's optimal first-order method for nonsmooth optimization. We also provide a message passing formulation of the learning method which allows for a highly scalable implementation in big data frameworks.

preprint, 2015, arXiv

Image patch analysis of sunspots and active regions. I. Intrinsic dimension and correlation analysis

The flare-productivity of an active region is observed to be related to its spatial complexity. Mount Wilson or McIntosh sunspot classifications measure such complexity but in a categorical way, and may therefore not use all the information present in the observations. Moreover, such categorical schemes hinder, for example, a systematic study of an active region's evolution. We propose fine-scale quantitative descriptors for an active region's complexity and relate them to the Mount Wilson classification. We analyze the local correlation structure within continuum and magnetogram data, as well as the cross-correlation between continuum and magnetogram data. We compute the intrinsic dimension, partial correlation, and canonical correlation analysis (CCA) of image patches of continuum and magnetogram active region images taken from the SOHO-MDI instrument. We use masks of sunspots derived from continuum as well as larger masks of magnetic active regions derived from the magnetogram to analyze separately the core part of an active region from its surrounding part. We find a relationship between the complexity of an active region as measured by Mount Wilson and the intrinsic dimension of its image patches. Partial correlation patterns exhibit approximately a third-order Markov structure. CCA reveals different patterns of correlation between continuum and magnetogram within the sunspots and in the region surrounding the sunspots. These results also pave the way for patch-based dictionary learning with a view towards automatic clustering of active regions.

preprint, 2015, arXiv

Image patch analysis of sunspots and active regions. II. Clustering via matrix factorization

Separating active regions that are quiet from potentially eruptive ones is a key issue in Space Weather applications. Traditional classification schemes such as Mount Wilson and McIntosh have been effective in relating an active region's large-scale magnetic configuration to its ability to produce eruptive events. However, their qualitative nature prevents, for example, systematic studies of an active region's evolution. We introduce a new clustering of active regions that is based on the local geometry observed in Line of Sight magnetogram and continuum images. We use a reduced-dimension representation of an active region that is obtained by factoring the corresponding data matrix comprised of local image patches. Two factorizations can be compared via the definition of appropriate metrics on the resulting factors. The distances obtained from these metrics are then used to cluster the active regions. We find that these metrics result in natural clusterings of active regions. The clusterings are related to large-scale descriptors of an active region such as its size, its local magnetic field distribution, and its complexity as measured by the Mount Wilson classification scheme. We also find that including data focused on the neutral line of an active region can result in an increased correspondence between our clustering results and other active region descriptors such as the Mount Wilson classifications and the $R$ value. We provide some recommendations for which metrics, matrix factorization techniques, and regions of interest to use to study active regions.

preprint, 2015, arXiv

Kronecker PCA Based Robust SAR STAP

In this work the detection of moving targets in multiantenna SAR is considered. As a high resolution radar imaging modality, SAR detects and identifies stationary targets very well, giving it an advantage over classical GMTI radars. Moving target detection is more challenging due to the "burying" of moving targets in the clutter and is often achieved using space-time adaptive processing (STAP) (based on learning filters from the spatio-temporal clutter covariance) to remove the stationary clutter and enhance the moving targets. In this work, it is noted that in addition to the oft noted low rank structure, the clutter covariance is also naturally in the form of a space vs time Kronecker product with low rank factors. A low-rank KronPCA covariance estimation algorithm is proposed to exploit this structure, and a separable clutter cancelation filter based on the Kronecker covariance estimate is proposed. Together, these provide orders of magnitude reduction in the number of training samples required, as well as improved robustness to corruption of the training data, e.g. due to outliers and moving targets. Theoretical properties of the proposed estimation algorithm are derived and the significant reductions in training complexity are established under the spherically invariant random vector model (SIRV). Finally, an extension of this approach incorporating multipass data (change detection) is presented. Simulation results and experiments using the real Gotcha SAR GMTI challenge dataset are presented that confirm the advantages of our approach relative to existing techniques.

preprint, 2015, arXiv

Meta learning of bounds on the Bayes classifier error

Meta learning uses information from base learners (e.g. classifiers or estimators) as well as information about the learning problem to improve upon the performance of a single base learner. For example, the Bayes error rate of a given feature space, if known, can be used to aid in choosing a classifier, as well as in feature selection and model selection for the base classifiers and the meta classifier. Recent work in the field of f-divergence functional estimation has led to the development of simple and rapidly converging estimators that can be used to estimate various bounds on the Bayes error. We estimate multiple bounds on the Bayes error using an estimator that applies meta learning to slowly converging plug-in estimators to obtain the parametric convergence rate. We compare the estimated bounds empirically on simulated data and then estimate the tighter bounds on features extracted from an image patch analysis of sunspot continuum and magnetogram images.

preprint, 2015, arXiv

MIST: L0 Sparse Linear Regression with Momentum

Significant attention has been given to minimizing a penalized least squares criterion for estimating sparse solutions to large linear systems of equations. The penalty is responsible for inducing sparsity and the natural choice is the so-called $l_0$ norm. In this paper we develop a Momentumized Iterative Shrinkage Thresholding (MIST) algorithm for minimizing the resulting non-convex criterion and prove its convergence to a local minimizer. Simulations on large data sets show superior performance of the proposed method to other methods.

preprint, 2015, arXiv

Multi-criteria Similarity-based Anomaly Detection using Pareto Depth Analysis

We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. Similarity-based anomaly detection algorithms detect abnormally large amounts of similarity or dissimilarity, e.g. as measured by nearest neighbor Euclidean distances between a test sample and the training samples. In many application domains there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such cases, multiple dissimilarity measures can be defined, including non-metric measures, and one can test for anomalies by scalarizing using a non-negative linear combination of them. If the relative importance of the different dissimilarity measures is not known in advance, as in many anomaly detection applications, the anomaly detection algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we propose a method for similarity-based anomaly detection using a novel multi-criteria dissimilarity measure, the Pareto depth. The proposed Pareto depth analysis (PDA) anomaly detection algorithm uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach is provably better than using linear combinations of the criteria and shows superior performance on experiments with synthetic and real data sets.
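The notion of Pareto depth can be illustrated by peeling successive Pareto fronts from a set of points in criteria space: points on the first front are dominated by nothing, points on the second front are dominated only by the first, and so on. The sketch below shows only this peeling step, assuming smaller values are better in every criterion; it is not the full PDA detector from the paper, and the function names are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every criterion and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_depths(points):
    """Return, for each point, the 1-based index of the Pareto front it lies on."""
    remaining = set(range(len(points)))
    depths = [0] * len(points)
    depth = 0
    while remaining:
        depth += 1
        # The current front: points not dominated by any other remaining point.
        front = {i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)}
        for i in front:
            depths[i] = depth
        remaining -= front
    return depths

# Each tuple holds two dissimilarity values for one item (toy numbers).
depths = pareto_depths([(1, 4), (2, 2), (4, 1), (3, 3), (5, 5)])
```

No weights appear anywhere in the computation, which is the property the abstract contrasts with scalarizing the criteria through a linear combination.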

preprint, 2015, arXiv

Non-parametric Quickest Change Detection for Large Scale Random Matrices

The problem of quickest detection of a change in the distribution of an $n\times p$ random matrix based on a sequence of observations having a single unknown change point is considered. The forms of the pre- and post-change distributions of the rows of the matrices are assumed to belong to the family of elliptically contoured densities with sparse dispersion matrices but are otherwise unknown. We propose a non-parametric stopping rule that is based on a novel summary statistic related to k-nearest neighbor correlation between columns of each observed random matrix. In the large scale regime of $p\rightarrow \infty$ and $n$ fixed we show that, among all functions of the proposed summary statistic, the proposed stopping rule is asymptotically optimal under a minimax quickest change detection (QCD) model.

preprint, 2015, arXiv

On Decentralized Estimation with Active Queries

We consider the problem of decentralized 20 questions with noise for multiple players/agents under the minimum entropy criterion in the setting of stochastic search over a parameter space, with application to target localization. We propose decentralized extensions of the active query-based stochastic search strategy that combines elements from the 20 questions approach and social learning. We prove convergence to correct consensus on the value of the parameter. This framework provides a flexible and tractable mathematical model for decentralized parameter estimation systems based on active querying. We illustrate the effectiveness and robustness of the proposed decentralized collaborative 20 questions algorithm for random network topologies with information sharing.

preprint, 2015, arXiv

Phase Transitions in Spectral Community Detection of Large Noisy Networks

In this paper, we study the sensitivity of the spectral clustering based community detection algorithm subject to an Erdős-Rényi type random noise model. We prove phase transitions in community detectability as a function of the external edge connection probability and the noisy edge presence probability under a general network model where two arbitrarily connected communities are interconnected by random external edges. Specifically, the community detection performance transitions from almost perfect detectability to low detectability as the inter-community edge connection probability exceeds some critical value. We derive upper and lower bounds on the critical value and show that the bounds are identical when the two communities have the same size. The phase transition results are validated using network simulations. Using the derived expressions for the phase transition threshold we propose a method for estimating this threshold from observed data.

preprint, 2015, arXiv

Semi-supervised Multi-sensor Classification via Consensus-based Multi-View Maximum Entropy Discrimination

In this paper, we consider multi-sensor classification when there is a large number of unlabeled samples. The problem is formulated under the multi-view learning framework and a Consensus-based Multi-View Maximum Entropy Discrimination (CMV-MED) algorithm is proposed. By iteratively maximizing the stochastic agreement between multiple classifiers on the unlabeled dataset, the algorithm simultaneously learns multiple high accuracy classifiers. We demonstrate that our proposed method can yield improved performance over previous multi-view learning approaches by comparing performance on three real multi-sensor data sets.

preprint, 2015, arXiv

Shortest Path through Random Points

Let $(M,g_1)$ be a complete $d$-dimensional Riemannian manifold for $d > 1$. Let $\mathcal X_n$ be a set of $n$ sample points in $M$ drawn randomly from a smooth Lebesgue density $f$ supported in $M$. Let $x,y$ be two points in $M$. We prove that the normalized length of the power-weighted shortest path between $x, y$ through $\mathcal X_n$ converges almost surely to a constant multiple of the Riemannian distance between $x,y$ under the metric tensor $g_p = f^{2(1-p)/d} g_1$, where $p > 1$ is the power parameter.
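For readers who want to see the object this theorem is about, the sketch below computes a power-weighted shortest path through a finite point set in Euclidean space: every pair of points is joined by an edge whose cost is the Euclidean distance raised to the power p, and Dijkstra's algorithm finds the cheapest route. The normalization used in the convergence statement is deliberately left out, and the flat Euclidean setting and function name are assumptions of the example.

```python
import heapq
import math

def power_weighted_shortest_path(points, source, target, p):
    """Dijkstra on the complete graph over `points` with edge cost (Euclidean distance) ** p."""
    n = len(points)
    best = [math.inf] * n
    best[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        cost, u = heapq.heappop(heap)
        if u == target:
            return cost
        if cost > best[u]:
            continue  # stale heap entry
        for v in range(n):
            if v == u:
                continue
            step = math.dist(points[u], points[v]) ** p
            if cost + step < best[v]:
                best[v] = cost + step
                heapq.heappush(heap, (cost + step, v))
    return best[target]

# With p = 2, two unit hops (cost 1 + 1) beat one hop of length 2 (cost 4).
length = power_weighted_shortest_path([(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)], 0, 2, 2)
```

For p > 1 long hops are penalized, so the optimal path prefers to pass through many nearby sample points, which is why the limit depends on the sampling density f.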

preprint, 2015, arXiv

Universal Phase Transition in Community Detectability under a Stochastic Block Model

We prove the existence of an asymptotic phase transition threshold on community detectability for the spectral modularity method [M. E. J. Newman, Phys. Rev. E 74, 036104 (2006) and Proc. National Academy of Sciences. 103, 8577 (2006)] under a stochastic block model. The phase transition on community detectability occurs as the inter-community edge connection probability $p$ grows. This phase transition separates a sub-critical regime of small $p$, where modularity-based community detection successfully identifies the communities, from a super-critical regime of large $p$ where successful community detection is impossible. We show that, as the community sizes become large, the asymptotic phase transition threshold $p^*$ is equal to $\sqrt{p_1\cdot p_2}$, where $p_i~(i=1,2)$ is the within-community edge connection probability. Thus the phase transition threshold is universal in the sense that it does not depend on the ratio of community sizes. The universal phase transition phenomenon is validated by simulations for moderately sized communities. Using the derived expression for the phase transition threshold we propose an empirical method for estimating this threshold from real-world data.
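The threshold formula in this abstract is easy to evaluate numerically. The short sketch below computes $p^* = \sqrt{p_1 p_2}$ and reports which regime a given inter-community probability falls into; the function names are hypothetical, and since the result is asymptotic the label is only indicative for finite networks.

```python
import math

def critical_threshold(p1, p2):
    """Asymptotic threshold p* = sqrt(p1 * p2) from the within-community edge probabilities."""
    return math.sqrt(p1 * p2)

def regime(p, p1, p2):
    """Label the inter-community edge probability p relative to the asymptotic threshold."""
    return "sub-critical" if p < critical_threshold(p1, p2) else "super-critical"

threshold = critical_threshold(0.4, 0.9)   # approximately 0.6
label = regime(0.1, 0.4, 0.9)              # well below the threshold
```

Note that the community sizes do not appear as arguments, reflecting the universality claim in the abstract.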

preprint, 2014, arXiv

Collaborative 20 Questions for Target Localization

We consider the problem of 20 questions with noise for multiple players under the minimum entropy criterion in the setting of stochastic search, with application to target localization. Each player yields a noisy response to a binary query governed by a certain error probability. First, we propose a sequential policy for constructing questions that queries each player in sequence and refines the posterior of the target location. Second, we consider a joint policy that asks all players questions in parallel at each time instant and characterize the structure of the optimal policy for constructing the sequence of questions. This generalizes the single player probabilistic bisection method for stochastic search problems. Third, we prove an equivalence between the two schemes showing that, despite the fact that the sequential scheme has access to a more refined filtration, the joint scheme performs just as well on average. Fourth, we establish convergence rates of the mean-square error (MSE) and derive error exponents. Lastly, we obtain an extension to the case of unknown error probabilities. This framework provides a mathematical model for incorporating a human in the loop for active machine learning systems.
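The single-player probabilistic bisection method that this work generalizes can be sketched on a discretized interval: keep a posterior over cells, ask whether the target lies to the left of the posterior median, and apply Bayes' rule using the known error probability. The grid discretization, the function name and its parameters are assumptions of this example, and the multi-player sequential and joint policies from the paper are not implemented.

```python
import random

def probabilistic_bisection(target, eps, num_queries, grid_size, rng):
    """Estimate `target` in [0, 1) from noisy yes/no answers about the posterior median."""
    posterior = [1.0 / grid_size] * grid_size
    for _ in range(num_queries):
        # Locate the cell boundary closest to the posterior median.
        cum, split = 0.0, 0
        while split < grid_size and cum + posterior[split] <= 0.5:
            cum += posterior[split]
            split += 1
        split = min(max(split, 1), grid_size - 1)  # keep the question informative
        truth = target < split / grid_size
        # The player's answer is flipped with probability eps.
        answer = truth if rng.random() >= eps else not truth
        # Bayes update: cells consistent with the answer get weight 1 - eps, others eps.
        for i in range(grid_size):
            in_left = i < split
            posterior[i] *= (1.0 - eps) if in_left == answer else eps
        total = sum(posterior)
        posterior = [w / total for w in posterior]
    best = max(range(grid_size), key=posterior.__getitem__)
    return (best + 0.5) / grid_size

estimate = probabilistic_bisection(0.3, 0.0, 10, 64, random.Random(0))
```

With eps set to 0 the procedure reduces to ordinary bisection; with eps between 0 and 0.5 wrong answers only down-weight cells instead of eliminating them, so later answers can correct earlier mistakes.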

preprint, 2014, arXiv

Dynamic stochastic blockmodels for time-evolving social networks

Significant efforts have gone into the development of statistical models for analyzing data in the form of networks, such as social networks. Most existing work has focused on modeling static networks, which represent either a single time snapshot or an aggregate view over time. There has been recent interest in statistical modeling of dynamic networks, which are observed at multiple points in time and offer a richer representation of many complex phenomena. In this paper, we present a state-space model for dynamic networks that extends the well-known stochastic blockmodel for static networks to the dynamic setting. We fit the model in a near-optimal manner using an extended Kalman filter (EKF) augmented with a local search. We demonstrate that the EKF-based algorithm performs competitively with a state-of-the-art algorithm based on Markov chain Monte Carlo sampling but is significantly less computationally demanding.

preprint, 2014, arXiv

Ensemble estimation of multivariate f-divergence

f-divergence estimation is an important problem in the fields of information theory, machine learning, and statistics. While several divergence estimators exist, relatively few of their convergence rates are known. We derive the MSE convergence rate for a density plug-in estimator of f-divergence. Then by applying the theory of optimally weighted ensemble estimation, we derive a divergence estimator with a convergence rate of O(1/T) that is simple to implement and performs well in high dimensions. We validate our theoretical results with experiments.

preprint, 2014, arXiv

Image patch analysis and clustering of sunspots: a dimensionality reduction approach

Sunspots, as seen in white light or continuum images, are associated with regions of high magnetic activity on the Sun, visible on magnetogram images. Their complexity is correlated with explosive solar activity and so classifying these active regions is useful for predicting future solar activity. Current classification of sunspot groups is visually based and suffers from bias. Supervised learning methods can reduce human bias but fail to optimally capitalize on the information present in sunspot images. This paper uses two image modalities (continuum and magnetogram) to characterize the spatial and modal interactions of sunspot and magnetic active region images and presents a new approach to cluster the images. Specifically, in the framework of image patch analysis, we estimate the number of intrinsic parameters required to describe the spatial and modal dependencies, the correlation between the two modalities and the corresponding spatial patterns, and examine the phenomena at different scales within the images. To do this, we use linear and nonlinear intrinsic dimension estimators, canonical correlation analysis, and multiresolution analysis of intrinsic dimension.

preprint, 2014, arXiv

Kronecker PCA Based Spatio-Temporal Modeling of Video for Dismount Classification

We consider the application of KronPCA spatio-temporal modeling techniques [Greenewald et al 2013, Tsiligkaridis et al 2013] to the extraction of spatiotemporal features for video dismount classification. KronPCA performs a low-rank type of dimensionality reduction that is adapted to spatio-temporal data and is characterized by the T frame multiframe mean and covariance of p spatial features. For further regularization and improved inverse estimation, we also use the diagonally corrected KronPCA shrinkage methods we presented in [Greenewald et al 2013]. We apply this very general method to the modeling of the multivariate temporal behavior of HOG features extracted from pedestrian bounding boxes in video, with gender classification in a challenging dataset chosen as a specific application. The learned covariances for each class are used to extract spatiotemporal features which are then classified, achieving competitive classification performance.

preprint, 2014, arXiv

Learning Latent Variable Gaussian Graphical Models

Gaussian graphical models (GGM) have been widely used in many high-dimensional applications ranging from biological and financial data to recommender systems. Sparsity in GGM plays a central role both statistically and computationally. Unfortunately, real-world data often does not fit well to sparse graphical models. In this paper, we focus on a family of latent variable Gaussian graphical models (LVGGM), where the model is conditionally sparse given latent variables, but marginally non-sparse. In LVGGM, the inverse covariance matrix has a low-rank plus sparse structure, and can be learned in a regularized maximum likelihood framework. We derive novel parameter estimation error bounds for LVGGM under mild conditions in the high-dimensional setting. These results complement the existing theory on the structural learning, and open up new possibilities of using LVGGM for statistical inference.

preprint2014arXiv

Marginal Likelihoods for Distributed Parameter Estimation of Gaussian Graphical Models

We consider distributed estimation of the inverse covariance matrix, also called the concentration or precision matrix, in Gaussian graphical models. Traditional centralized estimation often requires global inference of the covariance matrix, which can be computationally intensive in large dimensions. Approximate inference based on message-passing algorithms, on the other hand, can lead to unstable and biased estimation in loopy graphical models. In this paper, we propose a general framework for distributed estimation based on a maximum marginal likelihood (MML) approach. This approach computes local parameter estimates by maximizing marginal likelihoods defined with respect to data collected from local neighborhoods. Due to the non-convexity of the MML problem, we introduce and solve a convex relaxation. The local estimates are then combined into a global estimate without the need for iterative message-passing between neighborhoods. The proposed algorithm is naturally parallelizable and computationally efficient, thereby making it suitable for high-dimensional problems. In the classical regime where the number of variables $p$ is fixed and the number of samples $T$ increases to infinity, the proposed estimator is shown to be asymptotically consistent and to improve monotonically as the local neighborhood size increases. In the high-dimensional scaling regime where both $p$ and $T$ increase to infinity, the convergence rate to the true parameters is derived and is seen to be comparable to centralized maximum likelihood estimation. Extensive numerical experiments demonstrate the improved performance of the two-hop version of the proposed estimator, which suffices to almost close the gap to the centralized maximum likelihood estimator at a reduced computational cost.

preprint2014arXiv

Multi-layer graph analysis for dynamic social networks

Modern social networks frequently encompass multiple distinct types of connectivity information; for instance, explicitly acknowledged friend relationships might complement behavioral measures that link users according to their actions or interests. One way to represent these networks is as multi-layer graphs, where each layer contains a unique set of edges over the same underlying vertices (users). Edges in different layers typically have related but distinct semantics; depending on the application multiple layers might be used to reduce noise through averaging, to perform multifaceted analyses, or a combination of the two. However, it is not obvious how to extend standard graph analysis techniques to the multi-layer setting in a flexible way. In this paper we develop latent variable models and methods for mining multi-layer networks for connectivity patterns based on noisy data.

preprint2014arXiv

Multivariate f-Divergence Estimation With Confidence

The problem of f-divergence estimation is important in the fields of machine learning, information theory, and statistics. While several nonparametric divergence estimators exist, relatively few have known convergence properties. In particular, even for those estimators whose MSE convergence rates are known, the asymptotic distributions are unknown. We establish the asymptotic normality of a recently proposed ensemble estimator of f-divergence between two distributions from a finite number of samples. This estimator has MSE convergence rate of O(1/T), is simple to implement, and performs well in high dimensions. This theory enables us to perform divergence-based inference tasks such as testing equality of pairs of distributions based on empirical samples. We experimentally validate our theoretical results and, as an illustration, use them to empirically bound the best achievable classification error.

preprint2014arXiv

Node Removal Vulnerability of the Largest Component of a Network

The connectivity structure of a network can be very sensitive to removal of certain nodes in the network. In this paper, we study the sensitivity of the largest component size to node removals. We prove that minimizing the largest component size is equivalent to solving a matrix one-norm minimization problem whose column vectors are orthogonal and sparse and form a basis of the null space of the associated graph Laplacian matrix. A greedy node removal algorithm is then proposed based on the matrix one-norm minimization. In comparison with other node centralities such as node degree and betweenness, experimental results on a US power grid dataset validate the effectiveness of the proposed approach in terms of reduction of the largest component size with relatively few node removals.

preprint2014arXiv

Pareto-depth for Multiple-query Image Retrieval

Most content-based image retrieval systems consider either one single query, or multiple queries that include the same object or represent the same semantic information. In this paper we consider the content-based image retrieval problem for multiple query images corresponding to different image semantics. We propose a novel multiple-query information retrieval algorithm that combines the Pareto front method (PFM) with efficient manifold ranking (EMR). We show that our proposed algorithm outperforms state of the art multiple-query retrieval algorithms on real-world image databases. We attribute this performance improvement to concavity properties of the Pareto fronts, and prove a theoretical result that characterizes the asymptotic concavity of the fronts.

preprint2014arXiv

Performance Guarantees for Adaptive Estimation of Sparse Signals

This paper studies adaptive sensing for estimating the nonzero amplitudes of a sparse signal with the aim of providing analytical guarantees on the performance gain due to adaptive resource allocation. We consider a previously proposed optimal two-stage policy for allocating sensing resources. For positive powers q, we derive tight upper bounds on the mean qth-power error resulting from the optimal two-stage policy and corresponding lower bounds on the improvement over non-adaptive uniform sensing. It is shown that the adaptation gain is related to the detectability of nonzero signal components as characterized by Chernoff coefficients, thus quantifying analytically the dependence on the sparsity level of the signal, the signal-to-noise ratio, and the sensing resource budget. For fixed sparsity levels and increasing signal-to-noise ratio or sensing budget, we obtain the rate of convergence to oracle performance and the rate at which the fraction of resources spent on the first exploratory stage decreases to zero. For a vanishing fraction of nonzero components, the gain increases without bound as a function of signal-to-noise ratio and sensing budget. Numerical simulations demonstrate that the bounds on adaptation gain are quite tight in non-asymptotic regimes as well.

preprint2014arXiv

Regularized Block Toeplitz Covariance Matrix Estimation via Kronecker Product Expansions

In this work we consider the estimation of spatio-temporal covariance matrices in the low sample non-Gaussian regime. We impose covariance structure in the form of a sum of Kronecker products decomposition (Tsiligkaridis et al. 2013, Greenewald et al. 2013) with diagonal correction (Greenewald et al.), which we refer to as DC-KronPCA, in the estimation of multiframe covariance matrices. This paper extends the approaches of (Tsiligkaridis et al.) in two directions. First, we modify the diagonally corrected method of (Greenewald et al.) to include a block Toeplitz constraint imposing temporal stationarity structure. Second, we improve the conditioning of the estimate in the very low sample regime by using Ledoit-Wolf type shrinkage regularization similar to (Chen, Hero et al. 2010). For improved robustness to heavy tailed distributions, we modify the KronPCA to incorporate robust shrinkage estimation (Chen, Hero et al. 2011). Results of numerical simulations establish benefits in terms of estimation MSE when compared to previous methods. Finally, we apply our methods to a real-world network spatio-temporal anomaly detection problem and achieve superior results.
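
The conditioning benefit of shrinkage in the low-sample regime can be illustrated with a minimal sketch. It assumes zero-mean data, a scaled-identity target, and a hand-picked coefficient `rho`; the abstract's method chooses the coefficient in a Ledoit-Wolf type fashion and adds Kronecker and block Toeplitz structure, none of which is reproduced here. The function name `shrink_to_identity` is hypothetical.

```python
import numpy as np

def shrink_to_identity(sample_cov, rho):
    """Blend the sample covariance with a scaled identity of equal trace.

    rho is a fixed, hand-picked shrinkage coefficient (an assumption);
    Ledoit-Wolf type methods estimate it from the data instead.
    """
    p = sample_cov.shape[0]
    target = (np.trace(sample_cov) / p) * np.eye(p)
    return (1.0 - rho) * sample_cov + rho * target

rng = np.random.default_rng(0)
x = rng.standard_normal((5, 20))      # 5 samples of 20 variables, zero mean assumed
s = x.T @ x / x.shape[0]              # sample covariance, rank at most 5
shrunk = shrink_to_identity(s, rho=0.3)

# Smallest eigenvalue before and after shrinkage.
min_eig_before = np.linalg.eigvalsh(s)[0]
min_eig_after = np.linalg.eigvalsh(shrunk)[0]
```

With fewer samples than variables the sample covariance is singular, while the shrunk estimate keeps the same trace and is invertible.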

preprint2014arXiv

Resource-Constrained Adaptive Search and Tracking for Sparse Dynamic Targets

This paper considers the problem of resource-constrained and noise-limited localization and estimation of dynamic targets that are sparsely distributed over a large area. We generalize an existing framework [Bashan et al, 2008] for adaptive allocation of sensing resources to the dynamic case, accounting for time-varying target behavior such as transitions to neighboring cells and varying amplitudes over a potentially long time horizon. The proposed adaptive sensing policy is driven by minimization of a modified version of the previously introduced ARAP objective function, which is a surrogate function for mean squared error within locations containing targets. We provide theoretical upper bounds on the performance of adaptive sensing policies by analyzing solutions with oracle knowledge of target locations, gaining insight into the effect of target motion and amplitude variation as well as sparsity. Exact minimization of the multi-stage objective function is infeasible, but myopic optimization yields a closed-form solution. We propose a simple non-myopic extension, the Dynamic Adaptive Resource Allocation Policy (D-ARAP), that allocates a fraction of resources for exploring all locations rather than solely exploiting the current belief state. Our numerical studies indicate that D-ARAP has the following advantages: (a) it is more robust than the myopic policy to noise, missing data, and model mismatch; (b) it performs comparably to well-known approximate dynamic programming solutions but at significantly lower computational complexity; and (c) it improves greatly upon non-adaptive uniform resource allocation in terms of estimation error and probability of detection.

preprint2014arXiv

Resource-Constrained Adaptive Search for Sparse Multi-Class Targets with Varying Importance

In sparse target inference problems it has been shown that significant gains can be achieved by adaptive sensing using convex criteria. We generalize previous work on adaptive sensing to (a) include multiple classes of targets with different levels of importance and (b) accommodate multiple sensor models. New optimization policies are developed to allocate a limited resource budget to simultaneously locate, classify and estimate a sparse number of targets embedded in a large space. Upper and lower bounds on the performance of the proposed policies are derived by analyzing a baseline policy, which allocates resources uniformly across the scene, and an oracle policy which has a priori knowledge of the target locations/classes. These bounds quantify analytically the potential benefit of adaptive sensing as a function of target frequency and importance. Numerical results indicate that the proposed policies perform close to the oracle bound (<3 dB) when signal quality is sufficiently high (e.g., performance within 3 dB for SNR above 15 dB). Moreover, the proposed policies improve on previous policies in terms of reducing estimation error, reducing misclassification probability, and increasing expected return. To account for sensors with different levels of agility, three sensor models are considered: global adaptive (GA), which can allocate different amounts of resource to each location in the space; global uniform (GU), which can allocate resources uniformly across the scene; and local adaptive (LA), which can allocate fixed units to a subset of locations. Policies that use a mixture of GU and LA sensors are shown to perform similarly to those that use GA sensors while being more easily implementable.

preprint2014arXiv

Spectral Correlation Hub Screening of Multivariate Time Series

This chapter discusses correlation analysis of stationary multivariate Gaussian time series in the spectral or Fourier domain. The goal is to identify the hub time series, i.e., those that are highly correlated with a specified number of other time series. We show that Fourier components of the time series at different frequencies are asymptotically statistically independent. This property permits independent correlation analysis at each frequency, alleviating the computational and statistical challenges of high-dimensional time series. To detect correlation hubs at each frequency, an existing correlation screening method is extended to the complex numbers to accommodate complex-valued Fourier components. We characterize the number of hub discoveries at specified correlation and degree thresholds in the regime of increasing dimension and fixed sample size. The theory specifies appropriate thresholds to apply to sample correlation matrices to detect hubs and also allows statistical significance to be attributed to hub discoveries. Numerical results illustrate the accuracy of the theory and the usefulness of the proposed spectral framework.

preprint2013arXiv

A Regularized Graph Layout Framework for Dynamic Network Visualization

Many real-world networks, including social and information networks, are dynamic structures that evolve over time. Such dynamic networks are typically visualized using a sequence of static graph layouts. In addition to providing a visual representation of the network structure at each time step, the sequence should preserve the mental map between layouts of consecutive time steps to allow a human to interpret the temporal evolution of the network. In this paper, we propose a framework for dynamic network visualization in the on-line setting where only present and past graph snapshots are available to create the present layout. The proposed framework creates regularized graph layouts by augmenting the cost function of a static graph layout algorithm with a grouping penalty, which discourages nodes from deviating too far from other nodes belonging to the same group, and a temporal penalty, which discourages large node movements between consecutive time steps. The penalties increase the stability of the layout sequence, thus preserving the mental map. We introduce two dynamic layout algorithms within the proposed framework, namely dynamic multidimensional scaling (DMDS) and dynamic graph Laplacian layout (DGLL). We apply these algorithms on several data sets to illustrate the importance of both grouping and temporal regularization for producing interpretable visualizations of dynamic networks.

preprint2013arXiv

Adaptive Evolutionary Clustering

In many practical applications of clustering, the objects to be clustered evolve over time, and a clustering result is desired at each time step. In such applications, evolutionary clustering typically outperforms traditional static clustering by producing clustering results that reflect long-term trends while being robust to short-term variations. Several evolutionary clustering algorithms have recently been proposed, often by adding a temporal smoothness penalty to the cost function of a static clustering method. In this paper, we introduce a different approach to evolutionary clustering by accurately tracking the time-varying proximities between objects followed by static clustering. We present an evolutionary clustering framework that adaptively estimates the optimal smoothing parameter using shrinkage estimation, a statistical approach that improves a naive estimate using additional information. The proposed framework can be used to extend a variety of static clustering algorithms, including hierarchical, k-means, and spectral clustering, into evolutionary clustering algorithms. Experiments on synthetic and real data sets indicate that the proposed framework outperforms static clustering and existing evolutionary clustering algorithms in many scenarios.
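
The idea of tracking proximities before running a static clustering method can be sketched as a recursive convex combination of the previous smoothed proximity matrix and the current one. This is only an illustration: the forgetting factor `alpha` is fixed by hand here, whereas the abstract's framework estimates it adaptively by shrinkage, and the function name `smooth_proximities` is hypothetical.

```python
import numpy as np

def smooth_proximities(proximities, alpha=0.5):
    """Recursively smooth a sequence of proximity matrices over time.

    alpha is a fixed, hand-picked forgetting factor (an assumption); the
    framework in the abstract estimates it adaptively at each time step.
    """
    smoothed = []
    prev = None
    for w in proximities:
        w = np.asarray(w, dtype=float)
        # The first snapshot has no history, so it is used as-is.
        cur = w if prev is None else alpha * prev + (1.0 - alpha) * w
        smoothed.append(cur)
        prev = cur
    return smoothed

# Two objects that are similar at time 0 and dissimilar at time 1.
snapshots = [np.array([[1.0, 0.8], [0.8, 1.0]]),
             np.array([[1.0, 0.0], [0.0, 1.0]])]
tracked = smooth_proximities(snapshots, alpha=0.5)
```

Each matrix in `tracked` could then be handed to any static method (hierarchical, k-means, or spectral clustering) in place of the raw snapshot.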

preprint2013arXiv

Convergence Properties of Kronecker Graphical Lasso Algorithms

This paper studies iteration convergence and MSE convergence rates of Kronecker graphical lasso (KGlasso) algorithms for estimating the covariance of an i.i.d. Gaussian random sample under a sparse Kronecker-product covariance model. The KGlasso model, originally called the transposable regularized covariance model by Allen ["Transposable regularized covariance models with an application to missing data imputation," Ann. Appl. Statist., vol. 4, no. 2, pp. 764-790, 2010], implements a pair of $\ell_1$ penalties on each Kronecker factor to enforce sparsity in the covariance estimator. The KGlasso algorithm generalizes Glasso, introduced by Yuan and Lin ["Model selection and estimation in the Gaussian graphical model," Biometrika, vol. 94, pp. 19-35, 2007] and Banerjee ["Model selection through sparse maximum likelihood estimation for multivariate Gaussian or binary data," J. Mach. Learn. Res., vol. 9, pp. 485-516, Mar. 2008], to estimate covariances having Kronecker product form. It also generalizes the unpenalized ML flip-flop (FF) algorithm of Dutilleul ["The MLE algorithm for the matrix normal distribution," J. Statist. Comput. Simul., vol. 64, pp. 105-123, 1999] and Werner ["On estimation of covariance matrices with Kronecker product structure," IEEE Trans. Signal Process., vol. 56, no. 2, pp. 478-491, Feb. 2008] to estimation of sparse Kronecker factors. We establish that the KGlasso iterates converge pointwise to a local maximum of the penalized likelihood function. We derive high dimensional rates of convergence to the true covariance as both the number of samples and the number of variables go to infinity. Our results establish that KGlasso has significantly faster asymptotic convergence than Glasso and FF. Simulations are presented that validate the results of our analysis.

preprint2013arXiv

Correcting Camera Shake by Incremental Sparse Approximation

The problem of deblurring an image when the blur kernel is unknown remains challenging after decades of work. Recently there has been rapid progress on correcting irregular blur patterns caused by camera shake, but there is still much room for improvement. We propose a new blind deconvolution method using incremental sparse edge approximation to recover images blurred by camera shake. We estimate the blur kernel first from only the strongest edges in the image, then gradually refine this estimate by allowing for weaker and weaker edges. Our method competes with the benchmark deblurring performance of the state-of-the-art while being significantly faster and easier to generalize.

preprint2013arXiv

Covariance Estimation in High Dimensions via Kronecker Product Expansions

This paper presents a new method for estimating high dimensional covariance matrices. The method, permuted rank-penalized least-squares (PRLS), is based on a Kronecker product series expansion of the true covariance matrix. Assuming an i.i.d. Gaussian random sample, we establish high dimensional rates of convergence to the true covariance as both the number of samples and the number of variables go to infinity. For covariance matrices of low separation rank, our results establish that PRLS has significantly faster convergence than the standard sample covariance matrix (SCM) estimator. The convergence rate captures a fundamental tradeoff between estimation error and approximation error, thus providing a scalable covariance estimation framework in terms of separation rank, similar to low rank approximation of covariance matrices. The MSE convergence rates generalize the high dimensional rates recently obtained for the ML Flip-flop algorithm for Kronecker product covariance estimation. We show that a class of block Toeplitz covariance matrices is approximatable by low separation rank and give bounds on the minimal separation rank $r$ that ensures a given level of bias. Simulations are presented to validate the theoretical bounds. As a real world application, we illustrate the utility of the proposed Kronecker covariance estimator for spatio-temporal linear least squares prediction of multivariate wind speed measurements.
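
The notion of separation rank can be made concrete with a short sketch. It rearranges a covariance matrix so that a sum of r Kronecker products becomes a rank-r matrix, then reads the terms off a plain truncated SVD. The truncated SVD is a stand-in chosen for illustration; the rank-penalized least-squares step of PRLS is not reproduced, and the names `rearrange` and `kron_terms` are hypothetical.

```python
import numpy as np

def rearrange(sigma, p, q):
    """Map a (p*q, p*q) matrix to a (p*p, q*q) matrix, one row per q-by-q block."""
    blocks = sigma.reshape(p, q, p, q)   # blocks[i, k, j, l] = sigma[i*q + k, j*q + l]
    return blocks.transpose(0, 2, 1, 3).reshape(p * p, q * q)

def kron_terms(sigma, p, q, r):
    """Leading r Kronecker terms (A_i, B_i) from the SVD of the rearranged matrix."""
    u, s, vt = np.linalg.svd(rearrange(sigma, p, q), full_matrices=False)
    return [(s[i] * u[:, i].reshape(p, p), vt[i].reshape(q, q)) for i in range(r)]

rng = np.random.default_rng(0)
p, q = 3, 4
a = rng.standard_normal((p, p))
a = a @ a.T                              # p-by-p positive semidefinite factor
b = rng.standard_normal((q, q))
b = b @ b.T                              # q-by-q positive semidefinite factor
sigma = np.kron(a, b)                    # separation rank 1 by construction

singular_values = np.linalg.svd(rearrange(sigma, p, q), compute_uv=False)
left, right = kron_terms(sigma, p, q, r=1)[0]
```

For this single Kronecker product only the first singular value of the rearranged matrix is non-negligible, and `np.kron(left, right)` reproduces `sigma`.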

preprint2013arXiv

Dynamic stochastic blockmodels: Statistical models for time-evolving networks

Significant efforts have gone into the development of statistical models for analyzing data in the form of networks, such as social networks. Most existing work has focused on modeling static networks, which represent either a single time snapshot or an aggregate view over time. There has been recent interest in statistical modeling of dynamic networks, which are observed at multiple points in time and offer a richer representation of many complex phenomena. In this paper, we propose a state-space model for dynamic networks that extends the well-known stochastic blockmodel for static networks to the dynamic setting. We then propose a procedure to fit the model using a modification of the extended Kalman filter augmented with a local search. We apply the procedure to analyze a dynamic social network of email communication.

preprint2013arXiv

Ensemble estimators for multivariate entropy estimation

The problem of estimation of density functionals like entropy and mutual information has received much attention in the statistics and information theory communities. A large class of estimators of functionals of the probability density suffer from the curse of dimensionality, wherein the mean squared error (MSE) decays increasingly slowly as a function of the sample size $T$ as the dimension $d$ of the samples increases. In particular, the rate is often glacially slow of order $O(T^{-\gamma/d})$, where $\gamma>0$ is a rate parameter. Examples of such estimators include kernel density estimators, $k$-nearest neighbor ($k$-NN) density estimators, $k$-NN entropy estimators, and intrinsic dimension estimators, among others. In this paper, we propose a weighted affine combination of an ensemble of such estimators, where optimal weights can be chosen such that the weighted estimator converges at a much faster dimension invariant rate of $O(T^{-1})$. Furthermore, we show that these optimal weights can be determined by solving a convex optimization problem which can be performed offline and does not require training data. We illustrate the superior performance of our weighted estimator for two important applications: (i) estimating the Panter-Dite distortion-rate factor and (ii) estimating the Shannon entropy for testing the probability distribution of a random sample.
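
The offline weight computation can be sketched as a small equality-constrained least-norm problem. The sketch assumes that the slowly decaying bias terms of the base estimator with parameter $l$ are proportional to $l^{i/d}$ for $i = 1, \dots, d-1$; that functional form, the parameter grid, and the name `ensemble_weights` are assumptions made for illustration and are not stated in the abstract.

```python
import numpy as np

def ensemble_weights(params, d):
    """Minimum-norm weights that sum to one and cancel the assumed bias terms.

    Assumes bias terms proportional to params ** (i / d) for i = 1..d-1
    (an illustrative assumption, not taken from the abstract).
    """
    params = np.asarray(params, dtype=float)
    rows = [np.ones_like(params)]                       # weights sum to one
    rows += [params ** (i / d) for i in range(1, d)]    # each bias term sums to zero
    a = np.vstack(rows)
    b = np.zeros(d)
    b[0] = 1.0
    # The pseudoinverse gives the minimum Euclidean norm solution of a @ w = b.
    return np.linalg.pinv(a) @ b

params = np.linspace(1.0, 3.0, 10)    # hypothetical grid of estimator parameters
w = ensemble_weights(params, d=4)
```

The weights depend only on the parameter grid and the dimension, not on any samples, which is why a computation of this kind can be done offline. The combined estimate would then be the dot product of `w` with the vector of base estimates.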

preprint2013arXiv

Information Theoretic Adaptive Tracking of Epidemics in Complex Networks

Adaptively monitoring the states of nodes in a large complex network is of interest in domains such as national security, public health, and energy grid management. Here, we present an information theoretic adaptive tracking and sampling framework that recursively selects measurements using the feedback from performing inference on a dynamic Bayesian Network. We also present conditions for the existence of a network specific, observation dependent, phase transition in the updated posterior of hidden node states resulting from actively monitoring the network. Since traditional epidemic thresholds are derived using observation independent Markov chains, the threshold of the posterior should more accurately model the true phase transition of a network. The adaptive tracking framework and epidemic threshold should provide insight into modeling the dynamic response of the updated posterior to active intervention and control policies while monitoring modern complex networks.

preprint2013arXiv

Moving target inference with hierarchical Bayesian models in synthetic aperture radar imagery

In synthetic aperture radar (SAR), images are formed by focusing the response of stationary objects to a single spatial location. On the other hand, moving targets cause phase errors in the standard formation of SAR images that cause displacement and defocusing effects. SAR imagery also contains significant sources of non-stationary spatially-varying noises, including antenna gain discrepancies, angular scintillation (glints) and complex speckle. In order to account for this intricate phenomenology, this work combines the knowledge of the physical, kinematic, and statistical properties of SAR imaging into a single unified Bayesian structure that simultaneously (a) estimates the nuisance parameters such as clutter distributions and antenna miscalibrations and (b) estimates the target signature required for detection/inference of the target state. Moreover, we provide a Monte Carlo estimate of the posterior distribution for the target state and nuisance parameters that infers the parameters of the model directly from the data, largely eliminating tuning of algorithm parameters. We demonstrate that our algorithm competes at least as well on a synthetic dataset as state-of-the-art algorithms for estimating sparse signals. Finally, performance analysis on a measured dataset demonstrates that the proposed algorithm is robust at detecting/estimating targets over a wide area and performs at least as well as popular algorithms for SAR moving target detection.

preprint2013arXiv

Multi-criteria Anomaly Detection using Pareto Depth Analysis

We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. If the importance of the different criteria is not known in advance, the algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach scales linearly in the number of criteria and is provably better than linear combinations of the criteria.
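
Pareto optimality under several criteria can be illustrated by peeling successive Pareto fronts from a set of points, each point holding one dissimilarity value per criterion. This sketch covers only the front-peeling step with a simple quadratic-time loop; how PDA turns front depths into anomaly scores is not reproduced, and the name `pareto_fronts` and the toy data are hypothetical.

```python
import numpy as np

def pareto_fronts(points):
    """Peel successive Pareto fronts (smaller is better in every criterion)."""
    points = np.asarray(points, dtype=float)
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        front = []
        for i in remaining:
            # Point j dominates point i if it is no worse in every criterion
            # and strictly better in at least one.
            dominated = any(
                np.all(points[j] <= points[i]) and np.any(points[j] < points[i])
                for j in remaining
            )
            if not dominated:
                front.append(i)
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

# Each row holds two dissimilarity criteria for one pair of samples.
dissimilarities = [[1, 4], [2, 2], [4, 1], [3, 3], [5, 5]]
fronts = pareto_fronts(dissimilarities)   # [[0, 1, 2], [3], [4]]
```

No weights are chosen anywhere: points 0, 1 and 2 trade off the two criteria differently yet all land on the first front, which is the property that avoids rerunning a scalarized detector for each weight setting.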

preprint2013arXiv

Nonlinear unmixing of hyperspectral images using a semiparametric model and spatial regularization

Incorporating spatial information into hyperspectral unmixing procedures has been shown to have positive effects, due to the inherent spatial-spectral duality in hyperspectral scenes. Current research works that consider spatial information are mainly focused on the linear mixing model. In this paper, we investigate a variational approach to incorporating spatial correlation into a nonlinear unmixing procedure. A nonlinear algorithm operating in reproducing kernel Hilbert spaces, associated with an $\ell_1$ local variation norm as the spatial regularizer, is derived. Experimental results, with both synthetic and real data, illustrate the effectiveness of the proposed scheme.

preprint2013arXiv

Revealing social networks of spammers through spectral clustering

To date, most studies on spam have focused only on the spamming phase of the spam cycle and have ignored the harvesting phase, which consists of the mass acquisition of email addresses. It has been observed that spammers conceal their identity to a lesser degree in the harvesting phase, so it may be possible to gain new insights into spammers' behavior by studying the behavior of harvesters, which are individuals or bots that collect email addresses. In this paper, we reveal social networks of spammers by identifying communities of harvesters with high behavioral similarity using spectral clustering. The data analyzed was collected through Project Honey Pot, a distributed system for monitoring harvesting and spamming. Our main findings are (1) that most spammers either send only phishing emails or no phishing emails at all, (2) that most communities of spammers also send only phishing emails or no phishing emails at all, and (3) that several groups of spammers within communities exhibit coherent temporal behavior and have similar IP addresses. Our findings reveal some previously unknown behavior of spammers and suggest that there is indeed social structure between spammers to be discovered.

preprint2012arXiv

Empirical estimation of entropy functionals with confidence

This paper introduces a class of k-nearest neighbor ($k$-NN) estimators called bipartite plug-in (BPI) estimators for estimating integrals of non-linear functions of a probability density, such as Shannon entropy and Rényi entropy. The density is assumed to be smooth, have bounded support, and be uniformly bounded from below on this set. Unlike previous $k$-NN estimators of non-linear density functionals, the proposed estimator uses data-splitting and boundary correction to achieve lower mean square error. Specifically, we assume that $T$ i.i.d. samples ${X}_i \in \mathbb{R}^d$ from the density are split into two pieces of cardinality $M$ and $N$ respectively, with $M$ samples used for computing a k-nearest-neighbor density estimate and the remaining $N$ samples used for empirical estimation of the integral of the density functional. By studying the statistical properties of k-NN balls, explicit rates for the bias and variance of the BPI estimator are derived in terms of the sample size, the dimension of the samples and the underlying probability distribution. Based on these results, it is possible to specify optimal choice of tuning parameters $M/T$, $k$ for maximizing the rate of decrease of the mean square error (MSE). The resultant optimized BPI estimator converges faster and achieves lower mean squared error than previous $k$-NN entropy estimators. In addition, a central limit theorem is established for the BPI estimator that allows us to specify tight asymptotic confidence intervals.
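
The data-splitting idea can be sketched for Shannon entropy as follows: the first $M$ samples build a $k$-NN density estimate, and the remaining $N$ samples are plugged into it and averaged. The sketch omits the boundary correction described in the abstract, uses brute-force distances, and picks $M$ and $k$ by hand rather than by the optimal tuning rule; the name `split_knn_entropy` is hypothetical.

```python
import math
import numpy as np

def split_knn_entropy(samples, m, k):
    """Plug-in Shannon entropy estimate with data splitting (no boundary correction)."""
    samples = np.asarray(samples, dtype=float)
    density_part, eval_part = samples[:m], samples[m:]
    d = samples.shape[1]
    unit_ball = math.pi ** (d / 2) / math.gamma(d / 2 + 1)   # volume of the unit d-ball
    # Distance from each evaluation point to its k-th nearest density-part point.
    dists = np.linalg.norm(eval_part[:, None, :] - density_part[None, :, :], axis=2)
    r_k = np.sort(dists, axis=1)[:, k - 1]
    # k-NN density estimate evaluated at the held-out points.
    density = (k - 1) / (m * unit_ball * r_k ** d)
    return -np.mean(np.log(density))

rng = np.random.default_rng(0)
x = rng.uniform(size=(2000, 1))       # uniform on [0, 1], whose Shannon entropy is 0
estimate = split_knn_entropy(x, m=1000, k=20)
```

On this toy data the estimate should land near zero. Any residual bias near the edges of the support is the kind of effect the boundary correction in the abstract is meant to address.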

preprint2012arXiv

The First Stray Light Corrected EUV Images of Solar Coronal Holes

Coronal holes are the source regions of the fast solar wind, which fills most of the solar system volume near the cycle minimum. Removing stray light from extreme ultraviolet (EUV) images of the Sun's corona is of high astrophysical importance, as it is required to make meaningful determinations of temperatures and densities of coronal holes. EUV images tend to be dominated by the component of the stray light due to the long-range scatter caused by microroughness of telescope mirror surfaces, and this component has proven very difficult to measure in pre-flight characterization. In-flight characterization heretofore has proven elusive due to the fact that the detected image is simultaneously nonlinear in two unknown functions: the stray light pattern and the true image which would be seen by an ideal telescope. Using a constrained blind deconvolution technique that takes advantage of known zeros in the true image provided by a fortuitous lunar transit, we have removed the stray light from solar images seen by the EUVI instrument on STEREO-B in all four filter bands (171, 195, 284, and 304 Å). Uncertainty measures of the stray light corrected images, which include the systematic error due to misestimation of the scatter, are provided. It is shown that in EUVI, stray light contributes up to 70% of the emission in coronal holes seen on the solar disk, which has dramatic consequences for diagnostics of temperature and density and therefore estimates of key plasma parameters such as the plasma $\beta$ and ion-electron collision rates.

preprint2011arXiv

Order-preserving factor analysis (OPFA)

We present a novel factor analysis method that can be applied to the discovery of common factors shared among trajectories in multivariate time series data. These factors satisfy a precedence-ordering property: certain factors are recruited only after some other factors are activated. Precedence-ordering arises in applications where variables are activated in a specific order, which is unknown. The proposed method is based on a linear model that accounts for each factor's inherent delays and relative order. We present an algorithm to fit the model in an unsupervised manner using techniques from convex and non-convex optimization that enforce sparsity of the factor scores and consistent precedence-order of the factor loadings. We illustrate the Order-Preserving Factor Analysis (OPFA) method for the problem of extracting precedence-ordered factors from a longitudinal (time course) study of gene expression data.

preprint2011arXiv

Performance Bounds for Sparse Parametric Covariance Estimation in Gaussian Models

We consider estimation of a sparse parameter vector that determines the covariance matrix of a Gaussian random vector via a sparse expansion into known "basis matrices". Using the theory of reproducing kernel Hilbert spaces, we derive lower bounds on the variance of estimators with a given mean function. This includes unbiased estimation as a special case. We also present a numerical comparison of our lower bounds with the variance of two standard estimators (hard-thresholding estimator and maximum likelihood estimator).

preprint2011arXiv

Recursive $\ell_{1,\infty}$ Group lasso

We introduce a recursive adaptive group lasso algorithm for real-time penalized least squares prediction that produces a time sequence of optimal sparse predictor coefficient vectors. At each time index the proposed algorithm computes an exact update of the optimal $\ell_{1,\infty}$-penalized recursive least squares (RLS) predictor. Each update solves a convex but nondifferentiable optimization problem. We develop an online homotopy method to reduce the computational complexity. Numerical simulations demonstrate that the proposed algorithm outperforms the $\ell_1$ regularized RLS algorithm for a group sparse system identification problem and has lower implementation complexity than direct group lasso solvers.

preprint2011arXiv

Sensor Management: Past, Present, and Future

Sensor systems typically operate under resource constraints that prevent the simultaneous use of all resources all of the time. Sensor management becomes relevant when the sensing system has the capability of actively managing these resources; i.e., changing its operating configuration during deployment in reaction to previous measurements. Examples of systems in which sensor management is currently used or is likely to be used in the near future include autonomous robots, surveillance and reconnaissance networks, and waveform-agile radars. This paper provides an overview of the theory, algorithms, and applications of sensor management as it has developed over the past decades and as it stands today.

preprint2010arXiv

Robust Shrinkage Estimation of High-dimensional Covariance Matrices

We address high dimensional covariance estimation for elliptically distributed samples, which are also known as spherically invariant random vectors (SIRV) or compound-Gaussian processes. Specifically we consider shrinkage methods that are suitable for high dimensional problems with a small number of samples (large $p$ small $n$). We start from a classical robust covariance estimator [Tyler(1987)], which is distribution-free within the family of elliptical distributions but inapplicable when $n<p$. Using a shrinkage coefficient, we regularize Tyler's fixed point iterations. We prove that, for all $n$ and $p$, the proposed fixed point iterations converge to a unique limit regardless of the initial condition. Next, we propose a simple, closed-form and data dependent choice for the shrinkage coefficient, which is based on a minimum mean squared error framework. Simulations demonstrate that the proposed method achieves low estimation error and is robust to heavy-tailed samples. Finally, as a real world application we demonstrate the performance of the proposed technique in the context of activity/intrusion detection using a wireless sensor network.
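
The regularized fixed point iteration described above can be illustrated with a minimal sketch. It assumes a commonly cited form of the update: each sample is normalized to unit length, the Tyler-style weighted scatter is blended with the identity using a fixed shrinkage coefficient, and the result is rescaled so that its trace equals $p$. The function name `shrinkage_tyler`, the fixed `rho`, the stopping rule and the trace normalization are assumptions of this sketch, not details taken from the abstract, and the data-dependent choice of the coefficient is not reproduced here.

```python
import numpy as np

def shrinkage_tyler(X, rho, max_iter=500, tol=1e-10, sigma0=None):
    """Shrinkage-regularized Tyler-type fixed point iteration (illustrative sketch).

    X      : (n, p) array of samples with nonzero rows (assumed zero-mean).
    rho    : shrinkage coefficient in (0, 1], chosen by the caller here.
    sigma0 : optional positive definite starting matrix.
    """
    n, p = X.shape
    # Normalize each sample to unit length; only its direction is used.
    S = X / np.linalg.norm(X, axis=1, keepdims=True)
    sigma = np.eye(p) if sigma0 is None else sigma0 * (p / np.trace(sigma0))
    for _ in range(max_iter):
        # Quadratic forms s_i^T sigma^{-1} s_i, one per sample.
        q = np.einsum("ij,ij->i", S @ np.linalg.inv(sigma), S)
        # Weighted scatter sum_i s_i s_i^T / q_i, shrunk toward the identity.
        tilde = (1 - rho) * (p / n) * (S / q[:, None]).T @ S + rho * np.eye(p)
        # Rescale so that the trace equals p (assumed normalization).
        new = tilde * (p / np.trace(tilde))
        if np.linalg.norm(new - sigma, "fro") < tol:
            return new
        sigma = new
    return sigma
```

Because of the identity term, every iterate stays positive definite even when $n<p$, which is the regime where the unregularized estimator is unavailable. Starting the iteration from two different positive definite matrices is a simple way to observe the convergence behavior the abstract refers to.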

preprint2010arXiv

Spatio-Temporal Graphical Model Selection

We consider the problem of estimating the topology of spatial interactions in a discrete state, discrete time spatio-temporal graphical model where the interactions affect the temporal evolution of each agent in a network. Among other models, the susceptible, infected, recovered ($SIR$) model for interaction events falls into this framework. We pose the problem as a structure learning problem and solve it using an $\ell_1$-penalized likelihood convex program. We evaluate the solution on a simulated spread of infection over a complex network. Our topology estimates outperform those of a standard spatial Markov random field graphical model selection using $\ell_1$-regularized logistic regression.

preprint2009arXiv

Covariance estimation in decomposable Gaussian graphical models

Graphical models are a framework for representing and exploiting prior conditional independence structures within distributions using graphs. In the Gaussian case, these models are directly related to the sparsity of the inverse covariance (concentration) matrix and allow for improved covariance estimation with lower computational complexity. We consider concentration estimation with the mean-squared error (MSE) as the objective, in a special type of model known as decomposable. This model includes, for example, the well known banded structure and other cases encountered in practice. Our first contribution is the derivation and analysis of the minimum variance unbiased estimator (MVUE) in decomposable graphical models. We provide a simple closed form solution to the MVUE and compare it with the classical maximum likelihood estimator (MLE) in terms of performance and complexity. Next, we extend the celebrated Stein's unbiased risk estimate (SURE) to graphical models. Using SURE, we prove that the MSE of the MVUE is always smaller than or equal to that of the biased MLE, and that the MVUE itself is dominated by other approaches. In addition, we propose the use of SURE as a constructive mechanism for deriving new covariance estimators. Similarly to the classical MLE, all of our proposed estimators have simple closed form solutions but result in a significant reduction in MSE.

preprint2009arXiv

Shrinkage Algorithms for MMSE Covariance Estimation

We address covariance estimation in the sense of minimum mean-squared error (MMSE) for Gaussian samples. Specifically, we consider shrinkage methods which are suitable for high dimensional problems with a small number of samples (large p small n). First, we improve on the Ledoit-Wolf (LW) method by conditioning on a sufficient statistic. By the Rao-Blackwell theorem, this yields a new estimator called RBLW, whose mean-squared error dominates that of LW for Gaussian variables. Second, to further reduce the estimation error, we propose an iterative approach which approximates the clairvoyant shrinkage estimator. Convergence of this iterative method is established and a closed form expression for the limit is determined, which is referred to as the oracle approximating shrinkage (OAS) estimator. Both RBLW and OAS estimators have simple expressions and are easily implemented. Although the two methods are developed from different perspectives, their structure is identical up to specified constants. The RBLW estimator provably dominates the LW method. Numerical simulations demonstrate that the OAS approach can perform even better than RBLW, especially when n is much less than p. We also demonstrate the performance of these techniques in the context of adaptive beamforming.
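
As an illustration of how simple such a shrinkage estimator is to implement, the following sketch uses the commonly cited closed form of the OAS coefficient, $\hat\rho = \min\left(\frac{(1-2/p)\,\mathrm{tr}(S^2) + \mathrm{tr}^2(S)}{(n+1-2/p)\,[\mathrm{tr}(S^2) - \mathrm{tr}^2(S)/p]},\, 1\right)$, and shrinks the sample covariance $S$ toward the scaled identity $F = (\mathrm{tr}(S)/p)\,I$. The function name `oas_shrinkage`, the zero-mean assumption on the samples and the handling of a zero denominator are assumptions of this sketch; the formula itself is quoted from memory of the published estimator rather than from the abstract.

```python
import numpy as np

def oas_shrinkage(X):
    """Oracle approximating shrinkage (OAS) covariance estimate (sketch).

    X : (n, p) array of samples, assumed zero-mean.
    Returns (sigma_hat, rho) with sigma_hat = (1 - rho) * S + rho * F.
    """
    n, p = X.shape
    S = X.T @ X / n                      # sample covariance for zero-mean data
    tr_S = np.trace(S)
    tr_S2 = np.sum(S * S)                # equals trace(S @ S) for symmetric S
    num = (1 - 2 / p) * tr_S2 + tr_S ** 2
    den = (n + 1 - 2 / p) * (tr_S2 - tr_S ** 2 / p)
    # den is zero only when S is already a multiple of the identity.
    rho = 1.0 if den <= 0 else min(num / den, 1.0)
    F = (tr_S / p) * np.eye(p)           # shrinkage target
    return (1 - rho) * S + rho * F, rho
```

The target $F$ has the same trace as $S$, so the estimate preserves the total variance while pulling the eigenvalues toward their mean. When n is much smaller than p the coefficient is close to 1 and the estimate is dominated by the target.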

preprint2008arXiv

Decomposable Principal Component Analysis

We consider principal component analysis (PCA) in decomposable Gaussian graphical models. We exploit the prior information in these models in order to distribute its computation. For this purpose, we reformulate the problem in the sparse inverse covariance (concentration) domain and solve the global eigenvalue problem using a sequence of local eigenvalue problems in each of the cliques of the decomposable graph. We demonstrate the application of our methodology in the context of decentralized anomaly detection in the Abilene backbone network. Based on the topology of the network, we propose an approximate statistical graphical model and distribute the computation of PCA.

preprint2008arXiv

Sparse image reconstruction for molecular imaging

The application that motivates this paper is molecular imaging at the atomic level. When discretized at sub-atomic distances, the volume is inherently sparse. Noiseless measurements from an imaging technology can be modeled by convolution of the image with the system point spread function (psf). Such is the case with magnetic resonance force microscopy (MRFM), an emerging technology where imaging of an individual tobacco mosaic virus was recently demonstrated with nanometer resolution. We also consider additive white Gaussian noise (AWGN) in the measurements. Many prior works on sparse estimators have focused on the case when H has low coherence; however, the system matrix H in our application is the convolution matrix for the system psf. A typical convolution matrix has high coherence. The paper therefore does not assume a low coherence H. A discrete-continuous form of the Laplacian and atom at zero (LAZE) p.d.f. used by Johnstone and Silverman is formulated, and two sparse estimators are derived by maximizing the joint p.d.f. of the observation and image conditioned on the hyperparameters. A thresholding rule that generalizes the hard and soft thresholding rule appears in the course of the derivation. This so-called hybrid thresholding rule, when used in the iterative thresholding framework, gives rise to the hybrid estimator, a generalization of the lasso. Unbiased estimates of the hyperparameters for the lasso and hybrid estimator are obtained via Stein's unbiased risk estimate (SURE). A numerical study with a Gaussian psf and two sparse images shows that the hybrid estimator outperforms the lasso.
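
For readers unfamiliar with the two rules that the hybrid rule generalizes, the sketch below shows the standard hard and soft thresholding operators and a plain iterative soft thresholding loop for the lasso objective $\tfrac12\|y - Hx\|_2^2 + \lambda\|x\|_1$. It does not implement the hybrid rule or the LAZE-based estimators, whose exact form is not given in the abstract. The names `hard_threshold`, `soft_threshold` and `ista`, the dense matrix `H` and the step size choice are assumptions made for illustration.

```python
import numpy as np

def hard_threshold(x, t):
    """Keep entries whose magnitude exceeds t, zero the rest."""
    x = np.asarray(x, dtype=float)
    return np.where(np.abs(x) > t, x, 0.0)

def soft_threshold(x, t):
    """Shrink every entry toward zero by t, zeroing those within t of zero."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(H, y, lam, n_iter=200):
    """Iterative soft thresholding for 0.5*||y - Hx||^2 + lam*||x||_1."""
    L = np.linalg.norm(H, 2) ** 2        # squared spectral norm of H
    x = np.zeros(H.shape[1])
    for _ in range(n_iter):
        # Gradient step on the quadratic term, then soft thresholding.
        x = soft_threshold(x + H.T @ (y - H @ x) / L, lam / L)
    return x
```

With `H` equal to the identity, `ista` reduces to a single soft thresholding of `y`, which is a quick sanity check. For a convolution system such as the one in the abstract, `H` would be the convolution matrix of the psf.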

63 published item(s)

Orthonormal Sketches for Secure Coded Regression

SOLBP: Second-Order Loopy Belief Propagation for Inference in Uncertain Bayesian Networks

Straggler Robust Distributed Matrix Inverse Approximation

Uncertain Bayesian Networks: Learning from Incomplete Data

Fundamental Limits of Deep Graph Convolutional Networks

Numerically Stable Binary Gradient Coding

The Power of Graph Convolutional Networks to Distinguish Random Graph Models: Short Version

Weighted Gradient Coding with Leverage Score Sampling

Semi-supervised Learning in Network-Structured Data via Total Variation Minimization

AMOS: An Automated Model Order Selection Algorithm for Spectral Graph Clustering

Scalable Semi-Supervised Learning over Networks using Nonsmooth Convex Optimization

Image patch analysis of sunspots and active regions. I. Intrinsic dimension and correlation analysis

Image patch analysis of sunspots and active regions. II. Clustering via matrix factorization

Kronecker PCA Based Robust SAR STAP

Meta learning of bounds on the Bayes classifier error

MIST: L0 Sparse Linear Regression with Momentum

Multi-criteria Similarity-based Anomaly Detection using Pareto Depth Analysis

Non-parametric Quickest Change Detection for Large Scale Random Matrices

On Decentralized Estimation with Active Queries

Phase Transitions in Spectral Community Detection of Large Noisy Networks

Semi-supervised Multi-sensor Classification via Consensus-based Multi-View Maximum Entropy Discrimination

Shortest Path through Random Points

Universal Phase Transition in Community Detectability under a Stochastic Block Model

Collaborative 20 Questions for Target Localization

Dynamic stochastic blockmodels for time-evolving social networks

Ensemble estimation of multivariate f-divergence

Image patch analysis and clustering of sunspots: a dimensionality reduction approach

Kronecker PCA Based Spatio-Temporal Modeling of Video for Dismount Classification

Learning Latent Variable Gaussian Graphical Models

Marginal Likelihoods for Distributed Parameter Estimation of Gaussian Graphical Models

Multi-layer graph analysis for dynamic social networks

Multivariate f-Divergence Estimation With Confidence

Node Removal Vulnerability of the Largest Component of a Network

Pareto-depth for Multiple-query Image Retrieval

Performance Guarantees for Adaptive Estimation of Sparse Signals

Regularized Block Toeplitz Covariance Matrix Estimation via Kronecker Product Expansions

Resource-Constrained Adaptive Search and Tracking for Sparse Dynamic Targets

Resource-Constrained Adaptive Search for Sparse Multi-Class Targets with Varying Importance

Spectral Correlation Hub Screening of Multivariate Time Series

A Regularized Graph Layout Framework for Dynamic Network Visualization

Adaptive Evolutionary Clustering

Convergence Properties of Kronecker Graphical Lasso Algorithms

Correcting Camera Shake by Incremental Sparse Approximation

Covariance Estimation in High Dimensions via Kronecker Product Expansions

Dynamic stochastic blockmodels: Statistical models for time-evolving networks

Ensemble estimators for multivariate entropy estimation

Information Theoretic Adaptive Tracking of Epidemics in Complex Networks

Moving target inference with hierarchical Bayesian models in synthetic aperture radar imagery

Multi-criteria Anomaly Detection using Pareto Depth Analysis

Nonlinear unmixing of hyperspectral images using a semiparametric model and spatial regularization

Revealing social networks of spammers through spectral clustering

Empirical estimation of entropy functionals with confidence

The First Stray Light Corrected EUV Images of Solar Coronal Holes

Order-preserving factor analysis (OPFA)

Performance Bounds for Sparse Parametric Covariance Estimation in Gaussian Models

Recursive $\ell_{1,\infty}$ Group lasso

Sensor Management: Past, Present, and Future

Robust Shrinkage Estimation of High-dimensional Covariance Matrices

Spatio-Temporal Graphical Model Selection

Covariance estimation in decomposable Gaussian graphical models

Shrinkage Algorithms for MMSE Covariance Estimation

Decomposable Principal Component Analysis

Sparse image reconstruction for molecular imaging