Source author record

Alfred O. Hero

Alfred O. Hero appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

37 works

25 topics

4 close collaborators


preprint, 2022, arXiv

A unified framework for correlation mining in ultra-high dimension

Many applications benefit from theory relevant to the identification of variables having large correlations or partial correlations in high dimension. Recently there has been progress in the ultra-high dimensional setting when the sample size $n$ is fixed and the dimension $p$ tends to infinity. Despite these advances, the correlation screening framework suffers from practical, methodological and theoretical deficiencies. For instance, previous correlation screening theory requires that the population covariance matrix be sparse and block diagonal. This block sparsity assumption is, however, restrictive in practical applications. As a second example, correlation and partial correlation screening requires the estimation of dependence measures, which can be computationally prohibitive. In this paper, we propose a unifying approach to correlation and partial correlation mining that is not restricted to block diagonal correlation structure, thus yielding a methodology that is suitable for modern applications. By making connections to random geometric graphs, the number of highly correlated or partially correlated variables is shown to have compound Poisson finite-sample characterizations, which hold both for the finite $p$ case and when $p$ tends to infinity. The unifying framework also demonstrates a duality between correlation and partial correlation screening with theoretical and practical consequences.
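
As a rough illustration of the screening step this abstract refers to, the sketch below thresholds the sample correlations of $p$ variables observed on $n$ samples and counts the variables that take part in at least one large correlation. The function names and the threshold `rho` are illustrative assumptions, not taken from the paper, and the paper's compound Poisson characterization of this count is not reproduced here.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Sample Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def screen_correlations(columns, rho):
    """Return index pairs (i, j) whose absolute sample correlation is >= rho.

    `columns` holds one sequence of n observations per variable (hypothetical layout).
    """
    hits = []
    for i, j in combinations(range(len(columns)), 2):
        if abs(pearson(columns[i], columns[j])) >= rho:
            hits.append((i, j))
    return hits


def count_discoveries(columns, rho):
    """Count variables that appear in at least one screened pair."""
    pairs = screen_correlations(columns, rho)
    return len({v for pair in pairs for v in pair})
```

This brute-force version evaluates all $p(p-1)/2$ pairs, which is exactly the cost that becomes prohibitive when $p$ is very large.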

preprint, 2022, arXiv

An Improvement on the Hotelling $T^2$ Test Using the Ledoit-Wolf Nonlinear Shrinkage Estimator

Hotelling's $T^2$ test is a classical approach for discriminating the means of two multivariate normal samples that share a population covariance matrix. Hotelling's test is not ideal for high-dimensional samples because the eigenvalues of the estimated sample covariance matrix are inconsistent estimators for their population counterparts. We replace the sample covariance matrix with the nonlinear shrinkage estimator of Ledoit and Wolf 2020. We observe empirically for sub-Gaussian data that the resulting algorithm dominates past methods (Bai and Saranadasa 1996, Chen and Qin 2010, and Li et al. 2020) for a family of population covariance matrices that includes matrices with high or low condition number and many or few nontrivial -- i.e., spiked -- eigenvalues.

preprint, 2020, arXiv

A Geometric Approach to Online Streaming Feature Selection

Online Streaming Feature Selection (OSFS) is a sequential learning problem where individual features across all samples are made available to algorithms in a streaming fashion. In this work, firstly, we assert that OSFS's main assumption of having data from all the samples available at runtime is unrealistic and introduce a new setting where features and samples are streamed concurrently, called OSFS with Streaming Samples (OSFS-SS). Secondly, the primary OSFS method, SAOLA, utilizes an unbounded mutual information measure and requires multiple comparison steps between the stored and incoming feature sets to evaluate a feature's importance. We introduce Geometric Online Adaption (GOA), an algorithm that requires relatively fewer feature comparison steps and uses a bounded conditional geometric dependency measure. Our algorithm outperforms several OSFS baselines including SAOLA on a variety of datasets. We also extend SAOLA to work in the OSFS-SS setting and show that GOA continues to achieve the best results. Thirdly, the current paradigm of OSFS algorithm comparison is flawed. Algorithms are measured by comparing the number of features used and the accuracy obtained by the learner, two properties that are fundamentally at odds with one another. Without fixing a limit on either of these properties, the qualities of features obtained by different algorithms are incomparable. We try to rectify this inconsistency by fixing the maximum number of features available to the learner and comparing algorithms in terms of their accuracy. Additionally, we characterize the behaviour of SAOLA and GOA on feature sets derived from popular deep convolutional featurizers.

preprint, 2020, arXiv

Learning to Bound the Multi-class Bayes Error

In the context of supervised learning, meta learning uses features, metadata and other information to learn about the difficulty, behavior, or composition of the problem. Using this knowledge can be useful to contextualize classifier results or allow for targeted decisions about future data sampling. In this paper, we are specifically interested in learning the Bayes error rate (BER) based on a labeled data sample. Providing a tight bound on the BER that is also feasible to estimate has been a challenge. Previous work[1] has shown that a pairwise bound based on the sum of Henze-Penrose (HP) divergence over label pairs can be directly estimated using a sum of Friedman-Rafsky (FR) multivariate run test statistics. However, in situations in which the dataset and number of classes are large, this bound is computationally infeasible to calculate and may not be tight. Other multi-class bounds also suffer from computationally complex estimation procedures. In this paper, we present a generalized HP divergence measure that allows us to estimate the Bayes error rate with log-linear computation. We prove that the proposed bound is tighter than both the pairwise method and a bound proposed by Lin [2]. We also empirically show that these bounds are close to the BER. We illustrate the proposed method on the MNIST dataset, and show its utility for the evaluation of feature reduction strategies. We further demonstrate an approach for evaluation of deep learning architectures using the proposed bounds.
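
The pairwise building block mentioned in this abstract can be sketched as follows, assuming the standard two-sample construction: build a Euclidean minimum spanning tree over the pooled samples, count the edges $R$ that join points from different classes (the Friedman-Rafsky statistic), and plug it into $1 - R\,(m+n)/(2mn)$ as an estimate of the HP divergence. This covers only the two-class case; the generalized multi-class measure proposed in the paper is not implemented, and the function names are illustrative.

```python
import math


def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns the MST edges."""
    n = len(points)
    in_tree = [False] * n
    best_dist = [math.inf] * n
    best_from = [-1] * n
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest vertex that is not yet in the tree.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Relax distances from the newly added vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def hp_divergence_estimate(sample_a, sample_b):
    """Friedman-Rafsky based estimate of the Henze-Penrose divergence.

    R counts MST edges joining the two samples; the estimate is
    1 - R * (m + n) / (2 * m * n), clipped below at zero.
    """
    m, n = len(sample_a), len(sample_b)
    points = list(sample_a) + list(sample_b)
    labels = [0] * m + [1] * n
    cross = sum(1 for u, v in mst_edges(points) if labels[u] != labels[v])
    return max(0.0, 1.0 - cross * (m + n) / (2.0 * m * n))
```

Summing this estimate over all label pairs requires one spanning tree per pair, which is the cost the abstract describes as infeasible when the number of classes is large.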

preprint, 2020, arXiv

OrthoReg: Robust Network Pruning Using Orthonormality Regularization

Network pruning in Convolutional Neural Networks (CNNs) has been extensively investigated in recent years. To determine the impact of pruning a group of filters on a network's accuracy, state-of-the-art pruning methods consistently assume filters of a CNN are independent. This allows the importance of a group of filters to be estimated as the sum of importances of individual filters. However, overparameterization in modern networks results in highly correlated filters that invalidate this assumption, thereby resulting in incorrect importance estimates. To address this issue, we propose OrthoReg, a principled regularization strategy that enforces orthonormality on a network's filters to reduce inter-filter correlation, thereby allowing reliable, efficient determination of group importance estimates, improved trainability of pruned networks, and efficient, simultaneous pruning of large groups of filters. When used for iterative pruning on VGG-13, MobileNet-V1, and ResNet-34, OrthoReg consistently outperforms five baseline techniques, including the state-of-the-art, on CIFAR-100 and Tiny-ImageNet. For the recently proposed Early-Bird Ticket hypothesis, which claims networks become amenable to pruning early-on in training and can be pruned after a few epochs to minimize training expenditure, we find OrthoReg significantly outperforms prior work. Code available at https://github.com/EkdeepSLubana/OrthoReg.
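
To make the idea of an orthonormality regularizer concrete, here is a minimal sketch of one common soft form: the squared Frobenius norm of $WW^\top - I$, where each row of $W$ is a flattened filter. The abstract does not state the exact penalty used by OrthoReg, so this form and the function name are assumptions for illustration only.

```python
def orthonormality_penalty(filters):
    """Squared Frobenius norm of (W W^T - I) for a list of flattened filters.

    The penalty is zero exactly when the filters are orthonormal, and it grows
    as filters become correlated or deviate from unit norm.
    """
    k = len(filters)
    penalty = 0.0
    for i in range(k):
        for j in range(k):
            # Entry (i, j) of the Gram matrix W W^T.
            gram = sum(a * b for a, b in zip(filters[i], filters[j]))
            target = 1.0 if i == j else 0.0
            penalty += (gram - target) ** 2
    return penalty
```

In training, a term of this kind would typically be added to the task loss with a small weight, so that inter-filter correlation is discouraged without being forced to zero.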

preprint, 2020, arXiv

Pattern-Based Analysis of Time Series: Estimation

While Internet of Things (IoT) devices and sensors create continuous streams of information, Big Data infrastructures are expected to handle the influx of data in real time. One such type of continuous stream is time series data. Due to the richness of information in time series and the inadequacy of summary statistics to encapsulate structures and patterns in such data, the development of new approaches to learning time series is of interest. In this paper, we propose a novel method, called the pattern tree, to learn patterns in time series using a binary-structured tree. While a pattern tree can be used for many purposes such as lossless compression, prediction and anomaly detection, in this paper we focus on its application in time series estimation and forecasting. In comparison to other methods, our proposed pattern tree method improves the mean squared error of estimation.

preprint, 2020, arXiv

Predicting solar flares with machine learning: investigating solar cycle dependence

A deep learning network, the Long Short-Term Memory (LSTM) network, is used in this work to predict whether the maximum flare class an active region (AR) will produce in the next 24 hours is class $\Gamma$. We consider $\Gamma$ to be $\ge M$, $\ge C$, or any flare class. The reason for using an LSTM, which is a recurrent neural network, is its capability to capture temporal information in the data samples. The input features are time sequences of 20 magnetic parameters from SHARPs (Space-weather HMI Active Region Patches). We analyzed active regions from June 2010 to December 2018, using the Geostationary Operational Environmental Satellite (GOES) X-ray flare catalogs, and labeled the data samples with the ARs identified in those catalogs. Our results show that (i) the skill scores are consistent with recently published results using LSTMs and better than previous work using single-time input (e.g., DeFN), and (ii) the skill scores from the model show essential differences when different years of data are chosen for training and testing.

preprint, 2020, arXiv

Robust Distributed Fixed-Time Economic Dispatch under Time-Varying Topology

The centralized power generation infrastructure that defines the North American electric grid is slowly moving to a distributed architecture due to the explosion in use of renewable generation and distributed energy resources (DERs), such as residential solar, wind turbines and battery storage. Furthermore, variable pricing policies and the profusion of flexible loads entail frequent and severe changes in the power outputs required from the individual generation units, requiring fast availability of power allocation. To this end, a fixed-time convergent, fully distributed economic dispatch algorithm for scheduling optimal power generation among a set of DERs is proposed. The proposed algorithm incorporates both load balance and generation capacity constraints.

preprint, 2020, arXiv

Testing that a Local Optimum of the Likelihood is Globally Optimum using Reparameterized Embeddings

Many mathematical imaging problems are posed as non-convex optimization problems. When numerically tractable global optimization procedures are not available, one is often interested in testing ex post facto whether or not a locally convergent algorithm has found the globally optimal solution. When the problem is formulated in terms of maximizing the likelihood function under a statistical model for the measurements, one can construct a statistical test that a local maximum is in fact the global maximum. A one-sided test is proposed for the case that the statistical model is a member of the generalized location family of probability distributions, a condition often satisfied in imaging and other inverse problems. We propose a general method for improving the accuracy of the test by reparameterizing the likelihood function to embed its domain into a higher dimensional parameter space. We show that the proposed global maximum testing method results in improved accuracy and reduced computation for a physically-motivated joint-inverse problem arising in camera-blur estimation.

preprint, 2019, arXiv

Identifying Solar Flare Precursors Using Time Series of SDO/HMI Images and SHARP Parameters

In this paper, we present several methods for constructing precursors of solar flare events, which show great promise for early prediction. A data pre-processing pipeline is built to extract useful data from multiple sources, Geostationary Operational Environmental Satellites (GOES) and Solar Dynamics Observatory (SDO)/Helioseismic and Magnetic Imager (HMI), to prepare inputs for machine learning algorithms. Two classification models are presented: classification of flares from quiet times for active regions and classification of strong versus weak flare events. We adopt deep learning algorithms to capture both the spatial and temporal information from HMI magnetogram data. Effective feature extraction and feature selection with raw magnetogram data using deep learning and statistical algorithms enable us to train classification models that achieve almost as good performance as models using the active region parameters provided in HMI/Space-Weather HMI-Active Region Patch (SHARP) data files. Case studies show a significant increase in the prediction score around 20 hours before strong solar flare events.

preprint, 2016, arXiv

Diversion Detection in Partially Observed Nuclear Fuel Cycle Networks

A nuclear fuel cycle contains several facilities with different purposes such as mining, conversion, enrichment, and fuel rod fabrication. These facilities form a network, which is naturally sparse in the number of connections (i.e., edges) since not every facility directly interacts with all the others. Given the knowledge of a network baseline, we are interested in detecting anomalous activities in this network, which may signal the diversion of nuclear materials. Anomalies can take the form of a new or missing edge or abnormal rates of interaction. However, often it is not possible to observe the entire network traffic directly due to some constraints such as cost, physical limitations, or laws. By treating the unobserved network traffic as latent variables, we propose estimators for the true network traffic, including the anomalous activity, to use in testing for significant deviations from the baseline. We provide simulation results of a simple network of facilities and show that our estimators have superior performance over existing alternatives. Additionally, we establish that while a good estimate of the network traffic is necessary, perfect reconstruction is not required to effectively detect anomalous network activity. Instead it suffices to detect perturbations within the network at an aggregate or global scale.

preprint, 2016, arXiv

Learning to classify with possible sensor failures

In this paper, we propose a general framework to learn a robust large-margin binary classifier when corrupt measurements, called anomalies, caused by sensor failure might be present in the training set. The goal is to minimize the generalization error of the classifier on non-corrupted measurements while controlling the false alarm rate associated with anomalous samples. By incorporating a non-parametric regularizer based on an empirical entropy estimator, we propose a Geometric-Entropy-Minimization regularized Maximum Entropy Discrimination (GEM-MED) method to learn to classify and detect anomalies in a joint manner. We demonstrate the method using simulated data and a real multimodal data set. Our GEM-MED method can yield improved performance over previous robust classification methods in terms of both classification accuracy and anomaly detection rate.

preprint, 2016, arXiv

Measure-Transformed Quasi Maximum Likelihood Estimation

In this paper the Gaussian quasi maximum likelihood estimator (GQMLE) is generalized by applying a transform to the probability distribution of the data. The proposed estimator, called measure-transformed GQMLE (MT-GQMLE), minimizes the empirical Kullback-Leibler divergence between a transformed probability distribution of the data and a hypothesized Gaussian probability measure. By judicious choice of the transform we show that, unlike the GQMLE, the proposed estimator can gain sensitivity to higher-order statistical moments and resilience to outliers leading to significant mitigation of the model mismatch effect on the estimates. Under some mild regularity conditions we show that the MT-GQMLE is consistent, asymptotically normal and unbiased. Furthermore, we derive a necessary and sufficient condition for asymptotic efficiency. A data driven procedure for optimal selection of the measure transformation parameters is developed that minimizes the trace of an empirical estimate of the asymptotic mean-squared-error matrix. The MT-GQMLE is applied to linear regression and source localization and numerical comparisons illustrate its robustness and resilience to outliers.

preprint, 2016, arXiv

Multi-centrality Graph Spectral Decompositions and their Application to Cyber Intrusion Detection

Many modern datasets can be represented as graphs and hence spectral decompositions such as graph principal component analysis (PCA) can be useful. Distinct from previous graph decomposition approaches based on subspace projection of a single topological feature, e.g., the Fiedler vector of the centered graph adjacency matrix (graph Laplacian), we propose spectral decomposition approaches to graph PCA and graph dictionary learning that integrate multiple features, including graph walk statistics, centrality measures and graph distances to reference nodes. In this paper we propose a new PCA method for single graph analysis, called multi-centrality graph PCA (MC-GPCA), and a new dictionary learning method for ensembles of graphs, called multi-centrality graph dictionary learning (MC-GDL), both based on spectral decomposition of multi-centrality matrices. As an application to cyber intrusion detection, MC-GPCA can be an effective indicator of anomalous connectivity patterns and MC-GDL can provide a discriminative basis for attack classification.

preprint, 2016, arXiv

Online Diversion Detection in Nuclear Fuel Cycles via Multimodal Observations

In nuclear fuel cycles, an enrichment facility typically provides low enriched uranium (LEU) to a number of customers. We consider monitoring an enrichment facility to detect, in a timely manner, a possible diversion of highly enriched uranium (HEU). To increase the detection accuracy it is important to efficiently use the available information diversity. In this work, it is assumed that the shipment times and the power consumption of the enrichment facility are observed for each shipment of enriched uranium. We propose to initially learn the statistical patterns of the enrichment facility through the bimodal observations in a training period, which is known to be free of diversions. Then, for the goal of timely diversion detection, we propose to use an online detection algorithm which sequentially compares each set of new observations in the test period, which possibly includes diversions, to the learned patterns, and raises a diversion alarm when a significant statistical deviation is detected. The efficacy of the proposed method is shown by comparing its detection performance to that of traditional detection methods in the statistics literature.

preprint, 2016, arXiv

Robust training on approximated minimal-entropy set

preprint, 2015, arXiv

A Dictionary Approach to EBSD Indexing

We propose a framework for indexing of grain and sub-grain structures in electron backscatter diffraction (EBSD) images of polycrystalline materials. The framework is based on a previously introduced physics-based forward model by Callahan and De Graef (2013) relating measured patterns to grain orientations (Euler angles). The forward model is tuned to the microscope and the sample symmetry group. We discretize the domain of the forward model onto a dense grid of Euler angles and for each measured pattern we identify the most similar patterns in the dictionary. These patterns are used to identify boundaries, detect anomalies, and index crystal orientations. The statistical distribution of these closest matches is used in an unsupervised binary decision tree (DT) classifier to identify grain boundaries and anomalous regions. The DT classifies a pattern as an anomaly if it has an abnormally low similarity to any pattern in the dictionary. It classifies a pixel as being near a grain boundary if the highly ranked patterns in the dictionary differ significantly over the pixel's 3x3 neighborhood. Indexing is accomplished by computing the mean orientation of the closest dictionary matches to each pattern. The mean orientation is estimated using a maximum likelihood approach that models the orientation distribution as a mixture of von Mises-Fisher distributions over the quaternionic 3-sphere. The proposed dictionary matching approach permits segmentation, anomaly detection, and indexing to be performed in a unified manner with the additional benefit of uncertainty quantification. We demonstrate the proposed dictionary-based approach on a Ni-base IN100 alloy.

preprint, 2015, arXiv

Deep Community Detection

A deep community in a graph is a connected component that can only be seen after removal of nodes or edges from the rest of the graph. This paper formulates the problem of detecting deep communities as multi-stage node removal that maximizes a new centrality measure, called the local Fiedler vector centrality (LFVC), at each stage. The LFVC is associated with the sensitivity of algebraic connectivity to node or edge removals. We prove that a greedy node/edge removal strategy, based on successive maximization of LFVC, has bounded performance loss relative to the optimal, but intractable, combinatorial batch removal strategy. Under a stochastic block model framework, we show that the greedy LFVC strategy can extract deep communities with probability one as the number of observations becomes large. We apply the greedy LFVC strategy to real-world social network datasets. Compared with conventional community detection methods we demonstrate improved ability to identify important communities and key members in the network.

preprint, 2015, arXiv

Empirically Estimable Classification Bounds Based on a New Divergence Measure

Information divergence functions play a critical role in statistics and information theory. In this paper we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum binary classification probability of error for the case when the training and test data are drawn from the same distribution and for the case where there exists some mismatch between training and test distributions. We confirm the theoretical results by designing feature selection algorithms using the criteria from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks.

preprint, 2015, arXiv

Foundational principles for large scale inference: Illustrations through correlation mining

When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number $n$ of acquired samples (statistical replicates) is far fewer than the number $p$ of observed variables (genes, neurons, voxels, or chemical constituents). Much recent work has focused on understanding the computational complexity of proposed methods for "Big Data." Sample complexity, however, has received relatively less attention, especially in the setting where the sample size $n$ is fixed and the dimension $p$ grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the last regime applies to exa-scale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that is of interest. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.

preprint, 2015, arXiv

Multimodal Factor Analysis

A multimodal system with Poisson, Gaussian, and multinomial observations is considered. A generative graphical model that combines multiple modalities through common factor loadings is proposed. In this model, latent factors are like summary objects that have latent factor scores in each modality, and the observed objects are represented in terms of such summary objects. This potentially brings about a significant dimensionality reduction. It also naturally enables a powerful means of clustering based on a diverse set of observations. An expectation-maximization (EM) algorithm to find the model parameters is provided. The algorithm is tested on a Twitter dataset which consists of the counts and geographical coordinates of hashtag occurrences, together with the bag of words for each hashtag. The resultant factors successfully localize the hashtags in all dimensions: counts, coordinates, topics. The algorithm is also extended to accommodate the von Mises-Fisher distribution, which is used to model the spherical coordinates.

preprint, 2014, arXiv

Empirical non-parametric estimation of the Fisher Information

The Fisher information matrix (FIM) is a foundational concept in statistical signal processing. The FIM depends on the probability distribution, assumed to belong to a smooth parametric family. Traditional approaches to estimating the FIM require estimating the probability distribution function (PDF), or its parameters, along with its gradient or Hessian. However, in many practical situations the PDF of the data is not known but the statistician has access to an observation sample for any parameter value. Here we propose a method of estimating the FIM directly from sampled data that does not require knowledge of the underlying PDF. The method is based on non-parametric estimation of an $f$-divergence over a local neighborhood of the parameter space and a relation between curvature of the $f$-divergence and the FIM. Thus we obtain an empirical estimator of the FIM that does not require density estimation and is asymptotically consistent. We empirically evaluate the validity of our approach using two experiments.
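
The relation between divergence curvature and the FIM that this abstract relies on can be written out as follows. This is the standard second-order expansion under the usual regularity conditions, with notation chosen here for illustration: $D_f(P\|Q) = \int q\, f(p/q)\,dx$ for a convex $f$ with $f(1)=0$ that is twice differentiable at 1, $F(\theta)$ the FIM, and $\delta$ a small perturbation of the parameter. The specific divergence and estimator used in the paper are not stated in the abstract.

$$
D_f\!\left(p_{\theta}\,\|\,p_{\theta+\delta}\right) \;=\; \frac{f''(1)}{2}\,\delta^{\top} F(\theta)\,\delta \;+\; o\!\left(\|\delta\|^{2}\right)
$$

For a scalar parameter this suggests the finite-difference reading $F(\theta) \approx 2\,\widehat{D}_f / \big(f''(1)\,\delta^{2}\big)$, where $\widehat{D}_f$ is a non-parametric divergence estimate computed from samples drawn at $\theta$ and $\theta+\delta$.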

preprint, 2014, arXiv

Jointly Sparse Global SIMPLS Regression

Partial least squares (PLS) regression combines dimensionality reduction and prediction using a latent variable model. Since partial least squares regression (PLS-R) does not require matrix inversion or diagonalization, it can be applied to problems with large numbers of variables. As predictor dimension increases, variable selection becomes essential to avoid over-fitting, to provide more accurate predictors and to yield more interpretable parameters. We propose a global variable selection approach that penalizes the total number of variables across all PLS components. Put another way, the proposed global penalty encourages the selected variables to be shared among the PLS components. We formulate PLS-R with joint sparsity as a variational optimization problem with objective function equal to a novel global SIMPLS criterion plus a mixed norm sparsity penalty on the weight matrix. The mixed norm sparsity penalty is the $\ell_1$ norm of the $\ell_2$ norm on the weights corresponding to the same variable used over all the PLS components. A novel augmented Lagrangian method is proposed to solve the optimization problem and soft thresholding for sparsity occurs naturally as part of the iterative solution. Experiments show that the modified PLS-R attains performance that is as good or better with many fewer selected predictor variables.
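
The mixed norm penalty and the soft thresholding step described in this abstract can be illustrated with a small sketch. It assumes a weight matrix stored as one row per predictor variable and one column per PLS component, and it shows only the penalty and its row-wise (group) soft-thresholding operator; the global SIMPLS criterion and the augmented Lagrangian solver are not reproduced. Function names and the threshold `tau` are illustrative.

```python
import math


def mixed_norm(weights):
    """l1 norm of the row-wise l2 norms; one row per predictor variable."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in weights)


def group_soft_threshold(weights, tau):
    """Shrink each row's l2 norm by tau, setting rows with norm <= tau to zero.

    A zeroed row removes that variable from every PLS component at once,
    which is how the penalty encourages shared variable selection.
    """
    result = []
    for row in weights:
        norm = math.sqrt(sum(w * w for w in row))
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        result.append([scale * w for w in row])
    return result
```

For example, with rows `[3.0, 4.0]` and `[0.3, 0.4]` and `tau = 1.0`, the first row is shrunk but kept, while the second row is set to zero, dropping that variable from all components.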

preprint, 2013, arXiv

A Hamilton-Jacobi equation for the continuum limit of non-dominated sorting

We show that non-dominated sorting of a sequence of i.i.d. random variables in Euclidean space has a continuum limit that corresponds to solving a Hamilton-Jacobi equation involving the probability density function of the random variables. Non-dominated sorting is a fundamental problem in multi-objective optimization, and is equivalent to finding the canonical antichain partition and to problems involving the longest chain among Euclidean points. As an application of this result, we show that non-dominated sorting is asymptotically stable under random perturbations in the data. We give a numerical scheme for computing the viscosity solution of this Hamilton-Jacobi equation and present some numerical simulations for various density functions.
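
For readers unfamiliar with the discrete problem, the sketch below shows a straightforward form of non-dominated sorting: repeatedly peel off the set of points not dominated by any remaining point. It assumes the minimization convention (smaller is better in every coordinate) and uses a simple quadratic-time scan per front; it does not implement the Hamilton-Jacobi continuum limit or the numerical scheme from the paper.

```python
def dominates(p, q):
    """p dominates q if p <= q in every coordinate and p < q in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def non_dominated_sort(points):
    """Peel off successive Pareto fronts; returns a list of fronts of point indices."""
    remaining = list(range(len(points)))
    fronts = []
    while remaining:
        # A point belongs to the current front if no remaining point dominates it.
        front = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        fronts.append(front)
        front_set = set(front)
        remaining = [i for i in remaining if i not in front_set]
    return fronts
```

The index of the front that contains a point is its rank; the continuum limit in the abstract describes how these ranks behave as the number of random points grows.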

preprint, 2013, arXiv

A PDE-based approach to non-dominated sorting

Non-dominated sorting is a fundamental combinatorial problem in multiobjective optimization, and is equivalent to the longest chain problem in combinatorics and random growth models for crystals in materials science. In a previous work, we showed that non-dominated sorting has a continuum limit that corresponds to solving a Hamilton-Jacobi equation. In this work we present and analyze a fast numerical scheme for this Hamilton-Jacobi equation, and show how it can be used to design a fast algorithm for approximate non-dominated sorting.

preprint, 2013, arXiv

Nonlinear unmixing of hyperspectral images: models and algorithms

When considering the problem of unmixing hyperspectral images, most of the literature in the geoscience and image processing areas relies on the widely used linear mixing model (LMM). However, the LMM may not be valid and other nonlinear models need to be considered, for instance, when there are multi-scattering effects or intimate interactions. Consequently, over the last few years, several significant contributions have been proposed to overcome the limitations inherent in the LMM. In this paper, we present an overview of recent advances in nonlinear unmixing modeling.

preprint, 2013, arXiv

Variational Semi-blind Sparse Deconvolution with Orthogonal Kernel Bases and its Application to MRFM

We present a variational Bayesian method of joint image reconstruction and point spread function (PSF) estimation when the PSF of the imaging device is only partially known. To solve this semi-blind deconvolution problem, prior distributions are specified for the PSF and the 3D image. Joint image reconstruction and PSF estimation is then performed within a Bayesian framework, using a variational algorithm to estimate the posterior distribution. The image prior distribution imposes an explicit atomic measure that corresponds to image sparsity. Importantly, the proposed Bayesian deconvolution algorithm does not require hand tuning. Simulation results clearly demonstrate that the semi-blind deconvolution algorithm compares favorably with a previous Markov chain Monte Carlo (MCMC) version of myopic sparse reconstruction. It significantly outperforms mismatched non-blind algorithms that rely on the assumption of perfect knowledge of the PSF. The algorithm is illustrated on real data from magnetic resonance force microscopy (MRFM).

preprint, 2012, arXiv

Kullback Proximal Algorithms for Maximum Likelihood Estimation

Accelerated algorithms for maximum likelihood image reconstruction are essential for emerging applications such as 3D tomography, dynamic tomographic imaging, and other high dimensional inverse problems. In this paper, we introduce and analyze a class of fast and stable sequential optimization methods for computing maximum likelihood estimates and study their convergence properties. These methods are based on a proximal point algorithm implemented with the Kullback-Leibler (KL) divergence between posterior densities of the complete data as a proximal penalty function. When the proximal relaxation parameter is set to unity one obtains the classical expectation maximization (EM) algorithm. For a decreasing sequence of relaxation parameters, relaxed versions of EM are obtained which can have much faster asymptotic convergence without sacrifice of monotonicity. We present an implementation of the algorithm using Moré's Trust Region update strategy. For illustration the method is applied to a non-quadratic inverse problem with Poisson distributed data.
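
The iteration described in this abstract can be written schematically as below. The notation is an assumption made here for illustration: $\ell_y(\theta)$ is the log-likelihood of the observed data $y$, $k(x \mid y; \theta)$ is the posterior density of the complete data $x$ given $y$, and $\beta_k$ is the relaxation parameter at iteration $k$.

$$
\theta^{k+1} \;=\; \arg\max_{\theta}\,\Big\{\, \ell_y(\theta) \;-\; \beta_k\, I_y\!\left(\theta^{k}, \theta\right) \Big\},
\qquad
I_y\!\left(\theta^{k}, \theta\right) \;=\; \mathbb{E}\!\left[\, \log\frac{k(x \mid y; \theta^{k})}{k(x \mid y; \theta)} \;\Big|\; y;\, \theta^{k} \right]
$$

With $\beta_k = 1$ the objective equals the usual EM surrogate $Q(\theta \mid \theta^{k})$ up to a term that does not depend on $\theta$, since $\ell_y(\theta) + \mathbb{E}[\log k(x \mid y; \theta) \mid y; \theta^{k}] = Q(\theta \mid \theta^{k})$, which is why the classical EM update is recovered in that case.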

preprint 2012 arXiv

On EM algorithms and their proximal generalizations

In this paper, we analyze the celebrated EM algorithm from the point of view of proximal point algorithms. More precisely, we study a new type of generalization of the EM procedure introduced by Chrétien and Hero (1998) and called Kullback-proximal algorithms. The proximal framework allows us to prove new results concerning the cluster points. An essential contribution is a detailed analysis of the case where some cluster points lie on the boundary of the parameter space.

preprint 2012 arXiv

On Measure Transformed Canonical Correlation Analysis

In this paper linear canonical correlation analysis (LCCA) is generalized by applying a structured transform to the joint probability distribution of the considered pair of random vectors, i.e., a transformation of the joint probability measure defined on their joint observation space. This framework, called measure transformed canonical correlation analysis (MTCCA), applies LCCA to the data after transformation of the joint probability measure. We show that judicious choice of the transform leads to a modified canonical correlation analysis, which, in contrast to LCCA, is capable of detecting non-linear relationships between the considered pair of random vectors. Unlike kernel canonical correlation analysis, where the transformation is applied to the random vectors, in MTCCA the transformation is applied to their joint probability distribution. This results in performance advantages and reduced implementation complexity. The proposed approach is illustrated for graphical model selection in simulated data having non-linear dependencies, and for measuring long-term associations between companies traded on the NASDAQ and NYSE stock markets.

preprint 2012 arXiv

Semi-blind Sparse Image Reconstruction with Application to MRFM

We propose a solution to the image deconvolution problem where the convolution kernel or point spread function (PSF) is assumed to be only partially known. Small perturbations generated from the model are exploited to produce a few principal components explaining the PSF uncertainty in a high dimensional space. Unlike recent developments on blind deconvolution of natural images, we assume the image is sparse in the pixel basis, a natural sparsity arising in magnetic resonance force microscopy (MRFM). Our approach adopts a Bayesian Metropolis-within-Gibbs sampling framework. The performance of our Bayesian semi-blind algorithm for sparse images is superior to previously proposed semi-blind algorithms such as the alternating minimization (AM) algorithm and blind algorithms developed for natural images. We illustrate our myopic algorithm on real MRFM tobacco virus data.

preprint 2011 arXiv

Large Scale Correlation Screening

This paper treats the problem of screening for variables with high correlations in high dimensional data in which there can be many fewer samples than variables. We focus on threshold-based correlation screening methods for three related applications: screening for variables with large correlations within a single treatment (autocorrelation screening); screening for variables with large cross-correlations over two treatments (cross-correlation screening); screening for variables that have persistently large autocorrelations over two treatments (persistent-correlation screening). The novelty of correlation screening is that it identifies a smaller number of variables which are highly correlated with others, as compared to identifying a number of correlation parameters. Correlation screening suffers from a phase transition phenomenon: as the correlation threshold decreases the number of discoveries increases abruptly. We obtain asymptotic expressions for the mean number of discoveries and the phase transition thresholds as a function of the number of samples, the number of variables, and the joint sample distribution. We also show that under a weak dependency condition the number of discoveries is dominated by a Poisson random variable, giving an asymptotic expression for the false positive rate. The correlation screening approach yields substantial dividends in terms of the type and strength of the asymptotic results that can be obtained. It also overcomes some of the major hurdles faced by existing methods in the literature, as correlation screening is naturally scalable to high dimension. Numerical results strongly validate the theory that is presented in this paper. We illustrate the application of the correlation screening methodology on a large scale gene-expression dataset, revealing a few influential variables that exhibit a significant amount of correlation over multiple treatments.

preprint 2010 arXiv

Identification and Query of Activated Gene Pathways in Disease Progression

Disease occurs due to aberrant expression of genes and modulation of the biological pathways along which they lie. Inference of activated gene pathways, using gene expression data during disease progression, is an important problem. In this work, we have developed a generalizable framework for the identification of interacting pathways while incorporating biological realism, using functional data analysis and manifold embedding techniques. Additionally, we have developed a new method to query for the differential coordinated activity of any desired pathway during disease progression. The methods developed in this work can be generalized to any conditions of interest.

preprint 2010 arXiv

Regularized Least-Mean-Square Algorithms

We consider adaptive system identification problems with convex constraints and propose a family of regularized Least-Mean-Square (LMS) algorithms. We show that with a properly selected regularization parameter the regularized LMS provably dominates its conventional counterpart in terms of mean square deviations. We establish simple and closed-form expressions for choosing this regularization parameter. For identifying an unknown sparse system we propose sparse and group-sparse LMS algorithms, which are special examples of the regularized LMS family. Simulation results demonstrate the advantages of the proposed filters in both convergence rate and steady-state error under sparsity assumptions on the true coefficient vector.

preprint 2009 arXiv

Hierarchical Bayesian sparse image reconstruction with application to MRFM

This paper presents a hierarchical Bayesian model to reconstruct sparse images when the observations are obtained from linear transformations and corrupted by additive white Gaussian noise. Our hierarchical Bayes model is well suited to such naturally sparse image applications as it seamlessly accounts for properties such as sparsity and positivity of the image via appropriate Bayes priors. We propose a prior that is based on a weighted mixture of a positive exponential distribution and a mass at zero. The prior has hyperparameters that are tuned automatically by marginalization over the hierarchical Bayesian model. To overcome the complexity of the posterior distribution, a Gibbs sampling strategy is proposed. The Gibbs samples can be used to estimate the image to be recovered, e.g., by maximizing the estimated posterior distribution. In our fully Bayesian approach the posteriors of all the parameters are available. Thus our algorithm provides more information than previously proposed sparse reconstruction methods that only give a point estimate. The performance of our hierarchical Bayesian sparse reconstruction method is illustrated on synthetic and real data collected from a tobacco virus sample using a prototype MRFM instrument.

preprint 2009 arXiv

Joint Bayesian endmember extraction and linear unmixing for hyperspectral imagery

This paper studies a fully Bayesian algorithm for endmember extraction and abundance estimation for hyperspectral imagery. Each pixel of the hyperspectral image is decomposed as a linear combination of pure endmember spectra following the linear mixing model. The estimation of the unknown endmember spectra is conducted in a unified manner by generating the posterior distribution of abundances and endmember parameters under a hierarchical Bayesian model. This model assumes conjugate prior distributions for these parameters, accounts for non-negativity and full-additivity constraints, and exploits the fact that the endmember proportions lie on a lower dimensional simplex. A Gibbs sampler is proposed to overcome the complexity of evaluating the resulting posterior distribution. This sampler generates samples distributed according to the posterior distribution and estimates the unknown parameters using these generated samples. The accuracy of the joint Bayesian estimator is illustrated by simulations conducted on synthetic and real AVIRIS images.

preprint 2008 arXiv

Practical recipes for the model order reduction, dynamical simulation, and compressive sampling of large-scale open quantum systems

This article presents numerical recipes for simulating high-temperature and non-equilibrium quantum spin systems that are continuously measured and controlled. The notion of a spin system is broadly conceived, in order to encompass macroscopic test masses as the limiting case of large-j spins. The simulation technique has three stages: first the deliberate introduction of noise into the simulation, then the conversion of that noise into an equivalent continuous measurement and control process, and finally, projection of the trajectory onto a state-space manifold having reduced dimensionality and possessing a Kähler potential of multi-linear form. The resulting simulation formalism is used to construct a positive P-representation for the thermal density matrix. Single-spin detection by magnetic resonance force microscopy (MRFM) is simulated, and the data statistics are shown to be those of a random telegraph signal with additive white noise. Larger-scale spin-dust models are simulated, having no spatial symmetry and no spatial ordering; the high-fidelity projection of numerically computed quantum trajectories onto low-dimensionality Kähler state-space manifolds is demonstrated. The reconstruction of quantum trajectories from sparse random projections is demonstrated, the onset of Donoho-Stodden breakdown at the Candès-Tao sparsity limit is observed, a deterministic construction for sampling matrices is given, and methods for quantum state optimization by Dantzig selection are given.

37 published items

A unified framework for correlation mining in ultra-high dimension

An Improvement on the Hotelling $T^2$ Test Using the Ledoit-Wolf Nonlinear Shrinkage Estimator

A Geometric Approach to Online Streaming Feature Selection

Learning to Bound the Multi-class Bayes Error

OrthoReg: Robust Network Pruning Using Orthonormality Regularization

Pattern-Based Analysis of Time Series: Estimation

Predicting solar flares with machine learning: investigating solar cycle dependence

Robust Distributed Fixed-Time Economic Dispatch under Time-Varying Topology

Testing that a Local Optimum of the Likelihood is Globally Optimum using Reparameterized Embeddings

Identifying Solar Flare Precursors Using Time Series of SDO/HMI Images and SHARP Parameters

Diversion Detection in Partially Observed Nuclear Fuel Cycle Networks

Learning to classify with possible sensor failures

Measure-Transformed Quasi Maximum Likelihood Estimation

Multi-centrality Graph Spectral Decompositions and their Application to Cyber Intrusion Detection

Online Diversion Detection in Nuclear Fuel Cycles via Multimodal Observations

Robust training on approximated minimal-entropy set

A Dictionary Approach to EBSD Indexing

Deep Community Detection

Empirically Estimable Classification Bounds Based on a New Divergence Measure

Foundational principles for large scale inference: Illustrations through correlation mining

Multimodal Factor Analysis

Empirical non-parametric estimation of the Fisher Information

Jointly Sparse Global SIMPLS Regression

A Hamilton-Jacobi equation for the continuum limit of non-dominated sorting

A PDE-based approach to non-dominated sorting

Nonlinear unmixing of hyperspectral images: models and algorithms

Variational Semi-blind Sparse Deconvolution with Orthogonal Kernel Bases and its Application to MRFM

Kullback Proximal Algorithms for Maximum Likelihood Estimation

On EM algorithms and their proximal generalizations

On Measure Transformed Canonical Correlation Analysis

Semi-blind Sparse Image Reconstruction with Application to MRFM

Large Scale Correlation Screening

Identification and Query of Activated Gene Pathways in Disease Progression

Regularized Least-Mean-Square Algorithms

Hierarchical Bayesian sparse image reconstruction with application to MRFM

Joint Bayesian endmember extraction and linear unmixing for hyperspectral imagery

Practical recipes for the model order reduction, dynamical simulation, and compressive sampling of large-scale open quantum systems