Source author record

Boaz Nadler

Boaz Nadler appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · math.ST · Statistics Theory · math.OC · Information Theory · math.IT · math.NA · eess.SP · Molecular Networks · Numerical Analysis · physics.comp-ph · physics.optics · Quantitative Methods

Catalog footprint

What is connected

19 works

13 topics

4 close collaborators


preprint · 2026 · arXiv

Distributed Sparse Linear Regression under Communication Constraints

In multiple domains, statistical tasks are performed in distributed settings, with data split among several end machines that are connected to a fusion center. In various applications, the end machines have limited bandwidth and power, and thus a tight communication budget. In this work we focus on distributed learning of a sparse linear regression model under severe communication constraints. We propose several two-round distributed schemes whose communication per machine is sublinear in the data dimension. In our schemes, individual machines compute debiased lasso estimators but send only very few values to the fusion center. On the theoretical front, we analyze one of these schemes and prove that, with high probability, it achieves exact support recovery at low signal-to-noise ratios, where individual machines fail to recover the support. We show in simulations that our scheme works as well as, and in some cases better than, more communication-intensive approaches.

preprint · 2022 · arXiv

Distributed Sparse Normal Means Estimation with Sublinear Communication

We consider the problem of sparse normal means estimation in a distributed setting with communication constraints. We assume there are $M$ machines, each holding $d$-dimensional observations of a $K$-sparse vector $\mu$ corrupted by additive Gaussian noise. The $M$ machines are connected in a star topology to a fusion center, whose goal is to estimate the vector $\mu$ with a low communication budget. Previous works have shown that to achieve the centralized minimax rate for the $\ell_2$ risk, the total communication must be high: at least linear in the dimension $d$. This phenomenon occurs, however, at very weak signals. We show that at signal-to-noise ratios (SNRs) that are sufficiently high, but not high enough for recovery by any individual machine, the support of $\mu$ can be correctly recovered with significantly less communication. Specifically, we present two algorithms for distributed estimation of a sparse mean vector corrupted by either Gaussian or sub-Gaussian noise. We then prove that above certain SNR thresholds, with high probability, these algorithms recover the correct support with total communication that is sublinear in the dimension $d$. Furthermore, the communication decreases exponentially as a function of signal strength. If, in addition, $KM\ll \tfrac{d}{\log d}$, then with an additional round of sublinear communication, our algorithms achieve the centralized rate for the $\ell_2$ risk. Finally, we present simulations that illustrate the performance of our algorithms in different parameter regimes.
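A minimal sketch of the general threshold-and-vote idea, not the authors' algorithms: each machine sends only the indices where its observation exceeds a threshold, and the fusion center keeps the indices reported by enough machines. The threshold `tau`, the vote count `min_votes`, the simulation parameters and all function names are hypothetical choices for illustration, and only Gaussian noise is simulated.

```python
import random


def local_report(x, tau):
    """Indices one machine sends: coordinates with |x_j| > tau.

    Only these indices are transmitted, so the message length depends on
    the number of exceedances rather than on the dimension d.
    """
    return [j for j, v in enumerate(x) if abs(v) > tau]


def fuse(reports, d, min_votes):
    """Fusion center: keep indices reported by at least min_votes machines."""
    votes = [0] * d
    for report in reports:
        for j in report:
            votes[j] += 1
    return [j for j in range(d) if votes[j] >= min_votes]


def simulate(d=2000, K=5, M=20, mu_val=3.5, tau=2.5, min_votes=8, seed=0):
    """Simulate M machines, each observing a K-sparse mean plus N(0, 1) noise."""
    rng = random.Random(seed)
    support = sorted(rng.sample(range(d), K))
    mu = [0.0] * d
    for j in support:
        mu[j] = mu_val
    reports = []
    for _ in range(M):
        x = [mu[j] + rng.gauss(0.0, 1.0) for j in range(d)]
        reports.append(local_report(x, tau))
    estimate = fuse(reports, d, min_votes)
    sent = sum(len(r) for r in reports)
    return support, estimate, sent


if __name__ == "__main__":
    d, M = 2000, 20
    support, estimate, sent = simulate(d=d, M=M)
    print("true support:     ", support)
    print("estimated support:", estimate)
    print("indices sent:", sent, "out of M*d =", M * d)
```

Counting transmitted indices rather than real-valued coordinates is what keeps the total message size well below `M*d` when only a few coordinates cross the threshold on each machine.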

preprint · 2022 · arXiv

GNMR: A provable one-line algorithm for low rank matrix recovery

Low rank matrix recovery problems, including matrix completion and matrix sensing, appear in a broad range of applications. In this work we present GNMR, an extremely simple iterative algorithm for low rank matrix recovery based on a Gauss-Newton linearization. On the theoretical front, we derive recovery guarantees for GNMR in both the matrix sensing and matrix completion settings. Some of these results improve upon the best currently known results for other methods. A key property of GNMR is that it implicitly keeps the factor matrices approximately balanced throughout its iterations. On the empirical front, we show that for matrix completion with uniform sampling, GNMR performs better than several popular methods, especially when given very few observations, close to the information limit.

preprint · 2022 · arXiv

Inductive Matrix Completion: No Bad Local Minima and a Fast Algorithm

The inductive matrix completion (IMC) problem is to recover a low rank matrix from few observed entries while incorporating prior knowledge about its row and column subspaces. In this work, we make three contributions to the IMC problem: (i) we prove that under suitable conditions, the IMC optimization landscape has no bad local minima; (ii) we derive a simple scheme with theoretical guarantees to estimate the rank of the unknown matrix; and (iii) we propose GNIMC, a simple Gauss-Newton based method to solve the IMC problem, analyze its runtime and derive recovery guarantees for it. The guarantees for GNIMC are sharper in several aspects than those available for other methods, including a quadratic convergence rate, fewer required observed entries and stability to errors or deviations from low-rank. Empirically, given entries observed uniformly at random, GNIMC recovers the underlying matrix substantially faster than several competing methods.

preprint · 2020 · arXiv

Tight Recovery Guarantees for Orthogonal Matching Pursuit Under Gaussian Noise

Orthogonal Matching Pursuit (OMP) is a popular algorithm for estimating an unknown sparse vector from multiple linear measurements of it. Assuming exact sparsity and measurements corrupted by additive Gaussian noise, the success of OMP is often formulated as exactly recovering the support of the sparse vector. Several authors derived a sufficient condition for exact support recovery by OMP with high probability, depending on the signal-to-noise ratio, defined as the magnitude of the smallest nonzero coefficient of the vector divided by the noise level. We make two contributions. First, we derive a slightly sharper sufficient condition for two variants of OMP, in which either the sparsity level or the noise level is known. Next, we show that this sharper sufficient condition is tight in the following sense: for a wide range of problem parameters, there exist a dictionary of linear measurements and a sparse vector, with a signal-to-noise ratio slightly below that of the sufficient condition, for which OMP fails to recover the support with high probability. Finally, we present simulations illustrating that our condition is tight for a much broader range of dictionaries.
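The standard OMP loop (greedy column selection followed by a least-squares re-fit on the selected support) can be sketched in plain Python as below, for the variant in which the sparsity level is known. This is a textbook illustration, not code from the paper; the dictionary size, noise level and helper names are hypothetical, and the normal-equations solver is only suitable for small, well-conditioned supports.

```python
import math
import random


def solve_least_squares(cols, y):
    """Least-squares coefficients of y on the given columns (normal equations).

    Gaussian elimination with partial pivoting; fine for the small,
    well-conditioned systems in this sketch, not for general use.
    """
    k = len(cols)
    gram = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    rhs = [sum(a * b for a, b in zip(cols[i], y)) for i in range(k)]
    aug = [gram[i] + [rhs[i]] for i in range(k)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, k):
            f = aug[r][c] / aug[c][c]
            for t in range(c, k + 1):
                aug[r][t] -= f * aug[c][t]
    coef = [0.0] * k
    for i in range(k - 1, -1, -1):
        s = aug[i][k] - sum(aug[i][t] * coef[t] for t in range(i + 1, k))
        coef[i] = s / aug[i][i]
    return coef


def omp(columns, y, sparsity):
    """OMP with a known sparsity level; columns are assumed to have unit norm.

    Returns the selected indices (in selection order) and their coefficients.
    """
    residual = list(y)
    support, coef = [], []
    for _ in range(sparsity):
        # Greedy step: the unselected column most correlated with the residual.
        scores = [-1.0 if j in support else abs(sum(a * r for a, r in zip(col, residual)))
                  for j, col in enumerate(columns)]
        support.append(max(range(len(columns)), key=lambda j: scores[j]))
        # Orthogonal step: re-fit y on all selected columns and update the residual.
        chosen = [columns[j] for j in support]
        coef = solve_least_squares(chosen, y)
        residual = [yi - sum(c * col[i] for c, col in zip(coef, chosen)) for i, yi in enumerate(y)]
    return support, coef


def demo(n=100, p=120, k=3, sigma=0.01, seed=1):
    """Hypothetical test problem: random unit-norm Gaussian dictionary, +/-1 nonzeros."""
    rng = random.Random(seed)
    columns = []
    for _ in range(p):
        col = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in col))
        columns.append([v / norm for v in col])
    true_support = sorted(rng.sample(range(p), k))
    x = {j: rng.choice([-1.0, 1.0]) for j in true_support}
    y = [sum(x[j] * columns[j][i] for j in true_support) + rng.gauss(0.0, sigma) for i in range(n)]
    support, _ = omp(columns, y, k)
    return true_support, sorted(support)


if __name__ == "__main__":
    true_support, found = demo()
    print("true support:", true_support, "OMP support:", found)
```

Because each re-fit makes the residual orthogonal to every selected column, an index is never picked twice in exact arithmetic; the `-1.0` score only guards against rounding.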

preprint · 2018 · arXiv

Robust Sparse Covariance Estimation by Thresholding Tyler's M-Estimator

Estimating a high-dimensional sparse covariance matrix from a limited number of samples is a fundamental problem in contemporary data analysis. Most proposals to date, however, are not robust to outliers or heavy tails. Towards bridging this gap, in this work we consider estimating a sparse shape matrix from $n$ samples following a possibly heavy-tailed elliptical distribution. We propose estimators based on thresholding either Tyler's M-estimator or its regularized variant. We derive bounds on the difference in spectral norm between our estimators and the shape matrix in the joint limit as the dimension $p$ and sample size $n$ tend to infinity with $p/n\to\gamma>0$. These bounds are minimax rate-optimal. Results on simulated data support our theoretical analysis.
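A minimal sketch of thresholding Tyler's M-estimator, assuming the standard fixed-point iteration with the trace normalized to $p$, followed by simple hard thresholding of the off-diagonal entries. The thresholding level `tau`, the iteration count and the simulated heavy-tailed data are hypothetical; the paper's choice of threshold and its regularized variant are not reproduced here.

```python
import random


def mat_inv(a):
    """Inverse of a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        pivot = aug[c][c]
        aug[c] = [v / pivot for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def tyler(samples, iters=50):
    """Tyler's M-estimator of shape by fixed-point iteration, normalized to trace p.

    Each update is sum_i x_i x_i^T / (x_i^T S^{-1} x_i); the usual p/n factor
    is absorbed by the trace normalization.
    """
    p = len(samples[0])
    s = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(iters):
        inv = mat_inv(s)
        new = [[0.0] * p for _ in range(p)]
        for x in samples:
            q = sum(x[i] * inv[i][j] * x[j] for i in range(p) for j in range(p))
            for i in range(p):
                for j in range(p):
                    new[i][j] += x[i] * x[j] / q
        trace = sum(new[i][i] for i in range(p))
        s = [[p * v / trace for v in row] for row in new]
    return s


def hard_threshold(s, tau):
    """Zero out off-diagonal entries with magnitude below tau; keep the diagonal."""
    return [[v if i == j or abs(v) >= tau else 0.0 for j, v in enumerate(row)]
            for i, row in enumerate(s)]


def heavy_tailed_samples(n=1000, p=5, rho=0.6, seed=2):
    """Hypothetical elliptical data whose shape has one nonzero off-diagonal pair (0, 1)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(p)]
        g[1] = rho * g[0] + (1.0 - rho ** 2) ** 0.5 * g[1]
        r = 2.0 ** rng.gauss(0.0, 3.0)  # random radial scale gives heavy tails
        samples.append([r * v for v in g])
    return samples


if __name__ == "__main__":
    est = hard_threshold(tyler(heavy_tailed_samples()), tau=0.3)
    for row in est:
        print([round(v, 2) for v in row])
```

Since each term $x x^T / (x^T S^{-1} x)$ is unchanged when $x$ is rescaled, the wildly varying radial scales in the simulated data do not affect the estimate, which is the robustness property the abstract relies on.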

preprint · 2016 · arXiv

The discrete sign problem: uniqueness, recovery algorithms and phase retrieval applications

In this paper we consider the following specific real-valued, finite-dimensional instance of the classical 1-D phase retrieval problem. Let ${\bf F}\in\mathbb{R}^N$ be an $N$-dimensional vector whose discrete Fourier transform has compact support. The sign problem is to recover ${\bf F}$ from its magnitude $|{\bf F}|$. First, in contrast to the classical 1-D phase problem, which in general has multiple solutions, we prove that with sufficient oversampling the sign problem admits a unique solution. Next, we show that the sign problem can be viewed as a special case of a more general piecewise constant phase problem. Relying on this result, we derive a computationally efficient, noise-robust sign recovery algorithm. In the noise-free case and with a sufficiently high sampling rate, our algorithm is guaranteed to recover the true sign pattern. Finally, we present two phase retrieval applications of the sign problem: (i) vectorial phase retrieval with three measurement vectors; and (ii) recovery of two well-separated 1-D objects.

preprint · 2016 · arXiv

Unsupervised Ensemble Learning with Dependent Classifiers

In unsupervised ensemble learning, one obtains predictions from multiple sources or classifiers, yet without knowing the reliability and expertise of each source, and with no labeled data to assess it. The task is to combine these possibly conflicting predictions into an accurate meta-learner. Most works to date have assumed perfect diversity between the different sources, a property known as conditional independence. In realistic scenarios, however, this assumption is often violated, and ensemble learners based on it can be severely sub-optimal. The key challenges we address in this paper are (i) how to detect, in an unsupervised manner, strong violations of conditional independence; and (ii) how to construct a suitable meta-learner. To this end we introduce a statistical model that allows for dependencies between classifiers. Our main contributions are the development of novel unsupervised methods to detect strongly dependent classifiers, better estimate their accuracies, and construct an improved meta-learner. Using both artificial and real datasets, we showcase the importance of taking classifier dependencies into account and the competitive performance of our approach.

preprint · 2015 · arXiv

Do semidefinite relaxations solve sparse PCA up to the information limit?

Estimating the leading principal components of data, assuming they are sparse, is a central task in modern high-dimensional statistics. Many algorithms were developed for this sparse PCA problem, from simple diagonal thresholding to sophisticated semidefinite programming (SDP) methods. A key theoretical question is under what conditions such algorithms can recover the sparse principal components. We study this question for a single-spike model with an $\ell_0$-sparse eigenvector, in the asymptotic regime as dimension $p$ and sample size $n$ both tend to infinity. Amini and Wainwright [Ann. Statist. 37 (2009) 2877-2921] proved that for sparsity levels $k\geq\Omega(n/\log p)$, no algorithm, efficient or not, can reliably recover the sparse eigenvector. In contrast, for $k\leq O(\sqrt{n/\log p})$, diagonal thresholding is consistent. It was further conjectured that an SDP approach may close this gap between computational and information limits. We prove that when $k\geq\Omega(\sqrt{n})$, the proposed SDP approach, at least in its standard usage, cannot recover the sparse spike. In fact, we conjecture that in the single-spike model, no computationally efficient algorithm can recover a spike of $\ell_0$-sparsity $k\geq\Omega(\sqrt{n})$. Finally, we present empirical results suggesting that up to sparsity levels $k=O(\sqrt{n})$, recovery is possible by a simple covariance thresholding algorithm.
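Diagonal thresholding, the simple baseline mentioned above, can be sketched as follows: keep the $k$ coordinates with the largest sample variances, then take the leading eigenvector of the corresponding $k\times k$ sub-covariance. This is an illustrative plain-Python version on a hypothetical single-spike model with known $k$; it is not the SDP or covariance-thresholding methods studied in the paper.

```python
import math
import random


def diagonal_thresholding_pca(samples, k, iters=200):
    """Sparse PCA by diagonal thresholding (samples assumed zero-mean).

    Step 1 keeps the k coordinates with the largest sample variance; step 2
    runs power iteration on the k x k sample covariance of those coordinates.
    """
    n, p = len(samples), len(samples[0])
    variances = [sum(x[j] ** 2 for x in samples) / n for j in range(p)]
    selected = sorted(sorted(range(p), key=lambda j: variances[j], reverse=True)[:k])
    sub = [[sum(x[a] * x[b] for x in samples) / n for b in selected] for a in selected]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(sub[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(t * t for t in w))
        v = [t / norm for t in w]
    estimate = [0.0] * p
    for idx, j in enumerate(selected):
        estimate[j] = v[idx]
    return selected, estimate


def spiked_samples(n=500, p=200, k=5, beta=5.0, seed=3):
    """Hypothetical single-spike data: x = sqrt(beta) * u * spike + N(0, I) noise."""
    rng = random.Random(seed)
    support = sorted(rng.sample(range(p), k))
    spike = [0.0] * p
    for j in support:
        spike[j] = 1.0 / math.sqrt(k)
    samples = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        samples.append([math.sqrt(beta) * u * spike[j] + rng.gauss(0.0, 1.0) for j in range(p)])
    return support, spike, samples


if __name__ == "__main__":
    support, spike, samples = spiked_samples()
    selected, v_hat = diagonal_thresholding_pca(samples, k=5)
    overlap = abs(sum(a * b for a, b in zip(v_hat, spike)))
    print("true support:", support, "selected:", selected, "overlap:", round(overlap, 3))
```

In this model the support coordinates have variance $1+\beta/k$ and the rest have variance $1$, which is why selecting by the diagonal works only when $\beta/k$ is large relative to the fluctuations of the sample variances.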

preprint · 2015 · arXiv

Learning Parametric-Output HMMs with Two Aliased States

In various applications involving hidden Markov models (HMMs), some of the hidden states are aliased, having identical output distributions. The minimality, identifiability and learnability of such aliased HMMs have been long-standing problems, with only partial solutions provided thus far. In this paper we focus on parametric-output HMMs, whose output distributions come from a parametric family, and that have exactly two aliased states. For this class, we present a complete characterization of their minimality and identifiability. Furthermore, for a large family of parametric output distributions, we derive computationally efficient and statistically consistent algorithms to detect the presence of aliasing and learn the aliased HMM transition and emission parameters. We illustrate our theoretical analysis with several simulations.

preprint · 2015 · arXiv

Roy's Largest Root Test Under Rank-One Alternatives

Roy's largest root is a common test statistic in multivariate analysis, statistical signal processing and allied fields. Despite its ubiquity, provision of accurate and tractable approximations to its distribution under the alternative has been a longstanding open problem. Assuming Gaussian observations and a rank one alternative, or concentrated non-centrality, we derive simple yet accurate approximations for the most common low-dimensional settings. These include signal detection in noise, multiple response regression, multivariate analysis of variance and canonical correlation analysis. A small noise perturbation approach, perhaps underused in statistics, leads to simple combinations of standard univariate distributions, such as central and non-central $\chi^2$ and $F$. Our results allow approximate power and sample size calculations for Roy's test for rank one effects, which is precisely where it is most powerful.

preprint · 2014 · arXiv

Estimating the Accuracies of Multiple Classifiers Without Labeled Data

In various situations, one is given only the predictions of multiple classifiers over a large unlabeled test set. This scenario raises the following questions: Without any labeled data and without any a priori knowledge about the reliability of these different classifiers, is it possible to consistently and computationally efficiently estimate their accuracies? Furthermore, also in a completely unsupervised manner, can one construct a more accurate unsupervised ensemble classifier? In this paper, focusing on the binary case, we present simple, computationally efficient algorithms to answer these questions. Furthermore, under standard classifier independence assumptions, we prove that our methods are consistent and study their asymptotic error. Our approach is spectral, based on the fact that the off-diagonal entries of the classifiers' covariance matrix and 3-D tensor are rank-one. We illustrate the competitive performance of our algorithms via extensive experiments on both artificial and real datasets.

preprint · 2014 · arXiv

Roy's largest root under rank-one alternatives: The complex valued case and applications

The largest eigenvalue of a Wishart matrix, known as Roy's largest root (RLR), plays an important role in a variety of applications. Most works to date derived approximations to its distribution under various asymptotic regimes, such as degrees of freedom, dimension, or both tending to infinity. However, several applications involve finite and relatively small parameters, for which the above approximations may be inaccurate. Recently, via a small noise perturbation approach with fixed dimension and degrees of freedom, Johnstone and Nadler derived simple yet accurate stochastic approximations to the distribution of Roy's largest root in the real valued case, under a rank-one alternative. In this paper, we extend their results to the complex valued case. Furthermore, we analyze the behavior of the leading eigenvector by developing new stochastic approximations. Specifically, we derive simple stochastic approximations to the distribution of the largest eigenvalue under five common complex single-matrix and double-matrix scenarios. We then apply these results to investigate several problems in signal detection and communications. In particular, we analyze the performance of the RLR detector in cognitive radio spectrum sensing and constant modulus signal detection in the high signal-to-noise ratio (SNR) regime. Moreover, we address the problem of determining the optimal transmit-receive antenna configuration (here optimality is in the sense of outage minimization) for rank-one multiple-input multiple-output Rician fading channels at high SNR.

preprint · 2013 · arXiv

On learning parametric-output HMMs

We present a novel approach for learning an HMM whose outputs are distributed according to a parametric family. This is done by decoupling the learning task into two steps: first estimating the output parameters, and then estimating the hidden-state transition probabilities. The first step is accomplished by fitting a mixture model to the stationary distribution of the outputs. Given the parameters of this mixture model, the second step is formulated as the solution of an easily solvable convex quadratic program. We provide an error analysis for the estimated transition probabilities and show they are robust to small perturbations in the estimates of the mixture parameters. Finally, we support our analysis with some encouraging empirical results.

preprint · 2013 · arXiv

Ranking and combining multiple predictors without labeled data

In a broad range of classification and decision-making problems, one is given the advice or predictions of several classifiers of unknown reliability over multiple questions or queries. This scenario is different from the standard supervised setting, where the accuracy of each classifier can be assessed using available labeled data, and raises two questions: given only the predictions of several classifiers over a large set of unlabeled test data, is it possible to (a) reliably rank them; and (b) construct a meta-classifier more accurate than most classifiers in the ensemble? Here we present a novel spectral approach to address these questions. First, assuming conditional independence between classifiers, we show that the off-diagonal entries of their covariance matrix correspond to a rank-one matrix. Moreover, the classifiers can be ranked using the leading eigenvector of this covariance matrix, as its entries are proportional to their balanced accuracies. Second, via a linear approximation to the maximum likelihood estimator, we derive the Spectral Meta-Learner (SML), a novel ensemble classifier whose weights are equal to the entries of this eigenvector. On both simulated and real data, SML typically achieves a higher accuracy than most classifiers in the ensemble and can provide a better starting point than majority voting for estimating the maximum likelihood solution. Furthermore, SML is robust to the presence of small malicious groups of classifiers designed to veer the ensemble prediction away from the (unknown) ground truth.
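A minimal sketch of the spectral idea under conditional independence. It fits a rank-one matrix to the off-diagonal entries of the classifiers' covariance by repeatedly refilling the diagonal (an iterative principal-factor heuristic chosen here for simplicity, not necessarily the paper's estimator), then uses the fitted vector both to rank the classifiers and as voting weights. The simulated classifiers, their accuracies and all function names are hypothetical; in this toy model, with balanced classes and symmetric errors, the fitted entries should come out close to $2\cdot\text{accuracy}-1$.

```python
import math
import random


def covariance(preds):
    """Sample covariance of the classifiers' +/-1 predictions (one row per classifier)."""
    m, n = len(preds), len(preds[0])
    means = [sum(row) / n for row in preds]
    return [[sum((preds[i][t] - means[i]) * (preds[j][t] - means[j]) for t in range(n)) / (n - 1)
             for j in range(m)] for i in range(m)]


def leading_eig(a, iters=300):
    """Leading eigenpair of a symmetric matrix by power iteration (top eigenvalue assumed positive)."""
    m = len(a)
    v = [1.0 / math.sqrt(m)] * m
    lam = 0.0
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(m)) for i in range(m)]
        lam = math.sqrt(sum(t * t for t in w))
        v = [t / lam for t in w]
    return lam, v


def rank_one_offdiag(q, rounds=50):
    """Fit q_ij ~ v_i v_j for i != j by repeatedly replacing the diagonal with lam * v_i^2."""
    a = [row[:] for row in q]
    for _ in range(rounds):
        lam, v = leading_eig(a)
        for i in range(len(a)):
            a[i][i] = lam * v[i] ** 2
    lam, v = leading_eig(a)
    vec = [math.sqrt(lam) * t for t in v]
    # Sign convention: assume most classifiers are better than random guessing.
    return vec if sum(vec) >= 0 else [-t for t in vec]


def sml_predict(preds, weights):
    """Weighted vote using the fitted vector as weights."""
    n = len(preds[0])
    return [1 if sum(w * row[t] for w, row in zip(weights, preds)) >= 0 else -1 for t in range(n)]


def simulate(accuracies=(0.9, 0.8, 0.7, 0.65, 0.6), n=4000, seed=4):
    """Hypothetical conditionally independent classifiers with balanced +/-1 labels."""
    rng = random.Random(seed)
    labels = [rng.choice([-1, 1]) for _ in range(n)]
    preds = [[y if rng.random() < acc else -y for y in labels] for acc in accuracies]
    return labels, preds


if __name__ == "__main__":
    labels, preds = simulate()
    v = rank_one_offdiag(covariance(preds))
    ranking = sorted(range(len(v)), key=lambda i: -v[i])
    y_hat = sml_predict(preds, v)
    accuracy = sum(a == b for a, b in zip(y_hat, labels)) / len(labels)
    print("fitted vector:", [round(t, 3) for t in v])
    print("ranking (best first):", ranking, "ensemble accuracy:", accuracy)
```

The labels are used only to score the result at the end; the fitted vector, the ranking and the weighted vote are all computed from the unlabeled predictions alone.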

preprint · 2012 · arXiv

Active Learning with Distributional Estimates

Active Learning (AL) is increasingly important in a broad range of applications. Two main AL principles for obtaining accurate classification with few labeled data are refinement of the current decision boundary and exploration of poorly sampled regions. In this paper we derive a novel AL scheme that balances these two principles in a natural way. In contrast to many AL strategies, which are based on an estimated class conditional probability $\hat{p}(y|x)$, a key component of our approach is to view this quantity as a random variable, hence explicitly considering the uncertainty in its estimated value. Our main contribution is a novel mathematical framework for uncertainty-based AL, and a corresponding AL scheme, where the uncertainty in $\hat{p}(y|x)$ is modeled by a second-order distribution. On the practical side, we show how to approximate such second-order distributions for kernel density classification. Finally, we find that over a large number of UCI, USPS and Caltech4 datasets, our AL scheme achieves significantly better learning curves than popular AL methods such as uncertainty sampling and error reduction sampling, when all use the same kernel density classifier.

preprint · 2012 · arXiv

Minimax bounds for sparse PCA with noisy high-dimensional data

We study the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations. We establish a lower bound on the minimax risk of estimators under the $\ell_2$ loss, in the joint limit as dimension and sample size increase to infinity, under various models of sparsity for the population eigenvectors. The lower bound on the risk points to the existence of different regimes of sparsity of the eigenvectors. We also propose a new method for estimating the eigenvectors by a two-stage coordinate selection scheme.

preprint · 2011 · arXiv

Vectorial Phase Retrieval for Linear Characterization of Attosecond Pulses

The waveforms of attosecond pulses produced by high-harmonic generation carry information on the electronic structure and dynamics in atomic and molecular systems. Current methods for the temporal characterization of such pulses have limited sensitivity and impose significant experimental complexity. We propose a new linear and all-optical method inspired by widely used multi-dimensional phase retrieval algorithms. Our new scheme is based on the spectral measurement of two attosecond sources and their interference. As an example, we focus on the case of spectral polarization measurements of attosecond pulses, relying on their most fundamental property: being well confined in time. We demonstrate this method numerically by reconstructing the temporal profiles of attosecond pulses generated from aligned $\mathrm{CO}_2$ molecules.

preprint · 2006 · arXiv

Variable-free exploration of stochastic models: a gene regulatory network example

Finding coarse-grained, low-dimensional descriptions is an important task in the analysis of complex, stochastic models of gene regulatory networks. This task involves (a) identifying observables that best describe the state of these complex systems and (b) characterizing the dynamics of the observables. In a previous paper [13], we assumed that good observables were known a priori, and presented an equation-free approach to approximate coarse-grained quantities (i.e., effective drift and diffusion coefficients) that characterize the long-time behavior of the observables. Here we use diffusion maps [9] to extract appropriate observables ("reduction coordinates") in an automated fashion; these involve the leading eigenvectors of a weighted Laplacian on a graph constructed from network simulation data. We present lifting and restriction procedures for translating between physical variables and these data-based observables. These procedures allow us to perform equation-free, coarse-grained computations characterizing the long-term dynamics through the design and processing of short bursts of stochastic simulation initialized at appropriate values of the data-based observables.
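A minimal diffusion-map sketch: build a Gaussian kernel on the data points, form the symmetric conjugate of the row-normalized Markov matrix, remove its trivial top eigenvector, and find the next eigenvector by power iteration. The bandwidth `eps`, the iteration count and the toy data (points on a straight segment) are hypothetical, and the lifting and restriction steps described above are not shown.

```python
import math


def diffusion_coordinate(points, eps, iters=500):
    """First non-trivial diffusion-map eigenpair of a point cloud.

    Builds W_ij = exp(-|x_i - x_j|^2 / eps) and works with the symmetric matrix
    A = D^{-1/2} W D^{-1/2}, which has the same eigenvalues as the Markov matrix
    P = D^{-1} W. Returns (lam, psi), where psi is a right eigenvector of P.
    """
    n = len(points)
    w = [[math.exp(-sum((a - b) ** 2 for a, b in zip(pi, pj)) / eps) for pj in points]
         for pi in points]
    d = [sum(row) for row in w]
    a = [[w[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    # The top eigenvector of A is sqrt(d / sum(d)) with eigenvalue 1: project it out.
    total = sum(d)
    phi0 = [math.sqrt(di / total) for di in d]
    v = [math.sin(i + 1.0) for i in range(n)]  # arbitrary deterministic start vector
    lam = 0.0
    for _ in range(iters):
        dot = sum(x * y for x, y in zip(v, phi0))
        v = [x - dot * y for x, y in zip(v, phi0)]
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in av))
        v = [x / lam for x in av]
    # Map back to a right eigenvector of P: psi = D^{-1/2} phi.
    psi = [x / math.sqrt(di) for x, di in zip(v, d)]
    return lam, psi


if __name__ == "__main__":
    # Hypothetical data: 30 evenly spaced points on a straight segment in 2-D.
    pts = [(t / 29.0, 2.0 * t / 29.0) for t in range(30)]
    lam, psi = diffusion_coordinate(pts, eps=0.25)
    print("eigenvalue:", round(lam, 4))
    print("coordinate:", [round(x, 3) for x in psi])
```

For points along a one-dimensional curve, this first non-trivial coordinate changes sign once along the curve, which is what makes it usable as a data-based "reduction coordinate" in place of a hand-picked observable.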
