Source author record

Waheed U. Bajwa

Waheed U. Bajwa appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Machine Learning; eess.SP; Information Theory; math.IT; Distributed, Parallel, and Cluster Computing; math.OC; math.ST; Statistics Theory; hep-ex; astro-ph.IM; Computer Vision; Cryptography and Security; eess.SY; math.DS; Multiagent Systems; Systems and Control

Catalog footprint

What is connected

17 works

16 topics

4 close collaborators

preprint, 2022, arXiv

A Guide to Computational Reproducibility in Signal Processing and Machine Learning

Computational reproducibility is a growing problem that has been extensively studied among computational researchers and within the signal processing and machine learning research community. However, with the changing landscape of signal processing and machine learning research come new obstacles and unseen challenges in creating reproducible experiments. Due to these new challenges most computational experiments have become difficult, if not impossible, to be reproduced by an independent researcher. In 2016 a survey conducted by the journal Nature found that 50% of researchers were unable to reproduce their own experiments. While the issue of computational reproducibility has been discussed in the literature and specifically within the signal processing community, it is still unclear to most researchers what are the best practices to ensure reproducibility without impinging on their primary responsibility of conducting research. We feel that although researchers understand the importance of making experiments reproducible, the lack of a clear set of standards and tools makes it difficult to incorporate good reproducibility practices in most labs. It is in this regard that we aim to present signal processing researchers with a set of practical tools and strategies that can help mitigate many of the obstacles to producing reproducible computational experiments.

preprint, 2022, arXiv

A Method for Quantifying Position Reconstruction Uncertainty in Astroparticle Physics using Bayesian Networks

Robust position reconstruction is paramount for enabling discoveries in astroparticle physics as backgrounds are significantly reduced by only considering interactions within the fiducial volume. In this work, we present for the first time a method for position reconstruction using a Bayesian network which provides per interaction uncertainties. We demonstrate the utility of this method with simulated data based on the XENONnT detector design, a dual-phase xenon time-projection chamber, as a proof-of-concept. The network structure includes variables representing the 2D position of the interaction within the detector, the number of electrons entering the gaseous phase, and the hits measured by each sensor in the top array of the detector. The precision of the position reconstruction (difference between the true and expectation value of position) is comparable to the state-of-the-art methods -- an RMS of 0.69 cm, ~0.09 of the sensor spacing, for the inner part of the detector (<60 cm) and 0.98 cm, ~0.12 of the sensor spacing, near the wall of the detector (>60 cm). More importantly, the uncertainty of each interaction position was directly computed, which is not possible with other reconstruction methods. The method found a median 3-$σ$ confidence region of 11 cm$^2$ for the inner part of the detector and 21 cm$^2$ near the wall of the detector. We found the Bayesian network framework to be well suited to the problem of position reconstruction. The performance of this proof-of-concept, even with several simplifying assumptions, shows that this is a promising method for providing per interaction uncertainty, which can be extended to energy reconstruction and signal classification.

preprint, 2022, arXiv

A Minimax Lower Bound for Low-Rank Matrix-Variate Logistic Regression

This paper considers the problem of matrix-variate logistic regression. It derives the fundamental error threshold on estimating low-rank coefficient matrices in the logistic regression problem by obtaining a lower bound on the minimax risk. The bound depends explicitly on the dimension and distribution of the covariates, the rank and energy of the coefficient matrix, and the number of samples. The resulting bound is proportional to the intrinsic degrees of freedom in the problem, which suggests the sample complexity of the low-rank matrix logistic regression problem can be lower than that for vectorized logistic regression. The proof techniques utilized in this work also set the stage for development of minimax lower bounds for tensor-variate logistic regression problems.

preprint, 2022, arXiv

Boundary Conditions for Linear Exit Time Gradient Trajectories Around Saddle Points: Analysis and Algorithm

Gradient-related first-order methods have become the workhorse of large-scale numerical optimization problems. Many of these problems involve nonconvex objective functions with multiple saddle points, which necessitates an understanding of the behavior of discrete trajectories of first-order methods within the geometrical landscape of these functions. This paper concerns convergence of first-order discrete methods to a local minimum of nonconvex optimization problems that comprise strict-saddle points within the geometrical landscape. To this end, it focuses on analysis of discrete gradient trajectories around saddle neighborhoods, derives sufficient conditions under which these trajectories can escape strict-saddle neighborhoods in linear time, explores the contractive and expansive dynamics of these trajectories in neighborhoods of strict-saddle points that are characterized by gradients of moderate magnitude, characterizes the non-curving nature of these trajectories, and highlights the inability of these trajectories to re-enter the neighborhoods around strict-saddle points after exiting them. Based on these insights and analyses, the paper then proposes a simple variant of the vanilla gradient descent algorithm, termed Curvature Conditioned Regularized Gradient Descent (CCRGD) algorithm, which utilizes a check for an initial boundary condition to ensure its trajectories can escape strict-saddle neighborhoods in linear time. Convergence analysis of the CCRGD algorithm, which includes its rate of convergence to a local minimum, is also presented in the paper. Numerical experiments are then provided on a test function as well as a low-rank matrix factorization problem to evaluate the efficacy of the proposed algorithm.

preprint, 2022, arXiv

BRIDGE: Byzantine-resilient Decentralized Gradient Descent

Machine learning has begun to play a central role in many applications. A multitude of these applications typically also involve datasets that are distributed across multiple computing devices/machines due to either design constraints (e.g., multiagent systems) or computational/privacy reasons (e.g., learning on smartphone data). Such applications often require the learning tasks to be carried out in a decentralized fashion, in which there is no central server that is directly connected to all nodes. In real-world decentralized settings, nodes are prone to undetected failures due to malfunctioning equipment, cyberattacks, etc., which are likely to crash non-robust learning algorithms. The focus of this paper is on robustification of decentralized learning in the presence of nodes that have undergone Byzantine failures. The Byzantine failure model allows faulty nodes to arbitrarily deviate from their intended behaviors, thereby ensuring designs of the most robust of algorithms. But the study of Byzantine resilience within decentralized learning, in contrast to distributed learning, is still in its infancy. In particular, existing Byzantine-resilient decentralized learning methods either do not scale well to large-scale machine learning models, or they lack statistical convergence guarantees that help characterize their generalization errors. In this paper, a scalable, Byzantine-resilient decentralized machine learning framework termed Byzantine-resilient decentralized gradient descent (BRIDGE) is introduced. Algorithmic and statistical convergence guarantees for one variant of BRIDGE are also provided in the paper for both strongly convex problems and a class of nonconvex problems. In addition, large-scale decentralized learning experiments are used to establish that the BRIDGE framework is scalable and it delivers competitive results for Byzantine-resilient convex and nonconvex learning.
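As a rough illustration of Byzantine-resilient decentralized learning, the sketch below screens incoming iterates with a coordinate-wise trimmed mean before a local gradient step. The abstract does not say which screening rule BRIDGE uses, so the trimmed mean, the function names `trimmed_mean` and `screen_and_step`, and the parameter `b` (an assumed upper bound on the number of faulty neighbors) are illustrative assumptions, not the paper's algorithm.

```python
from typing import List, Sequence


def trimmed_mean(values: Sequence[float], b: int) -> float:
    """Average `values` after discarding the b smallest and b largest entries."""
    if b < 0 or len(values) <= 2 * b:
        raise ValueError("need more than 2*b values to trim b from each end")
    kept = sorted(values)[b:len(values) - b]
    return sum(kept) / len(kept)


def screen_and_step(own: List[float], neighbors: List[List[float]],
                    grad: List[float], step: float, b: int) -> List[float]:
    """One hypothetical node update: coordinate-wise trimmed mean over the
    node's own iterate and its neighbors' iterates, then a local gradient step."""
    stacked = [own] + neighbors
    screened = [trimmed_mean([v[k] for v in stacked], b) for k in range(len(own))]
    return [s - step * g for s, g in zip(screened, grad)]


# Four well-behaved neighbors near 1.0 and one faulty neighbor sending huge values.
own = [1.0, 1.0]
neighbors = [[0.9, 1.1], [1.1, 0.9], [1.0, 1.0], [1.2, 0.8], [1e6, -1e6]]
new_iterate = screen_and_step(own, neighbors, grad=[0.0, 0.0], step=0.1, b=1)
```

With `b=1`, the extreme values sent by the faulty neighbor are discarded in every coordinate, so the screened iterate stays close to the values held by the well-behaved nodes.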

preprint, 2022, arXiv

Domain-informed neural networks for interaction localization within astroparticle experiments

This work proposes a domain-informed neural network architecture for experimental particle physics, using particle interaction localization with the time-projection chamber (TPC) technology for dark matter research as an example application. A key feature of the signals generated within the TPC is that they allow localization of particle interactions through a process called reconstruction. While multilayer perceptrons (MLPs) have emerged as a leading contender for reconstruction in TPCs, such a black-box approach does not reflect prior knowledge of the underlying scientific processes. This paper looks anew at neural network-based interaction localization and encodes prior detector knowledge, in terms of both signal characteristics and detector geometry, into the feature encoding and the output layers of a multilayer neural network. The resulting Domain-informed Neural Network (DiNN) limits the receptive fields of the neurons in the initial feature encoding layers in order to account for the spatially localized nature of the signals produced within the TPC. This aspect of the DiNN, which has similarities with the emerging area of graph neural networks in that the neurons in the initial layers only connect to a handful of neurons in their succeeding layer, significantly reduces the number of parameters in the network in comparison to an MLP. In addition, in order to account for the detector geometry, the output layers of the network are modified using two geometric transformations to ensure the DiNN produces localizations within the interior of the detector. The end result is a neural network architecture that has 60% fewer parameters than an MLP, but that still achieves similar localization performance and provides a path to future architectural developments with improved performance because of their ability to encode additional domain knowledge into the architecture.

preprint, 2021, arXiv

A hybrid model-based and learning-based approach for classification using limited number of training samples

The fundamental task of classification given a limited number of training data samples is considered for physical systems with known parametric statistical models. The standalone learning-based and statistical model-based classifiers face major challenges towards the fulfillment of the classification task using a small training set. Specifically, classifiers that solely rely on the physics-based statistical models usually suffer from their inability to properly tune the underlying unobservable parameters, which leads to a mismatched representation of the system's behaviors. Learning-based classifiers, on the other hand, typically rely on a large number of training data from the underlying physical process, which might not be feasible in most practical scenarios. In this paper, a hybrid classification method -- termed HyPhyLearn -- is proposed that exploits both the physics-based statistical models and the learning-based classifiers. The proposed solution is based on the conjecture that HyPhyLearn would alleviate the challenges associated with the individual approaches of learning-based and statistical model-based classifiers by fusing their respective strengths. The proposed hybrid approach first estimates the unobservable model parameters using the available (suboptimal) statistical estimation procedures, and subsequently uses the physics-based statistical models to generate synthetic data. Then, the training data samples are incorporated with the synthetic data in a learning-based classifier that is based on domain-adversarial training of neural networks. Specifically, in order to address the mismatch problem, the classifier learns a mapping from the training data and the synthetic data to a common feature space. Simultaneously, the classifier is trained to find discriminative features within this space in order to fulfill the classification task.

preprint, 2020, arXiv

Adversary-resilient Distributed and Decentralized Statistical Inference and Machine Learning: An Overview of Recent Advances Under the Byzantine Threat Model

While the last few decades have witnessed a huge body of work devoted to inference and learning in distributed and decentralized setups, much of this work assumes a non-adversarial setting in which individual nodes---apart from occasional statistical failures---operate as intended within the algorithmic framework. In recent years, however, cybersecurity threats from malicious non-state actors and rogue entities have forced practitioners and researchers to rethink the robustness of distributed and decentralized algorithms against adversarial attacks. As a result, we now have a plethora of algorithmic approaches that guarantee robustness of distributed and/or decentralized inference and learning under different adversarial threat models. Driven in part by the world's growing appetite for data-driven decision making, however, securing of distributed/decentralized frameworks for inference and learning against adversarial threats remains a rapidly evolving research area. In this article, we provide an overview of some of the most recent developments in this area under the threat model of Byzantine attacks.

preprint, 2020, arXiv

Distributed Stochastic Algorithms for High-rate Streaming Principal Component Analysis

This paper considers the problem of estimating the principal eigenvector of a covariance matrix from independent and identically distributed data samples in streaming settings. The streaming rate of data in many contemporary applications can be high enough that a single processor cannot finish an iteration of existing methods for eigenvector estimation before a new sample arrives. This paper formulates and analyzes a distributed variant of the classical Krasulina's method (D-Krasulina) that can keep up with the high streaming rate of data by distributing the computational load across multiple processing nodes. The analysis shows that---under appropriate conditions---D-Krasulina converges to the principal eigenvector in an order-wise optimal manner; i.e., after receiving $M$ samples across all nodes, its estimation error can be $O(1/M)$. In order to reduce the network communication overhead, the paper also develops and analyzes a mini-batch extension of D-Krasulina, which is termed DM-Krasulina. The analysis of DM-Krasulina shows that it can also achieve order-optimal estimation error rates under appropriate conditions, even when some samples have to be discarded within the network due to communication latency. Finally, experiments are performed over synthetic and real-world data to validate the convergence behaviors of D-Krasulina and DM-Krasulina in high-rate streaming settings.
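For readers unfamiliar with the classical Krasulina's method that the abstract builds on, here is a minimal single-node sketch in pure Python. It shows only the classical update, not the distributed D-Krasulina or mini-batch DM-Krasulina variants. The step-size schedule `c / t`, the function names, and the synthetic data are assumptions made for illustration.

```python
import math
import random


def krasulina_step(w, x, step):
    """One Krasulina update: w + step * ((x.w) x - ((x.w)^2 / ||w||^2) w)."""
    xw = sum(xi * wi for xi, wi in zip(x, w))
    ww = sum(wi * wi for wi in w)
    return [wi + step * (xw * xi - (xw * xw / ww) * wi) for xi, wi in zip(x, w)]


def estimate_principal_direction(samples, w0, c=0.5):
    """Process the stream one sample at a time with step size c / t
    (an assumed schedule) and return the unit-norm estimate."""
    w = list(w0)
    for t, x in enumerate(samples, start=1):
        w = krasulina_step(w, x, c / t)
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


# Synthetic zero-mean stream whose covariance is diag(4, 1, 0.25),
# so the principal eigenvector is the first coordinate axis (up to sign).
rng = random.Random(0)
stream = [[rng.gauss(0, 2.0), rng.gauss(0, 1.0), rng.gauss(0, 0.5)]
          for _ in range(20000)]
w_hat = estimate_principal_direction(stream, w0=[1.0, 1.0, 1.0])
```

Each update uses a single sample and needs only inner products, which is what makes the method attractive in streaming settings. The update direction is orthogonal to the current iterate, so the estimate is normalized only once at the end.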

preprint, 2020, arXiv

ExSIS: Extended Sure Independence Screening for Ultrahigh-dimensional Linear Models

Statistical inference can be computationally prohibitive in ultrahigh-dimensional linear models. Correlation-based variable screening, in which one leverages marginal correlations for removal of irrelevant variables from the model prior to statistical inference, can be used to overcome this challenge. Prior works on correlation-based variable screening either impose statistical priors on the linear model or assume specific post-screening inference methods. This paper first extends the analysis of correlation-based variable screening to arbitrary linear models and post-screening inference techniques. In particular, (i) it shows that a condition---termed the screening condition---is sufficient for successful correlation-based screening of linear models, and (ii) it provides insights into the dependence of marginal correlation-based screening on different problem parameters. Numerical experiments confirm that these insights are not mere artifacts of analysis; rather, they are reflective of the challenges associated with marginal correlation-based variable screening. Second, the paper explicitly derives the screening condition for arbitrary (random or deterministic) linear models and, in the process, it establishes that---under appropriate conditions---it is possible to reduce the dimension of an ultrahigh-dimensional, arbitrary linear model to almost the sample size even when the number of active variables scales almost linearly with the sample size. Third, it specializes the screening condition to sub-Gaussian linear models and contrasts the final results to those existing in the literature. This specialization formally validates the claim that the main result of this paper generalizes existing ones on correlation-based screening.
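The basic idea of marginal correlation-based screening can be sketched as follows: score each variable by the magnitude of its inner product with the response and keep the `d` highest-scoring variables for later inference. This is a generic illustration under assumed synthetic data, not the ExSIS procedure or its screening condition. The function name `marginal_correlation_screen` and all parameter values are hypothetical.

```python
import random


def marginal_correlation_screen(X, y, d):
    """Return the sorted indices of the d columns of X with the largest
    |<column, y>|. Assumes the columns of X have roughly equal norms."""
    n, p = len(X), len(X[0])
    scores = [abs(sum(X[i][j] * y[i] for i in range(n))) for j in range(p)]
    top = sorted(range(p), key=lambda j: scores[j], reverse=True)[:d]
    return sorted(top)


# Synthetic example: n = 100 samples, p = 400 variables, 2 active variables.
rng = random.Random(1)
n, p = 100, 400
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
active = [3, 120]
y = [5.0 * X[i][3] - 5.0 * X[i][120] + 0.1 * rng.gauss(0, 1) for i in range(n)]
kept = marginal_correlation_screen(X, y, d=40)
```

Here the screened model has 40 variables instead of 400, which is below the sample size, so a standard inference method can be run on the columns indexed by `kept`.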

preprint, 2020, arXiv

Human Action Attribute Learning From Video Data Using Low-Rank Representations

Representation of human actions as a sequence of human body movements or action attributes enables the development of models for human activity recognition and summarization. We present an extension of the low-rank representation (LRR) model, termed the clustering-aware structure-constrained low-rank representation (CS-LRR) model, for unsupervised learning of human action attributes from video data. Our model is based on the union-of-subspaces (UoS) framework, and integrates spectral clustering into the LRR optimization problem for better subspace clustering results. We lay out an efficient linear alternating direction method to solve the CS-LRR optimization problem. We also introduce a hierarchical subspace clustering approach, termed hierarchical CS-LRR, to learn the attributes without the need for a priori specification of their number. By visualizing and labeling these action attributes, the hierarchical model can be used to semantically summarize long video sequences of human actions at multiple resolutions. A human action or activity can also be uniquely represented as a sequence of transitions from one action attribute to another, which can then be used for human action recognition. We demonstrate the effectiveness of the proposed model for semantic summarization and action recognition through comprehensive experiments on five real-world human action datasets.

preprint, 2020, arXiv

Learning Mixtures of Separable Dictionaries for Tensor Data: Analysis and Algorithms

This work addresses the problem of learning sparse representations of tensor data using structured dictionary learning. It proposes learning a mixture of separable dictionaries to better capture the structure of tensor data by generalizing the separable dictionary learning model. Two different approaches for learning mixture of separable dictionaries are explored and sufficient conditions for local identifiability of the underlying dictionary are derived in each case. Moreover, computational algorithms are developed to solve the problem of learning mixture of separable dictionaries in both batch and online settings. Numerical experiments are used to show the usefulness of the proposed model and the efficacy of the developed algorithms.

preprint, 2020, arXiv

Learning Product Graphs Underlying Smooth Graph Signals

Real-world data is often associated with irregular structures that can analytically be represented as graphs. Having access to this graph, which is sometimes trivially evident from domain knowledge, provides a better representation of the data and facilitates various information processing tasks. However, in cases where the underlying graph is unavailable, it needs to be learned from the data itself for data representation, data processing and inference purposes. Existing literature on learning graphs from data has mostly considered arbitrary graphs, whereas the graphs generating real-world data tend to have additional structure that can be incorporated in the graph learning procedure. Structure-aware graph learning methods require learning fewer parameters and have the potential to reduce computational, memory and sample complexities. In light of this, the focus of this paper is to devise a method to learn structured graphs from data that are given in the form of product graphs. Product graphs arise naturally in many real-world datasets and provide an efficient and compact representation of large-scale graphs through several smaller factor graphs. To this end, first the graph learning problem is posed as a linear program, which (on average) outperforms the state-of-the-art graph learning algorithms. This formulation is of independent interest itself as it shows that graph learning is possible through a simple linear program. Afterwards, an alternating minimization-based algorithm aimed at learning various types of product graphs is proposed, and local convergence guarantees to the true solution are established for this algorithm. Finally, the performance gains, reduced sample complexity, and inference capabilities of the proposed algorithm over existing methods are also validated through numerical simulations on synthetic and real datasets.

preprint, 2019, arXiv

ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning

Distributed machine learning algorithms enable learning of models from datasets that are distributed over a network without gathering the data at a centralized location. While efficient distributed algorithms have been developed under the assumption of faultless networks, failures that can render these algorithms nonfunctional occur frequently in the real world. This paper focuses on the problem of Byzantine failures, which are the hardest to safeguard against in distributed algorithms. While Byzantine fault tolerance has a rich history, existing work does not translate into efficient and practical algorithms for high-dimensional learning in fully distributed (also known as decentralized) settings. In this paper, an algorithm termed Byzantine-resilient distributed coordinate descent (ByRDiE) is developed and analyzed that enables distributed learning in the presence of Byzantine failures. Theoretical analysis (convex settings) and numerical experiments (convex and nonconvex settings) highlight its usefulness for high-dimensional distributed learning in the presence of Byzantine failures.

preprint, 2016, arXiv

A Multiple Hypothesis Testing Approach to Low-Complexity Subspace Unmixing

Subspace-based signal processing traditionally focuses on problems involving a few subspaces. Recently, a number of problems in different application areas have emerged that involve a significantly larger number of subspaces relative to the ambient dimension. It becomes imperative in such settings to first identify a smaller set of active subspaces that contribute to the observation before further processing can be carried out. This problem of identification of a small set of active subspaces among a huge collection of subspaces from a single (noisy) observation in the ambient space is termed subspace unmixing. This paper formally poses the subspace unmixing problem under the parsimonious subspace-sum (PS3) model, discusses connections of the PS3 model to problems in wireless communications, hyperspectral imaging, high-dimensional statistics and compressed sensing, and proposes a low-complexity algorithm, termed marginal subspace detection (MSD), for subspace unmixing. The MSD algorithm turns the subspace unmixing problem for the PS3 model into a multiple hypothesis testing (MHT) problem and its analysis in the paper helps control the family-wise error rate of this MHT problem at any level $\alpha \in [0,1]$ under two random signal generation models. Some other highlights of the analysis of the MSD algorithm include: (i) it is applicable to an arbitrary collection of subspaces on the Grassmann manifold; (ii) it relies on properties of the collection of subspaces that are computable in polynomial time; and (iii) it allows for linear scaling of the number of active subspaces as a function of the ambient dimension. Finally, numerical results are presented in the paper to better understand the performance of the MSD algorithm.

preprint, 2015, arXiv

Identification of Linear Time-Varying Systems Through Waveform Diversity

Linear, time-varying (LTV) systems composed of time shifts, frequency shifts, and complex amplitude scalings are operators that act on continuous finite-energy waveforms. This paper presents a novel, resource-efficient method for identifying the parametric description of such systems, i.e., the time shifts, frequency shifts, and scalings, from the sampled response to linear frequency modulated (LFM) waveforms, with emphasis on the application to radar processing. If the LTV operator is probed with a sufficiently diverse set of LFM waveforms, then the system can be identified with high accuracy. In the case of noiseless measurements, the identification is perfect, while in the case of noisy measurements, the accuracy is inversely proportional to the noise level. The use of parametric estimation techniques with recently proposed denoising algorithms allows the estimation of the parameters with high accuracy.

preprint, 2013, arXiv

Finding Zeros: Greedy Detection of Holes

In this paper, motivated by the setting of white-space detection [1], we present theoretical and empirical results for detection of the zero-support $E$ of $x \in \mathbb{C}^p$ ($x_i = 0$ for $i \in E$) with reduced-dimension linear measurements. We propose two low-complexity algorithms based on one-step thresholding [2] for this purpose. The second algorithm is a variant of the first that further assumes the presence of group structure in the target signal $x$ [3]. Performance guarantees for both algorithms, based on the worst-case and average coherence (group coherence) of the measurement matrix, are presented along with the empirical performance of the algorithms.
