Researcher profile

Alfred O. Hero III

Alfred O. Hero III contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, Unclaimed author
9 works
0 followers
9 topics
4 close collaborators

Published work

9 published items

preprint, 2022, arXiv

Orthonormal Sketches for Secure Coded Regression

In this work, we propose a method for speeding up distributed linear regression while ensuring security. We leverage randomized sketching techniques and improve straggler resilience in asynchronous systems. Specifically, we apply a random orthonormal matrix and then subsample in blocks, to simultaneously secure the information and reduce the dimension of the regression problem. In our setup, a centralized coded computing network, the transformation corresponds to an encoded encryption in an approximate gradient coding scheme, and the subsampling corresponds to the responses of the non-straggling workers. We focus on the special case of the Subsampled Randomized Hadamard Transform, which we generalize to block sampling, and discuss how it can be used to secure the data. We illustrate the performance through numerical experiments.
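The idea of applying a random orthonormal transform and then keeping whole blocks of rows can be sketched as follows. This is only an illustration under stated assumptions, not the authors' scheme: the function names (`hadamard`, `block_srht_sketch`), the uniform choice of blocks, and the rescaling factor are hypothetical, and the row count is assumed to be a power of two that is divisible by the number of blocks.

```python
import numpy as np

def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H

def block_srht_sketch(A, b, num_blocks, blocks_kept, rng):
    """Rotate (A, b) by a random orthonormal matrix, then keep whole row blocks.

    Hypothetical helper: each kept block stands in for the response of one
    non-straggling worker.
    """
    n = A.shape[0]
    assert n % num_blocks == 0, "rows must split evenly into blocks"
    signs = rng.choice([-1.0, 1.0], size=n)
    # Q = H D / sqrt(n) is orthonormal: normalized Hadamard times random signs.
    Q = (hadamard(n) / np.sqrt(n)) * signs
    block_size = n // num_blocks
    kept = rng.choice(num_blocks, size=blocks_kept, replace=False)
    rows = np.concatenate(
        [np.arange(k * block_size, (k + 1) * block_size) for k in kept]
    )
    # Rescale so the sketched Gram matrix is unbiased for the full one.
    scale = np.sqrt(num_blocks / blocks_kept)
    return scale * (Q[rows] @ A), scale * (Q[rows] @ b)
```

Solving the least squares problem on the returned pair gives an approximate regression solution; when every block is kept the transform is a pure rotation, so the solution matches the original problem exactly.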

preprint, 2022, arXiv

SOLBP: Second-Order Loopy Belief Propagation for Inference in Uncertain Bayesian Networks

In second-order uncertain Bayesian networks, the conditional probabilities are only known within distributions, i.e., probabilities over probabilities. The delta-method has been applied to extend exact first-order inference methods to propagate both means and variances through sum-product networks derived from Bayesian networks, thereby characterizing epistemic uncertainty, or the uncertainty in the model itself. Alternatively, second-order belief propagation has been demonstrated for polytrees but not for general directed acyclic graph structures. In this work, we extend Loopy Belief Propagation to the setting of second-order Bayesian networks, giving rise to Second-Order Loopy Belief Propagation (SOLBP). For second-order Bayesian networks, SOLBP generates inferences consistent with those generated by sum-product networks, while being more computationally efficient and scalable.

preprint, 2022, arXiv

Straggler Robust Distributed Matrix Inverse Approximation

Inverting large full-rank matrices is a cumbersome operation that appears in many processes and applications across numerical analysis, linear algebra, optimization, machine learning and engineering. It raises numerical stability and complexity issues and has a high expected computation time. We address the latter issue by proposing an algorithm that uses a black-box least squares optimization solver as a subroutine to estimate the inverse (and pseudoinverse) of real nonsingular matrices by estimating their columns. This also allows the algorithm to be performed in a distributed manner, so the estimate can be obtained much faster and can be made robust to stragglers. Furthermore, we assume a centralized network with no message passing between the computing nodes, and do not require a matrix factorization (e.g. LU, SVD or QR decomposition) beforehand.
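The column-by-column idea can be illustrated with a minimal sketch: column i of the inverse solves the least squares problem min_x ||A x - e_i||, where e_i is the i-th standard basis vector, and the n problems are independent, so each could be handed to a different worker. The names `lstsq_solver` and `approx_inverse_by_columns` are hypothetical, and `np.linalg.lstsq` is only a stand-in for the black-box solver (it factorizes internally, unlike the solver the abstract has in mind); an iterative method could be passed in instead.

```python
import numpy as np

def lstsq_solver(A, y):
    """Stand-in black-box solver for min_x ||A x - y||_2 (an assumption)."""
    return np.linalg.lstsq(A, y, rcond=None)[0]

def approx_inverse_by_columns(A, solve=lstsq_solver):
    """Estimate the inverse of a square nonsingular A one column at a time.

    Each loop iteration is an independent task, which is what makes the
    computation easy to distribute across workers.
    """
    n = A.shape[0]
    identity = np.eye(n)
    columns = [solve(A, identity[:, i]) for i in range(n)]
    return np.column_stack(columns)
```

For example, calling `approx_inverse_by_columns(A)` on a small well-conditioned matrix and checking that `A @ X` is close to the identity is a quick sanity test of any solver plugged in through `solve`.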

preprint, 2022, arXiv

Uncertain Bayesian Networks: Learning from Incomplete Data

When the historical data are limited, the conditional probabilities associated with the nodes of Bayesian networks are uncertain and can be empirically estimated. Second-order estimation methods provide a framework for both estimating the probabilities and quantifying the uncertainty in these estimates. We refer to these cases as uncertain or second-order Bayesian networks. When such data are complete, i.e., all variable values are observed for each instantiation, the conditional probabilities are known to be Dirichlet-distributed. This paper improves the current state-of-the-art approaches for handling uncertain Bayesian networks by enabling them to learn distributions for their parameters, i.e., conditional probabilities, with incomplete data. We extensively evaluate various methods to learn the posterior of the parameters through the desired and empirically derived strength of confidence bounds for various queries.

preprint, 2020, arXiv

Fundamental Limits of Deep Graph Convolutional Networks

Graph convolutional networks (GCNs) are a widely used method for graph representation learning. To elucidate the capabilities and limitations of GCNs, we investigate their power, as a function of their number of layers, to distinguish between different random graph models (corresponding to different class-conditional distributions in a classification problem) on the basis of the embeddings of their sample graphs. In particular, the graph models that we consider arise from graphons, which are the most general possible parameterizations of infinite exchangeable graph models and which are the central objects of study in the theory of dense graph limits. We give a precise characterization of the set of pairs of graphons that are indistinguishable by a GCN with nonlinear activation functions coming from a certain broad class if its depth is at least logarithmic in the size of the sample graph. This characterization is in terms of a degree profile closeness property. Outside this class, a very simple GCN architecture suffices for distinguishability. We then exhibit a concrete, infinite class of graphons arising from stochastic block models that are well-separated in terms of cut distance and are indistinguishable by a GCN. These results theoretically match empirical observations of several prior works. To prove our results, we exploit a connection to random walks on graphs. Finally, we give empirical results on synthetic and real graph classification datasets, indicating that indistinguishable graph distributions arise in practice.

preprint, 2020, arXiv

Numerically Stable Binary Gradient Coding

A major hurdle in machine learning is scalability to massive datasets. One approach to overcoming this is to distribute the computational tasks among several workers. Gradient coding has recently been proposed in distributed optimization to compute the gradient of an objective function using multiple, possibly unreliable, worker nodes. By designing distributed coded schemes, gradient coded computations can be made resilient to stragglers, i.e., nodes with longer response times compared to other nodes in a distributed network. Most such schemes rely on operations over the real or complex numbers and are inherently numerically unstable. We present a binary scheme which avoids such operations, thereby enabling numerically stable distributed computation of the gradient. In addition, some restrictive assumptions in prior work are dropped, and a more efficient decoding is given.
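To make the notion of a binary (0/1) gradient code concrete, here is a minimal sketch of a fractional-repetition-style assignment. It is not the scheme of this paper: it assumes that the number of stragglers plus one divides the number of workers, and the names `assign_partitions` and `decode` are hypothetical. Encoding and decoding use only additions, so no real-valued coefficients are involved.

```python
import numpy as np

def assign_partitions(num_workers, num_stragglers):
    """Binary matrix B with B[w, p] = 1 if worker w holds data partition p.

    Workers are split into (num_stragglers + 1) groups; inside every group
    the partitions are divided disjointly, so each group covers all data.
    """
    groups = num_stragglers + 1
    assert num_workers % groups == 0, "assumed: (s + 1) divides the worker count"
    per_group = num_workers // groups
    B = np.zeros((num_workers, num_workers), dtype=int)
    for w in range(num_workers):
        j = w % per_group  # position of the worker inside its group
        B[w, j * groups:(j + 1) * groups] = 1
    return B

def decode(messages, responders, num_workers, num_stragglers):
    """Recover the full gradient by summing one group that fully responded."""
    groups = num_stragglers + 1
    per_group = num_workers // groups
    for g in range(groups):
        members = range(g * per_group, (g + 1) * per_group)
        if all(w in responders for w in members):
            return sum(messages[w] for w in members)
    raise ValueError("more stragglers than the scheme tolerates")
```

Each worker `w` would send `B[w] @ partial_gradients`, the plain sum of the partial gradients it holds. With at most `num_stragglers` missing workers, at least one group is intact, and summing its messages yields the full gradient.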

preprint, 2020, arXiv

The Power of Graph Convolutional Networks to Distinguish Random Graph Models: Short Version

Graph convolutional networks (GCNs) are a widely used method for graph representation learning. We investigate the power of GCNs, as a function of their number of layers, to distinguish between different random graph models on the basis of the embeddings of their sample graphs. In particular, the graph models that we consider arise from graphons, which are the most general possible parameterizations of infinite exchangeable graph models and which are the central objects of study in the theory of dense graph limits. We exhibit an infinite class of graphons that are well-separated in terms of cut distance and are indistinguishable by a GCN with nonlinear activation functions coming from a certain broad class if its depth is at least logarithmic in the size of the sample graph. These results theoretically match empirical observations of several prior works. Finally, we show a converse result that for pairs of graphons satisfying a degree profile separation property, a very simple GCN architecture suffices for distinguishability. To prove our results, we exploit a connection to random walks on graphs.

preprint, 2020, arXiv

Weighted Gradient Coding with Leverage Score Sampling

A major hurdle in machine learning is scalability to massive datasets. Approaches to overcome this hurdle include compression of the data matrix and distributing the computations. Leverage score sampling provides a compressed approximation of a data matrix using an importance-weighted subset. Gradient coding has recently been proposed in distributed optimization to compute the gradient using multiple unreliable worker nodes. By designing coding matrices, gradient coded computations can be made resilient to stragglers, which are nodes in a distributed network that degrade system performance. We present a novel weighted leverage score approach that achieves improved performance for distributed gradient coding by using importance sampling.
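Standard leverage score sampling, the building block named above, can be sketched as follows. This shows only the textbook sampling step, not the weighted gradient coding scheme of the paper; the function names are hypothetical and the data matrix is assumed to have full column rank.

```python
import numpy as np

def leverage_scores(A):
    """Row leverage scores: squared row norms of an orthonormal basis of range(A)."""
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    return np.sum(U ** 2, axis=1)

def leverage_sample(A, b, num_samples, rng):
    """Sample rows with probability proportional to leverage, then reweight.

    The 1 / sqrt(num_samples * p_i) weights make the sampled Gram matrix an
    unbiased estimate of A.T @ A.
    """
    scores = leverage_scores(A)
    probs = scores / scores.sum()
    idx = rng.choice(A.shape[0], size=num_samples, replace=True, p=probs)
    weights = 1.0 / np.sqrt(num_samples * probs[idx])
    return weights[:, None] * A[idx], weights * b[idx]
```

For a matrix with d linearly independent columns the scores sum to d, so rows with scores well above d divided by the row count are the ones the sampler favors.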

preprint, 2019, arXiv

Semi-supervised Learning in Network-Structured Data via Total Variation Minimization

We propose and analyze a method for semi-supervised learning from partially labeled network-structured data. Our approach is based on a graph signal recovery interpretation under a clustering hypothesis that labels of data points belonging to the same well-connected subset (cluster) have similar values. This lends itself naturally to learning the labels by total variation (TV) minimization, which we solve by applying a recently proposed primal-dual method for non-smooth convex optimization. The resulting algorithm allows for a highly scalable implementation using message passing over the underlying empirical graph, which renders the algorithm suitable for big data applications. By applying tools of compressed sensing, we derive a sufficient condition on the underlying network structure such that TV minimization recovers clusters in the empirical graph of the data. In particular, we show that the proposed primal-dual method amounts to maximizing network flows over the empirical graph of the dataset. Moreover, the learning accuracy of the proposed algorithm is linked to the set of network flows between data points having known labels. The effectiveness and scalability of our approach are verified by numerical experiments.