Trust snapshot

Quick read

Trust 21 - Emerging · Verification L1 · Unclaimed author
47 works
0 followers
12 topics
4 close collaborators

Published work

47 published item(s)

preprint · 2026 · arXiv

Federated Martingale Posterior Sampling

Federated Bayesian neural networks require fixing a prior on the model parameters together with a likelihood. Eliciting meaningful priors on the weight space of modern overparameterized models is notoriously difficult, and misspecification of either component can severely degrade accuracy and calibration. Motivated by the rapid progress of predictive models such as large language models, the martingale posterior, also known as predictive Bayes, replaces the prior-likelihood pair with a predictive distribution and recovers parameter uncertainty by repeatedly drawing predictive samples and refitting the model. A direct federated implementation, however, would require clients to share their local data sets. This letter proposes federated martingale posterior (FMP) sampling, a one-shot, embarrassingly parallel protocol in which each client uploads a small set of trainable data embeddings and the server runs the predictive sampler centrally. Experiments on MNIST, CIFAR-10, and CIFAR-100 show that FMP closely matches its centralized counterpart and significantly improves calibration over consensus-style baselines.
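
To make the predictive-resampling idea concrete, here is a minimal, centralized Python sketch for a scalar mean. It assumes the simplest possible predictive, the empirical (Pólya-urn) predictive, which draws each imputed value uniformly from the observed values and all values imputed so far; it does not model the federated protocol, the trainable data embeddings, or a neural predictive. The function name and parameters are illustrative.

```python
import random
import statistics

def predictive_resample_mean(data, n_future=1000, n_posterior=200, seed=0):
    """Draw approximate martingale-posterior samples of a population mean.

    Assumed predictive: the empirical (Polya-urn) predictive, which draws each
    imputed value uniformly from the observed values plus all values imputed
    so far. Each completed sequence is summarized by its mean, giving one
    posterior draw.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_posterior):
        seq = list(data)
        for _ in range(n_future):
            # Sample from the current predictive, then condition on the draw.
            seq.append(rng.choice(seq))
        draws.append(statistics.fmean(seq))
    return draws

draws = predictive_resample_mean([1.0, 2.0, 3.0, 4.0, 5.0])
print(round(statistics.fmean(draws), 2), round(statistics.stdev(draws), 2))
```

With this particular predictive the draws approximate the Bayesian bootstrap; a richer predictive (for example, a trained model) would replace the `rng.choice` step.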

preprint · 2026 · arXiv

Modern Neuromorphic AI: From Intra-Token to Inter-Token Processing

The rapid growth of artificial intelligence (AI) has brought novel data processing and generative capabilities but also escalating energy requirements. This challenge motivates renewed interest in neuromorphic computing principles, which promise brain-like efficiency through discrete and sparse activations, recurrent dynamics, and non-linear feedback. In fact, modern AI architectures increasingly embody neuromorphic principles through heavily quantized activations, state-space dynamics, and sparse attention mechanisms. This paper elaborates on the connections between neuromorphic models, state-space models, and transformer architectures through the lens of the distinction between intra-token processing and inter-token processing. Most early work on neuromorphic AI was based on spiking neural networks (SNNs) for intra-token processing, i.e., for transformations involving multiple channels, or features, of the same vector input, such as the pixels of an image. In contrast, more recent research has explored how neuromorphic principles can be leveraged to design efficient inter-token processing methods, which selectively combine different information elements depending on their contextual relevance. Implementing associative memorization mechanisms, these approaches leverage state-space dynamics or sparse self-attention. Along with a systematic presentation of modern neuromorphic AI models through the lens of intra-token and inter-token processing, training methodologies for neuromorphic AI models are also reviewed. These range from surrogate gradients leveraging parallel convolutional processing to local learning rules based on reinforcement learning mechanisms.

preprint · 2026 · arXiv

Quantize-Sample-and-Verify: LLM Acceleration via Adaptive Edge-Cloud Speculative Decoding

In edge-cloud speculative decoding (SD), edge devices equipped with small language models (SLMs) generate draft tokens that are verified by large language models (LLMs) in the cloud. A key bottleneck in such systems is the limited communication bandwidth between edge and cloud, which necessitates quantization of the information transmitted about generated tokens. In this work, we introduce a novel quantize-sample (Q-S) strategy that provably preserves the output distribution of the cloud-based model, ensuring that the verified tokens match the distribution of those that would have been generated directly by the LLM. We develop a throughput model for edge-cloud SD that explicitly accounts for communication latency. Leveraging this model, we propose an adaptive mechanism that optimizes token throughput by dynamically adjusting the draft length and quantization precision in response to both semantic uncertainty and channel conditions. Simulations demonstrate that the proposed Q-S approach significantly improves decoding efficiency in realistic edge-cloud deployment scenarios.
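
The distribution-preservation property rests on the verification rule of standard speculative sampling. A minimal Python sketch of that rule follows, assuming the cloud receives the draft distribution q exactly; the Q-S quantization, the throughput model, and the adaptive draft length from the abstract are not modeled, and the function name is hypothetical.

```python
import random

def verify_draft_token(x, p, q, rng):
    """Speculative-sampling check of one drafted token x.

    p: target (cloud LLM) probabilities, q: draft (edge SLM) probabilities,
    both dicts mapping token -> probability at the current position.
    Returns (emitted token, accepted flag); the emitted token is distributed
    according to p whenever x was drawn from q.
    """
    # Accept the draft with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p.get(x, 0.0) / q[x]):
        return x, True
    # On rejection, resample from the normalized residual max(0, p - q).
    residual = {t: max(0.0, pt - q.get(t, 0.0)) for t, pt in p.items()}
    r = rng.random() * sum(residual.values())
    for t, w in residual.items():
        r -= w
        if r <= 0 and w > 0:
            return t, False
    # Floating-point guard: fall back to the token with the largest residual.
    return max(residual, key=residual.get), False
```

Because the residual only puts mass where the draft under-proposes, accepted and resampled tokens together reproduce p exactly, whatever q is; a poorer draft only lowers the acceptance rate.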

preprint · 2023 · arXiv

Neuromorphic Wireless Cognition: Event-Driven Semantic Communications for Remote Inference

Neuromorphic computing is an emerging computing paradigm that moves away from batched processing towards the online, event-driven processing of streaming data. Neuromorphic chips, when coupled with spike-based sensors, can inherently adapt to the "semantics" of the data distribution by consuming energy only when relevant events are recorded in the timing of spikes and by providing a low-latency response to changing conditions in the environment. This paper proposes an end-to-end design for a neuromorphic wireless Internet-of-Things system that integrates spike-based sensing, processing, and communication. In the proposed NeuroComm system, each sensing device is equipped with a neuromorphic sensor, a spiking neural network (SNN), and an impulse radio transmitter with multiple antennas. Transmission takes place over a shared fading channel to a receiver equipped with a multi-antenna impulse radio and an SNN. In order to enable adaptation of the receiver to the fading channel conditions, we introduce a hypernetwork that controls the weights of the decoding SNN using pilots. The pilots, encoding SNNs, decoding SNN, and hypernetwork are jointly trained across multiple channel realizations. The proposed system is shown to significantly improve over conventional frame-based digital solutions, as well as over alternative non-adaptive training methods, in terms of time-to-accuracy and energy consumption metrics.
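
As a reference point for the spike-based processing described above, a discrete-time leaky integrate-and-fire (LIF) neuron, a common SNN building block, can be sketched in Python as follows. The leak factor, threshold, and reset-by-subtraction rule are illustrative assumptions, not details of the NeuroComm design.

```python
def lif_neuron(inputs, beta=0.9, threshold=1.0):
    """Discrete-time leaky integrate-and-fire neuron.

    inputs: sequence of input currents, one per time step.
    beta: membrane leak factor in (0, 1); threshold: firing threshold.
    Returns a list of binary spikes; the membrane potential is reduced by
    the threshold after each spike (soft reset).
    """
    v = 0.0
    spikes = []
    for x in inputs:
        v = beta * v + x                 # leaky integration of the input
        s = 1 if v >= threshold else 0   # fire when the threshold is crossed
        v -= s * threshold               # soft reset after a spike
        spikes.append(s)
    return spikes

print(lif_neuron([0.6] * 4))
```

With zero input the neuron emits no spikes, so in a spiking implementation activity, and hence energy, tracks the input events rather than a fixed frame rate.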

preprint · 2022 · arXiv

Adaptive Worker Grouping For Communication-Efficient and Straggler-Tolerant Distributed SGD

Wall-clock convergence time and communication load are key performance metrics for the distributed implementation of stochastic gradient descent (SGD) in parameter server settings. Communication-adaptive distributed Adam (CADA) has been recently proposed as a way to reduce communication load via the adaptive selection of workers. CADA is subject to performance degradation in terms of wall-clock convergence time in the presence of stragglers. This paper proposes a novel scheme named grouping-based CADA (G-CADA) that retains the advantages of CADA in reducing the communication load, while increasing the robustness to stragglers at the cost of additional storage at the workers. G-CADA partitions the workers into groups, and all workers in a group are assigned the same data shards. Groups are scheduled adaptively at each iteration, and the server only waits for the fastest worker in each selected group. We provide analysis and experimental results that demonstrate the significant gains of G-CADA over other benchmark schemes in terms of wall-clock time, communication load, and computation load.
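
The waiting rule in the last sentences determines the per-iteration wall-clock time. The Python sketch below computes it under the assumption of i.i.d. exponential worker delays; CADA's adaptive group selection and the storage cost are not modeled, and the function name is hypothetical.

```python
import random

def iteration_time(groups, selected, rng, mean_delay=1.0):
    """Wall-clock time of one iteration when the server waits only for the
    fastest worker in each selected group.

    groups: list of lists of worker ids (workers in a group share data shards).
    selected: indices of the groups scheduled at this iteration.
    Worker compute times are drawn as i.i.d. exponentials (assumption).
    """
    per_group = []
    for g in selected:
        times = [rng.expovariate(1.0 / mean_delay) for _ in groups[g]]
        per_group.append(min(times))  # the first finisher in the group suffices
    return max(per_group)             # the server still needs every selected group

rng = random.Random(0)
print(iteration_time([[0, 1, 2], [3, 4, 5]], [0, 1], rng))
```

In this toy model, going from single-worker groups to groups of three cuts the expected time of a two-group iteration from 1.5 to 0.5 time units, because each group's delay becomes the minimum of three exponentials.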

preprint · 2022 · arXiv

AI-Based Channel Prediction in D2D Links: An Empirical Validation

Device-to-Device (D2D) communication propelled by artificial intelligence (AI) will be an allied technology that will improve system performance and support new services in advanced wireless networks (5G, 6G and beyond). In this paper, AI-based deep learning techniques are applied to D2D links operating at 5.8 GHz with the aim of providing potential answers to the following questions concerning the prediction of the received signal strength variations: i) how effective is the prediction as a function of the coherence time of the channel? and ii) what is the minimum number of input samples required for a target prediction performance? To this end, a variety of measurement environments and scenarios are considered, including an indoor open-office area, an outdoor open-space, line of sight (LOS), non-LOS (NLOS), and mobile scenarios. Four deep learning models are explored, namely long short-term memory networks (LSTMs), gated recurrent units (GRUs), convolutional neural networks (CNNs), and dense or feedforward networks (FFNs). Linear regression is used as a baseline model. It is observed that GRUs and LSTMs present equivalent performance, and both are superior when compared to CNNs, FFNs and linear regression. This indicates that GRUs and LSTMs are able to better account for temporal dependencies in the D2D data sets. We also provide recommendations on the minimum input lengths that yield the required performance given the channel coherence time. For instance, to predict 17 and 23 ms into the future, in indoor and outdoor LOS environments, respectively, an input length of 25 ms is recommended. This indicates that the bulk of the learning is done within the coherence time of the channel, and that large input lengths may not always be beneficial.

preprint · 2022 · arXiv

An Introduction to Quantum Machine Learning for Engineers

In the current noisy intermediate-scale quantum (NISQ) era, quantum machine learning is emerging as a dominant paradigm to program gate-based quantum computers. In quantum machine learning, the gates of a quantum circuit are parametrized, and the parameters are tuned via classical optimization based on data and on measurements of the outputs of the circuit. Parametrized quantum circuits (PQCs) can efficiently address combinatorial optimization problems, implement probabilistic generative models, and carry out inference (classification and regression). This monograph provides a self-contained introduction to quantum machine learning for an audience of engineers with a background in probability and linear algebra. It first describes the background, concepts, and tools needed to describe quantum operations and measurements. Then, it covers parametrized quantum circuits, the variational quantum eigensolver, as well as unsupervised and supervised quantum machine learning formulations.
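
The hybrid loop described above (a parametrized circuit is evaluated, then its parameters are updated classically) can be illustrated with a one-qubit toy example in Python. The sketch assumes a single RY rotation applied to |0> and exact expectation values; on hardware, <Z> would be estimated from measurement shots. The gradient uses the parameter-shift rule, a standard technique for such rotation gates; all names are illustrative.

```python
import math

def expval_z(theta):
    """<Z> after applying RY(theta) to |0>, computed from the state vector."""
    amp0, amp1 = math.cos(theta / 2), math.sin(theta / 2)
    return amp0 ** 2 - amp1 ** 2  # Born-rule weights of outcomes +1 and -1

def parameter_shift_grad(f, theta, shift=math.pi / 2):
    """Gradient from two extra circuit evaluations (parameter-shift rule)."""
    return (f(theta + shift) - f(theta - shift)) / 2

# Classical optimizer: gradient descent that drives <Z> towards -1.
theta = 0.1
for _ in range(200):
    theta -= 0.2 * parameter_shift_grad(expval_z, theta)
print(theta, expval_z(theta))
```

For this circuit <Z> = cos(theta), so the shift rule returns exactly -sin(theta), and gradient descent moves theta to pi, where <Z> = -1.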

preprint · 2022 · arXiv

Bayesian Active Meta-Learning for Black-Box Optimization

Data-efficient learning algorithms are essential in many practical applications for which data collection is expensive, e.g., for the optimal deployment of wireless systems in unknown propagation scenarios. Meta-learning can address this problem by leveraging data from a set of related learning tasks, e.g., from similar deployment settings. In practice, one may have available only unlabeled data sets from the related tasks, requiring a costly labeling procedure to be carried out before use in meta-learning. For instance, one may know the possible positions of base stations in a given area, but not the performance indicators achievable with each deployment. To decrease the number of labeling steps required for meta-learning, this paper introduces an information-theoretic active task selection mechanism, and evaluates an instantiation of the approach for Bayesian optimization of black-box models.

preprint · 2022 · arXiv

Bayesian Active Meta-Learning for Few Pilot Demodulation and Equalization

Two of the main principles underlying the life cycle of an artificial intelligence (AI) module in communication networks are adaptation and monitoring. Adaptation refers to the need to adjust the operation of an AI module depending on the current conditions, while monitoring requires measures of the reliability of an AI module's decisions. Classical frequentist learning methods for the design of AI modules fall short on both counts of adaptation and monitoring, catering to one-off training and providing overconfident decisions. This paper proposes a solution to address both challenges by integrating meta-learning with Bayesian learning. As a specific use case, the problems of demodulation and equalization over a fading channel based on the availability of few pilots are studied. Meta-learning processes pilot information from multiple frames in order to extract useful shared properties of effective demodulators across frames. The resulting trained demodulators are demonstrated, via experiments, to offer better calibrated soft decisions, at the computational cost of running an ensemble of networks at run time. The capacity to quantify uncertainty in the model parameter space is further leveraged by extending Bayesian meta-learning to an active setting. In this setting, the designer can sequentially select the channel conditions under which to generate data for meta-learning from a channel simulator. Bayesian active meta-learning is seen in experiments to significantly reduce the number of frames required to obtain an efficient adaptation procedure for new frames.

preprint · 2022 · arXiv

Learning Quantum Entanglement Distillation with Noisy Classical Communications

Quantum networking relies on the management and exploitation of entanglement. Practical sources of entangled qubits are imperfect, producing mixed quantum states with reduced fidelity with respect to ideal Bell pairs. Therefore, an important primitive for quantum networking is entanglement distillation, whose goal is to enhance the fidelity of entangled qubits through local operations and classical communication (LOCC). Existing distillation protocols assume the availability of ideal, noiseless communication channels. In this paper, we study the case in which communication takes place over noisy binary symmetric channels. We propose to implement local processing through parameterized quantum circuits (PQCs) that are optimized to maximize the average fidelity, while accounting for communication errors. The introduced approach, Noise Aware-LOCCNet (NA-LOCCNet), is shown to have significant advantages over existing protocols designed for noiseless communications.
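
For context, the kind of noiseless-channel protocol the abstract refers to can be sketched with the textbook BBPSSW recurrence for Werner states. The Python sketch below assumes two identical Werner pairs and error-free exchange of the parity-measurement bits; it is a classical baseline, not NA-LOCCNet, and the function name is illustrative.

```python
def bbpssw_round(F):
    """One round of the BBPSSW recurrence for two Werner pairs of fidelity F.

    Returns (output fidelity, success probability), assuming the measurement
    outcomes are exchanged over a noiseless classical channel.
    """
    e = (1 - F) / 3  # weight of each of the three non-target Bell states
    p_success = F ** 2 + 2 * F * e + 5 * e ** 2
    F_out = (F ** 2 + e ** 2) / p_success
    return F_out, p_success

print(bbpssw_round(0.7))
```

F = 0.5 is a fixed point, and for F > 0.5 each successful round raises the fidelity (for example, from 0.7 to about 0.735). If the exchanged bits crossed a binary symmetric channel, flipped outcomes would lead the parties to keep or discard the wrong pairs, which is the effect a noise-aware design has to account for.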

preprint · 2022 · arXiv

Leveraging Channel Noise for Sampling and Privacy via Quantized Federated Langevin Monte Carlo

For engineering applications of artificial intelligence, Bayesian learning holds significant advantages over standard frequentist learning, including the capacity to quantify uncertainty. Langevin Monte Carlo (LMC) is an efficient gradient-based approximate Bayesian learning strategy that aims at producing samples drawn from the posterior distribution of the model parameters. Prior work focused on a distributed implementation of LMC over a multi-access wireless channel via analog modulation. In contrast, this paper proposes quantized federated LMC (FLMC), which integrates one-bit stochastic quantization of the local gradients with channel-driven sampling. Channel-driven sampling leverages channel noise for the purpose of contributing to Monte Carlo sampling, while also serving the role of privacy mechanism. Analog and digital implementations of wireless LMC are compared as a function of differential privacy (DP) requirements, revealing the advantages of the latter at sufficiently high signal-to-noise ratio.
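
One-bit stochastic quantization of a gradient entry can be made unbiased by randomizing the sign. The Python sketch below shows one common construction for scalars; the clipping bound and function name are assumptions, and the paper's exact quantizer and its coupling with channel-driven sampling are not reproduced.

```python
import random

def one_bit_quantize(g, bound, rng):
    """Unbiased one-bit stochastic quantizer for a scalar in [-bound, bound].

    Outputs +bound with probability (1 + g/bound)/2 and -bound otherwise,
    so the expected output equals the (clipped) input g.
    """
    g = max(-bound, min(bound, g))  # clip to the quantizer range
    p_plus = (1 + g / bound) / 2
    return bound if rng.random() < p_plus else -bound

rng = random.Random(0)
print([one_bit_quantize(0.3, 1.0, rng) for _ in range(5)])
```

The quantizer is unbiased but adds variance (bound**2 - g**2 per entry), which is the kind of extra randomness a sampling-oriented design can account for rather than suppress.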

preprint · 2022 · arXiv

Modular Meta-Learning for Power Control via Random Edge Graph Neural Networks

In this paper, we consider the problem of power control for a wireless network with an arbitrarily time-varying topology, including the possible addition or removal of nodes. A data-driven design methodology that leverages graph neural networks (GNNs) is adopted in order to efficiently parametrize the power control policy mapping the channel state information (CSI) to transmit powers. The specific GNN architecture, known as random edge GNN (REGNN), defines a non-linear graph convolutional filter whose spatial weights are tied to the channel coefficients. While prior work assumed a joint training approach whereby the REGNN-based policy is shared across all topologies, this paper targets adaptation of the power control policy based on limited CSI data regarding the current topology. To this end, we propose a novel modular meta-learning technique that enables the efficient optimization of module assignment. While black-box meta-learning optimizes a general-purpose adaptation procedure via (stochastic) gradient descent, modular meta-learning finds a set of reusable modules that can form components of a solution for any new network topology. Numerical results validate the benefits of meta-learning for power control problems over joint training schemes, and demonstrate the advantages of modular meta-learning when data availability is extremely limited.

preprint · 2022 · arXiv

Predicting Flat-Fading Channels via Meta-Learned Closed-Form Linear Filters and Equilibrium Propagation

Predicting fading channels is a classical problem with a vast array of applications, including as an enabler of artificial intelligence (AI)-based proactive resource allocation for cellular networks. Under the assumption that the fading channel follows a stationary complex Gaussian process, as for Rayleigh and Rician fading models, the optimal predictor is linear, and it can be directly computed from the Doppler spectrum via standard linear minimum mean squared error (LMMSE) estimation. However, in practice, the Doppler spectrum is unknown, and the predictor has only access to a limited time series of estimated channels. This paper proposes to leverage meta-learning in order to mitigate the requirements in terms of training data for channel fading prediction. Specifically, it first develops an offline low-complexity solution based on linear filtering via a meta-trained quadratic regularization. Then, an online method is proposed based on gradient descent and equilibrium propagation (EP). Numerical results demonstrate the advantages of the proposed approach, showing its capacity to approach the genie-aided LMMSE solution with a small number of training data points.
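
The offline solution above, linear filtering with a quadratic regularizer, has a closed form. A minimal Python sketch follows, assuming real-valued channel samples (fading coefficients are complex in general) and a regularizer that pulls the filter towards a prior filter w0, which in a meta-learning setting would be estimated from previous frames; the function names are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_linear_predictor(series, order, lam, w0):
    """Linear one-step predictor with a quadratic regularizer centred at w0.

    Minimizes sum_t (h[t] - w . h[t-order:t])**2 + lam * ||w - w0||**2,
    whose solution is w = (X^T X + lam I)^{-1} (X^T y + lam w0).
    """
    X = [series[t - order:t] for t in range(order, len(series))]
    y = [series[t] for t in range(order, len(series))]
    A = [[sum(x[i] * x[j] for x in X) + (lam if i == j else 0.0)
          for j in range(order)] for i in range(order)]
    b = [sum(x[i] * yt for x, yt in zip(X, y)) + lam * w0[i] for i in range(order)]
    return solve(A, b)
```

A small lam trusts the current frame's data, while a large lam falls back on w0, which is what makes a good (meta-learned) w0 useful when only a few training points are available.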

preprint · 2022 · arXiv

Predicting Multi-Antenna Frequency-Selective Channels via Meta-Learned Linear Filters based on Long-Short Term Channel Decomposition

An efficient data-driven prediction strategy for multi-antenna frequency-selective channels must operate based on a small number of pilot symbols. This paper proposes novel channel prediction algorithms that address this goal by integrating transfer and meta-learning with a reduced-rank parametrization of the channel. The proposed methods optimize linear predictors by utilizing data from previous frames, which are generally characterized by distinct propagation characteristics, in order to enable fast training on the time slots of the current frame. The proposed predictors rely on a novel long-short-term decomposition (LSTD) of the linear prediction model that leverages the disaggregation of the channel into long-term space-time signatures and fading amplitudes. We first develop predictors for single-antenna frequency-flat channels based on transfer/meta-learned quadratic regularization. Then, we introduce transfer and meta-learning algorithms for LSTD-based prediction models that build on equilibrium propagation (EP) and alternating least squares (ALS). Numerical results under the 3GPP 5G standard channel model demonstrate the impact of transfer and meta-learning on reducing the number of pilots for channel prediction, as well as the merits of the proposed LSTD parametrization.

preprint · 2022 · arXiv

Quantum-Aided Meta-Learning for Bayesian Binary Neural Networks via Born Machines

Near-term noisy intermediate-scale quantum circuits can efficiently implement implicit probabilistic models in discrete spaces, supporting distributions that are practically infeasible to sample from using classical means. One of the possible applications of such models, also known as Born machines, is probabilistic inference, which is at the core of Bayesian methods. This paper studies the use of Born machines for the problem of training binary Bayesian neural networks. In the proposed approach, a Born machine is used to model the variational distribution of the binary weights of the neural network, and data from multiple tasks is used to reduce training data requirements on new tasks. The method combines gradient-based meta-learning and variational inference via Born machines, and is shown in a prototypical regression problem to outperform conventional joint learning strategies.

preprint · 2022 · arXiv

Robust Bayesian Learning for Reliable Wireless AI: Framework and Applications

This work takes a critical look at the application of conventional machine learning methods to wireless communication problems through the lens of reliability and robustness. Deep learning techniques adopt a frequentist framework, and are known to provide poorly calibrated decisions that do not reproduce the true uncertainty caused by limitations in the size of the training data. Bayesian learning, while in principle capable of addressing this shortcoming, is in practice impaired by model misspecification and by the presence of outliers. Both problems are pervasive in wireless communication settings, in which the capacity of machine learning models is subject to resource constraints and training data is affected by noise and interference. In this context, we explore the application of the framework of robust Bayesian learning. After a tutorial-style introduction to robust Bayesian learning, we showcase the merits of robust Bayesian learning on several important wireless communication problems in terms of accuracy, calibration, and robustness to outliers and misspecification.

preprint · 2022 · arXiv

Robust Design of Rate-Splitting Multiple Access With Imperfect CSI for Cell-Free MIMO Systems

Rate-Splitting Multiple Access (RSMA) for multi-user downlink operates by splitting the message for each user equipment (UE) into a private message and a set of common messages, which are simultaneously transmitted by means of superposition coding. The RSMA scheme can enhance throughput and connectivity as compared to conventional multiple access techniques by optimizing the rate-splitting ratios along with the corresponding downlink beamforming vectors. This work examines the impact of erroneous channel state information (CSI) on the performance of RSMA in cell-free multiple-input multiple-output (MIMO) systems. An efficient robust optimization algorithm is proposed by using closed-form lower bound expressions on the expected data rates. Extensive numerical results show the importance of robust design in the presence of CSI errors and how the performance gain of RSMA over conventional schemes is affected by CSI imperfection.

preprint · 2022 · arXiv

Robust Distributed Bayesian Learning with Stragglers via Consensus Monte Carlo

This paper studies distributed Bayesian learning in a setting encompassing a central server and multiple workers by focusing on the problem of mitigating the impact of stragglers. The standard one-shot, or embarrassingly parallel, Bayesian learning protocol known as consensus Monte Carlo (CMC) is generalized by proposing two straggler-resilient solutions based on grouping and coding. Two main challenges in designing straggler-resilient algorithms for CMC are the need to estimate the statistics of the workers' outputs across multiple shots, and the joint non-linear post-processing of the outputs of the workers carried out at the server. This is in stark contrast to other distributed settings like gradient coding, which only require the per-shot sum of the workers' outputs. The proposed methods, referred to as Group-based CMC (G-CMC) and Coded CMC (C-CMC), leverage redundant computing at the workers in order to enable the estimation of global posterior samples at the server based on partial outputs from the workers. Simulation results show that C-CMC may outperform G-CMC for a small number of workers, while G-CMC is generally preferable for a larger number of workers.
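
The one-shot protocol that G-CMC and C-CMC build on combines the workers' local posterior samples at the server. A minimal Python sketch of the standard precision-weighted consensus rule follows, assuming scalar parameters and known subposterior variances; straggler handling, grouping, and coding are not modeled, and the function name is illustrative.

```python
import random

def consensus_combine(worker_samples, worker_vars):
    """Consensus Monte Carlo: combine the i-th sample of every worker by a
    precision-weighted average (exact when all subposteriors are Gaussian).

    worker_samples: list (per worker) of equally long lists of scalar samples.
    worker_vars: subposterior variance of each worker; weights are 1/var.
    """
    weights = [1.0 / v for v in worker_vars]
    total = sum(weights)
    return [sum(w * s[i] for w, s in zip(weights, worker_samples)) / total
            for i in range(len(worker_samples[0]))]

# Toy run: two workers with Gaussian subposteriors N(1, 1) and N(3, 1).
rng = random.Random(0)
worker1 = [rng.gauss(1.0, 1.0) for _ in range(20000)]
worker2 = [rng.gauss(3.0, 1.0) for _ in range(20000)]
combined = consensus_combine([worker1, worker2], [1.0, 1.0])
```

The combined samples should have mean close to 2 and variance close to 0.5, matching the normalized product of the two Gaussian densities. The non-linear combination needs matching samples from every worker, which is why stragglers are harder to handle here than in gradient coding.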

preprint · 2022 · arXiv

Training Hybrid Classical-Quantum Classifiers via Stochastic Variational Optimization

Quantum machine learning has emerged as a potential practical application of near-term quantum devices. In this work, we study a two-layer hybrid classical-quantum classifier in which a first layer of quantum stochastic neurons implementing generalized linear models (QGLMs) is followed by a second classical combining layer. The input to the first, hidden, layer is obtained via amplitude encoding in order to leverage the exponential size of the fan-in of the quantum neurons in the number of qubits per neuron. To facilitate implementation of the QGLMs, all weights and activations are binary. While the state of the art on training strategies for this class of models is limited to exhaustive search and single-neuron perceptron-like bit-flip strategies, this letter introduces a stochastic variational optimization approach that enables the joint training of quantum and classical layers via stochastic gradient descent. Experiments show the advantages of the approach for a variety of activation functions implemented by QGLM neurons.

preprint · 2022 · arXiv

Wireless Federated Langevin Monte Carlo: Repurposing Channel Noise for Bayesian Sampling and Privacy

Most works on federated learning (FL) focus on the most common frequentist formulation of learning whereby the goal is minimizing the global empirical loss. Frequentist learning, however, is known to be problematic in the regime of limited data as it fails to quantify epistemic uncertainty in prediction. Bayesian learning provides a principled solution to this problem by shifting the optimization domain to the space of distributions over the model parameters. This paper proposes a novel mechanism for the efficient implementation of Bayesian learning in wireless systems. Specifically, we focus on a standard gradient-based Markov Chain Monte Carlo (MCMC) method, namely Langevin Monte Carlo (LMC), and we introduce a novel protocol, termed Wireless Federated LMC (WFLMC), that is able to repurpose channel noise for the double role of seed randomness for MCMC sampling and of privacy preservation. To this end, based on the analysis of the Wasserstein distance between the sample distribution and the global posterior distribution under privacy and power constraints, we introduce a power allocation strategy as the solution of a convex program. The analysis identifies distinct operating regimes in which the performance of the system is power-limited, privacy-limited, or limited by the requirement of MCMC sampling. Both analytical and simulation results demonstrate that, if the channel noise is properly accounted for under suitable conditions, it can be fully repurposed for both MCMC sampling and privacy preservation, obtaining the same performance as in an ideal communication setting that is not subject to privacy constraints.
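
The core idea of repurposing channel noise can be seen in a scalar Langevin update. The Python sketch below assumes a one-dimensional target, Gaussian channel noise of known standard deviation added to the received gradient, and no power allocation or privacy accounting; it tops up the injected noise only to the variance 2 * step that the unadjusted Langevin algorithm requires. Names are hypothetical.

```python
import math
import random

def wireless_lmc(grad, theta0, step, n_steps, channel_std, rng):
    """Langevin Monte Carlo where the gradient arrives corrupted by Gaussian
    channel noise; only the missing part of the Langevin noise is added locally.

    Total injected noise variance per step is 2 * step: step**2 * channel_std**2
    comes from the channel, the remainder from a local Gaussian draw.
    """
    channel_var = (step * channel_std) ** 2
    if channel_var > 2 * step:
        raise ValueError("channel noise exceeds the Langevin noise budget")
    extra_std = math.sqrt(2 * step - channel_var)
    theta, samples = theta0, []
    for _ in range(n_steps):
        received = grad(theta) + rng.gauss(0.0, channel_std)  # noisy over-the-air gradient
        theta = theta - step * received + extra_std * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples
```

For a target N(2, 1), i.e. grad(theta) = theta - 2, the chain's samples should have mean near 2 and variance near 1 (slightly inflated by the discretization). If the channel noise alone exceeds the Langevin budget, the sketch raises an error instead of silently over-dispersing the samples.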

preprint · 2021 · arXiv

An Information-Theoretic Analysis of The Cost of Decentralization for Learning and Inference Under Privacy Constraints

In vertical federated learning (FL), the features of a data sample are distributed across multiple agents. As such, inter-agent collaboration can be beneficial not only during the learning phase, as is the case for standard horizontal FL, but also during the inference phase. A fundamental theoretical question in this setting is how to quantify the cost, or performance loss, of decentralization for learning and/or inference. In this paper, we consider general supervised learning problems with any number of agents, and provide a novel information-theoretic quantification of the cost of decentralization in the presence of privacy constraints on inter-agent communication within a Bayesian framework. The cost of decentralization for learning and/or inference is shown to be quantified in terms of conditional mutual information terms involving features and label variables.

preprint · 2021 · arXiv

Calibration-Aided Edge Inference Offloading via Adaptive Model Partitioning of Deep Neural Networks

Mobile devices can offload deep neural network (DNN)-based inference to the cloud, overcoming local hardware and energy limitations. However, offloading adds communication delay, thus increasing the overall inference time, and hence it should be used only when needed. One approach to this problem is adaptive model partitioning based on early-exit DNNs. Accordingly, the inference starts at the mobile device, and an intermediate layer estimates the accuracy: if the estimated accuracy is sufficient, the device takes the inference decision; otherwise, the remaining layers of the DNN run at the cloud. Thus, the device offloads the inference to the cloud only if it cannot classify a sample with high confidence. This offloading requires a correct accuracy prediction at the device. Nevertheless, DNNs are typically miscalibrated, providing overconfident decisions. This work shows that the employment of a miscalibrated early-exit DNN for offloading via model partitioning can significantly decrease inference accuracy. In contrast, we argue that implementing a calibration algorithm prior to deployment can solve this problem, allowing for more reliable offloading decisions.
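
The offloading rule described above reduces to a confidence threshold at the early exit, and calibration changes the confidence it sees. The Python sketch below uses temperature scaling as an example calibration method, an assumption since the abstract does not name one; the threshold and temperature values are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax of logits divided by a temperature (T > 1 softens confidence)."""
    z = [l / temperature for l in logits]
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def early_exit_decision(logits, threshold, temperature=1.0):
    """Decide on-device if the (calibrated) top-class probability reaches the
    threshold; otherwise offload the remaining layers to the cloud."""
    probs = softmax(logits, temperature)
    conf = max(probs)
    if conf >= threshold:
        return "device", probs.index(conf)
    return "cloud", None

print(early_exit_decision([4.0, 1.0, 0.0], 0.9))
print(early_exit_decision([4.0, 1.0, 0.0], 0.9, temperature=2.0))
```

With logits [4, 1, 0] and threshold 0.9, the raw top-class probability (about 0.94) keeps the sample on the device, while a temperature of 2 lowers it to about 0.74 and triggers offloading. In practice the temperature would be fitted on held-out data before deployment.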

preprint · 2021 · arXiv

Coded Computing and Cooperative Transmission for Wireless Distributed Matrix Multiplication

Consider a multi-cell mobile edge computing network, in which each user wishes to compute the product of a user-generated data matrix with a network-stored matrix. This is done through task offloading by means of input uploading, distributed computing at edge nodes (ENs), and output downloading. Task offloading may suffer long delays since the servers at some ENs may be straggling due to random computation times, and wireless channels may experience severe fading and interference. This paper aims to investigate the interplay among upload, computation, and download latencies during the offloading process in the high signal-to-noise ratio regime from an information-theoretic perspective. A policy based on cascaded coded computing and on coordinated and cooperative interference management in uplink and downlink is proposed and proved to be approximately optimal for a sufficiently large upload time. By investing more time in uplink transmission, the policy creates data redundancy at the ENs, which can reduce the computation time, by enabling the use of coded computing, as well as the download time via transmitter cooperation. Moreover, the policy allows computation time to be traded for download time. Numerical examples demonstrate that the proposed policy can improve over existing schemes by significantly reducing the end-to-end execution time.
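
Coded computing lets the network tolerate a straggling edge node by adding redundancy to the computation. The Python sketch below shows the simplest case, assuming three edge nodes, a network-stored matrix with an even number of rows, and a (3, 2) MDS code (A1, A2, A1 + A2); it does not model the paper's cascaded scheme, the wireless upload and download phases, or interference management. Function names are hypothetical.

```python
def matvec(A, x):
    """Plain matrix-vector product on lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def encode_tasks(A):
    """Split A row-wise into halves A1, A2 (even row count assumed) and form
    three coded tasks A1, A2, A1 + A2; any two results suffice."""
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    A3 = [[u + v for u, v in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, A3]

def decode(results):
    """Recover A x from any two of the three node outputs (dict: index -> vector)."""
    if 0 in results and 1 in results:
        y1, y2 = results[0], results[1]
    elif 0 in results:
        y1 = results[0]
        y2 = [c - a for c, a in zip(results[2], y1)]  # (A1 + A2) x - A1 x
    else:
        y2 = results[1]
        y1 = [c - b for c, b in zip(results[2], y2)]  # (A1 + A2) x - A2 x
    return y1 + y2
```

Any two of the three outputs recover A x, so the slowest node can be ignored at the cost of 50% more total computation.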

preprint · 2021 · arXiv

Conditional Mutual Information-Based Generalization Bound for Meta Learning

Meta-learning optimizes an inductive bias (typically in the form of the hyperparameters of a base-learning algorithm) by observing data from a finite number of related tasks. This paper presents an information-theoretic bound on the generalization performance of any given meta-learner, which builds on the conditional mutual information (CMI) framework of Steinke and Zakynthinou (2020). In the proposed extension to meta-learning, the CMI bound involves a training meta-supersample obtained by first sampling $2N$ independent tasks from the task environment, and then drawing $2M$ independent training samples for each sampled task. The meta-training data fed to the meta-learner is modelled as being obtained by randomly selecting $N$ tasks from the available $2N$ tasks and $M$ training samples per task from the available $2M$ training samples per task. The resulting bound is explicit in two CMI terms, which measure the information that the meta-learner output and the base-learner output provide about which training data are selected, given the entire meta-supersample. Finally, we present a numerical example that illustrates the merits of the proposed bound in comparison to prior information-theoretic bounds for meta-learning.
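
The selection step of the meta-supersample construction can be written out explicitly. The Python sketch below assumes the pairwise arrangement used in the CMI framework (the $2N$ tasks grouped into $N$ pairs, and the $2M$ samples of each task into $M$ pairs), so that one selection bit per pair picks the meta-training data; the function name and data layout are illustrative.

```python
import random

def select_meta_training_data(supersample, rng):
    """Pick the meta-training set from a CMI-style meta-supersample.

    supersample: list of N task pairs; each pair holds two tasks, and each
    task is a list of M sample pairs (2N tasks, 2M samples per task overall).
    One bit per task pair chooses a task, and one bit per sample pair chooses
    a sample, giving N tasks with M samples each.
    Returns (selected data, task selection bits, sample selection bits).
    """
    task_bits, sample_bits, selected = [], [], []
    for task_pair in supersample:
        u = rng.randrange(2)                     # which task of the pair is used
        task = task_pair[u]
        bits = [rng.randrange(2) for _ in task]  # which sample of each pair is used
        task_bits.append(u)
        sample_bits.append(bits)
        selected.append([pair[b] for pair, b in zip(task, bits)])
    return selected, task_bits, sample_bits
```

In this picture, the bound's two CMI terms measure how much the meta-learner and base-learner outputs reveal about these selection bits, given the whole supersample.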

preprint · 2021 · arXiv

Fast On-Device Adaptation for Spiking Neural Networks via Online-Within-Online Meta-Learning

Spiking Neural Networks (SNNs) have recently gained popularity, owing to their low power profile, as machine learning models for on-device edge intelligence in applications such as mobile healthcare management and natural language processing. In such highly personalized use cases, it is important for the model to be able to adapt to the unique features of an individual with only a minimal amount of training data. Meta-learning has been proposed as a way to train models that are geared towards quick adaptation to new tasks. The few existing meta-learning solutions for SNNs operate offline and require some form of backpropagation that is incompatible with current neuromorphic edge devices. In this paper, we propose an online-within-online meta-learning rule for SNNs termed OWOML-SNN, that enables lifelong learning on a stream of tasks, and relies on local, backprop-free, nested updates.

preprint · 2021 · arXiv

Information-Theoretic Bounds on Transfer Generalization Gap Based on Jensen-Shannon Divergence

In transfer learning, training and testing data sets are drawn from different data distributions. The transfer generalization gap is the difference between the population loss on the target data distribution and the training loss. The training data set generally includes data drawn from both source and target distributions. This work presents novel information-theoretic upper bounds on the average transfer generalization gap that capture $(i)$ the domain shift between the target data distribution $P'_Z$ and the source distribution $P_Z$ through a two-parameter family of generalized $(\alpha_1,\alpha_2)$-Jensen-Shannon (JS) divergences; and $(ii)$ the sensitivity of the transfer learner output $W$ to each individual sample $Z_i$ of the data set via the mutual information $I(W;Z_i)$. For $\alpha_1 \in (0,1)$, the $(\alpha_1,\alpha_2)$-JS divergence can be bounded even when the support of $P_Z$ is not included in that of $P'_Z$. This contrasts with the Kullback-Leibler (KL) divergence $D_{KL}(P_Z\|P'_Z)$-based bounds of Wu et al. [1], which are vacuous under this assumption. Moreover, the obtained bounds hold for unbounded loss functions with bounded cumulant generating functions, unlike the $\phi$-divergence-based bound of Wu et al. [1]. We also obtain new upper bounds on the average transfer excess risk in terms of the $(\alpha_1,\alpha_2)$-JS divergence for empirical weighted risk minimization (EWRM), which minimizes the weighted average training losses over source and target data sets. Finally, we provide a numerical example to illustrate the merits of the introduced bounds.
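
Here is a small numerical check of the support issue for discrete distributions. It uses the standard one-parameter skew Jensen-Shannon divergence $\alpha D_{KL}(P\|M) + (1-\alpha) D_{KL}(Q\|M)$ with $M = \alpha P + (1-\alpha) Q$. This is a simpler stand-in for the paper's two-parameter family, whose exact definition is not reproduced here. The function names are hypothetical.

```python
import math
from typing import Sequence

def kl(p: Sequence[float], q: Sequence[float]) -> float:
    """KL divergence D(P||Q) in nats; infinite if P puts mass where Q does not."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue
        if qi == 0.0:
            return math.inf
        total += pi * math.log(pi / qi)
    return total

def skew_js(p: Sequence[float], q: Sequence[float], alpha: float) -> float:
    """alpha * D(P||M) + (1 - alpha) * D(Q||M) with the mixture M = alpha*P + (1 - alpha)*Q."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    m = [alpha * pi + (1.0 - alpha) * qi for pi, qi in zip(p, q)]
    return alpha * kl(p, m) + (1.0 - alpha) * kl(q, m)

source = [0.5, 0.5, 0.0]  # plays the role of P_Z
target = [0.0, 0.5, 0.5]  # plays the role of P'_Z: no mass on the first outcome of P_Z
print(kl(source, target))            # inf, so a KL-based bound is vacuous here
print(skew_js(source, target, 0.5))  # finite: 0.5 * ln 2, about 0.347
```

Because the mixture $M$ has positive mass wherever $P$ or $Q$ does, both KL terms stay finite. The result is at most the binary entropy of $\alpha$, which is at most $\ln 2$ nats.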

preprint · 2021 · arXiv

Information-Theoretic Generalization Bounds for Meta-Learning and Applications

Meta-learning, or "learning to learn", refers to techniques that infer an inductive bias from data corresponding to multiple related tasks with the goal of improving the sample efficiency for new, previously unobserved, tasks. A key performance measure for meta-learning is the meta-generalization gap, that is, the difference between the average loss measured on the meta-training data and on a new, randomly selected task. This paper presents novel information-theoretic upper bounds on the meta-generalization gap. Two broad classes of meta-learning algorithms are considered that use either separate within-task training and test sets, like MAML, or joint within-task training and test sets, like Reptile. Extending the existing work for conventional learning, an upper bound on the meta-generalization gap is derived for the former class that depends on the mutual information (MI) between the output of the meta-learning algorithm and its input meta-training data. For the latter, the derived bound includes an additional MI between the output of the per-task learning procedure and corresponding data set to capture within-task uncertainty. Tighter bounds are then developed, under given technical conditions, for the two classes via novel Individual Task MI (ITMI) bounds. Applications of the derived bounds are finally discussed, including a broad class of noisy iterative algorithms for meta-learning.

preprint · 2021 · arXiv

Joint Design of Radar Waveform and Detector via End-to-end Learning with Waveform Constraints

The problem of data-driven joint design of transmitted waveform and detector in a radar system is addressed in this paper. We propose two novel learning-based approaches to waveform and detector design based on end-to-end training of the radar system. The first approach consists of alternating supervised training of the detector for a fixed waveform and reinforcement learning of the transmitter for a fixed detector. In the second approach, the transmitter and detector are trained simultaneously. Various operational waveform constraints, such as peak-to-average-power ratio (PAR) and spectral compatibility, are incorporated into the design. Unlike traditional radar design methods that rely on rigid mathematical models with limited applicability, it is shown that radar learning can be robustified by training the detector with synthetic data generated from multiple statistical models of the environment. Theoretical considerations and results show that the proposed methods are capable of adapting the transmitted waveform to environmental conditions while satisfying design constraints.

preprint · 2021 · arXiv

Multi-Sample Online Learning for Probabilistic Spiking Neural Networks

Spiking Neural Networks (SNNs) capture some of the efficiency of biological brains for inference and learning via the dynamic, online, event-driven processing of binary time series. Most existing learning algorithms for SNNs are based on deterministic neuronal models, such as leaky integrate-and-fire, and rely on heuristic approximations of backpropagation through time that enforce constraints such as locality. In contrast, probabilistic SNN models can be trained directly via principled online, local, update rules that have proven to be particularly effective for resource-constrained systems. This paper investigates another advantage of probabilistic SNNs, namely their capacity to generate independent outputs when queried over the same input. It is shown that the multiple generated output samples can be used during inference to robustify decisions and to quantify uncertainty -- a feature that deterministic SNN models cannot provide. Furthermore, they can be leveraged for training in order to obtain more accurate statistical estimates of the log-loss training criterion, as well as of its gradient. Specifically, this paper introduces an online learning rule based on generalized expectation-maximization (GEM) that follows a three-factor form with global learning signals and is referred to as GEM-SNN. Experimental results on structured output memorization and classification on a standard neuromorphic data set demonstrate significant improvements in terms of log-likelihood, accuracy, and calibration when increasing the number of samples used for inference and training.

preprint · 2021 · arXiv

Multi-Sample Online Learning for Spiking Neural Networks based on Generalized Expectation Maximization

Spiking Neural Networks (SNNs) offer a novel computational paradigm that captures some of the efficiency of biological brains by processing through binary neural dynamic activations. Probabilistic SNN models are typically trained to maximize the likelihood of the desired outputs by using unbiased estimates of the log-likelihood gradients. While prior work used single-sample estimators obtained from a single run of the network, this paper proposes to leverage multiple compartments that sample independent spiking signals while sharing synaptic weights. The key idea is to use these signals to obtain more accurate statistical estimates of the log-likelihood training criterion, as well as of its gradient. The approach is based on generalized expectation-maximization (GEM), which optimizes a tighter approximation of the log-likelihood using importance sampling. The derived online learning algorithm implements a three-factor rule with global per-compartment learning signals. Experimental results on a classification task on the neuromorphic MNIST-DVS data set demonstrate significant improvements in terms of log-likelihood, accuracy, and calibration when increasing the number of compartments used for training and inference.

preprint · 2021 · arXiv

Single-RF Multi-User Communication Through Reconfigurable Intelligent Surfaces: An Information-Theoretic Analysis

Reconfigurable intelligent surfaces (RISs) are typically used in multi-user systems to mitigate interference among active transmitters. In contrast, this paper studies a setting with a conventional active encoder as well as a passive encoder that modulates the reflection pattern of the RIS. The RIS hence serves the dual purpose of improving the rate of the active encoder and of enabling communication from the second encoder. The capacity region is characterized, and information-theoretic insights regarding the trade-offs between the rates of the two encoders are derived by focusing on the high- and low-power regimes.

preprint · 2020 · arXiv

Address-Event Variable-Length Compression for Time-Encoded Data

Time-encoded signals, such as social network update logs and spiking traces in neuromorphic processors, are defined by multiple traces carrying information in the timing of events, or spikes. When time-encoded data is processed at a site remote from where it is produced, the occurrence of events needs to be encoded and transmitted in a timely fashion. The standard Address-Event Representation (AER) protocol for neuromorphic chips encodes the indices of the "spiking" traces in the payload of a packet produced at the same time the events are recorded, hence implicitly encoding the events' timing in the timing of the packet. This paper investigates the potential bandwidth saving that can be obtained by carrying out variable-length compression of packets' payloads. Compression leverages both intra-trace and inter-trace correlations over time that are typical in applications such as social networks or neuromorphic computing. The approach is based on discrete-time Hawkes processes and entropy coding with conditional codebooks. Results from an experiment based on a real-world retweet dataset are also provided.
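
To make the bandwidth comparison concrete, here is a minimal sketch built on two assumptions. First, fixed-length AER sends each event address as a $\lceil \log_2 N \rceil$-bit word. Second, the variable-length alternative is an ideal memoryless entropy coder whose codebook is learned from the empirical payload frequencies, costing $-\log_2 \hat p(\text{payload})$ bits per packet. The sketch ignores the cost of sending the codebook and does not implement the Hawkes-process conditional codebooks used in the paper. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Payload = Tuple[int, ...]  # sorted indices of the traces that spiked at one time step

def aer_bits(log: List[Payload], num_traces: int) -> int:
    """Fixed-length AER: every address in a payload costs ceil(log2 N) bits."""
    address_bits = math.ceil(math.log2(num_traces))
    return sum(len(payload) * address_bits for payload in log)

def ideal_entropy_coded_bits(log: List[Payload]) -> float:
    """Ideal variable-length cost of the log under its own empirical payload distribution."""
    counts = Counter(log)
    total = len(log)
    return sum(-math.log2(counts[payload] / total) for payload in log)

packets = [(0,), (0,), (0,), (1, 2)]  # four packets from N = 4 traces
print(aer_bits(packets, num_traces=4))   # 10
print(ideal_entropy_coded_bits(packets))  # about 3.25
```

Frequently repeated payloads, which arise from the intra-trace and inter-trace correlations mentioned above, are exactly what a variable-length code exploits.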

preprint · 2020 · arXiv

Cooperative Learning via Federated Distillation over Fading Channels

Cooperative training methods for distributed machine learning are typically based on the exchange of local gradients or local model parameters. The latter approach is known as Federated Learning (FL). An alternative solution with reduced communication overhead, referred to as Federated Distillation (FD), was recently proposed that exchanges only averaged model outputs. While prior work studied implementations of FL over wireless fading channels, here we propose wireless protocols for FD and for an enhanced version thereof that leverages an offline communication phase to communicate "mixed-up" covariate vectors. The proposed implementations consist of different combinations of digital schemes based on separate source-channel coding and of over-the-air computing strategies based on analog joint source-channel coding. It is shown that the enhanced version of FD has the potential to significantly outperform FL in the presence of limited spectral resources.
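
Below is a minimal sketch of the exchange that FD relies on. It assumes the common variant in which each device averages its model outputs (e.g., logits) per ground-truth label and the server averages these vectors across devices. The wireless layer studied in the paper is abstracted away as an error-free link, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]

def local_label_averages(outputs: List[Tuple[int, Vector]]) -> Dict[int, Vector]:
    """Device side: average the model outputs separately for each ground-truth label."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for label, logits in outputs:
        acc = sums.setdefault(label, [0.0] * len(logits))
        sums[label] = [a + v for a, v in zip(acc, logits)]
        counts[label] += 1
    return {label: [a / counts[label] for a in acc] for label, acc in sums.items()}

def global_label_averages(per_device: List[Dict[int, Vector]]) -> Dict[int, Vector]:
    """Server side: average each label's vector over the devices that reported that label."""
    grouped: Dict[int, List[Vector]] = defaultdict(list)
    for averages in per_device:
        for label, vec in averages.items():
            grouped[label].append(vec)
    return {label: [sum(col) / len(vecs) for col in zip(*vecs)] for label, vecs in grouped.items()}

dev_a = local_label_averages([(0, [1.0, 0.0]), (0, [3.0, 0.0]), (1, [0.0, 2.0])])
dev_b = local_label_averages([(0, [0.0, 0.0])])
print(global_label_averages([dev_a, dev_b]))  # {0: [1.0, 0.0], 1: [0.0, 2.0]}
```

Each device can then use the global vector for a label as a distillation target. The payload scales with the number of labels times the output dimension rather than with the number of model parameters, which is where the communication saving over FL comes from.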

preprint · 2020 · arXiv

Decentralized Federated Learning via SGD over Wireless D2D Networks

Federated Learning (FL), an emerging paradigm for fast intelligent acquisition at the network edge, enables joint training of a machine learning model over distributed data sets and computing resources with limited disclosure of local data. Communication is a critical enabler of large-scale FL due to the significant amount of model information exchanged among edge devices. In this paper, we consider a network of wireless devices sharing a common fading wireless channel for the deployment of FL. Each device holds a generally distinct training set, and communication typically takes place in a Device-to-Device (D2D) manner. In the ideal case in which all devices within communication range can communicate simultaneously and noiselessly, a standard protocol that is guaranteed to converge to an optimal solution of the global empirical risk minimization problem under convexity and connectivity assumptions is Decentralized Stochastic Gradient Descent (DSGD). DSGD integrates local SGD steps with periodic consensus averages that require communication between neighboring devices. In this paper, wireless protocols are proposed that implement DSGD by accounting for the presence of path loss, fading, blockages, and mutual interference. The proposed protocols are based on graph coloring for scheduling and on both digital and analog transmission strategies at the physical layer, with the latter leveraging over-the-air computing via sparsity-based recovery.
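
The following is a minimal sketch of the DSGD iteration in the ideal noiseless setting described above. It assumes four devices on a ring, with scalar quadratic local losses $f_k(\theta) = \frac{1}{2}(\theta - a_k)^2$ whose sum is minimized at the mean of the $a_k$. Exact local gradients stand in for stochastic ones. The path loss, fading and interference that the paper's protocols handle are not modeled, and the function names are hypothetical.

```python
from typing import List

def ring_mixing_matrix(k: int) -> List[List[float]]:
    """Doubly stochastic weights for k >= 3 devices on a ring: weight 1/3 to itself and to
    each of its two neighbours, so every average needs only neighbour (D2D) exchanges."""
    w = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in (i - 1, i, i + 1):
            w[i][j % k] += 1.0 / 3.0
    return w

def dsgd(targets: List[float], w: List[List[float]], step: float, rounds: int) -> List[float]:
    """Each round: one local gradient step per device, then one consensus average."""
    theta = [0.0] * len(targets)
    for _ in range(rounds):
        # local step on f_k(theta) = 0.5 * (theta - a_k) ** 2, whose gradient is theta - a_k
        local = [t - step * (t - a) for t, a in zip(theta, targets)]
        # consensus: each device replaces its model with a weighted average of its neighbours'
        theta = [sum(w_ij * l_j for w_ij, l_j in zip(row, local)) for row in w]
    return theta

models = dsgd([1.0, 2.0, 3.0, 4.0], ring_mixing_matrix(4), step=0.05, rounds=500)
print([round(m, 3) for m in models])
```

With a constant step size, the average of the device models converges to the optimum (2.5 here), while individual devices keep a small disagreement on the order of the step size. A diminishing step size is the usual way to remove that disagreement.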

preprint · 2020 · arXiv

End-to-End Learning of Neuromorphic Wireless Systems for Low-Power Edge Artificial Intelligence

This paper introduces a novel "all-spike" low-power solution for remote wireless inference that is based on neuromorphic sensing, Impulse Radio (IR), and Spiking Neural Networks (SNNs). In the proposed system, event-driven neuromorphic sensors produce asynchronous time-encoded data streams that are encoded by an SNN, whose output spiking signals are pulse modulated via IR and transmitted over general frequency-selective channels, while the receiver's inputs are obtained via hard detection of the received signals and fed to an SNN for classification. We introduce an end-to-end training procedure that treats the cascade of encoder, channel, and decoder as a probabilistic SNN-based autoencoder that implements Joint Source-Channel Coding (JSCC). The proposed system, termed NeuroJSCC, is compared to conventional synchronous frame-based and uncoded transmissions in terms of latency and accuracy. The experiments confirm that the proposed end-to-end neuromorphic edge architecture provides a promising framework for efficient and low-latency remote sensing, communication, and inference.

preprint · 2020 · arXiv

Fog-Based Detection for Random-Access IoT Networks with Per-Measurement Preambles

Internet of Things (IoT) systems may be deployed to monitor spatially distributed quantities of interest (QoIs), such as noise or pollution levels. This paper considers a fog-based IoT network, in which active IoT devices transmit measurements of the monitored QoIs to the local edge node (EN), while the ENs are connected to a cloud processor via limited-capacity fronthaul links. While the conventional approach uses preambles as metadata for reserving communication resources, here we consider assigning preambles directly to measurement levels across all devices. The resulting Type-Based Multiple Access (TBMA) protocol enables the efficient remote detection of the QoIs, rather than of the individual payloads. The performance of both edge and cloud-based detection or hypothesis testing is evaluated in terms of error exponents. Cloud-based hypothesis testing is shown theoretically and via numerical results to be advantageous when the inter-cell interference power and the fronthaul capacity are sufficiently large.
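
Here is a minimal sketch of why per-measurement preambles let the receiver detect the QoI directly. It assumes a noiseless, interference-free channel in which the energy on each preamble reveals how many devices reported that level, and i.i.d. measurements under each hypothesis. The receiver then observes the type (histogram) of the measurements and can run a maximum-likelihood test over candidate measurement distributions. The hypotheses and function names are hypothetical. The fading, inter-cell interference and fronthaul limits analysed in the paper are ignored.

```python
import math
from typing import List, Sequence

def tbma_type(measurements: Sequence[int], num_levels: int) -> List[int]:
    """Each active device transmits on the preamble of its quantized level, so an
    ideal receiver sees only how many devices reported each level (the type)."""
    counts = [0] * num_levels
    for level in measurements:
        counts[level] += 1
    return counts

def log_likelihood(counts: Sequence[int], probs: Sequence[float]) -> float:
    """Log-likelihood of i.i.d. measurements with this type: sum over levels of N_l * log p(l)."""
    total = 0.0
    for n, p in zip(counts, probs):
        if n == 0:
            continue
        if p == 0.0:
            return -math.inf
        total += n * math.log(p)
    return total

def ml_detect(counts: Sequence[int], hypotheses: Sequence[Sequence[float]]) -> int:
    """Index of the hypothesis (a distribution over levels) that best explains the type."""
    return max(range(len(hypotheses)), key=lambda h: log_likelihood(counts, hypotheses[h]))

counts = tbma_type([0, 0, 1, 0, 2], num_levels=3)            # [3, 1, 1]
print(ml_detect(counts, [[0.6, 0.3, 0.1], [0.1, 0.3, 0.6]]))  # 0
```

The test depends on the measurements only through their type, so the device identities that conventional preambles carry are not needed for this purpose.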

preprint · 2020 · arXiv

Fundamental Limits of Wireless Caching under Uneven-Capacity Channels

This work identifies the fundamental limits of cache-aided coded multicasting in the presence of the well-known "worst-user" bottleneck. This stems from the presence of receiving users with uneven channel capacities, which often forces the rate of transmission of each multicasting message to be reduced to that of the slowest user. This bottleneck, which can be detrimental in general wireless broadcast settings, motivates the analysis of coded caching over a standard Single-Input-Single-Output (SISO) Broadcast Channel (BC) with K cache-aided receivers, each with a generally different channel capacity. For this setting, we design a communication algorithm that is based on superposition coding that capitalizes on the realization that the user with the worst channel may not be the real bottleneck of communication. We then proceed to provide a converse that shows the algorithm to be near optimal, identifying the fundamental limits of this setting within a multiplicative factor of 4. Interestingly, the result reveals that, even if several users are experiencing channels with reduced capacity, the system can achieve the same optimal delivery time that would be achievable if all users enjoyed maximal capacity.

preprint · 2020 · arXiv

Information-Centric Grant-Free Access for IoT Fog Networks: Edge vs Cloud Detection and Learning

A multi-cell Fog-Radio Access Network (F-RAN) architecture is considered in which Internet of Things (IoT) devices periodically make noisy observations of a Quantity of Interest (QoI) and transmit using grant-free access in the uplink. The devices in each cell are connected to an Edge Node (EN), which may also have a finite-capacity fronthaul link to a central processor. In contrast to conventional information-agnostic protocols, the devices transmit using a Type-Based Multiple Access (TBMA) protocol that is tailored to enable the estimation of the field of correlated QoIs in each cell based on the measurements received from IoT devices. In this paper, this form of information-centric radio access is studied for the first time in a multi-cell F-RAN model with edge or cloud detection. Edge and cloud detection are designed and compared for a multi-cell system. Optimal model-based detectors are introduced and the resulting asymptotic behavior of the probability of error at cloud and edge is derived. Then, for the scenario in which a statistical model is not available, data-driven edge and cloud detectors are discussed and evaluated in numerical results.

preprint · 2020 · arXiv

Inter-Tenant Cooperative Reception for C-RAN Systems With Spectrum Pooling

This work studies the uplink of a multi-tenant cloud radio access network (C-RAN) system with spectrum pooling. In the system, each operator has a cloud processor (CP) connected to a set of proprietary radio units (RUs) through finite-capacity fronthaul links. The uplink spectrum is divided into private and shared subbands, and all the user equipments (UEs) of the participating operators can simultaneously transmit signals on the shared subband. To mitigate inter-operator interference on the shared subband, the CPs of the participating operators can exchange compressed uplink baseband signals on finite-capacity backhaul links. This work tackles the problem of jointly optimizing bandwidth allocation, transmit power control and fronthaul compression strategies. In the optimization, we impose that the inter-operator privacy loss be limited by a given threshold value. An iterative algorithm is proposed to find a suboptimal solution based on the matrix fractional programming approach. Numerical results validate the advantages of the proposed optimized spectrum pooling scheme.

preprint · 2020 · arXiv

ITENE: Intrinsic Transfer Entropy Neural Estimator

Quantifying the directionality of information flow is instrumental in understanding, and possibly controlling, the operation of many complex systems, such as transportation, social, neural, or gene-regulatory networks. The standard Transfer Entropy (TE) metric follows Granger's causality principle by measuring the Mutual Information (MI) between the past states of a source signal $X$ and the future state of a target signal $Y$ while conditioning on past states of $Y$. Hence, the TE quantifies the improvement, as measured by the log-loss, in the prediction of the target sequence $Y$ that can be accrued when, in addition to the past of $Y$, one also has available past samples from $X$. However, by conditioning on the past of $Y$, the TE also measures information that can be synergistically extracted by observing both the past of $X$ and $Y$, and not solely the past of $X$. Building on a private key agreement formulation, the Intrinsic TE (ITE) aims to discount such synergistic information to quantify the degree to which $X$ is \emph{individually} predictive of $Y$, independent of $Y$'s past. In this paper, an estimator of the ITE is proposed that is inspired by the recently proposed Mutual Information Neural Estimation (MINE). The estimator is based on a variational bound on the KL divergence, two-sample neural network classifiers, and the pathwise estimator of Monte Carlo gradients.
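
For intuition about the quantity being estimated, here is a plug-in estimator of the standard TE for binary sequences. It is neither the intrinsic TE nor the neural estimator proposed in the paper. It assumes a history length of one, so that $\mathrm{TE}_{X\to Y} = I(X_{t-1}; Y_t \mid Y_{t-1})$ is computed from empirical counts. The function name is hypothetical.

```python
import math
import random
from collections import Counter
from typing import Sequence

def transfer_entropy(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate, in bits, of I(X_{t-1}; Y_t | Y_{t-1}) from empirical counts."""
    triples = Counter(zip(y[1:], y[:-1], x[:-1]))  # (y_t, y_{t-1}, x_{t-1})
    n = sum(triples.values())
    yy, yx, y_past = Counter(), Counter(), Counter()
    for (yt, yp, xp), c in triples.items():
        yy[(yt, yp)] += c   # counts of (y_t, y_{t-1})
        yx[(yp, xp)] += c   # counts of (y_{t-1}, x_{t-1})
        y_past[yp] += c     # counts of y_{t-1}
    te = 0.0
    for (yt, yp, xp), c in triples.items():
        # ratio p(y_t | y_{t-1}, x_{t-1}) / p(y_t | y_{t-1})
        ratio = (c / yx[(yp, xp)]) / (yy[(yt, yp)] / y_past[yp])
        te += (c / n) * math.log2(ratio)
    return te

if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.randrange(2) for _ in range(20000)]
    print(transfer_entropy(x, [0] + x[:-1]))                    # close to 1 bit: Y copies X with a one-step delay
    print(transfer_entropy(x, [rng.randrange(2) for _ in x]))  # close to 0: Y ignores X
```

The number of count cells grows exponentially with the history length, so plug-in estimates quickly become data-hungry. This sketch is only meant to show what the TE measures.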

preprint · 2020 · arXiv

LAGC: Lazily Aggregated Gradient Coding for Straggler-Tolerant and Communication-Efficient Distributed Learning

Gradient-based distributed learning in Parameter Server (PS) computing architectures is subject to random delays due to straggling worker nodes, as well as to possible communication bottlenecks between PS and workers. Solutions have been recently proposed to separately address these impairments based on the ideas of gradient coding, worker grouping, and adaptive worker selection. This paper provides a unified analysis of these techniques in terms of wall-clock time, communication, and computation complexity measures. Furthermore, in order to combine the benefits of gradient coding and grouping in terms of robustness to stragglers with the communication and computation load gains of adaptive selection, novel strategies, named Lazily Aggregated Gradient Coding (LAGC) and Grouped-LAG (G-LAG), are introduced. Analysis and results show that G-LAG provides the best wall-clock time and communication performance, while maintaining a low computational cost, for two representative distributions of the computing times of the worker nodes.

preprint · 2020 · arXiv

Memristors -- from In-memory computing, Deep Learning Acceleration, Spiking Neural Networks, to the Future of Neuromorphic and Bio-inspired Computing

Machine learning, particularly in the form of deep learning, has driven most of the recent fundamental developments in artificial intelligence. Deep learning is based on computational models that are, to a certain extent, bio-inspired, as they rely on networks of connected simple computing units operating in parallel. Deep learning has been successfully applied in areas such as object/pattern recognition, speech and natural language processing, self-driving vehicles, intelligent self-diagnostics tools, autonomous robots, knowledgeable personal assistants, and monitoring. These successes have been mostly supported by three factors: availability of vast amounts of data, continuous growth in computing power, and algorithmic innovations. The approaching demise of Moore's law, and the consequent expected modest improvements in computing power that can be achieved by scaling, raise the question of whether the described progress will be slowed or halted due to hardware limitations. This paper reviews the case for a novel beyond-CMOS hardware technology, memristors, as a potential solution for the implementation of power-efficient in-memory computing, deep learning accelerators, and spiking neural networks. Central themes are the reliance on non-von-Neumann computing architectures and the need for developing tailored learning and inference algorithms. To argue that lessons from biology can be useful in providing directions for further progress in artificial intelligence, we briefly discuss an example based on reservoir computing. We conclude the review by speculating on the big-picture view of future neuromorphic and brain-inspired computing systems.

preprint · 2020 · arXiv

Optimizing Over-the-Air Computation in IRS-Aided C-RAN Systems

Over-the-air computation (AirComp) is an efficient solution to enable federated learning on wireless channels. AirComp assumes that the wireless channels from different devices can be controlled, e.g., via transmitter-side phase compensation, in order to ensure coherent on-air combining. Intelligent reflecting surfaces (IRSs) can provide an alternative, or additional, means of controlling channel propagation conditions. This work studies the advantages of deploying IRSs for AirComp systems in a large-scale cloud radio access network (C-RAN). In this system, worker devices upload locally updated models to a parameter server (PS) through distributed access points (APs) that communicate with the PS on finite-capacity fronthaul links. The problem of jointly optimizing the IRSs' reflecting phases and a linear detector at the PS is tackled with the goal of minimizing the mean squared error (MSE) of a parameter estimated at the PS. Numerical results validate the advantages of deploying IRSs with optimized phases for AirComp in C-RAN systems.

preprint · 2020 · arXiv

SpinAPS: A High-Performance Spintronic Accelerator for Probabilistic Spiking Neural Networks

We discuss a high-performance and high-throughput hardware accelerator for probabilistic Spiking Neural Networks (SNNs) based on Generalized Linear Model (GLM) neurons, that uses binary STT-RAM devices as synapses and digital CMOS logic for neurons. The inference accelerator, termed "SpinAPS" for Spintronic Accelerator for Probabilistic SNNs, implements a principled direct learning rule for first-to-spike decoding without the need for conversion from pre-trained ANNs. The proposed solution is shown to achieve performance comparable to an equivalent ANN on handwritten digit and human activity recognition benchmarks. The inference engine, SpinAPS, is shown through software emulation tools to achieve a 4x performance improvement in terms of GSOPS/W/mm2 when compared to an equivalent SRAM-based design. The architecture leverages probabilistic spiking neural networks that employ a first-to-spike decoding rule to make inference decisions at low latencies, achieving 75% of the test performance in as few as 4 algorithmic time steps on the handwritten digit benchmark. The accelerator also exhibits competitive performance with other memristor-based DNN/SNN accelerators and state-of-the-art GPUs.

preprint · 2020 · arXiv

VOWEL: A Local Online Learning Rule for Recurrent Networks of Probabilistic Spiking Winner-Take-All Circuits

Networks of spiking neurons and Winner-Take-All spiking circuits (WTA-SNNs) can detect information encoded in spatio-temporal multi-valued events. These are described by the timing of events of interest, e.g., clicks, as well as by categorical numerical values assigned to each event, e.g., like or dislike. Other use cases include object recognition from data collected by neuromorphic cameras, which produce, for each pixel, signed bits at the times of sufficiently large brightness variations. Existing schemes for training WTA-SNNs are limited to rate-encoding solutions, and are hence able to detect only spatial patterns. Developing more general training algorithms for arbitrary WTA-SNNs inherits the challenges of training (binary) Spiking Neural Networks (SNNs). These amount, most notably, to the non-differentiability of threshold functions, to the recurrent behavior of spiking neural models, and to the difficulty of implementing backpropagation in neuromorphic hardware. In this paper, we develop a variational online local training rule for WTA-SNNs, referred to as VOWEL, that leverages only local pre- and post-synaptic information for visible circuits, and an additional common reward signal for hidden circuits. The method is based on probabilistic generalized linear neural models, control variates, and variational regularization. Experimental results on real-world neuromorphic datasets with multi-valued events demonstrate the advantages of WTA-SNNs over conventional binary SNNs trained with state-of-the-art methods, especially in the presence of limited computing resources.

preprint · 2019 · arXiv

An Introduction to Probabilistic Spiking Neural Networks: Probabilistic Models, Learning Rules, and Applications

Spiking neural networks (SNNs) are distributed trainable systems whose computing elements, or neurons, are characterized by internal analog dynamics and by digital and sparse synaptic communications. The sparsity of the synaptic spiking inputs and the corresponding event-driven nature of neural processing can be leveraged by energy-efficient hardware implementations, which can offer significant energy reductions as compared to conventional artificial neural networks (ANNs). The design of training algorithms lags behind the hardware implementations. Most existing training algorithms for SNNs have been designed either for biological plausibility or through conversion from pretrained ANNs via rate encoding. This article provides an introduction to SNNs by focusing on a probabilistic signal processing methodology that enables the direct derivation of learning rules by leveraging the unique time-encoding capabilities of SNNs. We adopt discrete-time probabilistic models for networked spiking neurons and derive supervised and unsupervised learning rules from first principles via variational inference. Examples and open research problems are also provided.

preprint · 2019 · arXiv

Learning First-to-Spike Policies for Neuromorphic Control Using Policy Gradients

Artificial Neural Networks (ANNs) are currently being used as function approximators in many state-of-the-art Reinforcement Learning (RL) algorithms. Spiking Neural Networks (SNNs) have been shown to drastically reduce the energy consumption of ANNs by encoding information in sparse temporal binary spike streams, hence emulating the communication mechanism of biological neurons. Due to their low energy consumption, SNNs are considered to be important candidates as co-processors to be implemented in mobile devices. In this work, the use of SNNs as stochastic policies is explored under an energy-efficient first-to-spike action rule, whereby the action taken by the RL agent is determined by the occurrence of the first spike among the output neurons. A policy gradient-based algorithm is derived considering a Generalized Linear Model (GLM) for spiking neurons. Experimental results demonstrate the capability of online trained SNNs as stochastic policies to gracefully trade energy consumption, as measured by the number of spikes, and control performance. Significant gains are shown as compared to the standard approach of converting an offline trained ANN into an SNN.
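
Below is a minimal sketch of the first-to-spike action rule. It assumes discrete-time GLM-style output neurons that spike with probability $\sigma(u)$ given their membrane potential $u$ at each step. When several neurons fire in the same step, the tie is broken at random, which is also an assumption. The membrane potentials are supplied directly, and training via policy gradients is not shown. The function names are hypothetical.

```python
import math
import random
from typing import Optional, Sequence

def spike_probability(u: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large |u|."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def first_to_spike(potentials: Sequence[Sequence[float]], rng: random.Random) -> Optional[int]:
    """potentials[t][i] is the membrane potential of output neuron i at time step t.
    Returns the index of the first neuron to spike (the action), or None if none spikes."""
    for step in potentials:
        fired = [i for i, u in enumerate(step) if rng.random() < spike_probability(u)]
        if fired:
            # hypothetical tie-break: pick uniformly among neurons firing in the same step
            return rng.choice(fired)
    return None

print(first_to_spike([[-50.0, -50.0], [-50.0, 50.0]], random.Random(0)))  # 1
```

The decision is taken as soon as any output neuron fires, so later time steps need not be simulated, and fewer steps also mean fewer spikes. In an actual SNN policy the potentials would depend on the input spikes and on the neurons' own spiking history rather than being given up front.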