Source author record

Miguel Rodrigues

Miguel Rodrigues appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning; Information Theory; math.IT; Artificial Intelligence; Computer Vision; Distributed, Parallel, and Cluster Computing; Social and Information Networks

Catalog footprint

What is connected

10 works

7 topics

4 close collaborators

preprint, 2022, arXiv

Minimax Demographic Group Fairness in Federated Learning

Federated learning is an increasingly popular paradigm that enables a large number of entities to collaboratively learn better models. In this work, we study minimax group fairness in federated learning scenarios where different participating entities may only have access to a subset of the population groups during the training phase. We formally analyze how our proposed group fairness objective differs from existing federated learning fairness criteria that impose similar performance across participants instead of demographic groups. We provide an optimization algorithm -- FedMinMax -- for solving the proposed problem that provably enjoys the performance guarantees of centralized learning algorithms. We experimentally compare the proposed approach against other state-of-the-art methods in terms of group fairness in various federated learning setups, showing that our approach exhibits competitive or superior performance.
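The distinction the abstract draws between fairness across participants and fairness across demographic groups can be illustrated with a minimal sketch. This is not FedMinMax itself: the function names, the client data, and the idea of clients reporting per-group loss sums and counts are all assumptions made for illustration. It only evaluates the quantity a minimax group objective would try to minimise (the worst per-group risk) and compares it with the worst per-client risk.

```python
from collections import defaultdict

def local_group_stats(losses, groups):
    """Per-group [sum of losses, sample count] computed on one client (hypothetical helper)."""
    stats = defaultdict(lambda: [0.0, 0])
    for loss, group in zip(losses, groups):
        stats[group][0] += loss
        stats[group][1] += 1
    return stats

def aggregate_group_risks(all_client_stats):
    """Merge the clients' per-group statistics into a mean loss per group."""
    merged = defaultdict(lambda: [0.0, 0])
    for stats in all_client_stats:
        for group, (total, count) in stats.items():
            merged[group][0] += total
            merged[group][1] += count
    return {g: total / count for g, (total, count) in merged.items()}

# Toy data (made up): each client sees both groups, but in different proportions.
clients = {
    "client_1": ([0.2, 0.4, 0.9], ["a", "a", "b"]),
    "client_2": ([0.7, 0.1, 0.3], ["b", "a", "a"]),
}

client_stats = [local_group_stats(l, g) for l, g in clients.values()]
group_risk = aggregate_group_risks(client_stats)

# Worst-case risk over demographic groups (the minimax group-fairness target).
worst_group = max(group_risk.values())

# Worst-case risk over participants (the per-client notion of fairness).
client_risk = {name: sum(l) / len(l) for name, (l, _) in clients.items()}
worst_client = max(client_risk.values())

print(group_risk, worst_group, worst_client)
```

In this toy setup the two criteria disagree: every client looks acceptable on average while one group carries a much higher risk, which is the gap a group-level objective is meant to expose.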

preprint, 2022, arXiv

Simple Regularisation for Uncertainty-Aware Knowledge Distillation

Considering uncertainty estimation of modern neural networks (NNs) is one of the most important steps towards deploying machine learning systems to meaningful real-world applications such as in medicine, finance or autonomous systems. At the moment, ensembles of different NNs constitute the state-of-the-art in both accuracy and uncertainty estimation in different tasks. However, ensembles of NNs are impractical under real-world constraints, since their computation and memory consumption scale linearly with the size of the ensemble, which increases their latency and deployment cost. In this work, we examine a simple regularisation approach for distribution-free knowledge distillation of an ensemble of machine learning models into a single NN. The aim of the regularisation is to preserve the diversity, accuracy and uncertainty estimation characteristics of the original ensemble without any intricacies, such as fine-tuning. We demonstrate the generality of the approach on combinations of toy data, SVHN/CIFAR-10, simple to complex NN architectures and different tasks.

preprint, 2022, arXiv

Tighter Expected Generalization Error Bounds via Convexity of Information Measures

Generalization error bounds are essential to understanding machine learning algorithms. This paper presents novel expected generalization error upper bounds based on the average joint distribution between the output hypothesis and each input training sample. Multiple generalization error upper bounds based on different information measures are provided, including Wasserstein distance, total variation distance, KL divergence, and Jensen-Shannon divergence. Due to the convexity of the information measures, the proposed bounds in terms of Wasserstein distance and total variation distance are shown to be tighter than their counterparts based on individual samples in the literature. An example is provided to demonstrate the tightness of the proposed generalization error bounds.
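The convexity property the abstract relies on can be checked numerically. The sketch below only demonstrates the general fact that total variation distance is jointly convex, so the distance between two averaged distributions never exceeds the average of the individual distances. The distributions are arbitrary toy values, not the ones that appear in the paper's bounds, and the helper names are hypothetical.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions on the same support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def average(dists):
    """Element-wise average of a list of distributions."""
    n = len(dists)
    return [sum(col) / n for col in zip(*dists)]

# Two pairs (P_i, Q_i) of toy distributions over three outcomes (made up).
p_list = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3]]
q_list = [[0.4, 0.4, 0.2], [0.3, 0.3, 0.4]]

# Distance between the averaged distributions.
tv_of_average = total_variation(average(p_list), average(q_list))

# Average of the per-pair distances.
average_of_tv = sum(total_variation(p, q) for p, q in zip(p_list, q_list)) / len(p_list)

print(tv_of_average, average_of_tv)
```

By convexity the first quantity is never larger than the second, which is why a bound stated on an averaged distribution can be tighter than the average of per-sample bounds.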

preprint, 2022, arXiv

Wireless Image Transmission Using Deep Source Channel Coding With Attention Modules

Recent research on joint source channel coding (JSCC) for wireless communications has achieved great success owing to the employment of deep learning (DL). However, the existing work on DL based JSCC usually trains the designed network to operate under a specific signal-to-noise ratio (SNR) regime, without taking into account that the SNR level during the deployment stage may differ from that during the training stage. A number of networks are required to cover the scenario with a broad range of SNRs, which is computationally inefficient (in the training stage) and requires large storage. To overcome these drawbacks our paper proposes a novel method called Attention DL based JSCC (ADJSCC) that can successfully operate with different SNR levels during transmission. This design is inspired by the resource assignment strategy in traditional JSCC, which dynamically adjusts the compression ratio in source coding and the channel coding rate according to the channel SNR. This is achieved by resorting to attention mechanisms because these are able to allocate computing resources to more critical tasks. Instead of applying the resource allocation strategy in traditional JSCC, the ADJSCC uses channel-wise soft attention to scale features according to SNR conditions. We compare the ADJSCC method with the state-of-the-art DL based JSCC method through extensive experiments to demonstrate its adaptability, robustness and versatility. Compared with the existing methods, the proposed method takes less storage and is more robust in the presence of channel mismatch.
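One way to picture channel-wise soft attention that depends on SNR is a small gate in the squeeze-and-excitation style: pool each feature channel to a single number, append the SNR, pass the result through a tiny two-layer network, and multiply each channel by the resulting factor in (0, 1). The sketch below is an assumption-laden illustration, not the paper's module: the function names, layer sizes, and weights are all hypothetical, and the actual ADJSCC design may differ.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases):
    """Fully connected layer: one output per weight row."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def snr_attention(features, snr_db, w1, b1, w2, b2):
    """Scale each feature channel by a soft gate computed from channel means and the SNR."""
    # Squeeze: one mean per channel, with the SNR appended as extra context.
    context = [sum(ch) / len(ch) for ch in features] + [snr_db]
    # Excite: ReLU hidden layer, then a sigmoid gate per channel.
    hidden = [max(0.0, h) for h in dense(context, w1, b1)]
    scales = [sigmoid(s) for s in dense(hidden, w2, b2)]
    scaled = [[s * v for v in ch] for s, ch in zip(scales, features)]
    return scaled, scales

# Two feature channels with three activations each (toy values).
features = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]

# Hypothetical weights: 2 hidden units fed by (2 channel means + SNR), 2 output gates.
w1 = [[0.5, -0.3, 0.10], [-0.2, 0.4, -0.05]]
b1 = [0.0, 0.1]
w2 = [[1.0, -1.0], [-0.5, 0.8]]
b2 = [0.0, 0.0]

scaled_low, scales_low = snr_attention(features, 0.0, w1, b1, w2, b2)
scaled_high, scales_high = snr_attention(features, 20.0, w1, b1, w2, b2)
print(scales_low, scales_high)
```

Because the SNR is an input to the gate, the same weights produce different channel scalings at different SNR levels, which is how a single network can adapt without being retrained per SNR regime.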

preprint, 2021, arXiv

VINNAS: Variational Inference-based Neural Network Architecture Search

In recent years, neural architecture search (NAS) has received intensive scientific and industrial interest due to its capability of finding a neural architecture with high accuracy for various artificial intelligence tasks such as image classification or object detection. In particular, gradient-based NAS approaches have become one of the more popular approaches thanks to their computational efficiency during the search. However, these methods often experience a mode collapse, where the quality of the found architectures is poor due to the algorithm resorting to choosing a single operation type for the entire network, or stagnating at a local minimum for various datasets or search spaces. To address these defects, we present a differentiable variational inference-based NAS method for searching sparse convolutional neural networks. Our approach finds the optimal neural architecture by dropping out candidate operations in an over-parameterised supergraph using variational dropout with automatic relevance determination prior, which makes the algorithm gradually remove unnecessary operations and connections without risking mode collapse. The evaluation is conducted through searching two types of convolutional cells that shape the neural network for classifying different image datasets. Our method finds diverse network cells, while showing state-of-the-art accuracy with up to almost 2 times fewer non-zero parameters.

preprint, 2014, arXiv

Discrimination on the Grassmann Manifold: Fundamental Limits of Subspace Classifiers

We present fundamental limits on the reliable classification of linear and affine subspaces from noisy, linear features. Drawing an analogy between discrimination among subspaces and communication over vector wireless channels, we propose two Shannon-inspired measures to characterize asymptotic classifier performance. First, we define the classification capacity, which characterizes necessary and sufficient conditions for the misclassification probability to vanish as the signal dimension, the number of features, and the number of subspaces to be discerned all approach infinity. Second, we define the diversity-discrimination tradeoff which, by analogy with the diversity-multiplexing tradeoff of fading vector channels, characterizes relationships between the number of discernible subspaces and the misclassification probability as the noise power approaches zero. We derive upper and lower bounds on these measures which are tight in many regimes. Numerical results, including a face recognition application, validate the results in practice.

preprint, 2014, arXiv

Latent Sentiment Detection in Online Social Networks: A Communications-oriented View

In this paper, we consider the problem of latent sentiment detection in Online Social Networks such as Twitter. We demonstrate the benefits of using the underlying social network as an Ising prior to perform network aided sentiment detection. We show that the use of the underlying network results in substantially lower detection error rates compared to strictly features-based detection. In doing so, we introduce a novel communications-oriented framework for characterizing the probability of error, based on information-theoretic analysis. We study the variation of the calculated error exponent for several stylized network topologies such as the complete network, the star network and the closed-chain network, and show the importance of the network structure in determining detection performance.

preprint, 2013, arXiv

Generalized Bregman Divergence and Gradient of Mutual Information for Vector Poisson Channels

We investigate connections between information-theoretic and estimation-theoretic quantities in vector Poisson channel models. In particular, we generalize the gradient of mutual information with respect to key system parameters from the scalar to the vector Poisson channel model. We also propose, as another contribution, a generalization of the classical Bregman divergence that offers a means to encapsulate under a unifying framework the gradient of mutual information results for scalar and vector Poisson and Gaussian channel models. The so-called generalized Bregman divergence is also shown to exhibit various properties akin to the properties of the classical version. The vector Poisson channel model is drawing considerable attention in view of its application in various domains: as an example, the availability of the gradient of mutual information can be used in conjunction with gradient descent methods to effect compressive-sensing projection designs in emerging X-ray and document classification applications.
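For readers unfamiliar with the classical Bregman divergence that the paper generalizes, a minimal sketch follows. It implements only the standard definition, D_F(x, y) = F(x) - F(y) - <grad F(y), x - y> for a convex differentiable F, and checks two textbook special cases. The generalized divergence proposed in the paper is not reproduced here, and the function names and test vectors are illustrative assumptions.

```python
import math

def bregman(F, grad_F, x, y):
    """Classical Bregman divergence D_F(x, y) for a convex differentiable F."""
    inner = sum(g * (a - b) for g, a, b in zip(grad_F(y), x, y))
    return F(x) - F(y) - inner

# F(v) = ||v||^2 yields the squared Euclidean distance.
def sq_norm(v):
    return sum(a * a for a in v)

def sq_norm_grad(v):
    return [2.0 * a for a in v]

# F(v) = sum v_i log v_i (negative entropy) yields the KL divergence
# when both arguments are probability vectors.
def neg_entropy(v):
    return sum(a * math.log(a) for a in v)

def neg_entropy_grad(v):
    return [math.log(a) + 1.0 for a in v]

x = [0.2, 0.5, 0.3]
y = [0.4, 0.4, 0.2]

d_euclid = bregman(sq_norm, sq_norm_grad, x, y)
d_kl = bregman(neg_entropy, neg_entropy_grad, x, y)

# Reference values computed directly from the closed forms.
ref_euclid = sum((a - b) ** 2 for a, b in zip(x, y))
ref_kl = sum(a * math.log(a / b) for a, b in zip(x, y))
print(d_euclid, ref_euclid, d_kl, ref_kl)
```

Different choices of F recover different familiar distances, which gives some intuition for why a suitably generalized divergence could cover both Gaussian and Poisson channel results under one framework.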

preprint, 2012, arXiv

Communications Inspired Linear Discriminant Analysis

We study the problem of supervised linear dimensionality reduction, taking an information-theoretic viewpoint. The linear projection matrix is designed by maximizing the mutual information between the projected signal and the class label (based on a Shannon entropy measure). By harnessing a recent theoretical result on the gradient of mutual information, the above optimization problem can be solved directly using gradient descent, without requiring simplification of the objective function. Theoretical analysis and empirical comparison are made between the proposed method and two closely related methods (Linear Discriminant Analysis and Information Discriminant Analysis), and comparisons are also made with a method in which Renyi entropy is used to define the mutual information (in this case the gradient may be computed simply, under a special parameter setting). Relative to these alternative approaches, the proposed method achieves promising results on real datasets.

preprint, 2012, arXiv

Multiple-antenna fading coherent channels with arbitrary inputs: Characterization and optimization of the reliable information transmission rate

We investigate the constrained capacity of multiple-antenna fading coherent channels, where the receiver knows the channel state but the transmitter knows only the channel distribution, driven by arbitrary equiprobable discrete inputs in a regime of high signal-to-noise ratio (${\sf snr}$). In particular, we capitalize on intersections between information theory and estimation theory to derive expansions of the average minimum mean-squared error (MMSE) and the average mutual information, and hence of the constrained capacity, that capture their behavior well in the asymptotic regime of high ${\sf snr}$. We use the expansions to study the constrained capacity of various multiple-antenna fading coherent channels, including Rayleigh fading models, Ricean fading models and antenna-correlated models. The analysis unveils in detail the impact of the number of transmit and receive antennas, transmit and receive antenna correlation, line-of-sight components and the geometry of the signalling scheme on the reliable information transmission rate. We also use the expansions to design key system elements, such as power allocation and precoding schemes, as well as to design space-time signalling schemes for multiple-antenna fading coherent channels. Simulation results demonstrate that the expansions lead to very sharp designs.