Researcher profile

Guillermo Sapiro

Guillermo Sapiro contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 - Emerging | Verification L1 | Unclaimed author
9 works
0 followers
8 topics
4 close collaborators

Published work

9 published items

Preprint | 2026 | arXiv

LLMs are not (consistently) Bayesian: Quantifying internal (in)consistencies of LLMs' probabilistic beliefs

Modern AI systems are being deployed in complex domains such as medicine, science, and law, where it is important that they not only produce correct answers, but also represent and update uncertain beliefs about the world as new evidence arrives. We introduce the novel technique of studying LLMs as information processing rules and utilize the information processing gap to study the internal (in)consistencies of how LLMs update their probabilistic beliefs from evidence. Our extensive experiments evaluate multiple approaches in which LLMs can incorporate evidence into their beliefs. Some of these approaches produce (nearly) Bayesian updates; others seem to use a learned heuristic. Surprisingly, the non-Bayesian heuristic updates often outperform exact Bayesian computation in terms of downstream task performance -- indicating the LLMs' probabilistic models of the world are misspecified. Lastly, we show how our measure can provide diagnostics to identify issues with LLM-powered inferential systems.
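As a rough illustration of what checking a belief update against Bayes' rule can look like, the sketch below computes an exact posterior over a finite hypothesis set and measures how far a reported belief lies from it using KL divergence. It does not implement the paper's information processing gap, which the abstract does not define; the function names, the choice of KL divergence, and all numbers are assumptions made for this example.

```python
import math


def bayes_posterior(prior, likelihood):
    """Exact Bayes update over a finite set of hypotheses.

    prior[h] is P(h); likelihood[h] is P(evidence | h).
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("evidence has zero probability under the prior")
    return {h: p / total for h, p in unnormalized.items()}


def kl_divergence(p, q):
    """KL(p || q) in nats. Assumes q[h] > 0 wherever p[h] > 0."""
    return sum(p[h] * math.log(p[h] / q[h]) for h in p if p[h] > 0)


# Hypothetical numbers, for illustration only.
prior = {"disease": 0.01, "healthy": 0.99}
likelihood = {"disease": 0.95, "healthy": 0.05}  # P(positive test | h)
reported = {"disease": 0.60, "healthy": 0.40}    # a stated post-evidence belief

exact = bayes_posterior(prior, likelihood)
gap = kl_divergence(exact, reported)  # 0 only if the reported belief is exactly Bayesian
```

A gap of zero would mean the reported belief matches the exact Bayesian posterior for the stated prior and likelihood; larger values indicate a larger departure from it.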

Preprint | 2022 | arXiv

Minimax Demographic Group Fairness in Federated Learning

Federated learning is an increasingly popular paradigm that enables a large number of entities to collaboratively learn better models. In this work, we study minimax group fairness in federated learning scenarios where different participating entities may only have access to a subset of the population groups during the training phase. We formally analyze how our proposed group fairness objective differs from existing federated learning fairness criteria that impose similar performance across participants instead of demographic groups. We provide an optimization algorithm -- FedMinMax -- for solving the proposed problem that provably enjoys the performance guarantees of centralized learning algorithms. We experimentally compare the proposed approach against other state-of-the-art methods in terms of group fairness in various federated learning setups, showing that our approach exhibits competitive or superior performance.
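The difference between a minimax group criterion and an average-performance criterion can be seen in a toy model-selection example. The sketch below is not FedMinMax and involves no federated training; the candidate models, group names, risks, and group sizes are hypothetical values chosen only to show that the two criteria can prefer different models.

```python
def worst_group_risk(group_risks):
    """Minimax criterion: the risk of the worst-off demographic group."""
    return max(group_risks.values())


def average_risk(group_risks, group_sizes):
    """Population-weighted average risk across groups."""
    total = sum(group_sizes.values())
    return sum(group_risks[g] * group_sizes[g] for g in group_risks) / total


# Hypothetical per-group risks for two candidate models.
candidates = {
    "model_a": {"group_1": 0.05, "group_2": 0.40},
    "model_b": {"group_1": 0.15, "group_2": 0.20},
}
sizes = {"group_1": 900, "group_2": 100}

best_average = min(candidates, key=lambda m: average_risk(candidates[m], sizes))
best_minimax = min(candidates, key=lambda m: worst_group_risk(candidates[m]))
```

With these numbers the average criterion favors model_a, which serves the large group well, while the minimax criterion favors model_b, whose worst group fares better.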

Preprint | 2022 | arXiv

Scaling-Translation-Equivariant Networks with Decomposed Convolutional Filters

Encoding the scale information explicitly into the representation learned by a convolutional neural network (CNN) is beneficial for many computer vision tasks, especially when dealing with multiscale inputs. We study, in this paper, a scaling-translation-equivariant (ST-equivariant) CNN with joint convolutions across the space and the scaling group, which is shown to be both sufficient and necessary to achieve equivariance for the regular representation of the scaling-translation group ST. To reduce the model complexity and computational burden, we decompose the convolutional filters under two pre-fixed separable bases and truncate the expansion to low-frequency components. A further benefit of the truncated filter expansion is the improved deformation robustness of the equivariant representation, a property which is theoretically analyzed and empirically verified. Numerical experiments demonstrate that the proposed scaling-translation-equivariant network with decomposed convolutional filters (ScDCFNet) achieves significantly improved performance in multiscale image classification and better interpretability than regular CNNs at a reduced model size.

Preprint | 2020 | arXiv

ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution

Convolutional Neural Networks (CNNs) are known to be significantly over-parametrized, and difficult to interpret, train and adapt. In this paper, we introduce a structural regularization across convolutional kernels in a CNN. In our approach, each convolution kernel is first decomposed as 2D dictionary atoms linearly combined by coefficients. The widely observed correlation and redundancy in a CNN hint at a common low-rank structure among the decomposed coefficients, which is here further supported by our empirical observations. We then explicitly regularize CNN kernels by enforcing decomposed coefficients to be shared across sub-structures, while leaving each sub-structure only its own dictionary atoms, typically a few hundred parameters, which leads to dramatic model reductions. We explore models with sharing across different sub-structures to cover a wide range of trade-offs between parameter reduction and expressiveness. Our proposed regularized network structures open the door to better interpreting, training and adapting deep models. We validate the flexibility and compatibility of our method by image classification experiments on multiple datasets and underlying network structures, and show that CNNs now maintain performance with dramatic reduction in parameters and computations, e.g., only 5% of the parameters are used in a ResNet-18 to achieve comparable performance. Further experiments on few-shot classification show that faster and more robust task adaptation is obtained in comparison with models with standard convolutions.
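The decomposition described in the abstract, a kernel written as a linear combination of 2D dictionary atoms, can be sketched in a few lines of plain Python. This is an illustrative reading of the abstract rather than the authors' implementation: the function names, the parameter-counting scheme, and the layer sizes are assumptions made for this example.

```python
def compose_kernel(atoms, coefficients):
    """Build one k x k kernel as a linear combination of 2D atoms."""
    k = len(atoms[0])
    kernel = [[0.0] * k for _ in range(k)]
    for atom, c in zip(atoms, coefficients):
        for i in range(k):
            for j in range(k):
                kernel[i][j] += c * atom[i][j]
    return kernel


def dense_params(num_layers, c_out, c_in, k):
    """Parameter count when every layer stores full k x k kernels."""
    return num_layers * c_out * c_in * k * k


def shared_params(num_layers, c_out, c_in, k, m):
    """Assumed sharing scheme: one coefficient tensor shared by all layers,
    plus m private k x k atoms per layer."""
    return c_out * c_in * m + num_layers * m * k * k


# Two hypothetical 2 x 2 atoms and one coefficient per atom.
atoms = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]
kernel = compose_kernel(atoms, [2.0, 0.5])

# Hypothetical sizes: 4 layers, 64 -> 64 channels, 3 x 3 kernels, 6 atoms.
dense = dense_params(4, 64, 64, 3)
shared = shared_params(4, 64, 64, 3, 6)
```

In this toy setting each layer contributes only its own atoms (6 x 3 x 3 = 54 parameters here), which is the sense in which sharing coefficients across sub-structures reduces model size. The resulting ratio depends entirely on the assumed sizes and is not the figure reported in the paper.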

Preprint | 2020 | arXiv

Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors

Deep neural networks (DNNs) are notorious for their vulnerability to adversarial attacks, which are small perturbations added to their input images to mislead their prediction. Detection of adversarial examples is, therefore, a fundamental requirement for robust classification frameworks. In this work, we present a method for detecting such adversarial attacks, which is suitable for any pre-trained neural network classifier. We use influence functions to measure the impact of every training sample on the validation set data. From the influence scores, we find the most supportive training samples for any given validation example. A k-nearest neighbor (k-NN) model fitted on the DNN's activation layers is employed to search for the ranking of these supporting training samples. We observe that these samples are highly correlated with the nearest neighbors of the normal inputs, while this correlation is much weaker for adversarial inputs. We train an adversarial detector using the k-NN ranks and distances and show that it successfully distinguishes adversarial examples, getting state-of-the-art results on six attack methods with three datasets. Code is available at https://github.com/giladcohen/NNIF_adv_defense.

Preprint | 2020 | arXiv

Differential 3D Facial Recognition: Adding 3D to Your State-of-the-Art 2D Method

Active illumination is a prominent complement to enhance 2D face recognition and make it more robust, e.g., to spoofing attacks and low-light conditions. In the present work we show that it is possible to adopt active illumination to enhance state-of-the-art 2D face recognition approaches with 3D features, while bypassing the complicated task of 3D reconstruction. The key idea is to project over the test face a high spatial frequency pattern, which allows us to simultaneously recover real 3D information plus a standard 2D facial image. Therefore, a state-of-the-art 2D face recognition solution can be transparently applied, while from the high frequency component of the input image, complementary 3D facial features are extracted. Experimental results on the ND-2006 dataset show that the proposed ideas can significantly boost face recognition performance and dramatically improve the robustness to spoofing attacks.

Preprint | 2020 | arXiv

Nested Learning For Multi-Granular Tasks

Standard deep neural networks (DNNs) are commonly trained in an end-to-end fashion for specific tasks such as object recognition, face identification, or character recognition, among many examples. This specificity often leads to overconfident models that generalize poorly to samples that are not from the original training distribution. Moreover, such standard DNNs do not allow leveraging information from heterogeneously annotated training data, where, for example, labels may be provided with different levels of granularity. Furthermore, DNNs do not simultaneously produce results with different levels of confidence for different levels of detail; they are most commonly an all-or-nothing approach. To address these challenges, we introduce the concept of nested learning: how to obtain a hierarchical representation of the input such that a coarse label can be extracted first, and sequentially refine this representation, if the sample permits, to obtain successively refined predictions, all of them with the corresponding confidence. We explicitly enforce this behavior by creating a sequence of nested information bottlenecks. Looking at the problem of nested learning from an information theory perspective, we design a network topology with two important properties. First, a sequence of low-dimensional (nested) feature embeddings is enforced. Then we show how the explicit combination of nested outputs can improve both the robustness and the accuracy of finer predictions. Experimental results on CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST, DBpedia, and PlantVillage demonstrate that nested learning outperforms the same network trained in the standard end-to-end fashion.

Preprint | 2020 | arXiv

Stochastic Conditional Generative Networks with Basis Decomposition

While generative adversarial networks (GANs) have revolutionized machine learning, a number of open questions remain to fully understand them and exploit their power. One of these questions is how to efficiently achieve proper diversity and sampling of the multi-mode data space. To address this, we introduce BasisGAN, a stochastic conditional multi-mode image generator. By exploiting the observation that a convolutional filter can be well approximated as a linear combination of a small set of basis elements, we learn a plug-and-play basis generator to stochastically generate basis elements, with just a few hundred parameters, to fully embed stochasticity into convolutional filters. By sampling basis elements instead of filters, we dramatically reduce the cost of modeling the parameter space with no sacrifice on either image diversity or fidelity. To illustrate this proposed plug-and-play framework, we construct variants of BasisGAN based on state-of-the-art conditional image generation networks, and train the networks by simply plugging in a basis generator, without additional auxiliary components, hyperparameters, or training objectives. The experimental success is complemented with theoretical results indicating how the perturbations introduced by the proposed sampling of basis elements can propagate to the appearance of generated images.

Preprint | 2019 | arXiv

Quantum Computing at the Frontiers of Biological Sciences

The search for meaningful structure in biological data has relied on cutting-edge advances in computational technology and data science methods. However, challenges arise as we push the limits of scale and complexity in biological problems. Innovation in massively parallel, classical computing hardware and algorithms continues to address many of these challenges, but there is a need to simultaneously consider new paradigms to circumvent current barriers to processing speed. Accordingly, we articulate a view towards quantum computation and quantum information science, where algorithms have demonstrated potential polynomial and exponential computational speedups in certain applications, such as machine learning. The maturation of the field of quantum computing, in hardware and algorithm development, also coincides with the growth of several collaborative efforts to address questions across length and time scales, and scientific disciplines. We use this coincidence to explore the potential for quantum computing to aid in one such endeavor: the merging of insights from genetics, genomics, neuroimaging and behavioral phenotyping. By examining joint opportunities for computational innovation across fields, we highlight the need for a common language between biological data analysis and quantum computing. Ultimately, we consider current and future prospects for the employment of quantum computing algorithms in the biological sciences.