Researcher profile

Jesus Malo

Jesus Malo contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
3topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2026arXiv

Isolating Nonlinear Independent Sources in fMRI with $β$-TCVAE Models

Learning meaningful latent representations from nonlinear fMRI data remains a fundamental challenge in neuroimaging analysis. Traditional independent component analysis, widely used due to its ability to estimate interpretable functional brain networks, relies on a linear mixing assumption for latent sources, limiting its ability to capture the inherently nonlinear and complex organization of brain dynamics. More recently, deep representation learning methods have emerged as promising alternatives for modeling nonlinear latent structure. However, many of these approaches have been evaluated primarily on simulated datasets or natural image benchmarks, with comparatively limited validation on real-world neuroimaging data such as fMRI. In this work, we are motivated by the $β$-TCVAE (Total Correlation Variational Autoencoder), a refinement of the $β$-VAE framework for learning latent representations without introducing additional hyperparameters during training. We adapt and modify this model to fMRI data for nonlinear source disentanglement, aiming to separate mixed spatial and temporal brain signals into interpretable components. We show that the $β$-TCVAE framework can recover meaningful nonlinear spatial components with biological relevance, including well-established intrinsic connectivity networks such as the default mode network. Furthermore, we evaluate the learned representations using functional network connectivity, showing that the latent structure captures coherent and interpretable brain organization patterns. This study provides a pilot investigation that bridges nonlinear representation learning and fMRI analysis.

preprint2026arXiv

Parameter-Efficient Architectural Modifications for Translation-Invariant CNNs

Convolutional Neural Networks (CNNs) are widely assumed to be translation-invariant, yet standard architectures exhibit a startling fragility: even a single-pixel shift can drastically degrade performance due to their reliance on spatially dependent fully connected layers. In this work, we resolve this vulnerability by proposing a lightweight 'Online Architecture' strategy. By strategically inserting Global Average Pooling (GAP) layers at various network depths, we effectively decouple feature recognition from spatial location. Using VGG-16 as a primary case study, we demonstrate that this architectural modification achieves a massive 98% reduction in trainable parameters (from 5.2M to just 82K) and a 90% reduction in total network size (138M to 14M). Despite this drastic pruning, our variants maintain competitive Top-1 accuracy on ImageNet (66.4%) while doubling translational robustness, reducing average relative loss from 0.09 to 0.05. Furthermore, our analysis identifies a fundamental limit to invariance: while GAP resolves macroscopic sensitivity, discrete pooling operations introduce a residual periodic aliasing that prevents perfect pixel-level stability. Finally, we extend these findings to Perceptual Image Quality Assessment (IQA) by integrating our invariant backbones into the LPIPS framework. The resulting metric significantly outperforms the retrained baseline in generalization across the KADID-10k dataset (Spearman 0.89 vs. 0.75) and achieves a near-perfect alignment with human psychophysical response curves on the RAID dataset (Spearman 0.95). These results confirm that enforcing architectural invariance is a far more efficient and biologically plausible path to robustness than traditional data augmentation. Data and code are publicly available. The data and code are publicly available to facilitate validation and further research.

preprint2022arXiv

Contrast Sensitivity Functions in Autoencoders

Three decades ago, Atick et al. suggested that human frequency sensitivity may emerge from the enhancement required for a more efficient analysis of retinal images. Here we reassess the relevance of low-level vision tasks in the explanation of the Contrast Sensitivity Functions (CSFs) in light of (1) the current trend of using artificial neural networks for studying vision, and (2) the current knowledge of retinal image representations. As a first contribution, we show that a very popular type of convolutional neural networks (CNNs), called autoencoders, may develop human-like CSFs in the spatio-temporal and chromatic dimensions when trained to perform some basic low-level vision tasks (like retinal noise and optical blur removal), but not others (like chromatic adaptation or pure reconstruction after simple bottlenecks). As an illustrative example, the best CNN (in the considered set of simple architectures for enhancement of the retinal signal) reproduces the CSFs with an RMSE error of 11\% of the maximum sensitivity. As a second contribution, we provide experimental evidence of the fact that, for some functional goals (at low abstraction level), deeper CNNs that are better in reaching the quantitative goal are actually worse in replicating human-like phenomena (such as the CSFs). This low-level result (for the explored networks) is not necessarily in contradiction with other works that report advantages of deeper nets in modeling higher-level vision goals. However, in line with a growing body of literature, our results suggests another word of caution about CNNs in vision science since the use of simplified units or unrealistic architectures in goal optimization may be a limitation for the modeling and understanding of human vision.

preprint2022arXiv

Paraphrasing Magritte's Observation

Contrast Sensitivity of the human visual system can be explained from certain low-level vision tasks (like retinal noise and optical blur removal), but not from others (like chromatic adaptation or pure reconstruction after simple bottlenecks). This conclusion still holds even under substantial change in stimulus statistics, as for instance considering cartoon-like images as opposed to natural images (Li et al. Journal of Vision, 2022, Preprint arXiv:2103.00481). In this note we present a method to generate original cartoon-like images compatible with the statistical training used in (Li et al., 2022). Following the classical observation in (Magritte, 1929), the stimuli generated by the proposed method certainly are not what they represent: Ceci n'est pas une pipe. The clear distinction between representation (the stimuli generated by the proposed method) and reality (the actual object) avoids eventual problems for the use of the generated stimuli in academic, non-profit, publications.

preprint2020arXiv

Spatio-Chromatic Information available from different Neural Layers via Gaussianization

How much visual information about the retinal images can be extracted from the different layers of the visual pathway?. Separate subsystems (e.g. opponent channels, spatial filters, nonlinearities of the texture sensors) have been suggested to be organized for optimal information transmission. However, the efficiency of these different layers has not been measured when they operate together on colorimetrically calibrated natural images and using multivariate information-theoretic units over the joint spatio-chromatic array of responses. In this work we present a statistical tool to address this question in an appropriate (multivariate) way. Specifically, we propose an empirical estimate of the information transmitted by the system based on a recent Gaussianization technique that reduces the challenging multivariate PDF estimation problem to a set of simpler univariate estimations. Total correlation measured using the proposed estimator is consistent with predictions based on the analytical Jacobian of a standard spatio-chromatic model of the retina-cortex pathway. If the noise at certain representation is proportional to the dynamic range of the response, and one assumes sensors of equivalent noise level, transmitted information shows the following trends: (1) progressively deeper representations are better in terms of the amount of information about the input, (2) the transmitted information up to the cortical representation follows the PDF of natural scenes over the chromatic and achromatic dimensions of the stimulus space, (3) the contribution of spatial transforms to capture visual information is substantially bigger than the contribution of chromatic transforms, and (4) nonlinearities of the responses contribute substantially to the transmitted information but less than the linear transforms.