Source author record

Jesus Malo

Jesus Malo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Neurons and Cognition · Computer Vision · Machine Learning

Catalog footprint

What is connected

8 works

3 topics

4 close collaborators

preprint · 2026 · arXiv

Isolating Nonlinear Independent Sources in fMRI with $β$-TCVAE Models

Learning meaningful latent representations from nonlinear fMRI data remains a fundamental challenge in neuroimaging analysis. Traditional independent component analysis, widely used due to its ability to estimate interpretable functional brain networks, relies on a linear mixing assumption for latent sources, limiting its ability to capture the inherently nonlinear and complex organization of brain dynamics. More recently, deep representation learning methods have emerged as promising alternatives for modeling nonlinear latent structure. However, many of these approaches have been evaluated primarily on simulated datasets or natural image benchmarks, with comparatively limited validation on real-world neuroimaging data such as fMRI. In this work, we are motivated by the $β$-TCVAE (Total Correlation Variational Autoencoder), a refinement of the $β$-VAE framework for learning latent representations without introducing additional hyperparameters during training. We adapt and modify this model to fMRI data for nonlinear source disentanglement, aiming to separate mixed spatial and temporal brain signals into interpretable components. We show that the $β$-TCVAE framework can recover meaningful nonlinear spatial components with biological relevance, including well-established intrinsic connectivity networks such as the default mode network. Furthermore, we evaluate the learned representations using functional network connectivity, showing that the latent structure captures coherent and interpretable brain organization patterns. This study provides a pilot investigation that bridges nonlinear representation learning and fMRI analysis.
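For readers unfamiliar with the model named in this abstract, the block below recalls the standard $β$-TCVAE decomposition as it is usually stated in the general literature. It is not taken from this paper, and the notation ($n$ for the data index, $z$ for the latent code, $x_n$ for a data sample, and the weights $α$ and $γ$) is assumed here for illustration.

```latex
% Decomposition of the averaged KL term of a VAE
\mathbb{E}_{p(n)}\Big[\mathrm{KL}\big(q(z \mid n)\,\|\,p(z)\big)\Big]
  = \underbrace{I_q(z; n)}_{\text{index-code mutual information}}
  + \underbrace{\mathrm{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big)}_{\text{total correlation}}
  + \underbrace{\sum_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)}_{\text{dimension-wise KL}}

% Reweighted objective (to be maximized)
\mathcal{L} = \mathbb{E}_{p(n)\,q(z \mid n)}\big[\log p(x_n \mid z)\big]
  - \alpha\, I_q(z; n)
  - \beta\, \mathrm{KL}\Big(q(z)\,\Big\|\,\prod_j q(z_j)\Big)
  - \gamma \sum_j \mathrm{KL}\big(q(z_j)\,\|\,p(z_j)\big)
```

With $α = γ = 1$, $β$ is the only weight that differs from the plain VAE objective, and it acts on the total-correlation term alone. The fMRI-specific modifications mentioned in the abstract are not reproduced here.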

preprint · 2026 · arXiv

Parameter-Efficient Architectural Modifications for Translation-Invariant CNNs

Convolutional Neural Networks (CNNs) are widely assumed to be translation-invariant, yet standard architectures exhibit a startling fragility: even a single-pixel shift can drastically degrade performance due to their reliance on spatially dependent fully connected layers. In this work, we resolve this vulnerability by proposing a lightweight 'Online Architecture' strategy. By strategically inserting Global Average Pooling (GAP) layers at various network depths, we effectively decouple feature recognition from spatial location. Using VGG-16 as a primary case study, we demonstrate that this architectural modification achieves a massive 98% reduction in trainable parameters (from 5.2M to just 82K) and a 90% reduction in total network size (138M to 14M). Despite this drastic pruning, our variants maintain competitive Top-1 accuracy on ImageNet (66.4%) while doubling translational robustness, reducing average relative loss from 0.09 to 0.05. Furthermore, our analysis identifies a fundamental limit to invariance: while GAP resolves macroscopic sensitivity, discrete pooling operations introduce a residual periodic aliasing that prevents perfect pixel-level stability. Finally, we extend these findings to Perceptual Image Quality Assessment (IQA) by integrating our invariant backbones into the LPIPS framework. The resulting metric significantly outperforms the retrained baseline in generalization across the KADID-10k dataset (Spearman 0.89 vs. 0.75) and achieves a near-perfect alignment with human psychophysical response curves on the RAID dataset (Spearman 0.95). These results confirm that enforcing architectural invariance is a far more efficient and biologically plausible path to robustness than traditional data augmentation. The data and code are publicly available to facilitate validation and further research.
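The core idea in this abstract, that a position-dependent fully connected readout is shift-sensitive while Global Average Pooling is not, can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's architecture: the feature map, the weights, and the helper names are all hypothetical, and the shift is circular (wrap-around), which real images and strided pooling layers do not satisfy.

```python
# Toy comparison of a position-dependent readout versus Global Average Pooling.
# Everything here (values, names, circular shift) is an illustrative assumption.

def circular_shift(fmap, dx):
    """Shift every row of a 2-D feature map right by dx columns, wrapping around."""
    width = len(fmap[0])
    dx %= width
    return [row[width - dx:] + row[:width - dx] for row in fmap]

def global_average_pool(fmap):
    """Average all activations, discarding where they occurred."""
    values = [v for row in fmap for v in row]
    return sum(values) / len(values)

def fully_connected(fmap, weights):
    """Weighted sum with one weight per spatial position (position-dependent)."""
    return sum(v * w
               for row_v, row_w in zip(fmap, weights)
               for v, w in zip(row_v, row_w))

feature_map = [[0, 1, 0, 0],
               [0, 2, 0, 0],
               [0, 0, 0, 0]]
weights = [[1, 2, 3, 4],
           [5, 6, 7, 8],
           [9, 10, 11, 12]]

shifted_map = circular_shift(feature_map, 1)

gap_original = global_average_pool(feature_map)
gap_shifted = global_average_pool(shifted_map)
fc_original = fully_connected(feature_map, weights)
fc_shifted = fully_connected(shifted_map, weights)

print("GAP:", gap_original, gap_shifted)
print("FC: ", fc_original, fc_shifted)
```

The pooled value is unchanged by the shift because averaging ignores position, whereas the fully connected readout changes because each position carries its own weight. The residual aliasing that the abstract attributes to discrete pooling does not appear in this wrap-around toy case.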

preprint · 2022 · arXiv

Contrast Sensitivity Functions in Autoencoders

Three decades ago, Atick et al. suggested that human frequency sensitivity may emerge from the enhancement required for a more efficient analysis of retinal images. Here we reassess the relevance of low-level vision tasks in the explanation of the Contrast Sensitivity Functions (CSFs) in light of (1) the current trend of using artificial neural networks for studying vision, and (2) the current knowledge of retinal image representations. As a first contribution, we show that a very popular type of convolutional neural networks (CNNs), called autoencoders, may develop human-like CSFs in the spatio-temporal and chromatic dimensions when trained to perform some basic low-level vision tasks (like retinal noise and optical blur removal), but not others (like chromatic adaptation or pure reconstruction after simple bottlenecks). As an illustrative example, the best CNN (in the considered set of simple architectures for enhancement of the retinal signal) reproduces the CSFs with an RMSE of 11% of the maximum sensitivity. As a second contribution, we provide experimental evidence of the fact that, for some functional goals (at low abstraction level), deeper CNNs that are better in reaching the quantitative goal are actually worse in replicating human-like phenomena (such as the CSFs). This low-level result (for the explored networks) is not necessarily in contradiction with other works that report advantages of deeper nets in modeling higher-level vision goals. However, in line with a growing body of literature, our results suggest another word of caution about CNNs in vision science since the use of simplified units or unrealistic architectures in goal optimization may be a limitation for the modeling and understanding of human vision.

preprint · 2022 · arXiv

Paraphrasing Magritte's Observation

Contrast Sensitivity of the human visual system can be explained from certain low-level vision tasks (like retinal noise and optical blur removal), but not from others (like chromatic adaptation or pure reconstruction after simple bottlenecks). This conclusion still holds even under substantial change in stimulus statistics, as for instance considering cartoon-like images as opposed to natural images (Li et al. Journal of Vision, 2022, Preprint arXiv:2103.00481). In this note we present a method to generate original cartoon-like images compatible with the statistical training used in (Li et al., 2022). Following the classical observation in (Magritte, 1929), the stimuli generated by the proposed method certainly are not what they represent: Ceci n'est pas une pipe. The clear distinction between representation (the stimuli generated by the proposed method) and reality (the actual object) avoids potential problems for the use of the generated stimuli in academic, non-profit publications.

preprint · 2020 · arXiv

Spatio-Chromatic Information available from different Neural Layers via Gaussianization

How much visual information about the retinal images can be extracted from the different layers of the visual pathway? Separate subsystems (e.g. opponent channels, spatial filters, nonlinearities of the texture sensors) have been suggested to be organized for optimal information transmission. However, the efficiency of these different layers has not been measured when they operate together on colorimetrically calibrated natural images and using multivariate information-theoretic units over the joint spatio-chromatic array of responses. In this work we present a statistical tool to address this question in an appropriate (multivariate) way. Specifically, we propose an empirical estimate of the information transmitted by the system based on a recent Gaussianization technique that reduces the challenging multivariate PDF estimation problem to a set of simpler univariate estimations. Total correlation measured using the proposed estimator is consistent with predictions based on the analytical Jacobian of a standard spatio-chromatic model of the retina-cortex pathway. If the noise at a certain representation is proportional to the dynamic range of the response, and one assumes sensors of equivalent noise level, transmitted information shows the following trends: (1) progressively deeper representations are better in terms of the amount of information about the input, (2) the transmitted information up to the cortical representation follows the PDF of natural scenes over the chromatic and achromatic dimensions of the stimulus space, (3) the contribution of spatial transforms to capture visual information is substantially bigger than the contribution of chromatic transforms, and (4) nonlinearities of the responses contribute substantially to the transmitted information but less than the linear transforms.

preprint · 2016 · arXiv

Derivatives and inverse of a linear-nonlinear multi-layer spatial vision model

Linear-nonlinear transforms are interesting in vision science because they are key in modeling a number of perceptual experiences such as color, motion or spatial texture. Here we first show that a number of issues in vision may be addressed through an analytic expression of the Jacobian of these linear-nonlinear transforms. The particular model analyzed afterwards (an extension of [Malo & Simoncelli SPIE 2015]) is illustrative because it consists of a cascade of standard linear-nonlinear modules. Each module roughly corresponds to a known psychophysical mechanism: (1) linear spectral integration and nonlinear brightness-from-luminance computation, (2) linear pooling of local brightness and nonlinear normalization for local contrast computation, (3) linear frequency selectivity and nonlinear normalization for spatial contrast masking, and (4) linear wavelet-like decomposition and nonlinear normalization for frequency-dependent masking. Beyond being the appropriate technical report with the missing details in [Malo & Simoncelli SPIE 2015], the interest of the presented analytic results and numerical methods transcends the particular model because of the ubiquity of the linear-nonlinear structure. Part of this material was presented at MODVIS 2016 (see slides of the conference talk in the appendix at the end of this document).
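To make the notion of an analytic Jacobian for a normalization stage concrete, here is a minimal sketch that assumes a generic divisive normalization, y_i = x_i / (b + sum_j H[i][j] * x_j). This form, the kernel `H`, the constant `b`, and the input `x` are illustrative assumptions and are not the model analyzed in the report. The analytic Jacobian, dy_i/dx_k = delta_ik / D_i - x_i * H[i][k] / D_i**2 with D_i the denominator, is checked against central finite differences.

```python
# Analytic versus numerical Jacobian of an assumed divisive-normalization stage.
# The functional form and all numbers are hypothetical, chosen for illustration.

def divisive_normalization(x, H, b):
    """Return y with y_i = x_i / (b + sum_j H[i][j] * x_j)."""
    n = len(x)
    return [x[i] / (b + sum(H[i][j] * x[j] for j in range(n))) for i in range(n)]

def analytic_jacobian(x, H, b):
    """Closed-form Jacobian: delta_ik / D_i - x_i * H[i][k] / D_i**2."""
    n = len(x)
    denom = [b + sum(H[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [[(1.0 if i == k else 0.0) / denom[i] - x[i] * H[i][k] / denom[i] ** 2
             for k in range(n)]
            for i in range(n)]

def numerical_jacobian(f, x, eps=1e-6):
    """Central finite-difference Jacobian of f at x."""
    n = len(x)
    jac = [[0.0] * n for _ in range(n)]
    for k in range(n):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += eps
        x_minus[k] -= eps
        f_plus, f_minus = f(x_plus), f(x_minus)
        for i in range(n):
            jac[i][k] = (f_plus[i] - f_minus[i]) / (2 * eps)
    return jac

x = [0.5, 1.0, 2.0]
H = [[0.5, 0.3, 0.2],
     [0.3, 0.4, 0.3],
     [0.2, 0.3, 0.5]]
b = 0.1

J_analytic = analytic_jacobian(x, H, b)
J_numeric = numerical_jacobian(lambda v: divisive_normalization(v, H, b), x)
max_err = max(abs(J_analytic[i][k] - J_numeric[i][k])
              for i in range(3) for k in range(3))
print("largest analytic/numeric discrepancy:", max_err)
```

For a cascade of such stages, the overall Jacobian would be the product of the per-stage Jacobians by the chain rule; only a single stage is shown here.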

preprint · 2016 · arXiv

Dimensionality Reduction via Regression in Hyperspectral Imagery

This paper introduces a new unsupervised method for dimensionality reduction via regression (DRR). The algorithm belongs to the family of invertible transforms that generalize Principal Component Analysis (PCA) by using curvilinear instead of linear features. DRR identifies the nonlinear features through multivariate regression to ensure the reduction in redundancy between the PCA coefficients, the reduction of the variance of the scores, and the reduction in the reconstruction error. More importantly, unlike other nonlinear dimensionality reduction methods, the invertibility, volume-preservation, and straightforward out-of-sample extension make DRR interpretable and easy to apply. The properties of DRR enable learning a broader class of data manifolds than the recently proposed Non-linear Principal Components Analysis (NLPCA) and Principal Polynomial Analysis (PPA). We illustrate the performance of the representation in reducing the dimensionality of remote sensing data. In particular, we tackle two common problems: processing very high dimensional spectral information such as in hyperspectral image sounding data, and dealing with spatial-spectral image patches of multispectral images. Both settings pose collinearity and ill-determination problems. The expressive power of the features is assessed in terms of truncation error, estimating atmospheric variables, and surface land cover classification error. Results show that DRR outperforms linear PCA and recently proposed invertible extensions based on neural networks (NLPCA) and univariate regressions (PPA).
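The regression step described in this abstract can be sketched on a two-dimensional toy problem. This is an illustration under loud assumptions, not the authors' implementation: the data are synthetic, the PCA step is skipped because the two coordinates are constructed to be nearly uncorrelated already, and a quadratic least-squares fit stands in for the multivariate regressor. The lower-variance score is predicted from the higher-variance one, the residual is kept as the new feature, and adding the prediction back inverts the transform.

```python
import math

# Synthetic scores: `a` plays the role of the first (high-variance) PCA score,
# `b` the second. They are nearly uncorrelated but strongly dependent (b ~ a**2).
a = [-1 + 2 * i / 49 for i in range(50)]
b = [ai ** 2 + 0.01 * math.sin(37 * i) for i, ai in enumerate(a)]

def variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def fit_line(u, v):
    """Ordinary least squares for v ~ slope * u + intercept."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    slope = (sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
             / sum((ui - mean_u) ** 2 for ui in u))
    return slope, mean_v - slope * mean_u

# Assumed regressor: a line in the quadratic feature a**2.
slope, intercept = fit_line([ai ** 2 for ai in a], b)

def predict(ai):
    return slope * ai ** 2 + intercept

# Forward step: keep only what the first score cannot predict.
residual = [bi - predict(ai) for ai, bi in zip(a, b)]

# Inverse step: the prediction depends only on `a`, so `b` is recovered.
reconstructed = [ri + predict(ai) for ai, ri in zip(a, residual)]

print("variance of b:       ", variance(b))
print("variance of residual:", variance(residual))
```

In this toy case the residual carries far less variance than the original second score, so discarding it costs little reconstruction error, which is the sense in which the regression reduces redundancy between the scores.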

preprint · 2016 · arXiv

Sequential Principal Curves Analysis

This work includes all the technical details of the Sequential Principal Curves Analysis (SPCA) in a single document. SPCA is an unsupervised nonlinear and invertible feature extraction technique. The identified curvilinear features can be interpreted as a set of nonlinear sensors: the response of each sensor is the projection onto the corresponding feature. Moreover, it can be easily tuned for different optimization criteria (e.g. infomax, error minimization, or decorrelation) by choosing the right way to measure distances along each curvilinear feature. Even though proposed in [Laparra et al. Neural Comp. 12] and shown to work in multiple modalities in [Laparra and Malo Frontiers Hum. Neuro. 15], the SPCA framework has its original roots in the nonlinear ICA algorithm in [Malo and Gutierrez Network 06]. Later on, the SPCA philosophy for nonlinear generalization of PCA gave rise to substantially faster alternatives at the cost of introducing different constraints in the model, namely the Principal Polynomial Analysis (PPA) [Laparra et al. IJNS 14] and the Dimensionality Reduction via Regression (DRR) [Laparra et al. IEEE TGRS 15]. This report illustrates the reasons why we developed such a family and is the appropriate technical companion for the missing details in [Laparra et al., NeCo 12, Laparra and Malo, Front.Hum.Neuro. 15]. See also the data, code and examples in the dedicated sites http://isp.uv.es/spca.html and http://isp.uv.es/after effects.html

Institution

Affiliation not imported yet

This author record came from a source that does not expose affiliation metadata. Once the author claims the profile or we enrich the record from another provider, this section will link to the concrete institution.

Topic footprint