Source author record

Nuno Gonçalves

Nuno Gonçalves appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Computer Vision Artificial Intelligence Machine Learning Computation and Language hep-ph hep-th physics.comp-ph Software Engineering

Catalog footprint

What is connected

6works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

What's Holding Back Latent Visual Reasoning?

Humans can approach complex visual problems by mentally simulating intermediate visual steps, rather than reasoning through language alone. Inspired by this, several works on Vision-Language Models have recently explored chain-of-thought reasoning with continuous latent tokens as intermediate visual imagination steps. In this work, we investigate how recent models leverage such latent tokens. Surprisingly, we find that model accuracy is unaffected when latent tokens are replaced by uninformative dummy tokens. This indicates that latent tokens play a minimal causal role in the model's final prediction. To better understand this phenomenon, we analyze both the training signal provided by oracle latent representations and the quality of the latent tokens generated at inference time. Our experiments reveal two crucial issues holding back latent visual reasoning: First, in most existing datasets, oracle latent tokens provide limited additional information beyond the original image and do not substantially simplify the task, leading models to ignore them during training and effectively bypassing them at inference time. When fine-tuned on a diagnostic dataset, in which latent tokens provide sufficient support for the final prediction, we show that models can causally rely on them. Second, the latent tokens produced at inference time deviate from their corresponding oracle representations, collapsing to a narrow region and preventing benefits even when the model relies on them. Overall, our findings suggest that future progress in latent visual reasoning depends on two key pillars: high-quality datasets with informative intermediate steps and more precise latent token prediction.

preprint2023arXiv

Young Labeled Faces in the Wild (YLFW): A Dataset for Children Faces Recognition

Face recognition has achieved outstanding performance in the last decade with the development of deep learning techniques. Nowadays, the challenges in face recognition are related to specific scenarios, for instance, the performance under diverse image quality, the robustness for aging and edge cases of person age (children and elders), distinguishing of related identities. In this set of problems, recognizing children's faces is one of the most sensitive and important. One of the reasons for this problem is the existing bias towards adults in existing face datasets. In this work, we present a benchmark dataset for children's face recognition, which is compiled similarly to the famous face recognition benchmarks LFW, CALFW, CPLFW, XQLFW and AgeDB. We also present a development dataset (separated into train and test parts) for adapting face recognition models for face images of children. The proposed data is balanced for African, Asian, Caucasian, and Indian races. To the best of our knowledge, this is the first standartized data tool set for benchmarking and the largest collection for development for children's face recognition. Several face recognition experiments are presented to demonstrate the performance of the proposed data tool set.

preprint2022arXiv

MorDeephy: Face Morphing Detection Via Fused Classification

Face morphing attack detection (MAD) is one of the most challenging tasks in the field of face recognition nowadays. In this work, we introduce a novel deep learning strategy for a single image face morphing detection, which implies the discrimination of morphed face images along with a sophisticated face recognition task in a complex classification scheme. It is directed onto learning the deep facial features, which carry information about the authenticity of these features. Our work also introduces several additional contributions: the public and easy-to-use face morphing detection benchmark and the results of our wild datasets filtering strategy. Our method, which we call MorDeephy, achieved the state of the art performance and demonstrated a prominent ability for generalising the task of morphing detection to unseen scenarios.

preprint2022arXiv

Reducing Overconfidence Predictions for Autonomous Driving Perception

In state-of-the-art deep learning for object recognition, SoftMax and Sigmoid functions are most commonly employed as the predictor outputs. Such layers often produce overconfident predictions rather than proper probabilistic scores, which can thus harm the decision-making of `critical' perception systems applied in autonomous driving and robotics. Given this, the experiments in this work propose a probabilistic approach based on distributions calculated out of the Logit layer scores of pre-trained networks. We demonstrate that Maximum Likelihood (ML) and Maximum a-Posteriori (MAP) functions are more suitable for probabilistic interpretations than SoftMax and Sigmoid-based predictions for object recognition. We explore distinct sensor modalities via RGB images and LiDARs (RV: range-view) data from the KITTI and Lyft Level-5 datasets, where our approach shows promising performance compared to the usual SoftMax and Sigmoid layers, with the benefit of enabling interpretable probabilistic predictions. Another advantage of the approach introduced in this paper is that the ML and MAP functions can be implemented in existing trained networks, that is, the approach benefits from the output of the Logit layer of pre-trained networks. Thus, there is no need to carry out a new training phase since the ML and MAP functions are used in the test/prediction phase.

preprint2022arXiv

Stepwise Migration of a Monolith to a Microservices Architecture: Performance and Migration Effort Evaluation

The agility inherent to today's business promotes the definition of software architectures where the business entities are decoupled into modules and/or services. However, there are advantages in having a rich domain model, where domain entities are tightly connected, because it fosters an initial quick development. On the other hand, the split of the business logic into modules and/or services, its encapsulation through well-defined interfaces and the introduction of inter-service communication introduces a cost in terms of performance. In this paper we analyze the stepwise migrating of a monolith, using a rich domain object, into a microservice architecture, where a modular monolith architecture is used as an intermediate step. The impact on the migration effort and on performance is measured for both steps. Current state of the art analyses the migration of monolith systems to a microservices architecture, but we observed that migration effort and performance issues are already significant in the migration to a modular monolith. Therefore, a clear distinction is established for each one of the steps, which may inform software architects on the planning of the migration of monolith systems. In particular, the trade-offs of doing all the migration process or just migrating to a modular monolith.

preprint2015arXiv

SOSpin, a C++ library for Yukawa decomposition in SO(2N) models

We present in this paper the SOSpin library, which calculates an analytic decomposition of the Yukawa interactions invariant under any SO(2N) group in terms of an SU(N) basis. We make use of the oscillator expansion formalism, where the SO(2N) spinor representations are expressed in terms of creation and annihilation operators of a Grassmann algebra. These noncommutative operators and their products are simulated in SOSpin through the implementation of doubly-linked-list data structures. These data structures were determinant to achieve a higher performance in the simplification of large products of creation and annihilation operators. We illustrate the use of our library with complete examples of how to decompose Yukawa terms invariant under SO(2N) in terms of SU(N) degrees of freedom for N=2 and 5. We further demonstrate, with an example for SO(4), that higher dimensional field-operator terms can also be processed with our library. Finally, we describe the functions available in SOSpin that are made to simplify the writing of spinors and their interactions specifically for SO(10) models.

Nuno Gonçalves

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

What's Holding Back Latent Visual Reasoning?

Young Labeled Faces in the Wild (YLFW): A Dataset for Children Faces Recognition

MorDeephy: Face Morphing Detection Via Fused Classification

Reducing Overconfidence Predictions for Autonomous Driving Perception

Stepwise Migration of a Monolith to a Microservices Architecture: Performance and Migration Effort Evaluation

SOSpin, a C++ library for Yukawa decomposition in SO(2N) models