Researcher profile

Nuno Gonçalves

Nuno Gonçalves contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2026arXiv

What's Holding Back Latent Visual Reasoning?

Humans can approach complex visual problems by mentally simulating intermediate visual steps, rather than reasoning through language alone. Inspired by this, several works on Vision-Language Models have recently explored chain-of-thought reasoning with continuous latent tokens as intermediate visual imagination steps. In this work, we investigate how recent models leverage such latent tokens. Surprisingly, we find that model accuracy is unaffected when latent tokens are replaced by uninformative dummy tokens. This indicates that latent tokens play a minimal causal role in the model's final prediction. To better understand this phenomenon, we analyze both the training signal provided by oracle latent representations and the quality of the latent tokens generated at inference time. Our experiments reveal two crucial issues holding back latent visual reasoning: First, in most existing datasets, oracle latent tokens provide limited additional information beyond the original image and do not substantially simplify the task, leading models to ignore them during training and effectively bypassing them at inference time. When fine-tuned on a diagnostic dataset, in which latent tokens provide sufficient support for the final prediction, we show that models can causally rely on them. Second, the latent tokens produced at inference time deviate from their corresponding oracle representations, collapsing to a narrow region and preventing benefits even when the model relies on them. Overall, our findings suggest that future progress in latent visual reasoning depends on two key pillars: high-quality datasets with informative intermediate steps and more precise latent token prediction.

preprint2023arXiv

Young Labeled Faces in the Wild (YLFW): A Dataset for Children Faces Recognition

Face recognition has achieved outstanding performance in the last decade with the development of deep learning techniques. Nowadays, the challenges in face recognition are related to specific scenarios, for instance, the performance under diverse image quality, the robustness for aging and edge cases of person age (children and elders), distinguishing of related identities. In this set of problems, recognizing children's faces is one of the most sensitive and important. One of the reasons for this problem is the existing bias towards adults in existing face datasets. In this work, we present a benchmark dataset for children's face recognition, which is compiled similarly to the famous face recognition benchmarks LFW, CALFW, CPLFW, XQLFW and AgeDB. We also present a development dataset (separated into train and test parts) for adapting face recognition models for face images of children. The proposed data is balanced for African, Asian, Caucasian, and Indian races. To the best of our knowledge, this is the first standartized data tool set for benchmarking and the largest collection for development for children's face recognition. Several face recognition experiments are presented to demonstrate the performance of the proposed data tool set.

preprint2022arXiv

MorDeephy: Face Morphing Detection Via Fused Classification

Face morphing attack detection (MAD) is one of the most challenging tasks in the field of face recognition nowadays. In this work, we introduce a novel deep learning strategy for a single image face morphing detection, which implies the discrimination of morphed face images along with a sophisticated face recognition task in a complex classification scheme. It is directed onto learning the deep facial features, which carry information about the authenticity of these features. Our work also introduces several additional contributions: the public and easy-to-use face morphing detection benchmark and the results of our wild datasets filtering strategy. Our method, which we call MorDeephy, achieved the state of the art performance and demonstrated a prominent ability for generalising the task of morphing detection to unseen scenarios.

preprint2022arXiv

Reducing Overconfidence Predictions for Autonomous Driving Perception

In state-of-the-art deep learning for object recognition, SoftMax and Sigmoid functions are most commonly employed as the predictor outputs. Such layers often produce overconfident predictions rather than proper probabilistic scores, which can thus harm the decision-making of `critical' perception systems applied in autonomous driving and robotics. Given this, the experiments in this work propose a probabilistic approach based on distributions calculated out of the Logit layer scores of pre-trained networks. We demonstrate that Maximum Likelihood (ML) and Maximum a-Posteriori (MAP) functions are more suitable for probabilistic interpretations than SoftMax and Sigmoid-based predictions for object recognition. We explore distinct sensor modalities via RGB images and LiDARs (RV: range-view) data from the KITTI and Lyft Level-5 datasets, where our approach shows promising performance compared to the usual SoftMax and Sigmoid layers, with the benefit of enabling interpretable probabilistic predictions. Another advantage of the approach introduced in this paper is that the ML and MAP functions can be implemented in existing trained networks, that is, the approach benefits from the output of the Logit layer of pre-trained networks. Thus, there is no need to carry out a new training phase since the ML and MAP functions are used in the test/prediction phase.

preprint2022arXiv

Stepwise Migration of a Monolith to a Microservices Architecture: Performance and Migration Effort Evaluation

The agility inherent to today's business promotes the definition of software architectures where the business entities are decoupled into modules and/or services. However, there are advantages in having a rich domain model, where domain entities are tightly connected, because it fosters an initial quick development. On the other hand, the split of the business logic into modules and/or services, its encapsulation through well-defined interfaces and the introduction of inter-service communication introduces a cost in terms of performance. In this paper we analyze the stepwise migrating of a monolith, using a rich domain object, into a microservice architecture, where a modular monolith architecture is used as an intermediate step. The impact on the migration effort and on performance is measured for both steps. Current state of the art analyses the migration of monolith systems to a microservices architecture, but we observed that migration effort and performance issues are already significant in the migration to a modular monolith. Therefore, a clear distinction is established for each one of the steps, which may inform software architects on the planning of the migration of monolith systems. In particular, the trade-offs of doing all the migration process or just migrating to a modular monolith.