Researcher profile

Fernando Pereira

Fernando Pereira contributes to research discovery and scholarly infrastructure.

Researcher. Affiliation not imported. Open to collaborate.

Trust snapshot

Quick read

Trust 21 (Emerging), Verification L1, Unclaimed author
9 works
0 followers
6 topics
4 close collaborators

Published work

9 published items

Preprint, 2026, arXiv

PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution

Large Language Models (LLMs) have emerged as powerful operators for evolutionary search, yet the design of efficient search scaffolds remains ad hoc. While promising, current LLM-in-the-loop systems lack a systematic approach to managing the evolutionary process. We identify three distinct failure modes: Context Pollution, where experiment history biases future candidate generation; Mode Collapse, where agents stagnate in local minima due to poor exploration-exploitation balance; and Weak Collaboration, where rigid crossover strategies fail to leverage parallel search trajectories effectively. To address these challenges, we introduce Progress-Aware Consistent Evolution (PACEvolve), a framework designed to robustly govern the agent's context and search dynamics. PACEvolve combines hierarchical context management (HCM) with pruning to address context pollution; momentum-based backtracking (MBB) to escape local minima; and a self-adaptive sampling policy that unifies backtracking and crossover for dynamic search coordination (CE), allowing agents to balance internal refinement with cross-trajectory collaboration. We demonstrate that PACEvolve provides a systematic path to consistent, long-horizon self-improvement, achieving state-of-the-art results on LLM-SR and KernelBench, while discovering solutions surpassing the record on Modded NanoGPT.

Preprint, 2026, arXiv

The Efficiency Gap in Byte Modeling

Modern language models have historically relied on two dominant design choices: subword tokenization and autoregressive (AR) ordering. These design decisions bake in priors that dictate a model's learning. Recently, two alternative paradigms have challenged this: byte-level modeling, which bypasses static statistically-derived token vocabularies, and masked diffusion modeling (MDM), which conducts parallel, non-sequential generation. Their intersection represents a fully end-to-end modality-agnostic generative prototype; however, removing these structural priors incurs a significant computational cost. In this work, we investigate this cost through a compute-matched scaling study. Our results reveal that the performance penalty of byte modeling is not uniform; across scale, the scaling overhead of byte modeling is worse for MDM than for AR. We hypothesize that this disparity stems from context fragility: while AR's stable causal history allows models to naturally rediscover subword patterns, the MDM objective destroys the local contiguity required to efficiently resolve semantics from raw bytes. Our findings from controlled permutation experiments suggest that future modality-agnostic designs must incorporate alternative structural biases to maintain viable scaling trajectories in the byte regime.

Preprint, 2022, arXiv

IT/IST/IPLeiria Response to the Call for Proposals on JPEG Pleno Point Cloud Coding

This document describes a deep learning-based point cloud geometry codec and a deep learning-based point cloud joint geometry and colour codec, submitted to the Call for Proposals on JPEG Pleno Point Cloud Coding issued in January 2022. The proposed codecs are based on recent developments in deep learning-based PC geometry coding and offer some of the key functionalities targeted by the Call for Proposals. The proposed geometry codec offers a compression efficiency that outperforms the MPEG G-PCC standard and outperforms or is competitive with the V-PCC Intra standard for the JPEG Call for Proposals test set; however, the same does not happen for the joint geometry and colour codec due to a quality saturation effect that needs to be overcome.

Preprint, 2021, arXiv

A Point-to-Distribution Joint Geometry and Color Metric for Point Cloud Quality Assessment

Point clouds (PCs) are a powerful 3D visual representation paradigm for many emerging application domains, especially virtual and augmented reality, and autonomous vehicles. However, the large amount of PC data required for highly immersive and realistic experiences makes efficient, lossy PC coding solutions critical. Recently, two MPEG PC coding standards have been developed to address the relevant application requirements, and further developments are expected in the future. In this context, the assessment of PC quality, notably for decoded PCs, is critical and calls for the design of efficient objective PC quality metrics. In this paper, a novel point-to-distribution metric is proposed for PC quality assessment considering both the geometry and texture. This new quality metric exploits the scale-invariance property of the Mahalanobis distance to first assess the geometry and color point-to-distribution distortions, which are then fused to obtain a joint geometry and color quality metric. The proposed quality metric significantly outperforms the best PC quality assessment metrics in the literature.

Preprint, 2021, arXiv

CapsField: Light Field-based Face and Expression Recognition in the Wild using Capsule Routing

Light field (LF) cameras provide rich spatio-angular visual representations by sensing the visual scene from multiple perspectives and have recently emerged as a promising technology to boost the performance of human-machine systems such as biometrics and affective computing. Despite the significant success of LF representation for constrained facial image analysis, this technology has never been used for face and expression recognition in the wild. In this context, this paper proposes a new deep face and expression recognition solution, called CapsField, based on a convolutional neural network and an additional capsule network that utilizes dynamic routing to learn hierarchical relations between capsules. CapsField extracts the spatial features from facial images and learns the angular part-whole relations for a selected set of 2D sub-aperture images rendered from each LF image. To analyze the performance of the proposed solution in the wild, the first in-the-wild LF face dataset has been captured and made available, along with a new complementary constrained face dataset captured from the same subjects recorded earlier. A subset of the in-the-wild dataset contains facial images with different expressions, annotated for use in face expression recognition tests. An extensive performance assessment study using the new datasets has been conducted for the proposed and relevant prior solutions, showing that the proposed CapsField solution achieves superior performance for both face and expression recognition tasks when compared to the state-of-the-art.

Preprint, 2021, arXiv

Faithful Embeddings for Knowledge Base Queries

The deductive closure of an ideal knowledge base (KB) contains exactly the logical queries that the KB can answer. However, in practice KBs are both incomplete and over-specified, failing to answer some queries that have real-world answers. Query embedding (QE) techniques have been recently proposed where KB entities and KB queries are represented jointly in an embedding space, supporting relaxation and generalization in KB inference. However, experiments in this paper show that QE systems may disagree with deductive reasoning on answers that do not require generalization or relaxation. We address this problem with a novel QE method that is more faithful to deductive reasoning, and show that this leads to better performance on complex queries to incomplete KBs. Finally, we show that inserting this new QE module into a neural question-answering system leads to substantial improvements over the state-of-the-art.

Preprint, 2020, arXiv

A generalized Hausdorff distance based quality metric for point cloud geometry

Reliable quality assessment of decoded point cloud geometry is essential to evaluate the compression performance of emerging point cloud coding solutions and guarantee some target quality of experience. This paper proposes a novel point cloud geometry quality assessment metric based on a generalization of the Hausdorff distance. To achieve this goal, the so-called generalized Hausdorff distance for multiple rankings is exploited to identify the best performing quality metric in terms of correlation with the mean opinion scores (MOS) obtained from a subjective test campaign. The experimental results show that the quality metric derived from the classical Hausdorff distance leads to low objective-subjective correlation and, thus, fails to accurately evaluate the quality of decoded point clouds for emerging codecs. However, the quality metric derived from the generalized Hausdorff distance with an appropriately selected ranking outperforms the MPEG adopted geometry quality metrics when decoded point clouds with different types of coding distortions are considered.

Preprint, 2020, arXiv

Lenslet Light Field Image Coding: Classifying, Reviewing and Evaluating

In recent years, visual sensors have been quickly improving, notably targeting richer acquisitions of the light present in a visual scene. In this context, the so-called lenslet light field (LLF) cameras are able to go beyond the conventional 2D visual acquisition models, by enriching the visual representation with directional light measures for each pixel position. LLF imaging is associated with large amounts of data, thus critically demanding efficient coding solutions so that applications involving transmission and storage may be deployed. For this reason, considerable research efforts have been invested in recent years in developing increasingly efficient LLF imaging coding (LLFIC) solutions. In this context, the main objective of this paper is to review and evaluate some of the most relevant LLFIC solutions in the literature, guided by a novel classification taxonomy, which helps to better organize this field. In this way, more solid conclusions can be drawn about the current LLFIC status quo, thus better guiding future research and standardization developments in this technical area.

Preprint, 2020, arXiv

Long Short-Term Memory with Gate and State Level Fusion for Light Field-Based Face Recognition

Long Short-Term Memory (LSTM) is a prominent recurrent neural network for extracting dependencies from sequential data such as time-series and multi-view data, having achieved impressive results for different visual recognition tasks. A conventional LSTM network can learn a model to later extract information from one input sequence. However, if two or more dependent sequences of data are simultaneously acquired, the conventional LSTM networks may only process those sequences consecutively, without benefiting from the information carried by their mutual dependencies. In this context, this paper proposes two novel LSTM cell architectures that are able to jointly learn from multiple sequences simultaneously acquired, aiming to create richer and more effective models for recognition tasks. The efficacy of the novel LSTM cell architectures is assessed by integrating them into deep learning-based methods for face recognition with multi-view, light field images. The new cell architectures jointly learn the scene horizontal and vertical parallaxes available in a light field image, to capture richer spatio-angular information from both directions. A comprehensive evaluation, with the IST-EURECOM LFFD dataset using three challenging evaluation protocols, shows the advantage of using the novel LSTM cell architectures for face recognition over the state-of-the-art light field-based methods. These results highlight the added value of the novel cell architectures when learning from correlated input sequences.