Source author record

Fernando Pereira

Fernando Pereira appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Computation and Language, Machine Learning, eess.IV, Multimedia, Computer Vision, Artificial Intelligence, cmp-lg, math.OC, Neural and Evolutionary Computing, Programming Languages

Catalog footprint

What is connected

19 works

10 topics

4 close collaborators

Preprint · 2026 · arXiv

PACEvolve: Enabling Long-Horizon Progress-Aware Consistent Evolution

Large Language Models (LLMs) have emerged as powerful operators for evolutionary search, yet the design of efficient search scaffolds remains ad hoc. While promising, current LLM-in-the-loop systems lack a systematic approach to managing the evolutionary process. We identify three distinct failure modes: Context Pollution, where experiment history biases future candidate generation; Mode Collapse, where agents stagnate in local minima due to a poor exploration-exploitation balance; and Weak Collaboration, where rigid crossover strategies fail to leverage parallel search trajectories effectively. To address these challenges, we introduce Progress-Aware Consistent Evolution (PACEvolve), a framework designed to robustly govern the agent's context and search dynamics. PACEvolve combines hierarchical context management (HCM) with pruning to address context pollution; momentum-based backtracking (MBB) to escape local minima; and a self-adaptive sampling policy that unifies backtracking and crossover for dynamic search coordination (CE), allowing agents to balance internal refinement with cross-trajectory collaboration. We demonstrate that PACEvolve provides a systematic path to consistent, long-horizon self-improvement, achieving state-of-the-art results on LLM-SR and KernelBench while discovering solutions that surpass the record on Modded NanoGPT.

Preprint · 2026 · arXiv

The Efficiency Gap in Byte Modeling

Modern language models have historically relied on two dominant design choices: subword tokenization and autoregressive (AR) ordering. These design decisions bake in priors that dictate a model's learning. Recently, two alternative paradigms have challenged this: byte-level modeling, which bypasses static statistically-derived token vocabularies, and masked diffusion modeling (MDM), which conducts parallel, non-sequential generation. Their intersection represents a fully end-to-end modality-agnostic generative prototype; however, removing these structural priors incurs a significant computational cost. In this work, we investigate this cost through a compute-matched scaling study. Our results reveal that the performance penalty of byte modeling is not uniform; across scale, the scaling overhead of byte modeling is worse for MDM than for AR. We hypothesize that this disparity stems from context fragility: while AR's stable causal history allows models to naturally rediscover subword patterns, the MDM objective destroys the local contiguity required to efficiently resolve semantics from raw bytes. Our findings from controlled permutation experiments suggest that future modality-agnostic designs must incorporate alternative structural biases to maintain viable scaling trajectories in the byte regime.

Preprint · 2022 · arXiv

IT/IST/IPLeiria Response to the Call for Proposals on JPEG Pleno Point Cloud Coding

This document describes a deep learning-based point cloud geometry codec and a deep learning-based point cloud joint geometry and colour codec, submitted to the Call for Proposals on JPEG Pleno Point Cloud Coding issued in January 2022. The proposed codecs are based on recent developments in deep learning-based point cloud geometry coding and offer some of the key functionalities targeted by the Call for Proposals. The proposed geometry codec offers a compression efficiency that outperforms the MPEG G-PCC standard and outperforms or is competitive with the V-PCC Intra standard for the JPEG Call for Proposals test set; however, the same does not happen for the joint geometry and colour codec, due to a quality saturation effect that still needs to be overcome.

Preprint · 2021 · arXiv

A Point-to-Distribution Joint Geometry and Color Metric for Point Cloud Quality Assessment

Point clouds (PCs) are a powerful 3D visual representation paradigm for many emerging application domains, especially virtual and augmented reality and autonomous vehicles. However, the large amount of PC data required for highly immersive and realistic experiences makes efficient lossy PC coding solutions critical. Recently, two MPEG PC coding standards have been developed to address the relevant application requirements, and further developments are expected in the future. In this context, the assessment of PC quality, notably for decoded PCs, is critical and calls for the design of efficient objective PC quality metrics. In this paper, a novel point-to-distribution metric is proposed for PC quality assessment considering both geometry and texture. This new quality metric exploits the scale-invariance property of the Mahalanobis distance to first assess the geometry and color point-to-distribution distortions, which are then fused to obtain a joint geometry and color quality metric. The proposed quality metric significantly outperforms the best PC quality assessment metrics in the literature.
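A minimal sketch of the point-to-distribution idea, under stated assumptions: each degraded point is compared with a Gaussian (sample mean and covariance) fitted to its k nearest reference points, and the per-point Mahalanobis distances are simply averaged. The value of k, the averaging step and all function names are hypothetical choices for illustration, not the paper's exact procedure, and only geometry is handled (no color).

```python
import math


def mean(points):
    # Component-wise mean of a list of 3D points.
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]


def covariance(points, mu):
    # Sample covariance (divides by n - 1) of 3D points around their mean mu.
    n = len(points)
    return [[sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / (n - 1)
             for j in range(3)] for i in range(3)]


def inverse3(m):
    # Inverse of a 3x3 matrix via its adjugate; raises on (near-)singular input.
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    if abs(det) < 1e-12:
        raise ValueError("singular covariance: neighbours are (nearly) coplanar")
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]


def mahalanobis(point, neighbours):
    # Distance from `point` to the Gaussian fitted to `neighbours`.
    mu = mean(neighbours)
    inv = inverse3(covariance(neighbours, mu))
    diff = [point[k] - mu[k] for k in range(3)]
    q = sum(diff[r] * inv[r][s] * diff[s] for r in range(3) for s in range(3))
    return math.sqrt(max(q, 0.0))  # guard against tiny negative round-off


def k_nearest(point, cloud, k):
    # Brute-force k nearest neighbours by Euclidean distance.
    return sorted(cloud, key=lambda p: math.dist(p, point))[:k]


def point_to_distribution_score(degraded, reference, k=8):
    # Hypothetical pooling: mean point-to-distribution distance over the degraded cloud.
    return sum(mahalanobis(p, k_nearest(p, reference, k)) for p in degraded) / len(degraded)
```

This also makes the scale-invariance property concrete: scaling both clouds by a factor s multiplies each offset by s and each re-estimated covariance by s squared, so the two cancel in the quadratic form and the score is unchanged.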

Preprint · 2021 · arXiv

CapsField: Light Field-based Face and Expression Recognition in the Wild using Capsule Routing

Light field (LF) cameras provide rich spatio-angular visual representations by sensing the visual scene from multiple perspectives, and have recently emerged as a promising technology to boost the performance of human-machine systems such as biometrics and affective computing. Despite the significant success of LF representations for constrained facial image analysis, this technology has never been used for face and expression recognition in the wild. In this context, this paper proposes a new deep face and expression recognition solution, called CapsField, based on a convolutional neural network and an additional capsule network that uses dynamic routing to learn hierarchical relations between capsules. CapsField extracts spatial features from facial images and learns the angular part-whole relations for a selected set of 2D sub-aperture images rendered from each LF image. To analyze the performance of the proposed solution in the wild, the first in-the-wild LF face dataset has been captured and made available, along with a new complementary constrained face dataset of the same subjects, recorded earlier. A subset of the in-the-wild dataset contains facial images with different expressions, annotated for use in face expression recognition tests. An extensive performance assessment study using the new datasets has been conducted for the proposed and relevant prior solutions, showing that CapsField achieves superior performance for both face and expression recognition tasks when compared to the state of the art.

Preprint · 2021 · arXiv

Faithful Embeddings for Knowledge Base Queries

The deductive closure of an ideal knowledge base (KB) contains exactly the logical queries that the KB can answer. However, in practice KBs are both incomplete and over-specified, failing to answer some queries that have real-world answers. Query embedding (QE) techniques have recently been proposed, in which KB entities and KB queries are represented jointly in an embedding space, supporting relaxation and generalization in KB inference. However, experiments in this paper show that QE systems may disagree with deductive reasoning on answers that do not require generalization or relaxation. We address this problem with a novel QE method that is more faithful to deductive reasoning, and show that this leads to better performance on complex queries to incomplete KBs. Finally, we show that inserting this new QE module into a neural question-answering system leads to substantial improvements over the state of the art.

Preprint · 2020 · arXiv

A generalized Hausdorff distance based quality metric for point cloud geometry

Reliable quality assessment of decoded point cloud geometry is essential to evaluate the compression performance of emerging point cloud coding solutions and to guarantee a target quality of experience. This paper proposes a novel point cloud geometry quality assessment metric based on a generalization of the Hausdorff distance. To achieve this goal, the so-called generalized Hausdorff distance is exploited for multiple rankings to identify the best-performing quality metric in terms of correlation with the mean opinion scores (MOS) obtained from a subjective test campaign. The experimental results show that the quality metric derived from the classical Hausdorff distance leads to low objective-subjective correlation and thus fails to accurately evaluate the quality of decoded point clouds for emerging codecs. However, the quality metric derived from the generalized Hausdorff distance with an appropriately selected ranking outperforms the MPEG-adopted geometry quality metrics when decoded point clouds with different types of coding distortions are considered.
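To make the classical-versus-generalized contrast concrete, here is a small sketch assuming that the generalized (ranked) Hausdorff distance replaces the maximum over nearest-neighbour distances with the k-th largest one, in the spirit of the partial Hausdorff distance. The rank parameter and function names are illustrative, and the paper's exact definition and pooling may differ.

```python
import math


def nn_distances(a, b):
    # For each point in `a`, the Euclidean distance to its nearest point in `b`.
    return [min(math.dist(p, q) for q in b) for p in a]


def directed_ranked_hausdorff(a, b, rank=1):
    # rank=1 returns the largest nearest-neighbour distance (classical directed
    # Hausdorff); larger ranks skip the `rank - 1` worst points.
    d = sorted(nn_distances(a, b), reverse=True)
    if not 1 <= rank <= len(d):
        raise ValueError("rank must be between 1 and len(a)")
    return d[rank - 1]


def ranked_hausdorff(a, b, rank=1):
    # Symmetric version: the worse of the two directed distances.
    return max(directed_ranked_hausdorff(a, b, rank),
               directed_ranked_hausdorff(b, a, rank))
```

For a reference cloud [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)] and a decoded copy whose last point is moved to (3, 0, 5), ranked_hausdorff returns 5.0 at rank=1 but 0.0 at rank=2: the classical distance is dominated by a single outlier, while a suitably ranked variant is not.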

Preprint · 2020 · arXiv

Lenslet Light Field Image Coding: Classifying, Reviewing and Evaluating

In recent years, visual sensors have been improving quickly, notably targeting richer acquisition of the light present in a visual scene. In this context, so-called lenslet light field (LLF) cameras are able to go beyond conventional 2D visual acquisition models by enriching the visual representation with directional light measurements for each pixel position. LLF imaging is associated with large amounts of data, and thus critically demands efficient coding solutions so that applications involving transmission and storage can be deployed. For this reason, considerable research effort has been invested in recent years in developing increasingly efficient LLF image coding (LLFIC) solutions. In this context, the main objective of this paper is to review and evaluate some of the most relevant LLFIC solutions in the literature, guided by a novel classification taxonomy that helps organize this field. In this way, more solid conclusions can be drawn about the current LLFIC status quo, helping to drive future research and standardization developments in this technical area.

Preprint · 2020 · arXiv

Long Short-Term Memory with Gate and State Level Fusion for Light Field-Based Face Recognition

Long Short-Term Memory (LSTM) is a prominent recurrent neural network for extracting dependencies from sequential data such as time series and multi-view data, and has achieved impressive results for different visual recognition tasks. A conventional LSTM network learns a model that extracts information from a single input sequence. However, if two or more dependent sequences of data are acquired simultaneously, conventional LSTM networks can only process those sequences consecutively, without benefiting from the information carried by their mutual dependencies. In this context, this paper proposes two novel LSTM cell architectures that jointly learn from multiple simultaneously acquired sequences, aiming to create richer and more effective models for recognition tasks. The efficacy of the novel LSTM cell architectures is assessed by integrating them into deep learning-based methods for face recognition with multi-view light field images. The new cell architectures jointly learn the horizontal and vertical scene parallaxes available in a light field image, capturing richer spatio-angular information from both directions. A comprehensive evaluation with the IST-EURECOM LFFD dataset, using three challenging evaluation protocols, shows the advantage of the novel LSTM cell architectures for face recognition over state-of-the-art light field-based methods. These results highlight the added value of the novel cell architectures when learning from correlated input sequences.

Preprint · 2016 · arXiv

Multinomial Loss on Held-out Data for the Sparse Non-negative Matrix Language Model

We describe Sparse Non-negative Matrix (SNM) language model estimation using multinomial loss on held-out data. Being able to train on held-out data is important in practical situations where the training data is usually mismatched from the held-out/test data. It is also less constrained than the previous training algorithm using leave-one-out on training data: it allows the use of richer meta-features in the adjustment model, e.g. the diversity counts used by Kneser-Ney smoothing which would be difficult to deal with correctly in leave-one-out training. In experiments on the one billion words language modeling benchmark, we are able to slightly improve on our previous results which use a different loss function, and employ leave-one-out training on a subset of the main training set. Surprisingly, an adjustment model with meta-features that discard all lexical information can perform as well as lexicalized meta-features. We find that fairly small amounts of held-out data (on the order of 30-70 thousand words) are sufficient for training the adjustment model. In a real-life scenario where the training data is a mix of data sources that are imbalanced in size, and of different degrees of relevance to the held-out and test data, taking into account the data source for a given skip-/n-gram feature and combining them for best performance on held-out/test data improves over skip-/n-gram SNM models trained on pooled data by about 8% in the SMT setup, or as much as 15% in the ASR/IME setup. The ability to mix various data sources based on how relevant they are to a mismatched held-out set is probably the most attractive feature of the new estimation method for SNM LM.
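The diversity counts mentioned above are, in Kneser-Ney smoothing, the number of distinct words observed immediately before (or after) a given word. A minimal sketch of the left-diversity count, often written N1+(. w), computed from a tokenized corpus. It illustrates only this one meta-feature and is not the SNM estimation procedure; the function name is hypothetical.

```python
from collections import defaultdict


def left_diversity_counts(tokens):
    # Map each word w to the number of distinct words v seen immediately
    # before it, i.e. the Kneser-Ney continuation count N1+(. w).
    predecessors = defaultdict(set)
    for prev, word in zip(tokens, tokens[1:]):
        predecessors[word].add(prev)
    return {word: len(prevs) for word, prevs in predecessors.items()}
```

On the toy corpus "san francisco san francisco san francisco the cat a cat my cat", both "francisco" and "cat" occur three times, but "francisco" has a diversity count of 1 (only "san" precedes it) while "cat" has 3. Raw frequency alone cannot separate the two.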

Preprint · 2016 · arXiv

Optimal Lagrange Multipliers for Dependent Rate Allocation in Video Coding

In a typical video rate allocation problem, the objective is to optimally distribute a source rate budget among a set of (in)dependently coded data units to minimize the total distortion of all units. Conventional Lagrangian approaches convert the lone rate constraint to a linear rate penalty scaled by a multiplier in the objective, resulting in a simpler unconstrained formulation. However, the search for the "optimal" multiplier, one that results in a distortion-minimizing solution among all Lagrangian solutions that satisfy the original rate constraint, remains an elusive open problem in the general setting. To address this problem, we propose a computation-efficient search strategy to identify this optimal multiplier numerically. Specifically, we first formulate a general rate allocation problem where each data unit can be dependently coded at different quantization parameters (QP) using a previous unit as predictor, or left uncoded at the encoder and subsequently interpolated at the decoder using neighboring coded units. After converting the original rate-constrained problem to its unconstrained Lagrangian counterpart, we design an efficient dynamic programming (DP) algorithm that finds the optimal Lagrangian solution for a fixed multiplier. Finally, within the DP framework, we iteratively compute neighboring singular multiplier values, each resulting in multiple simultaneously optimal Lagrangian solutions, to drive the rates of the computed Lagrangian solutions towards the bit budget. We terminate when a singular multiplier value results in two Lagrangian solutions with rates below and above the bit budget. In extensive monoview and multiview video coding experiments, we show that our DP algorithm and selection of optimal multipliers on average outperform, in terms of Y-PSNR, comparable rate control solutions that do not skip frames, such as those used in video compression standards like HEVC.
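For orientation, the conventional Lagrangian approach that the abstract starts from can be sketched for the simpler case of independently coded units: for a fixed multiplier lam, each unit picks the (rate, distortion) option minimizing D + lam * R, and lam is bisected until the total rate fits the budget. This is a hedged illustration only; it does not model inter-unit dependencies, skipped frames, or the paper's DP and singular-multiplier search, and the option tables and function names are hypothetical.

```python
def choose(options, lam):
    # Pick the (rate, distortion) option minimizing the Lagrangian cost D + lam * R.
    return min(options, key=lambda rd: rd[1] + lam * rd[0])


def allocate(units, budget, iters=60):
    # units: one list of (rate, distortion) options per independently coded unit.
    # The minimizing rate never increases as lam grows, so bisection on lam works.
    def solve(lam):
        picks = [choose(opts, lam) for opts in units]
        return picks, sum(r for r, _ in picks)

    lo, hi = 0.0, 1.0
    while solve(hi)[1] > budget:      # grow hi until the budget is met
        hi *= 2.0
        if hi > 1e12:
            raise ValueError("budget is below the minimum achievable rate")
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if solve(mid)[1] > budget:
            lo = mid
        else:
            hi = mid
    return solve(hi)[0]               # feasible picks for the smallest lam found
```

With two toy units, [(0, 100), (2, 40), (4, 10)] and [(0, 50), (2, 20), (4, 5)], and a budget of 6, this returns [(4, 10), (2, 20)]. With a budget of 4, however, it settles on a total rate of 2 even though rate-4 allocations with lower distortion exist, a well-known limitation of Lagrangian relaxation over discrete options: it only reaches points on the convex hull of the rate-distortion trade-off.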

Preprint · 2016 · arXiv

Optimal Rendezvous Trajectory for Unmanned Aerial-Ground Vehicles

Fixed-wing unmanned aerial vehicles (UAVs) are essential for low-cost aerial surveillance and mapping applications in remote regions. One of the main limitations of UAVs is their limited fuel capacity, which requires periodic refueling to accomplish a mission. The usual mechanism of commanding the UAV to return to a stationary base station for refueling can result in fuel wastage and inefficient use of mission operation time. Alternatively, an unmanned ground vehicle (UGV) can be used as a mobile refueling unit with which the UAV rendezvouses for refueling. To perform this task accurately in the presence of wind disturbances, an optimal 3D trajectory must be determined that takes UAV and UGV dynamics and kinematics into account. In this paper, we propose an optimal control formulation to generate a tunable UAV trajectory for rendezvous with a moving UGV, taking wind disturbances into account. By a suitable choice of the value of an aggressiveness index in our problem setting, we are able to control the UAV rendezvous behavior. Several numerical results are presented to show the reliability and effectiveness of our approach.

Preprint · 2015 · arXiv

Optimal Layered Representation for Adaptive Interactive Multiview Video Streaming

We consider an interactive multiview video streaming (IMVS) system where clients select their preferred viewpoint in a given navigation window. To provide high-quality IMVS, many high-quality views should be transmitted to the clients. However, this is not always possible due to the limited and heterogeneous capabilities of the clients. In this paper, we propose a novel adaptive IMVS solution based on a layered multiview representation in which camera views are organized into layered subsets to match the different clients' constraints. We formulate an optimization problem for the joint selection of the view subsets and their encoding rates. We then propose an optimal algorithm and a greedy algorithm of reduced computational complexity, both based on dynamic programming. Simulation results show the good performance of our novel algorithms compared to a baseline algorithm, showing that an effective adaptive IMVS solution should consider the scene content, the client capabilities, and the clients' navigation preferences.

Preprint · 2014 · arXiv

Controlling Complexity in Part-of-Speech Induction

We consider the problem of fully unsupervised learning of grammatical (part-of-speech) categories from unlabeled text. The standard maximum-likelihood hidden Markov model for this task performs poorly, because of its weak inductive bias and large model capacity. We address this problem by refining the model and modifying the learning objective to control its capacity via parametric and non-parametric constraints. Our approach enforces word-category association sparsity, adds morphological and orthographic features, and eliminates hard-to-estimate parameters for rare words. We develop an efficient learning algorithm that is not much more computationally intensive than standard training. We also provide an open-source implementation of the algorithm. Our experiments on five diverse languages (Bulgarian, Danish, English, Portuguese, Spanish) achieve significant improvements compared with previous methods for the same task.

Preprint · 2014 · arXiv

Parameterized Construction of Program Representations for Sparse Dataflow Analyses

Data-flow analyses usually associate information with control flow regions. Informally, if these regions are too small, like a point between two consecutive statements, we call the analysis dense. On the other hand, if these regions include many such points, then we call it sparse. This paper presents a systematic method to build program representations that support sparse analyses. To pave the way to this framework, we clarify the literature on well-known intermediate program representations. We show that our approach, up to parameter choice, subsumes many of these representations, such as the SSA, SSI and e-SSA forms. In particular, our algorithms are faster, simpler and more frugal than the previous techniques used to construct SSI (Static Single Information) form programs. We produce intermediate representations isomorphic to Choi et al.'s Sparse Evaluation Graphs (SEG) for the family of data-flow problems that can be partitioned per variable. However, contrary to SEGs, we can also sparsely handle problems that are not in this family.

Preprint · 2013 · arXiv

Large Scale Distributed Acoustic Modeling With Back-off N-grams

The paper revives an older approach to acoustic modeling that borrows from n-gram language modeling in an attempt to scale up both the amount of training data and model size (as measured by the number of parameters in the model), to approximately 100 times larger than current sizes used in automatic speech recognition. In such a data-rich setting, we can expand the phonetic context significantly beyond triphones, as well as increase the number of Gaussian mixture components for the context-dependent states that allow it. We have experimented with contexts that span seven or more context-independent phones, and up to 620 mixture components per state. Unseen phonetic contexts are handled with the familiar back-off technique used in language modeling, chosen for its implementation simplicity. The back-off acoustic model is estimated, stored and served using MapReduce distributed computing infrastructure. Speech recognition experiments are carried out in an N-best list rescoring framework for Google Voice Search. Training big models on large amounts of data proves to be an effective way to increase the accuracy of a state-of-the-art automatic speech recognition system. We use 87,000 hours of training data (speech along with transcription) obtained by filtering utterances in Voice Search logs on automatic speech recognition confidence. Models ranging in size from 20 to 40 million Gaussians are estimated using maximum likelihood training. They achieve relative reductions in word error rate of 11% and 6% when combined with first-pass models trained using maximum likelihood and boosted maximum mutual information, respectively. Increasing the context size beyond five phones (quinphones) does not help.
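The back-off step for unseen phonetic contexts can be pictured as a lookup that shrinks the context until a trained model is found. This sketch assumes contexts are stored as (left phones, centre phone, right phones) tuples and that back-off drops the outermost phone from the longer side (the left side on ties); the paper's actual back-off order and the model table used here are assumptions for illustration.

```python
def back_off_lookup(models, left, centre, right):
    # models maps (left_context, centre, right_context) tuples to a trained model.
    # `left` lists phones from farthest to nearest; `right` from nearest to farthest.
    left, right = tuple(left), tuple(right)
    while True:
        key = (left, centre, right)
        if key in models:
            return key, models[key]
        if not left and not right:
            raise KeyError(f"no model for centre phone {centre!r}")
        # Drop the outermost phone of the longer side (left side on ties).
        if len(left) >= len(right):
            left = left[1:]
        else:
            right = right[:-1]
```

For example, with models only for the triphone k-ae+t and the context-independent ae, a request for the quinphone s-k-ae+t-s backs off first to k-ae+t-s and then to k-ae+t, while p-ae+d falls all the way back to ae.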

Preprint · 2012 · arXiv

A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance

The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit CRFs, a finite-state conditional random field model for edit sequences between strings. Conditional random fields have advantages over generative approaches to this problem, such as pair HMMs or the work of Ristad and Yianilos, because as conditionally-trained methods, they enable the use of complex, arbitrary actions and features of the input strings. As in generative models, the training data does not have to specify the edit sequences between the given string pairs. Unlike generative models, however, our model is trained on both positive and negative instances of string pairs. We present positive experimental results on several data sets.
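The finite-state edit model that the CRF makes discriminative is built on the familiar dynamic program over insert, delete and substitute operations. The sketch below uses fixed, hand-set costs and takes the minimum over alignments; in a discriminatively trained model such as the one described, the per-operation scores would instead come from learned, feature-based potentials and alignments would typically be summed over, so this shows only the underlying edit lattice, not the paper's method.

```python
def edit_distance(a, b, ins=1.0, dele=1.0, sub=1.0):
    # d[i][j] holds the cheapest cost of editing a[:i] into b[:j].
    n, m = len(a), len(b)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * dele
    for j in range(1, m + 1):
        d[0][j] = j * ins
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(d[i - 1][j] + dele,       # delete a[i-1]
                          d[i][j - 1] + ins,        # insert b[j-1]
                          d[i - 1][j - 1] + match)  # copy or substitute
    return d[n][m]
```

With unit costs, edit_distance("kitten", "sitting") is 3.0; raising sub to 2.0 makes a substitution cost the same as a deletion plus an insertion.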

Preprint · 2012 · arXiv

Case-Factor Diagrams for Structured Probabilistic Modeling

We introduce a probabilistic formalism subsuming Markov random fields of bounded tree width and probabilistic context-free grammars. Our models are based on a representation of Boolean formulas that we call case-factor diagrams (CFDs). CFDs are similar to binary decision diagrams (BDDs) but are concise for circuits of bounded tree width (unlike BDDs) and can concisely represent the set of parse trees over a given string under a given context-free grammar (also unlike BDDs). A probabilistic model consists of a CFD defining a feasible set of Boolean assignments and a weight (or cost) for each individual Boolean variable. We give an inside-outside algorithm for simultaneously computing the marginal of each Boolean variable, and a Viterbi algorithm for finding the minimum-cost variable assignment. Both algorithms run in time proportional to the size of the CFD.

Preprint · 1995 · arXiv

Quantifiers, Anaphora, and Intensionality

The relationship between Lexical-Functional Grammar (LFG) functional structures (f-structures) for sentences and their semantic interpretations can be expressed directly in a fragment of linear logic in a way that correctly explains the constrained interactions between quantifier scope ambiguity, bound anaphora and intensionality. This deductive approach to semantic interpretation obviates the need for additional mechanisms, such as Cooper storage, to represent the possible scopes of a quantified NP, and explains the interactions between quantified NPs, anaphora and intensional verbs such as 'seek'. A single specification in linear logic of the argument requirements of intensional verbs is sufficient to derive the correct reading predictions for intensional-verb clauses both with nonquantified and with quantified direct objects. In particular, both de dicto and de re readings are derived for quantified objects. The effects of type-raising or quantifying-in rules in other frameworks simply follow here as linear-logic theorems. While our approach resembles current categorial approaches in important ways, it differs from them in allowing the greater type flexibility of categorial semantics while maintaining a precise connection to syntax. As a result, we are able to provide derivations for certain readings of sentences with intensional verbs and complex direct objects that are not derivable in current purely categorial accounts of the syntax-semantics interface.
