Researcher profile

Eduardo García-Portugués

Eduardo García-Portugués contributes to research discovery and scholarly infrastructure.

Published work

17 published items

preprint, 2020, arXiv

A generative angular model of protein structure evolution

Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modelled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modelling both "smooth" conformational changes and "catastrophic" conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence-structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof.

preprint, 2020, arXiv

A goodness-of-fit test for the functional linear model with scalar response

In this work, a goodness-of-fit test for the null hypothesis of a functional linear model with scalar response is proposed. The test is based on a generalization to the functional framework of a previous one, designed for the goodness-of-fit of regression models with multivariate covariates using random projections. The test statistic is easy to compute using geometrical and matrix arguments, and simple to calibrate in its distribution by a wild bootstrap on the residuals. The finite sample properties of the test are illustrated by a simulation study for several types of basis and under different alternatives. Finally, the test is applied to two datasets for checking the assumption of the functional linear model and a graphical tool is introduced. Supplementary materials are available online.

preprint, 2020, arXiv

A test for directional-linear independence, with applications to wildfire orientation and size

The relation between wildfire orientation and size is analyzed by means of a nonparametric test for directional-linear independence. The test statistic is designed for assessing the independence between two random variables of different nature, specifically directional (fire orientation, circular or spherical, as particular cases) and linear (fire size measured as burnt area, scalar), based on a directional-linear nonparametric kernel density estimator. In order to apply the proposed methodology in practice, a resampling procedure based on permutations and bootstrap is provided. The finite sample performance of the test is assessed by a simulation study, comparing its behavior with other classical tests for the circular-linear case. Finally, the test is applied to analyze wildfire data from Portugal.
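
The permutation step of such an independence test can be sketched as follows. This is a minimal illustration, not the paper's procedure: the kernel-density-based statistic is replaced here by the classical squared circular-linear correlation, and the function names (`circ_lin_r2`, `permutation_pvalue`) and the simulated data are assumptions made for the example.

```python
import math
import random


def pearson(a, b):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


def circ_lin_r2(theta, x):
    """Squared circular-linear correlation (stand-in test statistic)."""
    c = [math.cos(t) for t in theta]
    s = [math.sin(t) for t in theta]
    rxc, rxs, rcs = pearson(x, c), pearson(x, s), pearson(c, s)
    return (rxc**2 + rxs**2 - 2.0 * rxc * rxs * rcs) / (1.0 - rcs**2)


def permutation_pvalue(theta, x, n_perm=199, seed=0):
    """Permuting x breaks any dependence on theta but keeps both marginals."""
    rng = random.Random(seed)
    observed = circ_lin_r2(theta, x)
    x_perm = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(x_perm)
        if circ_lin_r2(theta, x_perm) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical data: orientation (radians) and a size that depends on it.
    theta = [rng.uniform(-math.pi, math.pi) for _ in range(200)]
    size = [3.0 * math.cos(t) + rng.gauss(0.0, 0.5) for t in theta]
    print("permutation p-value:", permutation_pvalue(theta, size))
```

A small p-value indicates that the observed association is unlikely under independence; any other statistic can be plugged in without changing the resampling loop.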

preprint, 2020, arXiv

Bootstrap independence test for functional linear models

Functional data have been the subject of many research works over the last years. Functional regression is one of the most discussed issues. Specifically, significant advances have been made for functional linear regression models with scalar response. Let $(\mathcal{H},\langle\cdot,\cdot\rangle)$ be a separable Hilbert space. We focus on the model $Y=\langle\Theta,X\rangle+b+\varepsilon$, where $Y$ and $\varepsilon$ are real random variables, $X$ is an $\mathcal{H}$-valued random element, and the model parameters $b$ and $\Theta$ are in $\mathbb{R}$ and $\mathcal{H}$, respectively. Furthermore, the error satisfies that $E(\varepsilon|X)=0$ and $E(\varepsilon^2|X)=\sigma^2<\infty$. A consistent bootstrap method to calibrate the distribution of statistics for testing $H_0: \Theta=0$ versus $H_1: \Theta\neq 0$ is developed. The asymptotic theory, as well as a simulation study and a real data application illustrating the usefulness of our proposed bootstrap in practice, is presented.

preprint, 2020, arXiv

Central limit theorems for directional and linear random variables with applications

A central limit theorem for the integrated squared error of the directional-linear kernel density estimator is established. The result enables the construction and analysis of two testing procedures based on squared loss: a nonparametric independence test for directional and linear random variables and a goodness-of-fit test for parametric families of directional-linear densities. Limit distributions for both test statistics, and a consistent bootstrap strategy for the goodness-of-fit test, are developed for the directional-linear case and adapted to the directional-directional setting. Finite sample performance for the goodness-of-fit test is illustrated in a simulation study. This test is also applied to datasets from biology and environmental sciences.

preprint, 2020, arXiv

Discounted optimal stopping of a Brownian bridge, with application to American options under pinning

Mathematically, the execution of an American-style financial derivative is commonly reduced to solving an optimal stopping problem. Breaking the general assumption that the knowledge of the holder is restricted to the price history of the underlying asset, we allow for the disclosure of future information about the terminal price of the asset by modeling it as a Brownian bridge. This model may be used under special market conditions; in particular, we focus on what in the literature is known as the "pinning effect", that is, when the price of the asset approaches the strike price of a highly-traded option close to its expiration date. Our main mathematical contribution is in characterizing the solution to the optimal stopping problem when the gain function includes the discount factor. We show how to numerically compute the solution and we analyze the effect of the volatility estimation on the strategy by computing the confidence curves around the optimal stopping boundary. Finally, we compare our method with the optimal exercise time based on a geometric Brownian motion by using real data exhibiting pinning.
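
To make the pinning idea concrete, the sketch below simulates a Brownian bridge, that is, a Brownian path conditioned to hit a known terminal value. It only illustrates the price model; it does not compute the optimal stopping boundary studied in the paper. The function name, the parameter values and the use of exact Gaussian conditional steps are assumptions of this example.

```python
import math
import random


def simulate_brownian_bridge(x0, pin, horizon, n_steps, sigma=1.0, seed=0):
    """Sample a path on a regular grid that ends at `pin` at time `horizon`."""
    rng = random.Random(seed)
    dt = horizon / n_steps
    path = [x0]
    for k in range(n_steps):
        remaining = n_steps - k  # grid steps left until the pinning time
        x = path[-1]
        # Conditional law of the next value given the current one and the pin:
        # the mean moves linearly toward the pin, the variance shrinks to zero.
        mean = x + (pin - x) / remaining
        var = sigma**2 * dt * (remaining - 1) / remaining
        path.append(mean + math.sqrt(var) * rng.gauss(0.0, 1.0))
    return path


if __name__ == "__main__":
    # Hypothetical values: asset at 98 pinned to a strike of 100 at expiry.
    path = simulate_brownian_bridge(x0=98.0, pin=100.0, horizon=1.0,
                                    n_steps=250, sigma=2.0)
    print("first:", path[0], "last:", path[-1])
```

Because the conditional variance vanishes at the last step, every simulated path ends at the pinned value up to rounding error.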

preprint, 2020, arXiv

Distance weighted discrimination of face images for gender classification

We illustrate the advantages of distance weighted discrimination for classification and feature extraction in a High Dimension Low Sample Size (HDLSS) situation. The HDLSS context is a gender classification problem of face images in which the dimension of the data is several orders of magnitude larger than the sample size. We compare distance weighted discrimination with Fisher's linear discriminant, support vector machines, and principal component analysis by exploring their classification interpretation through insightful visuanimations and by examining the classifiers' discriminant errors. This analysis enables us to make new contributions to the understanding of the drivers of human discrimination between males and females.

preprint, 2020, arXiv

Exact risk improvement of bandwidth selectors for kernel density estimation with directional data

New bandwidth selectors for kernel density estimation with directional data are presented in this work. These selectors are based on asymptotic and exact error expressions for the kernel density estimator combined with mixtures of von Mises distributions. The performance of the proposed selectors is investigated in a simulation study and compared with other existing rules for a large variety of directional scenarios, sample sizes and dimensions. The selector based on the exact error expression turns out to have the best behaviour of the studied selectors for almost all the situations. This selector is illustrated with real data for the circular and spherical cases.

preprint, 2020, arXiv

Exploring wind direction and SO2 concentration by circular-linear density estimation

The study of environmental problems usually requires the description of variables with different nature and the assessment of relations between them. In this work, an algorithm for flexible estimation of the joint density for a circular-linear variable is proposed. The method is applied for exploring the relation between wind direction and SO2 concentration in a monitoring station close to a power plant located in Galicia (NW-Spain), in order to compare the effectiveness of precautionary measures for pollutants reduction in two different years.

preprint, 2020, arXiv

Goodness-of-fit tests for functional linear models based on integrated projections

Functional linear models are one of the most fundamental tools to assess the relation between two random variables of a functional or scalar nature. This contribution proposes a goodness-of-fit test for the functional linear model with functional response that neatly adapts to functional/scalar responses/predictors. In particular, the new goodness-of-fit test extends a previous proposal for scalar response. The test statistic is based on a convenient regularized estimator, is easy to compute, and is calibrated through an efficient bootstrap resampling. A graphical diagnostic tool, useful to visualize the deviations from the model, is introduced and illustrated with a novel data application. The R package goffda implements the proposed methods and allows for the reproducibility of the data application.

preprint, 2020, arXiv

Kernel density estimation for directional-linear data

A nonparametric kernel density estimator for directional-linear data is introduced. The proposal is based on a product kernel accounting for the different nature of both (directional and linear) components of the random vector. Expressions for bias, variance and Mean Integrated Squared Error (MISE) are derived, jointly with an asymptotic normality result for the proposed estimator. For some particular distributions, an explicit formula for the MISE is obtained and compared with its asymptotic version, both for directional and directional-linear kernel density estimators. In this same setting a closed expression for the bootstrap MISE is also derived.
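
A product-kernel estimator of this kind can be sketched for the simplest case, a circular angle paired with a scalar. The sketch assumes a von Mises kernel with concentration `kappa` for the angle and a Gaussian kernel with bandwidth `h` for the linear part; these kernel choices, the parametrization and all names are assumptions of the example rather than the paper's exact formulation.

```python
import math


def bessel_i0(kappa, terms=60):
    """Modified Bessel function I_0 via its power series (moderate kappa)."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (kappa / 2.0) ** 2 / ((k + 1) ** 2)
    return total


def circ_lin_kde(theta, x, sample, kappa, h):
    """Density estimate at (theta, x) from a sample of (angle, value) pairs."""
    vm_norm = 2.0 * math.pi * bessel_i0(kappa)
    gauss_norm = h * math.sqrt(2.0 * math.pi)
    total = 0.0
    for theta_i, x_i in sample:
        # Directional kernel: depends on the angle only through cos(theta - theta_i).
        k_dir = math.exp(kappa * math.cos(theta - theta_i)) / vm_norm
        # Linear kernel: ordinary Gaussian in the scalar component.
        k_lin = math.exp(-0.5 * ((x - x_i) / h) ** 2) / gauss_norm
        total += k_dir * k_lin
    return total / len(sample)


if __name__ == "__main__":
    # Hypothetical sample of (angle in radians, scalar) observations.
    sample = [(0.2, 0.5), (2.9, -0.3), (-1.4, 1.0)]
    print(circ_lin_kde(0.0, 0.0, sample, kappa=4.0, h=0.5))
```

Each factor is a proper density on its own domain, so the estimate integrates to one over the cylinder; larger `kappa` and smaller `h` both mean less smoothing.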

preprint, 2020, arXiv

Langevin diffusions on the torus: estimation and applications

We introduce stochastic models for continuous-time evolution of angles and develop their estimation. We focus on studying Langevin diffusions with stationary distributions equal to well-known distributions from directional statistics, since such diffusions can be regarded as toroidal analogues of the Ornstein-Uhlenbeck process. Their likelihood function is a product of transition densities with no analytical expression, but that can be calculated by solving the Fokker-Planck equation numerically through adequate schemes. We propose three approximate likelihoods that are computationally tractable: (i) a likelihood based on the stationary distribution; (ii) toroidal adaptations of the Euler and Shoji-Ozaki pseudo-likelihoods; (iii) a likelihood based on a specific approximation to the transition density of the wrapped normal process. A simulation study compares, in dimensions one and two, the approximate transition densities to the exact ones, and investigates the empirical performance of the approximate likelihoods. Finally, two diffusions are used to model the evolution of the backbone angles of the protein G (PDB identifier 1GB1) during a molecular dynamics simulation. The software package sdetorus implements the estimation methods and applications presented in the paper.
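
As a rough illustration of a diffusion of this type in dimension one, the sketch below simulates a Langevin diffusion on the circle whose stationary law is von Mises, using a plain Euler scheme with wrapping. It is not the sdetorus implementation and covers none of the approximate likelihoods; the drift form `alpha * sin(mu - theta)` with stationary concentration `2 * alpha / sigma**2`, the function names and the parameter values are assumptions of this example.

```python
import math
import random


def wrap_angle(theta):
    """Map an angle to the interval [-pi, pi)."""
    return (theta + math.pi) % (2.0 * math.pi) - math.pi


def simulate_vm_diffusion(theta0, alpha, mu, sigma, dt, n_steps, seed=0):
    """Euler scheme for d(theta) = alpha * sin(mu - theta) dt + sigma dW, wrapped."""
    rng = random.Random(seed)
    path = [wrap_angle(theta0)]
    for _ in range(n_steps):
        theta = path[-1]
        drift = alpha * math.sin(mu - theta)  # pulls the angle toward mu
        step = drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(wrap_angle(theta + step))
    return path


if __name__ == "__main__":
    path = simulate_vm_diffusion(theta0=-2.0, alpha=2.0, mu=1.0,
                                 sigma=1.0, dt=0.01, n_steps=20000)
    c = sum(math.cos(t) for t in path)
    s = sum(math.sin(t) for t in path)
    print("circular mean of the path:", math.atan2(s, c))
```

The drift plays the role that mean reversion plays in the Ornstein-Uhlenbeck process, which is the sense in which such diffusions are its toroidal analogues.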

preprint, 2020, arXiv

On a projection-based class of uniformity tests on the hypersphere

We propose a projection-based class of uniformity tests on the hypersphere using statistics that integrate, along all possible directions, the weighted quadratic discrepancy between the empirical cumulative distribution function of the projected data and the projected uniform distribution. Simple expressions for several test statistics are obtained for the circle and sphere, and relatively tractable forms for higher dimensions. Despite its different origin, the proposed class is shown to be related with the well-studied Sobolev class of uniformity tests. Our new class proves itself advantageous by allowing to derive new tests for hyperspherical data that neatly extend the circular tests by Watson, Ajne, and Rothman, and by introducing the first instance of an Anderson-Darling-like test for such data. The asymptotic distributions and the local optimality against certain alternatives of the new tests are obtained. A simulation study evaluates the theoretical findings and evidences that, for certain scenarios, the new tests are competitive against previous proposals. The new tests are employed in three astronomical applications.
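
For readers unfamiliar with uniformity testing, the sketch below implements the classical Rayleigh test on the circle, a standard member of the Sobolev class mentioned above. It is not one of the new projection-based statistics of the paper. The p-value relies on the large-sample chi-square approximation with two degrees of freedom, and the function name is an assumption of the example.

```python
import math


def rayleigh_test(angles):
    """Rayleigh statistic 2 * n * Rbar^2 and its asymptotic p-value."""
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    statistic = 2.0 * n * (c * c + s * s)
    # Survival function of a chi-square with 2 degrees of freedom.
    p_value = math.exp(-statistic / 2.0)
    return statistic, p_value


if __name__ == "__main__":
    spread = [2.0 * math.pi * k / 50 for k in range(50)]  # evenly spaced angles
    clustered = [0.1 * k / 50 for k in range(50)]          # tight cluster near 0
    print(rayleigh_test(spread))
    print(rayleigh_test(clustered))
```

The Rayleigh test only detects alternatives with a non-zero mean direction, which is one motivation for richer classes of tests such as those built from projections.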

preprint, 2020, arXiv

Recent advances in directional statistics

Mainstream statistical methodology is generally applicable to data observed in Euclidean space. There are, however, numerous contexts of considerable scientific interest in which the natural supports for the data under consideration are Riemannian manifolds like the unit circle, torus, sphere and their extensions. Typically, such data can be represented using one or more directions, and directional statistics is the branch of statistics that deals with their analysis. In this paper we provide a review of the many recent developments in the field since the publication of Mardia and Jupp (1999), still the most comprehensive text on directional statistics. Many of those developments have been stimulated by interesting applications in fields as diverse as astronomy, medicine, genetics, neurology, aeronautics, acoustics, image analysis, text mining, environmetrics, and machine learning. We begin by considering developments for the exploratory analysis of directional data before progressing to distributional models, general approaches to inference, hypothesis testing, regression, nonparametric curve estimation, methods for dimension reduction, classification and clustering, and the modelling of time series, spatial and spatio-temporal data. An overview of currently available software for analysing directional data is also provided, and potential future developments discussed.

preprint, 2020, arXiv

Smoothing-based tests with directional random variables

Testing procedures for assessing specific parametric model forms, or for checking the plausibility of simplifying assumptions, play a central role in the mathematical treatment of the uncertain. No certain answers are obtained by testing methods, but at least the uncertainty of these answers is properly quantified. This is the case for tests designed on the two most general data generating mechanisms in practice: distribution/density and regression models. Testing proposals are usually formulated on the Euclidean space, but important challenges arise in non-Euclidean settings, such as when directional variables (i.e., random vectors on the hypersphere) are involved. This work reviews some of the smoothing-based testing procedures for density and regression models that comprise directional variables. The asymptotic distributions of the revised proposals are presented, jointly with some numerical illustrations justifying the need of employing resampling mechanisms for effective test calibration.

preprint, 2020, arXiv

Testing parametric models in linear-directional regression

This paper presents a goodness-of-fit test for parametric regression models with scalar response and directional predictor, that is, a vector on a sphere of arbitrary dimension. The testing procedure is based on the weighted squared distance between a smooth and a parametric regression estimator, where the smooth regression estimator is obtained by a projected local approach. Asymptotic behavior of the test statistic under the null hypothesis and local alternatives is provided, jointly with a consistent bootstrap algorithm for application in practice. A simulation study illustrates the performance of the test in finite samples. The procedure is applied to test a linear model in text mining.

preprint, 2020, arXiv

Toroidal diffusions and protein structure evolution

This chapter shows how toroidal diffusions are convenient methodological tools for modelling protein evolution in a probabilistic framework. The chapter addresses the construction of ergodic diffusions with stationary distributions equal to well-known directional distributions, which can be regarded as toroidal analogues of the Ornstein-Uhlenbeck process. The important challenges that arise in the estimation of the diffusion parameters require the consideration of tractable approximate likelihoods and, among the several approaches introduced, the one yielding a specific approximation to the transition density of the wrapped normal process is shown to give the best empirical performance on average. This provides the methodological building block for Evolutionary Torus Dynamic Bayesian Network (ETDBN), a hidden Markov model for protein evolution that emits a wrapped normal process and two continuous-time Markov chains per hidden state. The chapter describes the main features of ETDBN, which allows for both "smooth" conformational changes and "catastrophic" conformational jumps, and several empirical benchmarks. The insights into the relationship between sequence and structure evolution that ETDBN provides are illustrated in a case study.