Source author record

Ioannis Ntzoufras

Ioannis Ntzoufras appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computation Methodology Applications stat.OT

Catalog footprint

What is connected

12 works

4 topics

4 close collaborators

preprint 2026 arXiv

Bayesian Handwriting Evidence Evaluation using MANOVA via Fourier-Based Extracted Features

This paper proposes a novel statistical approach that aims at the identification of valid and useful patterns in handwriting examination via Bayesian modeling. Starting from a sample of characters selected among 13 French native writers, an accurate loop reconstruction can be achieved through Fourier analysis. The contour shape of handwritten characters can be described by the first four pairs of Fourier coefficients and by the surface size. Six Bayesian models are considered for such handwritten features. These models arise from two likelihood structures: (a) a multivariate Normal model, and (b) a MANOVA model that accounts for character-level variability. For each likelihood, three different prior formulations are examined, resulting in distinct Bayesian models: (i) a conjugate Normal-Inverse-Wishart prior, (ii) a hierarchical Normal-Inverse-Wishart prior, and (iii) a Normal-LogNormal-LKJ prior specification. The hierarchical prior formulations are of primary interest because they can incorporate the between-writers variability, a distinguishing element that sets writers apart. These approaches do not allow calculation of the marginal likelihood in a closed-form expression. Therefore, bridge sampling is used to estimate it. The Bayes factor is estimated to compare the performance of the proposed models and to evaluate their efficiency for discriminating purposes. Bayesian MANOVA with Normal-LogNormal-LKJ prior showed an overall better performance, in terms of discriminatory capacity and model fitting. Finally, a sensitivity analysis for the elicitation of the prior distribution parameters is performed.

preprint 2023 arXiv

Assessing competitive balance in the English Premier League for over forty seasons using a stochastic block model

Competitive balance is the subject of much interest in the sports analytics literature and beyond. In this paper, we develop a statistical network model based on an extension of the stochastic block model to assess the balance between teams in a league. Here we represent the outcome of all matches in a football season as a dense network with nodes identified by teams and categorical edges representing the outcome of each game as a win, draw or a loss. The main focus and motivation for this paper is to provide a statistical framework to assess the issue of competitive balance in the context of the English First Division / Premier League over more than 40 seasons. The Premier League is arguably one of the most popular leagues in the world, in terms of its global reach and the revenue which it generates. Therefore, it is of wide interest to assess its competitiveness. Our analysis provides evidence suggesting a structural change around the early 2000s from a reasonably balanced league to a two-tier league.
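The network representation described in this abstract can be illustrated with a small sketch. It is only an illustration under stated assumptions: the match results, the team names A, B, C, the helper names `outcome` and `block_outcome_counts`, and the two-tier allocation are all hypothetical, and the code tabulates outcomes between blocks rather than fitting the paper's stochastic block model.

```python
from collections import Counter

# Hypothetical toy results: (home, away, home goals, away goals).
matches = [
    ("A", "B", 2, 0),
    ("B", "A", 1, 1),
    ("A", "C", 3, 1),
    ("C", "A", 0, 2),
    ("B", "C", 1, 0),
    ("C", "B", 2, 2),
]

def outcome(home_goals, away_goals):
    """Categorical edge label from the home team's point of view."""
    if home_goals > away_goals:
        return "win"
    if home_goals < away_goals:
        return "loss"
    return "draw"

# Directed edge (home, away) -> label. Every ordered pair of teams plays
# once, so every possible edge is present and the network is dense.
edges = {(h, a): outcome(hg, ag) for h, a, hg, ag in matches}

def block_outcome_counts(edges, block_of):
    """Count edge labels between each ordered pair of blocks."""
    counts = {}
    for (home, away), label in edges.items():
        key = (block_of[home], block_of[away])
        counts.setdefault(key, Counter())[label] += 1
    return counts

# Hypothetical two-tier allocation: A is "top", B and C are "rest".
block_of = {"A": "top", "B": "rest", "C": "rest"}
counts = block_outcome_counts(edges, block_of)
```

In a block model, the empirical outcome frequencies between blocks (here `counts`) are the quantities that block-level win, draw and loss probabilities would summarize; a strongly lopsided table between "top" and "rest" is what a two-tier league would look like.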

preprint 2022 arXiv

A Metropolized adaptive subspace algorithm for high-dimensional Bayesian variable selection

A simple and efficient adaptive Markov Chain Monte Carlo (MCMC) method, called the Metropolized Adaptive Subspace (MAdaSub) algorithm, is proposed for sampling from high-dimensional posterior model distributions in Bayesian variable selection. The MAdaSub algorithm is based on an independent Metropolis-Hastings sampler, where the individual proposal probabilities of the explanatory variables are updated after each iteration using a form of Bayesian adaptive learning, in a way that they finally converge to the respective covariates' posterior inclusion probabilities. We prove the ergodicity of the algorithm and present a parallel version of MAdaSub with an adaptation scheme for the proposal probabilities based on the combination of information from multiple chains. The effectiveness of the algorithm is demonstrated via various simulated and real data examples, including a high-dimensional problem with more than 20,000 covariates.

preprint 2022 arXiv

On the identifiability of Bayesian factor analytic models

A well-known identifiability issue in factor analytic models is the invariance with respect to orthogonal transformations. This problem burdens the inference under a Bayesian setup, where Markov chain Monte Carlo (MCMC) methods are used to generate samples from the posterior distribution. We introduce a post-processing scheme in order to deal with rotation, sign and permutation invariance of the MCMC sample. The exact version of the contributed algorithm requires solving $2^q$ assignment problems per (retained) MCMC iteration, where $q$ denotes the number of factors of the fitted model. For large numbers of factors, two approximate schemes based on simulated annealing are also discussed. We demonstrate that the proposed method leads to interpretable posterior distributions using synthetic and publicly available data from typical factor analytic models as well as mixtures of factor analyzers. An R package is available online on CRAN.
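The sign and permutation invariance mentioned in this abstract can be checked numerically on a toy example. The sketch below is not the paper's post-processing algorithm; it only verifies that the product of a loading matrix with its transpose, which is the part of the covariance that depends on the loadings, is unchanged under every signed column permutation. The matrix `Lam` and the helper names are hypothetical.

```python
from itertools import permutations, product

def gram(L):
    """Return L L^T for a p x q loading matrix given as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in L] for ri in L]

def signed_permutations(q):
    """Yield (perm, signs): all q! * 2^q signed column permutations."""
    for perm in permutations(range(q)):
        for signs in product((1, -1), repeat=q):
            yield perm, signs

def transform(L, perm, signs):
    """Reorder the columns of L by perm and flip their signs by signs."""
    return [[signs[k] * row[perm[k]] for k in range(len(perm))] for row in L]

# Hypothetical 4 x 2 loading matrix (integers keep the comparison exact).
Lam = [[2, 0], [1, 1], [0, 3], [-1, 2]]
target = gram(Lam)

variants = list(signed_permutations(2))
all_equal = all(gram(transform(Lam, p, s)) == target for p, s in variants)
```

For q = 2 there are 2! * 2^2 = 8 signed permutations, and all of them give the same `gram` matrix, so the data cannot distinguish between them. The 2^q sign configurations are what the exact scheme in the abstract enumerates, with an assignment problem handling the permutation for each one.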

preprint 2020 arXiv

A Bayesian Quest for Finding a Unified Model for Predicting Volleyball Games

Volleyball is a team sport with unique and specific characteristics. We introduce a new two-level hierarchical Bayesian model which accounts for these volleyball-specific characteristics. In the first level, we model the set outcome with a simple logistic regression model. Conditionally on the winner of the set, in the second level, we use a truncated negative binomial distribution for the points earned by the losing team. An additional Poisson-distributed inflation component is introduced to model the extra points played in the case that the two teams have a point difference of less than two points. The number of points of the winner within each set is deterministically specified by the winner of the set and the points of the inflation component. The team-specific abilities and the home effect are used as covariates on all layers of the model (set, point, and extra inflated points). The implementation of the proposed model on the Italian Superlega 2017/2018 data shows an exceptional reproducibility of the final league table and a satisfactory predictive ability.
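The layered structure described in this abstract can be made concrete with a forward simulation of a single set. This is a rough sketch, not the authors' model: the 25-point target, the parameter values `p_point` and `lam_extra`, all function names, and in particular the rule for combining extra points with regular points are assumptions made for illustration, and covariates enter only the set-winner layer here rather than all layers.

```python
import math
import random

TARGET = 25  # assumed points needed to win a regular (non tie-break) set

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def sample_negbin(rng, r, p):
    """Failures observed before the r-th success (success probability p)."""
    failures = successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

def sample_truncated_negbin(rng, r, p, upper):
    """Rejection sampling: redraw until the value is at most `upper`."""
    while True:
        z = sample_negbin(rng, r, p)
        if z <= upper:
            return z

def sample_poisson(rng, lam):
    """Knuth's multiplication method; adequate for small lam."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_set(rng, ability_home, ability_away, home_effect,
                 p_point=0.55, lam_extra=0.3):
    """Return (home points, away points) for one simulated set."""
    # Level 1: set winner from a logistic regression on the ability gap.
    p_home = logistic(home_effect + ability_home - ability_away)
    home_wins = rng.random() < p_home
    # Inflation component: extra points played beyond a regular set.
    extra = sample_poisson(rng, lam_extra)
    if extra == 0:
        # Level 2: loser's points, truncated so the winner leads by >= 2.
        loser = sample_truncated_negbin(rng, TARGET, p_point, TARGET - 2)
    else:
        # Assumed convention: the set went past 24-24 and ends 2 apart.
        loser = TARGET - 2 + extra
    # Winner's points follow deterministically from the extra points.
    winner = TARGET + extra
    return (winner, loser) if home_wins else (loser, winner)
```

A call such as `simulate_set(random.Random(1), 0.4, 0.1, 0.2)` returns one valid set score. Simulating every fixture of a season this way and aggregating the results is the kind of exercise that would be used to check how well a fitted model reproduces a league table.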

preprint 2014 arXiv

Limiting behavior of the Jeffreys Power-Expected-Posterior Bayes Factor in Gaussian Linear Models

Expected-posterior priors (EPPs) have proved to be extremely useful for testing hypotheses on the regression coefficients of normal linear models. One of the advantages of using EPPs is that impropriety of baseline priors causes no indeterminacy. However, in regression problems, they are based on one or more training samples, which could influence the resulting posterior distribution. The power-expected-posterior priors are minimally-informative priors that diminish the effect of training samples on the EPP approach, by combining ideas from the power-prior and unit-information-prior methodologies. In this paper we show the consistency of the Bayes factors when using the power-expected-posterior priors, with the independence Jeffreys (or reference) prior as a baseline, for normal linear models under very mild conditions on the design matrix.

preprint 2014 arXiv

Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models

In the context of the expected-posterior prior (EPP) approach to Bayesian variable selection in linear models, we combine ideas from power-prior and unit-information-prior methodologies to simultaneously produce a minimally-informative prior and diminish the effect of training samples. The result is that in practice our power-expected-posterior (PEP) methodology is sufficiently insensitive to the size n* of the training sample, due to PEP's unit-information construction, that one may take n* equal to the full-data sample size n and dispense with training samples altogether. In this paper we focus on Gaussian linear models and develop our method under two different baseline prior choices: the independence Jeffreys (or reference) prior, yielding the J-PEP posterior, and the Zellner g-prior, leading to Z-PEP. We find that, under the reference baseline prior, the asymptotics of PEP Bayes factors are equivalent to those of Schwarz's BIC criterion, ensuring consistency of the PEP approach to model selection. We compare the performance of our method, in simulation studies and a real example involving prediction of air-pollutant concentrations from meteorological covariates, with that of a variety of previously-defined variants on Bayes factors for objective variable selection. Our prior, due to its unit-information structure, leads to a variable-selection procedure that (1) is systematically more parsimonious than the basic EPP with minimal training sample, while sacrificing no desirable performance characteristics to achieve this parsimony; (2) is robust to the size of the training sample, thus enjoying the advantages described above arising from the avoidance of training samples altogether; and (3) identifies maximum-a-posteriori models that achieve good out-of-sample predictive performance.
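For readers unfamiliar with the BIC connection mentioned in this abstract, the standard relation between BIC and a Bayes factor is recalled below. This is the textbook large-sample approximation, not the paper's own derivation; the symbols $\hat{L}_\ell$ (maximized likelihood of model $M_\ell$), $d_\ell$ (number of free parameters) and $\mathrm{BF}_{10}$ (Bayes factor of $M_1$ against $M_0$) are notation introduced here.

```latex
\mathrm{BIC}_\ell = -2 \log \hat{L}_\ell + d_\ell \log n ,
\qquad
-2 \log \mathrm{BF}_{10} \approx \mathrm{BIC}_1 - \mathrm{BIC}_0
  = -2 \log \frac{\hat{L}_1}{\hat{L}_0} + (d_1 - d_0) \log n .
```

A Bayes factor whose large-sample behavior matches this expression penalizes each additional parameter by $\log n$, which is the mechanism behind the consistency property the abstract refers to.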

preprint 2013 arXiv

Bayesian transformation family selection: moving towards a transformed Gaussian universe

The problem of transformation selection is thoroughly treated from a Bayesian perspective. Several families of transformations are considered with a view to achieving normality: the Box-Cox, the Modulus, the Yeo & Johnson and the Dual transformation. Markov chain Monte Carlo algorithms have been constructed in order to sample from the posterior distribution of the transformation parameter $\lambda_T$ associated with each competing family $T$. We investigate different approaches to constructing compatible prior distributions for $\lambda_T$ over alternative transformation families, using a unit-information power-prior approach and an alternative normal prior with approximate unit-information interpretation. Selection and discrimination between different transformation families is attained via posterior model probabilities. We demonstrate the efficiency of our approach using a variety of simulated datasets. Although there is no choice of transformation family that can be universally applied to all problems, empirical evidence suggests that some particular data structures are best treated by specific transformation families. For example, skewness is associated with the Box-Cox family while fat-tailed distributions are efficiently treated using the Modulus transformation.

preprint 2013 arXiv

Explaining the behavior of joint and marginal Monte Carlo estimators in latent variable models with independence assumptions

In latent variable models the parameter estimation can be implemented by using the joint or the marginal likelihood, based on independence or conditional independence assumptions. The same dilemma occurs within the Bayesian framework with respect to the estimation of the Bayesian marginal (or integrated) likelihood, which is the main tool for model comparison and averaging. In most cases, the Bayesian marginal likelihood is a high dimensional integral that cannot be computed analytically and a plethora of methods based on Monte Carlo integration (MCI) are used for its estimation. In this work, it is shown that the joint MCI approach makes subtle use of the properties of the adopted model, leading to increased error and bias in finite settings. The sources and the components of the error associated with estimators under the two approaches are identified here and provided in exact forms. Additionally, the effect of the sample covariation on the Monte Carlo estimators is examined. In particular, even under independence assumptions the sample covariance will be close to (but not exactly) zero which surprisingly has a severe effect on the estimated values and their variability. To address this problem, an index of the sample's divergence from independence is introduced as a multivariate extension of covariance. The implications addressed here are important in the majority of practical problems appearing in Bayesian inference of multi-parameter models with analogous structures.

preprint 2013 arXiv

Power-Conditional-Expected Priors: Using g-priors with Random Imaginary Data for Variable Selection

Zellner's g-prior and its recent hierarchical extensions are the most popular default prior choices in the Bayesian variable selection context. These prior set-ups can be expressed as power-priors with a fixed set of imaginary data. In this paper, we borrow ideas from the power-expected-posterior (PEP) priors in order to introduce, under the g-prior approach, an extra hierarchical level that accounts for the imaginary data uncertainty. For normal regression variable selection problems, the resulting power-conditional-expected-posterior (PCEP) prior is a conjugate normal-inverse gamma prior which provides a consistent variable selection procedure and gives support to more parsimonious models than the ones supported using the g-prior and the hyper-g prior for finite samples. Detailed illustrations and comparisons of the variable selection procedures using the proposed method, the g-prior and the hyper-g prior are provided using both simulated and real data examples.

preprint 2013 arXiv

Thermodynamic assessment of probability distribution divergencies and Bayesian model comparison

Within the path sampling framework, we show that probability distribution divergences, such as the Chernoff information, can be estimated via thermodynamic integration. The Boltzmann-Gibbs distribution pertaining to different Hamiltonians is implemented to derive tempered transitions along the path, linking the distributions of interest at the endpoints. Under this perspective, a geometric approach is feasible, which prompts intuition and facilitates tuning the error sources. Additionally, there are direct applications in Bayesian model evaluation. Existing marginal likelihood and Bayes factor estimators are reviewed here along with their stepping-stone sampling analogues. New estimators are presented and the use of compound paths is introduced.
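As background for this abstract, the basic thermodynamic integration identity along a geometric path is recalled below. It is the standard path sampling result rather than one of the paper's new estimators; the unnormalized endpoint densities $q_0$, $q_1$, the temperature $t \in [0,1]$ and the normalizing constants $Z_t$ are notation introduced here.

```latex
q_t(\theta) = q_0(\theta)^{1-t}\, q_1(\theta)^{t}, \qquad
Z_t = \int q_t(\theta)\, d\theta, \qquad
p_t(\theta) = \frac{q_t(\theta)}{Z_t},

\frac{d}{dt} \log Z_t
  = \mathbb{E}_{p_t}\!\left[ \log \frac{q_1(\theta)}{q_0(\theta)} \right]
\quad\Longrightarrow\quad
\log \frac{Z_1}{Z_0}
  = \int_0^1 \mathbb{E}_{p_t}\!\left[ \log \frac{q_1(\theta)}{q_0(\theta)} \right] dt .
```

With $q_0$ a normalized prior and $q_1$ the prior times the likelihood, $Z_1 / Z_0$ is the marginal likelihood, which is why the identity is directly useful for Bayes factor estimation.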

preprint 2012 arXiv

Joint Specification of Model Space and Parameter Space Prior Distributions

We consider the specification of prior distributions for Bayesian model comparison, focusing on regression-type models. We propose a particular joint specification of the prior distribution across models so that sensitivity of posterior model probabilities to the dispersion of prior distributions for the parameters of individual models (Lindley's paradox) is diminished. We illustrate the behavior of inferential and predictive posterior quantities in linear and log-linear regressions under our proposed prior densities with a series of simulated and real data examples.
