Source author record

Willem Kruijer

Willem Kruijer appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Applications, math.ST, Methodology, Populations and Evolution, Quantitative Methods, Statistics Theory

Catalog footprint

What is connected

4 works

6 topics

4 close collaborators


Preprint, 2026, arXiv

A latent factor approach to hyperspectral time series data for multivariate genomic prediction of grain yield in wheat

High-dimensional time series phenotypic data is becoming increasingly common within plant breeding programmes. However, analysing and integrating such data for genetic analysis and genomic prediction remains difficult. Here we show how factor analysis with Procrustes rotation on the genetic correlation matrix of hyperspectral secondary phenotype data can help in extracting relevant features for within-trial prediction. We use a subset of the Centro Internacional de Mejoramiento de Maíz y Trigo (CIMMYT) elite yield wheat trial of 2014-2015, consisting of 1,033 genotypes. These were measured across three irrigation treatments at several timepoints during the season, using manned airplane flights with hyperspectral sensors capturing 62 bands in the spectrum of 385-850 nm. We perform multivariate genomic prediction using latent variables to improve within-trial genomic predictive ability (PA) of wheat grain yield within three distinct watering treatments. By integrating latent variables of the hyperspectral data in a multivariate genomic prediction model, we achieve an absolute gain of 0.1 to 0.3 (on the correlation scale) in PA compared to univariate genomic prediction. Furthermore, we show which timepoints within a trial are important and how these relate to plant growth stages. This paper showcases how domain knowledge and data-driven approaches can be combined to increase PA and gain new insights from sensor data of high-throughput phenotyping platforms.

Preprint, 2015, arXiv

Marker-based estimation of heritability in immortal populations

Heritability is a central parameter in quantitative genetics, both from an evolutionary and a breeding perspective. For plant traits, heritability is traditionally estimated by comparing within- and between-genotype variability. This approach estimates broad-sense heritability and does not account for differences in genetic relatedness. With the availability of high-density markers, there is growing interest in marker-based estimates of narrow-sense heritability, using mixed models in which genetic relatedness is estimated from genetic markers. Such estimates have received much attention in human genetics but are rarely reported for plant traits. A major obstacle is that current methodology and software assume a single phenotypic value per genotype, hence requiring genotypic means. An alternative, which we propose here, is to use mixed models at individual plant or plot level. Using statistical arguments, simulations and real data, we investigate the feasibility of both approaches and how these affect genomic prediction with G-BLUP and genome-wide association studies (GWAS). Heritability estimates obtained from genotypic means had very large standard errors and were sometimes biologically unrealistic. Mixed models at individual plant or plot level produced more realistic estimates, and for simulated traits standard errors were up to 13 times smaller. Genomic prediction was also improved by using these mixed models, with up to a 49% increase in accuracy. For GWAS on simulated traits, the use of individual plant data gave almost no increase in power. The new methodology is applicable to any complex trait where multiple replicates of individual genotypes can be scored. This includes important agronomic crops, as well as bacteria and fungi.
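As a rough illustration of the traditional approach mentioned in this abstract (comparing within- and between-genotype variability), the sketch below estimates plot-level broad-sense heritability from a balanced one-way ANOVA. It is not the marker-based mixed-model method the paper proposes. The function name `broad_sense_heritability`, the balanced-design assumption and the simulated data are all hypothetical choices made for this example.

```python
import numpy as np


def broad_sense_heritability(y):
    """Plot-level broad-sense heritability from a balanced design.

    y has shape (n_genotypes, n_replicates). Hypothetical helper for
    illustration only; it assumes equal replication and no markers.
    """
    y = np.asarray(y, dtype=float)
    g, r = y.shape
    geno_means = y.mean(axis=1)
    grand_mean = y.mean()
    # Mean squares of the one-way ANOVA with genotype as the factor.
    ms_between = r * np.sum((geno_means - grand_mean) ** 2) / (g - 1)
    ms_within = np.sum((y - geno_means[:, None]) ** 2) / (g * (r - 1))
    # Method-of-moments variance components; genetic variance truncated at 0.
    var_g = max((ms_between - ms_within) / r, 0.0)
    var_e = ms_within
    total = var_g + var_e
    return var_g / total if total > 0 else float("nan")


# Simulated example: genetic and residual variance both 1, so the
# true plot-level broad-sense heritability is 0.5.
rng = np.random.default_rng(0)
genetic = rng.normal(0.0, 1.0, size=(200, 1))
residual = rng.normal(0.0, 1.0, size=(200, 4))
h2_hat = broad_sense_heritability(genetic + residual)
print(round(h2_hat, 2))
```

Because this estimator uses only replicate structure, it cannot separate additive from non-additive genetic variance; that separation is what the marker-based relatedness matrix in the abstract is meant to provide.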

Preprint, 2015, arXiv

Misspecification in mixed-model based association analysis

Additive genetic variance in natural populations is commonly estimated using mixed models, in which the covariance of the genetic effects is modeled by a genetic similarity matrix derived from a dense set of markers. An important but usually implicit assumption is that the presence of any non-additive genetic effect only increases the residual variance and does not affect estimates of additive genetic variance. Here we show that this is only true for panels of unrelated individuals. When individuals are genetically related, the combination of population structure and epistatic interactions can lead to inflated estimates of additive genetic variance.

Preprint, 2012, arXiv

Bayesian semi-parametric estimation of the long-memory parameter under FEXP-priors

For a Gaussian time series with long-memory behavior, we use the FEXP-model for semi-parametric estimation of the long-memory parameter $d$. The true spectral density $f_o$ is assumed to have long-memory parameter $d_o$ and a FEXP-expansion of Sobolev-regularity $\beta > 1$. We prove that when $k$ follows a Poisson or geometric prior, or a sieve prior increasing at rate $n^{\frac{1}{1+2\beta}}$, $d$ converges to $d_o$ at a suboptimal rate. When the sieve prior increases at rate $n^{\frac{1}{2\beta}}$, however, the minimax rate is almost attained. Our results can be seen as a Bayesian equivalent of the result that Moulines and Soulier obtained for some frequentist estimators.
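For orientation, the FEXP (fractional exponential) spectral density is commonly written in the form below. This is the standard form from the long-memory literature and is an assumption about the parametrization used in the paper, which the abstract does not state; the coefficients $\theta_j$ and the constant $L$ are notation introduced here.

$$
f_{d,k,\theta}(\lambda) = \left|1 - e^{i\lambda}\right|^{-2d} \exp\left(\sum_{j=0}^{k} \theta_j \cos(j\lambda)\right), \qquad \lambda \in [-\pi, \pi].
$$

Under this reading, $k$ is the number of cosine terms on which the prior is placed, and Sobolev-regularity $\beta$ would correspond to a condition of the type $\sum_{j \ge 0} \theta_j^2 (1+j)^{2\beta} \le L$ on the coefficients of the true density.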