Source author record

Jan van Waaij

Jan van Waaij appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

math.ST, Methodology, Statistics Theory

Catalog footprint

What is connected

3 works

3 topics

2 close collaborators

Preprint · 2022 · arXiv

Estimation of the covariance structure from SNP allele frequencies

We propose two new statistics, V and S, to disentangle the population history of related populations from SNP frequency data. If the populations are related by a tree, we show, both theoretically and by simulation, that the new statistics are able to identify the root of the tree correctly, in contrast to standard statistics such as the observed matrix of F2-statistics (distances between pairs of populations). The statistic V is obtained by averaging over all SNPs (as standard statistics are). Its expectation is the true covariance matrix of the observed population SNP frequencies, offset by a matrix with identical entries. The statistic S, in contrast, is put in a Bayesian context and is obtained by averaging over pairs of SNPs such that each SNP is used only once; it thus makes use of the joint distribution of pairs of SNPs. In addition, we provide a number of new mathematical results about established and new statistics and their mutual relationships.
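The F2-statistic mentioned above is, in its standard form and ignoring the finite-sample bias corrections used in practice, the SNP-averaged squared difference in allele frequency between two populations. The sketch below is not the paper's V or S (their exact definitions are not given here); it only illustrates, on made-up random frequencies, that F2 can be written in terms of the matrix of SNP-averaged products $M_{ab}$ as $F2(a,b)=M_{aa}+M_{bb}-2M_{ab}$, and is therefore unchanged when $M$ is offset by a matrix with identical entries.

```python
import random


def second_moment_matrix(freqs):
    """M[a][b] = average over SNPs of p_a * p_b, for populations a and b."""
    n_pop, n_snp = len(freqs), len(freqs[0])
    return [[sum(freqs[a][s] * freqs[b][s] for s in range(n_snp)) / n_snp
             for b in range(n_pop)] for a in range(n_pop)]


def f2_from_moments(m):
    """F2(a, b) = M[a][a] + M[b][b] - 2 M[a][b]."""
    n = len(m)
    return [[m[a][a] + m[b][b] - 2 * m[a][b] for b in range(n)] for a in range(n)]


def f2_direct(freqs):
    """F2(a, b) = average over SNPs of (p_a - p_b)^2, without bias correction."""
    n_pop, n_snp = len(freqs), len(freqs[0])
    return [[sum((freqs[a][s] - freqs[b][s]) ** 2 for s in range(n_snp)) / n_snp
             for b in range(n_pop)] for a in range(n_pop)]


# Toy data: 4 populations, 200 SNPs, uniformly random frequencies (not real data).
rng = random.Random(1)
freqs = [[rng.random() for _ in range(200)] for _ in range(4)]

m = second_moment_matrix(freqs)
# Offset M by a matrix whose entries are all equal to 0.3.
shifted = [[v + 0.3 for v in row] for row in m]

f2_m = f2_from_moments(m)
f2_shifted = f2_from_moments(shifted)
f2_d = f2_direct(freqs)

# Largest entrywise differences (both should be at floating-point rounding level).
offset_diff = max(abs(x - y) for ra, rb in zip(f2_m, f2_shifted) for x, y in zip(ra, rb))
direct_diff = max(abs(x - y) for ra, rb in zip(f2_m, f2_d) for x, y in zip(ra, rb))
```

The invariance is the reason a constant offset in the expectation of a moment-type statistic does not affect the F2 distances derived from it.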

Preprint · 2022 · arXiv

Necessary and sufficient conditions for identifiability in the admixture model

We consider data on M SNPs from N individuals who are an admixture of K unknown ancient populations. Let $\Pi_{si}$ be the frequency of the reference allele of individual i at SNP s, so that the number of reference alleles at SNP s for a diploid individual is binomially distributed with parameters 2 and $\Pi_{si}$. We suppose $\Pi_{si}=\sum_{k=1}^K F_{sk}Q_{ki}$, where $F_{sk}$ is the allele frequency of SNP s in population k and $Q_{ki}$ is the proportion of population k in the ancestry of individual i. I am interested in the identifiability of F and Q, up to a relabelling of the ancient populations: under what conditions does $\Pi=F^1Q^1=F^2Q^2$ imply that $F^1=F^2$ and $Q^1=Q^2$? I show that the anchor condition (Cabreros and Storey, 2019) on one matrix, together with an independence condition on the other matrix, is sufficient for identifiability. I argue that the proof of the necessary condition in Cabreros and Storey (2019) is incorrect and provide a correct proof, which, in addition, does not require knowledge of the number of ancestral populations. I also provide abstract necessary and sufficient conditions for identifiability and show that one cannot deviate substantially from the anchor condition without losing identifiability. Finally, I give necessary and sufficient conditions for identifiability in the non-admixed case.
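A small, exact toy check of why identifiability requires conditions on F and Q. All matrices below are made-up values, and the mixing matrix A is a hypothetical choice, not taken from the paper. Because the columns of A sum to 1, $Q^2=A^{-1}Q^1$ still has ancestry proportions summing to 1, and $F^2=F^1A$ consists of convex combinations of allele frequencies, so $(F^1,Q^1)$ and $(F^2,Q^2)$ are two valid parameterisations of the same $\Pi$ that do not differ by a relabelling. The helper `sample_genotypes` (also hypothetical) draws genotype counts from the Binomial$(2,\Pi_{si})$ model described above.

```python
from fractions import Fraction as Fr
import random


def matmul(a, b):
    """Plain matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def sample_genotypes(pi, seed=0):
    """Reference-allele count at SNP s for individual i ~ Binomial(2, Pi[s][i])."""
    rng = random.Random(seed)
    return [[sum(rng.random() < float(p) for _ in range(2)) for p in row] for row in pi]


# Toy example: M = 3 SNPs, K = 2 populations, N = 3 individuals.
F1 = [[Fr(1, 10), Fr(9, 10)], [Fr(1, 2), Fr(1, 5)], [Fr(3, 10), Fr(7, 10)]]
Q1 = [[Fr(1, 2), Fr(2, 5), Fr(3, 5)],
      [Fr(1, 2), Fr(3, 5), Fr(2, 5)]]  # each column sums to 1

# Hypothetical invertible A whose columns sum to 1, and its exact inverse.
A = [[Fr(4, 5), Fr(1, 5)], [Fr(1, 5), Fr(4, 5)]]
A_inv = [[Fr(4, 3), Fr(-1, 3)], [Fr(-1, 3), Fr(4, 3)]]

F2 = matmul(F1, A)      # allele frequencies of the "alternative" populations
Q2 = matmul(A_inv, Q1)  # ancestry proportions under the alternative

Pi1 = matmul(F1, Q1)
Pi2 = matmul(F2, Q2)
genotypes = sample_genotypes(Pi1, seed=42)
```

Here $Q^2$ has rows $(1/2, 1/3, 2/3)$ and $(1/2, 2/3, 1/3)$, so it is not a row permutation of $Q^1$, yet $F^1Q^1=F^2Q^2$ exactly. In this toy example no individual has all of its ancestry from a single population; applying $A^{-1}$ to a pure column such as $(1,0)^\top$ would give $(4/3,-1/3)^\top$, which is not a valid ancestry vector.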

Preprint · 2020 · arXiv

Adaptive posterior contraction rates for empirical Bayesian drift estimation of a diffusion

Due to their conjugate posteriors, Gaussian process priors are attractive for estimating the drift of stochastic differential equations from continuous-time observations. However, their performance strongly depends on the choice of the hyper-parameters. We employ the marginal maximum likelihood estimator to estimate the scaling and/or smoothness parameter(s) of the prior and show that the corresponding posterior has optimal rates of convergence. General theorems do not apply directly to this model, because the usual test functions are constructed with respect to a random Hellinger-type metric. We allow for continuous and discrete, one- and two-dimensional sets of hyper-parameters, and show that optimising over the two-dimensional set of smoothness and scaling hyper-parameters is beneficial in terms of the adaptive range.
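A minimal sketch of empirical Bayes drift estimation with marginal maximum likelihood, under assumptions that are not taken from the paper: the model is $dX_t=b(X_t)\,dt+dW_t$, the drift is expanded as $b=\sum_{j=1}^J\theta_j\varphi_j$ in a hypothetical Fourier basis, the prior is $\theta_j\sim N(0,j^{-1-2\alpha})$ with the scaling fixed at 1, continuous observation is approximated by a fine Euler-Maruyama path, and only the smoothness $\alpha$ is chosen from a small discrete grid. With the Girsanov log-likelihood $\theta^\top\mu-\tfrac12\theta^\top\Sigma\theta$, where $\mu_j=\int\varphi_j(X_t)\,dX_t$ and $\Sigma_{jk}=\int\varphi_j\varphi_k(X_t)\,dt$, conjugacy gives the posterior mean $A^{-1}\mu$ and the log marginal likelihood $-\tfrac12(\log\det\Lambda+\log\det A)+\tfrac12\mu^\top A^{-1}\mu$, with $A=\Sigma+\Lambda^{-1}$.

```python
import math
import random


def basis(j, x):
    """Hypothetical Fourier basis on [0, 1): j = 1, 2, ... alternates cos/sin."""
    k = (j + 1) // 2
    if j % 2 == 1:
        return math.sqrt(2) * math.cos(2 * math.pi * k * x)
    return math.sqrt(2) * math.sin(2 * math.pi * k * x)


def simulate_path(drift, T, dt, x0=0.0, seed=0):
    """Euler-Maruyama path of dX = drift(X) dt + dW (stand-in for continuous data)."""
    rng = random.Random(seed)
    xs = [x0]
    for _ in range(int(T / dt)):
        x = xs[-1]
        xs.append(x + drift(x) * dt + rng.gauss(0.0, math.sqrt(dt)))
    return xs


def sufficient_stats(xs, dt, J):
    """Riemann/Ito sums for mu_j = int phi_j dX and Sigma_jk = int phi_j phi_k dt."""
    mu = [0.0] * J
    sig = [[0.0] * J for _ in range(J)]
    for x, x_next in zip(xs[:-1], xs[1:]):
        phi = [basis(j + 1, x) for j in range(J)]
        dx = x_next - x
        for j in range(J):
            mu[j] += phi[j] * dx
            for k in range(J):
                sig[j][k] += phi[j] * phi[k] * dt
    return mu, sig


def cholesky(a):
    """Lower-triangular L with a = L L^T (a symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def chol_solve(L, b):
    """Solve (L L^T) x = b by forward and backward substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def log_marginal(mu, sig, alpha):
    """Log marginal likelihood (up to a constant) and posterior mean for smoothness alpha."""
    J = len(mu)
    lam = [j ** (-1 - 2 * alpha) for j in range(1, J + 1)]  # hypothetical prior variances
    a = [[sig[j][k] + (1.0 / lam[j] if j == k else 0.0) for k in range(J)] for j in range(J)]
    L = cholesky(a)
    post_mean = chol_solve(L, mu)
    logdet_a = 2 * sum(math.log(L[i][i]) for i in range(J))
    logdet_lam = sum(math.log(v) for v in lam)
    value = -0.5 * (logdet_lam + logdet_a) + 0.5 * sum(m * p for m, p in zip(mu, post_mean))
    return value, post_mean


def empirical_bayes(xs, dt, J, alphas):
    """Pick alpha maximising the marginal likelihood; return it with the posterior mean."""
    mu, sig = sufficient_stats(xs, dt, J)
    best = max(alphas, key=lambda a: log_marginal(mu, sig, a)[0])
    return best, log_marginal(mu, sig, best)[1]


# Usage on simulated data: true drift 2 sin(2 pi x), i.e. theta_2 = sqrt(2), others 0.
b0 = lambda x: 2.0 * math.sin(2 * math.pi * x)
xs = simulate_path(b0, T=100.0, dt=0.01)
alpha_hat, theta_hat = empirical_bayes(xs, 0.01, J=7, alphas=[0.25, 0.5, 1.0, 2.0, 4.0])
```

A discrete grid keeps the sketch short; the abstract also covers continuous and two-dimensional (scaling and smoothness) hyper-parameter sets, which this example does not attempt.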