Researcher profile

Thorsten Dickhaus

Thorsten Dickhaus contributes to research discovery and scholarly infrastructure.

Researcher · Affiliation not imported · Open to collaborate

Trust snapshot

Trust 21 (Emerging) · Verification L1 · Unclaimed author
8 works
0 followers
5 topics
4 close collaborators

Published work

8 published items

Preprint · 2022 · arXiv

Combining Multiple Testing with Multivariate Singular Spectrum Analysis

Appropriate preprocessing is a fundamental prerequisite for analyzing a noisy dataset. The purpose of this paper is to apply a nonparametric preprocessing method, called Singular Spectrum Analysis (SSA), to a variety of datasets which are subsequently analyzed by means of multiple statistical hypothesis tests. SSA has recently been used for many problems in the life sciences. In the present work, SSA is compared with three other state-of-the-art preprocessing methods in terms of denoising quality and in terms of the statistical power of the subsequent multiple test. These other methods are either parametric or nonparametric. Our findings demonstrate that (multivariate) SSA can be regarded as a promising method to reduce noise, to extract the main signal from noisy data, and to detect statistically significant signal components.
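The basic (univariate) SSA decomposition can be sketched in a few lines. The series is embedded in a trajectory (Hankel) matrix, only the leading singular components are kept, and the low-rank approximation is mapped back to a series by averaging along anti-diagonals. This is a minimal sketch, not the paper's implementation. The function name `ssa_reconstruct`, the window length and the rank are illustrative assumptions. Multivariate SSA, which stacks the trajectory matrices of several series before the SVD, is not shown.

```python
import numpy as np


def ssa_reconstruct(x, window, rank):
    """Basic SSA: rank-`rank` reconstruction of the 1-D series `x` (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    k = n - window + 1
    # Trajectory (Hankel) matrix: column j holds the lagged window x[j : j + window].
    traj = np.column_stack([x[j:j + window] for j in range(k)])
    # Keep only the `rank` leading singular triples.
    u, s, vt = np.linalg.svd(traj, full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    # Diagonal averaging: entry (i, j) of the approximation estimates x[i + j].
    total = np.zeros(n)
    count = np.zeros(n)
    for i in range(window):
        total[i:i + k] += approx[i, :]
        count[i:i + k] += 1
    return total / count


rng = np.random.default_rng(0)
t = np.arange(200)
signal = np.sin(2 * np.pi * t / 12)
noisy = signal + 0.5 * rng.standard_normal(t.size)
denoised = ssa_reconstruct(noisy, window=24, rank=2)
print(np.mean((noisy - signal) ** 2), np.mean((denoised - signal) ** 2))
```

A pure sinusoid has a rank-2 trajectory matrix, which is why `rank=2` suits this toy signal. For real data the rank has to be chosen, for example from the singular value spectrum.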

Preprint · 2022 · arXiv

Long-term temporal evolution of extreme temperature in a warming Earth

We present a new approach to modeling the future development of extreme temperatures globally and on a long time-scale by using non-stationary generalized extreme value distributions in combination with logistic functions. This approach is applied to data from the fully coupled climate model AWI-ESM. It enables us to investigate how extremes will change depending on the geographic location not only in terms of the magnitude, but also in terms of the timing of the changes. We observe that in general, changes in extremes are stronger and more rapid over land masses than over oceans. In addition, our models differentiate between changes in mean, in variability and in distributional shape, allowing for developments in these statistics to take place independently and at different times. Different models are presented and the Bayesian Information Criterion is used for model selection. It turns out that in most regions, changes in mean and variance take place simultaneously while the shape parameter of the distribution is predicted to stay constant. In the Arctic region, however, a different picture emerges: There, climate variability drastically and abruptly increases around 2050 due to the melting of ice, whereas changes in the mean values take longer and come into effect later.
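The model ingredients named above can be sketched under one assumption: a GEV model whose location and scale move between two levels along a shared logistic curve, while the shape parameter stays constant (the pattern reported for most regions). The function names, the shared midpoint and width, and the parameterization are illustrative assumptions, not the paper's model. Fitting would additionally require maximizing the log-likelihood, e.g. with a numerical optimizer.

```python
import math

import numpy as np


def logistic_transition(t, before, after, midpoint, width):
    """Smooth change from `before` to `after`, centred at `midpoint` (hypothetical helper)."""
    t = np.asarray(t, dtype=float)
    return before + (after - before) / (1.0 + np.exp(-(t - midpoint) / width))


def gev_logpdf(x, mu, sigma, xi):
    """Log-density of the GEV distribution with shape xi (xi = 0 gives the Gumbel case)."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    if abs(xi) < 1e-12:
        return -np.log(sigma) - z - np.exp(-z)
    arg = 1.0 + xi * z
    safe = np.where(arg > 0, arg, 1.0)  # avoid the log of non-positive values
    val = -np.log(sigma) - (1.0 + 1.0 / xi) * np.log(safe) - safe ** (-1.0 / xi)
    return np.where(arg > 0, val, -np.inf)  # zero density outside the support


def nonstationary_loglik(x, t, mu0, mu1, s0, s1, xi, midpoint, width):
    """GEV log-likelihood with logistic location and scale and a constant shape."""
    mu = logistic_transition(t, mu0, mu1, midpoint, width)
    sigma = logistic_transition(t, s0, s1, midpoint, width)
    return float(np.sum(gev_logpdf(x, mu, sigma, xi)))


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion; smaller values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * loglik


years = np.arange(2000, 2101)
maxima = np.full(years.shape, 12.0)  # placeholder annual maxima, not real data
ll = nonstationary_loglik(maxima, years, 10.0, 14.0, 1.0, 2.0, 0.0, 2050.0, 5.0)
print(bic(ll, n_params=7, n_obs=years.size))
```

Competing variants are compared through their BIC values, which penalize the extra parameters each variant adds. Examples include a separate midpoint for the scale or a time-varying shape.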

Preprint · 2022 · arXiv

Multiple multi-sample testing under arbitrary covariance dependency

Modern high-throughput biomedical devices routinely produce data on a large scale, and the analysis of high-dimensional datasets has become commonplace in biomedical studies. However, given thousands or tens of thousands of measured variables in these datasets, extracting meaningful features poses a challenge. In this article, we propose a procedure to evaluate the strength of the associations between a nominal (categorical) response variable and multiple features simultaneously. Specifically, we propose a framework of large-scale multiple testing under arbitrary correlation dependency among test statistics. First, marginal multinomial regressions are performed for each feature individually. Second, we use an approach of multiple marginal models for each baseline-category pair to establish asymptotic joint normality of the stacked vector of the marginal multinomial regression coefficients. Third, we estimate the (limiting) covariance matrix between the estimated coefficients from all marginal models. Finally, our approach approximates the realized false discovery proportion of a thresholding procedure for the marginal p-values, for each baseline-category pair. The proposed approach offers a sensible trade-off between the expected numbers of true and false rejections. Furthermore, we demonstrate a practical application of the method on hyperspectral imaging data. This dataset is obtained by a matrix-assisted laser desorption/ionization (MALDI) instrument. MALDI demonstrates tremendous potential for clinical diagnosis, particularly for cancer research. In our application, the nominal response categories represent cancer subtypes.
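A small simulation, which is not the paper's procedure, shows why dependence matters for the realized false discovery proportion. Take $m$ true null one-sided $z$-tests thresholded at $p_i \le t$. The expected number of false rejections is $m t$ whatever the correlation, but the spread of the realized count grows sharply when the test statistics are correlated. Equicorrelated Gaussian statistics, the helper `false_rejection_counts` and the parameter values are assumptions chosen for illustration. The paper instead estimates the covariance of the marginal multinomial regression coefficients.

```python
from statistics import NormalDist

import numpy as np


def false_rejection_counts(m, t, rho, reps, seed=0):
    """Simulate V(t) = #{i : p_i <= t} for m true nulls with equicorrelated z-statistics."""
    rng = np.random.default_rng(seed)
    crit = NormalDist().inv_cdf(1.0 - t)  # one-sided: p_i <= t  <=>  z_i >= crit
    shared = rng.standard_normal((reps, 1))  # common factor induces correlation rho
    own = rng.standard_normal((reps, m))
    z = np.sqrt(rho) * shared + np.sqrt(1.0 - rho) * own  # each z_i is N(0, 1)
    return (z >= crit).sum(axis=1)


independent = false_rejection_counts(m=200, t=0.05, rho=0.0, reps=4000)
correlated = false_rejection_counts(m=200, t=0.05, rho=0.5, reps=4000, seed=1)
print(independent.mean(), independent.var())
print(correlated.mean(), correlated.var())
```

Both means should be close to $m t = 10$, while the variance under $\rho = 0.5$ is many times larger. A single realized FDP can therefore lie far from its expectation, which is one reason to approximate the realized FDP rather than only its expected value.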

Preprint · 2020 · arXiv

On the usage of randomized p-values in the Schweder-Spjotvoll estimator

We are concerned with multiple test problems with composite null hypotheses and the estimation of the proportion $\pi_0$ of true null hypotheses. The Schweder-Spjøtvoll estimator $\hat{\pi}_0$ utilizes marginal $p$-values and only works properly if the $p$-values that correspond to the true null hypotheses are uniformly distributed on $[0,1]$ ($\mathrm{Uni}[0,1]$-distributed). In the case of composite null hypotheses, marginal $p$-values are usually computed under least favorable parameter configurations (LFCs). Thus, they are stochastically larger than $\mathrm{Uni}[0,1]$ under non-LFCs in the null hypotheses. When using these LFC-based $p$-values, $\hat{\pi}_0$ tends to overestimate $\pi_0$. We introduce a new way of randomizing $p$-values that depends on a tuning parameter $c\in[0,1]$, such that $c=0$ and $c=1$ lead to $\mathrm{Uni}[0,1]$-distributed $p$-values, which are independent of the data, and to the original LFC-based $p$-values, respectively. For a certain value $c=c^{\star}$, the bias of $\hat{\pi}_0$ is minimized when using our randomized $p$-values. This often also entails a smaller mean squared error of the estimator as compared to the usage of the LFC-based $p$-values. We analyze these points theoretically, and we demonstrate them numerically in computer simulations under various standard statistical models.
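In its common form, the Schweder-Spjøtvoll estimator with tuning parameter $\lambda$ is $\hat{\pi}_0(\lambda) = \#\{i : p_i > \lambda\} / (m(1 - \lambda))$. The sketch below pairs it with one randomization that has exactly the two limiting cases described above: $\tilde{p} = p^{\mathrm{LFC}}/c$ if $p^{\mathrm{LFC}} < c$, and $\tilde{p} = U$ otherwise, with $U \sim \mathrm{Uni}[0,1]$ independent of the data. This form is our assumption and may differ in detail from the paper's definition. It is uniform whenever $p^{\mathrm{LFC}}$ is, since $P(\tilde{p} \le x) = P(p^{\mathrm{LFC}} \le c x) + (1 - c) x = x$. The one-sided Gaussian mean test is an illustrative choice of composite null.

```python
import math

import numpy as np


def schweder_spjotvoll(p, lam=0.5):
    """Estimate pi_0 as #{i : p_i > lam} / (m (1 - lam))."""
    p = np.asarray(p, dtype=float)
    return np.count_nonzero(p > lam) / (p.size * (1.0 - lam))


def randomized_p(p_lfc, c, rng):
    """Assumed randomization: p_lfc / c if p_lfc < c, else an independent Uni[0, 1] draw."""
    p_lfc = np.asarray(p_lfc, dtype=float)
    u = rng.uniform(size=p_lfc.shape)  # auxiliary randomization, independent of the data
    if c == 0:
        return u  # c = 0: pure noise, independent of the data
    return np.where(p_lfc < c, p_lfc / c, u)


# Composite null H0: mu <= 0 for X ~ N(mu, 1); every null is true, but at mu = -1, not at the LFC mu = 0.
rng = np.random.default_rng(0)
x = rng.normal(loc=-1.0, scale=1.0, size=20000)
p_lfc = np.array([0.5 * math.erfc(v / math.sqrt(2.0)) for v in x])  # 1 - Phi(x), computed at the LFC
print(schweder_spjotvoll(p_lfc), schweder_spjotvoll(randomized_p(p_lfc, 0.5, rng)))
```

Here $\pi_0 = 1$. The LFC-based estimate lands well above 1, while the randomized one lies much closer to it. With every null true, $c = 0$ would trivially be unbiased. The trade-off that leads to an intermediate $c^{\star}$ only appears once false nulls are mixed in, because $c = 0$ would turn their small $p$-values into noise as well.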

Preprint · 2020 · arXiv

Optimizing effective numbers of tests by vine copula modeling

In the multiple testing context, we utilize vine copulae for optimizing the effective number of tests. It is well known that, for the calibration of multiple tests (for control of the family-wise error rate), the dependencies between the marginal tests are of utmost importance. Previous work has shown that positive dependencies between the marginal tests can be exploited in order to derive a relaxed Šidák-type multiplicity correction. This correction can conveniently be expressed by calculating the corresponding "effective number of tests" for a given (global) significance level. This methodology can also be applied to blocks of test statistics, in which case the effective number of tests is the sum of the effective numbers of tests of the individual blocks. In the present work, we demonstrate how the power of the multiple test can be optimized by choosing blocks with strong within-block dependencies. These blocks are determined by means of an estimated vine copula model. An algorithm is presented which uses the information of the estimated vine copula to make a data-driven choice of appropriate blocks in terms of (estimated) dependencies. Numerical experiments demonstrate the usefulness of the proposed approach.
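The relaxed Šidák-type correction can be written through an effective number of tests $m_{\mathrm{eff}}$. The local level is $\alpha_{\mathrm{loc}} = 1 - (1 - \alpha)^{1/m_{\mathrm{eff}}}$, so $m_{\mathrm{eff}} = \log(1 - \mathrm{FWER}) / \log(1 - \alpha_{\mathrm{loc}})$ for a block whose family-wise error rate at local level $\alpha_{\mathrm{loc}}$ is $\mathrm{FWER}$. When blocks are independent of each other, their no-rejection probabilities multiply, so block values computed at a common local level add up. The sketch below estimates one block's value by Monte Carlo. Equicorrelated one-sided Gaussian tests, the Monte Carlo calibration and all function names are assumptions made for brevity. The paper instead derives the dependence structure and the blocks from an estimated vine copula.

```python
import math
from statistics import NormalDist

import numpy as np


def sidak_local_level(alpha, m_eff):
    """Local level 1 - (1 - alpha)^(1 / m_eff) of the (relaxed) Sidak correction."""
    return 1.0 - (1.0 - alpha) ** (1.0 / m_eff)


def effective_number(fwer, local_level):
    """Invert the Sidak relation: m_eff = log(1 - fwer) / log(1 - local_level)."""
    return math.log(1.0 - fwer) / math.log(1.0 - local_level)


def block_effective_number(block_size, rho, local_level, reps=20000, seed=0):
    """Monte Carlo m_eff of one block of equicorrelated one-sided z-tests (hypothetical model)."""
    rng = np.random.default_rng(seed)
    shared = rng.standard_normal((reps, 1))
    own = rng.standard_normal((reps, block_size))
    z = math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * own
    crit = NormalDist().inv_cdf(1.0 - local_level)  # p_i <= local_level  <=>  z_i >= crit
    fwer = float(np.mean(z.max(axis=1) >= crit))  # P(at least one rejection in the block)
    return effective_number(fwer, local_level)


# Independent blocks: the no-rejection probabilities multiply, so block values add up.
local = sidak_local_level(0.05, 20)  # plain Sidak level for m = 20 tests
blocks = [(5, 0.9), (5, 0.0), (10, 0.5)]  # hypothetical (block size, within-block correlation)
m_eff = sum(
    block_effective_number(size, rho, local, seed=i) for i, (size, rho) in enumerate(blocks)
)
print(m_eff)
```

The total should come out well below 20, so the Šidák level for 20 tests is conservative for this configuration. Calibrating the local level so that the overall error rate equals $\alpha$ exactly would require solving for it, which is not shown.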

Preprint · 2020 · arXiv

Randomized p-values for multiple testing and their application in replicability analysis

We are concerned with testing replicability hypotheses for many endpoints simultaneously. This constitutes a multiple test problem with composite null hypotheses. Traditional $p$-values, which are computed under least favorable parameter configurations, are overly conservative in the case of composite null hypotheses. As demonstrated in prior work, this poses severe challenges in the multiple testing context, especially when one goal of the statistical analysis is to estimate the proportion $\pi_0$ of true null hypotheses. Randomized $p$-values have been proposed to remedy this issue. In the present work, we discuss the application of randomized $p$-values in replicability analysis. In particular, we introduce a general class of statistical models for which valid, randomized $p$-values can be calculated easily. By means of computer simulations, we demonstrate that their usage typically leads to a much more accurate estimation of $\pi_0$. Finally, we apply our proposed methodology to a real data example from genomics.
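A simple instance shows why LFC-based replicability $p$-values distort the estimation of $\pi_0$. Assume, for illustration only, two studies per endpoint and the hypothesis "an effect is present in both studies". Then $\max(p_1, p_2)$ is a valid LFC-based $p$-value: if study $j$ has no effect, $P(\max(p_1, p_2) \le x) \le P(p_j \le x) \le x$. If neither study has an effect, $P(\max(p_1, p_2) \le x) = x^2$, so the $p$-value is stochastically larger than $\mathrm{Uni}[0,1]$. The Schweder-Spjøtvoll estimate with $\lambda = 1/2$ then tends to $(1 - 1/4)/(1/2) = 1.5$, although a proportion cannot exceed 1. The two-study setting, the maximum $p$-value and the helper names are assumptions; the paper's general model class is not reproduced here.

```python
import numpy as np


def replicability_p(p_studies):
    """LFC-based p-value for 'effect in every study': maximum over studies (rows = endpoints)."""
    return np.max(np.asarray(p_studies, dtype=float), axis=1)


def pi0_estimate(p, lam=0.5):
    """Schweder-Spjotvoll estimate #{p_i > lam} / (m (1 - lam))."""
    p = np.asarray(p, dtype=float)
    return np.count_nonzero(p > lam) / (p.size * (1.0 - lam))


rng = np.random.default_rng(0)
m = 20000
# Every endpoint is a true null in both scenarios.
no_effect = rng.uniform(size=(m, 2))  # neither study has an effect
at_lfc = np.column_stack([np.zeros(m), rng.uniform(size=m)])  # study 1 has an overwhelming effect
print(pi0_estimate(replicability_p(no_effect)), pi0_estimate(replicability_p(at_lfc)))
```

The two printed estimates should be close to 1.5 and 1.0. The same true nulls are thus judged very differently depending on where in the composite null they lie.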

Preprint · 2013 · arXiv

False Discovery Rate Control under Archimedean Copula

We are concerned with the false discovery rate (FDR) of the linear step-up test $\varphi^{\mathrm{LSU}}$ considered by Benjamini and Hochberg (1995). It is well known that $\varphi^{\mathrm{LSU}}$ controls the FDR at level $m_0 q / m$ if the joint distribution of $p$-values is multivariate totally positive of order 2. Here, $m$ denotes the total number of hypotheses, $m_0$ the number of true null hypotheses, and $q$ the nominal FDR level. Under the assumption of an Archimedean $p$-value copula with completely monotone generator, we derive a sharper upper bound for the FDR of $\varphi^{\mathrm{LSU}}$ as well as a non-trivial lower bound. Applying the sharper upper bound to parametric subclasses of Archimedean $p$-value copulae allows us to increase the power of $\varphi^{\mathrm{LSU}}$ by pre-estimating the copula parameter and adjusting $q$. Based on the lower bound, we obtain a sufficient condition under which the FDR of $\varphi^{\mathrm{LSU}}$ is exactly equal to $m_0 q / m$, as in the case of stochastically independent $p$-values. Finally, we deal with high-dimensional multiple test problems with exchangeable test statistics by drawing a connection between infinite sequences of exchangeable $p$-values and Archimedean copulae with completely monotone generators. Our theoretical results are applied to important copula families, including Clayton copulae and Gumbel copulae.
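The linear step-up test rejects the hypotheses with the $k$ smallest $p$-values, where $k$ is the largest index with $p_{(k)} \le k q / m$. The sketch below implements it and estimates its FDR by simulation under a Clayton $p$-value copula. Clayton is an Archimedean copula whose generator $(1 + t)^{-1/\theta}$ is completely monotone. Samples are drawn with the Marshall-Olkin construction $U_i = (1 + E_i / V)^{-1/\theta}$, with $V \sim \mathrm{Gamma}(1/\theta, 1)$ and i.i.d. standard exponential $E_i$. The helper names, $\theta = 2$ and the restriction to $m_0 = m$ are simplifying assumptions; with $m_0 = m$, the FDR equals $P(R > 0)$. The sketch does not reproduce the paper's bounds.

```python
import numpy as np


def linear_step_up(p, q):
    """Benjamini-Hochberg linear step-up test; returns a boolean rejection mask."""
    p = np.asarray(p, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m  # p_(k) <= k q / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0][-1] + 1  # largest such k: step-up
        reject[order[:k]] = True
    return reject


def clayton_uniforms(n, m, theta, rng):
    """n draws of m Clayton-dependent Uni[0, 1] variables (Marshall-Olkin sampler)."""
    v = rng.gamma(shape=1.0 / theta, scale=1.0, size=(n, 1))  # frailty shared within a row
    e = rng.exponential(scale=1.0, size=(n, m))
    return (1.0 + e / v) ** (-1.0 / theta)


def fdr_all_null(p_matrix, q):
    """With m0 = m every rejection is false, so FDP = 1{R > 0}; average over rows."""
    return float(np.mean([linear_step_up(row, q).any() for row in p_matrix]))


rng = np.random.default_rng(0)
m, q, reps = 20, 0.1, 20000
independent = rng.uniform(size=(reps, m))
clayton = clayton_uniforms(reps, m, theta=2.0, rng=rng)
fdr_indep = fdr_all_null(independent, q)
fdr_clayton = fdr_all_null(clayton, q)
print(fdr_indep, fdr_clayton)
```

Under independence and $m_0 = m$, the printed FDR should be close to $q$. Adding false null hypotheses would require $p$-values under the alternative, which this sketch leaves out.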

Preprint · 2011 · arXiv

On least favorable configurations for step-up-down tests

This paper investigates an open issue related to false discovery rate (FDR) control of step-up-down (SUD) multiple testing procedures. It has been established in earlier literature that, for this type of procedure, under some broad conditions and in an asymptotic sense, the FDR is maximal when the signal strength under the alternative is maximal. In other words, so-called "Dirac uniform configurations" are asymptotically least favorable in this setting. It is known that this property also holds in a non-asymptotic sense (for any finite number of hypotheses) for the two extreme versions of SUD procedures, namely step-up and step-down (with extra conditions for the step-down case). It is therefore natural to conjecture that this non-asymptotic least favorable configuration property holds more generally for all "intermediate" forms of SUD procedures. We prove that, somewhat surprisingly, this is not the case. The argument is based on the exact calculations proposed earlier by Roquain and Villers (2011), which we extend here by generalizing Steck's recursion to the case of two populations. Second, we quantify the magnitude of this phenomenon by providing a non-asymptotic upper bound and explicit vanishing rates as a function of the total number of hypotheses.
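A step-up-down procedure of order $\lambda$ with critical values $c_1 \le \dots \le c_m$ can be sketched as follows. If $p_{(\lambda)} \le c_\lambda$, the procedure steps down from $\lambda$ and keeps rejecting as long as $p_{(k)} \le c_k$. Otherwise, it steps up and rejects up to the largest $k < \lambda$ with $p_{(k)} \le c_k$. Order $\lambda = 1$ gives the step-down version and $\lambda = m$ the step-up version. This follows a common convention and is not taken from the paper. Steck's recursion and the exact FDR calculations are not shown, and the function name and the linear critical values $c_k = k\alpha/m$ are illustrative choices.

```python
import numpy as np


def step_up_down(p, crit, order):
    """Number of rejections of the step-up-down test of the given order (1-based).

    The hypotheses with the smallest p-values are the ones rejected.
    """
    ps = np.sort(np.asarray(p, dtype=float))
    ok = ps <= np.asarray(crit, dtype=float)  # ok[k - 1] is True iff p_(k) <= c_k
    m = ps.size
    if ok[order - 1]:
        # Step down from `order`: keep rejecting while p_(k) <= c_k.
        k = order
        while k < m and ok[k]:
            k += 1
        return k
    # Step up below `order`: largest k < order with p_(k) <= c_k.
    hits = np.nonzero(ok[: order - 1])[0]
    return int(hits[-1]) + 1 if hits.size else 0


alpha = 0.05
p = [0.2, 0.031, 0.01, 0.03]
crit = alpha * np.arange(1, 5) / 4  # linear critical values k * alpha / m
print([step_up_down(p, crit, order) for order in range(1, 5)])  # [1, 1, 3, 3]
```

The example shows how the two extremes differ on the same data. Step-down (order 1) stops at the first failure, $p_{(2)} = 0.03 > c_2$. Step-up (order 4) still reaches $p_{(3)} = 0.031 \le c_3$.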