Researcher profile

Markus Pauly

Markus Pauly contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 21 (Emerging)
Verification L1
Unclaimed author
17 works
0 followers
9 topics
4 close collaborators

Published work

17 published items

preprint, 2025, arXiv

Choosing a Model, Shaping a Future: Comparing LLM Perspectives on Sustainability and its Relationship with AI

As organizations increasingly rely on AI systems for decision support in sustainability contexts, it becomes critical to understand the inherent biases and perspectives embedded in Large Language Models (LLMs). This study systematically investigates how five state-of-the-art LLMs (Claude, DeepSeek, GPT, LLaMA, and Mistral) conceptualize sustainability and its relationship with AI. We administered validated, psychometric sustainability-related questionnaires, each 100 times per model, to capture response patterns and variability. Our findings revealed significant inter-model differences: for example, GPT exhibited skepticism about the compatibility of AI and sustainability, whereas LLaMA demonstrated extreme techno-optimism with perfect scores for several Sustainable Development Goals (SDGs). Models also diverged in attributing institutional responsibility for AI and sustainability integration, a result that holds implications for technology governance approaches. Our results demonstrate that model selection could substantially influence organizational sustainability strategies, highlighting the need for awareness of model-specific biases when deploying LLMs for sustainability-related decision-making.

preprint, 2023, arXiv

The impact of neglected confounding and interactions in mixed-effects meta-regression

Analysts seldom include interaction terms in meta-regression models, which can introduce bias if an interaction is present. We illustrate this in the current paper by re-analyzing an example from research on acute heart failure, where neglecting an interaction might have led to erroneous inference and conclusions. Moreover, we perform a brief simulation study based on this example, highlighting the effects caused by omitting or unnecessarily including interaction terms. Based on our results, we recommend always including interaction terms in mixed-effects meta-regression models when such interactions are plausible.

preprint, 2022, arXiv

Cluster-Robust Estimators for Bivariate Mixed-Effects Meta-Regression

Meta-analyses frequently include trials that report multiple effect sizes based on a common set of study participants. These effect sizes will generally be correlated. Cluster-robust variance-covariance estimators are a fruitful approach for synthesizing dependent effects. However, when the number of studies is small, state-of-the-art robust estimators can yield inflated Type 1 errors. We present two new cluster-robust estimators, in order to improve small sample performance. For both new estimators the idea is to transform the estimated variances of the residuals using only the diagonal entries of the hat matrix. Our proposals are asymptotically equivalent to previously suggested cluster-robust estimators such as the bias reduced linearization approach. We apply the methods to real world data and compare and contrast their performance in an extensive simulation study. We focus on bivariate meta-regression, although the approaches can be applied more generally.

preprint, 2022, arXiv

Estimating Gaussian Copulas with Missing Data

In this work we present a rigorous application of the Expectation Maximization algorithm to determine the marginal distributions and the dependence structure in a Gaussian copula model with missing data. We further show how to circumvent a priori assumptions on the marginals with semiparametric modelling. The joint distribution learned through this algorithm is considerably closer to the underlying distribution than existing methods.

preprint, 2022, arXiv

Inference for high-dimensional split-plot designs with different dimensions between groups

In repeated measures designs with multiple groups, the primary purpose is to compare different groups in various aspects. For several reasons, the number of measurements, and therefore the dimension of the observation vectors, can depend on the group, making the usage of existing approaches impossible. We develop an approach which can be used not only for a possibly increasing number of groups $a$, but also for group-dependent dimensions $d_i$, which are allowed to go to infinity. This is a unique high-dimensional asymptotic framework that stands out for its variety and does without the usual conditions on the relation between sample size and dimension. In particular, it includes settings with fixed dimensions in some groups and increasing dimensions in others, which can be seen as semi-high-dimensional. To find an appropriate statistical test, new estimators are developed, which can be used under these diverse settings on $a$, $d_i$ and $n_i$ without any adjustments. We investigate the asymptotic distribution of a quadratic-form-based test statistic and develop an asymptotically correct test. Finally, an extensive simulation study is conducted to investigate the role of the single groups' dimensions.

preprint, 2022, arXiv

Machine Learning for Multi-Output Regression: When should a holistic multivariate approach be preferred over separate univariate ones?

Tree-based ensembles such as the Random Forest are modern classics among statistical learning methods. In particular, they are used for predicting univariate responses. In the case of multiple outputs, the question arises whether to fit separate univariate models or to follow a multivariate approach directly. For the latter, several possibilities exist that are based, e.g., on modified splitting or stopping rules for multi-output regression. In this work we compare these methods in extensive simulations to help answer the primary question of when to use multivariate ensemble techniques.

preprint, 2022, arXiv

On the role of data, statistics and decisions in a pandemic

A pandemic poses particular challenges to decision-making because of the need to continuously adapt decisions to rapidly changing evidence and available data. For example, which countermeasures are appropriate at a particular stage of the pandemic? How can the severity of the pandemic be measured? What is the effect of vaccination in the population and which groups should be vaccinated first? The process of decision-making starts with data collection and modeling and continues to the dissemination of results and the subsequent decisions taken. The goal of this paper is to give an overview of this process and to provide recommendations for the different steps from a statistical perspective. In particular, we discuss a range of modeling techniques including mathematical, statistical and decision-analytic models along with their applications in the COVID-19 context. With this overview, we aim to foster the understanding of the goals of these modeling approaches and the specific data requirements that are essential for the interpretation of results and for successful interdisciplinary collaborations. A special focus is on the role played by data in these different models, and we incorporate into the discussion the importance of statistical literacy, and of effective dissemination and communication of findings.

preprint, 2022, arXiv

The nonparametric Behrens-Fisher problem in small samples

While there appears to be a general consensus in the literature on the definition of the estimand and estimator associated with the Wilcoxon-Mann-Whitney test, it seems somewhat less clear how best to estimate the variance. In addition to the Wilcoxon-Mann-Whitney test, we review different proposals of variance estimators consistent under both the null hypothesis and the alternative. Moreover, in case of small sample sizes, an approximation of the distribution of the test statistic based on the t-distribution, a logit transformation and a permutation approach have been proposed. Focussing as well on different estimators of the degrees of freedom as regards the t-approximation, we carried out simulations for a range of scenarios, with results indicating that the performance of different variance estimators in terms of controlling the type I error rate largely depends on the heteroskedasticity pattern and the sample size allocation ratio, not on the specific type of distributions employed. By and large, a particular t-approximation together with Perme and Manevski's variance estimator best maintains the nominal significance level.
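For orientation, the estimand usually associated with the Wilcoxon-Mann-Whitney test is commonly written as the relative effect p = P(X < Y) + 0.5 P(X = Y). The following minimal sketch computes its standard plug-in estimate from two samples. The function name `relative_effect` is hypothetical and not taken from the paper, and none of the variance estimators or small-sample approximations discussed in the abstract are implemented here.

```python
from typing import Sequence


def relative_effect(x: Sequence[float], y: Sequence[float]) -> float:
    """Plug-in estimate of p = P(X < Y) + 0.5 * P(X = Y).

    Hypothetical helper for illustration: it compares every pair (x_i, y_j)
    and counts ties as one half.
    """
    if len(x) == 0 or len(y) == 0:
        raise ValueError("both samples must be non-empty")
    total = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                total += 1.0
            elif xi == yj:
                total += 0.5
    return total / (len(x) * len(y))


if __name__ == "__main__":
    # Values near 0.5 indicate no tendency towards larger values in either group.
    print(relative_effect([1.2, 3.4, 2.2], [2.9, 4.1, 3.8, 5.0]))
```

The pairwise loop is quadratic in the sample sizes, which is acceptable for the small-sample settings the abstract is concerned with; a rank-based formulation would be the usual choice for larger data.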

preprint, 2021, arXiv

Goodness (of fit) of Imputation Accuracy: The GoodImpact Analysis

In statistical survey analysis, (partial) non-responders are integral elements during data acquisition. Treating missing values during data preparation and data analysis is therefore a non-trivial task. Focusing on different data sets from the Federal Statistical Office of Germany (DESTATIS), we investigate various imputation methods regarding their imputation accuracy. Since the latter is not uniquely determined in theory and practice, we study different measures for assessing imputation accuracy: beyond the most common measures, the normalized root mean squared error (NRMSE) and the proportion of false classification (PFC), we put a special focus on (distributional) distance and association measures for assessing imputation accuracy. The aim is to deliver guidelines for correctly assessing distributional accuracy after imputation. Our empirical findings indicate a discrepancy between the NRMSE or PFC and distance measures. While the latter measure distributional similarities, NRMSE and PFC focus on data reproducibility. We find that a low NRMSE or PFC does not seem to imply lower distributional discrepancies. Although several measures for assessing distributional discrepancies exist, our results indicate that not all of them are suitable for evaluating imputation-induced differences.
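To make the two common accuracy measures concrete, here is a minimal sketch of NRMSE and PFC computed over the entries that were imputed. The normalization by the variance of the true values is an assumption (one widely used convention); the paper may normalize differently. The function names `nrmse` and `pfc` are hypothetical.

```python
import math
from typing import Hashable, Sequence


def nrmse(true_vals: Sequence[float], imputed_vals: Sequence[float]) -> float:
    """NRMSE over imputed entries, assuming normalization by var(true values)."""
    n = len(true_vals)
    if n < 2 or n != len(imputed_vals):
        raise ValueError("need two sequences of equal length >= 2")
    mse = sum((t - i) ** 2 for t, i in zip(true_vals, imputed_vals)) / n
    mean_true = sum(true_vals) / n
    var_true = sum((t - mean_true) ** 2 for t in true_vals) / n
    if var_true == 0:
        raise ValueError("true values are constant; NRMSE is undefined")
    return math.sqrt(mse / var_true)


def pfc(true_labels: Sequence[Hashable], imputed_labels: Sequence[Hashable]) -> float:
    """Proportion of imputed categorical entries that differ from the truth."""
    n = len(true_labels)
    if n == 0 or n != len(imputed_labels):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(t != i for t, i in zip(true_labels, imputed_labels)) / n


if __name__ == "__main__":
    print(nrmse([1.0, 2.0, 4.0], [1.5, 2.0, 3.0]))
    print(pfc(["a", "b", "a", "c"], ["a", "a", "a", "c"]))
```

Both measures compare imputed values with the true values entry by entry, which is why they capture data reproducibility rather than similarity of the resulting distributions.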

preprint, 2021, arXiv

On the Relation between Prediction and Imputation Accuracy under Missing Covariates

Missing covariates in regression or classification problems can prohibit the direct use of advanced tools for further analysis. Recent research has seen an increasing trend towards the usage of modern Machine Learning algorithms for imputation. This trend originates from their capability of showing favourable prediction accuracy in different learning problems. In this work, we analyze through simulation the interaction between imputation accuracy and prediction accuracy in regression learning problems with missing covariates when Machine Learning based methods are used for both imputation and prediction. In addition, we explore imputation performance when using statistical inference procedures in prediction settings, such as coverage rates of (valid) prediction intervals. Our analysis is based on empirical datasets provided by the UCI Machine Learning repository and an extensive simulation study.

preprint, 2020, arXiv

Fisher transformation based Confidence Intervals of Correlations in Fixed- and Random-Effects Meta-Analysis

Meta-analyses of correlation coefficients are an important technique to integrate results from many cross-sectional and longitudinal research designs. Uncertainty in pooled estimates is typically assessed with the help of confidence intervals, which can double as hypothesis tests for two-sided hypotheses about the underlying correlation. A standard approach to construct confidence intervals for the main effect is the Hedges-Olkin-Vevea Fisher-z (HOVz) approach, which is based on the Fisher-z transformation. Results from previous studies (Field, 2005; Hafdahl and Williams, 2009), however, indicate that in random-effects models the performance of the HOVz confidence interval can be unsatisfactory. To this end, we propose improvements of the HOVz approach, which are based on enhanced variance estimators for the main effect estimate. In order to study the coverage of the new confidence intervals in both fixed- and random-effects meta-analysis models, we perform an extensive simulation study, comparing them to established approaches. Data were generated via a truncated normal and beta distribution model. The results show that our newly proposed confidence intervals based on a Knapp-Hartung-type variance estimator or robust heteroscedasticity consistent sandwich estimators in combination with the integral z-to-r transformation (Hafdahl, 2009) provide more accurate coverage than existing approaches in most scenarios, especially in the more appropriate beta distribution simulation model.
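As background for the Fisher-z idea, the following minimal sketch computes a classical fixed-effect pooled correlation with a confidence interval: each correlation is transformed with z = atanh(r), weighted by n - 3 (the inverse of the approximate variance 1/(n - 3)), and the interval is transformed back with tanh. This is only the textbook fixed-effect construction. It does not implement the Knapp-Hartung-type or sandwich variance estimators proposed in the paper, and it uses the plain tanh back-transformation rather than the integral z-to-r transformation. The function name `fisher_z_pooled_ci` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def fisher_z_pooled_ci(
    rs: Sequence[float],
    ns: Sequence[int],
    z_crit: float = 1.959963984540054,  # standard normal 97.5% quantile
) -> Tuple[float, float, float]:
    """Fixed-effect pooled correlation and CI via the Fisher-z transformation.

    Returns (pooled_r, lower, upper) on the correlation scale.
    """
    if len(rs) == 0 or len(rs) != len(ns):
        raise ValueError("rs and ns must be non-empty and of equal length")
    weights = []
    for r, n in zip(rs, ns):
        if not -1.0 < r < 1.0:
            raise ValueError("correlations must lie strictly between -1 and 1")
        if n <= 3:
            raise ValueError("each study needs more than 3 observations")
        weights.append(n - 3)  # inverse of the approximate variance 1/(n - 3)
    zs = [math.atanh(r) for r in rs]
    w_sum = sum(weights)
    z_bar = sum(w * z for w, z in zip(weights, zs)) / w_sum
    se = 1.0 / math.sqrt(w_sum)
    lower, upper = z_bar - z_crit * se, z_bar + z_crit * se
    return math.tanh(z_bar), math.tanh(lower), math.tanh(upper)


if __name__ == "__main__":
    print(fisher_z_pooled_ci([0.25, 0.40, 0.31], [40, 65, 120]))
```

Because the interval is built on the z scale and mapped back through a monotone function, its limits always stay inside (-1, 1).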

preprint, 2020, arXiv

Inferring median survival differences in general factorial designs via permutation tests

Factorial survival designs with right-censored observations are commonly analyzed by Cox regression and explained by means of hazard ratios. However, in the case of non-proportional hazards, their interpretation can become cumbersome, especially for clinicians. We therefore offer an alternative: median survival times are used to estimate treatment and interaction effects, and null hypotheses are formulated in contrasts of their population versions. Permutation-based tests and confidence regions are proposed and shown to be asymptotically valid. Their type-1 error control and power behavior are investigated in extensive simulations, showing the new methods' wide applicability. The latter is complemented by an illustrative data analysis.

preprint, 2020, arXiv

Is there a role for statistics in artificial intelligence?

The research on and application of artificial intelligence (AI) has triggered a comprehensive scientific, economic, social and political discussion. Here we argue that statistics, as an interdisciplinary scientific field, plays a substantial role both for the theoretical and practical understanding of AI and for its future development. Statistics might even be considered a core element of AI. With its specialist knowledge of data evaluation, starting with the precise formulation of the research question and passing through a study design stage on to analysis and interpretation of the results, statistics is a natural partner for other disciplines in teaching, research and practice. This paper aims at contributing to the current discussion by highlighting the relevance of statistical methodology in the context of AI development. In particular, we discuss contributions of statistics to the field of artificial intelligence concerning methodological development, planning and design of studies, assessment of data quality and data collection, differentiation of causality and associations and assessment of uncertainty in results. Moreover, the paper also deals with the equally necessary and meaningful extension of curricula in schools and universities.

preprint, 2020, arXiv

Permutation inference in factorial survival designs with the CASANOVA

We propose inference procedures for general nonparametric factorial survival designs with possibly right-censored data. Similar to additive Aalen models, null hypotheses are formulated in terms of cumulative hazards. Deviations are then measured in terms of quadratic forms in Nelson-Aalen-type integrals. In contrast to existing approaches, this allows working without restrictive model assumptions such as proportional hazards. In particular, crossing survival or hazard curves can be detected without a significant loss of power. For a distribution-free application of the method, a permutation strategy is suggested. The resulting procedures' asymptotic validity as well as their consistency are proven and their small sample performances are analyzed in extensive simulations. Their applicability is finally illustrated by analyzing an oncology data set.

preprint, 2020, arXiv

QANOVA: Quantile-based Permutation Methods For General Factorial Designs

Population means and standard deviations are the most common estimands to quantify effects in factorial layouts. In fact, most statistical procedures in such designs are built towards inferring means or contrasts thereof. For more robust analyses, we consider the population median, the interquartile range (IQR) and more general quantile combinations as estimands in which we formulate null hypotheses and calculate compatible confidence regions. Based upon simultaneous multivariate central limit theorems and corresponding resampling results, we derive asymptotically correct procedures in general, potentially heteroscedastic, factorial designs with univariate endpoints. Special cases cover robust tests for the population median or the IQR in arbitrary crossed one-, two- and higher-way layouts with potentially heteroscedastic error distributions. In extensive simulations we analyze their small sample properties and also conduct an illustrative data analysis comparing children's height and weight from different countries.
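The general idea of a permutation test for medians can be sketched in its simplest form: two groups, the absolute difference of sample medians as test statistic, and random reassignment of group labels. This is a deliberately simplified, non-studentized two-sample version that relies on exchangeability under the null hypothesis. It is not the QANOVA procedure, which addresses general factorial designs and heteroscedastic errors. The function name `permutation_median_test` and its parameters are hypothetical.

```python
import random
import statistics
from typing import Sequence


def permutation_median_test(
    x: Sequence[float],
    y: Sequence[float],
    n_perm: int = 999,
    seed: int = 0,
) -> float:
    """Two-sample permutation p-value for the absolute difference in medians.

    Simplified sketch: assumes exchangeable observations under the null.
    """
    if len(x) == 0 or len(y) == 0:
        raise ValueError("both samples must be non-empty")
    rng = random.Random(seed)  # local generator keeps the result reproducible
    observed = abs(statistics.median(x) - statistics.median(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random reassignment of group labels
        stat = abs(
            statistics.median(pooled[:n_x]) - statistics.median(pooled[n_x:])
        )
        if stat >= observed:
            hits += 1
    # Counting the observed statistic itself keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    group_a = [4.1, 5.0, 4.7, 5.3, 4.9, 5.1]
    group_b = [5.8, 6.1, 5.5, 6.4, 6.0, 5.9]
    print(permutation_median_test(group_a, group_b))
```

When group variances differ, a plain permutation of an unstandardized statistic like this one is generally not reliable, which is one reason more careful constructions are needed in heteroscedastic designs.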

preprint, 2020, arXiv

Small-sample performance and underlying assumptions of a bootstrap-based inference method for a general analysis of covariance model with possibly heteroskedastic and nonnormal errors

It is well known that the standard F test is severely affected by heteroskedasticity in unbalanced analysis of covariance (ANCOVA) models. Currently available potential remedies for such a scenario are based on heteroskedasticity-consistent covariance matrix estimation (HCCME). However, the HCCME approach tends to be liberal in small samples. Therefore, in the present manuscript, we propose a combination of HCCME and a wild bootstrap technique, with the aim of improving the small-sample performance. We precisely state a set of assumptions for the general ANCOVA model and discuss their practical interpretation in detail, since this issue may have been somewhat neglected in applied research so far. We prove that these assumptions are sufficient to ensure the asymptotic validity of the combined HCCME-wild bootstrap ANCOVA. The results of our simulation study indicate that our proposed test remedies the problems of the ANCOVA F test and its heteroskedasticity-consistent alternatives in small to moderate sample size scenarios. Our test only requires very mild conditions, thus being applicable in a broad range of real-life settings, as illustrated by the detailed discussion of a dataset from preclinical research on spinal cord injury. Our proposed method is ready-to-use and allows for valid hypothesis testing in frequently encountered settings (e.g., comparing group means while adjusting for baseline measurements in a randomized controlled clinical trial).

preprint, 2019, arXiv

Multivariate analysis of covariance when standard assumptions are violated

In applied research, it is often sensible to account for one or several covariates when testing for differences between multivariate means of several groups. However, the "classical" parametric multivariate analysis of covariance (MANCOVA) tests (e.g., Wilks' Lambda) are based on quite restrictive assumptions (homoscedasticity and normality of the errors), which might be difficult to justify in small sample size settings. Furthermore, existing potential remedies (e.g., heteroskedasticity-robust approaches) become inappropriate in cases where the covariance matrices are singular. Nevertheless, such scenarios are frequently encountered in the life sciences and other fields, when for example, in the context of standardized assessments, a summary performance measure as well as its corresponding subscales are analyzed. In the present manuscript, we consider a general MANCOVA model, allowing for potentially heteroskedastic and even singular covariance matrices as well as non-normal errors. We combine heteroskedasticity-consistent covariance matrix estimation methods with our proposed modified MANCOVA ANOVA-type statistic (MANCATS) and apply two different bootstrap approaches. We provide the proofs of the asymptotic validity of the respective testing procedures as well as the results from an extensive simulation study, which indicate that especially the parametric bootstrap version of the MANCATS outperforms its competitors in most scenarios, both in terms of type I error rates and power. These considerations are further illustrated and substantiated by examining real-life data from standardized achievement tests.