Source author record

Matthias Egger

Matthias Egger appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Topics: Applications, Methodology, Digital Libraries, Machine Learning

Catalog footprint

What is connected

5 works

4 topics

4 close collaborators

Preprint, 2023, arXiv

Multiple imputation of incomplete multilevel data using Heckman selection models

Missing data is a common problem in medical research, and is commonly addressed using multiple imputation. Although traditional imputation methods allow for valid statistical inference when data are missing at random (MAR), their implementation is problematic when the presence of missingness depends on unobserved variables, i.e. the data are missing not at random (MNAR). Unfortunately, this MNAR situation is rather common in observational studies, registries and other sources of real-world data. While several imputation methods have been proposed for addressing individual studies when data are MNAR, their application and validity in large datasets with multilevel structure remain unclear. We therefore explored the consequences of MNAR data in hierarchical data in depth, and proposed a novel multilevel imputation method for common missing patterns in clustered datasets. This method is based on the principles of Heckman selection models and adopts a two-stage meta-analysis approach to impute binary and continuous variables that may be outcomes or predictors and that are systematically or sporadically missing. After evaluating the proposed imputation model in simulated scenarios, we illustrate its use in a cross-sectional community survey to estimate the prevalence of malaria parasitemia in children aged 2-10 years in five subregions in Uganda.

Preprint, 2022, arXiv

A two-stage prediction model for heterogeneous effects of many treatment options: application to drugs for Multiple Sclerosis

Treatment effects vary across different patients and estimation of this variability is important for clinical decisions. The aim is to develop a model to estimate the benefit of alternative treatment options for individual patients. Hence, we developed a two-stage prediction model for heterogeneous treatment effects, by combining prognosis research and network meta-analysis methods when individual patient data are available. In the first stage, we develop a prognostic model and predict the baseline risk of the outcome. In the second stage, we use this baseline risk score from the first stage as a single prognostic factor and effect modifier in a network meta-regression model. We apply the approach to a network meta-analysis of three randomized clinical trials comparing the relapse rate in Natalizumab, Glatiramer Acetate and Dimethyl Fumarate including 3590 patients diagnosed with relapsing-remitting multiple sclerosis. We find that the baseline risk score modifies the relative and absolute treatment effects. Several patient characteristics such as age and disability status affect the baseline risk of relapse, and this in turn moderates the benefit that may be expected for each of the treatments. For high-risk patients, the treatment that minimizes the risk of relapse in two years is Natalizumab, whereas for low-risk patients Dimethyl Fumarate might be a better option. Our approach can be easily extended to all outcomes of interest and has the potential to inform a personalised treatment approach.

Preprint, 2022, arXiv

Development, validation and clinical usefulness of a prognostic model for relapse in relapsing-remitting multiple sclerosis

Prognosis of the occurrence of relapses in individuals with Relapsing-Remitting Multiple Sclerosis (RRMS), the most common subtype of Multiple Sclerosis (MS), could support individualized decisions and disease management and could be helpful for efficiently selecting patients in future randomized clinical trials. There are only three previously published prognostic models on this, all of them with important methodological shortcomings. We aim to present the development, internal validation, and evaluation of the potential clinical benefit of a prognostic model for relapses for individuals with RRMS using real world data. We followed seven steps to develop and validate the prognostic model. Finally, we evaluated the potential clinical benefit of the developed prognostic model using decision curve analysis. We selected eight baseline prognostic factors: age, sex, prior MS treatment, months since last relapse, disease duration, number of prior relapses, expanded disability status scale (EDSS), and gadolinium enhanced lesions. We also developed a web application where the personalized probabilities of relapse within two years are calculated automatically. The optimism-corrected c-statistic is 0.65 and the optimism-corrected calibration slope is 0.92. The model appears to be clinically useful for threshold probabilities of relapse between 15% and 30%. The prognostic model we developed offers several advantages in comparison to previously published prognostic models on RRMS. Importantly, we assessed the potential clinical benefit to better quantify the clinical impact of the model. Our web application, once externally validated in the future, could be used by patients and doctors to calculate the individualized probability of relapse within two years and to inform the management of their disease.
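
Decision curve analysis, as mentioned in this abstract, compares strategies by their net benefit at a chosen threshold probability. The following is a minimal sketch of the standard net benefit formula only. The function names and the toy data are hypothetical and are not taken from the preprint or its web application.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating everyone whose predicted risk is >= threshold.

    y_true: list of 0/1 outcomes (1 = relapse), hypothetical toy input.
    y_prob: list of predicted probabilities from some prognostic model.
    threshold: threshold probability, strictly between 0 and 1.
    """
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Net benefit of the reference strategy that treats every patient."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


if __name__ == "__main__":
    # Made-up outcomes and predictions, for illustration only.
    outcomes = [1, 0, 1, 0]
    predicted = [0.9, 0.8, 0.2, 0.1]
    for pt in (0.15, 0.20, 0.30):
        print(pt, net_benefit(outcomes, predicted, pt),
              net_benefit_treat_all(outcomes, pt))
```

A model is usually considered useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.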

Preprint, 2022, arXiv

Journal Impact Factor and Peer Review Thoroughness and Helpfulness: A Supervised Machine Learning Study

The journal impact factor (JIF) is often equated with journal quality and the quality of the peer review of the papers submitted to the journal. We examined the association between the content of peer review and JIF by analysing 10,000 peer review reports submitted to 1,644 medical and life sciences journals. Two researchers hand-coded a random sample of 2,000 sentences. We then trained machine learning models to classify all 187,240 sentences as contributing or not contributing to content categories. We examined the association between ten groups of journals defined by JIF deciles and the content of peer reviews using linear mixed-effects models, adjusting for the length of the review. The JIF ranged from 0.21 to 74.70. The length of peer reviews increased from the lowest (median number of words 185) to the highest JIF group (387 words). The proportion of sentences allocated to different content categories varied widely, even within JIF groups. For thoroughness, sentences on 'Materials and Methods' were more common in the highest JIF journals than in the lowest JIF group (difference of 7.8 percentage points; 95% CI 4.9 to 10.7%). The trend for 'Presentation and Reporting' went in the opposite direction, with the highest JIF journals giving less emphasis to such content (difference -8.9%; 95% CI -11.3 to -6.5%). For helpfulness, reviews for higher JIF journals devoted less attention to 'Suggestion and Solution' and provided fewer examples than lower impact factor journals. No or only small differences were evident for other content categories. In conclusion, peer review in journals with higher JIF tends to be more thorough in discussing the methods used but less helpful in terms of suggesting solutions and providing examples. Differences were modest and variability high, indicating that the JIF is a bad predictor for the quality of peer review of an individual manuscript.

Preprint, 2022, arXiv

Rethinking the Funding Line at the Swiss National Science Foundation: Bayesian Ranking and Lottery

Funding agencies rely on peer review and expert panels to select the research deserving funding. Peer review has limitations, including bias against risky proposals or interdisciplinary research. The inter-rater reliability between reviewers and panels is low, particularly for proposals near the funding line. Funding agencies are also increasingly acknowledging the role of chance. The Swiss National Science Foundation (SNSF) introduced a lottery for proposals in the middle group of good but not excellent proposals. In this article, we introduce a Bayesian hierarchical model for the evaluation process. To rank the proposals, we estimate their expected ranks (ER), which incorporate both the magnitude and uncertainty of the estimated differences between proposals. A provisional funding line is defined based on ER and budget. The ER and its credible interval are used to identify proposals with similar quality and credible intervals that overlap with the funding line. These proposals are entered into a lottery. We illustrate the approach for two SNSF grant schemes in career and project funding. We argue that the method could reduce bias in the evaluation process. R code, data and other materials for this article are available online.
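
The ranking step described in this abstract can be illustrated with a small sketch. It assumes that posterior draws of each proposal's quality score are already available from some Bayesian model, and it is not the authors' R code. The function names, the 25% to 75% rank interval, and the rule that places the provisional funding line at the expected rank of the last fundable proposal are all assumptions made for illustration.

```python
def rank_draws(draws):
    """Turn each posterior draw of quality scores into ranks (1 = best)."""
    return [[1 + sum(other > score for other in draw) for score in draw]
            for draw in draws]


def expected_ranks(draws):
    """Expected rank of each proposal: its rank averaged over all draws."""
    ranked = rank_draws(draws)
    n_proposals = len(ranked[0])
    return [sum(r[i] for r in ranked) / len(ranked) for i in range(n_proposals)]


def rank_interval(draws, i, lower=0.25, upper=0.75):
    """Empirical interval for the rank of proposal i (simple index rule)."""
    ranks = sorted(r[i] for r in rank_draws(draws))
    last = len(ranks) - 1
    return ranks[round(lower * last)], ranks[round(upper * last)]


def triage(draws, n_fundable):
    """Label each proposal 'fund', 'lottery' or 'reject'.

    Assumption: the provisional funding line is the expected rank of the
    last proposal the budget can cover (n_fundable proposals in total).
    """
    er = expected_ranks(draws)
    line = sorted(er)[n_fundable - 1]
    labels = []
    for i in range(len(er)):
        low, high = rank_interval(draws, i)
        if high < line:
            labels.append("fund")      # interval clearly better than the line
        elif low > line:
            labels.append("reject")    # interval clearly worse than the line
        else:
            labels.append("lottery")   # interval overlaps the funding line
    return labels


if __name__ == "__main__":
    # Two made-up posterior draws for three proposals.
    toy_draws = [[3.0, 2.0, 1.0], [3.0, 1.0, 2.0]]
    print(expected_ranks(toy_draws))
    print(triage(toy_draws, n_fundable=2))
```

In this sketch, proposals whose rank interval overlaps the funding line go to the lottery, which mirrors the idea that proposals of indistinguishable quality near the line should not be separated by small score differences.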
