Source author record

Martijn J. Schuemie

Martijn J. Schuemie appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Methodology · cs.CY · Information Retrieval

Catalog footprint

What is connected

4 works

3 topics

4 close collaborators

Preprint · 2025 · arXiv

PDA in Action: Ten Principles for High-Quality Multi-Site Clinical Evidence Generation

Background: Distributed Research Networks (DRNs) offer significant opportunities for collaborative multi-site research and have substantially advanced healthcare research based on clinical observational data. However, generating high-quality real-world evidence from fit-for-use data in multi-site studies faces important challenges, including biases arising from various types of heterogeneity within and across sites, and difficulties in sharing data. Over the last ten years, Privacy-Preserving Distributed Algorithms (PDA) have been developed and used in numerous national and international real-world studies spanning diverse domains, including comparative effectiveness research, target trial emulation, healthcare delivery, policy evaluation, and system performance assessment. Despite these advances, comprehensive and clear guiding principles are still lacking for generating high-quality real-world evidence through collaborative studies that use PDA methods. Objective: This paper aims to establish ten best-practice principles for conducting high-quality multi-site studies using PDA. These principles cover all phases of research, including study preparation, protocol development, analysis, and final reporting. Discussion: The ten principles outline a principled, efficient, and transparent framework for employing distributed learning algorithms within DRNs to generate reliable and reproducible real-world evidence.

Preprint · 2022 · arXiv

Adjusting for both sequential testing and systematic error in safety surveillance using observational data: Empirical calibration and MaxSPRT

Post-approval safety surveillance of medical products using observational healthcare data can help identify safety issues beyond those found in pre-approval trials. When testing sequentially as data accrue, maximum sequential probability ratio testing (MaxSPRT) is a common approach to maintaining nominal type 1 error. However, the true type 1 error may still deviate from the specified one because of systematic error due to the observational nature of the analysis. This systematic error may persist even after controlling for known confounders. Here we propose to address this issue by combining MaxSPRT with empirical calibration. In empirical calibration, we acknowledge uncertainty about the systematic error in our analysis, a source of uncertainty commonly overlooked in practice. We infer a probability distribution of systematic error by relying on a large set of negative controls: exposure-outcome pairs where no causal effect is believed to exist. Integrating this distribution into our test statistics has previously been shown to restore type 1 error to nominal. Here we show how we can calibrate the critical value central to MaxSPRT. We evaluate this novel approach using simulations and real electronic health records, using H1N1 vaccinations during the 2009-2010 season as an example. Results show that combining empirical calibration with MaxSPRT restores nominal type 1 error. In our real-world example, adjusting for systematic error using empirical calibration has a larger impact than, and hence is just as essential as, adjusting for sequential testing using MaxSPRT. We recommend performing both, using the method described here.
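The two ingredients named in the abstract are easier to see separately. The following is a minimal Python sketch, assuming a normal model for systematic error (each negative-control log estimate is drawn from N(mu, tau^2 + se^2)) fitted by a coarse grid search restricted to mu in [-1, 1] and tau in [0, 1], plus the standard Poisson MaxSPRT log-likelihood ratio. The function names and example values are hypothetical. This is not the OHDSI EmpiricalCalibration package, and it does not implement the paper's key step of calibrating the MaxSPRT critical value.

```python
import math

# Sketch only: hypothetical helpers, not the paper's implementation.

def fit_null_distribution(log_rrs, ses):
    """Grid-search maximum-likelihood fit of the systematic-error distribution N(mu, tau^2).

    Each negative-control estimate is modelled as log_rr_i ~ N(mu, tau^2 + se_i^2):
    systematic error plus the estimate's own sampling error.
    """
    best_ll, best_mu, best_tau = -math.inf, 0.0, 0.0
    for i in range(-100, 101):      # mu on a grid from -1.00 to 1.00
        mu = i / 100
        for j in range(0, 101):     # tau on a grid from 0.00 to 1.00
            tau = j / 100
            ll = 0.0
            for theta, se in zip(log_rrs, ses):
                var = tau ** 2 + se ** 2
                ll -= 0.5 * math.log(2 * math.pi * var) + (theta - mu) ** 2 / (2 * var)
            if ll > best_ll:
                best_ll, best_mu, best_tau = ll, mu, tau
    return best_mu, best_tau


def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def calibrated_p(log_rr, se, mu, tau):
    """p-value accounting for systematic error (mu, tau) as well as sampling error (se)."""
    return two_sided_p((log_rr - mu) / math.sqrt(tau ** 2 + se ** 2))


def maxsprt_poisson_llr(observed, expected):
    """Poisson MaxSPRT log-likelihood ratio; only an excess of observed events counts."""
    if observed <= expected:
        return 0.0
    return (expected - observed) + observed * math.log(observed / expected)


# Hypothetical negative controls that are all biased upward by about 0.5 on the log scale.
controls = [0.4, 0.5, 0.6, 0.45, 0.55]
mu, tau = fit_null_distribution(controls, [0.1] * len(controls))
print(two_sided_p(0.5 / 0.1))           # naive p-value: far below 0.05
print(calibrated_p(0.5, 0.1, mu, tau))  # close to 1: the estimate is explained by bias
print(maxsprt_poisson_llr(10, 5))       # compared against a critical value at each look
```

In a sequential setting the LLR would be recomputed as data accrue and compared with a critical value; the abstract's contribution is adjusting that critical value for systematic error, which this sketch leaves out.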

Preprint · 2021 · arXiv

Combining Cox Regressions Across a Heterogeneous Distributed Research Network Facing Small and Zero Counts

Studies of the effects of medical interventions increasingly take place in distributed research settings using data from multiple clinical data sources, including electronic health records and administrative claims. In such settings, privacy concerns typically prohibit sharing of individual patient data, so analyses can only use summary statistics from the individual databases. In the specific but very common context of the Cox proportional hazards model, we show that standard meta-analysis methods then lead to substantial bias when outcome counts are small. This bias derives primarily from the normal approximations that these methods rely on. Here we propose and evaluate methods that eschew normal approximations in favor of three more flexible approximations: a skew-normal, a one-dimensional grid, and a custom parametric function that mimics the behavior of the Cox likelihood function. In extensive simulation studies we demonstrate how these approximations impact bias in the context of both fixed-effects and (Bayesian) random-effects models. We then apply these approaches to three real-world studies of the comparative safety of antidepressants, each using data from four observational healthcare databases.
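To illustrate the one-dimensional grid idea, here is a sketch under stated assumptions: a single binary exposure, no tied event times, toy site data, and a hypothetical shared grid of log hazard ratios from -3 to 3. Each site evaluates its own Cox log partial likelihood on the grid and shares only those values; summing them gives a fixed-effects pooled likelihood. The skew-normal and custom parametric approximations, and the random-effects model, are not shown.

```python
import math

# Hypothetical grid of log hazard ratios that every site agrees to evaluate.
GRID = [i / 100 for i in range(-300, 301)]   # beta from -3.00 to 3.00


def cox_log_partial_likelihood(beta, data):
    """Cox log partial likelihood for one binary covariate.

    data: list of (time, event, x) tuples; assumes no tied event times.
    """
    ll = 0.0
    for t_i, event_i, x_i in data:
        if not event_i:
            continue
        # Risk set: everyone still under observation at this event time.
        risk = sum(math.exp(beta * x_j) for t_j, _, x_j in data if t_j >= t_i)
        ll += beta * x_i - math.log(risk)
    return ll


def site_grid(data):
    """What a site shares: its log likelihood on GRID, with no patient-level rows."""
    return [cox_log_partial_likelihood(b, data) for b in GRID]


def pooled_estimate(grids):
    """Fixed-effects pooling: sum the sites' log likelihoods and take the grid maximum."""
    total = [sum(values) for values in zip(*grids)]
    best = max(range(len(GRID)), key=lambda k: total[k])
    return GRID[best]


# Toy data: (time, event, exposed).
site_a = [(1, 1, 1), (2, 1, 0), (3, 1, 1), (4, 0, 0), (5, 1, 0), (6, 0, 1)]
site_b = [(1, 1, 0), (2, 0, 1), (3, 1, 0), (4, 0, 1)]   # zero events among the exposed

print(pooled_estimate([site_grid(site_a)]))                     # site A alone
print(pooled_estimate([site_grid(site_a), site_grid(site_b)]))  # both sites pooled
```

Site B has no events in the exposed group, so its own maximum-likelihood estimate diverges toward minus infinity and it has no usable standard error for a normal approximation. On the grid its likelihood is simply decreasing, and it still pulls the pooled estimate downward.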

Preprint · 2010 · arXiv

A Concept Annotation System for Clinical Records

Unstructured text is a valuable source of data in clinical records. For text mining in clinical records, concept extraction is the first step in finding assertions and relationships. This study presents a system developed for the annotation of medical concepts, including medical problems, tests, and treatments, mentioned in clinical records. The system combines six publicly available named entity recognition systems into one framework and uses a simple voting scheme that allows the precision and recall of the system to be tuned to specific needs. The system provides both a web service interface and a UIMA interface that other systems can easily use. The system was tested in the fourth i2b2 challenge and achieved an F-score of 82.1% for the concept exact match task, placing it among the top-ranking systems. To our knowledge, this is the first publicly available clinical record concept annotation system.
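The abstract does not spell out the voting rule. A common form, assumed here for illustration, keeps an annotation when at least `min_votes` systems agree on exactly the same span and concept type; raising the threshold trades recall for precision. The `(start, end, concept_type)` tuples, the three toy systems (the described system combines six), and the gold set are all hypothetical.

```python
from collections import Counter


def vote(system_outputs, min_votes):
    """Keep an annotation if at least min_votes systems produced it exactly."""
    counts = Counter(span for output in system_outputs for span in set(output))
    return {span for span, n in counts.items() if n >= min_votes}


def precision_recall(predicted, gold):
    """Exact-match precision and recall of predicted annotations against a gold set."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations as (start, end, concept_type) character spans.
outputs = [
    {(0, 9, "problem"), (15, 22, "test")},
    {(0, 9, "problem"), (30, 38, "treatment")},
    {(0, 9, "problem"), (15, 22, "test"), (40, 45, "problem")},
]
gold = {(0, 9, "problem"), (15, 22, "test"), (30, 38, "treatment")}

for k in (1, 2, 3):
    # k=1 gives (0.75, 1.0); k=3 gives precision 1.0 with recall 1/3.
    print(k, precision_recall(vote(outputs, k), gold))
```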