Researcher profile

David B. Blumenthal

David B. Blumenthal contributes to research discovery and scholarly infrastructure.

Trust snapshot

Quick read

Trust 19 - Unverified
Verification L1
Unclaimed author
5 works
0 followers
6 topics
4 close collaborators

Published work

5 published items

Preprint, 2025, arXiv

UnPaSt: unsupervised patient stratification by biclustering of omics data

Unsupervised patient stratification is essential for disease subtype discovery, yet, despite growing evidence of molecular heterogeneity of non-oncological diseases, popular methods are benchmarked primarily using cancers with mutually exclusive molecular subtypes well-differentiated by numerous biomarkers. Evaluating 22 unsupervised methods, including clustering and biclustering, using simulated and real transcriptomics data revealed their inefficiency in scenarios with non-mutually exclusive subtypes or subtypes discriminated only by few biomarkers. To address these limitations and advance precision medicine, we developed UnPaSt, a novel biclustering algorithm for unsupervised patient stratification based on differentially expressed biclusters. UnPaSt outperformed widely used patient stratification approaches in the de novo identification of known subtypes of breast cancer and asthma. In addition, it detected many biologically insightful patterns across bulk transcriptomics, proteomics, single-cell, spatial transcriptomics, and multi-omics datasets, enabling a more nuanced and interpretable view of high-throughput data heterogeneity than traditionally used methods.

Preprint, 2022, arXiv

Federated singular value decomposition for high dimensional data

Federated learning (FL) is emerging as a privacy-aware alternative to classical cloud-based machine learning. In FL, the sensitive data remains in data silos and only aggregated parameters are exchanged. Hospitals and research institutions which are not willing to share their data can join a federated study without breaching confidentiality. In addition to the extreme sensitivity of biomedical data, the high dimensionality poses a challenge in the context of federated genome-wide association studies (GWAS). In this article, we present a federated singular value decomposition (SVD) algorithm, suitable for the privacy-related and computational requirements of GWAS. Notably, the algorithm has a transmission cost independent of the number of samples and is only weakly dependent on the number of features, because the singular vectors associated with the samples are never exchanged and the vectors associated with the features only for a fixed number of iterations. Although motivated by GWAS, the algorithm is generically applicable for both horizontally and vertically partitioned data.
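The communication pattern described in this abstract can be illustrated with a small sketch. It is not the algorithm from the paper: it is a plain power iteration for the leading singular vector only, assuming horizontally partitioned data (each site holds some rows of the same matrix). The function names `local_update` and `federated_top_singular_vector` are made up for the example. Each site sends one vector of feature length plus one scalar per round, so the traffic does not grow with the number of samples, and the sample-side projections never leave the site.

```python
import math


def local_update(rows, v):
    """Run at one site: return A_i^T (A_i v) and ||A_i v||^2.

    The sample-side vector A_i v is only used locally and is never returned.
    """
    update = [0.0] * len(v)
    sq_norm = 0.0
    for row in rows:
        proj = sum(x * w for x, w in zip(row, v))  # one entry of A_i v
        sq_norm += proj * proj
        for j, x in enumerate(row):
            update[j] += proj * x
    return update, sq_norm


def federated_top_singular_vector(sites, n_features, n_iter=100):
    """Aggregator loop: power iteration on A^T A using only per-site sums."""
    v = [1.0 / math.sqrt(n_features)] * n_features
    for _ in range(n_iter):
        updates = [local_update(rows, v)[0] for rows in sites]
        total = [sum(u[j] for u in updates) for j in range(n_features)]
        norm = math.sqrt(sum(t * t for t in total))
        if norm == 0.0:
            raise ValueError("start vector is orthogonal to the data")
        v = [t / norm for t in total]
    # The leading singular value follows from the summed squared projections.
    sigma = math.sqrt(sum(local_update(rows, v)[1] for rows in sites))
    return v, sigma


# Two hypothetical sites holding rows of the stacked matrix [[3, 0], [0, 0], [0, 1]].
sites = [
    [[3.0, 0.0], [0.0, 0.0]],
    [[0.0, 1.0]],
]
v, sigma = federated_top_singular_vector(sites, n_features=2)
```

A full decomposition would need several vectors and an orthonormalisation step per round. The sketch only shows why the exchanged messages scale with the number of features rather than the number of samples.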

Preprint, 2021, arXiv

Upper Bounding the Graph Edit Distance Based on Rings and Machine Learning

The graph edit distance (GED) is a flexible distance measure that is widely used for inexact graph matching. Since its exact computation is NP-hard, heuristics are used in practice. A popular approach is to obtain upper bounds for GED via transformations to the linear sum assignment problem with error-correction (LSAPE). Typically, local structures and distances between them are employed for carrying out this transformation, but recently machine learning techniques have also been used. In this paper, we formally define a unifying framework LSAPE-GED for transformations from GED to LSAPE. We also introduce rings, a new kind of local structure designed for graphs where most information resides in the topology rather than in the node labels. Furthermore, we propose two new ring-based heuristics, RING and RING-ML, which instantiate LSAPE-GED using the traditional and the machine-learning-based approach for transforming GED to LSAPE, respectively. Extensive experiments show that using rings for upper bounding GED significantly improves the state of the art on datasets where most information resides in the graphs' topologies. This closes the gap between fast but rather inaccurate LSAPE-based heuristics and more accurate but significantly slower GED algorithms based on local search.
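The general idea of upper bounding GED through an assignment problem can be sketched on tiny graphs. This is a minimal illustration, not the RING or RING-ML heuristic from the paper. It assumes unit edit costs, unlabeled undirected edges, a hypothetical local cost built from node labels and degrees, and a brute-force search standing in for a real LSAPE solver. All function names are made up for the example. The key point is that any node map induces an edit path, so its cost can never fall below the true GED.

```python
from itertools import product


def induced_cost(g1, g2, node_map):
    """Cost of the edit path induced by node_map (None means deletion)."""
    (labels1, edges1), (labels2, edges2) = g1, g2
    cost = 0
    for u, v in enumerate(node_map):
        if v is None or labels1[u] != labels2[v]:
            cost += 1  # node deletion or node substitution
    # Nodes of g2 that are not the image of any g1 node must be inserted.
    cost += len(labels2) - sum(v is not None for v in node_map)
    preserved = 0
    for edge in edges1:
        a, b = tuple(edge)
        ia, ib = node_map[a], node_map[b]
        if ia is not None and ib is not None and frozenset((ia, ib)) in edges2:
            preserved += 1
    # Non-preserved g1 edges are deleted, non-covered g2 edges are inserted.
    return cost + (len(edges1) - preserved) + (len(edges2) - preserved)


def all_node_maps(n1, n2):
    """Enumerate all injective partial maps from g1 nodes to g2 nodes."""
    for m in product([None, *range(n2)], repeat=n1):
        used = [v for v in m if v is not None]
        if len(used) == len(set(used)):
            yield list(m)


def degrees(graph):
    labels, edges = graph
    deg = [0] * len(labels)
    for edge in edges:
        for node in edge:
            deg[node] += 1
    return deg


def lsape_node_map(g1, g2):
    """Pick the node map minimising a sum of purely local costs."""
    labels1, labels2 = g1[0], g2[0]
    d1, d2 = degrees(g1), degrees(g2)

    def local_cost(node_map):
        cost = 0
        for u, v in enumerate(node_map):
            if v is None:
                cost += 1 + d1[u]
            else:
                cost += (labels1[u] != labels2[v]) + abs(d1[u] - d2[v])
        matched = {v for v in node_map if v is not None}
        return cost + sum(1 + d2[v] for v in range(len(labels2)) if v not in matched)

    # Brute force stands in for a polynomial-time LSAPE solver.
    return min(all_node_maps(len(labels1), len(labels2)), key=local_cost)


def ged_upper_bound(g1, g2):
    return induced_cost(g1, g2, lsape_node_map(g1, g2))


def ged_brute_force(g1, g2):
    """Reference value: best induced cost over all node maps (tiny graphs only)."""
    return min(induced_cost(g1, g2, m) for m in all_node_maps(len(g1[0]), len(g2[0])))


path = (["a", "b", "c"], {frozenset((0, 1)), frozenset((1, 2))})
triangle = (["a", "b", "c"], {frozenset((0, 1)), frozenset((1, 2)), frozenset((0, 2))})
bound = ged_upper_bound(path, triangle)
```

The local cost only sees labels and degrees, which is the kind of shallow local structure the abstract contrasts with rings. A richer local structure changes `local_cost`, while `induced_cost` and the upper-bound guarantee stay the same.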

Preprint, 2020, arXiv

Exploring the SARS-CoV-2 virus-host-drug interactome for drug repurposing

Coronavirus Disease-2019 (COVID-19) is an infectious disease caused by the SARS-CoV-2 virus. It was first identified in Wuhan, China, and has since spread causing a global pandemic. Various studies have been performed to understand the molecular mechanisms of viral infection for predicting drug repurposing candidates. However, such information is spread across many publications and it is very time-consuming to access, integrate, explore, and exploit. We developed CoVex, the first interactive online platform for SARS-CoV-2 and SARS-CoV-1 host interactome exploration and drug (target) identification. CoVex integrates 1) experimentally validated virus-human protein interactions, 2) human protein-protein interactions and 3) drug-target interactions. The web interface allows user-friendly visual exploration of the virus-host interactome and implements systems medicine algorithms for network-based prediction of drugs. Thus, CoVex is an important resource, not only to understand the molecular mechanisms involved in SARS-CoV-2 and SARS-CoV-1 pathogenicity, but also in clinical research for the identification and prioritization of candidate therapeutics. We apply CoVex to investigate recent hypotheses on a systems biology level and to systematically explore the molecular mechanisms driving the virus life cycle. Furthermore, we extract and discuss drug repurposing candidates involved in these mechanisms. CoVex renders COVID-19 drug research systems-medicine-ready by giving the scientific community direct access to network medicine algorithms integrating virus-host-drug interactions. It is available at https://exbio.wzw.tum.de/covex/.

Preprint, 2020, arXiv

Privacy-preserving Artificial Intelligence Techniques in Biomedicine

Artificial intelligence (AI) has been successfully applied in numerous scientific domains. In biomedicine, AI has already shown tremendous potential, e.g. in the interpretation of next-generation sequencing data and in the design of clinical decision support systems. However, training an AI model on sensitive data raises concerns about the privacy of individual participants. For example, summary statistics of a genome-wide association study can be used to determine the presence or absence of an individual in a given dataset. This considerable privacy risk has led to restrictions in accessing genomic and other biomedical data, which is detrimental to collaborative research and impedes scientific progress. Hence, there has been a substantial effort to develop AI methods that can learn from sensitive data while protecting individuals' privacy. This paper provides a structured overview of recent advances in privacy-preserving AI techniques in biomedicine. It places the most important state-of-the-art approaches within a unified taxonomy and discusses their strengths, limitations, and open problems. As the most promising direction, we suggest combining federated machine learning, as a more scalable approach, with additional privacy-preserving techniques. This would make it possible to combine their advantages and provide privacy guarantees in a distributed way for biomedical applications. Nonetheless, more research is necessary, as hybrid approaches pose new challenges such as additional network or computation overhead.