Researcher profile

Yuan Ji

Yuan Ji contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2026arXiv

Precision Dose-Finding Design for Phase I Oncology Trials by Integrating Pharmacology Data

Phase I oncology trials aim to identify a safe dose - often the maximum tolerated dose (MTD) - for subsequent studies. Conventional designs focus on population-level toxicity modeling, with recent attention on leveraging pharmacokinetic (PK) data to improve dose selection. We propose the Precision Dose-Finding (PDF) design, a novel Bayesian phase I framework that integrates individual patient PK profiles into the dose-finding process. By incorporating patient-specific PK parameters (such as volume of distribution and elimination rate), PDF models toxicity risk at the individual level, in contrast to traditional methods that ignore inter-patient variability. The trial is structured in two stages: an initial training stage to update model parameters using cohort-based dose escalation, and a subsequent test stage in which doses for new patients are chosen based on each patient's own PK-predicted toxicity probability. This two-stage approach enables truly personalized dose assignment while maintaining rigorous safety oversight. Extensive simulation studies demonstrate the feasibility of PDF and suggest that it provides improved safety and dosing precision relative to the continual reassessment method (CRM). The PDF design thus offers a refined dose-finding strategy that tailors the MTD to individual patients, aligning phase I trials with the ideals of precision medicine.

preprint2026arXiv

Semi-Parametric Bayesian Additive Regression Trees for Risk Prediction with High-Dimensional Epigenetic Signatures and Low-Dimensional Covariates

In the era of precision medicine, genome-wide epigenetic modifications offer rich data that could inform risk prediction. However, these data are high-dimensional and exhibit complex dependence structures, which makes it difficult to jointly model them with low-dimensional covariates when the goal is to obtain interpretable effect estimates for covariate adjustment. Standard Bayesian additive regression trees (BART) provide strong predictive performance but treat all predictors uniformly within the tree ensemble, obscuring the contributions of significant covariates and complicating variable selection in high-dimensional settings. We propose a semi-parametric BART model (spBART) that addresses this limitation by modeling low-dimensional covariates through a parametric component with interpretable coefficients, while capturing complex nonlinear associations among high-dimensional predictors through the tree ensemble. To perform stable variable selection, we develop a cross-validation-based procedure that aggregates posterior inclusion probabilities across folds and applies Bayesian false discovery rate control. We apply the proposed method to a pooled case--control analysis of high-dimensional genome-wide 5-hydroxymethylcytosine profiles derived from circulating cell-free DNA in two multiple myeloma studies ($N = 869$). The approach identifies a parsimonious set of candidate loci and achieves strong out-of-sample discrimination (AUC $= 0.96$) in a held-out validation set. Overall, spBART provides a unified framework for combining interpretable covariate inference with flexible modeling and variable selection in high-dimensional biomedical studies.

preprint2023arXiv

A Class of Dependent Random Distributions Based on Atom Skipping

We propose the Plaid Atoms Model (PAM), a novel Bayesian nonparametric model for grouped data. Founded on an idea of `atom skipping', PAM is part of a well-established category of models that generate dependent random distributions and clusters across multiple groups. Atom skipping referrs to stochastically assigning 0 weights to atoms in an infinite mixture. Deploying atom skipping across groups, PAM produces a dependent clustering pattern with overlapping and non-overlapping clusters across groups. As a result, interpretable posterior inference is possible such as reporting the posterior probability of a cluster being exclusive to a single group or shared among a subset of groups. We discuss the theoretical properties of the proposed and related models. Minor extensions of the proposed model for multivariate or count data are presented. Simulation studies and applications using real-world datasets illustrate the performance of the new models with comparison to existing models.

preprint2021arXiv

Lessons Learned from the Bayesian Design and Analysis for the BNT162b2 COVID-19 Vaccine Phase 3 Trial

The phase III BNT162b2 mRNA COVID-19 vaccine trial is based on a Bayesian design and analysis, and the main evidence of vaccine efficacy is presented in Bayesian statistics. Confusion and mistakes are produced in the presentation of the Bayesian results. Some key statistics, such as Bayesian credible intervals, are mislabeled and stated as confidence intervals. Posterior probabilities of the vaccine efficacy are not reported as the main results. We illustrate the main differences in the reporting of Bayesian analysis results for a clinical trial and provide four recommendations. We argue that statistical evidence from a Bayesian trial, when presented properly, is easier to interpret and directly addresses the main clinical questions, thereby better supporting regulatory decision making. We also recommend using abbreviation "BI" to represent Bayesian credible intervals as a differentiation to "CI" which stands for confidence interval.

preprint2020arXiv

Consensus Monte Carlo for Random Subsets using Shared Anchors

We present a consensus Monte Carlo algorithm that scales existing Bayesian nonparametric models for clustering and feature allocation to big data. The algorithm is valid for any prior on random subsets such as partitions and latent feature allocation, under essentially any sampling model. Motivated by three case studies, we focus on clustering induced by a Dirichlet process mixture sampling model, inference under an Indian buffet process prior with a binomial sampling model, and with a categorical sampling model. We assess the proposed algorithm with simulation studies and show results for inference with three datasets: an MNIST image dataset, a dataset of pancreatic cancer mutations, and a large set of electronic health records (EHR). Supplementary materials for this article are available online.

preprint2020arXiv

Semiparametric Bayesian Inference for the Transmission Dynamics of COVID-19 with a State-Space Model

The outbreak of Coronavirus Disease 2019 (COVID-19) is an ongoing pandemic affecting over 200 countries and regions. Inference about the transmission dynamics of COVID-19 can provide important insights into the speed of disease spread and the effects of mitigation policies. We develop a novel Bayesian approach to such inference based on a probabilistic compartmental model using data of daily confirmed COVID-19 cases. In particular, we consider a probabilistic extension of the classical susceptible-infectious-recovered model, which takes into account undocumented infections and allows the epidemiological parameters to vary over time. We estimate the disease transmission rate via a Gaussian process prior, which captures nonlinear changes over time without the need of specific parametric assumptions. We utilize a parallel-tempering Markov chain Monte Carlo algorithm to efficiently sample from the highly correlated posterior space. Predictions for future observations are done by sampling from their posterior predictive distributions. Performance of the proposed approach is assessed using simulated datasets. Finally, our approach is applied to COVID-19 data from four states of the United States: Washington, New York, California, and Illinois. An R package BaySIR is made available at https://github.com/tianjianzhou/BaySIR for the public to conduct independent analysis or reproduce the results in this paper.

preprint2019arXiv

PoD-TPI: Probability-of-Decision Toxicity Probability Interval Design to Accelerate Phase I Trials

Cohort-based enrollment can slow down dose-finding trials since the outcomes of the previous cohort must be fully evaluated before the next cohort can be enrolled. This results in frequent suspension of patient enrollment. The issue is exacerbated in recent immune-oncology trials where toxicity outcomes can take a long time to observe. We propose a novel phase I design, the probability-of-decision toxicity probability interval (PoD-TPI) design, to accelerate phase I trials. PoD-TPI enables dose assignment in real-time in the presence of pending toxicity outcomes. With uncertain outcomes, the dose assignment decisions are treated as a random variable, and we calculate the posterior distribution of the decisions. The posterior distribution reflects the variability in the pending outcomes and allows a direct and intuitive evaluation of the confidence of all possible decisions. Optimal decisions are calculated based on 0-1 loss, and extra safety rules are constructed to enforce sufficient protection from exposing patients to risky doses. A new and useful feature of PoD-TPI is that it allows investigators and regulators to balance the trade-off between enrollment speed and making risky decisions by tuning a pair of intuitive design parameters. Through numerical studies, we evaluate the operating characteristics of PoD-TPI and demonstrate that PoD-TPI shortens trial duration and maintains trial safety and efficiency compared to existing time-to-event designs.