Researcher profile

Akihiko Nishimura

Akihiko Nishimura contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
7works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

7 published item(s)

preprint2022arXiv

Accelerating Bayesian inference of dependency between complex biological traits

Inferring dependencies between complex biological traits while accounting for evolutionary relationships between specimens is of great scientific interest yet remains infeasible when trait and specimen counts grow large. The state-of-the-art approach uses a phylogenetic multivariate probit model to accommodate binary and continuous traits via a latent variable framework, and utilizes an efficient bouncy particle sampler (BPS) to tackle the computational bottleneck -- integrating many latent variables from a high-dimensional truncated normal distribution. This approach breaks down as the number of specimens grows and fails to reliably characterize conditional dependencies between traits. Here, we propose an inference pipeline for phylogenetic probit models that greatly outperforms BPS. The novelty lies in 1) a combination of the recent Zigzag Hamiltonian Monte Carlo (Zigzag-HMC) with linear-time gradient evaluations and 2) a joint sampling scheme for highly correlated latent variables and correlation matrix elements. In an application exploring HIV-1 evolution from 535 viruses, the inference requires joint sampling from an 11,235-dimensional truncated normal and a 24-dimensional covariance matrix. Our method yields a 5-fold speedup compared to BPS and makes it possible to learn partial correlations between candidate viral mutations and virulence. Computational speedup now enables us to tackle even larger problems: we study the evolution of influenza H1N1 glycosylations on around 900 viruses. For broader applicability, we extend the phylogenetic probit model to incorporate categorical traits, and demonstrate its use to study Aquilegia flower and pollinator co-evolution.

preprint2022arXiv

Adjusting for both sequential testing and systematic error in safety surveillance using observational data: Empirical calibration and MaxSPRT

Post-approval safety surveillance of medical products using observational healthcare data can help identify safety issues beyond those found in pre-approval trials. When testing sequentially as data accrue, maximum sequential probability ratio testing (MaxSPRT) is a common approach to maintaining nominal type 1 error. However, the true type 1 error may still deviate from the specified one because of systematic error due to the observational nature of the analysis. This systematic error may persist even after controlling for known confounders. Here we propose to address this issue by combing MaxSPRT with empirical calibration. In empirical calibration, we assume uncertainty about the systematic error in our analysis, the source of uncertainty commonly overlooked in practice. We infer a probability distribution of systematic error by relying on a large set of negative controls: exposure-outcome where no causal effect is believed to exist. Integrating this distribution into our test statistics has previously been shown to restore type 1 error to nominal. Here we show how we can calibrate the critical value central to MaxSPRT. We evaluate this novel approach using simulations and real electronic health records, using H1N1 vaccinations during the 2009-2010 season as an example. Results show that combining empirical calibration with MaxSPRT restores nominal type 1 error. In our real-world example, adjusting for systematic error using empirical calibration has a larger impact than, and hence is just as essential as, adjusting for sequential testing using MaxSPRT. We recommend performing both, using the method described here.

preprint2022arXiv

Computational Statistics and Data Science in the Twenty-first Century

Data science has arrived, and computational statistics is its engine. As the scale and complexity of scientific and industrial data grow, the discipline of computational statistics assumes an increasingly central role among the statistical sciences. An explosion in the range of real-world applications means the development of more and more specialized computational methods, but five Core Challenges remain. We provide a high-level introduction to computational statistics by focusing on its central challenges, present recent model-specific advances and preach the ever-increasing role of non-sequential computational paradigms such as multi-core, many-core and quantum computing. Data science is bringing major changes to computational statistics, and these changes will shape the trajectory of the discipline in the 21st century.

preprint2022arXiv

Learning and Predicting from Dynamic Models for COVID-19 Patient Monitoring

COVID-19 has challenged health systems to learn how to learn. This paper describes the context, methods and challenges for learning to improve COVID-19 care at one academic health center. Challenges to learning include: (1) choosing a right clinical target; (2) designing methods for accurate predictions by borrowing strength from prior patients' experiences; (3) communicating the methodology to clinicians so they understand and trust it; (4) communicating the predictions to the patient at the moment of clinical decision; and (5) continuously evaluating and revising the methods so they adapt to changing patients and clinical demands. To illustrate these challenges, this paper contrasts two statistical modeling approaches - prospective longitudinal models in common use and retrospective analogues complementary in the COVID-19 context - for predicting future biomarker trajectories and major clinical events. The methods are applied to and validated on a cohort of 1,678 patients who were hospitalized with COVID-19 during the early months of the pandemic. We emphasize graphical tools to promote physician learning and inform clinical decision making.

preprint2022arXiv

Prior-preconditioned conjugate gradient method for accelerated Gibbs sampling in "large $n$ & large $p$" Bayesian sparse regression

In a modern observational study based on healthcare databases, the number of observations and of predictors typically range in the order of $10^5$ ~ $10^6$ and of $10^4$ ~ $10^5$. Despite the large sample size, data rarely provide sufficient information to reliably estimate such a large number of parameters. Sparse regression techniques provide potential solutions, one notable approach being the Bayesian methods based on shrinkage priors. In the "large n & large p" setting, however, posterior computation encounters a major bottleneck at repeated sampling from a high-dimensional Gaussian distribution, whose precision matrix $Φ$ is expensive to compute and factorize. In this article, we present a novel algorithm to speed up this bottleneck based on the following observation: we can cheaply generate a random vector $b$ such that the solution to the linear system $Φβ= b$ has the desired Gaussian distribution. We can then solve the linear system by the conjugate gradient (CG) algorithm through matrix-vector multiplications by $Φ$; this involves no explicit factorization or calculation of $Φ$ itself. Rapid convergence of CG in this context is guaranteed by the theory of prior-preconditioning we develop. We apply our algorithm to a clinically relevant large-scale observational study with n = 72,489 patients and p = 22,175 clinical covariates, designed to assess the relative risk of adverse events from two alternative blood anti-coagulants. Our algorithm demonstrates an order of magnitude speed-up in posterior inference, in our case cutting the computation time from two weeks to less than a day.

preprint2019arXiv

Discontinuous Hamiltonian Monte Carlo for discrete parameters and discontinuous likelihoods

Hamiltonian Monte Carlo has emerged as a standard tool for posterior computation. In this article, we present an extension that can efficiently explore target distributions with discontinuous densities. Our extension in particular enables efficient sampling from ordinal parameters though embedding of probability mass functions into continuous spaces. We motivate our approach through a theory of discontinuous Hamiltonian dynamics and develop a corresponding numerical solver. The proposed solver is the first of its kind, with a remarkable ability to exactly preserve the Hamiltonian. We apply our algorithm to challenging posterior inference problems to demonstrate its wide applicability and competitive performance.

preprint2019arXiv

Recycling intermediate steps to improve Hamiltonian Monte Carlo

Hamiltonian Monte Carlo (HMC) and related algorithms have become routinely used in Bayesian computation. In this article, we present a simple and provably accurate method to improve the efficiency of HMC and related algorithms with essentially no extra computational cost. This is achieved by {recycling the intermediate states along simulated trajectories of Hamiltonian dynamics. Standard algorithms use only the end points of trajectories, wastefully discarding all the intermediate states. Compared to the alternative methods for utilizing the intermediate states, our algorithm is simpler to apply in practice and requires little programming effort beyond the usual implementations of HMC and related algorithms. Our algorithm applies straightforwardly to the no-U-turn sampler, arguably the most popular variant of HMC. Through a variety of experiments, we demonstrate that our recycling algorithm yields substantial computational efficiency gains.