Source author record

Paul S. Albert

Paul S. Albert appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Applications, Methodology, math.PR, math.ST, stat.OT, Statistics Theory

Catalog footprint

What is connected

8 works

6 topics

4 close collaborators

Preprint, 2021, arXiv

Is Group Testing Ready for Prime-time in Disease Identification?

Large-scale disease screening is a complicated process in which high costs must be balanced against pressing public health needs. When the goal is screening for infectious disease, one approach is group testing, in which samples are initially tested in pools and individual samples are retested only if the initial pooled test is positive. Intuitively, if the prevalence of infection is small, this could result in a large reduction in the total number of tests required. Despite this, the use of group testing in medical studies has been limited, largely due to skepticism about the impact of pooling on the accuracy of a given assay. While there is a large body of research addressing the issue of testing errors in group testing studies, it is customary to assume that the misclassification parameters are known from an external population and/or that the values do not change with the group size. Both of these assumptions are highly questionable for many medical practitioners considering group testing in their study design. In this article, we explore how the failure of these assumptions might impact the efficacy of a group testing design and, consequently, whether group testing is currently feasible for medical screening. Specifically, we look at how incorrect assumptions about the sensitivity function at the design stage can lead to poor estimation of a procedure's overall sensitivity and expected number of tests. Furthermore, if a validation study is used to estimate the pooled misclassification parameters of a given assay, we show that the sample sizes required are so large as to be prohibitive in all but the largest screening programs.
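To make the quantities in this abstract concrete, the following is a minimal sketch of the operating characteristics of two-stage pooling with an imperfect assay. It assumes that test errors are conditionally independent given true status and that sensitivity and specificity are fixed constants supplied by the user; the function name, parameter names and numeric values are illustrative assumptions and are not taken from the article.

```python
def dorfman_operating_characteristics(p, k, se_pool, sp_pool, se_ind, sp_ind):
    """Two-stage pooling with an imperfect assay (illustrative sketch).

    p        prevalence of infection
    k        pool size (at least 2)
    se_pool  sensitivity of the pooled test (assumed constant here)
    sp_pool  specificity of the pooled test
    se_ind   sensitivity of the individual retest
    sp_ind   specificity of the individual retest

    Returns (expected tests per individual, overall sensitivity,
    overall specificity) of the whole procedure.
    """
    if k < 2:
        raise ValueError("pool size must be at least 2")

    # Probability that a pool of k contains no infected sample.
    prob_pool_clean = (1.0 - p) ** k
    # The pool tests positive either correctly or as a false positive.
    prob_pool_positive = se_pool * (1.0 - prob_pool_clean) + (1.0 - sp_pool) * prob_pool_clean
    # One pooled test shared by k people, plus one retest each if the pool is positive.
    tests_per_person = 1.0 / k + prob_pool_positive

    # An infected individual is found only if both tests are positive.
    overall_se = se_pool * se_ind

    # An uninfected individual is misclassified only if the pool is positive
    # (because of the other k - 1 members or a false positive) and the retest errs.
    others_clean = (1.0 - p) ** (k - 1)
    pool_pos_given_negative = se_pool * (1.0 - others_clean) + (1.0 - sp_pool) * others_clean
    overall_sp = 1.0 - pool_pos_given_negative * (1.0 - sp_ind)

    return tests_per_person, overall_se, overall_sp


if __name__ == "__main__":
    # Hypothetical numbers: the design assumes pooling does not reduce sensitivity,
    # while the "true" pooled sensitivity is lower because of dilution.
    assumed = dorfman_operating_characteristics(0.01, 10, 0.95, 0.99, 0.95, 0.99)
    actual = dorfman_operating_characteristics(0.01, 10, 0.80, 0.99, 0.95, 0.99)
    print("assumed at design stage:", assumed)
    print("with diluted pooled test:", actual)
```

Comparing the two calls shows how a design-stage assumption about pooled sensitivity carries straight through to the procedure's overall sensitivity and expected number of tests, which is the kind of misspecification the abstract discusses.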

Preprint, 2021, arXiv

Nested Group Testing Procedures for Screening

This article reviews a class of adaptive group testing procedures that operate under the following probabilistic model. Consider a set of $N$ items, where item $i$ is defective with probability $p$ ($p_i$ in generalized group testing) and non-defective with probability $1-p$, independently of the other items. A group test applied to any subset of size $n$ is a binary test with two possible outcomes, positive or negative. The outcome is negative if all $n$ items are non-defective, whereas the outcome is positive if at least one of the $n$ items is defective. The goal is complete identification of all $N$ items with the minimum expected number of tests.

Preprint, 2020, arXiv

An optimal design for hierarchical generalized group testing

Choosing an optimal strategy for hierarchical group testing is an important problem for practitioners who are interested in disease screening with limited resources. For example, when screening for infectious diseases in large populations, it is important to use algorithms that minimize the cost of potentially expensive assays. Black et al. (2015) described this as an intractable problem unless the number of individuals to screen is small. They proposed an approximation to an optimal strategy that is difficult to implement for large population sizes. In this article, we develop an optimal design with respect to the expected total number of tests that can be obtained using a novel dynamic programming algorithm. We show that this algorithm is substantially more efficient than the approach proposed by Black et al. (2015). In addition, we compare the two designs for imperfect tests. R code is provided for the practitioner.
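The abstract does not spell out the authors' algorithm, so the following is only a sketch of how dynamic programming can be applied to a simpler, related problem: partitioning individuals with different infection probabilities into two-stage pools so that the expected number of error-free tests is as small as possible. It assumes that pools are contiguous after sorting by risk and that every member of a positive pool is retested. All names are hypothetical, and this is not the hierarchical procedure or the R code from the paper.

```python
from math import prod


def expected_tests_for_pool(probs):
    """Expected tests for one two-stage pool with an error-free assay."""
    if len(probs) == 1:
        return 1.0  # a single individual is simply tested once
    prob_pool_positive = 1.0 - prod(1.0 - p for p in probs)
    # One pooled test, plus a retest of every member if the pool is positive.
    return 1.0 + len(probs) * prob_pool_positive


def best_ordered_partition(probs, max_pool_size=None):
    """Dynamic program over cut points of the risk-sorted individuals.

    best_cost[j] holds the smallest expected number of tests needed to
    classify the first j individuals (in sorted order).
    Returns (minimum expected tests, list of pools).
    """
    ordered = sorted(probs)
    n = len(ordered)
    limit = n if max_pool_size is None else max_pool_size

    best_cost = [0.0] + [float("inf")] * n
    previous_cut = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(max(0, j - limit), j):
            cost = best_cost[i] + expected_tests_for_pool(ordered[i:j])
            if cost < best_cost[j]:
                best_cost[j] = cost
                previous_cut[j] = i

    # Walk the stored cut points backwards to recover the pools.
    pools = []
    j = n
    while j > 0:
        i = previous_cut[j]
        pools.append(ordered[i:j])
        j = i
    pools.reverse()
    return best_cost[n], pools


if __name__ == "__main__":
    # Hypothetical individual risks.
    risks = [0.01, 0.02, 0.01, 0.30, 0.05, 0.02, 0.40, 0.01]
    cost, pools = best_ordered_partition(risks)
    print(cost, pools)
```

This sketch recomputes each pool's cost from scratch, so it runs in roughly cubic time in the number of individuals; it is meant to show the recursion rather than to be efficient.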

Preprint, 2015, arXiv

A two-state mixed hidden Markov model for risky teenage driving behavior

This paper proposes a joint model for longitudinal binary and count outcomes. We apply the model to a unique longitudinal study of teen driving where risky driving behavior and the occurrence of crashes or near crashes are measured prospectively over the first 18 months of licensure. Of scientific interest is relating the two processes and predicting crash and near crash outcomes. We propose a two-state mixed hidden Markov model whereby the hidden state characterizes the mean for the joint longitudinal crash/near crash outcomes and elevated g-force events which are a proxy for risky driving. Heterogeneity is introduced in both the conditional model for the count outcomes and the hidden process using a shared random effect. An estimation procedure is presented using the forward-backward algorithm along with adaptive Gaussian quadrature to perform numerical integration. The estimation procedure readily yields hidden state probabilities as well as providing for a broad class of predictors.
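As a rough illustration of the forward recursion mentioned in this abstract, here is a minimal sketch for a plain hidden Markov model with Poisson-distributed counts. It assumes fixed parameters and omits the shared random effect, the binary outcome and the backward pass of the paper's model; every name and number below is hypothetical.

```python
from math import exp, lgamma, log


def poisson_pmf(count, rate):
    """Poisson probability of observing `count` events at the given rate (> 0)."""
    return exp(count * log(rate) - rate - lgamma(count + 1))


def forward_log_likelihood(counts, initial, transition, rates):
    """Scaled forward algorithm for a Poisson hidden Markov model.

    counts      observed event counts, one per time point
    initial     initial state probabilities
    transition  transition[r][s] = P(next state is s | current state is r)
    rates       Poisson rate of the counts in each hidden state

    Returns (log-likelihood, filtered state probabilities), where the
    filtered probabilities are P(state at t | counts up to t).
    """
    n_states = len(initial)
    states = range(n_states)
    alpha = [initial[s] * poisson_pmf(counts[0], rates[s]) for s in states]

    log_lik = 0.0
    filtered = []
    for t in range(len(counts)):
        if t > 0:
            # Propagate one step through the chain, then weight by the emission.
            alpha = [
                sum(alpha[r] * transition[r][s] for r in states)
                * poisson_pmf(counts[t], rates[s])
                for s in states
            ]
        # Rescale at every step to avoid underflow in long sequences.
        scale = sum(alpha)
        log_lik += log(scale)
        alpha = [a / scale for a in alpha]
        filtered.append(alpha)
    return log_lik, filtered


if __name__ == "__main__":
    # Hypothetical two-state example: a low-risk and a high-risk state.
    monthly_counts = [0, 1, 0, 4, 6, 5, 0, 1]
    log_lik, state_probs = forward_log_likelihood(
        monthly_counts,
        initial=[0.8, 0.2],
        transition=[[0.9, 0.1], [0.3, 0.7]],
        rates=[0.5, 5.0],
    )
    print(log_lik)
    print(state_probs[-1])
```

In a mixed model such as the one described above, a recursion of this kind would be evaluated conditionally on the random effect and then integrated over its distribution, for example by quadrature.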

Preprint, 2015, arXiv

Mixed model and estimating equation approaches for zero inflation in clustered binary response data with application to a dating violence study

The NEXT Generation Health study investigates the dating violence of adolescents using a survey questionnaire. Each student is asked to affirm or deny multiple instances of violence in his/her dating relationship. There is, however, evidence suggesting that students not in a relationship responded to the survey, resulting in excessive zeros in the responses. This paper proposes likelihood-based and estimating equation approaches to analyze the zero-inflated clustered binary response data. We adopt a mixed model method to account for the cluster effect, and the model parameters are estimated using a maximum-likelihood (ML) approach that requires a Gauss-Hermite quadrature (GHQ) approximation for implementation. Since an incorrect assumption about the random effects distribution may bias the results, we construct generalized estimating equations (GEE) that do not require the correct specification of within-cluster correlation. In a series of simulation studies, we examine the performance of ML and GEE methods in terms of their bias, efficiency and robustness. We illustrate the importance of properly accounting for this zero inflation by reanalyzing the NEXT data where this issue has previously been ignored.
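The quadrature step can be illustrated with a small sketch. It assumes a random-intercept logistic model with a single shared intercept and a point mass for respondents who can only answer zero; that model form, the three-point rule and all names are assumptions made for illustration and are not taken from the paper.

```python
from math import exp, pi, sqrt

# Three-point Gauss-Hermite rule for integrals of exp(-x^2) * g(x).
# Three nodes keep the sketch short; real analyses typically use more.
GH_NODES = (-sqrt(1.5), 0.0, sqrt(1.5))
GH_WEIGHTS = (sqrt(pi) / 6.0, 2.0 * sqrt(pi) / 3.0, sqrt(pi) / 6.0)


def normal_expectation(func, sigma):
    """Approximate E[func(b)] for b ~ N(0, sigma^2) by Gauss-Hermite quadrature."""
    # The change of variables b = sqrt(2) * sigma * x maps the normal density
    # onto the exp(-x^2) weight function of the rule.
    total = sum(w * func(sqrt(2.0) * sigma * x) for x, w in zip(GH_NODES, GH_WEIGHTS))
    return total / sqrt(pi)


def cluster_likelihood(responses, intercept, sigma):
    """Marginal likelihood of one cluster of 0/1 responses."""

    def conditional(b):
        # Responses are independent given the cluster's random intercept b.
        prob = 1.0 / (1.0 + exp(-(intercept + b)))
        value = 1.0
        for y in responses:
            value *= prob if y == 1 else 1.0 - prob
        return value

    return normal_expectation(conditional, sigma)


def zero_inflated_cluster_likelihood(responses, intercept, sigma, structural_zero_prob):
    """Mix a point mass at all-zero responses with the mixed-model likelihood."""
    all_zero = 1.0 if all(y == 0 for y in responses) else 0.0
    regular = cluster_likelihood(responses, intercept, sigma)
    return structural_zero_prob * all_zero + (1.0 - structural_zero_prob) * regular


if __name__ == "__main__":
    # Hypothetical respondent who denied all five items.
    print(zero_inflated_cluster_likelihood([0, 0, 0, 0, 0], -1.0, 1.2, 0.3))
```

A full analysis would sum the logarithm of this quantity over clusters and maximize it over the parameters, usually with covariates in the linear predictor.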

Preprint, 2014, arXiv

A note on the minimax solution for the two-stage group testing problem

Group testing is an active area of current research and has important applications in medicine, biotechnology, genetics, and product testing. There have been recent advances in design and estimation, but the simple Dorfman procedure introduced by R. Dorfman in 1943 is widely used in practice. In many practical situations the exact value of the probability p of being affected is unknown. We present both minimax and Bayesian solutions for the group size problem when p is unknown. For unbounded p we show that the minimax solution for group size is 8, while using a Bayesian strategy with Jeffreys prior results in a group size of 13. We also present solutions when p is bounded from above. For the practitioner we propose strong justification for using a group size of between eight and thirteen when a constraint on p is not incorporated and provide usable code for computing the minimax group size under a constrained p.
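For context, the group size problem with a known p can be sketched in a few lines. The code assumes an error-free test and the standard Dorfman expression for the expected number of tests per person, 1/k + 1 - (1 - p)^k for a group of size k of at least 2. It does not implement the minimax or Bayesian solutions of the paper, and the function names and search limit are illustrative.

```python
def expected_tests_per_person(p, k):
    """Expected tests per person under the Dorfman procedure with a perfect test."""
    if k == 1:
        return 1.0  # no pooling: everyone is tested individually
    # One pooled test shared by k people, plus k retests if the pool is positive.
    return 1.0 / k + 1.0 - (1.0 - p) ** k


def optimal_group_size(p, max_k=100):
    """Group size between 1 and max_k that minimizes expected tests per person."""
    return min(range(1, max_k + 1), key=lambda k: expected_tests_per_person(p, k))


if __name__ == "__main__":
    for prevalence in (0.001, 0.01, 0.05, 0.10, 0.35):
        k = optimal_group_size(prevalence)
        print(prevalence, k, round(expected_tests_per_person(prevalence, k), 4))
```

Because the best k depends strongly on p, a group size chosen for one prevalence can be inefficient at another. That dependence is what motivates choosing the group size when p is unknown.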

Preprint, 2012, arXiv

Marginal analysis of longitudinal count data in long sequences: Methods and applications to a driving study

Most of the available methods for longitudinal data analysis are designed and validated for the situation where the number of subjects is large and the number of observations per subject is relatively small. Motivated by the Naturalistic Teenage Driving Study (NTDS), which represents the exact opposite situation, we examine standard and propose new methodology for marginal analysis of longitudinal count data in a small number of very long sequences. We consider standard methods based on generalized estimating equations, under working independence or an appropriate correlation structure, and find them unsatisfactory for dealing with time-dependent covariates when the counts are low. For this situation, we explore a within-cluster resampling (WCR) approach that involves repeated analyses of random subsamples with a final analysis that synthesizes results across subsamples. This leads to a novel WCR method which operates on separated blocks within subjects and which performs better than all of the previously considered methods. The methods are applied to the NTDS data and evaluated in simulation experiments mimicking the NTDS.
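The basic within-cluster resampling idea can be sketched as follows: each resample keeps one randomly chosen observation per subject, an estimate is computed from that resample, and the estimates are averaged. The sketch estimates a marginal mean count only; it does not implement the block-based variant proposed in the paper or any variance estimate, and all names and data are hypothetical.

```python
import random
from statistics import mean


def wcr_mean(clusters, n_resamples=1000, seed=0):
    """Within-cluster resampling estimate of a marginal mean.

    clusters     list of per-subject observation lists (each non-empty)
    n_resamples  number of resampled data sets to analyze
    seed         seed for the random number generator, for reproducibility
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        # Keep exactly one randomly chosen observation from every subject,
        # so the observations within a resample are independent.
        resample = [rng.choice(observations) for observations in clusters]
        estimates.append(mean(resample))
    # The final estimate synthesizes the per-resample analyses by averaging.
    return mean(estimates)


if __name__ == "__main__":
    # Hypothetical count sequences for three subjects.
    subjects = [[0, 1, 0, 2, 0, 0], [3, 4, 2, 5], [0, 0, 1, 0, 0, 0, 1]]
    print(wcr_mean(subjects))
```

Each subject contributes equally to every resample regardless of sequence length, which is one reason the approach is attractive when sequences are long and few.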

Preprint, 2010, arXiv

An approach for jointly modeling multivariate longitudinal measurements and discrete time-to-event data

In many medical studies, patients are followed longitudinally and interest lies in assessing the relationship between longitudinal measurements and time to an event. Recently, various authors have proposed joint modeling approaches for longitudinal and time-to-event data for a single longitudinal variable. These joint modeling approaches become intractable with even a few longitudinal variables. In this paper we propose a regression calibration approach for jointly modeling multiple longitudinal measurements and discrete time-to-event data. Ideally, a two-stage modeling approach could be applied in which the multiple longitudinal measurements are modeled in the first stage and the longitudinal model is related to the time-to-event data in the second stage. Biased parameter estimation due to informative dropout makes this direct two-stage modeling approach problematic. We propose a regression calibration approach which appropriately accounts for informative dropout. We approximate the conditional distribution of the multiple longitudinal measurements given the event time by modeling all pairwise combinations of the longitudinal measurements using a bivariate linear mixed model which conditions on the event time. Complete data are then simulated based on estimates from these pairwise conditional models, and regression calibration is used to estimate the relationship between longitudinal data and time-to-event data using the complete data. We show that this approach performs well in estimating the relationship between multivariate longitudinal measurements and the time-to-event data and in estimating the parameters of the multiple longitudinal process subject to informative dropout. We illustrate this methodology with simulations and with an analysis of primary biliary cirrhosis (PBC) data.