Source author record

David Swanson

David Swanson appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Topics: Distributed, Parallel, and Cluster Computing; Methodology; Quantitative Methods

Catalog footprint

What is connected

3 works

3 topics

4 close collaborators


Preprint, 2026, arXiv

Identifying expanding TCR clonotypes with a longitudinal Bayesian mixture model and their associations with cancer patient prognosis, metastasis-directed therapy, and VJ gene enrichment

Examination of T-cell receptor (TCR) clonality has become a way of understanding immunologic response to cancer and its interventions in recent years. An aspect of these analyses is determining which receptors expand or contract statistically significantly as a function of an exogenous perturbation such as therapeutic intervention. We characterize the commonly used Fisher's exact test approach for such analyses and propose an alternative formulation that does not necessitate pairwise, within-patient comparisons. We develop a flexible Bayesian longitudinal mixture model that accommodates variable-length patient follow-up and handles missingness where present, not omitting data in estimation because of structural practicalities. Once clones are partitioned by the model into dynamic (expanding or contracting) and static categories, one can associate their counts or other characteristics with disease state, interventions, baseline biomarkers, and patient prognosis. We apply these developments to a cohort of prostate cancer patients who were randomized to metastasis-directed therapy (MDT) or not. Our analyses reveal a significant increase in clonal expansions among MDT patients and their association with later progressions, both independent of and within strata of MDT. Analysis of receptor motifs and VJ gene enrichment combinations using a high-dimensional penalized log-linear model we develop also suggests distinct biological characteristics of expanding clones, with and without inducement by MDT.
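For orientation, the pairwise Fisher's exact test that the abstract describes as the commonly used approach can be sketched as below. This is only an illustration of that baseline for a single clonotype at two time points, not the authors' Bayesian mixture model; the function name and the read counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(clone_post, other_post, clone_pre, other_pre):
    """One-sided Fisher's exact p-value that a clone is enriched post-treatment.

    The 2x2 table is [[clone_post, other_post], [clone_pre, other_pre]].
    With all margins fixed, the clone's post-treatment count follows a
    hypergeometric distribution under the null of no expansion.
    """
    n_post = clone_post + other_post    # total reads in the post sample
    n_clone = clone_post + clone_pre    # total reads of this clone
    n_other = other_post + other_pre    # total reads of all other clones
    total = n_clone + n_other
    denom = comb(total, n_post)
    k_max = min(n_post, n_clone)
    # Sum the probability of tables at least as extreme as the observed one.
    tail = sum(
        comb(n_clone, k) * comb(n_other, n_post - k)
        for k in range(clone_post, k_max + 1)
    )
    return tail / denom


# Hypothetical counts: 5 of 1000 reads before therapy, 40 of 1000 after.
p_value = fisher_exact_greater(40, 960, 5, 995)
print(f"one-sided p-value for expansion: {p_value:.3g}")
```

Because the test compares exactly two samples from one patient, it has to be repeated per clone and per pair of time points, which is the pairwise, within-patient restriction the abstract says the proposed model avoids.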

Preprint, 2020, arXiv

Exploring Erasure Coding Techniques for High Availability of Intermediate Data

Scientific computing workflows generate enormous volumes of distributed data that are short-lived, yet critical for job completion time. This class of data is called intermediate data. A common way to achieve high data availability is to replicate data. However, the increasing scale of intermediate data generated in modern scientific applications demands new storage techniques to improve storage efficiency. Erasure codes, as an alternative, can use less storage space while maintaining similar data availability. In this paper, we adopt erasure codes for storing intermediate data and compare their performance with that of replication. We also use the metric of Mean-Time-To-Data-Loss (MTTDL) to estimate the lifetime of intermediate data. We propose an algorithm to proactively relocate data redundancy from vulnerable machines to reliable ones to improve data availability at the cost of some extra network overhead. Furthermore, we propose an algorithm to place redundancy units of data physically close to each other on the network to reduce the network bandwidth needed to reconstruct data when it is accessed.

Preprint, 2020, arXiv

Trua: Efficient Task Replication for Flexible User-defined Availability in Scientific Grids

Failure is inevitable in scientific computing. As scientific applications and facilities have increased in scale over recent decades, finding the root cause of a failure can be very complex or at times nearly impossible. Different scientific computing customers have varying availability demands as well as varying willingness to pay for availability. In contrast to existing solutions that try to provide ever higher availability in scientific grids, we propose a model called Task Replication for User-defined Availability (Trua). Trua provides flexible, user-defined availability in scientific grids, allowing customers to express their desired availability to computational providers. Trua differs from existing task replication approaches in two ways. First, it relies on historical failure information collected from the virtual layer of scientific grids. The reliability model for the failures can be represented with a bimodal Johnson distribution, which differs from any existing distribution. Second, it adopts an anomaly detector to filter out anomalous failures; it additionally adopts novel selection algorithms to mitigate the effects of temporal and spatial correlations of the failures without knowing their root cause. We apply Trua to real-world traces collected from the Open Science Grid (OSG). Our results show that Trua can successfully meet user-defined availability demands.
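The basic idea of replicating a task until a user-defined availability target is met can be sketched as follows. This is a deliberately naive illustration and not the Trua model: it assumes every replica succeeds independently with the same probability, whereas the abstract stresses correlated failures and a bimodal Johnson reliability model. The function name and the numbers are hypothetical.

```python
def replicas_needed(success_prob, target_availability):
    """Smallest number of replicas so that at least one succeeds with the
    requested probability, assuming independent, identical replicas."""
    if not 0.0 < success_prob <= 1.0:
        raise ValueError("success_prob must be in (0, 1]")
    if not 0.0 <= target_availability < 1.0:
        raise ValueError("target_availability must be in [0, 1)")
    replicas = 1
    # Probability that all replicas fail is (1 - success_prob) ** replicas.
    while 1.0 - (1.0 - success_prob) ** replicas < target_availability:
        replicas += 1
    return replicas


# Hypothetical user request: 99% availability on nodes that succeed 70% of the time.
print(replicas_needed(0.7, 0.99))
```

When failures are correlated in time or space, replicas placed on related resources tend to fail together and this count becomes too optimistic, which is the situation the selection algorithms described in the abstract are meant to mitigate.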
