Source author record

Brandon Barker

Brandon Barker appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.


Topics: Molecular Networks; Distributed, Parallel, and Cluster Computing; Populations and Evolution; cs.CY; Software Engineering

Catalog footprint: 6 works, 5 topics, 4 close collaborators

Preprint, 2020, arXiv

Reproducible and Portable Workflows for Scientific Computing and HPC in the Cloud

The increasing availability of cloud computing services for science has changed the way scientific code can be developed, deployed, and run. Many modern scientific workflows are capable of running on cloud computing resources. Consequently, there is increasing interest in the scientific computing community in methods, tools, and implementations that make it simpler to move an application to the cloud and that decrease the time to meaningful scientific results. In this paper, we have applied the concepts of containerization for portability and multi-cloud automated deployment with industry-standard tools to three scientific workflows. We show how our implementations reduce the complexity of porting both the applications themselves and their deployment across private and public clouds. Each application has been packaged in a Docker container with its dependencies and the environment setup necessary for production runs. Terraform and Ansible have been used to automate the provisioning of compute resources and the deployment of each scientific application in a multi-VM cluster. Each application has been deployed on the AWS and Aristotle Cloud Federation platforms. Variation in data management constraints, multi-VM MPI communication, and embarrassingly parallel instance deployments were all explored and reported on. We thus present a sample of scientific workflows that can be simplified using these tools and our proposed implementation to deploy and run in a variety of cloud environments.

Preprint, 2020, arXiv

Self-Scaling Clusters and Reproducible Containers to Enable Scientific Computing

Container technologies such as Docker have become a crucial component of many software industry practices, especially those pertaining to reproducibility and portability. The containerization philosophy has influenced the scientific computing community, which has begun to adopt, and even develop, container technologies (such as Singularity). Leveraging containers for scientific software often poses challenges distinct from those encountered in industry, and requires different methodologies. This is especially true for HPC. With an increasing number of options for HPC in the cloud (including NSF-funded cloud projects), there is strong motivation to seek solutions that provide flexibility to develop and deploy scientific software on a variety of computational infrastructures in a portable and reproducible way. The flexibility offered by cloud services enables virtual HPC clusters that scale on demand, and the Cyberinfrastructure Resource Integration team in the XSEDE project has developed a set of tools that provides scalable infrastructure in the cloud. We now present a solution that uses the Nix package manager in an MPI-capable Docker container that is converted to Singularity. It provides consistent installations, dependencies, and environments in each image that are reproducible and portable across scientific computing infrastructures. We demonstrate the utility of these containers with cluster benchmark runs in a self-scaling virtual cluster using the Slurm scheduler deployed in the Jetstream and Aristotle Red Cloud OpenStack clouds. We conclude that this technique is useful as a template for scientific software application containers to be used in the XSEDE compute environment, other Singularity HPC environments, and cloud computing environments.

Preprint, 2020, arXiv

Using Containers to Create More Interactive Online Training and Education Materials

Containers are excellent hands-on learning environments for computing topics because they are customizable, portable, and reproducible. The Cornell University Center for Advanced Computing has developed the Cornell Virtual Workshop in high performance computing topics for many years, and we have always sought to make the materials as rich and interactive as possible. Toward the goal of building a more hands-on experimental learning experience directly into web-based online training environments, we developed the Cornell Container Runner Service, which allows online content developers to build container-based interactive edit and run commands directly into their web pages. Using containers along with CCRS has the potential to increase learner engagement and outcomes.

Preprint, 2015, arXiv

A robust and efficient method for estimating enzyme complex abundance and metabolic flux from expression data

A major theme in constraint-based modeling is unifying experimental data, such as biochemical information about the reactions that can occur in a system or the composition and localization of enzyme complexes, with high-throughput data including expression data, metabolomics, or DNA sequencing. The desired result is to increase predictive capability, resulting in improved understanding of metabolism. The approach typically employed when only gene (or protein) intensities are available is the creation of tissue-specific models, which reduces the available reactions in an organism model and does not provide an objective function for the estimation of fluxes, which is an important limitation in many modeling applications. We develop a method, flux assignment with LAD (least absolute deviation) convex objectives and normalization (FALCON), that employs metabolic network reconstructions along with expression data to estimate fluxes. To use such a method, accurate measures of enzyme complex abundance are needed, so we first present a new algorithm that addresses quantification of complex abundance. Our extensions to prior techniques include the capability to work with large models and significantly improved run-time performance even for smaller models, an improved analysis of enzyme complex formation logic, the ability to handle very large enzyme complex rules that may incorporate multiple isoforms, and, depending on the model constraints, either maintained or significantly improved correlation with experimentally measured fluxes. FALCON has been implemented in MATLAB and ATS, and can be downloaded from: https://github.com/bbarker/FALCON. ATS is not required to compile the software, as intermediate C source code is available, and binaries are provided for Linux x86-64 systems. FALCON requires use of the COBRA Toolbox, also implemented in MATLAB.
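As a rough illustration of what a least absolute deviation (LAD) criterion means, the sketch below compares an LAD fit with a least-squares fit for a single location parameter. It is not FALCON's formulation, which the abstract describes as an optimization over a metabolic network reconstruction; the function names and the sample values are hypothetical and chosen only to show that the LAD estimate (the median) is less sensitive to an outlier than the least-squares estimate (the mean).

```python
from statistics import mean, median


def sum_abs_dev(values, c):
    """LAD objective: sum of absolute deviations of the values from c."""
    return sum(abs(v - c) for v in values)


def sum_sq_dev(values, c):
    """Least-squares objective: sum of squared deviations from c."""
    return sum((v - c) ** 2 for v in values)


# Hypothetical measurements with one outlier (not data from the paper).
values = [1.0, 1.2, 0.9, 1.1, 9.0]

lad_fit = median(values)  # the median minimizes the LAD objective
ls_fit = mean(values)     # the mean minimizes the least-squares objective

print("LAD estimate:", lad_fit)
print("least-squares estimate:", ls_fit)
print("LAD objective at each estimate:",
      sum_abs_dev(values, lad_fit), sum_abs_dev(values, ls_fit))
```

The outlier pulls the least-squares estimate well away from the bulk of the values, while the LAD estimate stays among them.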

Preprint, 2014, arXiv

Dynamic epistasis for different alleles of the same gene

Epistasis refers to the phenomenon in which phenotypic consequences caused by mutation of one gene depend on one or more mutations at another gene. Epistasis is critical for understanding many genetic and evolutionary processes, including pathway organization, evolution of sexual reproduction, mutational load, ploidy, genomic complexity, speciation, and the origin of life. Nevertheless, current understanding of the genome-wide distribution of epistasis is mostly inferred from interactions among one mutant type per gene, whereas how epistatic interaction partners change dynamically for different mutant alleles of the same gene is largely unknown. Here we address this issue by combining predictions from flux balance analysis and data from a recently published high-throughput experiment. Our results show that different alleles can epistatically interact with very different gene sets. Furthermore, between two random mutant alleles of the same gene, the chance for the allele with the more severe mutational consequence to develop a higher percentage of negative epistasis than the other allele is 50-70% in eukaryotic organisms, but only 20-30% in bacteria and archaea. We developed a population genetics model that predicts that the observed distribution for the sign of epistasis can speed up the process of purging deleterious mutations in eukaryotic organisms. Our results indicate that epistasis among genes can be dynamically rewired at the genome level, and call for future efforts to revisit theories that can integrate epistatic dynamics among genes in biological systems.

Preprint, 2014, arXiv

Dynamic Epistasis under Varying Environmental Perturbations

Epistasis describes the phenomenon in which mutations at different loci do not have independent effects with regard to certain phenotypes. Understanding the global epistatic landscape is vital for many genetic and evolutionary theories. Current knowledge of epistatic dynamics under multiple conditions is limited by the technological difficulties in experimentally screening epistatic relations among genes. We explored this issue by applying flux balance analysis to simulate epistatic landscapes under various environmental perturbations. Specifically, we looked at gene-gene epistatic interactions, where the mutations were assumed to occur in different genes. We predicted that epistasis tends to become more positive from glucose-abundant to nutrient-limiting conditions, indicating that selection might be less effective in removing deleterious mutations in the latter. We also observed a stable core of epistatic interactions in all tested conditions, as well as many epistatic interactions unique to each condition. Interestingly, genes in the stable epistatic interaction network are directly linked to most other genes, whereas genes with condition-specific epistasis form a scale-free network. Furthermore, genes with stable epistasis tend to have similar evolutionary rates, whereas this co-evolving relationship does not hold for genes with condition-specific epistasis. Our findings provide a novel genome-wide picture of epistatic dynamics under environmental perturbations.
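To make the idea of non-independent mutational effects concrete, the sketch below scores epistasis as the deviation of a double mutant's relative fitness from the product of the two single-mutant fitnesses. This multiplicative null model is one common convention and is an assumption here; the abstract does not state which definition the paper uses. The function names, the tolerance, and the fitness values are hypothetical.

```python
def epistasis(w_a: float, w_b: float, w_ab: float) -> float:
    """Deviation of double-mutant fitness from the multiplicative expectation.

    All fitness values are relative to the wild type (wild type = 1.0).
    Assumes a multiplicative null model: independent mutations give
    w_ab == w_a * w_b.
    """
    return w_ab - w_a * w_b


def classify(eps: float, tol: float = 1e-9) -> str:
    """Label the sign of epistasis, treating tiny deviations as none."""
    if eps > tol:
        return "positive"
    if eps < -tol:
        return "negative"
    return "none"


# Hypothetical relative fitness values, not data from the paper.
single_a, single_b = 0.8, 0.5
for double in (0.6, 0.4, 0.2):
    eps = epistasis(single_a, single_b, double)
    print(f"w_ab={double}: epsilon={eps:+.2f} ({classify(eps)})")
```

Under this convention, positive epistasis means the double mutant is fitter than independence would predict, and negative epistasis means it is less fit.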
