Researcher profile

Aydin Buluc

Aydin Buluc contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 21 - EmergingVerification L1Unclaimed author
6works
0followers
6topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

6 published item(s)

preprint2022arXiv

Distributed-Memory Parallel Contig Generation for De Novo Long-Read Genome Assembly

De novo genome assembly, i.e., rebuilding the sequence of an unknown genome from redundant and erroneous short sequences, is a key but computationally intensive step in many genomics pipelines. The exponential growth of genomic data is increasing the computational demand and requires scalable, high-performance approaches. In this work, we present a novel distributed-memory algorithm that, from a string graph representation of the genome and using sparse matrices, generates the contig set, i.e., overlapping sequences that form a map representing a region of a chromosome. Using matrix abstraction, we mask branches in the string graph and compute the connected component to group genomic sequences that belong to the same linear chain (i.e., contig). Then, we perform multiway number partitioning to minimize the load imbalance in local assembly, i.e., concatenation of sequences from a given contig. Based on the assignment obtained by partitioning, we compute the induce subgraph function to redistribute sequences between processes, resulting in a set of local sparse matrices. Finally, we traverse each matrix using depth-first search to concatenate sequences. Our algorithm shows good scaling with parallel efficiency up to 80% on 128 nodes, resulting in uniform genome coverage and showing promising results in terms of assembly quality. Our contig generation algorithm localizes the assembly process to significantly reduce the amount of computation spent on this step. Our work is a step forward for efficient de novo long read assembly of large genomes in a distributed memory.

preprint2022arXiv

Randomized Algorithms for Scientific Computing (RASC)

Randomized algorithms have propelled advances in artificial intelligence and represent a foundational research area in advancing AI for Science. Future advancements in DOE Office of Science priority areas such as climate science, astrophysics, fusion, advanced materials, combustion, and quantum computing all require randomized algorithms for surmounting challenges of complexity, robustness, and scalability. This report summarizes the outcomes of that workshop, "Randomized Algorithms for Scientific Computing (RASC)," held virtually across four days in December 2020 and January 2021.

preprint2021arXiv

10 Years Later: Cloud Computing is Closing the Performance Gap

Can cloud computing infrastructures provide HPC-competitive performance for scientific applications broadly? Despite prolific related literature, this question remains open. Answers are crucial for designing future systems and democratizing high-performance computing. We present a multi-level approach to investigate the performance gap between HPC and cloud computing, isolating different variables that contribute to this gap. Our experiments are divided into (i) hardware and system microbenchmarks and (ii) user application proxies. The results show that today's high-end cloud computing can deliver HPC-competitive performance not only for computationally intensive applications but also for memory- and communication-intensive applications - at least at modest scales - thanks to the high-speed memory systems and interconnects and dedicated batch scheduling now available on some cloud platforms.

preprint2020arXiv

Reducing Communication in Graph Neural Network Training

Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the naturally sparse connectivity information of the data. GNNs represent this connectivity as sparse matrices, which have lower arithmetic intensity and thus higher communication costs compared to dense matrices, making GNNs harder to scale to high concurrencies than convolutional or fully-connected neural networks. We introduce a family of parallel algorithms for training GNNs and show that they can asymptotically reduce communication compared to previous parallel GNN training methods. We implement these algorithms, which are based on 1D, 1.5D, 2D, and 3D sparse-dense matrix multiplication, using torch.distributed on GPU-equipped clusters. Our algorithms optimize communication across the full GNN training pipeline. We train GNNs on over a hundred GPUs on multiple datasets, including a protein network with over a billion edges.

preprint2020arXiv

The Parallelism Motifs of Genomic Data Analysis

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these genomic data analysis problems require large scale computational platforms to meet both the memory and computational requirements. These applications differ from scientific simulations that dominate the workload on high end parallel systems today and place different requirements on programming support, software libraries, and parallel architectural design. For example, they involve irregular communication patterns such as asynchronous updates to shared data structures. We consider several problems in high performance genomics analysis, including alignment, profiling, clustering, and assembly for both single genomes and metagenomes. We identify some of the common computational patterns or motifs that help inform parallelization strategies and compare our motifs to some of the established lists, arguing that at least two key patterns, sorting and hashing, are missing.

preprint2018arXiv

Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation

Across a variety of scientific disciplines, sparse inverse covariance estimation is a popular tool for capturing the underlying dependency relationships in multivariate data. Unfortunately, most estimators are not scalable enough to handle the sizes of modern high-dimensional data sets (often on the order of terabytes), and assume Gaussian samples. To address these deficiencies, we introduce HP-CONCORD, a highly scalable optimization method for estimating a sparse inverse covariance matrix based on a regularized pseudolikelihood framework, without assuming Gaussianity. Our parallel proximal gradient method uses a novel communication-avoiding linear algebra algorithm and runs across a multi-node cluster with up to 1k nodes (24k cores), achieving parallel scalability on problems with up to ~819 billion parameters (1.28 million dimensions); even on a single node, HP-CONCORD demonstrates scalability, outperforming a state-of-the-art method. We also use HP-CONCORD to estimate the underlying dependency structure of the brain from fMRI data, and use the result to identify functional regions automatically. The results show good agreement with a clustering from the neuroscience literature.