Source author record

Boyana Norris

Boyana Norris appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

hep-ex; Performance; physics.ins-det; Distributed, Parallel, and Cluster Computing; Mathematical Software; physics.comp-ph; Programming Languages; Software Engineering

Catalog footprint

What is connected

12 works

8 topics

4 close collaborators

preprint 2024 arXiv

Guiding Effort Allocation in Open-Source Software Projects Using Bus Factor Analysis

A critical issue faced by open-source software projects is the risk of key personnel leaving the project. This risk is exacerbated in large projects that have been under development for a long time and have experienced growth in their development teams. One way to quantify this risk is to measure the concentration of knowledge about the project among its developers. This measure is formally known as the Bus Factor (BF) of a project and is defined as 'the number of key developers who would need to be incapacitated to make a project unable to proceed'. Most of the proposed algorithms for BF calculation measure a developer's knowledge of a file based on the number of commits. In this work, we propose using other metrics, such as lines of code changes (LOCC) and cosine difference of lines of code (change-size-cos), to calculate the BF. We use these metrics to calculate the BF for five open-source GitHub projects using the CST algorithm and the RIG algorithm, which is git-blame-based. Moreover, we calculate the BF on project sub-directories that have seen the most active development recently. Lastly, we compare the two algorithms in terms of accuracy, similarity of results, execution time, and trends in BF values over time.
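The idea of measuring knowledge concentration can be illustrated with a minimal sketch. It is not the CST or RIG algorithm from the abstract; it is a simplified greedy heuristic assumed here for illustration only. Each file is assigned to the developer with the largest contribution weight (the weight could be a commit count or LOCC), and developers are removed in order of files owned until more than half of the files have lost their primary author. The function name `bus_factor`, the `threshold` parameter, and the sample data are all hypothetical.

```python
from collections import Counter


def bus_factor(contrib, threshold=0.5):
    """Return the developers whose removal orphans more than `threshold` of files.

    contrib maps file -> {developer: weight}; the weight is an assumed
    knowledge metric such as commit count or lines of code changed (LOCC).
    This is an illustrative greedy heuristic, not CST or RIG.
    """
    # Primary author of a file: largest weight, ties broken alphabetically.
    owners = {
        f: min(devs, key=lambda d: (-devs[d], d))
        for f, devs in contrib.items()
        if devs
    }
    total = len(owners)
    removed = []
    # Keep removing the developer who owns the most remaining files.
    while owners and (total - len(owners)) <= threshold * total:
        counts = Counter(owners.values())
        top = min(counts, key=lambda d: (-counts[d], d))
        removed.append(top)
        owners = {f: d for f, d in owners.items() if d != top}
    return removed


# Hypothetical contribution data for four files.
contrib = {
    "a.py": {"ann": 10, "bob": 1},
    "b.py": {"ann": 5},
    "c.py": {"bob": 7, "ann": 2},
    "d.py": {"cat": 3},
}
key_developers = bus_factor(contrib)
print(len(key_developers), key_developers)
```

The length of the returned list plays the role of the bus factor in this sketch. Swapping the weights from commit counts to LOCC changes only the input dictionary, which is the kind of metric substitution the abstract describes.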

preprint 2022 arXiv

Optimizing the Hit Finding Algorithm for Liquid Argon TPC Neutrino Detectors Using Parallel Architectures

Neutrinos are particles that interact rarely, so identifying them requires large detectors that produce large volumes of data. Processing this data with the available computing power is becoming increasingly difficult as the detectors grow in size to reach their physics goals. Liquid argon time projection chamber (LArTPC) neutrino experiments are expected to grow in the next decade to have 100 times more wires than currently operating experiments, and modernization of LArTPC reconstruction code, including parallelization at both the data and instruction level, will help to mitigate this challenge. The LArTPC hit finding algorithm is used across multiple experiments through a common software framework. In this paper we discuss a parallel implementation of this algorithm. Using a standalone setup, we find speedup factors of two from vectorization and 30 to 100 from multi-threading on Intel architectures. The new version has been incorporated back into the framework so that it can be used by experiments. In serial execution, the integrated version is about 10 times faster than the previous one and, once parallelization is enabled, further speedups comparable to those of the standalone program are achieved.

preprint 2021 arXiv

Parallelizing the Unpacking and Clustering of Detector Data for Reconstruction of Charged Particle Tracks on Multi-core CPUs and Many-core GPUs

We present results from parallelizing the unpacking and clustering steps of the raw data from the silicon strip modules for reconstruction of charged particle tracks. Throughput is further improved by concurrently processing multiple events using nested OpenMP parallelism on CPU or CUDA streams on GPU. The new implementation along with earlier work in developing a parallelized and vectorized implementation of the combinatoric Kalman filter algorithm has enabled efficient global reconstruction of the entire event on modern computer architectures. We demonstrate the performance of the new implementation on Intel Xeon and NVIDIA GPU architectures.

preprint 2020 arXiv

Distributed-Memory Vertex-Centric Network Embedding for Large-Scale Graphs

Network embedding is an important step in many different computations based on graph data. However, existing approaches are limited to small or medium-sized graphs with fewer than a million edges. In practice, web or social network graphs are orders of magnitude larger, which makes most current methods impractical for very large graphs. To address this problem, we introduce a new distributed-memory parallel network embedding method based on Apache Spark and GraphX. We demonstrate the scalability of our method as well as its ability to generate meaningful embeddings for vertex classification and link prediction on both real-world and synthetic graphs.

preprint 2020 arXiv

Guiding Optimizations with Meliora: A Deep Walk down Memory Lane

Performance models can be very useful for understanding the behavior of applications and hence can help guide design and optimization decisions. Unfortunately, performance modeling of nontrivial computations typically requires significant expertise and human effort. Moreover, even when performed by experts, it is necessarily limited in scope, accuracy, or both. Because models are not typically available, programmers, compilers, and autotuners cannot easily use them to guide optimizations and are limited to heuristic-based methods that potentially spend a lot of time performing unnecessary transformations. We believe that streamlining model generation and making it scalable (both in terms of human effort and code size) would enable dramatic improvements in compilation techniques, as well as manual optimization and autotuning. To that end, we are building the Meliora code analysis infrastructure for machine learning-based performance model generation of arbitrary codes based on static analysis of intermediate language representations. We demonstrate good accuracy in matching known codes and show how Meliora can be used to optimize new codes through reusing optimization knowledge, either manually or in conjunction with an autotuner. When autotuning, Meliora eliminates or dramatically reduces the empirical search space, while generally achieving competitive performance.

preprint 2020 arXiv

Reconstruction for Liquid Argon TPC Neutrino Detectors Using Parallel Architectures

Neutrinos are particles that interact rarely, so identifying them requires large detectors which produce lots of data. Processing this data with the computing power available is becoming more difficult as the detectors increase in size to reach their physics goals. In liquid argon time projection chambers (TPCs) the charged particles from neutrino interactions produce ionization electrons which drift in an electric field towards a series of collection wires, and the signal on the wires is used to reconstruct the interaction. The MicroBooNE detector currently collecting data at Fermilab has 8000 wires, and planned future experiments like DUNE will have 100 times more, which means that the time required to reconstruct an event will scale accordingly. Modernization of liquid argon TPC reconstruction code, including vectorization, parallelization and code portability to GPUs, will help to mitigate these challenges. The liquid argon TPC hit finding algorithm within the LArSoft framework used across multiple experiments has been vectorized and parallelized. This increases the speed of the algorithm on the order of ten times within a standalone version on Intel architectures. This new version has been incorporated back into LArSoft so that it can be generally used. These methods will also be applied to other low-level reconstruction algorithms of the wire signals such as the deconvolution. The applications and performance of this modernized liquid argon TPC wire reconstruction will be presented.

preprint 2020 arXiv

Reconstruction of Charged Particle Tracks in Realistic Detector Geometry Using a Vectorized and Parallelized Kalman Filter Algorithm

One of the most computationally challenging problems expected for the High-Luminosity Large Hadron Collider (HL-LHC) is finding and fitting particle tracks during event reconstruction. Algorithms used at the LHC today rely on Kalman filtering, which builds physical trajectories incrementally while incorporating material effects and error estimation. Recognizing the need for faster computational throughput, we have adapted Kalman-filter-based methods for highly parallel, many-core SIMD and SIMT architectures that are now prevalent in high-performance hardware. Previously we observed significant parallel speedups, with physics performance comparable to CMS standard tracking, on Intel Xeon, Intel Xeon Phi, and (to a limited extent) NVIDIA GPUs. While early tests were based on artificial events occurring inside an idealized barrel detector, we showed subsequently that our mkFit software builds tracks successfully from complex simulated events (including detector pileup) occurring inside a geometrically accurate representation of the CMS-2017 tracker. Here, we report on advances in both the computational and physics performance of mkFit, as well as progress toward integration with CMS production software. Recently we have improved the overall efficiency of the algorithm by preserving short track candidates at a relatively early stage rather than attempting to extend them over many layers. Moreover, mkFit formerly produced an excess of duplicate tracks; these are now explicitly removed in an additional processing step. We demonstrate that with these enhancements, mkFit becomes a suitable choice for the first iteration of CMS tracking, and eventually for later iterations as well. We plan to test this capability in the CMS High Level Trigger during Run 3 of the LHC, with an ultimate goal of using it in both the CMS HLT and offline reconstruction for the HL-LHC CMS tracker.
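The incremental nature of Kalman filtering mentioned above can be sketched with a scalar (one-dimensional) filter step. This is a textbook simplification and not the mkFit implementation: real track fitting uses multi-dimensional state vectors and covariance matrices. The function name `kalman_step` and the use of a process-noise term `Q` as a loose stand-in for material effects are assumptions made for illustration.

```python
def kalman_step(x, P, z, R, Q=0.0):
    """One predict-and-update step of a scalar Kalman filter.

    x, P: current state estimate and its variance
    z, R: new measurement and its variance
    Q:    process noise added in the predict step (assumed here as a
          loose analogue of material effects between detector layers)
    """
    # Predict: static state model, so only the uncertainty grows.
    P_pred = P + Q
    # Update: weight the measurement by the Kalman gain.
    K = P_pred / (P_pred + R)
    x_new = x + K * (z - x)
    P_new = (1.0 - K) * P_pred
    return x_new, P_new


# Feed hypothetical measurements one layer at a time.
x, P = 0.0, 1.0
for z in [1.0, 1.0, 1.0]:
    x, P = kalman_step(x, P, z, R=1.0)
    print(x, P)
```

Each new measurement refines the estimate and shrinks its variance, which is the sense in which a trajectory is built incrementally, layer by layer.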

preprint 2019 arXiv

Parallelized Kalman-Filter-Based Reconstruction of Particle Tracks on Many-Core Architectures with the CMS Detector

In the High-Luminosity Large Hadron Collider (HL-LHC), one of the most challenging computational problems is expected to be finding and fitting charged-particle tracks during event reconstruction. The methods currently in use at the LHC are based on the Kalman filter. Such methods have been shown to be robust and to provide good physics performance, both in the trigger and offline. In order to improve computational performance, we explored Kalman-filter-based methods for track finding and fitting, adapted for many-core SIMD and SIMT architectures. Our adapted Kalman-filter-based software has obtained significant parallel speedups using such processors, e.g., Intel Xeon Phi, Intel Xeon SP (Scalable Processors) and (to a limited degree) NVIDIA GPUs. Recently, an effort has started towards the integration of our software into the CMS software framework, with a view to using it in Run III of the LHC. Prior reports have shown that our software in fact allows for significant improvements over the existing framework in computational performance, with comparable physics performance, even when applied to realistic detector configurations and event complexity. Here, we demonstrate that under such conditions physics performance can be further improved with respect to our prior reports, while retaining the improvements in computational performance, by making use of knowledge of the detector and its geometry.

preprint 2015 arXiv

A Roofline Visualization Framework

The Roofline Model and its derivatives provide an intuitive representation of the best achievable performance on a given architecture. The Roofline Toolkit project is a collaboration among researchers at Argonne National Laboratory, Lawrence Berkeley National Laboratory, and the University of Oregon and consists of three main parts: hardware characterization, software characterization, and data manipulation and visualization interface. These components address the different aspects of performance data acquisition and manipulation required for performance analysis, modeling and optimization of codes on existing and emerging architectures. In this paper we introduce an initial implementation of the third component, a system for visualizing roofline charts and managing roofline performance analysis data. We discuss the implementation and rationale for the integration of the roofline visualization system into the Eclipse IDE. An overview of our continuing efforts and goals in the development of this project is provided.
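The basic Roofline bound referred to above can be written in a few lines. This is a minimal sketch of the standard model, not code from the Roofline Toolkit: attainable performance is the smaller of the peak compute rate and the product of arithmetic intensity and peak memory bandwidth. The function names and the machine numbers below are hypothetical.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound in GFLOP/s.

    arithmetic_intensity: FLOPs performed per byte moved from memory
    peak_gflops:          peak compute rate of the machine (GFLOP/s)
    peak_bandwidth_gbs:   peak memory bandwidth of the machine (GB/s)
    """
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / peak_bandwidth_gbs


# Hypothetical machine: 1000 GFLOP/s peak, 100 GB/s memory bandwidth.
for ai in [0.5, 2.0, 10.0, 20.0]:
    print(ai, attainable_gflops(ai, 1000.0, 100.0))
```

Kernels whose arithmetic intensity falls below the ridge point are limited by memory bandwidth, while those above it are limited by peak compute. A roofline chart plots this bound on log-log axes together with measured kernel performance.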

preprint 2014 arXiv

Lighthouse: A User-Centered Web Service for Linear Algebra Software

Various fields of science and engineering rely on linear algebra for large scale data analysis, modeling and simulation, machine learning, and other applied problems. Linear algebra computations often dominate the execution time of such applications. Meanwhile, experts in these domains typically lack the training or time required to develop efficient, high-performance implementations of linear algebra algorithms. In the Lighthouse project, we enable developers with varied backgrounds to readily discover and effectively apply the best available numerical software for their problems. We have developed a search-based expert system that combines expert knowledge, machine learning-based classification of existing numerical software collections, and automated code generation and optimization. Lighthouse provides a novel software engineering environment aimed at maximizing both developer productivity and application performance for dense and sparse linear algebra computations.

preprint 2013 arXiv

Software Autotuning for Sustainable Performance Portability

Scientific software applications are increasingly developed by large interdisciplinary teams operating on functional modules organized around a common software framework, which is capable of integrating new functional capabilities without modifying the core of the framework. In such an environment, software correctness and modularity take precedence at the expense of code performance, which is an important concern during execution on supercomputing facilities, where the allocation of core-hours is a valuable resource. To alleviate the performance problems, we propose automated performance tuning (autotuning) of software to extract the maximum performance on a given hardware platform and to enable performance portability across heterogeneous hardware platforms. The resulting code remains generic without committing to a particular software stack and yet is compile-time specializable for maximal sustained performance.

preprint 2012 arXiv

Reliable Generation of High-Performance Matrix Algebra

Scientific programmers often turn to vendor-tuned Basic Linear Algebra Subprograms (BLAS) to obtain portable high performance. However, many numerical algorithms require several BLAS calls in sequence, and those successive calls result in suboptimal performance. The entire sequence needs to be optimized in concert. Instead of vendor-tuned BLAS, a programmer could start with source code in Fortran or C (e.g., based on the Netlib BLAS) and use a state-of-the-art optimizing compiler. However, our experiments show that optimizing compilers often attain only one-quarter the performance of hand-optimized code. In this paper we present a domain-specific compiler for matrix algebra, the Build to Order BLAS (BTO), that reliably achieves high performance using a scalable search algorithm for choosing the best combination of loop fusion, array contraction, and multithreading for data parallelism. The BTO compiler generates code that is between 16% slower and 39% faster than hand-optimized code.
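The benefit of optimizing a sequence of BLAS-like calls in concert can be sketched with loop fusion on a pair of matrix-vector products, q = A p and s = A^T r. This is a hand-written illustration of the transformation, not output from the BTO compiler; the function names and the choice of kernel pair are assumptions. The unfused version reads the matrix twice, while the fused version computes both results in a single pass over A.

```python
def unfused(A, p, r):
    """Two separate passes over A, as two successive BLAS-style calls would do."""
    m, n = len(A), len(p)
    q = [sum(A[i][j] * p[j] for j in range(n)) for i in range(m)]  # q = A p
    s = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]  # s = A^T r
    return q, s


def fused(A, p, r):
    """One pass over A: each element is loaded once and used for both results."""
    m, n = len(A), len(p)
    q = [0.0] * m
    s = [0.0] * n
    for i in range(m):
        row = A[i]
        acc = 0.0
        for j in range(n):
            a = row[j]
            acc += a * p[j]   # contribution to q[i]
            s[j] += a * r[i]  # contribution to s[j]
        q[i] = acc
    return q, s


A = [[1, 2], [3, 4]]
p = [1, 1]
r = [1, 2]
print(unfused(A, p, r))
print(fused(A, p, r))
```

In pure Python the fusion does not yield a meaningful speedup; the point is only that the fused loop nest touches each matrix element once, which in compiled code reduces memory traffic for bandwidth-bound kernels.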
