Source author record

Michel A. Kinsy

Michel A. Kinsy appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Cryptography and Security Distributed, Parallel, and Cluster Computing Hardware Architecture Machine Learning Performance

Catalog footprint

What is connected

4works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

A Taxonomy of Error Sources in HPC I/O Machine Learning Models

I/O efficiency is crucial to productivity in scientific computing, but the increasing complexity of the system and the applications makes it difficult for practitioners to understand and optimize I/O behavior at scale. Data-driven machine learning-based I/O throughput models offer a solution: they can be used to identify bottlenecks, automate I/O tuning, or optimize job scheduling with minimal human intervention. Unfortunately, current state-of-the-art I/O models are not robust enough for production use and underperform after being deployed. We analyze multiple years of application, scheduler, and storage system logs on two leadership-class HPC platforms to understand why I/O models underperform in practice. We propose a taxonomy consisting of five categories of I/O modeling errors: poor application and system modeling, inadequate dataset coverage, I/O contention, and I/O noise. We develop litmus tests to quantify each category, allowing researchers to narrow down failure modes, enhance I/O throughput models, and improve future generations of HPC logging and analysis tools.

preprint2022arXiv

Zeno: A Scalable Capability-Based Secure Architecture

Despite the numerous efforts of security researchers, memory vulnerabilities remain a top issue for modern computing systems. Capability-based solutions aim to solve whole classes of memory vulnerabilities at the hardware level by encoding access permissions with each memory reference. While some capability systems have seen commercial adoption, little work has been done to apply a capability model to datacenter-scale systems. Cloud and high-performance computing often require programs to share memory across many compute nodes. This presents a challenge for existing capability models, as capabilities must be enforceable across multiple nodes. Each node must agree on what access permissions a capability has and overheads of remote memory access must remain manageable. To address these challenges, we introduce Zeno, a new capability-based architecture. Zeno supports a Namespace-based capability model to support globally shareable capabilities in a large-scale, multi-node system. In this work, we describe the Zeno architecture, define Zeno's security properties, evaluate the scalability of Zeno as a large-scale capability architecture, and measure the hardware overhead with an FPGA implementation.

preprint2020arXiv

Fast Arithmetic Hardware Library For RLWE-Based Homomorphic Encryption

In this work, we propose an open-source, first-of-its-kind, arithmetic hardware library with a focus on accelerating the arithmetic operations involved in Ring Learning with Error (RLWE)-based somewhat homomorphic encryption (SHE). We design and implement a hardware accelerator consisting of submodules like Residue Number System (RNS), Chinese Remainder Theorem (CRT), NTT-based polynomial multiplication, modulo inverse, modulo reduction, and all the other polynomial and scalar operations involved in SHE. For all of these operations, wherever possible, we include a hardware-cost efficient serial and a fast parallel implementation in the library. A modular and parameterized design approach helps in easy customization and also provides flexibility to extend these operations for use in most homomorphic encryption applications that fit well into emerging FPGA-equipped cloud architectures. Using the submodules from the library, we prototype a hardware accelerator on FPGA. The evaluation of this hardware accelerator shows a speed up of approximately 4200x and 2950x to evaluate a homomorphic multiplication and addition respectively when compared to an existing software implementation.

preprint2020arXiv

NeuroFabric: Identifying Ideal Topologies for Training A Priori Sparse Networks

Long training times of deep neural networks are a bottleneck in machine learning research. The major impediment to fast training is the quadratic growth of both memory and compute requirements of dense and convolutional layers with respect to their information bandwidth. Recently, training `a priori' sparse networks has been proposed as a method for allowing layers to retain high information bandwidth, while keeping memory and compute low. However, the choice of which sparse topology should be used in these networks is unclear. In this work, we provide a theoretical foundation for the choice of intra-layer topology. First, we derive a new sparse neural network initialization scheme that allows us to explore the space of very deep sparse networks. Next, we evaluate several topologies and show that seemingly similar topologies can often have a large difference in attainable accuracy. To explain these differences, we develop a data-free heuristic that can evaluate a topology independently from the dataset the network will be trained on. We then derive a set of requirements that make a good topology, and arrive at a single topology that satisfies all of them.

Michel A. Kinsy

What is connected

Connect this record

See the researcher in context

Building this map preview

4 published item(s)

A Taxonomy of Error Sources in HPC I/O Machine Learning Models

Zeno: A Scalable Capability-Based Secure Architecture

Fast Arithmetic Hardware Library For RLWE-Based Homomorphic Encryption

NeuroFabric: Identifying Ideal Topologies for Training A Priori Sparse Networks