Source author record

Michael Jones

Michael Jones appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Catalog footprint

What is connected

21 works

24 topics

4 close collaborators

arXiv preprint (2023)

pPython Performance Study

pPython seeks to provide a parallel capability that delivers good speed-up without sacrificing the ease of programming in Python by implementing partitioned global array semantics (PGAS) on top of a simple file-based messaging library (PythonMPI) in pure Python. pPython follows an SPMD (single program, multiple data) model of computation. pPython runs on a single node (e.g., a laptop) running Windows, Linux, or macOS, or on any combination of heterogeneous systems that support Python, including on a cluster through a Slurm scheduler interface, so that pPython can be executed in a massively parallel computing environment. Because of its unique file-based messaging implementation, it is interesting to see what performance pPython can achieve compared with traditional socket-based MPI communication. In this paper, we present the point-to-point and collective communication performance of pPython and compare it with that obtained by using mpi4py with OpenMPI. For large messages, pPython demonstrates performance comparable to mpi4py.
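The file-based messaging idea can be illustrated with a minimal point-to-point sketch. This is not pPython's or PythonMPI's actual API: the function names `send_msg` and `recv_msg`, the one-file-per-message naming scheme, and the use of `pickle` with an atomic rename are all assumptions chosen for illustration.

```python
import os
import pickle
import tempfile
import time


def _msg_path(comm_dir: str, src: int, dst: int, tag: int) -> str:
    # Hypothetical naming scheme: one file per (source, destination, tag) message.
    return os.path.join(comm_dir, f"msg_{src}_to_{dst}_tag{tag}.pkl")


def send_msg(comm_dir: str, src: int, dst: int, tag: int, obj) -> None:
    """Write the message to a temporary file, then rename it into place.

    os.replace is atomic on the same filesystem, so a polling receiver never
    sees a partially written message file.
    """
    final = _msg_path(comm_dir, src, dst, tag)
    fd, tmp = tempfile.mkstemp(dir=comm_dir, suffix=".tmp")
    with os.fdopen(fd, "wb") as f:
        pickle.dump(obj, f)
    os.replace(tmp, final)


def recv_msg(comm_dir: str, src: int, dst: int, tag: int,
             poll_s: float = 0.01, timeout_s: float = 10.0):
    """Poll until the expected message file appears, read it, then delete it."""
    path = _msg_path(comm_dir, src, dst, tag)
    deadline = time.monotonic() + timeout_s
    while not os.path.exists(path):
        if time.monotonic() > deadline:
            raise TimeoutError(f"no message at {path}")
        time.sleep(poll_s)
    with open(path, "rb") as f:
        obj = pickle.load(f)
    os.remove(path)
    return obj


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        # In a real SPMD run, ranks 0 and 1 would be separate processes
        # sharing this directory; here one process plays both roles.
        send_msg(d, src=0, dst=1, tag=7, obj={"payload": list(range(5))})
        print(recv_msg(d, src=0, dst=1, tag=7))
```

Writing to a temporary file and renaming it is one common way to keep a polling receiver from reading a half-written message; how PythonMPI actually handles this is not stated in the abstract.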

arXiv preprint (2022)

An Evaluation of Low Overhead Time Series Preprocessing Techniques for Downstream Machine Learning

In this paper we address the application of preprocessing techniques to multi-channel time series data with varying lengths, which we refer to as the alignment problem, for downstream machine learning. The misalignment of multi-channel time series data may occur for a variety of reasons, such as missing data, varying sampling rates, or inconsistent collection times. We consider multi-channel time series data collected from the MIT SuperCloud High Performance Computing (HPC) center, where different job start times and varying run times of HPC jobs result in misaligned data. This misalignment makes it challenging to build AI/ML approaches for tasks such as compute workload classification. Building on previous supervised classification work with the MIT SuperCloud Dataset, we address the alignment problem via three broad, low-overhead approaches: sampling a fixed subset from a full time series, computing summary statistics over a full time series, and sampling a subset of coefficients from time series mapped to the frequency domain. Our best-performing models achieve a classification accuracy greater than 95%, outperforming previous approaches to multi-channel time series classification with the MIT SuperCloud Dataset by 5%. These results indicate that our low-overhead approaches to the alignment problem, in conjunction with standard machine learning techniques, can achieve high classification accuracy, and they serve as a baseline for future approaches to the alignment problem, such as kernel methods.
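The three alignment strategies can be illustrated with a short, dependency-free sketch that maps channels of different lengths to one fixed-length feature vector. The subset size `k`, the particular statistics (min, max, mean, population standard deviation), and the use of the first `k` DFT magnitudes are assumptions for illustration; the paper's exact feature choices are not given in this abstract.

```python
import cmath
import statistics
from typing import List

Series = List[float]


def fixed_subset(x: Series, k: int) -> Series:
    """Approach 1 (sketch): take k evenly spaced samples from the full series."""
    n = len(x)
    if n < k:
        raise ValueError("series shorter than requested subset")
    # Evenly spaced indices from the first sample to the last.
    idx = [round(i * (n - 1) / (k - 1)) for i in range(k)] if k > 1 else [0]
    return [x[i] for i in idx]


def summary_stats(x: Series) -> Series:
    """Approach 2 (sketch): summarize the whole series with fixed-size statistics."""
    return [min(x), max(x), statistics.mean(x), statistics.pstdev(x)]


def dft_coefficients(x: Series, k: int) -> Series:
    """Approach 3 (sketch): magnitudes of the first k DFT coefficients."""
    n = len(x)
    mags = []
    for f in range(k):
        c = sum(x[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        # Divide by n so series of different lengths give comparable magnitudes.
        mags.append(abs(c) / n)
    return mags


def featurize(channels: List[Series], k: int = 4) -> Series:
    """Concatenate per-channel features into one fixed-length vector."""
    feats: Series = []
    for ch in channels:
        feats += fixed_subset(ch, k) + summary_stats(ch) + dft_coefficients(ch, k)
    return feats


if __name__ == "__main__":
    # Two hypothetical jobs with two channels each but very different run lengths.
    short_job = [[1.0, 2.0, 3.0, 4.0, 5.0], [0.0] * 5]
    long_job = [[float(i) for i in range(50)], [1.0] * 50]
    print(len(featurize(short_job)), len(featurize(long_job)))
```

Both jobs yield vectors of the same length (2 channels × 12 features), which is what lets a standard classifier consume them.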

arXiv preprint (2022)

Benchmarking Resource Usage for Efficient Distributed Deep Learning

Deep learning (DL) workflows demand an ever-increasing budget of compute and energy in order to achieve outsized gains. Neural architecture searches, hyperparameter sweeps, and rapid prototyping consume immense resources that can prevent resource-constrained researchers from experimenting with large models and carry considerable environmental impact. As such, it becomes essential to understand how different deep neural networks (DNNs) and their training leverage increasing compute and energy resources, especially specialized, computationally intensive models across different domains and applications. In this paper, we conduct over 3,400 experiments training an array of deep networks representing various domains and tasks (natural language processing, computer vision, and chemistry) on up to 424 graphics processing units (GPUs). During training, our experiments systematically vary compute resource characteristics and energy-saving mechanisms, such as power utilization limits and GPU clock rate limits, to capture and illustrate the different trade-offs and scaling behaviors each representative model exhibits under various resource- and energy-constrained regimes. We fit power law models that describe how training time scales with available compute resources and energy constraints. We anticipate that these findings will help inform and guide high-performance computing providers in optimizing resource utilization by selectively reducing energy consumption for different deep learning tasks and workflows with minimal impact on training.
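A power law of the form T = a * N^b can be fitted by ordinary least squares on log-transformed data, since log T = log a + b log N is linear. The sketch below assumes that form, with N a resource count such as the number of GPUs; the data points are synthetic and are not results from the paper.

```python
import math
from typing import List, Tuple


def fit_power_law(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Fit y = a * x**b by ordinary least squares on (log x, log y)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired points")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxx = sum((u - mx) ** 2 for u in lx)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx                  # slope in log-log space = exponent
    a = math.exp(my - b * mx)      # intercept in log-log space = log(prefactor)
    return a, b


if __name__ == "__main__":
    # Synthetic, illustrative data only (NOT results from the paper):
    # training hours that shrink as the GPU count grows.
    gpus = [8, 16, 32, 64, 128]
    hours = [100.0 * g ** -0.8 for g in gpus]
    a, b = fit_power_law(gpus, hours)
    print(f"time ~ {a:.1f} * gpus^{b:.2f}")
```

Because the synthetic data follow an exact power law, the fit recovers the generating parameters; on real measurements the residuals in log space indicate how well a single power law describes the scaling.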

arXiv preprint (2022)

Converse: A Tree-Based Modular Task-Oriented Dialogue System

Creating a system that can have meaningful conversations with humans to help accomplish tasks is one of the ultimate goals of Artificial Intelligence (AI). It has defined the meaning of AI since the beginning. A lot has been accomplished in this area recently, with voice assistant products entering our daily lives and chatbot systems becoming commonplace in customer service. At first glance there seems to be no shortage of options for dialogue systems. However, the dialogue systems most frequently deployed today all seem to struggle with a critical weakness: they are hard to build and harder to maintain. At the core of the struggle is the need to script every single turn of interaction between the bot and the human user. This makes dialogue systems more difficult to maintain as tasks become more complex and more tasks are added to the system. In this paper, we propose Converse, a flexible tree-based modular task-oriented dialogue system. Converse uses an and-or tree structure to represent tasks and offers powerful multi-task dialogue management. Converse supports task dependency and task switching, which are unique features compared to other open-source dialogue frameworks. At the same time, Converse aims to make bot building easy for both professional and non-professional software developers. The code is available at https://github.com/salesforce/Converse.
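The and-or idea can be sketched as follows: an AND node is complete when all of its children are, an OR node when any child is, and the bot asks about the first unfilled leaf. The `Node` class, `is_complete`, `next_slot`, and the identity-verification task are hypothetical illustrations, not Converse's actual data model (the linked repository is the authority on that).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical and-or task node: kind is 'and', 'or', or 'leaf'."""
    kind: str
    name: str
    children: List["Node"] = field(default_factory=list)
    value: Optional[str] = None   # filled in for leaves during the dialogue


def is_complete(node: Node) -> bool:
    if node.kind == "leaf":
        return node.value is not None
    if node.kind == "and":
        return all(is_complete(c) for c in node.children)
    if node.kind == "or":
        return any(is_complete(c) for c in node.children)
    raise ValueError(f"unknown node kind: {node.kind}")


def next_slot(node: Node) -> Optional[Node]:
    """Return the first unfilled leaf the bot should ask about, or None if done."""
    if is_complete(node):
        return None
    if node.kind == "leaf":
        return node
    # For both 'and' and 'or', descend into the first incomplete child.
    for child in node.children:
        slot = next_slot(child)
        if slot is not None:
            return slot
    return None


if __name__ == "__main__":
    # Hypothetical task: an account number AND (a zip code OR a date of birth).
    task = Node("and", "verify_identity", [
        Node("leaf", "account_number"),
        Node("or", "second_factor", [Node("leaf", "zip_code"),
                                     Node("leaf", "date_of_birth")]),
    ])
    print(next_slot(task).name)
    task.children[0].value = "12345"
    print(next_slot(task).name)
    task.children[1].children[1].value = "1990-01-01"
    print(is_complete(task))
```

Because completion is computed from the tree rather than from a fixed script, the user may satisfy the OR branch with either answer and the bot still knows the task is done.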

arXiv preprint (2022)

Hypersparse Network Flow Analysis of Packets with GraphBLAS

Internet analysis is a major challenge due to the volume and rate of network traffic. In lieu of analyzing traffic as raw packets, network analysts often rely on compressed network flows (netflows) that contain the start time, stop time, source, destination, and number of packets in each direction. However, many traffic analyses benefit from temporal aggregation of multiple simultaneous netflows, which can be computationally challenging. To alleviate this concern, a novel netflow compression and resampling method has been developed leveraging GraphBLAS hypersparse traffic matrices that preserve anonymization while enabling subrange analysis. Standard multitemporal spatial analyses are then performed on each subrange to generate detailed statistical aggregates of the source packets, source fan-out, unique links, destination fan-in, and destination packets of each subrange, which can then be used for background modeling and anomaly detection. A simple file format based on GraphBLAS sparse matrices is developed for storing these statistical aggregates. This method is scale-tested on the MIT SuperCloud using a 50-trillion-packet netflow corpus from several hundred sites collected over several months. The resulting compression is significant (<0.1 bit per packet), enabling extremely large netflow analyses to be stored and transported. The single-node parallel performance is analyzed in terms of both processors and threads, showing that a single node can perform hundreds of simultaneous analyses at over a million packets/sec (roughly equivalent to a 10 Gigabit link).
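The five per-subrange aggregates correspond to simple reductions over a source-by-destination packet-count matrix: row sums (source packets), nonzeros per row (source fan-out), total nonzeros (unique links), nonzeros per column (destination fan-in), and column sums (destination packets). In this sketch, plain Python dictionaries stand in for GraphBLAS hypersparse matrices, and the flow tuples and host labels are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Flow = Tuple[str, str, int]  # (source, destination, packets)


def traffic_matrix(flows: Iterable[Flow]) -> Dict[str, Dict[str, int]]:
    """Accumulate flows into a sparse source x destination packet-count matrix."""
    a: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for src, dst, pkts in flows:
        a[src][dst] += pkts
    return a


def aggregates(a: Dict[str, Dict[str, int]]) -> dict:
    src_packets = {s: sum(row.values()) for s, row in a.items()}  # row sums
    src_fan_out = {s: len(row) for s, row in a.items()}           # nonzeros per row
    dst_packets: Dict[str, int] = defaultdict(int)
    dst_fan_in: Dict[str, int] = defaultdict(int)
    for row in a.values():
        for d, pkts in row.items():
            dst_packets[d] += pkts   # column sums
            dst_fan_in[d] += 1       # nonzeros per column
    return {
        "source_packets": src_packets,
        "source_fan_out": src_fan_out,
        "unique_links": sum(src_fan_out.values()),  # total nonzeros
        "destination_fan_in": dict(dst_fan_in),
        "destination_packets": dict(dst_packets),
    }


if __name__ == "__main__":
    # Hypothetical anonymized flows within one time subrange.
    flows = [("s1", "d1", 5), ("s1", "d2", 2), ("s2", "d1", 1), ("s1", "d1", 3)]
    for name, value in aggregates(traffic_matrix(flows)).items():
        print(name, value)
```

Repeated flows between the same pair (here `s1` to `d1`) are summed into one matrix entry, which is why they count once toward unique links but fully toward packet totals.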

arXiv preprint (2022)

Temporal Correlation of Internet Observatories and Outposts

The Internet has become a critical component of modern civilization, requiring scientific exploration akin to endeavors to understand the land, sea, air, and space environments. Understanding the baseline statistical distributions of traffic is essential to the scientific understanding of the Internet. Correlating data from different Internet observatories and outposts can be a useful tool for gaining insights into these distributions. This work compares observed sources from the largest Internet telescope (the CAIDA darknet telescope) with those from a commercial outpost (the GreyNoise honeyfarm). Neither location actively emits Internet traffic, and each provides distinct observations of unsolicited Internet traffic (primarily botnets and scanners). Newly developed GraphBLAS hypersparse matrices and D4M associative array technologies enable the efficient analysis of these data on significant scales. The CAIDA sources are well approximated by a Zipf-Mandelbrot distribution. Over a 6-month period, 70% of the brightest (highest frequency) sources in the CAIDA telescope are consistently detected by coeval observations in the GreyNoise honeyfarm. This overlap drops as the sources dim (reduce frequency) and as the time difference between the observations grows. The probability of seeing a CAIDA source is proportional to the logarithm of the brightness. The temporal correlations are well described by a modified Cauchy distribution. These observations are consistent with a correlated high-frequency beam of sources that drifts on a time scale of a month.
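A Zipf-Mandelbrot distribution over a positive integer variable k is commonly written as p(k) ∝ 1/(k + q)^s. The sketch below evaluates that form over a finite range and normalizes it; which variable the paper fits and its fitted parameter values are not given in this abstract, so `s` and `q` are placeholders.

```python
from typing import List


def zipf_mandelbrot_pmf(n: int, s: float, q: float) -> List[float]:
    """Normalized p(k) proportional to 1 / (k + q)**s for k = 1..n."""
    if n < 1 or s <= 0 or q <= -1:
        # q > -1 keeps every base (k + q) positive for k >= 1.
        raise ValueError("need n >= 1, s > 0, q > -1")
    weights = [(k + q) ** -s for k in range(1, n + 1)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Placeholder parameters, not fitted values from the paper.
    p = zipf_mandelbrot_pmf(n=1000, s=1.5, q=2.0)
    print(f"p(1) = {p[0]:.4f}, p(10) = {p[9]:.4f}, sum = {sum(p):.6f}")
```

Setting q = 0 recovers a plain (truncated) Zipf law; a positive q flattens the head of the distribution, which is the usual reason for preferring the Mandelbrot generalization.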

arXiv preprint (2022)

The MIT Supercloud Workload Classification Challenge

High-Performance Computing (HPC) centers and cloud providers support an increasingly diverse set of applications on heterogeneous hardware. As Artificial Intelligence (AI) and Machine Learning (ML) workloads have become a growing share of compute workloads, new approaches to optimizing resource usage, allocation, and deployment of new AI frameworks are needed. By identifying compute workloads and their utilization characteristics, HPC systems may be able to better match available resources with application demand. By leveraging datacenter instrumentation, it may be possible to develop AI-based approaches that can identify workloads and provide feedback to researchers and datacenter operators for improving operational efficiency. To enable this research, we released the MIT Supercloud Dataset, which provides detailed monitoring logs from the MIT Supercloud cluster. This dataset includes CPU and GPU usage by jobs, memory usage, and file system logs. In this paper, we present a workload classification challenge based on this dataset. We introduce a labelled dataset that can be used to develop new approaches to workload classification and present initial results based on existing approaches. The goal of this challenge is to foster algorithmic innovations in the analysis of compute workloads that can achieve higher accuracy than existing methods. Data and code will be made publicly available via the Datacenter Challenge website: https://dcc.mit.edu.

arXiv preprint (2022)

Zero Botnets: An Observe-Pursue-Counter Approach

Adversarial Internet robots (botnets) represent a growing threat to the safe use and stability of the Internet. Botnets can play a role in launching adversary reconnaissance (scanning and phishing), influence operations (upvoting), and financing operations (ransomware, market manipulation, denial of service, spamming, and ad click fraud) while obfuscating tailored tactical operations. Reducing the presence of botnets on the Internet, with the aspirational target of zero, is a powerful vision for galvanizing policy action. Setting a global goal, encouraging international cooperation, creating incentives for improving networks, and supporting entities for botnet takedowns are among several policies that could advance this goal. These policies raise significant questions regarding proper authorities/access that cannot be answered in the abstract. Systems analysis has been widely used in other domains to achieve sufficient detail to enable these questions to be dealt with in concrete terms. Defeating botnets using an observe-pursue-counter architecture is analyzed, the technical feasibility is affirmed, and the authorities/access questions are significantly narrowed. Recommended next steps include: supporting the international botnet takedown community, expanding network observatories, enhancing the underlying network science at scale, conducting detailed systems analysis, and developing appropriate policy frameworks.

arXiv preprint (2021)

3D Real-Time Supercomputer Monitoring

Supercomputers are complex systems producing vast quantities of performance data from multiple sources and of varying types. Performance data from each of the thousands of nodes in a supercomputer tracks multiple forms of storage, memory, networks, processors, and accelerators. Optimizing application performance is critical for cost-effective use of a supercomputer and requires efficient methods for viewing performance data. The combination of supercomputing analytics and 3D gaming visualization enables real-time processing and visual display of massive amounts of information that humans can process quickly with little training. Our system fully utilizes the capabilities of modern 3D gaming environments to create novel representations of computing hardware that intuitively represent the physical attributes of the supercomputer while displaying real-time alerts and component utilization. This system allows operators to quickly assess how the supercomputer is being used, gives users visibility into the resources they are consuming, and provides instructors with new ways to interactively teach the computing architecture concepts necessary for efficient computing.

arXiv preprint (2020)

75,000,000,000 Streaming Inserts/Second Using Hierarchical Hypersparse GraphBLAS Matrices

The SuiteSparse GraphBLAS C library implements high-performance hypersparse matrices with bindings to a variety of languages (Python, Julia, and Matlab/Octave). GraphBLAS provides a lightweight in-memory database implementation of hypersparse matrices that are ideal for analyzing many types of network data, while providing rigorous mathematical guarantees, such as linearity. Streaming updates of hypersparse matrices put enormous pressure on the memory hierarchy. This work benchmarks an implementation of hierarchical hypersparse matrices that reduces memory pressure and dramatically increases the update rate into hypersparse matrices. The parameters of hierarchical hypersparse matrices control the number of entries in each level of the hierarchy before an update is cascaded. These parameters are easily tunable to achieve optimal performance for a variety of applications. Hierarchical hypersparse matrices achieve over 1,000,000 updates per second in a single instance. Scaling to 31,000 instances of hierarchical hypersparse matrices on 1,100 server nodes on the MIT SuperCloud achieved a sustained update rate of 75,000,000,000 updates per second. This capability allows the MIT SuperCloud to analyze extremely large streaming network data sets.
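The level-by-level cascade can be sketched with a list of sparse levels: new updates land in the small first level, and whenever a level holds more entries than its cut value, it is added into the next level and cleared. Here Python dictionaries stand in for GraphBLAS hypersparse matrices, and the class name `HierarchicalSparse` and the cut values are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[int, int]


class HierarchicalSparse:
    """Sketch of a hierarchical sparse accumulator (dicts stand in for GraphBLAS)."""

    def __init__(self, cuts: List[int]):
        # cuts[i] is the max number of entries level i may hold before it is
        # cascaded (added) into level i + 1; the last level is unbounded.
        self.cuts = cuts
        self.levels: List[Dict[Key, float]] = [
            defaultdict(float) for _ in range(len(cuts) + 1)
        ]

    def update(self, i: int, j: int, v: float = 1.0) -> None:
        # Every update goes into the small, frequently touched first level.
        self.levels[0][(i, j)] += v
        for lvl, cut in enumerate(self.cuts):
            if len(self.levels[lvl]) <= cut:
                break
            # Cascade: add this level into the next one, then empty it.
            nxt = self.levels[lvl + 1]
            for key, val in self.levels[lvl].items():
                nxt[key] += val
            self.levels[lvl].clear()

    def total(self) -> Dict[Key, float]:
        """Materialize the full matrix by summing all levels."""
        out: Dict[Key, float] = defaultdict(float)
        for level in self.levels:
            for key, val in level.items():
                out[key] += val
        return dict(out)


if __name__ == "__main__":
    h = HierarchicalSparse(cuts=[2, 4])
    for t in range(20):
        h.update(t % 5, t % 3)
    print([len(level) for level in h.levels], len(h.total()))
```

Keeping the first level small means most updates touch a structure that fits in fast memory, while the larger levels are only rewritten occasionally; this matches the trade-off the abstract describes, though the real implementation's data layout is not shown here.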

arXiv preprint (2020)

Layer-Parallel Training with GPU Concurrency of Deep Residual Neural Networks via Nonlinear Multigrid

A Multigrid Full Approximation Storage algorithm for solving Deep Residual Networks is developed to enable neural network parallelized layer-wise training and concurrent computational kernel execution on GPUs. This work demonstrates a 10.2x speedup over traditional layer-wise model parallelism techniques using the same number of compute units.

arXiv preprint (2020)

LUVLi Face Alignment: Estimating Landmarks' Location, Uncertainty, and Visibility Likelihood

Modern face alignment methods have become quite accurate at predicting the locations of facial landmarks, but they do not typically estimate the uncertainty of their predicted locations nor predict whether landmarks are visible. In this paper, we present a novel framework for jointly predicting landmark locations, associated uncertainties of these predicted locations, and landmark visibilities. We model these as mixed random variables and estimate them using a deep network trained with our proposed Location, Uncertainty, and Visibility Likelihood (LUVLi) loss. In addition, we release an entirely new labeling of a large face alignment dataset with over 19,000 face images in a full range of head poses. Each face is manually labeled with the ground-truth locations of 68 landmarks, with the additional information of whether each landmark is unoccluded, self-occluded (due to extreme head poses), or externally occluded. Not only does our joint estimation yield accurate estimates of the uncertainty of predicted landmark locations, but it also yields state-of-the-art estimates for the landmark locations themselves on multiple standard face alignment datasets. Our method's estimates of the uncertainty of predicted landmark locations could be used to automatically identify input images on which face alignment fails, which can be critical for downstream tasks.

arXiv preprint (2020)

Permittivity and permeability of epoxy-magnetite powder composites at microwave frequencies

Radio, millimetre and sub-millimetre astronomy experiments as well as remote sensing applications often require castable absorbers with well-known electromagnetic properties to design and realize calibration targets. In this context, we fabricated and characterized two samples using different ratios of two readily available commercial materials: epoxy (Stycast 2850FT) and magnetite (Fe₃O₄) powder. We performed transmission and reflection measurements from 7 GHz up to 170 GHz with a VNA equipped with a series of standard horn antennas. Using an empirical model, we analysed the data to extract complex permittivity and permeability from transmission data; we then used reflection data to validate the results. In this paper we present the sample fabrication procedure, analysis method, parameter extraction pipeline, and results for two samples with different epoxy-powder mass ratios.

arXiv preprint (2020)

Street Scene: A new dataset and evaluation protocol for video anomaly detection

Progress in video anomaly detection research is currently slowed by small datasets that lack a wide variety of activities as well as flawed evaluation criteria. This paper aims to help move this research effort forward by introducing a large and varied new dataset called Street Scene, as well as two new evaluation criteria that provide a better estimate of how an algorithm will perform in practice. In addition to the new dataset and evaluation criteria, we present two variations of a novel baseline video anomaly detection algorithm and show they are much more accurate on Street Scene than two state-of-the-art algorithms from the literature.

arXiv preprint (2020)

Survey of Machine Learning Accelerators

New machine learning accelerators are being announced and released each month for a variety of applications, ranging from speech recognition, video object detection, and assisted driving to many data center applications. This paper updates the survey of AI accelerators and processors from last year's IEEE-HPEC paper. This paper collects and summarizes the current accelerators that have been publicly announced with performance and power consumption numbers. The performance and power values are plotted on a scatter graph, and a number of dimensions and observations from the trends on this plot are discussed and analyzed. For instance, there are interesting trends in the plot regarding power consumption, numerical precision, and inference versus training. This year, there are many more announced accelerators, implemented with many more architectures and technologies, from vector engines, dataflow engines, and neuromorphic designs to flash-based analog memory processing and photonic-based processing.

arXiv preprint (2019)

Measuring the Gain of a Micro-Channel Plate/Phosphor Assembly Using a Convolutional Neural Network

This paper presents a technique to measure the gain of a single-plate micro-channel plate (MCP)/phosphor assembly by using a convolutional neural network to analyse images of the phosphor screen, recorded by a charge coupled device. The neural network reduces the background noise in the images sufficiently that individual electron events can be identified. From the denoised images, an algorithm determines the average intensity recorded on the phosphor associated with a single electron hitting the MCP. From this average single-particle-intensity, along with measurements of the charge of bunches after amplification by the MCP, we were able to deduce the gain curve of the MCP.

arXiv preprint (2016)

Benchmarking SciDB Data Import on HPC Systems

SciDB is a scalable, computational database management system that uses an array model for data storage. The array data model of SciDB makes it ideally suited for storing and managing large amounts of imaging data. SciDB is designed to support advanced analytics in the database, thus reducing the need to extract data for analysis. It is designed to be massively parallel and can run on commodity hardware in a high performance computing (HPC) environment. In this paper, we present the performance of SciDB using simulated image data. The Dynamic Distributed Dimensional Data Model (D4M) software is used to implement the benchmark on a cluster running the MIT SuperCloud software stack. A peak performance of 2.2M database inserts per second was achieved on a single node of this system. We also show that SciDB and the D4M toolbox provide more efficient ways to access random sub-volumes of massive datasets compared to the traditional approach of reading volumetric data from individual files. This work describes the D4M and SciDB tools we developed and presents the initial performance results. This performance was achieved by using parallel inserts and in-database merging of arrays, as well as supercomputing techniques such as distributed arrays and single-program-multiple-data programming.

arXiv preprint (2016)

Enhancing HPC Security with a User-Based Firewall

HPC systems traditionally allow their users unrestricted use of their internal network. While this network is normally controlled enough to guarantee privacy without the need for encryption, it does not provide a method to authenticate peer connections. Protocols built upon this internal network must provide their own authentication. Many methods have been employed to perform this authentication. However, each of these methods requires the HPC application developer to build in support and the user to configure and enable the corresponding services. The user-based firewall capability we have prototyped enables a set of rules governing connections across the HPC internal network to be put into place using Linux netfilter. By using an operating-system-level capability, the system does not rely on any developer or user actions to enable security. The rules we have chosen and implemented are crafted to not impact the vast majority of users and to be completely invisible to them.

arXiv preprint (2015)

SKA synergy with Microwave Background studies

The extremely high sensitivity and resolution of the Square Kilometre Array (SKA) will be useful for addressing a wide set of themes relevant for cosmology, in synergy with current and future cosmic microwave background (CMB) projects. Many of these themes also have a link with future optical-IR and X-ray observations. We discuss the scientific perspectives for these goals, the instrumental requirements and the observational and data analysis approaches, and identify several topics that are important for cosmology and astrophysics at different cosmic epochs.

arXiv preprint (2014)

The ALFALFA "Almost Darks" Campaign: Pilot VLA HI Observations of Five High Mass-to-Light Ratio Systems

We present VLA HI spectral line imaging of 5 sources discovered by ALFALFA. These targets are drawn from a larger sample of systems that were not uniquely identified with optical counterparts during ALFALFA processing, and as such have unusually high HI mass to light ratios. These candidate "Almost Dark" objects fall into 4 categories: 1) objects with nearby HI neighbors that are likely of tidal origin; 2) objects that appear to be part of a system of multiple HI sources, but which may not be tidal in origin; 3) objects isolated from nearby ALFALFA HI detections, but located near a gas-poor early-type galaxy; 4) apparently isolated sources, with no object of coincident redshift within ~400 kpc. Roughly 75% of the 200 objects without identified counterparts in the α.40 database (Haynes et al. 2011) fall into category 1. This pilot sample contains the first five sources observed as part of a larger effort to characterize HI sources with no readily identifiable optical counterpart at single dish resolution. These objects span a range of HI mass [7.41 < log(M_HI) < 9.51] and HI mass to B-band luminosity ratios (3 < M_HI/L_B < 9). We compare the HI total intensity and velocity fields to SDSS optical imaging and to archival GALEX UV imaging. Four of the sources with uncertain or no optical counterpart in the ALFALFA data are identified with low surface brightness optical counterparts in SDSS imaging when compared with VLA HI intensity maps, and appear to be galaxies with clear signs of ordered rotation. One source (AGC 208602) is likely tidal in nature. We find no "dark galaxies" in this limited sample. The present observations reveal complex sources with suppressed star formation, highlighting both the observational difficulties and the necessity of synthesis follow-up observations to understand these extreme objects. (abridged)

arXiv preprint (2011)

The Relationship between Accretion Disc Age and Stellar Age and its Consequences for Proto-Stellar Discs

We show that for young stars which are still accreting and for which measurements of stellar age, disc mass and accretion rate are available, nominal disc age (Disc Age = Disc Mass / Accretion Rate) is approximately equal to the stellar age, at least within the considerable observational scatter. We then consider theoretical models of proto-stellar discs through analytic and numerical models. A variety of viscosity prescriptions including empirical power laws, magnetohydrodynamic turbulence and gravitational instability were considered within models describing the disc phenomena of dead zones, photoevaporation and planet formation. These models are generally poor fits to the observational data, showing values of 'Disc Age' which are too high by factors of 3 - 10. We then ask whether a systematic error in the measurement of one of the observational quantities might provide a reasonable explanation for this discrepancy. We show that for the observed systems only disc mass shows a systematic dependence on the value of 'Disc Age / Stellar Age' and we note that a systematic underestimate of the value of disc mass by a factor of around 3 - 5, would account for the discrepancy between theory and observations.
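The nominal disc age used here is a single division, Disc Age = Disc Mass / Accretion Rate, which is then compared with the stellar age. The sketch assumes disc mass in solar masses and accretion rate in solar masses per year, so the result is in years; the function name and the example values are illustrative, not measurements from the paper.

```python
def nominal_disc_age_yr(disc_mass_msun: float, accretion_rate_msun_per_yr: float) -> float:
    """Nominal 'Disc Age' = disc mass / accretion rate, in years for M_sun and M_sun/yr inputs."""
    if accretion_rate_msun_per_yr <= 0:
        raise ValueError("accretion rate must be positive")
    return disc_mass_msun / accretion_rate_msun_per_yr


if __name__ == "__main__":
    # Illustrative values only, not measurements from the paper.
    age = nominal_disc_age_yr(0.01, 1e-8)
    stellar_age_yr = 2e6
    print(f"disc age = {age:.2e} yr, ratio to stellar age = {age / stellar_age_yr:.2f}")
```

A ratio near 1 corresponds to the observational result described above; a disc mass underestimated by some factor would lower the nominal disc age by that same factor, which is the lever the abstract uses to reconcile theory and observation.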
