Source author record

Alex Jones

Alex Jones appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Topics: Artificial Intelligence; Machine Learning; Computation and Language; cond-mat.mtrl-sci; Distributed, Parallel, and Cluster Computing; Hardware Architecture; Information Theory; math.FA; math.IT

Catalog footprint

What is connected

6 works

9 topics

4 close collaborators


Preprint · 2023 · arXiv

CHARM: Composing Heterogeneous Accelerators for Matrix Multiply on Versal ACAP Architecture

Dense matrix multiply (MM) is one of the most heavily used kernels in deep learning applications. To cope with the high computation demands of these applications, heterogeneous architectures featuring both FPGA and dedicated ASIC accelerators have emerged as promising platforms. For example, the AMD/Xilinx Versal ACAP architecture combines general-purpose CPU cores and programmable logic with AI Engine (AIE) processors optimized for AI/ML. With 400 AIEs, it provides up to 6.4 TFLOPs of performance for 32-bit floating-point data. However, machine learning models often contain both large and small MM operations. While large MM operations can be parallelized efficiently across many cores, small MM operations typically cannot. We observe that executing some small MM layers from the BERT natural language processing model on a large, monolithic MM accelerator in Versal ACAP achieves less than 5% of the theoretical peak performance. Therefore, one key question arises: how can we design accelerators that fully use the abundant computation resources under limited communication bandwidth for applications with multiple MM layers of diverse sizes? We identify the biggest system throughput bottleneck as the mismatch between the massive computation resources of one monolithic accelerator and the various small MM layers in the application. To resolve this problem, we propose the CHARM framework, which composes multiple diverse MM accelerator architectures that work concurrently on different layers of one application. We deploy CHARM for four applications (BERT, ViT, NCF, and MLP) on the AMD Versal ACAP VCK190 evaluation board. Our experiments show inference throughputs of 1.46 TFLOPs, 1.61 TFLOPs, 1.74 TFLOPs, and 2.94 TFLOPs for BERT, ViT, NCF, and MLP, respectively, corresponding to 5.40x, 32.51x, 1.00x, and 1.00x throughput gains over one monolithic accelerator.
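Why small layers waste a large accelerator can be illustrated with a padding-only utilization model. This is a minimal sketch, not CHARM's analytical model: it assumes a hypothetical monolithic accelerator that always processes whole `(Tm, Tk, Tn)` tiles, and it ignores the communication-bandwidth limits the authors also identify. The final loop simply divides each reported throughput by the 6.4 TFLOPs fp32 peak quoted in the abstract.

```python
import math

def padded_utilization(m, k, n, tile):
    """Fraction of MAC slots doing useful work when an (m x k) @ (k x n)
    multiply is padded up to whole (Tm, Tk, Tn) tiles.
    The tile shape is a hypothetical parameter, not a Versal AIE figure."""
    tm, tk, tn = tile
    padded = (math.ceil(m / tm) * tm) * (math.ceil(k / tk) * tk) * (math.ceil(n / tn) * tn)
    return (m * k * n) / padded

PEAK_TFLOPS = 6.4  # fp32 peak for 400 AIEs, as stated in the abstract

# Reported inference throughputs (TFLOPs) from the abstract.
reported = {"BERT": 1.46, "ViT": 1.61, "NCF": 1.74, "MLP": 2.94}

if __name__ == "__main__":
    big_tile = (512, 512, 512)  # hypothetical monolithic accelerator tile
    print(padded_utilization(4096, 4096, 4096, big_tile))  # large layer
    print(padded_utilization(64, 768, 64, big_tile))       # small layer
    for app, tflops in reported.items():
        print(f"{app}: {tflops / PEAK_TFLOPS:.1%} of peak")
```

With the hypothetical 512-wide tile, a 4096 x 4096 x 4096 layer fills every tile, while a 64 x 768 x 64 layer keeps only about 1.2% of the MAC slots busy. Running several smaller accelerators side by side, as the abstract describes, attacks exactly this waste.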

Preprint · 2022 · arXiv

Finetuning a Kalaallisut-English machine translation system using web-crawled data

West Greenlandic, known by native speakers as Kalaallisut, is an extremely low-resource polysynthetic language spoken by around 56,000 people in Greenland. Here, we attempt to finetune a pretrained Kalaallisut-to-English neural machine translation (NMT) system using web-crawled pseudoparallel sentences from around 30 multilingual websites. We compile a corpus of over 93,000 Kalaallisut sentences and over 140,000 Danish sentences, then use cross-lingual sentence embeddings and approximate nearest-neighbor search in an attempt to mine near-translations from these corpora. Finally, we translate the Danish sentences to English to obtain a synthetic Kalaallisut-English aligned corpus. Although the resulting dataset is too small and noisy to improve the pretrained MT model, we believe that with additional resources, we could construct a better pseudoparallel corpus and achieve more promising MT results. We also note other possible uses of the monolingual Kalaallisut data and discuss directions for future work. We make the code and data for our experiments publicly available.
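The mining step pairs each Kalaallisut sentence with its most similar Danish sentence in a shared embedding space. The sketch below is an illustration under stated assumptions: it uses exact cosine-similarity search in place of the approximate nearest-neighbor index the authors mention, keeps only mutual nearest neighbors above a hypothetical `threshold`, and takes precomputed embedding vectors as input rather than calling any particular cross-lingual encoder.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mine_pairs(src_embs, tgt_embs, threshold=0.8):
    """Return (src_index, tgt_index, similarity) for mutual nearest neighbours.

    Exact O(len(src) * len(tgt)) search stands in for an approximate
    nearest-neighbour index; `threshold` is a hypothetical cut-off.
    """
    sims = [[cosine(s, t) for t in tgt_embs] for s in src_embs]
    # Best target for each source sentence, and best source for each target.
    best_tgt = [max(range(len(tgt_embs)), key=row.__getitem__) for row in sims]
    best_src = [max(range(len(src_embs)), key=lambda i: sims[i][j])
                for j in range(len(tgt_embs))]
    pairs = []
    for i, j in enumerate(best_tgt):
        # Keep the pair only if the match agrees in both directions.
        if best_src[j] == i and sims[i][j] >= threshold:
            pairs.append((i, j, sims[i][j]))
    return pairs
```

Mutual-nearest-neighbor filtering discards one-sided matches (a Danish sentence that is merely the least-bad option for some Kalaallisut sentence); whether the paper applies this exact criterion is not stated in the abstract.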

Preprint · 2022 · arXiv

H2H: Heterogeneous Model to Heterogeneous System Mapping with Computation and Communication Awareness

The complex nature of real-world problems calls for heterogeneity in both machine learning (ML) models and hardware systems. The heterogeneity in ML models comes from multi-sensor perception and multi-task learning, i.e., multi-modality multi-task (MMMT), resulting in diverse deep neural network (DNN) layers and computation patterns. The heterogeneity in systems comes from diverse processing components, as integrating multiple dedicated accelerators into one system becomes the prevailing approach. Therefore, a new problem emerges: heterogeneous model to heterogeneous system mapping (H2H). While previous mapping algorithms mostly focus on efficient computation, in this work we argue that it is indispensable to consider computation and communication simultaneously for better system efficiency. We propose a novel H2H mapping algorithm with both computation and communication awareness; by slightly trading computation for communication, it substantially reduces overall system latency and energy consumption. Evaluated with MAESTRO modeling, our approach achieves 15%-74% latency reduction and 23%-64% energy reduction compared with existing computation-prioritized mapping algorithms.
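To make the computation/communication trade-off concrete, here is a toy cost model for mapping a chain of layers onto two accelerators. It is a hypothetical illustration, not the H2H algorithm or MAESTRO: the per-layer compute times and inter-accelerator transfer costs are made-up inputs, and the communication-aware variant is a simple greedy rule chosen for brevity.

```python
# Hypothetical per-layer compute times (arbitrary units) on accelerators "A" and "B".
COMPUTE = [
    {"A": 1.0, "B": 2.0},
    {"A": 3.0, "B": 2.5},
    {"A": 1.0, "B": 2.0},
]
# Hypothetical cost of moving layer i's input to a different accelerator.
TRANSFER = [0.0, 2.0, 2.0]

def total_latency(mapping, compute, transfer):
    """Latency of running the layers sequentially under `mapping`.

    compute[i][acc] is layer i's run time on accelerator `acc`;
    transfer[i] is paid when layer i runs on a different accelerator
    than layer i-1 (its input has to move).
    """
    latency = 0.0
    for i, acc in enumerate(mapping):
        latency += compute[i][acc]
        if i > 0 and acc != mapping[i - 1]:
            latency += transfer[i]
    return latency

def compute_only_mapping(compute):
    """Pick the fastest accelerator for every layer, ignoring data movement."""
    return [min(costs, key=costs.get) for costs in compute]

def comm_aware_mapping(compute, transfer):
    """Greedy: include the transfer cost when choosing each layer's accelerator."""
    mapping = []
    for i, costs in enumerate(compute):
        def cost(acc):
            moved = i > 0 and acc != mapping[-1]
            return costs[acc] + (transfer[i] if moved else 0.0)
        mapping.append(min(costs, key=cost))
    return mapping

if __name__ == "__main__":
    for name, m in [("compute-only", compute_only_mapping(COMPUTE)),
                    ("comm-aware", comm_aware_mapping(COMPUTE, TRANSFER))]:
        print(name, m, total_latency(m, COMPUTE, TRANSFER))
```

With these made-up numbers, the compute-only mapping moves the middle layer to B and pays two transfers (8.5 units in total), while the greedy communication-aware mapping keeps every layer on A (5.0 units): a miniature version of giving up a little computation efficiency to cut overall latency.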

Preprint · 2022 · arXiv

Learning Deep Graph Representations via Convolutional Neural Networks

Graph-structured data arise in many scenarios. A fundamental problem is to quantify the similarities of graphs for tasks such as classification. R-convolution graph kernels are positive-semidefinite functions that decompose graphs into substructures and compare them. One problem in the effective implementation of this idea is that the substructures are not independent, which leads to a high-dimensional feature space. In addition, graph kernels cannot capture the high-order complex interactions between vertices. To mitigate these two problems, we propose a framework called DeepMap to learn deep representations for graph feature maps. The learned deep representation for a graph is a dense and low-dimensional vector that captures complex high-order interactions in a vertex neighborhood. DeepMap extends Convolutional Neural Networks (CNNs) to arbitrary graphs by generating aligned vertex sequences and building the receptive field for each vertex. We empirically validate DeepMap on various graph classification benchmarks and demonstrate that it achieves state-of-the-art performance.
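The idea of aligned vertex sequences with fixed-size receptive fields can be sketched for a small undirected graph given as an adjacency dict. This is a simplified illustration, not DeepMap's exact procedure: vertices are ranked by degree (ties broken by id) as a stand-in for the paper's ordering, each receptive field is a rank-ordered breadth-first neighborhood of fixed size `k`, and missing slots are filled with a hypothetical dummy vertex `-1`. Stacking the `k`-wide rows in rank order gives the kind of regular input a 1-D CNN can slide over.

```python
from collections import deque

def vertex_order(adj):
    """Rank vertices by degree (descending), breaking ties by vertex id.
    Degree is an illustrative stand-in for the paper's ranking."""
    return sorted(adj, key=lambda v: (-len(adj[v]), v))

def receptive_field(adj, root, k, pad=-1):
    """First k vertices reached by a rank-ordered BFS from `root`,
    padded with the dummy id `pad` when the component is too small."""
    rank = {v: i for i, v in enumerate(vertex_order(adj))}
    field, seen, queue = [root], {root}, deque([root])
    while queue and len(field) < k:
        v = queue.popleft()
        for u in sorted(adj[v], key=rank.__getitem__):
            if u not in seen:
                seen.add(u)
                field.append(u)
                queue.append(u)
                if len(field) == k:
                    break
    return field + [pad] * (k - len(field))

def receptive_fields(adj, k):
    """One k-wide row per vertex, in rank order: the aligned input sequence."""
    return [receptive_field(adj, v, k) for v in vertex_order(adj)]

if __name__ == "__main__":
    graph = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1], 3: [0]}
    for row in receptive_fields(graph, 3):
        print(row)
```

Because every row has the same width and the same ordering rule, graphs of different sizes and shapes map to comparable tensors, which is what lets a standard convolution apply.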

Preprint · 2016 · arXiv

Analyzing the structure of multidimensional compressed sensing problems through coherence

Recently, it has been established that asymptotic incoherence can be used to facilitate subsampling, in order to optimize reconstruction quality, in a variety of continuous compressed sensing problems, and the coherence structure of certain one-dimensional Fourier sampling problems was determined. This paper extends the analysis of asymptotic incoherence to cover multidimensional reconstruction problems. It is shown that Fourier sampling and separable wavelet sparsity in any dimension can yield the same optimal asymptotic incoherence as in the one-dimensional case. Moreover, in two dimensions the coherence structure is compatible with many standard two-dimensional sampling schemes that are currently in use. However, in higher-dimensional problems with poor wavelet smoothness, we demonstrate that there are considerable restrictions on how one can subsample from the Fourier basis with optimal incoherence. This can be remedied by using a sufficiently smooth generating wavelet. It is also shown that using tensor bases will always provide suboptimal decay marred by problems associated with dimensionality. The impact of asymptotic incoherence on the ability to subsample is demonstrated with some simple two-dimensional numerical experiments.
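Coherence here is the largest magnitude entry of the change-of-basis matrix between the sampling basis (Fourier) and the sparsity basis (wavelets). The finite-dimensional sketch below only illustrates that quantity in one dimension; the size `n`, the unitary DFT, and the Haar wavelet standing in for a general separable wavelet are all assumptions, and the paper's setting is continuous and multidimensional. The global coherence is 1 because both bases contain the constant vector, while every row away from zero frequency has a noticeably smaller maximum. That row-dependent structure is what makes subsampling possible despite a worst-case coherence of 1.

```python
import cmath
import math

def haar_basis(n):
    """Orthonormal Haar basis of R^n as a list of rows; n must be a power of 2."""
    if n == 1:
        return [[1.0]]
    s = 1 / math.sqrt(2)
    coarse = haar_basis(n // 2)
    # Upsample each coarser row (Kronecker product with [1, 1] / sqrt(2)).
    top = [[s * x for x in row for _ in range(2)] for row in coarse]
    # Finest-scale wavelets (identity Kronecker [1, -1] / sqrt(2)).
    bottom = [[s if j == 2 * i else -s if j == 2 * i + 1 else 0.0 for j in range(n)]
              for i in range(n // 2)]
    return top + bottom

def fourier_basis(n):
    """Rows of the unitary DFT matrix."""
    return [[cmath.exp(-2j * cmath.pi * k * t / n) / math.sqrt(n) for t in range(n)]
            for k in range(n)]

def change_of_basis(n):
    """U[k][j] = sum_t f_k[t] * w_j[t] for Fourier row k and Haar vector j."""
    haar = haar_basis(n)
    return [[sum(a * b for a, b in zip(f, w)) for w in haar] for f in fourier_basis(n)]

def row_coherence(n):
    """Largest |U[k][j]| in each frequency row k."""
    return [max(abs(x) for x in row) for row in change_of_basis(n)]

def coherence(n):
    """Global coherence: the largest |U[k][j]| over the whole matrix."""
    return max(row_coherence(n))

if __name__ == "__main__":
    print([round(c, 3) for c in row_coherence(8)])
```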

Preprint · 2015 · arXiv

Continuous Compressed Sensing of Inelastic and Quasielastic Helium Atom Scattering Spectra

Helium atom scattering (HAS) is a well-established technique for examining the surface structure and dynamics of materials at atomic-scale resolution. The HAS technique of helium spin-echo spectroscopy opens up the possibility of compressing the data acquisition process. Compressed sensing (CS) methods demonstrating the compressibility of spin-echo spectra are presented. In addition, wavelet-based CS approximations, founded on a new continuous CS approach, are used to construct continuous spectra that are compatible with variable transformations to the energy/momentum transfer domain. Moreover, recent developments in structured multilevel sampling, which have been shown empirically and theoretically to substantially improve upon state-of-the-art CS techniques, are implemented. These techniques are demonstrated on several examples, including phonon spectra from a gold surface.
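Multilevel sampling splits the measurement indices into bands and samples each band at its own rate, typically densely in the first band and more sparsely further out. A minimal sketch, assuming hypothetical band edges and sampling fractions (not values from the paper) and uniform random selection within each band:

```python
import random

def multilevel_mask(band_edges, fractions, seed=0):
    """Pick sample indices band by band.

    band_edges: increasing boundaries, e.g. [0, 8, 32, 128] gives the bands
    [0, 8), [8, 32), [32, 128).
    fractions: share of indices to keep in each band (between 0 and 1).
    """
    if len(fractions) != len(band_edges) - 1:
        raise ValueError("need one fraction per band")
    rng = random.Random(seed)  # seeded so the mask is reproducible
    chosen = []
    for lo, hi, frac in zip(band_edges, band_edges[1:], fractions):
        count = round(frac * (hi - lo))
        chosen.extend(sorted(rng.sample(range(lo, hi), count)))
    return chosen

if __name__ == "__main__":
    mask = multilevel_mask([0, 8, 32, 128], [1.0, 0.5, 0.125])
    print(len(mask), mask[:10])
```

With these illustrative settings the mask keeps all 8 indices of the first band, 12 of the next 24, and 12 of the last 96, so 32 of 128 measurements in total. The reconstruction itself (the wavelet-based CS approximation) is a separate step not shown here.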
