Source author record

Michael O'Boyle

Michael O'Boyle appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Programming Languages Computer Vision Distributed, Parallel, and Cluster Computing Software Engineering

Catalog footprint

What is connected

6works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

Neural Architecture Search as Program Transformation Exploration

Improving the performance of deep neural networks (DNNs) is important to both the compiler and neural architecture search (NAS) communities. Compilers apply program transformations in order to exploit hardware parallelism and memory hierarchy. However, legality concerns mean they fail to exploit the natural robustness of neural networks. In contrast, NAS techniques mutate networks by operations such as the grouping or bottlenecking of convolutions, exploiting the resilience of DNNs. In this work, we express such neural architecture operations as program transformations whose legality depends on a notion of representational capacity. This allows them to be combined with existing transformations into a unified optimization framework. This unification allows us to express existing NAS operations as combinations of simpler transformations. Crucially, it allows us to generate and explore new tensor convolutions. We prototyped the combined framework in TVM and were able to find optimizations across different DNNs, that significantly reduce inference time - over 3$\times$ in the majority of cases. Furthermore, our scheme dramatically reduces NAS search time. Code is available at~\href{https://github.com/jack-willturner/nas-as-program-transformation-exploration}{this https url}.

preprint2020arXiv

BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget

The desire to map neural networks to varying-capacity devices has led to the development of a wealth of compression techniques, many of which involve replacing standard convolutional blocks in a large network with cheap alternative blocks. However, not all blocks are created equally; for a required compute budget there may exist a potent combination of many different cheap blocks, though exhaustively searching for such a combination is prohibitively expensive. In this work, we develop BlockSwap: a fast algorithm for choosing networks with interleaved block types by passing a single minibatch of training data through randomly initialised networks and gauging their Fisher potential. These networks can then be used as students and distilled with the original large network as a teacher. We demonstrate the effectiveness of the chosen networks across CIFAR-10 and ImageNet for classification, and COCO for detection, and provide a comprehensive ablation study of our approach. BlockSwap quickly explores possible block configurations using a simple architecture ranking system, yielding highly competitive networks in orders of magnitude less time than most architecture search techniques (e.g. under 5 minutes on a single GPU for CIFAR-10). Code is available at https://github.com/BayesWatch/pytorch-blockswap.

preprint2020arXiv

M3: Semantic API Migrations

Library migration is a challenging problem, where most existing approaches rely on prior knowledge. This can be, for example, information derived from changelogs or statistical models of API usage. This paper addresses a different API migration scenario where there is no prior knowledge of the target library. We have no historical changelogs and no access to its internal representation. To tackle this problem, this paper proposes a novel approach (M$^3$), where probabilistic program synthesis is used to semantically model the behavior of library functions. Then, we use an SMT-based code search engine to discover similar code in user applications. These discovered instances provide potential locations for API migrations. We evaluate our approach against 7 well-known libraries from varied application domains, learning correct implementations for 94 functions. Our approach is integrated with standard compiler tooling, and we use this integration to evaluate migration opportunities in 9 existing C/C++ applications with over 1MLoC. We discover over 7,000 instances of these functions, of which more than 2,000 represent migration opportunities.

preprint2020arXiv

Optimizing Grouped Convolutions on Edge Devices

When deploying a deep neural network on constrained hardware, it is possible to replace the network's standard convolutions with grouped convolutions. This allows for substantial memory savings with minimal loss of accuracy. However, current implementations of grouped convolutions in modern deep learning frameworks are far from performing optimally in terms of speed. In this paper we propose Grouped Spatial Pack Convolutions (GSPC), a new implementation of grouped convolutions that outperforms existing solutions. We implement GSPC in TVM, which provides state-of-the-art performance on edge devices. We analyze a set of networks utilizing different types of grouped convolutions and evaluate their performance in terms of inference time on several edge devices. We observe that our new implementation scales well with the number of groups and provides the best inference times in all settings, improving the existing implementations of grouped convolutions in TVM, PyTorch and TensorFlow Lite by 3.4x, 8x and 4x on average respectively. Code is available at https://github.com/gecLAB/tvm-GSPC/

preprint2020arXiv

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Convolutional Neural Networks (CNN) are becoming a common presence in many applications and services, due to their superior recognition accuracy. They are increasingly being used on mobile devices, many times just by porting large models designed for server space, although several model compression techniques have been considered. One model compression technique intended to reduce computations is channel pruning. Mobile and embedded systems now have GPUs which are ideal for the parallel computations of neural networks and for their lower energy cost per operation. Specialized libraries perform these neural network computations through highly optimized routines. As we find in our experiments, these libraries are optimized for the most common network shapes, making uninstructed channel pruning inefficient. We evaluate higher level libraries, which analyze the input characteristics of a convolutional layer, based on which they produce optimized OpenCL (Arm Compute Library and TVM) and CUDA (cuDNN) code. However, in reality, these characteristics and subsequent choices intended for optimization can have the opposite effect. We show that a reduction in the number of convolutional channels, pruning 12% of the initial size, is in some cases detrimental to performance, leading to 2x slowdown. On the other hand, we also find examples where performance-aware pruning achieves the intended results, with performance speedups of 3x with cuDNN and above 10x with Arm Compute Library and TVM. Our findings expose the need for hardware-instructed neural network pruning.

preprint2020arXiv

Retrofitting Symbolic Holes to LLVM IR

Symbolic holes are one of the fundamental building blocks of solver-aided and interactive programming. Unknown values can be soundly integrated into programs, and automated tools such as SAT solvers can be used to prove properties of programs containing them. However, supporting symbolic holes in a programming language is challenging; specifying interactions of holes with the type system and execution semantics requires careful design. This paper motivates and introduces the implementation of symbolic holes with unknown type to LLVM IR, a strongly-typed compiler intermediate language. We describe how such holes can be implemented safely by abstracting unsound and type-unsafe details behind a new primitive IR manipulation. Our implementation co-operates well with existing features such as type and dependency checking. Finally, we highlight potentially fruitful areas for investigation using our implementation.

Michael O'Boyle

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Neural Architecture Search as Program Transformation Exploration

BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget

M3: Semantic API Migrations

Optimizing Grouped Convolutions on Edge Devices

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Retrofitting Symbolic Holes to LLVM IR

Michael O&#39;Boyle

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Neural Architecture Search as Program Transformation Exploration

BlockSwap: Fisher-guided Block Substitution for Network Compression on a Budget

M3: Semantic API Migrations

Optimizing Grouped Convolutions on Edge Devices

Performance Aware Convolutional Neural Network Channel Pruning for Embedded GPUs

Retrofitting Symbolic Holes to LLVM IR

Michael O'Boyle