Source author record

Edward H. Lee

Edward H. Lee appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Computer Vision Cryptography and Security Distributed, Parallel, and Cluster Computing Emerging Technologies Neural and Evolutionary Computing

Catalog footprint

What is connected

3works

6topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2022arXiv

NanoBatch Privacy: Enabling fast Differentially Private learning on the IPU

Differentially private SGD (DPSGD) has recently shown promise in deep learning. However, compared to non-private SGD, the DPSGD algorithm places computational overheads that can undo the benefit of batching in GPUs. Micro-batching is a common method to alleviate this and is fully supported in the TensorFlow Privacy library (TFDP). However, it degrades accuracy. We propose NanoBatch Privacy, a lightweight add-on to TFDP to be used on Graphcore IPUs by leveraging batch size of 1 (without microbatching) and gradient accumulation. This allows us to achieve large total batch sizes with minimal impacts to throughput. Second, we illustrate using Cifar-10 how larger batch sizes are not necessarily optimal from a privacy versus utility perspective. On ImageNet, we achieve more than 15x speedup over TFDP versus 8x A100s and significant speedups even across libraries such as Opacus. We also provide two extensions: 1) DPSGD for pipelined models and 2) per-layer clipping that is 15x faster than the Opacus implementation on 8x A100s. Finally as an application case study, we apply NanoBatch training for use on private Covid-19 chest CT prediction.

preprint2016arXiv

Analysis and Design of a Passive Switched-Capacitor Matrix Multiplier for Approximate Computing

A switched-capacitor matrix multiplier is presented for approximate computing and machine learning applications. The multiply-and-accumulate operations perform discrete-time charge-domain signal processing using passive switches and 300 aF unit capacitors. The computation is digitized with a 6 b asynchronous successive approximation register analog-to-digital converter. The analyses of incomplete charge accumulation and thermal noise are discussed. The design was fabricated in 40 nm CMOS, and experimental measurements of multiplication are illustrated using matched filtering and image convolutions to analyze noise and offset. Two applications are highlighted: 1) energy-efficient feature extraction layer performing both compression and classification in a neural network for an analog front end and 2) analog acceleration for solving optimization problems that are traditionally performed in the digital domain. The chip obtains measured efficiencies of 8.7 TOPS/W at 1 GHz for the first application and 7.7 TOPS/W at 2.5 GHz for the second application.

preprint2016arXiv

Convolutional Neural Networks using Logarithmic Data Representation

Recent advances in convolutional neural networks have considered model complexity and hardware efficiency to enable deployment onto embedded systems and mobile devices. For example, it is now well-known that the arithmetic operations of deep networks can be encoded down to 8-bit fixed-point without significant deterioration in performance. However, further reduction in precision down to as low as 3-bit fixed-point results in significant losses in performance. In this paper we propose a new data representation that enables state-of-the-art networks to be encoded to 3 bits with negligible loss in classification performance. To perform this, we take advantage of the fact that the weights and activations in a trained network naturally have non-uniform distributions. Using non-uniform, base-2 logarithmic representation to encode weights, communicate activations, and perform dot-products enables networks to 1) achieve higher classification accuracies than fixed-point at the same resolution and 2) eliminate bulky digital multipliers. Finally, we propose an end-to-end training procedure that uses log representation at 5-bits, which achieves higher final test accuracy than linear at 5-bits.

Edward H. Lee

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

NanoBatch Privacy: Enabling fast Differentially Private learning on the IPU

Analysis and Design of a Passive Switched-Capacitor Matrix Multiplier for Approximate Computing

Convolutional Neural Networks using Logarithmic Data Representation