Researcher profile

Ken Kreutz-Delgado

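Ken Kreutz-Delgado's listed preprints cover FPGA-based deep learning inference, low-power Deep Belief Network classifiers, spiking-neuron models, and write-amplification modeling for solid state drives.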

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 21 (Emerging) | Verification L1 | Unclaimed author
6 works
0 followers
11 topics
4 close collaborators


Published work

6 published items

Preprint | 2021 | arXiv

A Competitive Edge: Can FPGAs Beat GPUs at DCNN Inference Acceleration in Resource-Limited Edge Computing Applications?

When trained as generative models, Deep Learning algorithms have shown exceptional performance on tasks involving high dimensional data such as image denoising and super-resolution. In an increasingly connected world dominated by mobile and edge devices, there is surging demand for these algorithms to run locally on embedded platforms. FPGAs, by virtue of their reprogrammability and low-power characteristics, are ideal candidates for these edge computing applications. As such, we design a spatio-temporally parallelized hardware architecture capable of accelerating a deconvolution algorithm optimized for power-efficient inference on a resource-limited FPGA. We propose this FPGA-based accelerator to be used for Deconvolutional Neural Network (DCNN) inference in low-power edge computing applications. To this end, we develop methods that systematically exploit micro-architectural innovations, design space exploration, and statistical analysis. Using a Xilinx PYNQ-Z2 FPGA, we leverage our architecture to accelerate inference for two DCNNs trained on the MNIST and CelebA datasets using the Wasserstein GAN framework. On these networks, our FPGA design achieves a higher throughput to power ratio with lower run-to-run variation when compared to the NVIDIA Jetson TX1 edge computing GPU.

Preprint | 2021 | arXiv

Generative and Discriminative Deep Belief Network Classifiers: Comparisons Under an Approximate Computing Framework

The use of Deep Learning hardware algorithms for embedded applications is characterized by challenges such as constraints on device power consumption, availability of labeled data, and limited internet bandwidth for frequent training on cloud servers. To enable low power implementations, we consider efficient bitwidth reduction and pruning for the class of Deep Learning algorithms known as Discriminative Deep Belief Networks (DDBNs) for embedded-device classification tasks. We train DDBNs with both generative and discriminative objectives under an approximate computing framework and analyze their power-at-performance for supervised and semi-supervised applications. We also investigate the out-of-distribution performance of DDBNs when the inference data has the same class structure yet is statistically different from the training data owing to dynamic real-time operating environments. Based on our analysis, we provide novel insights and recommendations for choice of training objectives, bitwidth values, and accuracy sensitivity with respect to the amount of labeled data for implementing DDBN inference with minimum power consumption on embedded hardware platforms subject to accuracy tolerances.

Preprint | 2015 | arXiv

Gibbs Sampling with Low-Power Spiking Digital Neurons

Restricted Boltzmann Machines and Deep Belief Networks have been successfully used in a wide variety of applications including image classification and speech recognition. Inference and learning in these algorithms use a Markov Chain Monte Carlo procedure called Gibbs sampling. A sigmoidal function forms the kernel of this sampler, which can be realized from the firing statistics of noisy integrate-and-fire neurons on a neuromorphic VLSI substrate. This paper demonstrates such an implementation on an array of digital spiking neurons with stochastic leak and threshold properties for inference tasks and presents some key performance metrics for such a hardware-based sampler in both the generative and discriminative contexts.
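As a rough software illustration of the sampler this abstract refers to, the sketch below runs block Gibbs sampling on a small binary Restricted Boltzmann Machine, where each unit turns on with a probability given by a sigmoid of its input. It is not the paper's spiking-neuron hardware implementation. The function names, the weight layout (visible by hidden), and the toy parameters are all assumptions made for this example.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sample_layer(inputs, weights, biases, rng):
    """Sample binary units: P(unit_j = 1) = sigmoid(b_j + sum_i inputs[i] * weights[i][j])."""
    out = []
    for j, b in enumerate(biases):
        activation = b + sum(x * row[j] for x, row in zip(inputs, weights))
        out.append(1 if rng.random() < sigmoid(activation) else 0)
    return out


def gibbs_step(v, W, b_vis, b_hid, rng):
    """One block Gibbs sweep: sample hidden given visible, then visible given hidden.

    W is a list of rows with shape (n_visible, n_hidden); this layout is an assumption.
    """
    h = sample_layer(v, W, b_hid, rng)
    W_T = [list(col) for col in zip(*W)]  # transpose to (n_hidden, n_visible)
    v_new = sample_layer(h, W_T, b_vis, rng)
    return v_new, h


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy parameters chosen only for illustration.
    W = [[0.5, -0.3], [0.1, 0.2], [-0.4, 0.7]]
    v = [1, 0, 1]
    for _ in range(5):
        v, h = gibbs_step(v, W, [0.0, 0.0, 0.0], [0.0, 0.0], rng)
        print(v, h)
```

In the hardware setting the abstract describes, the sigmoid-shaped firing probability would come from the statistics of the noisy neurons rather than from an explicit call to `sigmoid`.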

Preprint | 2015 | arXiv

Mean Time-to-Fire for the Noisy LIF Neuron - A Detailed Derivation of the Siegert Formula

When stimulated by a very large number of Poisson-like presynaptic current input spikes, the temporal dynamics of the soma membrane potential $V(t)$ of a leaky integrate-and-fire (LIF) neuron is typically modeled in the diffusion limit and treated as an Ornstein-Uhlenbeck process (OUP). When the potential reaches a threshold value $θ$, $V(t) = θ$, the LIF neuron fires and the membrane potential is reset to a resting value, $V_0 < θ$, and clamped to this value for a specified (non-stochastic) absolute refractory period $T_r \ge 0$, after which the cycle is repeated. The time between firings is given by the random variable $T_f = T_r + T$ where $T$ is the random time which elapses between the "unpinning" of the membrane potential clamp and the next, subsequent firing of the neuron. The mean time-to-fire, $\widehat{T}_f = \text{E}(T_f) = T_r + \text{E}(T) = T_r + \widehat{T}$, provides a measure $ρ$ of the average firing rate of the neuron, \[ ρ = \widehat{T}_f^{-1} = \frac{1}{T_r + \widehat{T}} . \] This note briefly discusses some aspects of the OUP model and derives the Siegert formula giving the firing rate $ρ = ρ(I_0)$ as a function of an injected current, $I_0$. This is a well-known classical result and no claim to originality is made. The derivation of the firing rate given in this report, which closely follows the derivation outlined in the textbook by Gardiner, minimizes the required mathematical background and is done in some pedagogic detail to facilitate study by graduate students and others who are new to the subject. Knowledge of the material presented in the first five chapters of Gardiner should provide an adequate background for following the derivation given in this note.
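The relation between firing rate and mean time-to-fire in this abstract can be evaluated numerically. The abstract does not state the Siegert formula itself, so the sketch below uses a commonly quoted form of it and should be read as an illustration under stated assumptions, not as the note's own notation. It assumes the free membrane potential follows $τ\,dV = (μ - V)\,dt + σ\sqrt{τ}\,dW$, for which the mean first-passage time from $V_0$ to $θ$ is $\widehat{T} = τ\sqrt{π}\int_{(V_0-μ)/σ}^{(θ-μ)/σ} e^{u^2}(1+\operatorname{erf}(u))\,du$. The symbols `mu`, `sigma`, and `tau`, and the function names, are assumptions; the mapping from the injected current $I_0$ to `mu` is model-specific and is left out.

```python
import math


def mean_passage_time(mu, sigma, tau, theta, v0, n=2000):
    """Mean time for the OU potential to travel from v0 to theta.

    Assumes tau*dV = (mu - V)*dt + sigma*sqrt(tau)*dW (a convention chosen
    for this sketch). Integrates with Simpson's rule; n must be even.
    """
    if n % 2:
        raise ValueError("n must be even for Simpson's rule")
    lo = (v0 - mu) / sigma
    hi = (theta - mu) / sigma

    def integrand(u):
        # exp(u^2) * (1 + erf(u)); erfc(-u) keeps precision for negative u.
        # math.exp overflows for u larger than about 26.
        return math.exp(u * u) * math.erfc(-u)

    h = (hi - lo) / n
    total = integrand(lo) + integrand(hi)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * integrand(lo + k * h)
    return tau * math.sqrt(math.pi) * total * h / 3.0


def firing_rate(mu, sigma, tau, theta, v0, t_ref):
    """Average rate rho = 1 / (T_r + mean passage time), as in the abstract."""
    return 1.0 / (t_ref + mean_passage_time(mu, sigma, tau, theta, v0))


if __name__ == "__main__":
    # Illustrative values only: 20 ms time constant, 2 ms refractory period.
    for mu in (0.5, 1.0, 1.5):
        print(mu, firing_rate(mu, 0.5, 0.02, 1.0, 0.0, 0.002))
```

One check on this form: when the noise is small and `mu` exceeds `theta`, the mean passage time approaches the deterministic LIF value `tau * log((mu - v0) / (mu - theta))`.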

Preprint | 2012 | arXiv

Analysis of Trim Commands on Overprovisioning and Write Amplification in Solid State Drives

This paper presents a performance model of the ATA/ATAPI SSD Trim command under various types of user workloads, including a uniform random workload, a workload with hot and cold data, and a workload with N temperatures of data. We first examine the Trim-modified uniform random workload to predict utilization, then use this result to compute the resultant level of effective overprovisioning. This allows modification of models previously suggested to predict write amplification of a non-Trim uniform random workload under greedy garbage collection. Finally, we expand the theory to cover a workload consisting of hot and cold data (and also N temperatures of data), providing formulas to predict write amplification in these scenarios.

Preprint | 2012 | arXiv

Solid State Disk Object-Based Storage with Trim Commands

This paper presents a model of NAND flash SSD utilization and write amplification when the ATA/ATAPI SSD Trim command is incorporated into object-based storage under a variety of user workloads, including a uniform random workload with objects of fixed size and a uniform random workload with objects of varying sizes. We first summarize the existing models for write amplification in SSDs for workloads with and without the Trim command, then propose an alteration of the models that utilizes a framework of object-based storage. The utilization of objects and pages in the SSD is derived, with the analytic results compared to simulation. Finally, the effect of objects on write amplification and its computation is discussed along with a potential application to optimization of SSD usage through object storage metadata servers that allocate object classes of distinct object size.