Source author record

Walter Stechele

Walter Stechele appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Computer Vision; Robotics; Artificial Intelligence; Distributed, Parallel, and Cluster Computing; Machine Learning

Catalog footprint

What is connected

5 works

5 topics

4 close collaborators


Preprint, 2021, arXiv

L2PF -- Learning to Prune Faster

Various applications in the field of autonomous driving are based on convolutional neural networks (CNNs), especially for processing camera data. The optimization of such CNNs is a major challenge in continuous development. Newly learned features must be brought into vehicles as quickly as possible, and as such, it is not feasible to spend redundant GPU hours during compression. In this context, we present Learning to Prune Faster, which details a multi-task, try-and-learn method, discretely learning redundant filters of the CNN and a continuous action of how long the layers have to be fine-tuned. This allows us to significantly speed up the convergence process of learning how to find an embedded-friendly filter-wise pruned CNN. For ResNet20, we have achieved a compression ratio of 3.84x with minimal accuracy degradation. Compared to the state-of-the-art pruning method, we reduced the GPU hours by 1.71x.
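To make "filter-wise pruning" and "compression ratio" concrete, here is a minimal sketch. It is not the L2PF method: the abstract describes a learned, try-and-learn agent, whereas this example swaps in a simple magnitude heuristic (rank filters by L1 norm) purely for illustration. The function names and the toy weights are assumptions.

```python
def l1_norm(filt):
    """Sum of absolute weights of one filter, given as a flat list of floats."""
    return sum(abs(w) for w in filt)


def prune_filters(filters, keep_ratio):
    """Return the indices of the filters to keep, in their original order.

    Assumption: filters with the largest L1 norm are the least redundant.
    L2PF learns which filters to drop instead of using this fixed rule.
    """
    n_keep = max(1, round(len(filters) * keep_ratio))
    ranked = sorted(range(len(filters)),
                    key=lambda i: l1_norm(filters[i]),
                    reverse=True)
    return sorted(ranked[:n_keep])


def compression_ratio(params_before, params_after):
    """Ratio of parameter counts before and after pruning."""
    return params_before / params_after


# Toy layer with four filters of two weights each (hypothetical values).
filters = [[0.1, -0.1], [1.0, 2.0], [0.0, 0.05], [-3.0, 0.5]]
kept = prune_filters(filters, keep_ratio=0.5)
```

With these toy weights, the two filters with the largest norms are kept, and `compression_ratio(384, 100)` evaluates to 3.84, which is how a figure such as the one quoted for ResNet20 would be read.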

Preprint, 2020, arXiv

ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks

Closing the gap between the hardware requirements of state-of-the-art convolutional neural networks and the limited resources constraining embedded applications is the next big challenge in deep learning research. The computational complexity and memory footprint of such neural networks are typically daunting for deployment in resource-constrained environments. Model compression techniques, such as pruning, are emphasized among other optimization methods for solving this problem. Most existing techniques require domain expertise or result in irregular sparse representations, which increase the burden of deploying deep learning applications on embedded hardware accelerators. In this paper, we propose the autoencoder-based low-rank filter-sharing technique (ALF). When applied to various networks, ALF is compared to state-of-the-art pruning methods, demonstrating its efficient compression capabilities on theoretical metrics as well as on an accurate, deterministic hardware model. In our experiments, ALF showed a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.

Preprint, 2020, arXiv

Binary DAD-Net: Binarized Driveable Area Detection Network for Autonomous Driving

Driveable area detection is a key component for various applications in the field of autonomous driving (AD), such as ground-plane detection, obstacle detection and maneuver planning. Additionally, bulky and over-parameterized networks can be easily forgone and replaced with smaller networks for faster inference on embedded systems. The driveable area detection, posed as a two class segmentation task, can be efficiently modeled with slim binary networks. This paper proposes a novel binarized driveable area detection network (binary DAD-Net), which uses only binary weights and activations in the encoder, the bottleneck, and the decoder part. The latent space of the bottleneck is efficiently increased (x32 -> x16 downsampling) through binary dilated convolutions, learning more complex features. Along with automatically generated training data, the binary DAD-Net outperforms state-of-the-art semantic segmentation networks on public datasets. In comparison to a full-precision model, our approach has a x14.3 reduced compute complexity on an FPGA and it requires only 0.9MB memory resources. Therefore, commodity SIMD-based AD-hardware is capable of accelerating the binary DAD-Net.
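One reason networks with binary weights and activations suit SIMD-style hardware is that a dot product over values in {-1, +1} reduces to bitwise XNOR followed by a population count. The sketch below illustrates that general identity only; it is an assumption that the binary DAD-Net implementation uses exactly this formulation, and the helper names are hypothetical.

```python
def pack(signs):
    """Pack a list of +1/-1 values into an int; bit i is set when signs[i] is +1."""
    word = 0
    for i, s in enumerate(signs):
        if s > 0:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two packed {-1, +1} vectors of length n.

    Positions where the bits agree contribute +1 and the rest contribute -1,
    so the result is 2 * (number of agreeing bits) - n.
    """
    mask = (1 << n) - 1
    agree = ~(a_word ^ b_word) & mask  # XNOR restricted to the n valid bits
    return 2 * bin(agree).count("1") - n


a = [1, -1, 1, 1]
b = [1, 1, -1, 1]
result = binary_dot(pack(a), pack(b), len(a))
reference = sum(x * y for x, y in zip(a, b))
```

Here `result` and `reference` agree, and the packed version touches each group of bits with a single XOR and a bit count rather than one multiply-accumulate per element.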

Preprint, 2014, arXiv

Resource Prediction for Humanoid Robots

Humanoid robots are designed to operate in human-centered environments where they execute a multitude of challenging tasks, each differing in complexity, resource requirements, and execution time. In such highly dynamic surroundings it is desirable to anticipate upcoming situations in order to predict future resource requirements such as CPU or memory usage. Resource prediction information is essential for detecting upcoming resource bottlenecks or conflicts and can be used to enhance resource negotiation processes or to perform speculative resource allocation. In this paper we present a prediction model based on Markov chains for predicting the behavior of the humanoid robot ARMAR-III in human-robot interaction scenarios. Robot state information required by the prediction algorithm is gathered through self-monitoring and combined with environmental context information. Adding resource profiles allows generating probability distributions of possible future resource demands. Online learning of model parameters is made possible through disclosure mechanisms provided by the robot framework ArmarX.
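The idea of combining a Markov chain over robot states with per-state resource profiles can be sketched as follows. This is a minimal illustration under stated assumptions, not the model from the paper: the state names, the observed sequence, and the CPU figures are all hypothetical, and a first-order chain estimated from transition counts is assumed.

```python
from collections import defaultdict


def fit_transitions(sequence):
    """Estimate first-order transition probabilities from an observed state sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(sequence, sequence[1:]):
        counts[cur][nxt] += 1
    probs = {}
    for cur, row in counts.items():
        total = sum(row.values())
        probs[cur] = {nxt: c / total for nxt, c in row.items()}
    return probs


def expected_demand(probs, state, profile):
    """Mean resource demand of the next state, given the current state.

    `profile` maps each state to a resource figure (here: hypothetical CPU %).
    """
    return sum(p * profile[nxt] for nxt, p in probs.get(state, {}).items())


# Hypothetical interaction log and resource profile.
observed = ["idle", "listen", "grasp", "idle", "listen", "idle"]
cpu_profile = {"idle": 10, "listen": 30, "grasp": 80}

probs = fit_transitions(observed)
demand = expected_demand(probs, "listen", cpu_profile)
```

In this toy log, "listen" is followed by "grasp" and by "idle" once each, so the next-state distribution is 0.5 / 0.5 and the expected CPU demand is 45.0. Updating the counts as new transitions arrive would be one simple way to approximate online learning of the parameters.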

Preprint, 2014, arXiv

Resource-Aware Programming for Robotic Vision

Humanoid robots are designed to operate in human-centered environments. They face changing, dynamic environments in which they need to fulfill a multitude of challenging tasks. Such tasks differ in complexity, resource requirements, and execution time. The latest computer architectures of humanoid robots consist of several industrial PCs containing single- or dual-core processors. According to the SIA roadmap for semiconductors, many-core chips with hundreds to thousands of cores are expected to be available in the next decade. Utilizing the full power of a chip with huge amounts of resources requires new computing paradigms and methodologies. In this paper, we analyze a resource-aware computing methodology named Invasive Computing to address these challenges. The benefits and limitations of the new programming model are analyzed using two widely used computer vision algorithms, the Harris Corner detector and SIFT (Scale Invariant Feature Transform) feature matching. The results indicate that the new programming model, together with the extensions within the application layer, makes them highly adaptable, leading to better quality in the results obtained.
