Source author record

Antonio Rodriguez-Sanchez

Antonio Rodriguez-Sanchez appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Computer Vision, Machine Learning, Artificial Intelligence, Computation and Language

Catalog footprint

What is connected

4 works

4 topics

4 close collaborators

preprint, 2022, arXiv

Greedy-layer Pruning: Speeding up Transformer Models for Natural Language Processing

Fine-tuning transformer models after unsupervised pre-training achieves very high performance on many different natural language processing tasks. Unfortunately, transformers suffer from long inference times, which greatly increase costs in production. One possible solution is knowledge distillation, which addresses this problem by transferring information from large teacher models to smaller student models. Knowledge distillation maintains high performance and reaches high compression rates; nevertheless, the size of the student model is fixed after pre-training and cannot be changed individually for a given downstream task and use case to reach a desired performance/speedup ratio. Another solution, which reduces model size in a much more fine-grained and computationally cheaper fashion, is to prune layers after pre-training. The price to pay is that the performance of layer-wise pruning algorithms is not on par with state-of-the-art knowledge distillation methods. In this paper, Greedy-layer pruning is introduced to (1) outperform the current state of the art for layer-wise pruning, (2) close the performance gap when compared to knowledge distillation, while (3) providing a method to adapt the model size dynamically to reach a desired performance/speedup tradeoff without the need for additional pre-training phases. Our source code is available at https://github.com/deepopinion/greedy-layer-pruning.
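The abstract does not spell out the pruning procedure, so the following is only a minimal sketch of one plausible reading of "greedy" layer-wise pruning: at each step, try removing every remaining layer, score each reduced model, and keep the removal that hurts least. The function `greedy_layer_prune`, the `evaluate` callback, and the toy per-layer scores are all hypothetical and are not taken from the paper or its repository. In practice `evaluate` would presumably fine-tune and validate the reduced model on the downstream task.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_layer_prune(
    layers: Sequence[int],
    evaluate: Callable[[Tuple[int, ...]], float],
    n_prune: int,
) -> List[int]:
    """Greedily drop up to `n_prune` layers, one at a time (illustrative only).

    At each step every remaining layer is tentatively removed, the reduced
    model is scored with `evaluate` (higher is better), and the removal
    that yields the best score is made permanent.
    """
    kept = list(layers)
    for _ in range(n_prune):
        if len(kept) <= 1:
            break  # never prune the last remaining layer
        # Each candidate is the current layer list with exactly one layer removed.
        candidates = [kept[:i] + kept[i + 1:] for i in range(len(kept))]
        kept = max(candidates, key=lambda cand: evaluate(tuple(cand)))
    return kept


# Toy stand-in for "fine-tune the reduced model and measure validation score":
# each layer contributes a fixed, made-up amount to the score (assumption).
TOY_CONTRIBUTION = {0: 0.30, 1: 0.05, 2: 0.20, 3: 0.02, 4: 0.25, 5: 0.10}


def toy_evaluate(kept_layers: Tuple[int, ...]) -> float:
    return sum(TOY_CONTRIBUTION[i] for i in kept_layers)


pruned = greedy_layer_prune(range(6), toy_evaluate, n_prune=3)
print(pruned)  # -> [0, 2, 4]
```

Because the loop stops after any chosen number of removals, a sketch like this can trade model size against score per task without another pre-training phase, which is the kind of adjustable tradeoff the abstract describes.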

preprint, 2021, arXiv

Auto-tuning of Deep Neural Networks by Conflicting Layer Removal

Designing neural network architectures is a challenging task, and knowing which specific layers of a model must be adapted to improve performance is almost a mystery. In this paper, we introduce a novel methodology to identify layers that decrease the test accuracy of trained models. Conflicting layers are detected as early as the beginning of training. In the worst-case scenario, we prove that such a layer could lead to a network that cannot be trained at all. A theoretical analysis of the origin of those layers that lower overall network performance is provided and complemented by an extensive empirical evaluation. More precisely, we identify those layers that worsen performance because they produce what we call conflicting training bundles. We show that around 60% of the layers of trained residual networks can be completely removed from the architecture with no significant increase in test error. We further present a novel neural architecture search (NAS) algorithm that identifies conflicting layers at the beginning of training. Architectures found by our auto-tuning algorithm achieve competitive accuracy when compared against more complex state-of-the-art architectures, while drastically reducing memory consumption and inference time for different computer vision tasks. The source code is available at https://github.com/peerdavid/conflicting-bundles.

preprint, 2021, arXiv

Limitation of capsule networks

A recently proposed method in deep learning groups multiple neurons into capsules such that each capsule represents an object or part of an object. Routing algorithms route the output of capsules from lower-level layers to upper-level layers. In this paper, we prove that state-of-the-art routing procedures decrease the expressivity of capsule networks. More precisely, it is shown that EM-routing and routing-by-agreement prevent capsule networks from distinguishing inputs from their negative counterparts. Therefore, only symmetric functions can be expressed by capsule networks, and it can be concluded that they are not universal approximators. We also theoretically motivate and empirically show that this limitation negatively affects the training of deep capsule networks. Therefore, we present an incremental improvement for state-of-the-art routing algorithms that resolves this limitation and stabilizes the training of capsule networks.
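The symmetry claim can be illustrated with a small sketch of routing-by-agreement. It assumes a bias-free layer with the usual squash nonlinearity, which may differ from the exact setting analyzed in the paper. The helper names (`squash`, `softmax`, `route`, `lengths`) and the example prediction vectors are hypothetical. Negating every prediction vector negates the output vectors but leaves the agreement terms, and therefore the coupling coefficients and capsule lengths, unchanged.

```python
import math


def squash(s):
    """Squash nonlinearity: keeps the direction, maps the norm into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0 for _ in s]
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(u_hat, iterations=3):
    """Bias-free routing-by-agreement (assumption of this sketch).

    u_hat[i][j] is the prediction vector from lower capsule i for upper
    capsule j. Returns the list of upper-capsule output vectors.
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits
    v = []
    for _ in range(iterations):
        c = [softmax(row) for row in b]  # coupling coefficients per lower capsule
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in)) for d in range(dim)]
            v.append(squash(s))
        # Agreement update: dot product between each prediction and the output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v


def lengths(vectors):
    return [math.sqrt(sum(x * x for x in vec)) for vec in vectors]


# Made-up predictions: 3 lower capsules, 2 upper capsules, 2-dimensional poses.
u_hat = [
    [[0.9, 0.1], [-0.2, 0.4]],
    [[0.7, -0.3], [0.5, 0.5]],
    [[-0.1, 0.8], [0.3, -0.6]],
]
u_hat_neg = [[[-x for x in vec] for vec in row] for row in u_hat]

len_pos = lengths(route(u_hat))
len_neg = lengths(route(u_hat_neg))
print(len_pos)
print(len_neg)
```

Since the capsule length is what is read out as the presence probability, the two printed lists match: in this bias-free sketch the layer responds identically to an input and its negation.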

preprint, 2016, arXiv

Learning Abstract Classes using Deep Learning

Humans are generally good at learning abstract concepts about objects and scenes (e.g. spatial orientation, relative sizes, etc.). In recent years, convolutional neural networks have achieved near-human performance in recognizing concrete classes (i.e. specific object categories). This paper tests the performance of a current CNN (GoogLeNet) on the task of differentiating between abstract classes that are trivially differentiable for humans. We trained and tested the CNN on the two abstract classes of horizontal and vertical orientation and determined how well the network is able to transfer the learned classes to other, previously unseen objects.