Source author record

Eric Guizzo

Eric Guizzo appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

eess.AS Sound Machine Learning Neural and Evolutionary Computing

Catalog footprint

What is connected

3works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2021arXiv

Anti-Transfer Learning for Task Invariance in Convolutional Neural Networks for Speech Processing

We introduce the novel concept of anti-transfer learning for speech processing with convolutional neural networks. While transfer learning assumes that the learning process for a target task will benefit from re-using representations learned for another task, anti-transfer avoids the learning of representations that have been learned for an orthogonal task, i.e., one that is not relevant and potentially misleading for the target task, such as speaker identity for speech recognition or speech content for emotion recognition. In anti-transfer learning, we penalize similarity between activations of a network being trained and another one previously trained on an orthogonal task, which yields more suitable representations. This leads to better generalization and provides a degree of control over correlations that are spurious or undesirable, e.g. to avoid social bias. We have implemented anti-transfer for convolutional neural networks in different configurations with several similarity metrics and aggregation functions, which we evaluate and analyze with several speech and audio tasks and settings, using six datasets. We show that anti-transfer actually leads to the intended invariance to the orthogonal task and to more appropriate features for the target task at hand. Anti-transfer learning consistently improves classification accuracy in all test cases. While anti-transfer creates computation and memory cost at training time, there is relatively little computation cost when using pre-trained models for orthogonal tasks. Anti-transfer is widely applicable and particularly useful where a specific invariance is desirable or where trained models are available and labeled data for orthogonal tasks are difficult to obtain.

preprint2020arXiv

A Neural Network Based Framework for Archetypical Sound Synthesis

This paper describes a preliminary approach to algorithmically reproduce the archetypical structure adopted by humans to classify sounds. In particular, we propose an approach to predict the human perceived chaos/order level in a sound and synthesize new timbres that present the desired amount of this feature. We adopted a Neural Network based method, in order to exploit its inner predisposition to model perceptive and abstract features. We finally discuss the obtained accuracy and possible implications in creative contexts.

preprint2020arXiv

Multi-Time-Scale Convolution for Emotion Recognition from Speech Audio Signals

Robustness against temporal variations is important for emotion recognition from speech audio, since emotion is ex-pressed through complex spectral patterns that can exhibit significant local dilation and compression on the time axis depending on speaker and context. To address this and potentially other tasks, we introduce the multi-time-scale (MTS) method to create flexibility towards temporal variations when analyzing time-frequency representations of audio data. MTS extends convolutional neural networks with convolution kernels that are scaled and re-sampled along the time axis, to increase temporal flexibility without increasing the number of trainable parameters compared to standard convolutional layers. We evaluate MTS and standard convolutional layers in different architectures for emotion recognition from speech audio, using 4 datasets of different sizes. The results show that the use of MTS layers consistently improves the generalization of networks of different capacity and depth, compared to standard convolution, especially on smaller datasets

Eric Guizzo

What is connected

Connect this record

See the researcher in context

Building this map preview

3 published item(s)

Anti-Transfer Learning for Task Invariance in Convolutional Neural Networks for Speech Processing

A Neural Network Based Framework for Archetypical Sound Synthesis

Multi-Time-Scale Convolution for Emotion Recognition from Speech Audio Signals