Source author record

Felix Friedrich

Felix Friedrich appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Computation and Language Machine Learning Computer Vision cs.CY hep-ex math.ST Statistics Theory

Catalog footprint

What is connected

5works

8topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

EmoNet-Voice: A Fine-Grained, Expert-Verified Benchmark for Speech Emotion Detection

Speech emotion recognition (SER) systems are constrained by existing datasets that typically cover only 6-10 basic emotions, lack scale and diversity, and face ethical challenges when collecting sensitive emotional states. We introduce EMONET-VOICE, a comprehensive resource addressing these limitations through two components: (1) EmoNet-Voice Big, a 5,000-hour multilingual pre-training dataset spanning 40 fine-grained emotion categories across 11 voices and 4 languages, and (2) EmoNet-Voice Bench, a rigorously validated benchmark of 4,7k samples with unanimous expert consensus on emotion presence and intensity levels. Using state-of-the-art synthetic voice generation, our privacy-preserving approach enables ethical inclusion of sensitive emotions (e.g., pain, shame) while maintaining controlled experimental conditions. Each sample underwent validation by three psychology experts. We demonstrate that our Empathic Insight models trained on our synthetic data achieve strong real-world dataset generalization, as tested on EmoDB and RAVDESS. Furthermore, our comprehensive evaluation reveals that while high-arousal emotions (e.g., anger: 95% accuracy) are readily detected, the benchmark successfully exposes the difficulty of distinguishing perceptually similar emotions (e.g., sadness vs. distress: 63% discrimination), providing quantifiable metrics for advancing nuanced emotion AI. EMONET-VOICE establishes a new paradigm for large-scale, ethically-sourced, fine-grained SER research.

preprint2024arXiv

Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis

Models for text-to-image synthesis, such as DALL-E~2 and Stable Diffusion, have recently drawn a lot of interest from academia and the general public. These models are capable of producing high-quality images that depict a variety of concepts and styles when conditioned on textual descriptions. However, these models adopt cultural characteristics associated with specific Unicode scripts from their vast amount of training data, which may not be immediately apparent. We show that by simply inserting single non-Latin characters in a textual description, common models reflect cultural stereotypes and biases in their generated images. We analyze this behavior both qualitatively and quantitatively, and identify a model's text encoder as the root cause of the phenomenon. Additionally, malicious users or service providers may try to intentionally bias the image generation to create racist stereotypes by replacing Latin characters with similarly-looking characters from non-Latin scripts, so-called homoglyphs. To mitigate such unnoticed script attacks, we propose a novel homoglyph unlearning method to fine-tune a text encoder, making it robust against homoglyph manipulations.

preprint2022arXiv

Interactively Providing Explanations for Transformer Language Models

Transformer language models are state of the art in a multitude of NLP tasks. Despite these successes, their opaqueness remains problematic. Recent methods aiming to provide interpretability and explainability to black-box models primarily focus on post-hoc explanations of (sometimes spurious) input-output correlations. Instead, we emphasize using prototype networks directly incorporated into the model architecture and hence explain the reasoning process behind the network's decisions. Our architecture performs on par with several language models and, moreover, enables learning from user interactions. This not only offers a better understanding of language models but uses human capabilities to incorporate knowledge outside of the rigid range of purely data-driven approaches.

preprint2013arXiv

Complexity $L^0$-penalized M-Estimation: Consistency in More Dimensions

We study the asymptotics in $L^2$ for complexity penalized least squares regression for the discrete approximation of finite-dimensional signals on continuous domains - e.g. images - by piecewise smooth functions. We introduce a fairly general setting which comprises most of the presently popular partitions of signal- or image- domains like interval-, wedgelet- or related partitions, as well as Delaunay triangulations. Then we prove consistency and derive convergence rates. Finally, we illustrate by way of relevant examples that the abstract results are useful for many applications.

preprint2012arXiv

Tau Lepton Reconstruction and Identification at ATLAS

Tau leptons play an important role in the physics program at the LHC. They are used in searches for new phenomena like the Higgs boson or Supersymmetry and in electroweak measurements. Identifying hadronically decaying tau leptons with good performance is an essential part of these analyses. We present the current status of the tau reconstruction and identification at the LHC with the ATLAS detector. The tau identification efficiencies and their systematic uncertainties are measured using W to tau nu and Z to tau tau events, and compared with the predictions from Monte Carlo simulations.