Source author record

Elisabeth André

Elisabeth André appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Artificial Intelligence, Human-Computer Interaction, Machine Learning, Computer Vision, Neural and Evolutionary Computing, cs.CY, eess.AS, eess.IV, Multimedia, Robotics, Sound

Catalog footprint

What is connected

14 works

11 topics

4 close collaborators

preprint · 2026 · arXiv

Video Joint-Embedding Predictive Architectures for Facial Expression Recognition

This paper introduces a novel application of Video Joint-Embedding Predictive Architectures (V-JEPAs) for Facial Expression Recognition (FER). Departing from conventional pre-training methods for video understanding that rely on pixel-level reconstructions, V-JEPAs learn by predicting embeddings of masked regions from the embeddings of unmasked regions. This lets the trained encoder ignore information that is irrelevant to the task, such as the color of a region of background pixels. Using a pre-trained V-JEPA video encoder, we train shallow classifiers using the RAVDESS and CREMA-D datasets, achieving state-of-the-art performance on RAVDESS and outperforming all other vision-based methods on CREMA-D (+1.48 WAR). Furthermore, cross-dataset evaluations reveal strong generalization capabilities, demonstrating the potential of purely embedding-based pre-training approaches to advance FER. We release our code at https://github.com/lennarteingunia/vjepa-for-fer.
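
The embedding-prediction objective described in this abstract can be illustrated with a toy sketch. It is not the authors' V-JEPA implementation: the linear encoder, the mean-based predictor, the L1 distance and all names (`encode`, `predict_masked`, `embedding_loss`) are assumptions chosen for brevity, and a single encoder is used for both context and target patches.

```python
import random


def encode(patch, weights):
    """Toy linear encoder: maps a patch (list of floats) to an embedding."""
    return [sum(w * x for w, x in zip(row, patch)) for row in weights]


def predict_masked(context_embeddings):
    """Hypothetical predictor: the mean of the visible (context) embeddings."""
    n = len(context_embeddings)
    dim = len(context_embeddings[0])
    return [sum(e[d] for e in context_embeddings) / n for d in range(dim)]


def embedding_loss(predicted, target):
    """Mean absolute error in embedding space; no pixels are reconstructed."""
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(target)


random.seed(0)
# Six toy patches with four "pixel" values each.
patches = [[random.random() for _ in range(4)] for _ in range(6)]
# Encoder weights mapping 4 input values to a 3-dimensional embedding.
weights = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
masked = {1, 4}  # indices of the patches hidden from the predictor

context = [encode(p, weights) for i, p in enumerate(patches) if i not in masked]
targets = [encode(patches[i], weights) for i in sorted(masked)]

prediction = predict_masked(context)
loss = sum(embedding_loss(prediction, t) for t in targets) / len(targets)
print(f"embedding-space loss: {loss:.4f}")
```

Because the loss compares embeddings rather than pixels, the encoder in this sketch is never asked to reproduce low-level detail of the masked patches.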

preprint · 2022 · arXiv

"GAN I hire you?" -- A System for Personalized Virtual Job Interview Training

Job interviews are usually high-stakes social situations where professional and behavioral skills are required for a satisfactory outcome. Professional job interview trainers give educative feedback about the shown behavior according to common standards. This feedback can be helpful concerning the improvement of behavioral skills needed for job interviews. A technological approach for generating such feedback might be a playful and low-key starting point for job interview training. Therefore, we extended an interactive virtual job interview training system with a Generative Adversarial Network (GAN)-based approach that first detects behavioral weaknesses and subsequently generates personalized feedback. To evaluate the usefulness of the generated feedback, we conducted a mixed-methods pilot study using mock-ups from the job interview training system. The overall study results indicate that the GAN-based generated behavioral feedback is helpful. Moreover, participants assessed that the feedback would improve their job interview performance.

preprint · 2022 · arXiv

Alterfactual Explanations -- The Relevance of Irrelevance for Explaining AI Systems

Explanation mechanisms from the field of Counterfactual Thinking are a widely-used paradigm for Explainable Artificial Intelligence (XAI), as they follow a natural way of reasoning that humans are familiar with. However, all common approaches from this field are based on communicating information about features or characteristics that are especially important for an AI's decision. We argue that in order to fully understand a decision, not only knowledge about relevant features is needed, but that the awareness of irrelevant information also highly contributes to the creation of a user's mental model of an AI system. Therefore, we introduce a new way of explaining AI systems. Our approach, which we call Alterfactual Explanations, is based on showing an alternative reality where irrelevant features of an AI's input are altered. By doing so, the user directly sees which characteristics of the input data can change arbitrarily without influencing the AI's decision. We evaluate our approach in an extensive user study, revealing that it is able to significantly contribute to the participants' understanding of an AI. We show that alterfactual explanations are suited to convey an understanding of different aspects of the AI's reasoning than established counterfactual explanation methods.
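
The difference between altering irrelevant and relevant features can be shown on a toy model. This is a minimal sketch under stated assumptions: the linear classifier, its weights and the feature names `size` and `color` are hypothetical and are not taken from the paper.

```python
def classify(features):
    """Toy linear classifier (hypothetical): only 'size' influences the decision."""
    weights = {"size": 2.0, "color": 0.0}
    score = sum(weights[name] * value for name, value in features.items()) - 1.0
    return "positive" if score > 0 else "negative"


original = {"size": 0.8, "color": 0.1}
# Alterfactual-style input: an irrelevant feature is changed strongly.
alterfactual = {**original, "color": 0.9}
# Counterfactual-style input: a relevant feature is changed.
counterfactual = {**original, "size": 0.3}

print(classify(original))        # decision for the unchanged input
print(classify(alterfactual))    # same decision despite the change
print(classify(counterfactual))  # decision flips
```

In this sketch the alterfactual input shows what can vary freely without affecting the output, while the counterfactual input shows what must change to obtain a different output.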

preprint · 2022 · arXiv

Alternative Data Augmentation for Industrial Monitoring using Adversarial Learning

Visual inspection software has become a key factor in the manufacturing industry for quality control and process monitoring. Semantic segmentation models have gained importance since they allow for more precise examination. These models, however, require large image datasets in order to achieve a fair accuracy level. In some cases, training data is sparse or lacks sufficient annotation, a fact that especially applies to highly specialized production environments. Data augmentation represents a common strategy to extend the dataset. Still, it only varies the image within a narrow range. In this article, a novel strategy is proposed to augment small image datasets. The approach is applied to surface monitoring of carbon fibers, a specific industry use case. We apply two different methods to create binary labels: a problem-tailored trigonometric function and a WGAN model. Afterwards, the labels are translated into color images using pix2pix and used to train a U-Net. The results suggest that the trigonometric function is superior to the WGAN model. However, a precise examination of the resulting images indicates that WGAN and image-to-image translation achieve good segmentation results and only deviate to a small degree from traditional data augmentation. In summary, this study examines an industry application of data synthesis using generative adversarial networks and explores its potential for monitoring systems of production environments. Keywords: Image-to-Image Translation, Carbon Fiber, Data Augmentation, Computer Vision, Industrial Monitoring, Adversarial Learning.
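
To make the idea of creating binary labels from a trigonometric function concrete, the following sketch draws a sine-shaped stripe into a mask. The abstract does not give the actual function, so the shape model, the function name `sine_label` and all parameters are assumptions for illustration only.

```python
import math


def sine_label(width, height, amplitude, period, thickness):
    """Return a height x width binary mask containing a sine-shaped stripe.

    Hypothetical shape model: pixels within thickness/2 of the curve are 1.
    """
    mask = [[0] * width for _ in range(height)]
    center = height / 2
    for x in range(width):
        y_curve = center + amplitude * math.sin(2 * math.pi * x / period)
        for y in range(height):
            if abs(y - y_curve) <= thickness / 2:
                mask[y][x] = 1
    return mask


mask = sine_label(width=32, height=16, amplitude=4, period=16, thickness=2)
for row in mask:
    print("".join("#" if value else "." for value in row))
```

Varying `amplitude`, `period` and `thickness` yields many distinct label masks, which an image-to-image model could then translate into synthetic training images.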

preprint · 2022 · arXiv

Benchmarking Perturbation-based Saliency Maps for Explaining Atari Agents

One of the most prominent methods for explaining the behavior of Deep Reinforcement Learning (DRL) agents is the generation of saliency maps that show how much each pixel contributed to the agent's decision. However, there is no work that computationally evaluates and compares the fidelity of different saliency map approaches specifically for DRL agents. It is particularly challenging to computationally evaluate saliency maps for DRL agents since their decisions are part of an overarching policy. For instance, the output neurons of value-based DRL algorithms encode both the value of the current state and the value of taking each action in this state. This ambiguity should be considered when evaluating saliency maps for such agents. In this paper, we compare five popular perturbation-based approaches to create saliency maps for DRL agents trained on four different Atari 2600 games. The approaches are compared using two computational metrics: dependence on the learned parameters of the agent (sanity checks) and fidelity to the agent's reasoning (input degradation). During the sanity checks, we encounter issues with one approach and propose a solution to fix these issues. For fidelity, we identify two main factors that influence which saliency approach should be chosen in which situation.
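
A generic perturbation-based saliency map can be sketched as follows. This is a plain occlusion scheme, not necessarily one of the five approaches compared in the paper, and the agent output `q_value`, the patch size and the fill value are hypothetical.

```python
def q_value(frame):
    """Hypothetical agent output: depends only on the top-left 2x2 region."""
    return sum(frame[y][x] for y in range(2) for x in range(2))


def occlusion_saliency(frame, model, patch=2, fill=0.0):
    """Saliency of a patch = absolute output change when the patch is overwritten."""
    height, width = len(frame), len(frame[0])
    baseline = model(frame)
    saliency = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            perturbed = [row[:] for row in frame]  # copy so the input is untouched
            for y in rows:
                for x in cols:
                    perturbed[y][x] = fill
            change = abs(baseline - model(perturbed))
            for y in rows:
                for x in cols:
                    saliency[y][x] = change
    return saliency


frame = [[1.0] * 4 for _ in range(4)]
saliency = occlusion_saliency(frame, q_value)
for row in saliency:
    print(row)
```

Only the region the toy model actually reads receives a non-zero score, which is the property a fidelity metric such as input degradation is meant to check.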

preprint · 2022 · arXiv

Do Deep Neural Networks Forget Facial Action Units? -- Exploring the Effects of Transfer Learning in Health Related Facial Expression Recognition

In this paper, we present a process to investigate the effects of transfer learning for automatic facial expression recognition from emotions to pain. To this end, we first train a VGG16 convolutional neural network to automatically discern between eight categorical emotions. We then fine-tune successively larger parts of this network to learn suitable representations for the task of automatic pain recognition. Subsequently, we apply those fine-tuned representations again to the original task of emotion recognition to further investigate the differences in performance between the models. In the second step, we use Layer-wise Relevance Propagation to analyze samples that the model previously classified correctly but now misclassifies. Based on this analysis, we rely on the visual inspection of a human observer to generate hypotheses about what has been forgotten by the model. Finally, we test those hypotheses quantitatively utilizing concept embedding analysis methods. Our results show that the network, which was fully fine-tuned for pain recognition, indeed paid less attention to two action units that are relevant for expression recognition but not for pain recognition.

preprint · 2022 · arXiv

Employing Socially Interactive Agents for Robotic Neurorehabilitation Training

In today's world, many patients with cognitive impairments and motor dysfunction seek the attention of experts to perform specific conventional therapies to improve their situation. However, due to a lack of neurorehabilitation professionals, patients suffer from severe effects that worsen their condition. In this paper, we present a technological approach for a novel robotic neurorehabilitation training system. It relies on a combination of a rehabilitation device, signal classification methods, supervised machine learning models for training adaptation, training exercises, and socially interactive agents as a user interface. Together with a professional, the system can be trained towards the patient's specific needs. Furthermore, after a training phase, patients are enabled to train independently at home without the assistance of a physical therapist with a socially interactive agent in the role of a coaching assistant.

preprint · 2022 · arXiv

Intercategorical Label Interpolation for Emotional Face Generation with Conditional Generative Adversarial Networks

Generative adversarial networks offer the possibility to generate deceptively real images that are almost indistinguishable from actual photographs. Such systems, however, rely on the presence of large datasets to realistically replicate the corresponding domain. This is especially a problem if not only random new images are to be generated, but specific (continuous) features are to be co-modeled. A particularly important use case in Human-Computer Interaction (HCI) research is the generation of emotional images of human faces, which can be used for various purposes, such as the automatic generation of avatars. The problem lies in the availability of training data. Most suitable datasets for this task rely on categorical emotion models and therefore feature only discrete annotation labels. This greatly hinders the learning and modeling of smooth transitions between displayed affective states. To overcome this challenge, we explore the potential of label interpolation to enhance networks trained on categorical datasets with the ability to generate images conditioned on continuous features.
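
Label interpolation between two categorical labels can be sketched as a linear blend of one-hot vectors. The emotion set, the helper name `interpolate_labels` and the linear scheme are assumptions; the abstract does not specify how the interpolation is performed or how the generator consumes the result.

```python
def interpolate_labels(label_a, label_b, alpha):
    """Linear blend of two one-hot vectors; alpha=0 gives label_a, alpha=1 gives label_b."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(label_a, label_b)]


EMOTIONS = ["neutral", "happy", "sad"]  # hypothetical category set
neutral = [1.0, 0.0, 0.0]
happy = [0.0, 1.0, 0.0]

# Five conditioning vectors that move gradually from "neutral" to "happy".
for step in range(5):
    alpha = step / 4
    condition = interpolate_labels(neutral, happy, alpha)
    print(alpha, dict(zip(EMOTIONS, condition)))
```

Each intermediate vector could serve as the conditioning input of a conditional generator, which is how a network trained on discrete labels might be asked for in-between expressions.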

preprint · 2022 · arXiv

VoiceMe: Personalized voice generation in TTS

Novel text-to-speech systems can generate entirely new voices that were not seen during training. However, it remains a difficult task to efficiently create personalized voices from a high-dimensional speaker space. In this work, we use speaker embeddings from a state-of-the-art speaker verification model (SpeakerNet) trained on thousands of speakers to condition a TTS model. We employ a human sampling paradigm to explore this speaker latent space. We show that users can create voices that fit well to photos of faces, art portraits, and cartoons. We recruit online participants to collectively manipulate the voice of a speaking face. We show that (1) a separate group of human raters confirms that the created voices match the faces, (2) speaker gender apparent from the face is well-recovered in the voice, and (3) people consistently move towards the real voice prototype for the given face. Our results demonstrate that this technology can be applied in a wide range of applications including character voice development in audiobooks and games, personalized speech assistants, and individual voices for people with speech impairment.

preprint · 2021 · arXiv

Dynamic Difficulty Adjustment in Virtual Reality Exergames through Experience-driven Procedural Content Generation

Virtual Reality (VR) games that feature physical activities have been shown to increase players' motivation to do physical exercise. However, for such exercises to have a positive healthcare effect, they have to be repeated several times a week. To maintain player motivation over longer periods of time, games often employ Dynamic Difficulty Adjustment (DDA) to adapt the game's challenge according to the player's capabilities. For exercise games, this is mostly done by tuning specific in-game parameters like the speed of objects. In this work, we propose to use experience-driven Procedural Content Generation for DDA in VR exercise games by procedurally generating levels that match the player's current capabilities. Not only finetuning specific parameters but creating completely new levels has the potential to decrease repetition over longer time periods and allows for the simultaneous adaptation of the cognitive and physical challenge of the exergame. As a proof-of-concept, we implement an initial prototype in which the player must traverse a maze that includes several exercise rooms, whereby the generation of the maze is realized by a neural network. Passing those exercise rooms requires the player to perform physical activities. To match the player's capabilities, we use Deep Reinforcement Learning to adjust the structure of the maze and to decide which exercise rooms to include in the maze. We evaluate our prototype in an exploratory user study utilizing both biodata and subjective questionnaires.

preprint · 2021 · arXiv

GANterfactual - Counterfactual Explanations for Medical Non-Experts using Generative Adversarial Learning

With the ongoing rise of machine learning, the need for methods for explaining decisions made by artificial intelligence systems is becoming an increasingly important topic. Especially for image classification tasks, many state-of-the-art tools to explain such classifiers rely on visual highlighting of important areas of the input data. In contrast, counterfactual explanation systems try to enable counterfactual reasoning by modifying the input image in such a way that the classifier would have made a different prediction. By doing so, the users of counterfactual explanation systems are equipped with a completely different kind of explanatory information. However, methods for generating realistic counterfactual explanations for image classifiers are still rare. Especially in medical contexts, where relevant information often consists of textural and structural information, high-quality counterfactual images have the potential to give meaningful insights into decision processes. In this work, we present GANterfactual, an approach to generate such counterfactual image explanations based on adversarial image-to-image translation techniques. Additionally, we conduct a user study to evaluate our approach in an exemplary medical use case. Our results show that, in the chosen medical use case, counterfactual explanations lead to significantly better results regarding mental models, explanation satisfaction, trust, emotions, and self-efficacy than two state-of-the-art systems that work with saliency maps, namely LIME and LRP.

preprint · 2020 · arXiv

Local and Global Explanations of Agent Behavior: Integrating Strategy Summaries with Saliency Maps

With advances in reinforcement learning (RL), agents are now being developed in high-stakes application domains such as healthcare and transportation. Explaining the behavior of these agents is challenging, as the environments in which they act have large state spaces, and their decision-making can be affected by delayed rewards, making it difficult to analyze their behavior. To address this problem, several approaches have been developed. Some approaches attempt to convey the global behavior of the agent, describing the actions it takes in different states. Other approaches devised local explanations which provide information regarding the agent's decision-making in a particular state. In this paper, we combine global and local explanation methods, and evaluate their joint and separate contributions, providing (to the best of our knowledge) the first user study of combined local and global explanations for RL agents. Specifically, we augment strategy summaries that extract important trajectories of states from simulations of the agent with saliency maps which show what information the agent attends to. Our results show that the choice of what states to include in the summary (global information) strongly affects people's understanding of agents: participants shown summaries that included important states significantly outperformed participants who were presented with agent behavior in a randomly chosen set of world states. We find mixed results with respect to augmenting demonstrations with saliency maps (local information), as the addition of saliency maps did not significantly improve performance in most cases. However, we do find some evidence that saliency maps can help users better understand what information the agent relies on in its decision making, suggesting avenues for future work that can further improve explanations of RL agents.
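
Selecting important states for a strategy summary can be sketched as a ranking problem. The abstract does not define the importance measure, so the score used here (the gap between the best and worst action value in a state) and the names `importance` and `summarize` are assumptions for illustration.

```python
def importance(q_values):
    """Assumed importance score: gap between the best and worst action value."""
    return max(q_values) - min(q_values)


def summarize(trajectory, k):
    """Return the indices of the k most important states, in chronological order."""
    ranked = sorted(
        range(len(trajectory)),
        key=lambda i: importance(trajectory[i]),
        reverse=True,
    )
    return sorted(ranked[:k])


# Each entry holds the (made-up) action values of one visited state.
trajectory = [
    [1.0, 1.1, 0.9],  # all actions roughly equal
    [5.0, 0.2, 0.1],  # one action is far better than the others
    [2.0, 2.0, 2.0],  # the choice of action makes no difference
    [0.0, 3.5, 0.5],  # a wrong action would be costly
]
print(summarize(trajectory, k=2))
```

Under this assumed score, the summary keeps the states where the choice of action matters most and drops the states where all actions are nearly equivalent.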

preprint · 2020 · arXiv

Resonating Experiences of Self and Others enabled by a Tangible Somaesthetic Design

Digitalization is penetrating every aspect of everyday life, including a human's heartbeat, which can easily be sensed by wearable sensors and displayed for others to see, feel, and potentially "bodily resonate" with. Previous work studying human interactions and interaction designs with physiological data, such as a heart's pulse rate, has argued that feeding it back to the users may, for example, support users' mindfulness and self-awareness during various everyday activities and ultimately support their wellbeing. Inspired by Somaesthetics as a discipline, which focuses on an appreciation of the living body's role in all our experiences, we designed and explored mobile tangible heartbeat displays, which enable rich forms of bodily experiencing oneself and others in social proximity. In this paper, we first report on the design process of tangible heart displays and then present results of a field study with 30 pairs of participants. Participants were asked to use the tangible heart displays while watching movies together and report their experience in three different heart display conditions (i.e., displaying their own heartbeat, their partner's heartbeat, and watching a movie without a heart display). We found, for example, that participants reported significant effects in experiencing sensory immersion when they felt their own heartbeats compared to the condition without any heartbeat display, and that feeling their partner's heartbeats resulted in significant effects on social experience. We refer to resonance theory to discuss the results, highlighting the potential of how ubiquitous technology could utilize physiological data to provide resonance in a modern society facing social acceleration.

preprint · 2014 · arXiv

Interpreting social cues to generate credible affective reactions of virtual job interviewers

In this paper we describe a mechanism of generating credible affective reactions in a virtual recruiter during an interaction with a user. This is done using communicative performance computation based on the behaviours of the user as detected by a recognition module. The proposed software pipeline is part of the TARDIS system which aims to aid young job seekers in acquiring job interview related social skills. In this context, our system enables the virtual recruiter to realistically adapt and react to the user in real-time.
