Source author record

Dietmar Jannach

Dietmar Jannach appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher, unclaimed source record

Information Retrieval, Artificial Intelligence, Computation and Language, Human-Computer Interaction, Machine Learning, Neural and Evolutionary Computing, Computational Complexity, Computer Vision, Social and Information Networks, Software Engineering

Catalog footprint

17 works

10 topics

4 close collaborators


preprint, 2026, arXiv

Do Composed Image Retrieval Benchmarks Require Multimodal Composition?

Composed Image Retrieval (CIR) is a multimodal retrieval task where a query consists of a reference image and a textual modification, and the goal is to retrieve a target image satisfying both. In principle, strong performance on CIR benchmarks is assumed to require multimodal composition, i.e., combining complementary information from reference image and textual modification. In this work, we show that this assumption does not always hold. Across four widely used CIR benchmarks and eleven Generalist Multimodal Embedding models, a large fraction of queries can be solved using a single modality (from 32.2% to 83.6%), revealing pervasive unimodal shortcuts. Thus, high CIR performance can arise from unimodal signals rather than true multimodal composition. To better understand this issue, we perform a two-stage audit. First, we identify shortcut-solvable queries through cross-model analysis. Second, we conduct human validation on 4,741 shortcut-free queries, of which only 1,689 are well-formed, with common issues including ambiguous edits and mismatched targets. Re-evaluating models on this validated subset reveals qualitatively different behaviour: queries can no longer be solved with a single modality, and successful retrieval requires combining both inputs. While accuracy decreases, reliance on multimodal information increases. Overall, current CIR benchmarks conflate shortcut-solvable, noisy, and genuinely compositional queries, leading to an overestimation of model capability in multimodal composition.

preprint, 2022, arXiv

Are Query-Based Ontology Debuggers Really Helping Knowledge Engineers?

Real-world semantic or knowledge-based systems, e.g., in the biomedical domain, can become large and complex. Tool support for the localization and repair of faults within knowledge bases of such systems can therefore be essential for their practical success. Correspondingly, a number of knowledge base debugging approaches, in particular for ontology-based systems, were proposed throughout recent years. Query-based debugging is a comparably recent interactive approach that localizes the true cause of an observed problem by asking knowledge engineers a series of questions. Concrete implementations of this approach exist, such as the OntoDebug plug-in for the ontology editor Protégé. To validate that a newly proposed method is favorable over an existing one, researchers often rely on simulation-based comparisons. Such an evaluation approach however has certain limitations and often cannot fully inform us about a method's true usefulness. We therefore conducted different user studies to assess the practical value of query-based ontology debugging. One main insight from the studies is that the considered interactive approach is indeed more efficient than an alternative algorithmic debugging based on test cases. We also observed that users frequently made errors in the process, which highlights the importance of a careful design of the queries that users need to answer.

preprint, 2022, arXiv

Balancing Consumer and Business Value of Recommender Systems: A Simulation-based Analysis

Automated recommendations can nowadays be found on many e-commerce platforms, and such recommendations can create substantial value for consumers and providers. Often, however, not all recommendable items have the same profit margin, and providers might thus be tempted to promote items that maximize their profit. In the short run, consumers might accept non-optimal recommendations, but they may lose their trust in the long run. Ultimately, this leads to the problem of designing balanced recommendation strategies, which consider both consumer and provider value and lead to sustained business success. This work proposes a simulation framework based on agent-based modeling designed to help providers explore longitudinal dynamics of different recommendation strategies. In our model, consumer agents receive recommendations from providers, and the perceived quality of the recommendations influences the consumers' trust over time. We design several recommendation strategies which give more weight either to provider profit or to consumer utility. Our simulations show that a hybrid strategy that puts more weight on consumer utility but without ignoring profitability considerations leads to the highest cumulative profit in the long run. This hybrid strategy results in a profit increase of about 20% compared to purely consumer-oriented or profit-oriented strategies. We also find that social media can reinforce the observed phenomena. When consumers rely heavily on social media, the cumulative profit of the best strategy further increases. To ensure reproducibility and foster future research, we publicly share our flexible simulation framework.
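The idea of a hybrid strategy that weighs consumer utility against provider profit can be illustrated with a minimal sketch. This is not the paper's simulation framework: the `Item` class, the `consumer_weight` parameter, the linear blend, and the toy catalog values are all assumptions made for illustration, and the trust dynamics of the agent-based model are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    utility: float  # hypothetical consumer utility in [0, 1]
    profit: float   # hypothetical provider profit margin in [0, 1]


def hybrid_score(item: Item, consumer_weight: float) -> float:
    """Blend consumer utility and provider profit with a single weight (assumed linear)."""
    return consumer_weight * item.utility + (1.0 - consumer_weight) * item.profit


def recommend(items: list[Item], consumer_weight: float, n: int) -> list[str]:
    """Return the names of the n items with the highest blended score."""
    ranked = sorted(items, key=lambda it: hybrid_score(it, consumer_weight), reverse=True)
    return [it.name for it in ranked[:n]]


# Toy catalog with made-up values.
catalog = [
    Item("A", utility=0.9, profit=0.1),
    Item("B", utility=0.8, profit=0.7),
    Item("C", utility=0.2, profit=0.9),
]

print(recommend(catalog, consumer_weight=1.0, n=1))  # purely consumer-oriented
print(recommend(catalog, consumer_weight=0.0, n=1))  # purely profit-oriented
print(recommend(catalog, consumer_weight=0.7, n=1))  # hybrid, leaning towards the consumer
```

With these toy values, the two pure strategies pick the extreme items, while the hybrid weight selects the item that is good on both dimensions.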

preprint, 2022, arXiv

Conversational Recommendation: A Grand AI Challenge

Animated avatars, which look and talk like humans, are iconic visions of the future of AI-powered systems. Through many sci-fi movies we are acquainted with the idea of speaking to such virtual personalities as if they were humans. Today, we talk more and more to machines like Apple's Siri, e.g., to ask them for the weather forecast. However, when asked for recommendations, e.g., for a restaurant to go to, the limitations of such devices quickly become obvious. They do not engage in a conversation to find out what we might prefer, they often do not provide explanations for what they recommend, and they may have difficulties remembering what was said one minute earlier. Conversational recommender systems promise to address these limitations. In this paper, we review existing approaches to build such systems, which developments we observe today, which challenges are still open and why the development of conversational recommenders represents one of the next grand challenges of AI.

preprint, 2022, arXiv

Conversational Recommendation: Theoretical Model and Complexity Analysis

Recommender systems are software applications that help users find items of interest in situations of information overload in a personalized way, using knowledge about the needs and preferences of individual users. In conversational recommendation approaches, these needs and preferences are acquired by the system in an interactive, multi-turn dialog. A common approach in the literature to drive such dialogs is to incrementally ask users about their preferences regarding desired and undesired item features or regarding individual items. A central research goal in this context is efficiency, evaluated with respect to the number of required interactions until a satisfying item is found. This is usually accomplished by making inferences about the best next question to ask the user. Today, research on dialog efficiency is almost entirely empirical, aiming to demonstrate, for example, that one strategy for selecting questions is better than another one in a given application. With this work, we complement empirical research with a theoretical, domain-independent model of conversational recommendation. This model, which is designed to cover a range of application scenarios, allows us to investigate the efficiency of conversational approaches in a formal way, in particular with respect to the computational complexity of devising optimal interaction strategies. Through such a theoretical analysis we show that finding an efficient conversational strategy is NP-hard, and in PSPACE in general, but for particular kinds of catalogs the upper bound lowers to POLYLOGSPACE. From a practical point of view, this result implies that catalog characteristics can strongly influence the efficiency of individual conversational strategies and should therefore be considered when designing new strategies. A preliminary empirical analysis on datasets derived from a real-world one aligns with our findings.

preprint, 2022, arXiv

INFACT: An Online Human Evaluation Framework for Conversational Recommendation

Conversational recommender systems (CRS) are interactive agents that support their users in recommendation-related goals through multi-turn conversations. Generally, a CRS can be evaluated in various dimensions. Today's CRS mainly rely on offline (computational) measures to assess the performance of their algorithms in comparison to different baselines. However, offline measures can have limitations, for example, when the metrics for comparing a newly generated response with a ground truth do not correlate with human perceptions, because various alternative generated responses might be suitable too in a given dialog situation. Current research on machine learning-based CRS models therefore acknowledges the importance of humans in the evaluation process, knowing that pure offline measures may not be sufficient in evaluating a highly interactive system like a CRS.

preprint, 2022, arXiv

INSPIRED2: An Improved Dataset for Sociable Conversational Recommendation

Conversational recommender systems (CRS) that are able to interact with users in natural language often utilize recommendation dialogs which were previously collected with the help of paired humans, where one plays the role of a seeker and the other that of a recommender. These recommendation dialogs include items and entities that indicate the users' preferences. In order to precisely model the seekers' preferences and respond consistently, CRS typically rely on item and entity annotations. A recent example of such a dataset is INSPIRED, which consists of recommendation dialogs for sociable conversational recommendation, where items and entities were annotated using automatic keyword or pattern matching techniques. An analysis of this dataset unfortunately revealed that there is a substantial number of cases where items and entities were either wrongly annotated or annotations were missing altogether. This raises the question of to what extent automatic annotation techniques are effective. Moreover, it is important to study the impact of annotation quality on the overall effectiveness of a CRS in terms of the quality of the system's responses. To study these aspects, we manually fixed the annotations in INSPIRED. We then evaluated the performance of several benchmark CRS using both versions of the dataset. Our analyses suggest that the improved version of the dataset, i.e., INSPIRED2, helped increase the performance of several benchmark CRS, emphasizing the importance of data quality both for end-to-end learning and retrieval-based approaches to conversational recommendation. We release our improved dataset (INSPIRED2) publicly at https://github.com/ahtsham58/INSPIRED2.

preprint, 2022, arXiv

Top-N Recommendation Algorithms: A Quest for the State-of-the-Art

Research on recommender systems algorithms, like other areas of applied machine learning, is largely dominated by efforts to improve the state-of-the-art, typically in terms of accuracy measures. Several recent research works however indicate that the reported improvements over the years sometimes "don't add up", and that methods that were published several years ago often outperform the latest models when evaluated independently. Different factors contribute to this phenomenon, including that some researchers probably often only fine-tune their own models but not the baselines. In this paper, we report the outcomes of an in-depth, systematic, and reproducible comparison of ten collaborative filtering algorithms - covering both traditional and neural models - on several common performance measures on three datasets which are frequently used for evaluation in the recent literature. Our results show that there is no consistent winner across datasets and metrics for the examined top-n recommendation task. Moreover, we find that for none of the accuracy measurements any of the considered neural models led to the best performance. Regarding the performance ranking of algorithms across the measurements, we found that linear models, nearest-neighbor methods, and traditional matrix factorization consistently perform well for the evaluated modest-sized, but commonly-used datasets. Our work shall therefore serve as a guideline for researchers regarding existing baselines to consider in future performance comparisons. Moreover, by providing a set of fine-tuned baseline models for different datasets, we hope that our work helps to establish a common understanding of the state-of-the-art for top-n recommendation tasks.
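The abstract does not name the performance measures it uses. As an illustration of what accuracy measures for a top-n recommendation task typically look like, the following sketch computes Recall@N and NDCG@N with binary relevance; the choice of these two metrics, the function names, and the toy data are assumptions rather than details taken from the paper.

```python
import math


def recall_at_n(ranked: list[str], relevant: set[str], n: int) -> float:
    """Fraction of the relevant items that appear among the top n recommendations."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:n] if item in relevant)
    return hits / len(relevant)


def ndcg_at_n(ranked: list[str], relevant: set[str], n: int) -> float:
    """Normalized discounted cumulative gain at n, assuming binary relevance."""
    dcg = sum(
        1.0 / math.log2(position + 2)
        for position, item in enumerate(ranked[:n])
        if item in relevant
    )
    # The ideal ranking places all relevant items (up to n) at the top.
    ideal_hits = min(len(relevant), n)
    idcg = sum(1.0 / math.log2(position + 2) for position in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Toy example: one of the two held-out items is ranked first, the other falls outside the top 3.
ranked_list = ["a", "b", "c", "d"]
held_out = {"a", "d"}
print(recall_at_n(ranked_list, held_out, n=3))
print(ndcg_at_n(ranked_list, held_out, n=3))
```

In a comparison like the one described, such metrics would be averaged over all test users and reported per dataset and cutoff.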

preprint, 2021, arXiv

A Troubling Analysis of Reproducibility and Progress in Recommender Systems Research

The design of algorithms that generate personalized ranked item lists is a central topic of research in the field of recommender systems. In the past few years, in particular, approaches based on deep learning (neural) techniques have become dominant in the literature. For all of them, substantial progress over the state-of-the-art is claimed. However, indications exist of certain problems in today's research practice, e.g., with respect to the choice and optimization of the baselines used for comparison, raising questions about the published claims. In order to obtain a better understanding of the actual progress, we have tried to reproduce recent results in the area of neural recommendation approaches based on collaborative filtering. The worrying outcome of the analysis of these recent works (all were published at prestigious scientific conferences between 2015 and 2018) is that 11 out of the 12 reproducible neural approaches can be outperformed by conceptually simple methods, e.g., based on nearest-neighbor heuristics. None of the computationally complex neural methods was actually consistently better than already existing learning-based techniques, e.g., using matrix factorization or linear models. In our analysis, we discuss common issues in today's research practice, which, despite the many papers that are published on the topic, have apparently led the field to a certain level of stagnation.

preprint, 2021, arXiv

Digital Nudging with Recommender Systems: Survey and Future Directions

Recommender systems are nowadays a pervasive part of our online user experience, where they either serve as information filters or provide us with suggestions for additionally relevant content. These systems thereby influence which information is easily accessible to us and thus affect our decision-making processes through the automated selection and ranking of the presented content. Automated recommendations can therefore be seen as digital nudges, because they determine different aspects of the choice architecture for users. In this work, we examine the relationship between digital nudging and recommender systems, topics that so far were mostly investigated in isolation. Through a systematic literature search, we first identified 87 nudging mechanisms, which we categorize in a novel taxonomy. A subsequent analysis then shows that only a small part of these nudging mechanisms was previously investigated in the context of recommender systems. This indicates that there is a huge potential to develop future recommender systems that leverage the power of digital nudging in order to influence the decision-making of users. In this work, we therefore outline potential ways of integrating nudging mechanisms into recommender systems.

preprint, 2021, arXiv

Towards Retrieval-based Conversational Recommendation

Conversational recommender systems have attracted immense attention recently. The most recent approaches rely on neural models trained on recorded dialogs between humans, implementing an end-to-end learning process. These systems are commonly designed to generate responses given the user's utterances in natural language. One main challenge is that these generated responses both have to be appropriate for the given dialog context and must be grammatically and semantically correct. An alternative to such generation-based approaches is to retrieve responses from pre-recorded dialog data and to adapt them if needed. Such retrieval-based approaches were successfully explored in the context of general conversational systems, but have received limited attention in recent years for CRS. In this work, we re-assess the potential of such approaches and design and evaluate a novel technique for response retrieval and ranking. A user study (N=90) revealed that the responses by our system were on average of higher quality than those of two recent generation-based systems. We furthermore found that the quality ranking of the two generation-based approaches is not aligned with the results from the literature, which points to open methodological questions. Overall, our research underlines that retrieval-based approaches should be considered an alternative or complement to language generation approaches.

preprint, 2020, arXiv

A systematic review and taxonomy of explanations in decision support and recommender systems

With the recent advances in the field of artificial intelligence, an increasing number of decision-making tasks are delegated to software systems. A key requirement for the success and adoption of such systems is that users must trust system choices or even fully automated decisions. To achieve this, explanation facilities have been widely investigated as a means of establishing trust in these systems since the early years of expert systems. With today's increasingly sophisticated machine learning algorithms, new challenges in the context of explanations, accountability, and trust towards such systems constantly arise. In this work, we systematically review the literature on explanations in advice-giving systems. This is a family of systems that includes recommender systems, which is one of the most successful classes of advice-giving software in practice. We investigate the purposes of explanations as well as how they are generated, presented to users, and evaluated. As a result, we derive a novel comprehensive taxonomy of aspects to be considered when designing explanation facilities for current and future decision support systems. The taxonomy includes a variety of different facets, such as explanation objective, responsiveness, content and presentation. Moreover, we identified several challenges that remain unaddressed so far, for example related to fine-grained issues associated with the presentation of explanations and how explanation facilities are evaluated.

preprint, 2020, arXiv

Critically Examining the Claimed Value of Convolutions over User-Item Embedding Maps for Recommender Systems

In recent years, algorithm research in the area of recommender systems has shifted from matrix factorization techniques and their latent factor models to neural approaches. However, given the proven power of latent factor models, some newer neural approaches incorporate them within more complex network architectures. One specific idea, recently put forward by several researchers, is to consider potential correlations between the latent factors, i.e., embeddings, by applying convolutions over the user-item interaction map. However, contrary to what is claimed in these articles, such interaction maps do not share the properties of images where Convolutional Neural Networks (CNNs) are particularly useful. In this work, we show through analytical considerations and empirical evaluations that the claimed gains reported in the literature cannot be attributed to the ability of CNNs to model embedding correlations, as argued in the original papers. Moreover, additional performance evaluations show that all of the examined recent CNN-based models are outperformed by existing non-neural machine learning techniques or traditional nearest-neighbor approaches. On a more general level, our work points to major methodological issues in recommender systems research.

preprint, 2020, arXiv

Exploring Longitudinal Effects of Session-based Recommendations

Session-based recommendation is a problem setting where the task of a recommender system is to make suitable item suggestions based only on a few observed user interactions in an ongoing session. The lack of long-term preference information about individual users in such settings usually results in a limited level of personalization, where a small set of popular items may be recommended to many users. This repeated exposure of such a subset of the items through the recommendations may in turn lead to a reinforcement effect over time, and to a system which is not able to help users discover new content anymore to the desirable extent. In this work, we investigate such potential longitudinal effects of session-based recommendations in a simulation-based approach. Specifically, we analyze to what extent algorithms of different types may lead to concentration effects over time. Our experiments in the music domain reveal that all investigated algorithms, both neural and heuristic ones, may lead to lower item coverage and to a higher concentration on a subset of the items. Additional simulation experiments however also indicate that relatively simple re-ranking strategies, e.g., by avoiding too many repeated recommendations in the music domain, may help to deal with this problem.
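A re-ranking strategy that avoids too many repeated recommendations could look roughly like the sketch below. It is not the procedure used in the paper: the linear exposure penalty, the function names, and the toy scores are hypothetical, chosen only to show how penalizing repeats can spread recommendations over more of the catalog.

```python
from collections import Counter


def rerank_with_repeat_penalty(
    scores: dict[str, float], exposure: Counter, penalty: float, n: int
) -> list[str]:
    """Lower each item's score in proportion to how often it was already recommended."""
    adjusted = {item: score - penalty * exposure[item] for item, score in scores.items()}
    # Sort by adjusted score (descending); break ties by item name for determinism.
    ranked = sorted(adjusted, key=lambda item: (-adjusted[item], item))
    return ranked[:n]


def simulate(scores: dict[str, float], penalty: float, rounds: int) -> list[str]:
    """Recommend one item per round and record the exposure it creates."""
    exposure: Counter = Counter()
    shown = []
    for _ in range(rounds):
        top_item = rerank_with_repeat_penalty(scores, exposure, penalty, n=1)[0]
        exposure[top_item] += 1
        shown.append(top_item)
    return shown


# Hypothetical base scores from some session-based algorithm.
base_scores = {"hit": 1.0, "mid": 0.8, "niche": 0.7}

without_penalty = simulate(base_scores, penalty=0.0, rounds=4)
with_penalty = simulate(base_scores, penalty=0.25, rounds=4)

# Item coverage: share of the catalog that was recommended at least once.
print(without_penalty, len(set(without_penalty)) / len(base_scores))
print(with_penalty, len(set(with_penalty)) / len(base_scores))
```

Without the penalty the same popular item is shown in every round; with it, all three items in this toy catalog are recommended at least once within four rounds.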

preprint, 2020, arXiv

Hybrid Session-based News Recommendation using Recurrent Neural Networks

We describe a hybrid meta-architecture, the CHAMELEON, for session-based news recommendation that is able to leverage a variety of information types using Recurrent Neural Networks. We evaluated our approach on two public datasets, using a temporal evaluation protocol that simulates the dynamics of a news portal in a realistic way. Our results confirm the benefits of modeling the sequence of session clicks with RNNs and leveraging side information about users and articles, resulting in significantly higher recommendation accuracy and catalog coverage than other session-based algorithms.

preprint, 2019, arXiv

Beyond Personalization: Research Directions in Multistakeholder Recommendation

Recommender systems are personalized information access applications; they are ubiquitous in today's online environment, and effective at finding items that meet user needs and tastes. As the reach of recommender systems has extended, it has become apparent that the single-minded focus on the user common to academic research has obscured other important aspects of recommendation outcomes. Properties such as fairness, balance, profitability, and reciprocity are not captured by typical metrics for recommender system evaluation. The concept of multistakeholder recommendation has emerged as a unifying framework for describing and understanding recommendation settings where the end user is not the sole focus. This article describes the origins of multistakeholder recommendation, and the landscape of system designs. It provides illustrative examples of current research, as well as outlining open questions and research directions for the field.

preprint, 2015, arXiv

Using Calculation Fragments for Spreadsheet Testing and Debugging

A number of automated techniques and tools were proposed in the research literature over the years which aim to support the spreadsheet developer in the process of testing and debugging a faulty spreadsheet. One underlying assumption of many of these approaches is that the spreadsheet developer is capable of providing test cases or is at least reliably able to determine whether a calculated value in a certain cell is correct given the current set of inputs. Since real-world spreadsheets can be complex, we argue that these assumptions might be too strong in some situations. We therefore propose to support the user during testing and debugging by automatically computing spreadsheet fragments of manageable size. The spreadsheet developer can then verify the correctness of a smaller set of formulas for which the calculated output can be more easily validated.
