Source author record

Volker Tresp

Volker Tresp appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Machine Learning Artificial Intelligence Computer Vision Computation and Language cs.CY Databases eess.IV Information Retrieval Multiagent Systems Neural and Evolutionary Computing quant-ph

Catalog footprint

What is connected

47works

11topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

Large Language Models (LLMs) have demonstrated impressive capabilities across a wide range of NLP tasks, but they remain fundamentally stateless, constrained by limited context windows that hinder long-horizon reasoning. Recent efforts to address this limitation often augment LLMs with an external memory bank, yet most existing pipelines are static and heuristic-driven, lacking a learned mechanism for deciding what to store, update, or retrieve. We present Memory-R1, a reinforcement learning (RL) framework that equips LLMs with the ability to actively manage and utilize external memory through two specialized agents: a Memory Manager that learns structured operations, including ADD, UPDATE, DELETE, and NOOP; and an Answer Agent that pre-selects and reasons over relevant entries. Both agents are fine-tuned with outcome-driven RL (PPO and GRPO), enabling adaptive memory management with minimal supervision. With only 152 training QA pairs, Memory-R1 outperforms strong baselines and generalizes across diverse question types, three benchmarks (LoCoMo, MSC, LongMemEval), and multiple model scales (3B-14B).

preprint2026arXiv

PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection

Visual instruction tuning adapts pre-trained Multimodal Large Language Models (MLLMs) to follow human instructions for real-world applications. However, the rapid growth of these datasets introduces significant redundancy, leading to increased computational costs. Existing methods for selecting instruction data aim to prune this redundancy, but predominantly rely on computationally demanding techniques such as proxy-based inference or training-based metrics. Consequently, the substantial computational costs incurred by these selection processes often exacerbate the very efficiency bottlenecks they are intended to resolve, posing a significant challenge to the scalable and effective tuning of MLLMs. To address this challenge, we first identify a critical, yet previously overlooked, factor: the anisotropy inherent in visual feature distributions. We find that this anisotropy induces a \textit{Global Semantic Drift}, and overlooking this phenomenon is a key factor limiting the efficiency of current data selection methods. Motivated by this insight, we devise \textbf{PRISM}, the first training-free framework for efficient visual instruction selection. PRISM surgically removes the corrupting influence of global background features by modeling the intrinsic visual semantics via implicit re-centering. Empirically, PRISM reduces the end-to-end time for data selection and model tuning to just 30\% of conventional pipelines. More remarkably, it achieves this efficiency while simultaneously enhancing performance, surpassing models fine-tuned on the full dataset across eight multimodal and three language understanding benchmarks, culminating in a 101.7\% relative improvement over the baseline. The code is available for access via \href{https://github.com/bibisbar/PRISM}{this repository}.

preprint2024arXiv

Differentiable Quantum Architecture Search For Job Shop Scheduling Problem

The Job shop scheduling problem (JSSP) plays a pivotal role in industrial applications, such as signal processing (SP) and steel manufacturing, involving sequencing machines and jobs to maximize scheduling efficiency. Before, JSSP was solved using manually defined circuits by variational quantum algorithm (VQA). Finding a good circuit architecture is task-specific and time-consuming. Differentiable quantum architecture search (DQAS) is a gradient-based framework that can automatically design circuits. However, DQAS is only tested on quantum approximate optimization algorithm (QAOA) and error mitigation tasks. Whether DQAS applies to JSSP based on a more flexible algorithm, such as variational quantum eigensolver (VQE), is still open for optimization problems. In this work, we redefine the operation pool and extend DQAS to a framework JSSP-DQAS by evaluating circuits to generate circuits for JSSP automatically. The experiments conclude that JSSP-DQAS can automatically find noise-resilient circuit architectures that perform much better than manually designed circuits. It helps to improve the efficiency of solving JSSP.

preprint2023arXiv

Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning

Vision-Language Models (VLMs) are expected to be capable of reasoning with commonsense knowledge as human beings. One example is that humans can reason where and when an image is taken based on their knowledge. This makes us wonder if, based on visual cues, Vision-Language Models that are pre-trained with large-scale image-text resources can achieve and even outperform human's capability in reasoning times and location. To address this question, we propose a two-stage \recognition\space and \reasoning\space probing task, applied to discriminative and generative VLMs to uncover whether VLMs can recognize times and location-relevant features and further reason about it. To facilitate the investigation, we introduce WikiTiLo, a well-curated image dataset compromising images with rich socio-cultural cues. In the extensive experimental studies, we find that although VLMs can effectively retain relevant features in visual encoders, they still fail to make perfect reasoning. We will release our dataset and codes to facilitate future studies.

preprint2023arXiv

Modeling the evolution of temporal knowledge graphs with uncertainty

Forecasting future events is a fundamental challenge for temporal knowledge graphs (tKG). As in real life predicting a mean function is most of the time not sufficient, but the question remains how confident can we be about our prediction? Thus, in this work, we will introduce a novel graph neural network architecture (WGP-NN) employing (weighted) Gaussian processes (GP) to jointly model the temporal evolution of the occurrence probability of events and their time-dependent uncertainty. Especially we employ Gaussian processes to model the uncertainty of future links by their ability to predict predictive variance. This is in contrast to existing works, which are only able to express uncertainties in the learned entity representations. Moreover, WGP-NN can model parameter-free complex temporal and structural dynamics of tKGs in continuous time. We further demonstrate the model's state-of-the-art performance on two real-world benchmark datasets.

preprint2022arXiv

A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs

The link prediction task on knowledge graphs without explicit negative triples in the training data motivates the usage of rank-based metrics. Here, we review existing rank-based metrics and propose desiderata for improved metrics to address lack of interpretability and comparability of existing metrics to datasets of different sizes and properties. We introduce a simple theoretical framework for rank-based metrics upon which we investigate two avenues for improvements to existing metrics via alternative aggregation functions and concepts from probability theory. We finally propose several new rank-based metrics that are more easily interpreted and compared accompanied by a demonstration of their usage in a benchmarking of knowledge graph embedding models.

preprint2022arXiv

Are Vision Transformers Robust to Patch Perturbations?

Recent advances in Vision Transformer (ViT) have demonstrated its impressive performance in image classification, which makes it a promising alternative to Convolutional Neural Network (CNN). Unlike CNNs, ViT represents an input image as a sequence of image patches. The patch-based input image representation makes the following question interesting: How does ViT perform when individual input image patches are perturbed with natural corruptions or adversarial perturbations, compared to CNNs? In this work, we study the robustness of ViT to patch-wise perturbations. Surprisingly, we find that ViTs are more robust to naturally corrupted patches than CNNs, whereas they are more vulnerable to adversarial patches. Furthermore, we discover that the attention mechanism greatly affects the robustness of vision transformers. Specifically, the attention module can help improve the robustness of ViT by effectively ignoring natural corrupted patches. However, when ViTs are attacked by an adversary, the attention mechanism can be easily fooled to focus more on the adversarially perturbed patches and cause a mistake. Based on our analysis, we propose a simple temperature-scaling based method to improve the robustness of ViT against adversarial patches. Extensive qualitative and quantitative experiments are performed to support our findings, understanding, and improvement of ViT robustness to patch-wise perturbations across a set of transformer-based architectures.

preprint2022arXiv

Biologically Inspired Neural Path Finding

The human brain can be considered to be a graphical structure comprising of tens of billions of biological neurons connected by synapses. It has the remarkable ability to automatically re-route information flow through alternate paths in case some neurons are damaged. Moreover, the brain is capable of retaining information and applying it to similar but completely unseen scenarios. In this paper, we take inspiration from these attributes of the brain, to develop a computational framework to find the optimal low cost path between a source node and a destination node in a generalized graph. We show that our framework is capable of handling unseen graphs at test time. Moreover, it can find alternate optimal paths, when nodes are arbitrarily added or removed during inference, while maintaining a fixed prediction time. Code is available here: https://github.com/hangligit/pathfinding

preprint2022arXiv

Continuous Temporal Graph Networks for Event-Based Graph Data

There has been an increasing interest in modeling continuous-time dynamics of temporal graph data. Previous methods encode time-evolving relational information into a low-dimensional representation by specifying discrete layers of neural networks, while real-world dynamic graphs often vary continuously over time. Hence, we propose Continuous Temporal Graph Networks (CTGNs) to capture the continuous dynamics of temporal graph data. We use both the link starting timestamps and link duration as evolving information to model the continuous dynamics of nodes. The key idea is to use neural ordinary differential equations (ODE) to characterize the continuous dynamics of node representations over dynamic graphs. We parameterize ordinary differential equations using a novel graph neural network. The existing dynamic graph networks can be considered as a specific discretization of CTGNs. Experiment results on both transductive and inductive tasks demonstrate the effectiveness of our proposed approach over competitive baselines.

preprint2022arXiv

On Calibration of Graph Neural Networks for Node Classification

Graphs can model real-world, complex systems by representing entities and their interactions in terms of nodes and edges. To better exploit the graph structure, graph neural networks have been developed, which learn entity and edge embeddings for tasks such as node classification and link prediction. These models achieve good performance with respect to accuracy, but the confidence scores associated with the predictions might not be calibrated. That means that the scores might not reflect the ground-truth probabilities of the predicted events, which would be especially important for safety-critical applications. Even though graph neural networks are used for a wide range of tasks, the calibration thereof has not been sufficiently explored yet. We investigate the calibration of graph neural networks for node classification, study the effect of existing post-processing calibration methods, and analyze the influence of model capacity, graph density, and a new loss function on calibration. Further, we propose a topology-aware calibration method that takes the neighboring nodes into account and yields improved calibration compared to baseline methods.

preprint2022arXiv

Relationformer: A Unified Framework for Image-to-Graph Generation

A comprehensive representation of an image requires understanding objects and their mutual relationship, especially in image-to-graph generation, e.g., road network extraction, blood-vessel network extraction, or scene graph generation. Traditionally, image-to-graph generation is addressed with a two-stage approach consisting of object detection followed by a separate relation prediction, which prevents simultaneous object-relation interaction. This work proposes a unified one-stage transformer-based framework, namely Relationformer, that jointly predicts objects and their relations. We leverage direct set-based object prediction and incorporate the interaction among the objects to learn an object-relation representation jointly. In addition to existing [obj]-tokens, we propose a novel learnable token, namely [rln]-token. Together with [obj]-tokens, [rln]-token exploits local and global semantic reasoning in an image through a series of mutual associations. In combination with the pair-wise [obj]-token, the [rln]-token contributes to a computationally efficient relation prediction. We achieve state-of-the-art performance on multiple, diverse and multi-domain datasets that demonstrate our approach's effectiveness and generalizability.

preprint2022arXiv

Temporal Knowledge Graph Forecasting with Neural ODE

There has been an increasing interest in inferring future links on temporal knowledge graphs (KG). While links on temporal KGs vary continuously over time, the existing approaches model the temporal KGs in discrete state spaces. To this end, we propose a novel continuum model by extending the idea of neural ordinary differential equations (ODEs) to multi-relational graph convolutional networks. The proposed model preserves the continuous nature of dynamic multi-relational graph data and encodes both temporal and structural information into continuous-time dynamic embeddings. In addition, a novel graph transition layer is applied to capture the transitions on the dynamic graph, i.e., edge formation and dissolution. We perform extensive experiments on five benchmark datasets for temporal KG reasoning, showing our model's superior performance on the future link forecasting task.

preprint2022arXiv

TLogic: Temporal Logical Rules for Explainable Link Forecasting on Temporal Knowledge Graphs

Conventional static knowledge graphs model entities in relational data as nodes, connected by edges of specific relation types. However, information and knowledge evolve continuously, and temporal dynamics emerge, which are expected to influence future situations. In temporal knowledge graphs, time information is integrated into the graph by equipping each edge with a timestamp or a time range. Embedding-based methods have been introduced for link prediction on temporal knowledge graphs, but they mostly lack explainability and comprehensible reasoning chains. Particularly, they are usually not designed to deal with link forecasting -- event prediction involving future timestamps. We address the task of link forecasting on temporal knowledge graphs and introduce TLogic, an explainable framework that is based on temporal logical rules extracted via temporal random walks. We compare TLogic with state-of-the-art baselines on three benchmark datasets and show better overall performance while our method also provides explanations that preserve time consistency. Furthermore, in contrast to most state-of-the-art embedding-based methods, TLogic works well in the inductive setting where already learned rules are transferred to related datasets with a common vocabulary.

preprint2021arXiv

Effective and Efficient Vote Attack on Capsule Networks

Standard Convolutional Neural Networks (CNNs) can be easily fooled by images with small quasi-imperceptible artificial perturbations. As alternatives to CNNs, the recently proposed Capsule Networks (CapsNets) are shown to be more robust to white-box attacks than CNNs under popular attack protocols. Besides, the class-conditional reconstruction part of CapsNets is also used to detect adversarial examples. In this work, we investigate the adversarial robustness of CapsNets, especially how the inner workings of CapsNets change when the output capsules are attacked. The first observation is that adversarial examples misled CapsNets by manipulating the votes from primary capsules. Another observation is the high computational cost, when we directly apply multi-step attack methods designed for CNNs to attack CapsNets, due to the computationally expensive routing mechanism. Motivated by these two observations, we propose a novel vote attack where we attack votes of CapsNets directly. Our vote attack is not only effective but also efficient by circumventing the routing process. Furthermore, we integrate our vote attack into the detection-aware attack paradigm, which can successfully bypass the class-conditional reconstruction based detection method. Extensive experiments demonstrate the superior attack performance of our vote attack on CapsNets.

preprint2021arXiv

Few-Shot One-Class Classification via Meta-Learning

Although few-shot learning and one-class classification (OCC), i.e., learning a binary classifier with data from only one class, have been separately well studied, their intersection remains rather unexplored. Our work addresses the few-shot OCC problem and presents a method to modify the episodic data sampling strategy of the model-agnostic meta-learning (MAML) algorithm to learn a model initialization particularly suited for learning few-shot OCC tasks. This is done by explicitly optimizing for an initialization which only requires few gradient steps with one-class minibatches to yield a performance increase on class-balanced test data. We provide a theoretical analysis that explains why our approach works in the few-shot OCC scenario, while other meta-learning algorithms fail, including the unmodified MAML. Our experiments on eight datasets from the image and time-series domains show that our method leads to better results than classical OCC and few-shot classification approaches, and demonstrate the ability to learn unseen tasks from only few normal class samples. Moreover, we successfully train anomaly detectors for a real-world application on sensor readings recorded during industrial manufacturing of workpieces with a CNC milling machine, by using few normal examples. Finally, we empirically demonstrate that the proposed data sampling technique increases the performance of more recent meta-learning algorithms in few-shot OCC and yields state-of-the-art results in this problem setting.

preprint2021arXiv

Interpretable Graph Capsule Networks for Object Recognition

Capsule Networks, as alternatives to Convolutional Neural Networks, have been proposed to recognize objects from images. The current literature demonstrates many advantages of CapsNets over CNNs. However, how to create explanations for individual classifications of CapsNets has not been well explored. The widely used saliency methods are mainly proposed for explaining CNN-based classifications; they create saliency map explanations by combining activation values and the corresponding gradients, e.g., Grad-CAM. These saliency methods require a specific architecture of the underlying classifiers and cannot be trivially applied to CapsNets due to the iterative routing mechanism therein. To overcome the lack of interpretability, we can either propose new post-hoc interpretation methods for CapsNets or modifying the model to have build-in explanations. In this work, we explore the latter. Specifically, we propose interpretable Graph Capsule Networks (GraCapsNets), where we replace the routing part with a multi-head attention-based Graph Pooling approach. In the proposed model, individual classification explanations can be created effectively and efficiently. Our model also demonstrates some unexpected benefits, even though it replaces the fundamental part of CapsNets. Our GraCapsNets achieve better classification performance with fewer parameters and better adversarial robustness, when compared to CapsNets. Besides, GraCapsNets also keep other advantages of CapsNets, namely, disentangled representations and affine transformation robustness.

Volker Tresp

What is connected

Connect this record

See the researcher in context

Building this map preview

47 published item(s)

Memory-R1: Enhancing Large Language Model Agents to Manage and Utilize Memories via Reinforcement Learning

PRISM: Self-Pruning Intrinsic Selection Method for Training-Free Multimodal Data Selection

Differentiable Quantum Architecture Search For Job Shop Scheduling Problem

Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning

Modeling the evolution of temporal knowledge graphs with uncertainty

A Unified Framework for Rank-based Evaluation Metrics for Link Prediction in Knowledge Graphs

Are Vision Transformers Robust to Patch Perturbations?

Biologically Inspired Neural Path Finding

Continuous Temporal Graph Networks for Event-Based Graph Data

On Calibration of Graph Neural Networks for Node Classification

Relationformer: A Unified Framework for Image-to-Graph Generation

Temporal Knowledge Graph Forecasting with Neural ODE

TLogic: Temporal Logical Rules for Explainable Link Forecasting on Temporal Knowledge Graphs

Effective and Efficient Vote Attack on Capsule Networks

Few-Shot One-Class Classification via Meta-Learning

Interpretable Graph Capsule Networks for Object Recognition

ARCADe: A Rapid Continual Anomaly Detector

Contextual Prediction Difference Analysis for Explaining Individual Image Classifications

Curiosity-Driven Experience Prioritization via Density Estimation

Debate Dynamics for Human-comprehensible Fact-checking on Knowledge Graphs

Efficient Dialog Policy Learning via Positive Memory Retention

Energy-Based Hindsight Experience Prioritization

Graph Hawkes Neural Network for Forecasting on Temporal Knowledge Graphs

Improving the Robustness of Capsule Networks to Image Affine Transformations

Integrating Logical Rules Into Neural Multi-Hop Reasoning for Drug Repurposing

Introspective Learning by Distilling Knowledge from Online Self-explanation

Learning Goal-Oriented Visual Dialog via Tempered Policy Gradient

Maximum Entropy-Regularized Multi-Goal Reinforcement Learning

Mutual Information-based State-Control for Intrinsically Motivated Reinforcement Learning

PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings

Reasoning on Knowledge Graphs with Debate Dynamics

Scene Graph Reasoning for Visual Question Answering

Search for Better Students to Learn Distilled Knowledge

The Tensor Brain: Semantic Decoding for Perception and Memory

Walking the Tightrope: An Investigation of the Convolutional Autoencoder Bottleneck

Going Digital: A Survey on Digitalization and Large Scale Data Analytics in Healthcare

Learning with Memory Embeddings

Predicting Clinical Events by Combining Static and Dynamic Information Using Recurrent Neural Networks

Predictive Clinical Decision Support System with RNN Encoding and Tensor Decoding

Relational Models

A Review of Relational Machine Learning for Knowledge Graphs

Type-Constrained Representation Learning in Knowledge Graphs

Logistic Tensor Factorization for Multi-Relational Data

Mixture Approximations to Bayesian Networks

Towards a New Science of a Clinical Data Intelligence

Collaborative Ensemble Learning: Combining Collaborative and Content-Based Information Filtering via Hierarchical Bayes

Infinite Hidden Relational Models