Source author record

Andreas Schwung

Andreas Schwung appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning Computer Science and Game Theory Computer Vision

Catalog footprint

What is connected

6works

4topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Policy-Based Deep Reinforcement Learning Hyperheuristics for Job-Shop Scheduling Problems

This paper proposes a policy-based deep reinforcement learning hyper-heuristic framework for solving the Job Shop Scheduling Problem. The hyper-heuristic agent learns to switch scheduling rules based on the system state dynamically. We extend the hyper-heuristic framework with two key mechanisms. First, action prefiltering restricts decision-making to feasible low-level actions, enabling low-level heuristics to be evaluated independently of environmental constraints and providing an unbiased assessment. Second, a commitment mechanism regulates the frequency of heuristic switching. We investigate the impact of different commitment strategies, from step-wise switching to full-episode commitment, on both training behavior and makespan. Additionally, we compare two action selection strategies at the policy level: deterministic greedy selection and stochastic sampling. Computational experiments on standard JSSP benchmarks demonstrate that the proposed approach outperforms traditional heuristics, metaheuristics, and recent neural network-based scheduling methods

preprint2026arXiv

Policy-Based Reinforcement Learning with Action Masking for Dynamic Job Shop Scheduling under Uncertainty: Handling Random Arrivals and Machine Failures

We present a novel framework for solving Dynamic Job Shop Scheduling Problems under uncertainty, addressing the challenges introduced by stochastic job arrivals and unexpected machine breakdowns. Our approach follows a model-based paradigm, using Coloured Timed Petri Nets to represent the scheduling environment, and Maskable Proximal Policy Optimization to enable dynamic decision-making while restricting the agent to feasible actions at each decision point. To simulate realistic industrial conditions, dynamic job arrivals are modeled using a Gamma distribution, which captures complex temporal patterns such as bursts, clustering, and fluctuating workloads. Machine failures are modeled using a Weibull distribution to represent age-dependent degradation and wear-out dynamics. These stochastic models enable the framework to reflect real-world manufacturing scenarios better. In addition, we study two action-masking strategies: a non-gradient approach that overrides the probabilities of invalid actions, and a gradient-based approach that assigns negative gradients to invalid actions within the policy network. We conduct extensive experiments on dynamic JSSP benchmarks, demonstrating that our method consistently outperforms traditional heuristic and rule-based approaches in terms of makespan minimization. The results highlight the strength of combining interpretable Petri-net-based models with adaptive reinforcement learning policies, yielding a resilient, scalable, and explainable framework for real-time scheduling in dynamic and uncertain manufacturing environments.

preprint2025arXiv

Transfer learning of state-based potential games for process optimization in decentralized manufacturing systems

This paper presents a novel online transfer learning approach in state-based potential games (TL-SbPGs) for distributed self-optimization in manufacturing systems. The approach targets practical industrial scenarios where knowledge sharing among similar players enhances learning in large-scale and decentralized environments. TL-SbPGs enable players to reuse learned policies from others, which improves learning outcomes and accelerates convergence. To accomplish this goal, we develop transfer learning concepts and similarity criteria for players, which offer two distinct settings: (a) predefined similarities between players and (b) dynamically inferred similarities between players during training. The applicability of the SbPG framework to transfer learning is formally established. Furthermore, we present a method to optimize the timing and weighting of knowledge transfer. Experimental results from a laboratory-scale testbed show that TL-SbPGs improve production efficiency and reduce power consumption compared to vanilla SbPGs.

preprint2022arXiv

AWADA: Attention-Weighted Adversarial Domain Adaptation for Object Detection

Object detection networks have reached an impressive performance level, yet a lack of suitable data in specific applications often limits it in practice. Typically, additional data sources are utilized to support the training task. In these, however, domain gaps between different data sources pose a challenge in deep learning. GAN-based image-to-image style-transfer is commonly applied to shrink the domain gap, but is unstable and decoupled from the object detection task. We propose AWADA, an Attention-Weighted Adversarial Domain Adaptation framework for creating a feedback loop between style-transformation and detection task. By constructing foreground object attention maps from object detector proposals, we focus the transformation on foreground object regions and stabilize style-transfer training. In extensive experiments and ablation studies, we show that AWADA reaches state-of-the-art unsupervised domain adaptation object detection performance in the commonly used benchmarks for tasks such as synthetic-to-real, adverse weather and cross-camera adaptation.

preprint2020arXiv

Gradient Monitored Reinforcement Learning

This paper presents a novel neural network training approach for faster convergence and better generalization abilities in deep reinforcement learning. Particularly, we focus on the enhancement of training and evaluation performance in reinforcement learning algorithms by systematically reducing gradient's variance and thereby providing a more targeted learning process. The proposed method which we term as Gradient Monitoring(GM), is an approach to steer the learning in the weight parameters of a neural network based on the dynamic development and feedback from the training process itself. We propose different variants of the GM methodology which have been proven to increase the underlying performance of the model. The one of the proposed variant, Momentum with Gradient Monitoring (M-WGM), allows for a continuous adjustment of the quantum of back-propagated gradients in the network based on certain learning parameters. We further enhance the method with Adaptive Momentum with Gradient Monitoring (AM-WGM) method which allows for automatic adjustment between focused learning of certain weights versus a more dispersed learning depending on the feedback from the rewards collected. As a by-product, it also allows for automatic derivation of the required deep network sizes during training as the algorithm automatically freezes trained weights. The approach is applied to two discrete (Multi-Robot Co-ordination problem and Atari games) and one continuous control task (MuJoCo) using Advantage Actor-Critic (A2C) and Proximal Policy Optimization (PPO) respectively. The results obtained particularly underline the applicability and performance improvements of the methods in terms of generalization capability.

preprint2020arXiv

Predicting Rigid Body Dynamics using Dual Quaternion Recurrent Neural Networks with Quaternion Attention

We propose a novel neural network architecture based on dual quaternions which allow for a compact representation of informations with a main focus on describing rigid body movements. To cover the dynamic behavior inherent to rigid body movements, we propose recurrent architectures in the neural network. To further model the interactions between individual rigid bodies as well as external inputs efficiently, we incorporate a novel attention mechanism employing dual quaternion algebra. The introduced architecture is trainable by means of gradient based algorithms. We apply our approach to a parcel prediction problem where a rigid body with an initial position, orientation, velocity and angular velocity moves through a fixed simulation environment which exhibits rich interactions between the parcel and the boundaries.

Andreas Schwung

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Policy-Based Deep Reinforcement Learning Hyperheuristics for Job-Shop Scheduling Problems

Policy-Based Reinforcement Learning with Action Masking for Dynamic Job Shop Scheduling under Uncertainty: Handling Random Arrivals and Machine Failures

Transfer learning of state-based potential games for process optimization in decentralized manufacturing systems

AWADA: Attention-Weighted Adversarial Domain Adaptation for Object Detection

Gradient Monitored Reinforcement Learning

Predicting Rigid Body Dynamics using Dual Quaternion Recurrent Neural Networks with Quaternion Attention