Source author record

Balázs Kulcsár

Balázs Kulcsár appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Machine Learning · Artificial Intelligence · eess.SY · Systems and Control

Catalog footprint

What is connected

3 works

4 topics

4 close collaborators

Preprint · 2026 · arXiv

Two-Stage Learned Decomposition for Scalable Routing on Multigraphs

Most neural methods for Vehicle Routing Problems (VRPs) are limited to Euclidean settings or simple graphs. In this work, we instead consider multigraphs, where parallel edges represent distinct travel options with varying trade-offs (e.g., distance vs time). Few methods are designed for such formulations, and those that do exist face major scalability issues. We mitigate these scalability issues via a Node-Edge Policy Factorization (NEPF) approach, which splits the routing policy into a node permutation stage and an edge selection stage. To enable the decomposition, we introduce a pre-encoding edge aggregation scheme and a non-autoregressive architecture for the edge stage, as well as a hierarchical reinforcement learning method to train the stages jointly. Our experiments across six VRP variants demonstrate that NEPF matches or outperforms the state of the art in terms of solution quality, while being significantly faster in training and inference.
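A minimal Python sketch of the node/edge split described above, under loudly stated assumptions. It is not NEPF: the learned node policy is replaced by a nearest-neighbour heuristic, the pre-encoding edge aggregation by taking the minimum distance over parallel edges, and the learned edge policy by a fixed weighted score. The graph data, the `alpha` weight and the names `aggregate_edges`, `node_stage` and `edge_stage` are all hypothetical. The sketch only shows that, once the node order is fixed, the parallel-edge choice for each leg can be made independently of the other legs.

```python
from typing import Dict, List, Tuple

Edge = Tuple[float, float]                       # (distance, time) of one parallel edge
Multigraph = Dict[Tuple[int, int], List[Edge]]   # ordered node pair -> parallel edges


def aggregate_edges(graph: Multigraph) -> Dict[Tuple[int, int], float]:
    """Collapse parallel edges to one scalar per node pair (minimum distance).
    Hypothetical stand-in for a learned pre-encoding edge aggregation."""
    return {pair: min(d for d, _ in edges) for pair, edges in graph.items()}


def node_stage(agg: Dict[Tuple[int, int], float], nodes: List[int], depot: int) -> List[int]:
    """Stage 1: choose a node permutation on the aggregated simple graph.
    A nearest-neighbour heuristic stands in for the learned node policy."""
    route = [depot]
    unvisited = set(nodes) - {depot}
    while unvisited:
        last = route[-1]
        nxt = min(unvisited, key=lambda n: (agg[(last, n)], n))  # ties broken by node id
        route.append(nxt)
        unvisited.remove(nxt)
    route.append(depot)  # return to the depot
    return route


def edge_stage(graph: Multigraph, route: List[int], alpha: float) -> List[int]:
    """Stage 2: for every leg of the fixed route, pick the index of one parallel edge
    minimising alpha * distance + (1 - alpha) * time. Legs are scored independently,
    so all choices could be made in parallel (non-autoregressively)."""
    choices = []
    for u, v in zip(route, route[1:]):
        edges = graph[(u, v)]
        choices.append(min(range(len(edges)),
                           key=lambda k: alpha * edges[k][0] + (1 - alpha) * edges[k][1]))
    return choices


# Made-up symmetric example: depot 0 and customers 1, 2.
EXAMPLE: Multigraph = {
    (0, 1): [(2.0, 5.0), (3.0, 2.0)], (1, 0): [(2.0, 5.0), (3.0, 2.0)],
    (0, 2): [(4.0, 4.0)],             (2, 0): [(4.0, 4.0)],
    (1, 2): [(1.0, 6.0), (2.0, 1.0)], (2, 1): [(1.0, 6.0), (2.0, 1.0)],
}

route = node_stage(aggregate_edges(EXAMPLE), [0, 1, 2], depot=0)
print(route)                                   # [0, 1, 2, 0]
print(edge_stage(EXAMPLE, route, alpha=0.5))   # [1, 1, 0]
print(edge_stage(EXAMPLE, route, alpha=1.0))   # [0, 0, 0]
```

Changing `alpha` from a distance-only to a balanced score switches the chosen parallel edges without touching the node order, which is the kind of trade-off the multigraph formulation is meant to expose.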

Preprint · 2023 · arXiv

A Predictive Chance Constraint Rebalancing Approach to Mobility-on-Demand Services

This paper considers the problem of supply-demand imbalances in Mobility-on-Demand (MoD) services, such as Uber or DiDi Rider. Such imbalances are due to uneven stochastic travel demand and can be prevented by proactively rebalancing empty vehicles. To this end, we propose a method that incorporates estimated stochastic travel demand patterns into stochastic model predictive control (SMPC) for rebalancing empty vehicles in an MoD ride-hailing service. More precisely, we first estimate passenger travel demand using Gaussian Process Regression (GPR), which provides uncertainty bounds for the predicted temporal demand patterns. We then formulate an SMPC for the autonomous ride-hailing service and integrate demand predictions with uncertainty bounds into a receding-horizon MoD optimization. To guarantee constraint satisfaction in this optimization under stochastic demand predictions, we employ a probabilistic constraint formulation with a user-defined confidence interval. Receding-horizon MoD optimization with probabilistic constraints thus calls for Chance Constrained Model Predictive Control (CCMPC). The benefits of the proposed method are twofold. First, travel demand uncertainty predicted from data can naturally be embedded into the MoD optimization framework. We show that, for a given minimal fleet size, the imbalance at each station can be kept below a certain threshold with a user-defined probability. Second, CCMPC can further be relaxed into a Mixed-Integer Linear Program (MILP), and we show that the MILP can be solved as a corresponding Linear Program that always admits an integral solution. Finally, we demonstrate through high-fidelity transportation simulations that, by tuning the confidence bound on the chance constraint, close-to-optimal oracle performance can be achieved. The corresponding median customer wait time is reduced by 4% compared to using only the mean prediction of the GPR.
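For a Gaussian predictive distribution such as the one GPR returns, a single chance constraint has a standard deterministic equivalent: if demand at a station is d ~ N(mu, sigma^2), then P(d <= s) >= 1 - epsilon holds exactly when s >= mu + sigma * z with z = Phi^-1(1 - epsilon). The Python sketch below applies only that reformulation, assuming a per-station constraint of the form "supply covers demand", independent stations, and rounding up to whole vehicles; the function name `required_vehicles` and the prediction values are hypothetical. It does not reproduce the paper's full CCMPC, which couples stations and time steps in a receding-horizon optimization.

```python
import math
from statistics import NormalDist


def required_vehicles(mu: float, sigma: float, epsilon: float) -> int:
    """Smallest integer supply s with P(demand <= s) >= 1 - epsilon,
    assuming demand ~ N(mu, sigma^2), e.g. a GPR mean and standard deviation."""
    if not 0.0 < epsilon < 1.0:
        raise ValueError("epsilon must lie in (0, 1)")
    if sigma < 0.0:
        raise ValueError("sigma must be non-negative")
    z = NormalDist().inv_cdf(1.0 - epsilon)   # standard normal quantile
    return max(0, math.ceil(mu + z * sigma))  # tightened bound, rounded up to whole vehicles


# Hypothetical per-station predictions (mean, std) at one time step.
predictions = [(10.0, 2.0), (5.0, 1.0), (0.5, 0.3)]

mean_only = [required_vehicles(mu, s, 0.5) for mu, s in predictions]   # z = 0: ignores uncertainty
tightened = [required_vehicles(mu, s, 0.05) for mu, s in predictions]  # 95% confidence
print(mean_only)   # [10, 5, 1]
print(tightened)   # [14, 7, 1]
```

Setting epsilon = 0.5 reduces the bound to the mean prediction alone, while smaller epsilon adds a safety margin proportional to the predicted standard deviation; this is the knob that the confidence bound on the chance constraint tunes.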

Preprint · 2022 · arXiv

Constrained Policy Gradient Method for Safe and Fast Reinforcement Learning: a Neural Tangent Kernel Based Approach

This paper presents a constrained policy gradient algorithm. We introduce constraints for safe learning with the following steps. First, learning is slowed down (lazy learning) so that the episodic policy change can be computed with the help of the policy gradient theorem and the neural tangent kernel. This, in turn, enables evaluation of the policy at arbitrary states. In the same spirit, learning can be guided to ensure safety by augmenting episode batches with states where the desired action probabilities are prescribed. Finally, exogenous discounted sums of future rewards (returns) can be computed at these specific state-action pairs such that the policy network satisfies the constraints. The returns are computed by solving a system of linear equations (equality constraints) or a constrained quadratic program (inequality or regional constraints). Simulation results suggest that adding constraints (external information) to learning can reasonably improve its speed and transparency if the constraints are appropriately selected. The efficiency of the constrained learning was demonstrated with a shallow and wide ReLU network in the Cartpole and Lunar Lander OpenAI Gym environments. The main novelty of the paper is a practical use of the neural tangent kernel in reinforcement learning.
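A pure-Python sketch of the mechanism described above, written under assumptions that are not taken from the abstract. Under lazy learning, the empirical NTK gives a first-order prediction of how one policy-gradient step changes the network output at a state outside the batch, and a prescribed change at an augmented state becomes one linear equation in an exogenous return. The sketch assumes a binary-action policy pi(1|s) = sigmoid(f(s)), a one-hidden-layer ReLU network f in NTK parameterisation, a single equality constraint, and made-up states and returns; the names `ShallowReLU`, `predicted_change` and `exogenous_return` are hypothetical. The quadratic-program route for inequality and regional constraints is not shown.

```python
import math
import random

def relu(z: float) -> float:
    return z if z > 0.0 else 0.0

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

class ShallowReLU:
    """f(s) = (1/sqrt(m)) * sum_j a_j * relu(w_j . s); f(s) is the logit of pi(1|s)."""

    def __init__(self, dim: int, width: int, seed: int = 0):
        rng = random.Random(seed)
        self.m, self.d = width, dim
        self.W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
        self.a = [rng.gauss(0.0, 1.0) for _ in range(width)]

    def _pre(self, s):
        return [sum(wk * sk for wk, sk in zip(w, s)) for w in self.W]

    def forward(self, s) -> float:
        return sum(a * relu(p) for a, p in zip(self.a, self._pre(s))) / math.sqrt(self.m)

    def grad(self, s):
        """Gradient of f(s) w.r.t. all parameters, flattened as [a..., W row by row]."""
        c = 1.0 / math.sqrt(self.m)
        pre = self._pre(s)
        g = [relu(p) * c for p in pre]
        for a, p in zip(self.a, pre):
            g.extend((a * c * sk if p > 0.0 else 0.0) for sk in s)
        return g

    def step(self, direction, lr: float) -> None:
        """theta <- theta + lr * direction (gradient ascent), layout as in grad()."""
        for j in range(self.m):
            self.a[j] += lr * direction[j]
            for k in range(self.d):
                self.W[j][k] += lr * direction[self.m + j * self.d + k]

def ntk(net, s1, s2) -> float:
    """Empirical neural tangent kernel Theta(s1, s2) = <grad f(s1), grad f(s2)>."""
    return sum(x * y for x, y in zip(net.grad(s1), net.grad(s2)))

def pg_coeffs(net, batch):
    """G_i * d log pi(a_i|s_i) / d f = G_i * (a_i - sigmoid(f(s_i))) for a binary policy."""
    return [G * (a - sigmoid(net.forward(s))) for s, a, G in batch]

def predicted_change(net, batch, s, lr: float) -> float:
    """First-order (NTK) prediction of the change of f(s) after one ascent step on
    sum_i G_i log pi(a_i|s_i): lr * sum_i coeff_i * Theta(s, s_i)."""
    return lr * sum(c * ntk(net, s, si) for c, (si, _, _) in zip(pg_coeffs(net, batch), batch))

def exogenous_return(net, batch, s_star, a_star: int, target: float, lr: float) -> float:
    """Return G* for an augmented sample (s_star, a_star) such that the predicted
    change of f(s_star) equals `target`; a single linear equation in G*."""
    base = predicted_change(net, batch, s_star, lr)
    c_star = a_star - sigmoid(net.forward(s_star))
    return (target - base) / (lr * c_star * ntk(net, s_star, s_star))

# Made-up episode batch of (state, action, return) triples with 4-dimensional states.
rng = random.Random(1)
net = ShallowReLU(dim=4, width=512, seed=0)
batch = [([rng.uniform(-1, 1) for _ in range(4)], rng.randint(0, 1), rng.uniform(0.5, 2.0))
         for _ in range(5)]
probe, lr = [0.1, -0.2, 0.05, 0.3], 1e-3   # probe is a state outside the batch

# Constraint: raise the logit at a hypothetical "safe" state by 0.01 in this step.
s_star = [0.0, 0.5, -0.5, 0.0]
g_star = exogenous_return(net, batch, s_star, 1, 0.01, lr)
augmented = batch + [(s_star, 1, g_star)]
check = predicted_change(net, augmented, s_star, lr)  # equals 0.01 up to rounding

# Compare the NTK prediction at the probe with an actual parameter update.
pred = predicted_change(net, augmented, probe, lr)
before = net.forward(probe)
coeffs, grads = pg_coeffs(net, augmented), [net.grad(s) for s, _, _ in augmented]
direction = [sum(c * g[k] for c, g in zip(coeffs, grads)) for k in range(len(grads[0]))]
net.step(direction, lr)
actual = net.forward(probe) - before
print(f"predicted {pred:.3e}, actual {actual:.3e}")
```

For a small `lr` and a wide network the printed prediction and the actual change agree closely, which is the lazy-learning premise; with larger steps or narrower networks the first-order prediction degrades.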