Researcher profile

Balázs Kulcsár

Balázs Kulcsár contributes to research discovery and scholarly infrastructure.

Researcher | Affiliation not imported | Open to collaborate

Trust snapshot

Quick read

Trust 15 - Unverified | Verification L1 | Unclaimed author
3 works
0 followers
4 topics
4 close collaborators

Published work

3 published items

Preprint | 2026 | arXiv

Two-Stage Learned Decomposition for Scalable Routing on Multigraphs

Most neural methods for Vehicle Routing Problems (VRPs) are limited to Euclidean settings or simple graphs. In this work, we instead consider multigraphs, where parallel edges represent distinct travel options with varying trade-offs (e.g., distance vs time). Few methods are designed for such formulations and those that do exist face major scalability issues. We mitigate these scalability issues via a Node-Edge Policy Factorization (NEPF) approach, which splits the routing policy into a node permutation stage and an edge selection stage. To enable the decomposition, we introduce a pre-encoding edge aggregation scheme and a non-autoregressive architecture for the edge stage, as well as a hierarchical reinforcement learning method to train the stages jointly. Our experiments across six VRP variants demonstrate that NEPF matches or outperforms the state-of-the-art in terms of solution quality, while being significantly faster in training and inference.
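
The split into a node stage and an edge stage can be illustrated with a minimal sketch. This is not the NEPF method: the learned edge policy is replaced here by a simple greedy rule, the node order is assumed to be given, and the names `select_edges`, `multigraph` and `weight` are hypothetical. It only shows why fixing the node permutation first reduces the edge stage to an independent choice among parallel edges for each consecutive pair.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each ordered node pair maps to a list of
# parallel edges, each edge described by (distance, time).
Multigraph = Dict[Tuple[int, int], List[Tuple[float, float]]]


def select_edges(order: List[int], multigraph: Multigraph,
                 weight: float = 0.5) -> Tuple[List[int], float]:
    """Edge stage for a fixed node order (greedy stand-in, not a learned policy).

    For each consecutive pair in `order`, pick the parallel edge that
    minimizes weight * distance + (1 - weight) * time.
    Returns the chosen edge indices and the total scalarized cost.
    """
    chosen: List[int] = []
    total = 0.0
    for u, v in zip(order, order[1:]):
        options = multigraph[(u, v)]
        costs = [weight * d + (1.0 - weight) * t for d, t in options]
        best = min(range(len(options)), key=costs.__getitem__)
        chosen.append(best)
        total += costs[best]
    return chosen, total


# Toy instance (made-up numbers): a tour 0 -> 1 -> 2 -> 0.
toy_graph: Multigraph = {
    (0, 1): [(4.0, 10.0), (6.0, 3.0)],
    (1, 2): [(2.0, 2.0)],
    (2, 0): [(5.0, 1.0), (1.0, 9.0)],
}
toy_order = [0, 1, 2, 0]
edges_balanced, cost_balanced = select_edges(toy_order, toy_graph, weight=0.5)
edges_distance, cost_distance = select_edges(toy_order, toy_graph, weight=1.0)
```

Changing `weight` changes which parallel edge wins on each leg, which is the distance-versus-time trade-off the abstract mentions.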

Preprint | 2023 | arXiv

A Predictive Chance Constraint Rebalancing Approach to Mobility-on-Demand Services

This paper considers the problem of supply-demand imbalances in Mobility-on-Demand (MoD) services, such as Uber or DiDi Rider. Such imbalances are due to uneven stochastic travel demand and can be prevented by proactively rebalancing empty vehicles. To this end, we propose a method that incorporates estimated stochastic travel demand patterns into stochastic model predictive control (SMPC) for rebalancing empty vehicles in MoD ride-hailing services. More precisely, we first estimate passenger travel demand using Gaussian Process Regression (GPR), which provides demand uncertainty bounds for time pattern prediction. We then formulate an SMPC for the autonomous ride-hailing service and integrate demand predictions with uncertainty bounds into a receding horizon MoD optimization. In order to guarantee constraint satisfaction in the above optimization under estimated stochastic demand prediction, we employ a probabilistic constraining method with a user-defined confidence interval. Receding horizon MoD optimization with probabilistic constraints thereby calls for Chance Constrained Model Predictive Control (CCMPC). The benefits of the proposed method are twofold. First, travel demand uncertainty prediction from data can naturally be embedded into the MoD optimization framework. We show that for a given minimal fleet size the imbalance in each station can be kept below a certain threshold with a user-defined probability. Second, CCMPC can further be relaxed into a Mixed-Integer Linear Program (MILP), and we show that the MILP can be solved as a corresponding linear program which always admits an integral solution. Finally, we demonstrate through high-fidelity transportation simulations that, by tuning the confidence bound on the chance constraint, close-to-optimal oracle performance can be achieved. The corresponding median customer wait time is reduced by 4% compared to using only the mean prediction of the GPR.
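
To make the role of the confidence level concrete, here is a minimal sketch of how a single chance constraint becomes a deterministic one when the demand prediction is Gaussian, as a GPR posterior is. The paper's actual constraint formulation is not given in the abstract, so the scalar setting, the function name `supply_for_confidence` and the numbers below are assumptions for illustration. The requirement P(demand <= supply) >= confidence is equivalent to supply >= mean + z * std, where z is the standard normal quantile at the chosen confidence.

```python
from statistics import NormalDist


def supply_for_confidence(mean_demand: float, std_demand: float,
                          confidence: float) -> float:
    """Smallest supply level s with P(demand <= s) >= confidence.

    Assumes demand ~ Normal(mean_demand, std_demand**2). The chance
    constraint then tightens to the deterministic bound
    s >= mean_demand + z * std_demand, with z the standard normal quantile.
    """
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(confidence)
    return mean_demand + z * std_demand


# Hypothetical station forecast: mean 10 requests, standard deviation 2.
mean_only = supply_for_confidence(10.0, 2.0, 0.5)     # no safety margin
cautious = supply_for_confidence(10.0, 2.0, 0.975)    # margin of about 1.96 std
```

At confidence 0.5 the bound reduces to the mean prediction, and raising the confidence adds a safety margin proportional to the predicted standard deviation. This is the tuning knob the abstract refers to when comparing against the mean-only baseline.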

Preprint | 2022 | arXiv

Constrained Policy Gradient Method for Safe and Fast Reinforcement Learning: a Neural Tangent Kernel Based Approach

This paper presents a constrained policy gradient algorithm. We introduce constraints for safe learning with the following steps. First, learning is slowed down (lazy learning) so that the episodic policy change can be computed with the help of the policy gradient theorem and the neural tangent kernel. This, in turn, enables us to evaluate the policy at arbitrary states as well. In the same spirit, learning can be guided, ensuring safety by augmenting episode batches with states where the desired action probabilities are prescribed. Finally, exogenous discounted sums of future rewards (returns) can be computed at these specific state-action pairs such that the policy network satisfies the constraints. Computing the returns is based on solving a system of linear equations (equality constraints) or a constrained quadratic program (inequality constraints, regional constraints). Simulation results suggest that adding constraints (external information) can reasonably improve learning in terms of speed and transparency, provided the constraints are appropriately selected. The efficiency of the constrained learning was demonstrated with a shallow and wide ReLU network in the Cartpole and Lunar Lander OpenAI gym environments. The main novelty of the paper is giving a practical use of the neural tangent kernel in reinforcement learning.