Source author record

Victor-Alexandru Darvariu

Victor-Alexandru Darvariu appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning Robotics

Catalog footprint

What is connected

2works

3topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2026arXiv

Quantile-Coupled Flow Matching for Distributional Reinforcement Learning

Unlike standard expected-return Reinforcement Learning (RL), Distributional RL (DRL) models the full return distribution, making it better-suited for uncertainty-aware and risk-sensitive decision-making. Conditional Flow Matching (CFM) critics have recently attracted attention for modelling continuous, multi-modal return distributions. Despite this interest, there remains a substantial metric mismatch: DRL theory relies on the distributional Bellman operator being contractive in the $p$-Wasserstein distance, yet existing CFM critics are trained with arbitrary source-target couplings, so their flow-matching losses are not Wasserstein-aligned surrogates for matching Bellman target return distributions. In this work, we address this mismatch by proposing FlowIQN, a CFM critic that sorts source and Bellman target samples within each mini-batch to approximate the monotone optimal transport coupling, replacing arbitrary pairings with quantile-aligned flow paths. We prove that the loss of our quantile-coupled CFM critic yields a Wasserstein-aligned approximate projection compatible with the foundations of DRL. To our knowledge, FlowIQN is the first flow-matching distributional critic with an explicit Wasserstein-aligned projection guarantee. We further extend FlowIQN with shortcut models for efficient inference. Empirical results show that FlowIQN improves Wasserstein return-distribution accuracy over other CFM critics. It also yields competitive performance on offline RL benchmarks across multiple policy extraction methods, providing a theoretically grounded CFM critic that is readily compatible with DRL pipelines. Code: https://github.com/ori-goals/flowIQN.

preprint2022arXiv

Planning Spatial Networks with Monte Carlo Tree Search

We tackle the problem of goal-directed graph construction: given a starting graph, a budget of modifications, and a global objective function, the aim is to find a set of edges whose addition to the graph achieves the maximum improvement in the objective (e.g., communication efficiency). This problem emerges in many networks of great importance for society such as transportation and critical infrastructure networks. We identify two significant shortcomings with present methods. Firstly, they focus exclusively on network topology while ignoring spatial information; however, in many real-world networks, nodes are embedded in space, which yields different global objectives and governs the range and density of realizable connections. Secondly, existing RL methods scale poorly to large networks due to the high cost of training a model and the scaling factors of the action space and global objectives. In this work, we formulate this problem as a deterministic MDP. We adopt the Monte Carlo Tree Search framework for planning in this domain, prioritizing the optimality of final solutions over the speed of policy evaluation. We propose several improvements over the standard UCT algorithm for this family of problems, addressing their single-agent nature, the trade-off between the costs of edges and their contribution to the objective, and an action space linear in the number of nodes. We demonstrate the suitability of this approach for improving the global efficiency and attack resilience of a variety of synthetic and real-world networks, including Internet backbone networks and metro systems. Our approach obtains a 24% improvement in these metrics compared to UCT on the largest networks tested and scalability superior to previous methods.