Researcher profile

Behzad Khamidehi

Behzad Khamidehi contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 19 - UnverifiedVerification L1Unclaimed author
5works
0followers
9topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

5 published item(s)

preprint2026arXiv

RLFTSim: Realistic and Controllable Multi-Agent Traffic Simulation via Reinforcement Learning Fine-Tuning

Supervised open-loop training has been widely adopted for training traffic simulation models; however, it fails to capture the inherently dynamic, multi-agent interactions common in complex driving scenarios. We introduce RLFTSim, a reinforcement-learning-based fine-tuning framework that enhances scenario realism by aligning simulator rollouts with real-world data distributions and provides a method for distilling goal-conditioned controllability in scenario generation. We instantiate RLFTSim on top of a pre-trained simulation model, design a reward that balances fidelity and controllability, and perform comprehensive experiments on the Waymo Open Motion Dataset. Our results show improvements in realism, achieving state-of-the-art performance. Compared with other heuristic search-based fine-tuning methods, RLFTSim requires significantly fewer samples due to a proposed low-variance and dense reward signal, and it directly addresses the realism alignment issue by design. We also demonstrate the effectiveness of our approach for distilling traffic simulation controllability through goal conditioning. The project page is available at https://ehsan-ami.github.io/rlftsim.

preprint2021arXiv

Dynamic Resource Management for Providing QoS in Drone Delivery Systems

Drones have been considered as an alternative means of package delivery to reduce the delivery cost and time. Due to the battery limitations, the drones are best suited for last-mile delivery, i.e., the delivery from the package distribution centers (PDCs) to the customers. Since a typical delivery system consists of multiple PDCs, each having random and time-varying demands, the dynamic drone-to-PDC allocation would be of great importance in meeting the demand in an efficient manner. In this paper, we study the dynamic UAV assignment problem for a drone delivery system with the goal of providing measurable Quality of Service (QoS) guarantees. We adopt a queueing theoretic approach to model the customer-service nature of the problem. Furthermore, we take a deep reinforcement learning approach to obtain a dynamic policy for the re-allocation of the UAVs. This policy guarantees a probabilistic upper-bound on the queue length of the packages waiting in each PDC, which is beneficial from both the service provider's and the customers' viewpoints. We evaluate the performance of our proposed algorithm by considering three broad arrival classes, including Bernoulli, Time-Varying Bernoulli, and Markov-Modulated Bernoulli arrivals. Our results show that the proposed method outperforms the baselines, particularly in scenarios with Time-Varying and Markov-Modulated Bernoulli arrivals, which are more representative of real-world demand patterns. Moreover, our algorithm satisfies the QoS constraints in all the studied scenarios while minimizing the average number of UAVs in use.

preprint2020arXiv

A Double Q-Learning Approach for Navigation of Aerial Vehicles with Connectivity Constraint

This paper studies the trajectory optimization problem for an aerial vehicle with the mission of flying between a pair of given initial and final locations. The objective is to minimize the travel time of the aerial vehicle ensuring that the communication connectivity constraint required for the safe operation of the aerial vehicle is satisfied. We consider two different criteria for the connectivity constraint of the aerial vehicle which leads to two different scenarios. In the first scenario, we assume that the maximum continuous time duration that the aerial vehicle is out of the coverage of the ground base stations (GBSs) is limited to a given threshold. In the second scenario, however, we assume that the total time periods that the aerial vehicle is not covered by the GBSs is restricted. Based on these two constraints, we formulate two trajectory optimization problems. To solve these non-convex problems, we use an approach based on the double Q-learning method which is a model-free reinforcement learning technique and unlike the existing algorithms does not need perfect knowledge of the environment. Moreover, in contrast to the well-known Q-learning technique, our double Q-learning algorithm does not suffer from the over-estimation issue. Simulation results show that although our algorithm does not require prior information of the environment, it works well and shows near optimal performance.

preprint2020arXiv

Federated Learning for Cellular-connected UAVs: Radio Mapping and Path Planning

To prolong the lifetime of the unmanned aerial vehicles (UAVs), the UAVs need to fulfill their missions in the shortest possible time. In addition to this requirement, in many applications, the UAVs require a reliable internet connection during their flights. In this paper, we minimize the travel time of the UAVs, ensuring that a probabilistic connectivity constraint is satisfied. To solve this problem, we need a global model of the outage probability in the environment. Since the UAVs have different missions and fly over different areas, their collected data carry local information on the network's connectivity. As a result, the UAVs can not rely on their own experiences to build the global model. This issue affects the path planning of the UAVs. To address this concern, we utilize a two-step approach. In the first step, by using Federated Learning (FL), the UAVs collaboratively build a global model of the outage probability in the environment. In the second step, by using the global model obtained in the first step and rapidly-exploring random trees (RRTs), we propose an algorithm to optimize UAVs' paths. Simulation results show the effectiveness of this two-step approach for UAV networks.

preprint2020arXiv

No Cross-Validation Required: An Analytical Framework for Regularized Mixed-Integer Problems (Extended Version)

This paper develops a method to obtain the optimal value for the regularization coefficient in a general mixed-integer problem (MIP). This approach eliminates the cross-validation performed in the existing penalty techniques to obtain a proper value for the regularization coefficient. We obtain this goal by proposing an alternating method to solve MIPs. First, via regularization, we convert the MIP into a more mathematically tractable form. Then, we develop an iterative algorithm to update the solution along with the regularization (penalty) coefficient. We show that our update procedure guarantees the convergence of the algorithm. Moreover, assuming the objective function is continuously differentiable, we derive the convergence rate, a lower bound on the value of regularization coefficient, and an upper bound on the number of iterations required for the convergence. We use a radio access technology (RAT) selection problem in a heterogeneous network to benchmark the performance of our method. Simulation results demonstrate near-optimality of the solution and consistency of the convergence behavior with obtained theoretical bounds.