Source author record

Ahmadreza Moradipari

Ahmadreza Moradipari appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher (unclaimed source record)

Machine Learning, Systems and Control, eess.SY, Artificial Intelligence, Multiagent Systems

Catalog footprint

What is connected

8 works

5 topics

4 close collaborators


preprint, 2026, arXiv

Agentic AI for Trip Planning Optimization Application

Trip planning for intelligent vehicles increasingly requires selecting optimal routes rather than merely producing feasible itineraries, as interacting factors such as travel time, energy consumption, and traffic conditions directly affect plan quality. Yet existing systems are largely designed for feasibility-oriented planning, and current benchmarks provide only reference answers without ground truth, preventing objective evaluation of optimization performance. In our paper, we address these limitations with an agentic AI framework that enables dynamic refinement through an orchestration agent coordinating specialized agents for traffic, charging, and points of interest, and with the Trip-planning Optimization Problems Dataset, which supplies definitive optimal solutions and category-level task structure for fine-grained analysis. Experiments show that our system achieves 77.4% accuracy on the TOP Benchmark, significantly outperforming single-agent and workflow-based multi-agent baselines, demonstrating the importance of orchestrated agentic reasoning for robust trip planning optimization.

preprint, 2026, arXiv

Formation and Investigation of Cooperative Platooning at the Early Stage of Connected and Automated Vehicles Deployment

Cooperative platooning, enabled by cooperative adaptive cruise control (CACC), is a cornerstone technology for connected automated vehicles (CAVs), offering significant improvements in safety, comfort, and traffic efficiency over traditional adaptive cruise control (ACC). This paper addresses a key challenge in the initial deployment phase of CAVs: the limited benefits of cooperative platooning due to the sparse distribution of CAVs on the road. To overcome this limitation, we propose an innovative control framework that enhances cooperative platooning in mixed traffic environments. Two techniques are utilized: (1) a mixed cooperative platooning strategy that integrates CACC with unconnected vehicles (CACCu), and (2) a strategic lane-change decision model designed to facilitate safe and efficient lane changes for platoon formation. Additionally, a surrounding vehicle identification system is embedded in the framework to enable CAVs to effectively identify and select potential platooning leaders. Simulation studies across various CV market penetration rates (MPRs) show that incorporating CACCu systems significantly improves safety, comfort, and traffic efficiency compared to existing systems with only CACC and ACC systems, even at CV penetration as low as 10%. The maximized platoon formation increases by up to 24%, accompanied by an 11% reduction in acceleration and a 7% decrease in fuel consumption. Furthermore, the strategic lane-change model enhances CAV performance, achieving notable improvements between 6% and 60% CV penetration, without adversely affecting overall traffic flow.

preprint, 2022, arXiv

Collaborative Multi-agent Stochastic Linear Bandits

We study a collaborative multi-agent stochastic linear bandit setting, where $N$ agents that form a network communicate locally to minimize their overall regret. In this setting, each agent has its own linear bandit problem (its own reward parameter) and the goal is to select the best global action w.r.t. the average of their reward parameters. At each round, each agent proposes an action, and one action is randomly selected and played as the network action. All the agents observe the corresponding rewards of the played actions and use an accelerated consensus procedure to compute an estimate of the average of the rewards obtained by all the agents. We propose a distributed upper confidence bound (UCB) algorithm and prove a high probability bound on its $T$-round regret in which we include a linear growth of regret associated with each communication round. Our regret bound is of order $\mathcal{O}\Big(\sqrt{\frac{T}{N \log(1/|\lambda_2|)}}\cdot (\log T)^2\Big)$, where $\lambda_2$ is the second largest (in absolute value) eigenvalue of the communication matrix.
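To illustrate why $|\lambda_2|$ enters the bound, here is a minimal sketch of plain (non-accelerated) consensus averaging, used as a simpler stand-in for the accelerated procedure the abstract mentions. The 4-agent ring, the weight matrix `W`, and the `local_rewards` values are hypothetical and chosen only for illustration. For a symmetric doubly stochastic `W`, the disagreement between agents shrinks by a factor of at most $|\lambda_2|$ per round.

```python
def consensus_step(W, x):
    """One round of plain consensus: every agent replaces its value
    with a weighted average of its neighbours' values (x <- W x)."""
    n = len(x)
    return [sum(W[i][j] * x[j] for j in range(n)) for i in range(n)]


def run_consensus(W, x0, rounds):
    """Apply `rounds` consensus steps starting from the local values x0."""
    x = list(x0)
    for _ in range(rounds):
        x = consensus_step(W, x)
    return x


# Hypothetical communication matrix for a 4-agent ring: symmetric and
# doubly stochastic. Its eigenvalues are 1, 0.5, 0.5 and 0, so the
# second largest in absolute value is 0.5.
W = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]

# Hypothetical rewards observed locally by each agent in one round.
local_rewards = [1.0, 3.0, 5.0, 7.0]
target = sum(local_rewards) / len(local_rewards)

estimates = run_consensus(W, local_rewards, rounds=30)
max_error = max(abs(e - target) for e in estimates)
print(estimates, max_error)
```

A matrix with a smaller $|\lambda_2|$ (a better connected network) needs fewer communication rounds to reach the same accuracy, which is consistent with the $\log(1/|\lambda_2|)$ term in the stated bound.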

preprint, 2022, arXiv

Feature and Parameter Selection in Stochastic Linear Bandits

We study two model selection settings in stochastic linear bandits (LB). In the first setting, which we refer to as feature selection, the expected reward of the LB problem is in the linear span of at least one of $M$ feature maps (models). In the second setting, the reward parameter of the LB problem is arbitrarily selected from $M$ models represented as (possibly) overlapping balls in $\mathbb R^d$. However, the agent only has access to misspecified models, i.e., estimates of the centers and radii of the balls. We refer to this setting as parameter selection. For each setting, we develop and analyze a computationally efficient algorithm that is based on a reduction from bandits to full-information problems. This allows us to obtain regret bounds that are not worse (up to a $\sqrt{\log M}$ factor) than the case where the true model is known. This is the best-reported dependence on the number of models $M$ in these settings. Finally, we empirically show the effectiveness of our algorithms using synthetic and real-world experiments.

preprint, 2022, arXiv

Multi-Environment Meta-Learning in Stochastic Linear Bandits

In this work we investigate meta-learning (or learning-to-learn) approaches in multi-task linear stochastic bandit problems that can originate from multiple environments. Inspired by the work of [1] on meta-learning in a sequence of linear bandit problems whose parameters are sampled from a single distribution (i.e., a single environment), here we consider the feasibility of meta-learning when task parameters are drawn from a mixture distribution instead. For this problem, we propose a regularized version of the OFUL algorithm that, when trained on tasks with labeled environments, achieves low regret on a new task without requiring knowledge of the environment from which the new task originates. Specifically, our regret bound for the new algorithm captures the effect of environment misclassification and highlights the benefits over learning each task separately or meta-learning without recognition of the distinct mixture components.

preprint, 2020, arXiv

Constrained Thompson Sampling for Real-Time Electricity Pricing with Grid Reliability Constraints

We consider the problem of an aggregator attempting to learn customers' load flexibility models while implementing a load shaping program by means of broadcasting daily dispatch signals. We adopt a multi-armed bandit formulation to account for the stochastic and unknown nature of customers' responses to dispatch signals. We propose a constrained Thompson sampling heuristic, Con-TS-RTP, that accounts for various possible aggregator objectives (e.g., to reduce demand at peak hours, integrate more intermittent renewable generation, track a desired daily load profile, etc.) and takes into account the operational constraints of a distribution system to avoid potential grid failures as a result of uncertainty in the customers' response. We provide a discussion on the regret bounds for our algorithm as well as a discussion on the operational reliability of the distribution system's constraints being upheld throughout the learning process.

preprint, 2020, arXiv

Mobility-Aware Electric Vehicle Fast Charging Load Models with Geographical Price Variations

We study the traffic patterns as well as the charging patterns of a population of cost-minimizing EV owners traveling and charging within a transportation network equipped with fast charging stations (FCSs). Specifically, we study how the charging network operator (CNO) can influence where EV users charge in order to optimize the utilization of fast charging stations. These charging decisions of private EV owners affect aggregate congestion at stations (i.e., waiting time) as well as the aggregate EV charging load across the network. In this work, we capture the resulting equilibrium wait times and electricity load through a so-called traffic and charge assignment problem (TCAP) in a fast charging station network. Our formulation allows us to: 1) Study the expected station wait times as well as the probability distribution of aggregate charging load of EVs in an FCS network in a mobility-aware fashion (an aspect unique to our work), while accounting for heterogeneities in users' travel patterns, energy demands, and geographically variant electricity prices. 2) Analytically characterize the special threshold-based structure that determines how EV owners choose where to charge their vehicle at equilibrium, in response to the FCS's charging price structure, users' energy demands, and users' mobility patterns. 3) Provide a convex optimization problem formulation to identify the network's unique equilibrium. Furthermore, we illustrate how to induce a socially optimal charging behavior by deriving the socially optimal plug-in fees and electricity prices at the charging stations.

preprint, 2020, arXiv

Safe Linear Thompson Sampling with Side Information

The design and performance analysis of bandit algorithms in the presence of stage-wise safety or reliability constraints has recently garnered significant interest. In this work, we consider the linear stochastic bandit problem under additional linear safety constraints that need to be satisfied at each round. We provide a new safe algorithm based on linear Thompson Sampling (TS) for this problem and show a frequentist regret of order $\mathcal{O} (d^{3/2}\log^{1/2}d \cdot T^{1/2}\log^{3/2}T)$, which remarkably matches the results provided by (Abeille et al., 2017) for the standard linear TS algorithm in the absence of safety constraints. We compare the performance of our algorithm with UCB-based safe algorithms and highlight how the inherently randomized nature of TS leads to a superior performance in expanding the set of safe actions the algorithm has access to at each round.
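The general shape of a linear Thompson Sampling round with a per-round safety filter can be sketched as follows. This is not the paper's algorithm: it assumes a finite action set and a known, fixed linear safety check, which is a deliberate simplification. The names `ts_choose`, `is_safe`, `theta_star`, the inflation factor `v`, and all numeric values are hypothetical.

```python
import math
import random


def cholesky(A):
    """Lower-triangular L with L L^T = A, for symmetric positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L, y):
    """Solve L z = y by forward substitution."""
    z = []
    for i in range(len(y)):
        z.append((y[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_from_lower(L, y):
    """Solve L^T z = y by back substitution."""
    n = len(y)
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(L[k][i] * z[k] for k in range(i + 1, n))) / L[i][i]
    return z


def ridge_estimate(V, b):
    """Regularized least-squares estimate theta_hat = V^{-1} b."""
    L = cholesky(V)
    return solve_upper_from_lower(L, solve_lower(L, b))


def ts_choose(actions, V, b, v, is_safe, rng):
    """Sample a parameter around theta_hat, then pick the best safe action."""
    L = cholesky(V)
    theta_hat = solve_upper_from_lower(L, solve_lower(L, b))
    eta = [rng.gauss(0.0, 1.0) for _ in b]
    noise = solve_upper_from_lower(L, eta)  # covariance V^{-1}
    theta_tilde = [t + v * z for t, z in zip(theta_hat, noise)]
    safe = [x for x in actions if is_safe(x)]
    return max(safe, key=lambda x: sum(xi * ti for xi, ti in zip(x, theta_tilde)))


def update(V, b, x, reward):
    """Rank-one update of the Gram matrix V and the response vector b."""
    for i in range(len(x)):
        b[i] += reward * x[i]
        for j in range(len(x)):
            V[i][j] += x[i] * x[j]


# Hypothetical problem instance, for illustration only.
rng = random.Random(0)
theta_star = [1.0, 0.5]                      # unknown to the learner
actions = [(1.0, 0.0), (0.0, 1.0), (0.7, 0.7), (-1.0, 0.0)]
is_safe = lambda x: x[0] + x[1] <= 1.2       # assumed known safety check

V = [[1.0, 0.0], [0.0, 1.0]]                 # ridge regularizer (identity)
b = [0.0, 0.0]
chosen = []
for _ in range(200):
    x = ts_choose(actions, V, b, v=0.5, is_safe=is_safe, rng=rng)
    reward = sum(xi * ti for xi, ti in zip(x, theta_star)) + rng.gauss(0.0, 0.1)
    update(V, b, x, reward)
    chosen.append(x)
```

Because unsafe actions are filtered before the argmax, every played action satisfies the check; the randomness of the sampled parameter is what drives exploration among the remaining actions.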
