Source author record

Riad Ahmed

Riad Ahmed appears in the imported research catalog. Authorship, coauthor, and topic links are available while the profile remains unclaimed.

Researcher · Unclaimed source record

Catalog footprint

What is connected

1 work
1 topic
0 close collaborators

Published work

1 published item

Preprint · 2026 · arXiv

Robust Multi-Agent Path Finding under Observation Attacks: A Principled Adversarial-Plus-Smoothing Training Recipe

Decentralized multi-agent path finding (MAPF) routes a team of agents on a shared grid, each acting from its own local view. The standard approach trains one shared neural policy with Proximal Policy Optimization (PPO), a popular on-policy reinforcement learning algorithm. Such a policy works well on clean observations, but a small input perturbation on one agent often changes its action; that agent then blocks a neighbour, and the team jams. In this paper we present two training recipes that keep the same network and the same deployment loop yet make the policy hold up under perturbed observations. The first recipe, Adv-PPO, trains the shared policy against worst-case perturbations of its own input and selects the checkpoint by its performance under adversarial perturbation. The second recipe, Adv-PPO+MACER, fine-tunes that checkpoint with a small on-policy smoothness term whose gradient follows the certified radius of randomized smoothing. On POGEMA with 8×8 maps and four agents, the unprotected PPO policy reaches 95.8% clean success but only 2.5% under the strongest attack. Adv-PPO recovers worst-case success to 59.2% at a cost of one percentage point of clean success. Adv-PPO+MACER recovers it to 77.5% ± 6.0% across three independent seeds, at a cost of less than one percentage point of clean success. We support these numbers with per-attack curves, a certified action-stability sanity check (which measures the smoothed-policy wrapper, not the deployed argmax policy), and side-by-side rollout storyboards that show the failure mode and the fix inside one environment instance.
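The abstract says Adv-PPO trains against worst-case perturbations of each agent's own input but does not state which attack produces them. As a hedged illustration only, the sketch below uses projected gradient ascent under an L∞ budget (the usual PGD attack) against a hypothetical toy linear policy head. The functions `pgd_attack`, `logits`, and `grad_ce_wrt_obs`, the parameters `eps`, `alpha`, and `steps`, and the [0, 1] observation range are all assumptions, not the paper's code or settings. The attack pushes the observation away from the action the policy picks on the clean input, which is the per-agent action flip that the abstract links to team jams.

```python
import math

def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def argmax(v):
    return max(range(len(v)), key=v.__getitem__)

def sign(v):
    return (v > 0) - (v < 0)

def logits(W, b, x):
    # Hypothetical toy policy head: one row of W (and one bias) per discrete action.
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def grad_ce_wrt_obs(W, b, x, action):
    # Gradient of cross-entropy(softmax(Wx + b), action) w.r.t. x is W^T (p - onehot(action)).
    p = softmax(logits(W, b, x))
    delta = [pi - (1.0 if i == action else 0.0) for i, pi in enumerate(p)]
    return [sum(W[i][j] * delta[i] for i in range(len(W))) for j in range(len(x))]

def pgd_attack(W, b, x, eps=0.1, alpha=0.03, steps=10):
    """Untargeted L-infinity PGD: push x away from the action chosen on the clean input."""
    clean_action = argmax(logits(W, b, x))
    x_adv = list(x)
    for _ in range(steps):
        g = grad_ce_wrt_obs(W, b, x_adv, clean_action)
        # Signed ascent step on the loss of the clean action...
        x_adv = [xa + alpha * sign(gj) for xa, gj in zip(x_adv, g)]
        # ...then project into the eps-ball around x and the assumed valid range [0, 1].
        x_adv = [min(max(xa, xj - eps), xj + eps) for xa, xj in zip(x_adv, x)]
        x_adv = [min(max(xa, 0.0), 1.0) for xa in x_adv]
    return x_adv

# Example: two actions, three observation features.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b = [0.0, 0.0]
x = [0.55, 0.5, 0.0]
x_adv = pgd_attack(W, b, x)
print(argmax(logits(W, b, x)), argmax(logits(W, b, x_adv)))  # prints: 0 1
```

In an Adv-PPO-style loop, one would feed perturbed observations like `x_adv` to the shared policy in place of clean ones during training; how the paper mixes clean and perturbed samples, and which norm and budget it uses, is not specified in the abstract.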
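The certified action-stability check concerns a smoothed-policy wrapper: it returns the action that the base argmax policy picks most often when Gaussian noise is added to the observation. Below is a minimal sketch of the standard randomized-smoothing certificate (Cohen et al., 2019), assuming an L2 threat model and noise level `sigma`. The names `certify`, `sample_counts`, and `threshold_policy`, the sample sizes, and `alpha` are hypothetical. The sketch also swaps in a Hoeffding lower confidence bound for the Clopper-Pearson bound used in that work, so it needs only the Python standard library. MACER's training term is built on a soft, differentiable version of this same radius; the sketch only evaluates it and does not train anything.

```python
import math
import random
from statistics import NormalDist

def sample_counts(policy, x, sigma, n, rng):
    """Count how often the base (argmax) policy picks each action under Gaussian noise."""
    counts = {}
    for _ in range(n):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        action = policy(noisy)
        counts[action] = counts.get(action, 0) + 1
    return counts

def certify(policy, x, sigma, n0=100, n=10_000, alpha=0.001, seed=0):
    """Return (smoothed action, certified L2 radius), or (action, None) to abstain."""
    rng = random.Random(seed)
    # Stage 1: choose the candidate top action from a small selection sample.
    counts0 = sample_counts(policy, x, sigma, n0, rng)
    top = max(counts0, key=counts0.get)
    # Stage 2: estimate its probability on fresh samples, then take a one-sided
    # Hoeffding lower bound that holds with probability at least 1 - alpha.
    counts = sample_counts(policy, x, sigma, n, rng)
    p_hat = counts.get(top, 0) / n
    p_lower = p_hat - math.sqrt(math.log(1.0 / alpha) / (2.0 * n))
    if p_lower <= 0.5:
        return top, None  # cannot certify: abstain
    # With p_B bounded by 1 - p_A, sigma/2 * (Phi^-1(p_A) - Phi^-1(p_B)) = sigma * Phi^-1(p_A).
    return top, sigma * NormalDist().inv_cdf(p_lower)

# Toy base policy (hypothetical): action 0 if the first feature is positive, else action 1.
def threshold_policy(obs):
    return 0 if obs[0] > 0 else 1

action, radius = certify(threshold_policy, [1.0, 0.0], sigma=0.5)
print(action, radius)
```

For this toy threshold policy, the exact distance from `[1.0, 0.0]` to the decision boundary is 1.0, so the returned radius should come out somewhat below 1.0 because of the confidence bound. As the abstract notes, such a certificate describes the smoothed wrapper, not the deployed argmax policy.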