Source author record

Eslam Eldeeb

Eslam Eldeeb appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

Researcher · Unclaimed source record

Networking and Internet Architecture, eess.SP, Information Theory, math.IT, Multiagent Systems

Catalog footprint

What is connected

3 works

5 topics

4 close collaborators

preprint · 2026 · arXiv

Offline Multi-Agent Reinforcement Learning for 6G Communications: Fundamentals, Applications and Future Directions

The next-generation wireless technologies, including beyond 5G and 6G networks, are paving the way for transformative applications such as vehicle platooning, smart cities, and remote surgery. These innovations are driven by a vast array of interconnected wireless entities, including IoT devices, access points, UAVs, and CAVs, which increase network complexity and demand more advanced decision-making algorithms. Artificial intelligence (AI) and machine learning (ML), especially reinforcement learning (RL), are key enablers for such networks, providing solutions to high-dimensional and complex challenges. However, as networks expand to multi-agent environments, traditional online RL approaches face cost, safety, and scalability limitations. Offline multi-agent reinforcement learning (MARL) offers a promising solution by utilizing pre-collected data, reducing the need for real-time interaction. This article introduces a novel offline MARL algorithm based on conservative Q-learning (CQL), ensuring safe and efficient training. We extend this with meta-learning to address dynamic environments and validate the approach through use cases in radio resource management and UAV networks. Our work highlights offline MARL's advantages, limitations, and future directions in wireless applications.
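The conservative Q-learning idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard single-agent CQL regularizer (a log-sum-exp over all action values minus the value of the action stored in the dataset) applied to one logged transition; the function names, the `alpha` weight, and the tabular list of Q-values are illustrative assumptions, not the multi-agent algorithm proposed in the article.

```python
import math


def cql_penalty(q_values, dataset_action):
    """Conservative term: logsumexp over all actions minus Q of the logged action.

    The term is non-negative and grows when actions absent from the dataset
    receive high values, which discourages overestimation on unseen actions.
    """
    peak = max(q_values)  # subtract the max for numerical stability
    log_sum_exp = peak + math.log(sum(math.exp(q - peak) for q in q_values))
    return log_sum_exp - q_values[dataset_action]


def cql_loss(q_values, dataset_action, td_target, alpha=1.0):
    """Squared TD error on the logged action plus the weighted conservative term.

    alpha is a hypothetical trade-off weight chosen for this illustration.
    """
    td_error = q_values[dataset_action] - td_target
    return 0.5 * td_error ** 2 + alpha * cql_penalty(q_values, dataset_action)
```

In an offline setting the loss would be averaged over a batch of pre-collected transitions, so no interaction with the live network is needed during training.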

preprint · 2023 · arXiv

Multi-UAV Path Learning for Age and Power Optimization in IoT with UAV Battery Recharge

In many emerging Internet of Things (IoT) applications, the freshness of the information is an important design criterion. Age of Information (AoI) quantifies the freshness of the received information or status update. This work considers a setup of deployed IoT devices in an IoT network; multiple unmanned aerial vehicles (UAVs) serve as mobile relay nodes between the sensors and the base station. We formulate an optimization problem to jointly plan the UAVs' trajectories while minimizing the AoI of the received messages and the devices' energy consumption. The solution accounts for the UAVs' battery lifetime and flight time to recharging depots to ensure the UAVs' green operation. The complex optimization problem is efficiently solved using a deep reinforcement learning algorithm. In particular, we propose a deep Q-network, which works as a function approximator to estimate the state-action value function. The proposed scheme is quick to converge and results in a lower ergodic age and ergodic energy consumption when compared with benchmark algorithms such as the greedy algorithm (GA), nearest neighbour (NN), and random-walk (RW).
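As a rough illustration of the Age of Information metric described in this abstract, the sketch below assumes a common discrete-time model: the age grows by one in every slot without a fresh update and resets to one when an update is received. The slot model, the function names, and the reset value are assumptions made for the example; the abstract does not specify the paper's exact formulation.

```python
def age_trajectory(delivered, initial_age=1):
    """Return the age after each slot.

    delivered: iterable of booleans, True when a fresh update arrives in the slot.
    """
    age = initial_age
    ages = []
    for got_update in delivered:
        # Reset on a fresh update, otherwise the information gets one slot older.
        age = 1 if got_update else age + 1
        ages.append(age)
    return ages


def average_age(ages):
    """Time-averaged age over the observed slots."""
    return sum(ages) / len(ages)


# One device that is served only in the third of four slots.
ages = age_trajectory([False, False, True, False])
print(ages, average_age(ages))  # [2, 3, 1, 2] 2.0
```

A trajectory planner that visits a device more often keeps this average lower, which is the quantity the optimization trades off against energy consumption.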

preprint · 2022 · arXiv

Traffic Prediction and Fast Uplink for Hidden Markov IoT Models

In this work, we present a novel traffic prediction and fast uplink framework for IoT networks controlled by binary Markovian events. First, we apply the forward algorithm with hidden Markov models (HMM) in order to schedule the available resources to the devices with maximum likelihood activation probabilities via fast uplink grant. In addition, we use regret, measured as the number of wasted transmission slots, to evaluate the performance of the prediction. Next, we formulate a fairness optimization problem to minimize the age of information while keeping the regret as low as possible. Finally, we propose an iterative algorithm to estimate the model hyperparameters (activation probabilities) in a real-time application and apply an online-learning version of the proposed traffic prediction scheme. Simulation results show that the proposed algorithms outperform baseline models such as time division multiple access (TDMA) and grant-free (GF) random-access in terms of regret, the efficiency of system usage, and age of information.
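The forward-algorithm step described in this abstract can be sketched for a two-state (idle/active) hidden Markov model. The matrices, the convention that `prior` is the belief before the first slot, and the helper names below are illustrative assumptions, not the paper's parameters or scheduler.

```python
def forward_filter(observations, transition, emission, prior):
    """Normalized forward algorithm: P(state_t | obs_1..t) for every slot t.

    transition[i][j] = P(next state j | current state i)
    emission[j][o]   = P(observation o | state j)
    prior            = belief over states before the first slot (assumed convention)
    """
    n_states = len(prior)
    belief = list(prior)
    history = []
    for obs in observations:
        # Prediction step: push the belief through the Markov transition.
        predicted = [
            sum(belief[i] * transition[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
        # Update step: weight by the observation likelihood, then normalize.
        weighted = [predicted[j] * emission[j][obs] for j in range(n_states)]
        total = sum(weighted)
        belief = [w / total for w in weighted]
        history.append(belief)
    return history


def next_activation_probability(belief, transition, active_state=1):
    """Probability that the device is active in the next slot."""
    return sum(belief[i] * transition[i][active_state] for i in range(len(belief)))


def pick_device(beliefs, transition):
    """Grant the next uplink slot to the device most likely to be active."""
    scores = [next_activation_probability(b, transition) for b in beliefs]
    return max(range(len(scores)), key=scores.__getitem__)
```

Under this sketch, a granted slot is wasted whenever the chosen device turns out to be idle, which is the event the regret metric counts.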