Researcher profile

Eslam Eldeeb

Eslam Eldeeb contributes to research discovery and scholarly infrastructure.

ResearcherAffiliation not importedOpen to collaborate

Trust snapshot

Quick read

Trust 15 - UnverifiedVerification L1Unclaimed author
3works
0followers
5topics
4close collaborators

Actions

Decide how to stay connected

Follow researcher0

Identity and collaboration

How to connect with this researcher

Claiming links this public author record to a researcher profile and unlocks direct collaboration workflows.

Log in to claim

Direct collaboration

Open a focused conversation when the fit is right

Claim this author entity first to unlock direct invitations.

Research graph

See the researcher in context

Open full explorer

Inspect adjacent work, topics, institutions and collaborators without jumping out to a separate graph page.

Building this graph slice

BZPEER is loading the nearby papers, people, topics and institutions for this page.

Published work

3 published item(s)

preprint2026arXiv

Offline Multi-Agent Reinforcement Learning for 6G Communications: Fundamentals, Applications and Future Directions

The next-generation wireless technologies, including beyond 5G and 6G networks, are paving the way for transformative applications such as vehicle platooning, smart cities, and remote surgery. These innovations are driven by a vast array of interconnected wireless entities, including IoT devices, access points, UAVs, and CAVs, which increase network complexity and demand more advanced decision-making algorithms. Artificial intelligence (AI) and machine learning (ML), especially reinforcement learning (RL), are key enablers for such networks, providing solutions to high-dimensional and complex challenges. However, as networks expand to multi-agent environments, traditional online RL approaches face cost, safety, and scalability limitations. Offline multi-agent reinforcement learning (MARL) offers a promising solution by utilizing pre-collected data, reducing the need for real-time interaction. This article introduces a novel offline MARL algorithm based on conservative Q-learning (CQL), ensuring safe and efficient training. We extend this with meta-learning to address dynamic environments and validate the approach through use cases in radio resource management and UAV networks. Our work highlights offline MARL's advantages, limitations, and future directions in wireless applications.

preprint2023arXiv

Multi-UAV Path Learning for Age and Power Optimization in IoT with UAV Battery Recharge

In many emerging Internet of Things (IoT) applications, the freshness of the is an important design criterion. Age of Information (AoI) quantifies the freshness of the received information or status update. This work considers a setup of deployed IoT devices in an IoT network; multiple unmanned aerial vehicles (UAVs) serve as mobile relay nodes between the sensors and the base station. We formulate an optimization problem to jointly plan the UAVs' trajectory, while minimizing the AoI of the received messages and the devices' energy consumption. The solution accounts for the UAVs' battery lifetime and flight time to recharging depots to ensure the UAVs' green operation. The complex optimization problem is efficiently solved using a deep reinforcement learning algorithm. In particular, we propose a deep Q-network, which works as a function approximation to estimate the state-action value function. The proposed scheme is quick to converge and results in a lower ergodic age and ergodic energy consumption when compared with benchmark algorithms such as greedy algorithm (GA), nearest neighbour (NN), and random-walk (RW).

preprint2022arXiv

Traffic Prediction and Fast Uplink for Hidden Markov IoT Models

In this work, we present a novel traffic prediction and fast uplink framework for IoT networks controlled by binary Markovian events. First, we apply the forward algorithm with hidden Markov models (HMM) in order to schedule the available resources to the devices with maximum likelihood activation probabilities via fast uplink grant. In addition, we evaluate the regret metric as the number of wasted transmission slots to evaluate the performance of the prediction. Next, we formulate a fairness optimization problem to minimize the age of information while keeping the regret as minimum as possible. Finally, we propose an iterative algorithm to estimate the model hyperparameters (activation probabilities) in a real-time application and apply an online-learning version of the proposed traffic prediction scheme. Simulation results show that the proposed algorithms outperform baseline models such as time division multiple access (TDMA) and grant-free (GF) random-access in terms of regret, the efficiency of system usage, and age of information.