Source author record

Gal Dalal

Gal Dalal appears in the imported research catalog. Authorship, coauthor and topic links are available while profile ownership is still unclaimed.

ResearcherUnclaimed source record

Artificial Intelligence Machine Learning Applications Networking and Internet Architecture Systems and Control

Catalog footprint

What is connected

6works

5topics

4close collaborators

Actions

Connect this record

Open graph Browse works

Inspect adjacent papers, topics, institutions and collaborators without losing the researcher page.

Building this map preview

BZPEER is loading the nearby papers, people, topics and institutions for this page.

preprint2023arXiv

Planning and Learning with Adaptive Lookahead

Some of the most powerful reinforcement learning frameworks use planning for action selection. Interestingly, their planning horizon is either fixed or determined arbitrarily by the state visitation history. Here, we expand beyond the naive fixed horizon and propose a theoretically justified strategy for adaptive selection of the planning horizon as a function of the state-dependent value estimate. We propose two variants for lookahead selection and analyze the trade-off between iteration count and computational complexity per iteration. We then devise a corresponding deep Q-network algorithm with an adaptive tree search horizon. We separate the value estimation per depth to compensate for the off-policy discrepancy between depths. Lastly, we demonstrate the efficacy of our adaptive lookahead method in a maze environment and Atari.

preprint2022arXiv

Reinforcement Learning for Datacenter Congestion Control

We approach the task of network congestion control in datacenters using Reinforcement Learning (RL). Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until today, no such learning-based algorithms have shown practical potential in this domain. Evidently, the most popular recent deployments rely on rule-based heuristics that are tested on a predetermined set of benchmarks. Consequently, these heuristics do not generalize well to newly-seen scenarios. Contrarily, we devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We overcome challenges such as partial-observability, non-stationarity, and multi-objectiveness. We further propose a policy gradient algorithm that leverages the analytical structure of the reward function to approximate its derivative and improve stability. We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training. Our experiments, conducted on a realistic simulator that emulates communication networks' behavior, exhibit improved performance concurrently on the multiple considered metrics compared to the popular algorithms deployed today in real datacenters. Our algorithm is being productized to replace heuristics in some of the largest datacenters in the world.

preprint2016arXiv

Distributed Scenario-Based Optimization for Asset Management in a Hierarchical Decision Making Environment

Asset management attempts to keep the power system in working conditions. It requires much coordination between multiple entities and long term planning often months in advance. In this work we introduce a mid-term asset management formulation as a stochastic optimization problem, that includes three hierarchical layers of decision making, namely the mid-term, short-term and real-time. We devise a tractable scenario approximation technique for efficiently assessing the complex implications a maintenance schedule inflicts on a power system. This is done using efficient Monte-Carlo simulations that trade-off between accuracy and tractability. We then present our implementation of a distributed scenario-based optimization algorithm for solving our formulation, and use an updated PJM 5-bus system to show a solution that is cheaper than other maintenance heuristics that are likely to be considered by TSOs.

preprint2016arXiv

Hierarchical Decision Making In Electricity Grid Management

The power grid is a complex and vital system that necessitates careful reliability management. Managing the grid is a difficult problem with multiple time scales of decision making and stochastic behavior due to renewable energy generations, variable demand and unplanned outages. Solving this problem in the face of uncertainty requires a new methodology with tractable algorithms. In this work, we introduce a new model for hierarchical decision making in complex systems. We apply reinforcement learning (RL) methods to learn a proxy, i.e., a level of abstraction, for real-time power grid reliability. We devise an algorithm that alternates between slow time-scale policy improvement, and fast time-scale value function approximation. We compare our results to prevailing heuristics, and show the strength of our method.

preprint2016arXiv

Supervised Learning for Optimal Power Flow as a Real-Time Proxy

In this work we design and compare different supervised learning algorithms to compute the cost of Alternating Current Optimal Power Flow (ACOPF). The motivation for quick calculation of OPF cost outcomes stems from the growing need of algorithmic-based long-term and medium-term planning methodologies in power networks. Integrated in a multiple time-horizon coordination framework, we refer to this approximation module as a proxy for predicting short-term decision outcomes without the need of actual simulation and optimization of them. Our method enables fast approximate calculation of OPF cost with less than 1% error on average, achieved in run-times that are several orders of magnitude lower than of exact computation. Several test-cases such as IEEE-RTS96 are used to demonstrate the efficiency of our approach.

preprint2015arXiv

Reinforcement Learning for the Unit Commitment Problem

In this work we solve the day-ahead unit commitment (UC) problem, by formulating it as a Markov decision process (MDP) and finding a low-cost policy for generation scheduling. We present two reinforcement learning algorithms, and devise a third one. We compare our results to previous work that uses simulated annealing (SA), and show a 27% improvement in operation costs, with running time of 2.5 minutes (compared to 2.5 hours of existing state-of-the-art).

Gal Dalal

What is connected

Connect this record

See the researcher in context

Building this map preview

6 published item(s)

Planning and Learning with Adaptive Lookahead

Reinforcement Learning for Datacenter Congestion Control

Distributed Scenario-Based Optimization for Asset Management in a Hierarchical Decision Making Environment

Hierarchical Decision Making In Electricity Grid Management

Supervised Learning for Optimal Power Flow as a Real-Time Proxy

Reinforcement Learning for the Unit Commitment Problem