Multi-agent reinforcement learning for network routing in integrated access backhaul networks

Shahaf Yamin, Haim H. Permuter

Research output: Contribution to journal › Article › peer-review


In this study, we examine the problem of downlink wireless routing in integrated access backhaul (IAB) networks involving fiber-connected base stations, wireless base stations, and multiple users. Physical constraints prevent the use of a central controller, leaving base stations with limited access to real-time network conditions. These networks operate in a time-slotted regime, where base stations monitor network conditions and forward packets accordingly. Our objective is to maximize the arrival ratio of packets while simultaneously minimizing their latency. To this end, we formulate the problem as a multi-agent partially observed Markov decision process (POMDP). We then develop an algorithm that combines Multi-Agent Reinforcement Learning (MARL) with Advantage Actor Critic (A2C) to derive a joint routing policy in a distributed manner. Because a packet's destination is central to a successful routing decision, we use information about similar destinations as a basis for routing decisions toward a specific destination. To capture the similarity between destinations, we rely on their relational base-station associations, i.e., the base station each destination is currently connected to. Hence, we refer to the algorithm as Relational Advantage Actor Critic (Relational A2C). To the best of our knowledge, this is the first work that optimizes routing strategy for IAB networks. Further, we present three training paradigms for this algorithm to provide flexibility in terms of performance and throughput. Through numerical experiments in different network scenarios, we demonstrate that the Relational A2C algorithms achieve near-centralized performance even though they operate in a decentralized manner in the network of interest. Based on these experiments, we compare Relational A2C with other reinforcement learning algorithms, such as Q-Routing and Hybrid Routing. This comparison illustrates that solving the joint optimization problem increases network efficiency and reduces selfish agent behavior.
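The abstract describes Relational A2C only at a high level. As a hedged illustration of the two ideas it names (an advantage actor-critic update at each base station, and routing decisions keyed on the base station a destination is associated with rather than on the destination itself), the following is a minimal tabular Python sketch. It is not the authors' implementation: the names `relational_key` and `RelationalA2CAgent`, the softmax-over-neighbors policy, the per-hop reward of -1 used as a latency penalty, the neighbor-reported `next_value`, and all hyperparameters are assumptions made for illustration.

```python
import math
import random


def relational_key(node, dest_user, association):
    """Hypothetical state key: the current node plus the base station that the
    destination user is associated with (not the user's identity itself)."""
    return (node, association[dest_user])


class RelationalA2CAgent:
    """Hypothetical tabular one-step advantage actor-critic for one base station."""

    def __init__(self, neighbors, alpha=0.1, beta=0.1, gamma=0.95):
        self.neighbors = list(neighbors)  # candidate next hops
        self.alpha = alpha                # actor step size (assumed)
        self.beta = beta                  # critic step size (assumed)
        self.gamma = gamma                # discount factor (assumed)
        self.theta = {}                   # actor logits per state key
        self.value = {}                   # critic estimate V(key)

    def _logits(self, key):
        return self.theta.setdefault(key, [0.0] * len(self.neighbors))

    def policy(self, key):
        """Softmax distribution over next hops for this state key."""
        logits = self._logits(key)
        m = max(logits)
        exps = [math.exp(x - m) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]

    def act(self, key, rng):
        """Sample the index of a next hop from the current policy."""
        return rng.choices(range(len(self.neighbors)), weights=self.policy(key))[0]

    def update(self, key, action, reward, next_value, done):
        """One A2C step. next_value is the value estimate reported by the
        chosen neighbor (an assumption for the decentralized setting)."""
        v = self.value.get(key, 0.0)
        target = reward if done else reward + self.gamma * next_value
        advantage = target - v                        # one-step TD advantage
        self.value[key] = v + self.beta * advantage   # critic update
        probs = self.policy(key)
        logits = self._logits(key)
        for b in range(len(logits)):
            # gradient of log-softmax w.r.t. logit b: 1[b == action] - pi(b)
            grad = (1.0 if b == action else 0.0) - probs[b]
            logits[b] += self.alpha * advantage * grad  # actor update
        return advantage


# Toy usage: node A can forward to B or C; users u1 and u2 are both attached to C.
association = {"u1": "C", "u2": "C"}
agent = RelationalA2CAgent(neighbors=["B", "C"])
rng = random.Random(0)
for _ in range(2000):
    key = relational_key("A", "u1", association)
    a = agent.act(key, rng)
    if agent.neighbors[a] == "C":
        # forwarding to the destination's base station delivers after one hop
        agent.update(key, a, reward=-1.0, next_value=0.0, done=True)
    else:
        # detour via B, which (by assumption) reports a value of -2
        agent.update(key, a, reward=-1.0, next_value=-2.0, done=False)

# Training only used u1, but u2 shares the same relational key.
p_u2 = agent.policy(relational_key("A", "u2", association))
print(round(p_u2[1], 3))
```

Because `u1` and `u2` are attached to the same base station, they map to the same state key, so the policy learned while routing toward `u1` is reused for `u2`. The paper's actual state representation, reward design, and function approximators may differ from this tabular toy.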

Original language: English
Article number: 103347
Journal: Ad Hoc Networks
State: Published - 1 Feb 2024

Keywords
  • Integrated access backhaul
  • Multi-agent reinforcement learning
  • Network routing

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
  • Computer Networks and Communications

