TY - GEN
T1 - Team-Imitate-Synchronize for Solving Dec-POMDPs
AU - Abdoo, Eliran
AU - Brafman, Ronen I.
AU - Shani, Guy
AU - Soffair, Nitsan
N1 - Publisher Copyright: © 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2023/1/1
Y1 - 2023/1/1
N2 - Multi-agent collaboration under partial observability is a difficult task. Multi-agent reinforcement learning (MARL) algorithms that do not leverage a model of the environment struggle with tasks that require sequences of collaborative actions, while Dec-POMDP algorithms that use such models to compute near-optimal policies scale poorly. In this paper, we suggest the Team-Imitate-Synchronize (TIS) approach, a heuristic, model-based method for solving such problems. Our approach begins by solving the joint team problem, assuming that observations are shared. Then, for each agent, we solve a single-agent problem designed to imitate its behavior within the team plan. Finally, we adjust the single-agent policies for better synchronization. Our experiments demonstrate that our method provides solutions comparable to those of Dec-POMDP solvers on small problems, while scaling to much larger problems, and provides collaborative plans that MARL algorithms are unable to identify.
AB - Multi-agent collaboration under partial observability is a difficult task. Multi-agent reinforcement learning (MARL) algorithms that do not leverage a model of the environment struggle with tasks that require sequences of collaborative actions, while Dec-POMDP algorithms that use such models to compute near-optimal policies scale poorly. In this paper, we suggest the Team-Imitate-Synchronize (TIS) approach, a heuristic, model-based method for solving such problems. Our approach begins by solving the joint team problem, assuming that observations are shared. Then, for each agent, we solve a single-agent problem designed to imitate its behavior within the team plan. Finally, we adjust the single-agent policies for better synchronization. Our experiments demonstrate that our method provides solutions comparable to those of Dec-POMDP solvers on small problems, while scaling to much larger problems, and provides collaborative plans that MARL algorithms are unable to identify.
UR - http://www.scopus.com/inward/record.url?scp=85151051584&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-26412-2_14
DO - 10.1007/978-3-031-26412-2_14
M3 - Conference contribution
AN - SCOPUS:85151051584
SN - 9783031264115
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 216
EP - 232
BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2022, Proceedings
A2 - Amini, Massih-Reza
A2 - Canu, Stéphane
A2 - Fischer, Asja
A2 - Guns, Tias
A2 - Kralj Novak, Petra
A2 - Tsoumakas, Grigorios
PB - Springer Science and Business Media Deutschland GmbH
T2 - 22nd Joint European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2022
Y2 - 19 September 2022 through 23 September 2022
ER -