TY - GEN
T1 - Learning an interpretable traffic signal control policy
AU - Ault, James
AU - Hanna, Josiah P.
AU - Sharon, Guni
N1 - Publisher Copyright:
© 2020 International Foundation for Autonomous Agents and Multiagent Systems.
PY - 2020/1/1
Y1 - 2020/1/1
AB - Signalized intersections are managed by controllers that assign right of way (green, yellow, and red lights) to non-conflicting directions. Optimizing the actuation policy of such controllers is expected to alleviate traffic congestion and its adverse impact. Given such a safety-critical domain, the affiliated actuation policy is required to be interpretable in a way that can be understood and regulated by a human. This paper presents and analyzes several on-line optimization techniques for tuning interpretable control functions. Although these techniques are defined in a general way, this paper assumes a specific class of interpretable control functions (polynomial functions) for analysis purposes. We show that such an interpretable policy function can be as effective as a deep neural network for approximating an optimized signal actuation policy. We present empirical evidence that supports the use of value-based reinforcement learning for on-line training of the control function. Specifically, we present and study three variants of the Deep Q-learning algorithm that allow the training of an interpretable policy function. Our Deep Regulatable Hardmax Q-learning variant is shown to be particularly effective in optimizing our interpretable actuation policy, resulting in up to a 19.4% reduction in vehicle delay compared to commonly deployed actuated signal controllers.
KW - Deep reinforcement learning
KW - Intelligent transportation
KW - Interpretable
UR - http://www.scopus.com/inward/record.url?scp=85096647945&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85096647945
T3 - Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS
SP - 88
EP - 96
BT - Proceedings of the 19th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2020
A2 - An, Bo
A2 - El Fallah Seghrouchni, Amal
A2 - Sukthankar, Gita
PB - International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
T2 - 19th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2020
Y2 - 19 May 2020
ER -