IEEE/CAA Journal of Automatica Sinica
Citation: Y. Li, Y. Zhang, X. Li, and C. Sun, “Regional multi-agent cooperative reinforcement learning for city-level traffic grid signal control,” IEEE/CAA J. Autom. Sinica, vol. 11, no. 9, pp. 1987–1998, Sept. 2024. doi: 10.1109/JAS.2024.124365
This article studies the problem of effectively controlling the traffic signals of multiple intersections in a city-level traffic system. A novel regional multi-agent cooperative reinforcement learning algorithm, RegionSTLight, is proposed to improve traffic efficiency. First, a regional multi-agent Q-learning framework is proposed that equivalently decomposes the global Q value of the traffic system into the local values of several regions. Based on this framework and the idea of human-machine cooperation, a dynamic zoning method is designed to divide the traffic network into several strongly coupled regions according to real-time traffic flow densities. To achieve better cooperation within each region, a lightweight spatio-temporal fusion feature extraction network is designed. Experiments in synthetic, real-world, and city-level scenarios show that the proposed RegionSTLight converges more quickly, is more stable, and achieves better asymptotic performance than state-of-the-art models.
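To make the regional value-decomposition idea concrete, below is a minimal PyTorch sketch of one way a global Q value can be assembled from per-region local values, in the spirit of the framework the abstract describes. This is an illustrative assumption, not the paper's implementation: the additive aggregation, the network shapes, and all names here (RegionQNet, global_q, obs_dim, n_actions) are hypothetical.

```python
# Minimal sketch (assumption, not the paper's method): additive regional
# decomposition of a global Q value, where each strongly coupled region
# contributes a local Q term for its chosen joint action.
import torch
import torch.nn as nn


class RegionQNet(nn.Module):
    """Local Q network for one region of strongly coupled intersections."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Returns (batch, n_actions): a local Q value per regional action.
        return self.net(obs)


def global_q(region_nets, region_obs, region_actions):
    """Assumed additive decomposition: Q_global = sum of chosen local Qs."""
    q_total = 0.0
    for net, obs, act in zip(region_nets, region_obs, region_actions):
        q_local = net(obs)                          # (batch, n_actions)
        q_total = q_total + q_local.gather(1, act)  # Q of the taken action
    return q_total                                  # (batch, 1)


# Usage: three regions, each with its own observation and discrete action.
nets = [RegionQNet(obs_dim=8, n_actions=4) for _ in range(3)]
obs = [torch.randn(2, 8) for _ in range(3)]
acts = [torch.randint(0, 4, (2, 1)) for _ in range(3)]
print(global_q(nets, obs, acts).shape)  # torch.Size([2, 1])
```

An additive form is only one possible aggregation; the paper's dynamic zoning additionally redraws the region boundaries at run time from traffic flow densities, which this fixed three-region sketch does not model.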