Citation: | R. Wang, Y. Zhu, Z. Zhu, L. Cui, Z. Wan, A. Zhu, Y. Ding, S. Qian, C. Gao, and N. Sang, “LTDNet: A lightweight text detector for real-time arbitrary-shape traffic text detection,” IEEE/CAA J. Autom. Sinica, 2025. doi: 10.1109/JAS.2024.125022 |
[1] |
J. Zhang, Y. Lv, J. Tao, F. Huang, and J. Zhang, “A robust real-time anchor-free traffic sign detector with one-level feature,” IEEE Trans. Emerging Topics in Computational Intelligence, vol. 8, no. 2, pp. 1437–1451, 2024. doi: 10.1109/TETCI.2024.3349464
[2] |
J. Zhang, X. Zou, L.-D. Kuang, J. Wang, R. S. Sherratt, and X. Yu, “Cctsdb 2021: a more comprehensive traffic sign detection benchmark,” Human-centric Computing and Information Sciences, vol. 12, pp. 1–18, 2022.
[3] |
J. Wang, Y. Chen, X. Ji, Z. Dong, M. Gao, and C. S. Lai, “Vehicle-mounted adaptive traffic sign detector for small-sized signs in multiple working conditions,” IEEE Trans. Intelligent Transportation Systems, vol. 25, no. 1, pp. 710–724, 2024. doi: 10.1109/TITS.2023.3309644
[4] |
R. Wang, J. Hei, M. Liu, Z. Wan, J. Xu, X. Cao, X. He, Y. Ding, C. Gao, and N. Sang, “Cr2-net: Component relationship reasoning network for traffic text detection,” IEEE Trans. Intelligent Vehicles, pp. 1–15, 2024.
[5] |
R. Bagi, T. Dutta, N. Nigam, D. Verma, and H. P. Gupta, “Met-mlts: Leveraging smartphones for end-to-end spotting of multilingual oriented scene texts and traffic signs in adverse meteorological conditions,” IEEE Trans. Intelligent Transportation Systems, vol. 23, p. 8, 2022.
[6] |
D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu et al., “Icdar 2015 competition on robust reading,” in 13th Int. Conf. on Document Analysis and Recognition (ICDAR), 2015, pp. 1156–1160.
[7] |
W. Wang, E. Xie, X. Li, W. Hou, T. Lu, G. Yu, and S. Shao, “Shape robust text detection with progressive scale expansion network,” in 2019 IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 9328–9337.
[8] |
M. Liao, B. Shi, and X. Bai, “Textboxes++: A single-shot oriented scene text detector,” IEEE Trans. Image Processing, vol. 27, no. 8, pp. 3676–3690, 2018. doi: 10.1109/TIP.2018.2825107
[9] |
P. He, W. Huang, T. He, Q. Zhu, Y. Qiao, and X. Li, “Single shot text detector with regional attention,” in Proc. the IEEE Int. Conf. on Computer Vision, 2017, pp. 3047–3055.
[10] |
M. Liao, Z. Zhu, B. Shi, G.-s. Xia, and X. Bai, “Rotation-sensitive regression for oriented scene text detection,” in Proc. the IEEE Conf. on Computer Vision and Pattern Recognition, 2018, pp. 5909–5918.
[11] |
S. Long, J. Ruan, W. Zhang, X. He, W. Wu, and C. Yao, “Textsnake: A flexible representation for detecting text of arbitrary shapes,” in Proc. the European Conf. on Computer Vision (ECCV), 2018, pp. 20–36.
[12] |
X. Zhou, C. Yao, H. Wen, Y. Wang, S. Zhou, W. He, and J. Liang, “East: an efficient and accurate scene text detector,” in Proc. the IEEE Conf. on Computer Vision and Pattern Recognition, 2017, pp. 5551–5560.
[13] |
Y. Wang, H. Xie, Z.-J. Zha, M. Xing, Z. Fu, and Y. Zhang, “Contournet: Taking a further step toward accurate arbitrary-shaped scene text detection,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2020, pp. 11753–11762.
[14] |
S. Liu, Y. Xian, H. Li, and Z. Yu, “Text detection in natural scene images using morphological component analysis and laplacian dictionary,” IEEE/CAA Journal of Automatica Sinica, vol. 7, no. 1, pp. 214–222, 2020. doi: 10.1109/JAS.2017.7510427
[15] |
W. Wang, E. Xie, X. Song, Y. Zang, W. Wang, T. Lu, G. Yu, and C. Shen, “Efficient and accurate arbitrary-shaped text detection with pixel aggregation network,” in Proc. the IEEE/CVF Int. Conf. on Computer Vision, 2019, pp. 8440–8449.
[16] |
M. Liao, Z. Wan, C. Yao, K. Chen, and X. Bai, “Real-time scene text detection with differentiable binarization,” in Proc. the AAAI Conf. on Artificial Intelligence, vol. 34, no. 07, 2020, pp. 11474–11481.
[17] |
W. Wang, E. Xie, X. Li, X. Liu, D. Liang, Z. Yang, T. Lu, and C. Shen, “Pan++: Towards efficient and accurate end-to-end spotting of arbitrarily-shaped text,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 44, no. 9, pp. 5349–5367, 2022.
[18] |
M. Liao, Z. Zou, Z. Wan, C. Yao, and X. Bai, “Real-time scene text detection with differentiable binarization and adaptive scale fusion,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 45, no. 1, pp. 919–931, 2023. doi: 10.1109/TPAMI.2022.3155612
[19] |
N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE Computer Society Conf. on Computer Vision and Pattern Recognition, vol. 1, 2005, pp. 886–893.
[20] |
D. G. Lowe, “Object recognition from local scale-invariant features,” in Proc. the 7th IEEE Int. Conf. on Computer Vision, vol. 2, 1999, pp. 1150–1157.
[21] |
J. Zhang, W. Feng, T. Yuan, J. Wang, and A. K. Sangaiah, “Scstcf: spatial-channel selection and temporal regularized correlation filters for visual tracking,” Applied Soft Computing, vol. 118, p. 108485, 2022. doi: 10.1016/j.asoc.2022.108485
[22] |
S. Kan, Y. Cen, Z. He, Z. Zhang, L. Zhang, and Y. Wang, “Supervised deep feature embedding with handcrafted feature,” IEEE Trans. Image Processing, vol. 28, no. 12, pp. 5809–5823, 2019. doi: 10.1109/TIP.2019.2901407
[23] |
M. Cao, C. Zhang, D. Yang, and Y. Zou, “All you need is a second look: Towards arbitrary-shaped text detection,” IEEE Trans. Circuits and Systems for Video Technology, vol. 32, no. 2, pp. 758–767, 2022. doi: 10.1109/TCSVT.2021.3068133
[24] |
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. the IEEE Conf. on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[25] |
Q. Wan, H. Ji, and L. Shen, “Self-attention based text knowledge mining for text detection,” in IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 5979–5988.
[26] |
J. Greenhalgh and M. Mirmehdi, “Recognizing text-based traffic signs,” IEEE Trans. Intelligent Transportation Systems, vol. 16, no. 3, pp. 1360–1369, 2015. doi: 10.1109/TITS.2014.2363167
[27] |
X. Rong, C. Yi, and Y. Tian, “Recognizing text-based traffic guide panels with cascaded localization network,” in Computer Vision–ECCV 2016 Workshops: Amsterdam, The Netherlands, October 8-10 and 15-16, 2016, Proceedings, Part I 14, 2016, pp. 109–121.
[28] |
Y. Zhu, M. Liao, M. Yang, and W. Liu, “Cascaded segmentation-detection networks for text-based traffic sign detection,” IEEE Trans. Intelligent Transportation Systems, vol. 19, no. 1, pp. 209–219, 2018. doi: 10.1109/TITS.2017.2768827
[29] |
Z. Zuo and P. Yang, “A traffic sign text detection system for pratical natural scenes,” in 24th Int. Conf. on Parallel and Distributed Systems (ICPADS), 2018, pp. 1069–1074.
[30] |
W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in 14th European Conf. Computer Vision, 2016, pp. 21–37.
[31] |
X. He, R. Wang, X. Li, X. Chen, C. Guo, L. Wen, C. Gao, and L. Liu, “Htstl: Head-and-tail search network with scale-transfer layer for traffic sign text detection,” IEEE Access, vol. 7, pp. 118333–118342, 2019. doi: 10.1109/ACCESS.2019.2936540
[32] |
J.-B. Hou, X. Zhu, C. Liu, C. Yang, L.-H. Wu, H. Wang, and X.-C. Yin, “Detecting text in scene and traffic guide panels with attention anchor mechanism,” IEEE Trans. Intelligent Transportation Systems, vol. 22, no. 11, pp. 6890–6899, 2021. doi: 10.1109/TITS.2020.2996027
[33] |
M. Liang, X. Zhu, H. Zhou, J. Qin, and X.-C. Yin, “Hfenet: Hybrid feature enhancement network for detecting texts in scenes and traffic panels,” IEEE Trans. Intelligent Transportation Systems, vol. 24, p. 12, 2023.
[34] |
S. Khalid, J. H. Shah, M. Sharif, F. Dahan, R. Saleem, and A. Masood, “A robust intelligent system for text-based traffic signs detection and recognition in challenging weather conditions,” IEEE Access, vol. 12, pp. 78261–78274, 2024. doi: 10.1109/ACCESS.2024.3401044
[35] |
X. He, Z. Li, J. Lin, K. Nai, J. Yuan, Y. Li, and R. Wang, “Domain adaptive multigranularity proposal network for text detection under extreme traffic scenes,” Computer Vision and Image Understanding, vol. 233, p. 103709, 2023. doi: 10.1016/j.cviu.2023.103709
[36] |
M. Liao, B. Shi, X. Bai, X. Wang, and W. Liu, “Textboxes: a fast text detector with a single deep neural network,” in Proc. the 31st AAAI Conf. on Artificial Intelligence, 2017, pp. 4161–4167.
[37] |
J. Li, Y. Lin, R. Liu, C. M. Ho, and H. Shi, “Rsca: Real-time segmentation-based context-aware scene text detection,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR) Workshops, 2021, pp. 2349–2358.
[38] |
Y. Liu, C. Shen, L. Jin, T. He, P. Chen, C. Liu, and H. Chen, “Abcnet v2: Adaptive bezier-curve network for real-time end-to-end text spotting,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 44, no. 11, pp. 8048–8064, 2022.
[39] |
X. Qin, P. Lyu, C. Zhang, Y. Zhou, K. Yao, P. Zhang, H. Lin, and W. Wang, “Towards robust real-time scene text detection: From semantic to instance representation learning,” in Proc. the 31st ACM Int. Conf. on Multimedia, 2023, pp. 2025–2034.
[40] |
Y. Zhao, Y. Cai, W. Wu, and W. Wang, “Explore faster localization learning for scene text detection,” in IEEE Int. Conf. on Multimedia and Expo (ICME), 2023, pp. 156–161.
[41] |
Z. Huang, W. Xu, and K. Yu, “Bidirectional lstm-crf models for sequence tagging,” arXiv e-prints, 2015.
[42] |
F. Milletari, N. Navab, and S. Ahmadi, “V-net: Fully convolutional neural networks for volumetric medical image segmentation,” in 4th Int. Conf. on 3D Vision, 2016, pp. 565–571.
[43] |
A. Shrivastava, A. Gupta, and R. Girshick, “Training region-based object detectors with online hard example mining,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 761–769.
[44] |
Y. Chen and L. Huang, “Chinese traffic panels detection and recognition from street-level images,” MATEC Web of Conferences, vol. 42, pp. 1–7, 2016. doi: 10.1051/matecconf/20164200001
[45] |
D. Karatzas, L. Gomez-Bigorda, A. Nicolaou, S. Ghosh, A. Bagdanov, M. Iwamura, J. Matas, L. Neumann, V. R. Chandrasekhar, S. Lu, F. Shafait, S. Uchida, and E. Valveny, “Icdar 2015 competition on robust reading,” in 13th Int. Conf. on Document Analysis and Recognition (ICDAR), 2015, pp. 1156–1160.
[46] |
L. Yuliang, J. Lianwen, Z. Shuaitao, and Z. Sheng, “Detecting Curve Text in the Wild: New Dataset and New Solution,” arXiv e-prints, pp. 1–9, 2017.
[47] |
A. Gupta, A. Vedaldi, and A. Zisserman, “Synthetic data for text localisation in natural images,” in IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 2315–2324.
[48] |
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in IEEE Conf. on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[49] |
D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv e-prints, pp. 1–15, 2014.
[50] |
J. Wang, S. Wu, H. Zhang, B. Yuan, C. Dai, and N. R. Pal, “Universal approximation abilities of a modular differentiable neural network,” IEEE Trans. Neural Networks and Learning Systems, pp. 1–15, 2024.
[51] |
J. Zhang, Y. He, W. Chen, L.-D. Kuang, and B. Zheng, “Corrformer: Context-aware tracking with cross-correlation and transformer,” Computers and Electrical Engineering, vol. 114, p. 109075, 2024. doi: 10.1016/j.compeleceng.2024.109075
[52] |
Y. Zhang, T. Zhang, C. Wu, and R. Tao, “Multi-scale spatiotemporal feature fusion network for video saliency prediction,” IEEE Trans. Multimedia, vol. 26, pp. 4183–4193, 2024. doi: 10.1109/TMM.2023.3321394