A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: R. Wang, Y. Zhu, Z. Zhu, L. Cui, Z. Wan, A. Zhu, Y. Ding, S. Qian, C. Gao, and N. Sang, “LTDNet: A lightweight text detector for real-time arbitrary-shape traffic text detection,” IEEE/CAA J. Autom. Sinica, 2025. doi: 10.1109/JAS.2024.125022

LTDNet: A Lightweight Text Detector for Real-Time Arbitrary-Shape Traffic Text Detection

doi: 10.1109/JAS.2024.125022
Funds: This work was supported in part by the National Natural Science Foundation of China (61502164, 62176097), the Natural Science Foundation of Hunan Province (2020JJ4057), the Key Research and Development Program of Changsha Science and Technology Bureau (kq2004050), and the Scientific Research Foundation of the Education Department of Hunan Province of China (21A0052).
Abstract
Traffic text detection plays a vital role in understanding traffic scenes. Traffic text, a distinct subset of natural scene text, faces challenges not encountered in general scene text detection, including false alarms from non-traffic text sources such as roadside advertisements and building signs. Existing state-of-the-art methods employ increasingly complex detection frameworks in pursuit of higher accuracy, at the cost of real-time performance. In response, we propose a real-time and efficient traffic text detector named LTDNet, which strikes a balance between accuracy and speed. LTDNet integrates three key techniques. First, a cascaded multilevel feature fusion network mitigates the limitations of lightweight backbone networks, thereby enhancing detection accuracy. Second, a lightweight feature attention module improves inference speed without compromising accuracy. Finally, a novel point-to-edge distance vector loss function precisely localizes text instance boundaries in traffic contexts. The superiority of our method is validated through extensive experiments on five publicly available datasets, demonstrating state-of-the-art performance. The code will be released at https://github.com/runminwang/LTDNet.
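The exact formulation of the point-to-edge distance vector loss is given in the paper itself and is not reproduced on this page. Purely as an illustration, the sketch below shows one plausible way such a loss could be implemented in PyTorch, assuming the detector regresses, for every pixel inside a text region, a set of 2-D offset vectors pointing to sampled points on the instance boundary. The function name, tensor layout, and masked smooth-L1 formulation are all assumptions made for this sketch, not the authors' published implementation.

    import torch
    import torch.nn.functional as F

    def point_to_edge_loss(pred_vecs: torch.Tensor,
                           gt_vecs: torch.Tensor,
                           text_mask: torch.Tensor) -> torch.Tensor:
        """Masked smooth-L1 loss between predicted and ground-truth
        point-to-edge distance vectors (hypothetical formulation).

        pred_vecs, gt_vecs: (B, 2K, H, W) maps holding, for each pixel,
            K two-dimensional offsets from the pixel to sampled boundary
            points of its text instance.
        text_mask: (B, 1, H, W) binary mask of pixels inside text
            instances; only these pixels contribute to the loss.
        """
        # Element-wise smooth-L1 keeps the per-pixel, per-offset errors.
        diff = F.smooth_l1_loss(pred_vecs, gt_vecs, reduction="none")
        # Zero out background pixels so only text regions are supervised.
        diff = diff * text_mask
        # Normalize by the number of supervised text pixels.
        return diff.sum() / text_mask.sum().clamp(min=1.0)

A loss of this general shape trains the network to localize instance boundaries directly from interior points, which is one common way segmentation-free detectors recover arbitrary text shapes.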


