A journal of the IEEE and the CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 7, Issue 5
Sep. 2020

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: Xuesong Li, Yating Liu, Kunfeng Wang and Fei-Yue Wang, "A Recurrent Attention and Interaction Model for Pedestrian Trajectory Prediction," IEEE/CAA J. Autom. Sinica, vol. 7, no. 5, pp. 1361-1370, Sept. 2020. doi: 10.1109/JAS.2020.1003300

A Recurrent Attention and Interaction Model for Pedestrian Trajectory Prediction

doi: 10.1109/JAS.2020.1003300
Funds:  This work was supported by the National Natural Science Foundation of China (U1811463) and the Fundamental Research Funds for the Central Universities (12060093192)
Abstract
  • The movement of pedestrians involves temporal continuity, spatial interactivity, and random diversity, which makes pedestrian trajectory prediction rather challenging. Most existing trajectory prediction methods tend to focus on just one of these challenges, ignoring the temporal information of the trajectory and making too many assumptions. In this paper, we propose a recurrent attention and interaction (RAI) model to predict pedestrian trajectories. The RAI model consists of a temporal attention module, a spatial pooling module, and a randomness modeling module. The temporal attention module assigns different weights to the input sequence of a target and reduces the speed deviation among different pedestrians. The spatial pooling module models not only the social information of neighbors in historical frames, but also the intention of neighbors at the current time. The randomness modeling module models the uncertainty and diversity of trajectories by introducing random noise. We conduct extensive experiments on several public datasets. The results demonstrate that our method outperforms several state-of-the-art methods.

     

  • *Xuesong Li and Yating Liu contributed equally to this work.
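
To make the three components described in the abstract concrete, below is a minimal, illustrative PyTorch sketch: a recurrent encoder with temporal attention over the target's own history, a crude max-pooling of neighbor features as a stand-in for the spatial pooling module, and injected Gaussian noise for the randomness modeling module. All module names, dimensions, and the pooling scheme here are assumptions for illustration only; this is not the authors' implementation.

# Illustrative sketch only; not the RAI model from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RAISketch(nn.Module):
    def __init__(self, embed_dim=32, hidden_dim=64, noise_dim=8, pred_len=12):
        super().__init__()
        self.embed = nn.Linear(2, embed_dim)            # (x, y) -> embedding
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.attn = nn.Linear(hidden_dim, 1)            # temporal attention scores
        self.pool = nn.Linear(hidden_dim, hidden_dim)   # neighbor feature projection
        self.decoder = nn.LSTM(hidden_dim * 2 + noise_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 2)
        self.noise_dim = noise_dim
        self.pred_len = pred_len

    def forward(self, obs_traj):
        # obs_traj: (num_peds, obs_len, 2) observed positions of all pedestrians
        h_seq, _ = self.encoder(self.embed(obs_traj))            # (N, T, H)
        # Temporal attention: weight each observed step of the target's own history.
        w = F.softmax(self.attn(h_seq), dim=1)                   # (N, T, 1)
        target_feat = (w * h_seq).sum(dim=1)                     # (N, H)
        # Spatial pooling (stand-in): max-pool the last-step features of all pedestrians.
        social_feat = self.pool(h_seq[:, -1]).max(dim=0, keepdim=True)[0]
        social_feat = social_feat.expand_as(target_feat)         # (N, H)
        # Randomness: concatenate Gaussian noise so repeated calls give diverse futures.
        z = torch.randn(target_feat.size(0), self.noise_dim)
        dec_in = torch.cat([target_feat, social_feat, z], dim=-1)
        dec_in = dec_in.unsqueeze(1).repeat(1, self.pred_len, 1)
        dec_out, _ = self.decoder(dec_in)                        # (N, pred_len, H)
        return self.out(dec_out)                                 # (N, pred_len, 2)

if __name__ == "__main__":
    model = RAISketch()
    obs = torch.randn(5, 8, 2)      # 5 pedestrians, 8 observed frames
    print(model(obs).shape)         # torch.Size([5, 12, 2])

Because of the noise term, calling the model repeatedly on the same observed trajectories yields different predictions, which is how this sketch mimics the trajectory diversity that the randomness modeling module is meant to capture.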
  • [1]
    F. Large, D. Vasquez, T. Fraichard, and C. Laugier, “Avoiding cars and pedestrians using velocity obstacles and motion prediction,” in Proc. IEEE Intelligent Vehicles Symp., Parma, Italy: IEEE, Jun. 2004. pp. 375−379.
    [2]
    S. Thompson, T. Horiuchi, and S. Kagami, “A probabilistic model of human motion and navigation intent for mobile robot path planning,” in Proc. 4th IEEE Int. Conf. Autonomous Robots and Agents, Wellington, New Zealand: IEEE, 2009, pp. 663−668.
    [3]
    D. Helbing, I. Farkas, and T. Vicsek, “Simulating dynamical features of escape panic,” Nature, vol. 407, no. 6803, pp. 487–490, Sep. 2000. doi: 10.1038/35035023
    [4]
    T. Fernando, S. Denman, S. Sridharan, and C. Fookes, “Soft + Hardwired attention: An LSTM framework for human trajectory prediction and abnormal event detection,” Neural Netw., vol. 108, pp. 466–478, Dec. 2018. doi: 10.1016/j.neunet.2018.09.002
    [5]
    D. Helbing and P. Molnár, “Social force model for pedestrian dynamics,” Phys. Rev. E, vol. 51, no. 5, pp. 4282–4286, May 1995. doi: 10.1103/PhysRevE.51.4282
    [6]
    B. T. Morris and M. M. Trivedi, “Trajectory learning for activity understanding: Unsupervised, multilevel, and long-term adaptive approach,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 11, pp. 2287–2301, Nov. 2011. doi: 10.1109/TPAMI.2011.64
    [7]
    A. Alahi, K. Goel, V. Ramanathan, A. Robicquet, F. F. Li, and S. Savarese, “Social LSTM: Human trajectory prediction in crowded spaces,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 961−971.
    [8]
    N. Bisagno, B. Zhang, and N. Conci, “Group LSTM: Group trajectory prediction in crowded scenarios,” in Proc. European Conf. Computer Vision, Munich, Germany, 2018, pp. 213−225.
    [9]
    A. Gupta, J. Johnson, F. F. Li, S. Savarese, and A. Alahi, “Social GAN: Socially acceptable trajectories with generative adversarial networks,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 2255−2264.
    [10]
    A. Sadeghian, V. Kosaraju, A. Sadeghian, N. Hirose, H. Rezatofighi, and S. Savarese, “SoPhie: An attentive GAN for predicting paths compliant to social and physical constraints,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 1349−1358.
    [11]
    J. Amirian, J. B. Hayet, and J. Pettré, “Social ways: Learning multi-modal distributions of pedestrian trajectories with GANs,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 2964−2972.
    [12]
    J. W. Liang, L. Jiang, J. C. Niebles, A. G. Hauptmann, and F. F. Li, “Peeking into the future: Predicting future person activities and locations in videos,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 5718−5727.
    [13]
    M. Huynh and G. Alaghband, “Trajectory prediction by coupling scene-LSTM with human movement LSTM,” in Proc. Int. Symp. Visual Computing, Lake Tahoe, USA: Springer, Cham, 2019, pp. 244−259.
    [14]
    H. Minoura, T. Hirakawa, T. Yamashita, and H. Fujiyoshi, “Path predictions using object attributes and semantic environment,” in Proc. 14th Int. Conf. Computer Vision Theory and Applications, Prague, Czech Republic, 2019, pp. 19−26.
    [15]
    S. Pellegrini, A. Ess, K. Schindler, and L. van Gool, “You’ll never walk alone: Modeling social behavior for multi-target tracking,” in Proc. 12th IEEE Int. Conf. Computer Vision, Kyoto, Japan, 2009, pp. 261−268.
    [16]
    A. Lerner, Y. Chrysanthou, and D. Lischinski, “Crowds by example,” Comput. Graph. Forum, vol. 26, no. 3, pp. 655–664, Sep. 2007. doi: 10.1111/j.1467-8659.2007.01089.x
    [17]
    J. Jo, S. Hwang, S. Lee, and Y. Lee, “Multi-mode LSTM network for energy-efficient speech recognition,” in Proc. Int. SoC Design Conf., Daegu, Korea (South), 2018, pp. 133−134.
    [18]
    R. M. Li, C. Y. Jiang, F. H. Zhu, and X. L. Chen, “Traffic flow data forecasting based on interval type-2 fuzzy sets theory,” IEEE/CAA J. Autom. Sinica, vol. 3, no. 2, pp. 141–148, Apr. 2016. doi: 10.1109/JAS.2016.7451101
    [19]
    R. Achkar, F. Elias-Sleiman, H. Ezzidine, and N. Haidar, “Comparison of BPA-MLP and LSTM-RNN for stocks prediction,” in Proc. 6th Int. Symp. Computational and Business Intelligence, Basel, Switzerland, 2018, pp. 48−51.
    [20]
    F. A. Gers, J. Schmidhuber, and F. Cummins, “Learning to forget: Continual prediction with LSTM,” in Proc. 9th Int. Conf. Artificial Neural Networks, Edinburgh, UK, 1999, pp. 850−855.
    [21]
    J. Chung, C. Gulcehre, K. H. Cho, and Y. Bengio, “Empirical evaluation of gated recurrent neural networks on sequence modeling,” arXiv preprint arXiv: 1412.3555, 2014.
    [22]
    P. Zhang, W. L. Ouyang, P. F. Zhang, J. R. Xue, and N. N. Zheng, “SR-LSTM: State refinement for LSTM towards pedestrian trajectory prediction,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 12077−12086.
    [23]
    Y. Y. Xu, Z. X. Piao, and S. H. Gao, “Encoding crowd interaction with deep neural network for pedestrian trajectory prediction,” in Proc. IEEE Int. Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 5275−5284.
    [24]
    X. H. Wang and H. B. Duan, “Hierarchical visual attention model for saliency detection inspired by avian visual pathways,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 2, pp. 540–552, Mar. 2019. doi: 10.1109/JAS.2017.7510664
    [25]
    F. Zheng, C. Deng, X. Sun, X. Y. Jiang, X. W. Guo, Z. Q. Yu, F. Y. Huang, and R. R. Ji, “Pyramidal person Re-IDentification via multi-loss dynamic training,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 8506−8514.
    [26]
    P. Chen, X. Y. Xu, and C. Deng, “Deep view-aware metric learning for person re-identification,” in Proc. 27th Int. Joint Conf. Artificial Intelligence, Stockholm, Sweden, 2018, pp. 620−626.
    [27]
    C. H. Shan, J. B. Zhang, Y. J. Wang, and L. Xie, “Attention-based end-to-end speech recognition on voice search,” in Proc. IEEE Int. Conf. Acoustics, Speech, and Signal Processing, Calgary, Canada, 2018, pp. 4764−4768.
    [28]
    P. Zhou, W. Shi, J. Tian, Z. Y. Qi, B. C. Li, H. W. Hao, and B. Xu, “Attention-based bidirectional long short-term memory networks for relation classification,” in Proc. 54th Annu. Meeting of the Association for Computational Linguistics, Berlin, Germany, 2016, pp. 207−212.
    [29]
    A. Al-Molegi, M. Jabreel, and A. Martínez-Ballesté, “Move, attend and predict: An attention-based neural model for people’s movement prediction,” Pattern Recognit. Lett., vol. 112, pp. 34–40, Sep. 2018. doi: 10.1016/j.patrec.2018.05.015
    [30]
    S. Haddad, M. Q. Wu, H. Wei, and S. K. Lam, “Situation-aware pedestrian trajectory prediction with spatio-temporal attention model,” in Proc. 24th Computer Vision Winter Workshop, Stift Vorau, Austria, 2019.
    [31]
    A. Vemula, K. Muelling, and J. Oh, “Social attention: Modeling attention in human crowds,” in Proc. IEEE Int. Conf. Robotics and Autom., Brisbane, Australia, 2018, pp. 4601−4607.
    [32]
    Y. L. Zhu, D. H. Qian, D. C. Ren, and H. X. Xia, “StarNet: Pedestrian trajectory prediction using deep neural network in star topology,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Macau, China, 2019, pp. 8075−8080.
    [33]
    S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” in Proc. 32nd Int. Conf. Machine Learning, Lille, France, 2015.




    Highlights

    • Spatio-temporal features are mined to model the attention, interaction, and randomness of motion.
    • The temporal attention module assigns different weights to the target's input sequence in the time domain (a generic form is given below).
    • The spatial pooling module models the social behaviors of neighbors at the current and historical moments.
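
As a generic illustration of the temporal weighting mentioned in the highlights, softmax attention over an encoded history $h_1, \ldots, h_T$ can be written as follows; the exact scoring function used by RAI is not reproduced on this page, so this particular additive form is an assumption.

    % Generic softmax temporal attention (illustrative; not necessarily the paper's scoring function)
    \begin{aligned}
    e_t &= \mathbf{v}^{\top}\tanh(\mathbf{W}\,h_t + \mathbf{b}), &&\text{score for time step } t\\
    \alpha_t &= \frac{\exp(e_t)}{\sum_{k=1}^{T}\exp(e_k)}, &&\text{normalized attention weight}\\
    \mathbf{c} &= \sum_{t=1}^{T}\alpha_t\,h_t, &&\text{attention-weighted summary of the history}
    \end{aligned}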
