A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical and experimental research and development in all areas of automation
Volume 8, Issue 7, Jul. 2021

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: I. Ahmed, S. Din, G. Jeon, F. Piccialli, and G. Fortino, "Towards Collaborative Robotics in Top View Surveillance: A Framework for Multiple Object Tracking by Detection Using Deep Learning," IEEE/CAA J. Autom. Sinica, vol. 8, no. 7, pp. 1253-1270, Jul. 2021. doi: 10.1109/JAS.2020.1003453

Towards Collaborative Robotics in Top View Surveillance: A Framework for Multiple Object Tracking by Detection Using Deep Learning

doi: 10.1109/JAS.2020.1003453
Funds: This work was supported by the Framework of International Cooperation Program managed by the National Research Foundation of Korea (2019K1A3A1A8011295711).
  • Collaborative robotics is a research topic of high interest in both academia and industry. It has been progressively adopted in numerous applications, particularly intelligent surveillance systems, where it enables the deployment of smart cameras or optical sensors together with computer vision techniques that serve a variety of object detection and tracking tasks. These tasks are considered challenging, high-level perceptual problems, frequently dominated by relative information about the environment, in which concerns such as occlusion, illumination, background clutter, object deformation, and object class variation are commonplace. To demonstrate the importance of top view surveillance, a collaborative robotics framework is presented that assists in the detection and tracking of multiple objects in top view surveillance. The framework consists of a smart robotic camera embedded with a visual processing unit. Existing pre-trained deep learning models, SSD and YOLO, are adopted for object detection and localization. The detection models are further combined with different tracking algorithms, including GOTURN, MEDIANFLOW, TLD, KCF, MIL, and BOOSTING; together with the detection models, these algorithms track and predict the trajectories of the detected objects. Because pre-trained models are employed, their generalization performance is also investigated by testing them on various sequences of a top view data set. The detection models achieve true detection rates of 90% to 93% with a maximum false detection rate of 0.6%. The tracking results of the different algorithms are nearly identical, with tracking accuracy ranging from 90% to 94%. Finally, the results are discussed along with guidelines for future work.
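As a rough illustration of the detection stage described above, the sketch below runs a pre-trained SSD model on a top-view frame with OpenCV's DNN module. It is a minimal sketch, not the authors' code: the model file names, 300x300 input size, scale factor, and 0.5 confidence threshold are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): pre-trained SSD detection on a
# top-view frame via OpenCV's DNN module. File names and the 0.5 confidence
# threshold are illustrative assumptions.
import cv2
import numpy as np

net = cv2.dnn.readNetFromCaffe("ssd_deploy.prototxt", "ssd_weights.caffemodel")

def detect_objects(frame, conf_thresh=0.5):
    """Return (x, y, w, h) boxes for detections above the confidence threshold."""
    h, w = frame.shape[:2]
    # MobileNet-SSD (Caffe) expects a 300x300, mean-subtracted, scaled blob.
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 0.007843, (300, 300), 127.5)
    net.setInput(blob)
    detections = net.forward()          # shape: (1, 1, N, 7)
    boxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > conf_thresh:
            x1, y1, x2, y2 = (detections[0, 0, i, 3:7] *
                              np.array([w, h, w, h])).astype(int)
            boxes.append((x1, y1, x2 - x1, y2 - y1))
    return boxes
```

The returned (x, y, w, h) boxes can then seed any of the trackers named above; a complementary tracking sketch follows the Highlights section.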

     




    Highlights

    • A collaborative surveillance framework is presented for multiple object detection and tracking.
    • The framework consists of a smart camera, a visual processing unit, and deep learning models.
    • The generalization performance of the detection models is investigated for top view imagery.
    • Object tracking is performed by combining the detection models with tracking algorithms (see the sketch below).
    • Comparisons of the six tracking algorithms and the detection models are also provided.
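Assuming the six trackers compared in the paper correspond to OpenCV's contrib tracking API (constructors live under cv2.* in OpenCV 3.x and mostly under cv2.legacy.* in 4.5+), the sketch below shows one way detector output could seed them for a per-frame comparison. It is not the authors' implementation; detect_objects is the hypothetical detection helper sketched after the abstract.

```python
# Minimal sketch (assumptions, not the paper's implementation): seeding OpenCV
# trackers with first-frame detections and updating them on every later frame.
# Constructor locations differ by version: cv2.* in 3.x, mostly cv2.legacy.* in 4.5+.
import cv2

TRACKER_FACTORIES = {
    "BOOSTING":   cv2.legacy.TrackerBoosting_create,
    "MIL":        cv2.legacy.TrackerMIL_create,
    "KCF":        cv2.legacy.TrackerKCF_create,
    "TLD":        cv2.legacy.TrackerTLD_create,
    "MEDIANFLOW": cv2.legacy.TrackerMedianFlow_create,
    "GOTURN":     cv2.TrackerGOTURN_create,  # needs goturn.prototxt/.caffemodel on disk
}

def track_sequence(frames, detect_objects, tracker_name="KCF"):
    """Initialise one tracker per first-frame detection, then update per frame."""
    first = frames[0]
    trackers = []
    for box in detect_objects(first):               # boxes as (x, y, w, h)
        t = TRACKER_FACTORIES[tracker_name]()
        t.init(first, box)
        trackers.append(t)
    trajectories = []
    for frame in frames[1:]:
        boxes = [t.update(frame)[1] for t in trackers]   # update returns (ok, box)
        trajectories.append(boxes)
    return trajectories
```

A full tracking-by-detection pipeline would also re-run the detector periodically and re-associate fresh detections with existing tracks, which is how such systems typically cope with drift, occlusion, and objects entering or leaving the scene.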
