Towards Collaborative Robotics in Top View Surveillance: A Framework for Multiple Object Tracking by Detection Using Deep Learning

Imran Ahmed; Sadia Din; Gwanggil Jeon; Francesco Piccialli; Giancarlo Fortino

doi:10.1109/JAS.2020.1003453

Volume 8 Issue 7

Jul. 2021

IEEE/CAA Journal of Automatica Sinica


I. Ahmed, S. Din, G. Jeon, F. Piccialli, and G. Fortino, "Towards Collaborative Robotics in Top View Surveillance: A Framework for Multiple Object Tracking by Detection Using Deep Learning," IEEE/CAA J. Autom. Sinica, vol. 8, no. 7, pp. 1253-1270, Jul. 2021. doi: 10.1109/JAS.2020.1003453


1. Center of Excellence in Information Technology, Institute of Management Sciences, Peshawar 25000, Pakistan
2. Department of Information and Communication Engineering, Yeungnam University, South Korea
3. School of Electronic Engineering, Xidian University, Xi’an 710071, China
4. Department of Embedded Systems Engineering, Incheon National University, Incheon 22012, Korea
5. Department of Mathematics and Applications “R. Caccioppoli”, University of Naples Federico II, Napoli 80138, Italy
6. Department of Informatics, Modeling, Electronics and Systems, University of Calabria, Rende, CS 87036, Italy

Funds: This work was supported by the Framework of International Cooperation Program managed by the National Research Foundation of Korea (2019K1A3A1A8011295711)

Author Bio:
Imran Ahmed (SM’21) is currently working as an Assistant Professor at the Institute of Management Sciences, Pakistan. He received the Ph.D. degree in computer science from the University of Southampton, UK. He did his MS-IT at the Institute of Management Sciences, Pakistan, with major research in computer vision. He received the B.Sc. degree in computer science and mathematics from Edwardes College Peshawar, Pakistan, and the M.Sc. degree in computer science from the University of Peshawar, Pakistan. His research interests include deep learning, machine learning, data science, computer vision, feature extraction, digital image and signal processing, medical image processing, biometrics, pattern recognition, and data mining. He has attended several national and international conferences in these areas. He has been acting as a Reviewer for journals such as IEEE Industrial Electronics, IEEE Access, Journal of Ambient Intelligence (Elsevier), etc.

Sadia Din is currently working as an Assistant Professor in the Department of Information and Communication Engineering, Yeungnam University, South Korea. Previously, she was a Post-Doctoral Researcher at Kyungpook National University, South Korea (Mar. 2020–Aug. 2020). She received the Ph.D. degree in data science and the master's degree in computer science from Kyungpook National University, South Korea, and Abasyn University, Islamabad, Pakistan, in 2020 and 2015, respectively. During the Ph.D., she worked on various projects including artificial learning, machine/deep learning, internet of things, and big data analytics. In 2015, she was a Visiting Researcher at CCMP Lab, Kyungpook National University, South Korea, where she worked on big data and internet of things. Her research areas are big data, 5G, IoT, and data science. She has published in highly reputed conferences such as IEEE LCN, ACM SAC, ICC, and Globecom, and in some SCIE journals at the beginning of her research career. She chaired a couple of sessions at IEEE LCN 2017 in Singapore. She was the Chair for the IEEE International Conf. on Local Computer Networks (LCN’18). She is serving as a Guest Editor for a Wiley journal.

Gwanggil Jeon (SM’20) received the B.S., M.S., and Ph.D. (summa cum laude) degrees from the Department of Electronics and Computer Engineering, Hanyang University, Korea, in 2003, 2005, and 2008, respectively. From Sept. 2009 to Aug. 2011, he was with the School of Information Technology and Engineering, University of Ottawa, Canada, as a Post-Doctoral Fellow. From Sept. 2011 to Feb. 2012, he was with the Graduate School of Science and Technology, Niigata University, Japan, as an Assistant Professor. From Dec. 2014 to Feb. 2015 and Jun. 2015 to Jul. 2015, he was a Visiting Scholar at Centre de Mathématiques et Leurs Applications (CMLA), École Normale Supérieure Paris-Saclay (ENS-Cachan), France. From 2019 to 2020, he was a Prestigious Visiting Professor at Dipartimento di Informatica, Università degli Studi di Milano Statale, Italy. He is currently a Full Professor at Xidian University, China, and at Incheon National University, Korea. He was a Visiting Professor at Sichuan University, China, Universitat Pompeu Fabra, Spain, Xinjiang University, China, King Mongkut’s Institute of Technology Ladkrabang, Thailand, and University of Burgundy, France. Dr. Jeon is an Associate Editor of Sustainable Cities and Society, IEEE Access, Real-Time Image Processing, Journal of System Architecture, and MDPI Remote Sensing. Dr. Jeon was a recipient of the IEEE Chester Sall Award in 2007 and the ETRI Journal Paper Award in 2008.

Francesco Piccialli (M’20) is currently an Assistant Professor (tenure track) of computer science at the Department of Mathematics and Applications “R. Caccioppoli” (DMA) of the University of Naples Federico II (UNINA). He received the laurea degree (BSc+MSc) in computer science and the Ph.D. in computational and computer sciences from the University of Naples Federico II. He is the Founder and Scientific Director of the M.O.D.A.L. Research Group, which is engaged in cutting-edge research on novel methodologies, applications, and services in the data science and machine learning fields and their emerging application domains. He has been involved in research and development projects in the areas of machine learning, deep learning, data science, and the internet of things. He is the author of many papers (90+) in international conferences and top-level journals (IEEE, Springer, ACM, and Elsevier).

Giancarlo Fortino (SM’12) is a Full Professor of computer engineering at the Department of Informatics, Modeling, Electronics, and Systems of the University of Calabria (Unical), Italy. He received the Ph.D. degree in computer engineering from Unical in 2000. He is also Distinguished Professor at Wuhan University of Technology and Huazhong Agricultural University, High-End Expert at HUST, Senior Research Fellow at the ICAR-CNR Institute, and CAS PIFI Visiting Scientist at SIAT-Shenzhen. He is the Director of the SPEME lab at Unical as well as Co-Chair of Joint labs on IoT established between Unical and WUT and SMU and HZAU Chinese universities, respectively. His research interests include agent-based computing, wireless (body) sensor networks, and IoT. He is the author of 450+ papers in int’l journals, conferences, and books. He is (Founding) Series Editor of IEEE Press Book Series on Human-Machine Systems and EiC of Springer Internet of Things series and AE of many int’l journals such as IEEE TAC, IEEE THMS, IEEE IoTJ, IEEE SJ, IEEE SMCM, IEEE OJEMB, IEEE OJCS, Information Fusion, JNCA, EAAI, etc. He organized as Chair many int’l workshops and conferences (100+), was involved in a large number of int’l conferences/workshops (500+) as IPC Member, and is/was Guest Editor of many special issues (60+). He is Co-Founder and CEO of SenSysCal S.r.l., a Unical spinoff focused on innovative IoT systems. He is currently Member of the IEEE SMCS BoG and of the IEEE Press BoG, and Chair of the IEEE SMCS Italian Chapter.
Corresponding author: Gwanggil Jeon, e-mail: gjeon@inu.ac.kr
Received Date: 2020-05-22
Revised Date: 2020-08-11
Accepted Date: 2020-08-28

Available Online: 2020-09-16

Abstract

Collaborative robotics is a high-interest research topic in both academia and industry. It has been progressively utilized in numerous applications, particularly in intelligent surveillance systems. It allows the deployment of smart cameras or optical sensors with computer vision techniques, which may serve in several object detection and tracking tasks. These tasks are considered challenging, high-level perceptual problems, frequently dominated by relative information about the environment, where concerns such as occlusion, illumination, background, object deformation, and object class variations are commonplace. To show the importance of top view surveillance, a collaborative robotics framework is presented. It can assist in the detection and tracking of multiple objects in top view surveillance. The framework consists of a smart robotic camera embedded with a visual processing unit. The existing pre-trained deep learning models SSD and YOLO have been adopted for object detection and localization. The detection models are further combined with different tracking algorithms, including GOTURN, MEDIANFLOW, TLD, KCF, MIL, and BOOSTING. These algorithms, along with the detection models, help to track and predict the trajectories of detected objects. Because pre-trained models are employed, their generalization performance is also investigated by testing the models on various sequences of a top view data set. The detection models achieved a maximum True Detection Rate of 90% to 93% with a maximum False Detection Rate of 0.6%. The tracking results of the different algorithms are nearly identical, with tracking accuracy ranging from 90% to 94%. Furthermore, the output results are discussed along with future guidelines.
Keywords: Collaborative robotics, deep learning, object detection and tracking, top view, video surveillance
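
The abstract reports results as a True Detection Rate (TDR) and a False Detection Rate (FDR). The following is a minimal sketch of how such rates could be computed from per-frame counts. The definitions used here are assumptions, since this page does not state them: TDR is taken as true detections divided by annotated objects, and FDR as false detections divided by annotated objects. The names `FrameCounts` and `detection_rates` are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class FrameCounts:
    """Per-frame counts (hypothetical container, not from the paper)."""
    ground_truth: int      # annotated objects visible in the frame
    true_detections: int   # detections matched to an annotated object
    false_detections: int  # detections with no matching annotated object


def detection_rates(frames: Iterable[FrameCounts]) -> Tuple[float, float]:
    """Return (TDR, FDR) accumulated over all frames.

    Assumed definitions (not confirmed by the source):
      TDR = total true detections  / total annotated objects
      FDR = total false detections / total annotated objects
    """
    frames = list(frames)
    total_gt = sum(f.ground_truth for f in frames)
    total_true = sum(f.true_detections for f in frames)
    total_false = sum(f.false_detections for f in frames)
    if total_gt == 0:
        raise ValueError("no annotated objects; rates are undefined")
    return total_true / total_gt, total_false / total_gt


if __name__ == "__main__":
    # Made-up counts for illustration only, not results from the paper.
    sample = [FrameCounts(10, 9, 0), FrameCounts(10, 10, 1)]
    tdr, fdr = detection_rates(sample)
    print(f"TDR = {tdr:.2%}, FDR = {fdr:.2%}")
```

Accumulating counts over a whole sequence before dividing, rather than averaging per-frame ratios, keeps frames with few objects from dominating the result. Whether the paper aggregates this way is not stated here.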

Highlights

A collaborative surveillance framework is presented for multiple object detection and tracking.
The framework consists of a smart camera, a visual processing unit, and deep learning models.
The generalization performance of the detection models is investigated for the top view.
Object tracking is performed by combining the detection models with tracking algorithms.
Six tracking algorithms and the detection models are also compared.
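
The highlights state that tracking is performed by combining detection models with tracking algorithms. One common way to link the two, shown below as an illustrative stand-in and not as the authors' method, is to match each tracker's predicted box to the detector's boxes by intersection over union (IoU). The functions `iou` and `associate`, the `(x1, y1, x2, y2)` box format, and the `min_iou` threshold of 0.3 are all assumptions made for this sketch.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def associate(
    tracks: Dict[int, Box],
    detections: List[Box],
    min_iou: float = 0.3,  # hypothetical threshold, not from the paper
) -> Tuple[Dict[int, int], List[int]]:
    """Greedily match track boxes to detection boxes by descending IoU.

    Returns (matches, unmatched): matches maps track id -> detection index,
    and unmatched lists detection indices that could start new tracks.
    """
    pairs = [
        (iou(box, det), tid, j)
        for tid, box in tracks.items()
        for j, det in enumerate(detections)
    ]
    pairs.sort(key=lambda p: p[0], reverse=True)

    matches: Dict[int, int] = {}
    used = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same object
        if tid in matches or j in used:
            continue
        matches[tid] = j
        used.add(j)

    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched


if __name__ == "__main__":
    tracks = {1: (0, 0, 10, 10), 2: (20, 20, 30, 30)}
    detections = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
    print(associate(tracks, detections))
```

In a full pipeline, the track boxes would come from a per-object tracker such as KCF or MIL, and the detection boxes from SSD or YOLO. Matched detections could be used to re-initialize drifting trackers, and unmatched detections to start new ones. Greedy matching is chosen here for brevity; an optimal assignment method would be a reasonable alternative.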
