Vision Based Hand Gesture Recognition Using 3D Shape Context

Chen Zhu; Jianyu Yang; Zhanpeng Shao; Chunping Liu

doi:10.1109/JAS.2019.1911534

Volume 8 Issue 9

Sep. 2021

IEEE/CAA Journal of Automatica Sinica

C. Zhu, J. Y. Yang, Z. P. Shao, and C. P. Liu, "Vision Based Hand Gesture Recognition Using 3D Shape Context," IEEE/CAA J. Autom. Sinica, vol. 8, no. 9, pp. 1600-1613, Sep. 2021. doi: 10.1109/JAS.2019.1911534

1. School of Rail Transportation, Soochow University, Suzhou 215131, China
2. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
3. School of Computer Science and Technology, Soochow University, Suzhou 215006, China

Funds: This work was supported by the National Natural Science Foundation of China (61773272, 61976191), the Six Talent Peaks Project of Jiangsu Province, China (XYDXX-053), and the Suzhou Research Project of Technical Innovation, Jiangsu, China (SYG201711).

Author Bio:
Chen Zhu received the B.E. degree from Soochow University in 2016. Currently, he is working toward the M.E. degree at the School of Rail Transportation, Soochow University. His research interests include computer vision, pattern recognition, and machine learning.

Jianyu Yang (M’11) received the Ph.D. degree in computer vision jointly from the University of Science and Technology of China (USTC) and City University of Hong Kong (CityU), China, in 2012. Before that, he received the B.Sc. and B.Eng. degrees from USTC in 2006. He was a postdoctoral research fellow at the School of Electrical and Electronics Engineering (EEE), Nanyang Technological University (NTU) in 2015. He is now an Associate Professor at the School of Rail Transportation, Soochow University. His research interests include computer vision, pattern recognition, and machine learning.

Zhanpeng Shao (M’13) received the B.S. and M.S. degrees in mechanical engineering from Xi’an University of Technology in 2004 and 2007, respectively. He obtained the Ph.D. degree in computer vision from City University of Hong Kong, Hong Kong, China, in 2015. From 2015 to 2016, he was a Senior Research Associate with the City University of HK Shenzhen Research Institute. He joined Zhejiang University of Technology in 2016, where he is currently an Associate Professor with the College of Computer Science and Technology. His current research interests include computer vision, pattern recognition, machine learning, and robot sensing. He received the Best Conference Paper Award at ICMA 2014 and ICMA 2016.

Chunping Liu received the Ph.D. degree in pattern recognition and artificial intelligence from Nanjing University of Science & Technology in 2002. She is now a Professor at the School of Computer Science and Technology, Soochow University. Her research interests include computer vision and image analysis and recognition, particularly visual saliency detection, object detection and recognition, and scene understanding.
Corresponding author: Jianyu Yang, e-mail: jyyang@suda.edu.cn
Received Date: 2019-01-24
Accepted Date: 2019-03-12

Available Online: 2019-05-17

Abstract

Hand gesture recognition is a popular topic in computer vision and makes human-computer interaction more flexible and convenient. The representation of hand gestures is critical for recognition. In this paper, we propose a new method to measure the similarity between hand gestures and exploit it for hand gesture recognition. Our method uses depth maps of hand gestures captured by Kinect sensors, from which the 3D hand shapes can be segmented from cluttered backgrounds. To extract the pattern of salient 3D shape features, we propose a new descriptor, 3D Shape Context (3D-SC), for 3D hand gesture representation. The 3D Shape Context information of each 3D point is computed at multiple scales because both local shape context and global shape distribution are necessary for recognition. The descriptions of all the 3D points together form the hand gesture representation, and hand gestures are recognized via the dynamic time warping algorithm. Extensive experiments are conducted on multiple benchmark datasets. The experimental results verify that the proposed method is robust to noise, articulated variations, and rigid transformations. Our method outperforms state-of-the-art methods in both accuracy and efficiency.
Keywords:
- 3D shape context
- depth map
- hand shape segmentation
- hand gesture recognition
- human-computer interaction
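
The abstract states that gestures are matched with dynamic time warping (DTW). The sketch below shows one plain way this matching step could look; it is an illustration, not the authors' implementation. The function names (`euclidean`, `dtw_distance`, `classify`), the Euclidean cost between descriptors, and the nearest-template classification rule are all assumptions introduced here, since the abstract does not specify them.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors (assumed cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b, cost=euclidean):
    """Classic O(n*m) dynamic time warping distance between two descriptor sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # acc[i][j] holds the minimal accumulated cost of aligning
    # the first i items of seq_a with the first j items of seq_b.
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = cost(seq_a[i - 1], seq_b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            acc[i][j] = c + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]


def classify(query, templates):
    """Return the label of the (label, sequence) template closest to query under DTW."""
    return min(templates, key=lambda item: dtw_distance(query, item[1]))[0]
```

For example, `classify(query, [("fist", seq1), ("palm", seq2)])` would return whichever hypothetical label has the smaller DTW distance to `query`. DTW tolerates sequences of different lengths, which is why it suits point sets whose sampling density varies between gestures.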

Highlights

- A new shape descriptor, 3D-SC, is proposed to represent 3D hand gestures.
- Both local shape features and the global shape distribution are captured at multiple scales.
- The method outperforms state-of-the-art methods in both accuracy and efficiency.
- The proposed method is efficient enough for real-time applications.
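
To make the multi-scale idea in the highlights concrete, the following is a simplified sketch of describing one 3D point by histograms of its neighbours at several radii. It is not the 3D-SC descriptor as defined in the paper: the function name `multiscale_descriptor`, the purely radial binning into concentric shells, and the per-scale normalisation are assumptions chosen to keep the example short. Small radii capture local shape around the point, while large radii summarise the global point distribution.

```python
import math


def multiscale_descriptor(points, index, radii, shells=4):
    """Describe points[index] by radial neighbour histograms at several scales.

    For each radius in `radii`, neighbours closer than that radius are binned
    into `shells` concentric shells. Each histogram is normalised to sum to 1,
    or left as zeros when the point has no neighbours at that scale.
    """
    px, py, pz = points[index]
    descriptor = []
    for radius in radii:
        hist = [0] * shells
        for k, (x, y, z) in enumerate(points):
            if k == index:
                continue  # skip the reference point itself
            d = math.sqrt((x - px) ** 2 + (y - py) ** 2 + (z - pz) ** 2)
            if d < radius:
                # Map distance in [0, radius) to a shell index in [0, shells - 1].
                shell = min(int(d / radius * shells), shells - 1)
                hist[shell] += 1
        total = sum(hist)
        descriptor.extend(h / total if total else 0.0 for h in hist)
    return descriptor
```

Calling the function for every segmented hand point yields a sequence of fixed-length vectors, one per point, which is the kind of input a sequence-matching step such as dynamic time warping can compare.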
