IEEE/CAA Journal of Automatica Sinica
Citation: Wenjin Zhang, Jiacun Wang and Fangping Lan, "Dynamic Hand Gesture Recognition Based on Short-Term Sampling Neural Networks," IEEE/CAA J. Autom. Sinica, vol. 8, no. 1, pp. 110–120, Jan. 2021. doi: 10.1109/JAS.2020.1003465
[1] T. Starner, J. Weaver, and A. Pentland, “Real-time American sign language recognition using desk and wearable computer based video,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 20, no. 12, pp. 1371–1375, Dec. 1998. doi: 10.1109/34.735811
[2] H. Cooper, B. Holt, and R. Bowden, “Sign language recognition,” in Visual Analysis of Humans: Looking at People, T. B. Moeslund, A. Hilton, V. Krüger, and L. Sigal, Eds. London, UK: Springer, 2011, pp. 539–562.
[3] J. S. Sonkusare, N. B. Chopade, R. Sor, and S. L. Tade, “A review on hand gesture recognition system,” in Proc. Int. Conf. Computing Communication Control and Automation, Pune, India, 2015, pp. 790–794.
[4] L. Dipietro, A. M. Sabatini, and P. Dario, “A survey of glove-based systems and their applications,” IEEE Trans. Syst., Man, Cybern., Part C, vol. 38, no. 4, pp. 461–482, Jul. 2008. doi: 10.1109/TSMCC.2008.923862
[5] B. K. Chakraborty, D. Sarma, M. K. Bhuyan, and K. F. MacDorman, “Review of constraints on vision-based gesture recognition for human-computer interaction,” IET Comput. Vis., vol. 12, no. 1, pp. 3–15, Feb. 2018. doi: 10.1049/iet-cvi.2017.0052
[6] C. Zhu, J. Y. Yang, Z. P. Shao, and C. P. Liu, “Vision based hand gesture recognition using 3D shape context,” IEEE/CAA J. Autom. Sinica, doi: 10.1109/JAS.2019.1911534.
[7] X. H. Yuan, L. B. Kong, D. C. Feng, and Z. C. Wei, “Automatic feature point detection and tracking of human actions in time-of-flight videos,” IEEE/CAA J. Autom. Sinica, vol. 4, no. 4, pp. 677–685, Sept. 2017. doi: 10.1109/JAS.2017.7510625
[8] B. Hu and J. C. Wang, “Deep learning based hand gesture recognition and UAV flight controls,” in Proc. 24th Int. Conf. Automation and Computing, Newcastle upon Tyne, UK, 2018, pp. 1–6.
[9] G. Marin, F. Dominio, and P. Zanuttigh, “Hand gesture recognition with leap motion and kinect devices,” in Proc. IEEE Int. Conf. Image Processing, Paris, France, 2014, pp. 1565–1569.
[10] K. Simonyan and A. Zisserman, “Two-stream convolutional networks for action recognition in videos,” in Proc. 27th Int. Conf. Neural Information Processing Systems, Lake Tahoe, USA, 2014, pp. 568–576.
[11] M. Asadi-Aghbolaghi, A. Clapés, M. Bellantonio, H. J. Escalante, V. Ponce-López, X. Baró, I. Guyon, S. Kasaei, and S. Escalera, “Deep learning for action and gesture recognition in image sequences: A survey,” in Gesture Recognition, S. Escalera, I. Guyon, and V. Athitsos, Eds. Cham, Switzerland: Springer, 2017.
[12] Y. Zhu, Z. Z. Lan, S. Newsam, and A. Hauptmann, “Hidden two-stream convolutional networks for action recognition,” in Proc. 14th Asian Conf. Computer Vision, Perth, Australia, 2018.
[13] C. Feichtenhofer, A. Pinz, and A. Zisserman, “Convolutional two-stream network fusion for video action recognition,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 1933–1941.
[14] R. Girdhar, D. Ramanan, A. Gupta, J. Sivic, and B. Russell, “ActionVLAD: Learning spatio-temporal aggregation for action classification,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Honolulu, USA, 2017, pp. 3165–3174.
[15] L. M. Wang, Y. J. Xiong, Z. Wang, Y. Qiao, D. H. Lin, X. O. Tang, and L. Van Gool, “Temporal segment networks: Towards good practices for deep action recognition,” in Proc. 14th European Conf. Computer Vision, Amsterdam, The Netherlands, 2016.
[16] L. M. Wang, Y. J. Xiong, Z. Wang, Y. Qiao, D. H. Lin, X. O. Tang, and L. Van Gool, “Temporal segment networks for action recognition in videos,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 41, no. 11, pp. 2740–2755, Nov. 2019. doi: 10.1109/TPAMI.2018.2868668
[17] H. Sak, A. Senior, and F. Beaufays, “Long short-term memory recurrent neural network architectures for large scale acoustic modeling,” in Proc. 15th Annual Conf. Int. Speech Communication Association: Celebrating the Diversity of Spoken Languages, Singapore, 2014.
[18] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, Nov. 1998. doi: 10.1109/5.726791
[19] M. Shilman, Z. L. Wei, S. Raghupathy, P. Simard, and D. Jones, “Discerning structure from freeform handwritten notes,” in Proc. 7th Int. Conf. Document Analysis and Recognition, Edinburgh, UK, 2003, pp. 60–65.
[20] M. Ranzato, C. Poultney, S. Chopra, and Y. LeCun, “Efficient learning of sparse representations with an energy-based model,” in Advances in Neural Information Processing Systems 19, B. Schölkopf, J. Platt, and T. Hofmann, Eds. Cambridge, MA: MIT Press, 2007.
[21] D. Ciregan, U. Meier, and J. Schmidhuber, “Multi-column deep neural networks for image classification,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Providence, USA, 2012, pp. 3642–3649.
[22] K. Bong, S. Choi, C. Kim, and H. J. Yoo, “Low-power convolutional neural network processor for a face-recognition system,” IEEE Micro, vol. 37, no. 6, pp. 30–38, Nov.–Dec. 2017. doi: 10.1109/MM.2017.4241350
[23] K. M. He, X. Y. Zhang, S. Q. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 770–778.
[24] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997. doi: 10.1162/neco.1997.9.8.1735
[25] R. Haridy. (2017, Aug. 22). Microsoft’s speech recognition system is now as good as a human. New Atlas. [Online]. Available: https://newatlas.com/microsoft-speech-recognition-equals-humans/50999/
[26] F. Beaufays. (2015, Aug.). The neural networks behind Google Voice transcription. Google, Mountain View, CA. [Online]. Available: https://ai.googleblog.com/2015/08/the-neural-networks-behind-google-voice.html
[27] H. Sak, A. Senior, K. Rao, F. Beaufays, and J. Schalkwyk. (2015, Sept.). Google voice search: Faster and more accurate. Google, Mountain View, CA. [Online]. Available: https://ai.googleblog.com/2015/09/google-voice-search-faster-and-more.html
[28] C. Smith. (2016, Jun. 13). iOS 10: Siri now works in third-party apps, comes with extra AI features. BGR. [Online]. Available: https://bgr.com/2016/06/13/ios-10-siri-third-party-apps/
[29] W. Vogels. (2016, Nov. 30). Bringing the magic of Amazon AI and Alexa to apps on AWS. Amazon, Seattle, Washington. [Online]. Available: https://www.allthingsdistributed.com/2016/11/amazon-ai-and-alexa-for-all-aws-apps.html
[30] AlphaStar Team. AlphaStar: Mastering the real-time strategy game StarCraft II. DeepMind, London, UK. [Online]. Available: https://deepmind.com/blog/article/alphastar-mastering-real-time-strategy-game-starcraft-ii
[31] C. K. Li, Y. H. Hou, P. C. Wang, and W. Q. Li, “Multiview-based 3-D action recognition using deep networks,” IEEE Trans. Human-Machine Systems, vol. 49, no. 1, pp. 95–104, Feb. 2019. doi: 10.1109/THMS.2018.2883001
[32] D. Tran, L. Bourdev, R. Fergus, L. Torresani, and M. Paluri, “Learning spatiotemporal features with 3D convolutional networks,” in Proc. IEEE Int. Conf. Computer Vision, Santiago, Chile, 2015, pp. 4489–4497.
[33] A. Karpathy, G. Toderici, S. Shetty, T. Leung, R. Sukthankar, and L. Fei-Fei, “Large-scale video classification with convolutional neural networks,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Columbus, USA, 2014, pp. 1725–1732.
[34] C. J. Tsai, Y. W. Tsai, S. L. Hsu, and Y. C. Wu, “Synthetic training of deep CNN for 3D hand gesture identification,” in Proc. Int. Conf. Control, Artificial Intelligence, Robotics & Optimization, Prague, Czech Republic, 2017, pp. 165–170.
[35] C. Y. Li, X. Zhang, and L. W. Jin, “LPSNet: A novel log path signature feature based hand gesture recognition framework,” in Proc. IEEE Int. Conf. Computer Vision Workshops, Venice, Italy, 2017, pp. 631–639.
[36] O. Köpüklü, N. Köse, and G. Rigoll, “Motion fused frames: Data level fusion strategy for hand gesture recognition,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition Workshops, Salt Lake City, USA, 2018, pp. 2184–21848.
[37] O. Köpüklü, A. Gunduz, N. Kose, and G. Rigoll, “Real-time hand gesture detection and classification using convolutional neural networks,” in Proc. 14th IEEE Int. Conf. Automatic Face & Gesture Recognition, Lille, France, 2019, pp. 1–8.
[38] P. C. Wang, W. Q. Li, P. Ogunbona, J. Wan, and S. Escalera, “RGB-D-based human motion recognition with deep learning: A survey,” Comput. Vis. Image Understanding, vol. 171, pp. 118–139, Jun. 2018. doi: 10.1016/j.cviu.2018.04.007
[39] O. Köpüklü, N. Kose, A. Gunduz, and G. Rigoll, “Resource efficient 3D convolutional neural networks,” in Proc. IEEE/CVF Int. Conf. Computer Vision Workshop, Seoul, Korea (South), 2019, pp. 1910–1919.
[40] W. J. Zhang and J. C. Wang, “Dynamic hand gesture recognition based on 3D convolutional neural network models,” in Proc. IEEE 16th Int. Conf. Networking, Sensing and Control, Banff, Canada, 2019, pp. 224–229.
[41] J. Y. H. Ng, M. Hausknecht, S. Vijayanarasimhan, O. Vinyals, R. Monga, and G. Toderici, “Beyond short snippets: Deep networks for video classification,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Boston, USA, 2015, pp. 4694–4702.
[42] S. J. Pan and Q. Yang, “A survey on transfer learning,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1345–1359, Oct. 2010. doi: 10.1109/TKDE.2009.191
[43] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 2818–2826.
[44] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, pp. 1929–1958, 2014.
[45] J. Materzynska, G. Berger, I. Bax, and R. Memisevic, “The jester dataset: A large-scale video dataset of human gestures,” in Proc. IEEE/CVF Int. Conf. Computer Vision Workshop, Seoul, Korea (South), 2019, pp. 2874–2882.
[46] Twenty Billion Neurons Inc., Toronto, Canada. (2017). The jester dataset. [Online]. Available: https://20bn.com/datasets/jester
[47] H. Wang, D. Oneata, J. Verbeek, and C. Schmid, “A robust and efficient video representation for action recognition,” Int. J. Comput. Vis., vol. 119, no. 3, pp. 219–238, Oct.–Dec. 2016. doi: 10.1007/s11263-015-0846-5
[48] P. Molchanov, X. D. Yang, S. Gupta, K. Kim, S. Tyree, and J. Kautz, “Online detection and classification of dynamic hand gestures with recurrent 3D convolutional neural network,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Las Vegas, USA, 2016, pp. 4207–4215.
[49] S. C. Gao, M. C. Zhou, Y. R. Wang, J. J. Cheng, H. Yachi, and J. H. Wang, “Dendritic neuron model with effective learning algorithms for classification, approximation, and prediction,” IEEE Trans. Neural Netw. Learn. Syst., vol. 30, no. 2, pp. 601–614, Feb. 2019. doi: 10.1109/TNNLS.2018.2846646
[50] J. J. Wang and T. Kumbasar, “Parameter optimization of interval Type-2 fuzzy neural networks based on PSO and BBBC methods,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 1, pp. 247–257, Jan. 2019. doi: 10.1109/JAS.2019.1911348