Wenjin Zhang, Jiacun Wang and Fangping Lan, "Dynamic Hand Gesture Recognition Based on Short-Term Sampling Neural Networks," IEEE/CAA J. Autom. Sinica, vol. 8, no. 1, pp. 110-120, Jan. 2021. doi: 10.1109/JAS.2020.1003465
Dynamic Hand Gesture Recognition Based on Short-Term Sampling Neural Networks

  • Hand gestures are a natural means of human-robot interaction. Vision-based dynamic hand gesture recognition has become an active research topic because of its wide range of applications. This paper presents a novel deep learning network for hand gesture recognition. The network integrates several well-proven modules to learn both short-term and long-term features from video inputs while avoiding intensive computation. To learn short-term features, each video input is segmented into a fixed number of frame groups. A frame is randomly selected from each group and represented as an RGB image as well as an optical flow snapshot. These two representations are fused and fed into a convolutional neural network (ConvNet) for feature extraction. The ConvNets for all groups share parameters. To learn long-term features, the outputs from all ConvNets are fed into a long short-term memory (LSTM) network, which predicts the final classification result. The new model was tested on two popular hand gesture datasets, the Jester dataset and the Nvidia dataset. Compared with other models, our model produced very competitive results. The robustness of the new model was also demonstrated on an augmented dataset with enhanced diversity of hand gestures.
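The short-term sampling step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sample_frame_indices`, the near-equal contiguous split, and the example sizes (37 frames, 8 groups) are assumptions made here. The abstract only states that the video is segmented into a fixed number of groups and that one frame is drawn at random from each.

```python
import random


def sample_frame_indices(num_frames, num_groups, rng=None):
    """Pick one random frame index from each of `num_groups` contiguous groups.

    Hypothetical helper: the grouping rule is an assumption, not taken
    from the paper.
    """
    if num_groups <= 0 or num_frames < num_groups:
        raise ValueError("need at least one frame per group")
    rng = rng or random.Random()
    indices = []
    for g in range(num_groups):
        # Group g covers frames [start, end). Integer arithmetic spreads any
        # remainder across groups, so group sizes differ by at most one frame.
        start = g * num_frames // num_groups
        end = (g + 1) * num_frames // num_groups
        indices.append(rng.randrange(start, end))
    return indices


# Example: reduce a 37-frame clip to 8 sampled frame indices.
picked = sample_frame_indices(37, 8, rng=random.Random(0))
print(picked)
```

Because the number of groups is fixed, every clip yields the same number of sampled frames regardless of its length, which is presumably what keeps the downstream ConvNet and LSTM cost bounded. Each sampled index would then be used to fetch the RGB frame and its optical flow snapshot.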


    • This study designed a new deep learning neural network model that integrates several state-of-the-art action recognition techniques to tackle the complexity and performance issues in dynamic hand gesture recognition. Short-term video sampling, feature fusion, ConvNets with transfer learning, and LSTMs are the key components of the new model.
    • This study developed a novel approach to “zoom out” the existing dataset, increasing its diversity and thus helping to ensure the robustness of a trained model.
    • Compared with existing models, the proposed network achieved very competitive recognition performance on the two most popular hand gesture datasets, Jester and Nvidia.
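The highlights do not say how the “zoom-out” augmentation is carried out. One plausible reading, sketched below purely as an assumption, is to shrink each frame and centre it on a blank canvas of the original size, so that the hand appears farther from the camera. The function name `zoom_out`, the integer `factor`, the nearest-neighbour downsampling, and the zero fill are all hypothetical choices made for this sketch.

```python
def zoom_out(frame, factor, fill=0):
    """Shrink a 2-D frame by an integer factor and centre it on a same-size canvas.

    Assumed interpretation of "zoom-out"; the source does not give the procedure.
    `frame` is a list of equal-length rows (one channel, for simplicity).
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    height, width = len(frame), len(frame[0])
    # Nearest-neighbour downsampling: keep every `factor`-th row and column.
    small = [row[::factor] for row in frame[::factor]]
    small_h, small_w = len(small), len(small[0])
    # Offsets that centre the shrunken frame on the canvas.
    top = (height - small_h) // 2
    left = (width - small_w) // 2
    canvas = [[fill] * width for _ in range(height)]
    for r, row in enumerate(small):
        canvas[top + r][left:left + small_w] = row
    return canvas


# Example: a 4x4 frame shrunk by 2 and padded back to 4x4.
frame = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
for row in zoom_out(frame, 2):
    print(row)
```

Applying the same transformation to every frame of a clip would keep the gesture's motion intact while changing its apparent scale; a real pipeline would more likely use an image library with proper interpolation.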


