Volume 12, Issue 4, Apr. 2025

IEEE/CAA Journal of Automatica Sinica

J. Jiang, N. Xia, and S. Zhou, “A multi-type feature fusion network based on importance weighting for occluded human pose estimation,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 4, pp. 789–805, Apr. 2025. doi: 10.1109/JAS.2024.124953

A Multi-Type Feature Fusion Network Based on Importance Weighting for Occluded Human Pose Estimation

doi: 10.1109/JAS.2024.124953
Funds: This work was supported by the Ministry of Education Industry-University Cooperation and Collaborative Education Project (China) (220603231024713)
  • Human pose estimation is a challenging task in computer vision. Most algorithms perform well in regular scenes but degrade markedly under occlusion. We therefore propose a multi-type feature fusion network based on importance weighting, which consists of three modules. The first module is a multi-resolution backbone with two feature enhancement sub-modules, which extracts features at different scales and strengthens their expressive power. The second module enhances the expressiveness of keypoint features by suppressing obstacle features and compensating for the unique and shared attributes of keypoints and topology. The third module applies importance weighting to the adjacency matrix so that it describes the correlations among nodes, thereby improving feature extraction. We conduct comparative experiments on the Common Objects in Context 2017 (COCO2017), COCO-WholeBody and CrowdPose keypoint detection datasets, achieving accuracies of 78.9%, 67.1% and 77.6%, respectively. A series of ablation experiments demonstrates the contribution of each module, and visualizations of different scenarios verify the effectiveness of our approach.
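The third module's core idea, weighting a fixed skeleton adjacency matrix by learned node-to-node importance before graph convolution, can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the function name, the toy 3-keypoint graph, and the importance scores are all hypothetical, and the layer shown is a generic normalized graph convolution.

```python
import numpy as np

def gcn_layer_weighted(X, A, W, importance):
    """One graph-convolution step in which the fixed skeleton adjacency A
    is re-weighted element-wise by importance scores, so aggregation
    reflects how strongly keypoints correlate rather than mere connectivity.
    (Illustrative sketch only; not the paper's code.)"""
    A_imp = A * importance                       # element-wise importance weighting
    A_hat = A_imp + np.eye(A.shape[0])           # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))     # row-degree normalization
    return np.maximum(D_inv @ A_hat @ X @ W, 0)  # ReLU(D^-1 * A_hat * X * W)

# Toy example: 3 keypoints with 2-D features (values are hypothetical)
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])          # skeleton connectivity
importance = np.array([[0., .9, 0.],
                       [.9, 0., .2],
                       [0., .2, 0.]])  # learned correlation strengths
X = np.random.randn(3, 2)              # per-keypoint features
W = np.random.randn(2, 2)              # layer weights
out = gcn_layer_weighted(X, A, W, importance)
print(out.shape)  # (3, 2)
```

With a binary adjacency, every connected neighbor contributes equally; the importance weights let a strongly correlated pair (here 0.9) dominate a weakly correlated one (0.2) during aggregation, which is the stated motivation for weighting the adjacency matrix.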

     


    Figures (9) / Tables (8)
