A journal of IEEE and CAA that publishes high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 8 Issue 6
Jun. 2021

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: A. A. M. Muzahid, W. G. Wan, F. Sohel, L. Y. Wu, and L. Hou, "CurveNet: Curvature-Based Multitask Learning Deep Networks for 3D Object Recognition," IEEE/CAA J. Autom. Sinica, vol. 8, no. 6, pp. 1177–1187, Jun. 2021. doi: 10.1109/JAS.2020.1003324

CurveNet: Curvature-Based Multitask Learning Deep Networks for 3D Object Recognition

doi: 10.1109/JAS.2020.1003324
Funds: This paper was partially supported by a project of the Shanghai Science and Technology Committee (18510760300), the Anhui Natural Science Foundation (1908085MF178), and the Anhui Excellent Young Talents Support Program Project (gxyqZD2019069)
Abstract: In computer vision, 3D object recognition is one of the most important tasks for many real-world applications. Three-dimensional convolutional neural networks (CNNs) have demonstrated their advantages in 3D object recognition. In this paper, we propose to use the principal curvature directions of 3D objects (from CAD models) to represent their geometric features as inputs to a 3D CNN. Our framework, CurveNet, learns perceptually relevant salient features and predicts object class labels. Curvature directions encode complex surface information of a 3D object, which helps our framework produce more precise and discriminative features for object recognition. Multitask learning exploits feature sharing between related tasks; we use pose classification as an auxiliary task to help CurveNet generalize better on object label classification. Experimental results show that our framework performs better with curvature vectors than with voxels as input for 3D object classification. We further improved the performance of CurveNet by combining two networks that take the curvature directions and the voxels of a 3D object as their respective inputs, adopting a cross-stitch module to learn effective shared features across the two representations. Evaluated on three publicly available datasets, our methods achieve competitive performance on the 3D object recognition task.
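To make the architecture described in the abstract concrete, here is a minimal PyTorch sketch, not the authors' implementation: a cross-stitch unit (after Misra et al., CVPR 2016) softly shares features between a curvature stream and a voxel stream, and the pooled shared features feed a main class-label head plus an auxiliary pose head. All layer sizes, channel counts, input layouts, class/pose counts, and the 0.3 auxiliary-loss weight are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossStitchUnit(nn.Module):
    """Learnable 2x2 mixing of activations from two parallel branches
    (soft parameter sharing, after Misra et al., CVPR 2016)."""
    def __init__(self):
        super().__init__()
        # Near-identity initialization: branches start mostly independent.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1],
                                                [0.1, 0.9]]))

    def forward(self, x_a, x_b):
        out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return out_a, out_b

def conv_block(c_in, c_out):
    # One 3D convolution + ReLU, halving the spatial resolution.
    return nn.Sequential(nn.Conv3d(c_in, c_out, 3, stride=2, padding=1),
                         nn.ReLU())

class TwoStreamMultiTaskNet(nn.Module):
    """Hypothetical curvature stream + voxel stream stitched together,
    feeding a main class-label head and an auxiliary pose head."""
    def __init__(self, n_classes=40, n_poses=12):
        super().__init__()
        self.curv1, self.vox1 = conv_block(3, 16), conv_block(1, 16)
        self.stitch = CrossStitchUnit()
        self.curv2, self.vox2 = conv_block(16, 32), conv_block(16, 32)
        self.pool = nn.Sequential(nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.class_head = nn.Linear(64, n_classes)   # main task
        self.pose_head = nn.Linear(64, n_poses)      # auxiliary task

    def forward(self, curv, vox):
        a, b = self.curv1(curv), self.vox1(vox)
        a, b = self.stitch(a, b)      # softly share features across streams
        a, b = self.curv2(a), self.vox2(b)
        feat = torch.cat([self.pool(a), self.pool(b)], dim=1)
        return self.class_head(feat), self.pose_head(feat)

# Joint objective: class loss plus a down-weighted auxiliary pose loss.
net = TwoStreamMultiTaskNet()
curv = torch.randn(2, 3, 32, 32, 32)   # hypothetical curvature-direction volume
vox = torch.randn(2, 1, 32, 32, 32)    # hypothetical binary occupancy volume
class_logits, pose_logits = net(curv, vox)
loss = F.cross_entropy(class_logits, torch.tensor([0, 1])) \
     + 0.3 * F.cross_entropy(pose_logits, torch.tensor([2, 5]))
```

The near-identity initialization of the mixing matrix lets each stream start out training almost independently and then learn how much to share, which is the "soft" alternative to hard-wiring shared layers.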




    Highlights

    • CurveNet is a novel volumetric 3D CNN for object classification
    • It takes point curvature features as input
    • It applies network parameter sharing and shows that soft sharing achieves the best results
    • It achieves state-of-the-art results among voxel-based representations
