A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.
Volume 10, Issue 3, Mar. 2023

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: Y. Liu, B. Jiang, and J. M. Xu, “Axial assembled correspondence network for few-shot semantic segmentation,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 3, pp. 711–721, Mar. 2023. doi: 10.1109/JAS.2022.105863

Axial Assembled Correspondence Network for Few-Shot Semantic Segmentation

doi: 10.1109/JAS.2022.105863
Funds: This work was supported in part by the Key Research and Development Program of Guangdong Province (2021B0101200001) and the Guangdong Basic and Applied Basic Research Foundation (2020B1515120071).
Abstract

Few-shot semantic segmentation aims to train a model that can segment novel classes in a query image given only a few densely annotated support exemplars. The task remains challenging because of large intra-class variations between the support and query images. Existing approaches utilize 4D convolutions to mine semantic correspondence between the support and query images, but they still suffer from heavy computation, sparse correspondence, and large memory consumption. We propose the axial assembled correspondence network (AACNet) to alleviate these issues. The key component of AACNet is the proposed axial assembled 4D kernel, which constructs the basic block of the semantic correspondence encoder (SCE). Furthermore, we propose deblurring equations to provide more robust correspondence for the SCE and design a novel fusion module to mix correspondences in a learnable manner. Experiments on PASCAL-5i reveal that AACNet achieves a mean intersection-over-union score of 65.9% for 1-shot segmentation and 70.6% for 5-shot segmentation, surpassing the state-of-the-art method by 5.8% and 5.0%, respectively.
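As a concrete illustration of the factorization idea behind an "axial assembled" 4D kernel, here is a minimal, hypothetical PyTorch sketch; it is not the authors' AA-Conv4d implementation. The module name AxialConv4d, the two-stage query/support decomposition, and all tensor shapes are assumptions made for illustration only, showing how a dense 4D kernel over a correlation tensor can be sparsified into two 2D convolutions, one per spatial subspace.

```python
# Hypothetical sketch (not the authors' code): an axially factorized 4D
# convolution over a query-support correlation tensor. A dense k*k*k*k 4D
# kernel is replaced by one 2D convolution over the query axes (Hq, Wq) and
# one over the support axes (Hs, Ws), sparsifying the weights while keeping
# communication between the two spatial subspaces.
import torch
import torch.nn as nn


class AxialConv4d(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.conv_query = nn.Conv2d(in_ch, out_ch, k, padding=k // 2)
        self.conv_support = nn.Conv2d(out_ch, out_ch, k, padding=k // 2)

    def forward(self, corr: torch.Tensor) -> torch.Tensor:
        # corr: (B, C, Hq, Wq, Hs, Ws) -- 4 spatial dims, hence "4D" convolution.
        B, C, Hq, Wq, Hs, Ws = corr.shape
        # Convolve over (Hq, Wq), folding the support positions into the batch.
        x = corr.permute(0, 4, 5, 1, 2, 3).reshape(B * Hs * Ws, C, Hq, Wq)
        x = self.conv_query(x)
        Co = x.shape[1]
        # Convolve over (Hs, Ws), folding the query positions into the batch.
        x = x.reshape(B, Hs, Ws, Co, Hq, Wq).permute(0, 4, 5, 3, 1, 2)
        x = x.reshape(B * Hq * Wq, Co, Hs, Ws)
        x = self.conv_support(x)
        # Restore the canonical (B, C', Hq, Wq, Hs, Ws) layout.
        return x.reshape(B, Hq, Wq, Co, Hs, Ws).permute(0, 3, 1, 2, 4, 5)


# Toy usage: correlation between 8x8 query and 8x8 support feature grids.
corr = torch.randn(2, 1, 8, 8, 8, 8)
print(AxialConv4d(1, 16)(corr).shape)  # torch.Size([2, 16, 8, 8, 8, 8])
```

Per channel pair, such a factorization needs 2k² weights instead of k⁴, which is the general mechanism by which axial designs reduce the computation and memory costs that the abstract attributes to dense 4D convolutions.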







    Highlights

    • We develop a novel 4D kernel (AA-Conv4d) that performs an appropriate weight sparsification while keeping sufficient communication between the support and query subspaces
    • We propose a simple but effective preprocessing module that modifies the statistical distribution of the semantic correspondences, which effectively improves segmentation performance
    • By mixing pyramid correspondences with a learnable concatenation operation, our fusion module (FM) adaptively refines the squeezed correspondences for query segmentation
    • This work achieves mean intersection-over-union (mIoU) scores of 65.9% and 70.6% on PASCAL-5i for the 1-shot and 5-shot settings, respectively, outperforming state-of-the-art results by 5.8% and 5.0% (a minimal mIoU sketch follows this list)
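Below is a minimal sketch of how a mean intersection-over-union score like the ones quoted above is commonly computed: per-class IoU averaged over classes, skipping classes absent from both prediction and ground truth. The function name and toy label maps are illustrative assumptions; the paper's exact fold-wise PASCAL-5i evaluation protocol is not reproduced here.

```python
# Minimal mIoU sketch: per-class intersection-over-union, averaged over
# classes. Integer label maps are assumed; names are illustrative.
import numpy as np


def miou(preds: np.ndarray, gts: np.ndarray, num_classes: int) -> float:
    """preds, gts: integer label maps of shape (N, H, W)."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(preds == c, gts == c).sum()
        union = np.logical_or(preds == c, gts == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))


# Toy check with two classes {0, 1}:
pred = np.array([[0, 1], [1, 1]])[None]
gt = np.array([[0, 1], [0, 1]])[None]
print(miou(pred, gt, num_classes=2))  # 0.5833... = (1/2 + 2/3) / 2
```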

