Multi-Attention Fusion and Fine-Grained Alignment for Bidirectional Image-Sentence Retrieval in Remote Sensing

Qimin Cheng; Yuzhuo Zhou; Haiyan Huang; Zhongyuan Wang

doi:10.1109/JAS.2022.105773

Volume 9 Issue 8

Aug. 2022

IEEE/CAA Journal of Automatica Sinica

JCR Impact Factor: 15.3, Top 1 (SCI Q1)

CiteScore: 23.5, Top 2% (Q1)
Google Scholar h5-index: 77, Top 5


Q. M. Cheng, Y. Z. Zhou, H. Y. Huang, and Z. Y. Wang, “Multi-attention fusion and fine-grained alignment for bidirectional image-sentence retrieval in remote sensing,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 8, pp. 1532–1535, Aug. 2022. doi: 10.1109/JAS.2022.105773


1. School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan 430074, China
2. State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University, Wuhan 430079, China
3. School of Computer Science, Wuhan University, Wuhan 430079, China


