Citation: J. Liu, X. Li, Z. Wang, Z. Jiang, W. Zhong, W. Fan, and B. Xu, “PromptFusion: Harmonized semantic prompt learning for infrared and visible image fusion,” IEEE/CAA J. Autom. Sinica, 2024. doi: 10.1109/JAS.2024.124878
[1] G. Pajares and J. M. De La Cruz, “A wavelet-based image fusion tutorial,” Pattern Recognition, vol. 37, no. 9, pp. 1855–1872, 2004. doi: 10.1016/j.patcog.2004.03.010
[2] S. Li, B. Yang, and J. Hu, “Performance comparison of different multiresolution transforms for image fusion,” Information Fusion, vol. 12, no. 2, pp. 74–84, 2011. doi: 10.1016/j.inffus.2010.03.002
[3] Z. Zhang and R. S. Blum, “A categorization of multiscale-decomposition-based image fusion schemes with a performance study for a digital camera application,” Proc. the IEEE, vol. 87, no. 8, pp. 1315–1326, 1999. doi: 10.1109/5.775414
[4] J. Wang, J. Peng, X. Feng, G. He, and J. Fan, “Fusion method for infrared and visible images by using non-negative sparse representation,” Infrared Physics & Technology, vol. 67, pp. 477–489, 2014.
[5] S. Li, H. Yin, and L. Fang, “Group-sparse representation with dictionary learning for medical image denoising and fusion,” IEEE Trans. Biomedical Engineering, vol. 59, no. 12, pp. 3450–3459, 2012. doi: 10.1109/TBME.2012.2217493
[6] R. Hou, D. Zhou, R. Nie, D. Liu, L. Xiong, Y. Guo, and C. Yu, “Vif-net: An unsupervised framework for infrared and visible image fusion,” IEEE Trans. Computational Imaging, vol. 6, pp. 640–651, 2020. doi: 10.1109/TCI.2020.2965304
[7] Z. Zhao, S. Xu, C. Zhang, J. Liu, P. Li, and J. Zhang, “Didfuse: Deep image decomposition for infrared and visible image fusion,” arXiv preprint arXiv:2003.09210, 2020.
[8] H. Li, Y. Cen, Y. Liu, X. Chen, and Z. Yu, “Different input resolutions and arbitrary output resolution: A meta learning-based deep framework for infrared and visible image fusion,” IEEE Trans. Image Processing, vol. 30, pp. 4070–4083, 2021. doi: 10.1109/TIP.2021.3069339
[9] J. Liu, X. Fan, J. Jiang, R. Liu, and Z. Luo, “Learning a deep multiscale feature ensemble and an edge-attention guidance for image fusion,” IEEE Trans. Circuits and Systems for Video Technology, vol. 32, no. 1, pp. 105–119, 2021.
[10] H. Xu, J. Yuan, and J. Ma, “Murf: Mutually reinforcing multi-modal image registration and fusion,” IEEE Trans. Pattern Analysis and Machine Intelligence, 2023.
[11] H. Li, J. Liu, Y. Zhang, and Y. Liu, “A deep learning framework for infrared and visible image fusion without strict registration,” Int. Journal of Computer Vision, pp. 1–20, 2023.
[12] Y. Rao, D. Wu, M. Han, T. Wang, Y. Yang, T. Lei, C. Zhou, H. Bai, and L. Xing, “At-gan: A generative adversarial network with attention and transition for infrared and visible image fusion,” Information Fusion, vol. 92, pp. 336–349, 2023. doi: 10.1016/j.inffus.2022.12.007
[13] H. Li, J. Zhao, J. Li, Z. Yu, and G. Lu, “Feature dynamic alignment and refinement for infrared–visible image fusion: Translation robust fusion,” Information Fusion, vol. 95, pp. 26–41, 2023. doi: 10.1016/j.inffus.2023.02.011
[14] J. Ma, H. Xu, J. Jiang, X. Mei, and X.-P. Zhang, “Ddcgan: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion,” IEEE Trans. Image Processing, vol. 29, pp. 4980–4995, 2020. doi: 10.1109/TIP.2020.2977573
[15] H. Xu, X. Wang, and J. Ma, “Drf: Disentangled representation for visible and infrared image fusion,” IEEE Trans. Instrumentation and Measurement, vol. 70, pp. 1–13, 2021.
[16] W. Tang, F. He, and Y. Liu, “Ydtr: Infrared and visible image fusion via y-shape dynamic transformer,” IEEE Trans. Multimedia, 2022.
[17] M. Han, K. Yu, J. Qiu, H. Li, D. Wu, Y. Rao, Y. Yang, L. Xing, H. Bai, and C. Zhou, “Boosting target-level infrared and visible image fusion with regional information coordination,” Information Fusion, vol. 92, pp. 268–288, 2023. doi: 10.1016/j.inffus.2022.12.005
[18] J. Yue, L. Fang, S. Xia, Y. Deng, and J. Ma, “Dif-fusion: Towards high color fidelity in infrared and visible image fusion with diffusion models,” IEEE Trans. Image Processing, 2023.
[19] X. Yi, L. Tang, H. Zhang, H. Xu, and J. Ma, “Diff-if: Multi-modality image fusion via diffusion model with fusion knowledge prior,” Information Fusion, p. 102450, 2024.
[20] Y. Liu, Y. Shi, F. Mu, J. Cheng, and X. Chen, “Glioma segmentation-oriented multi-modal MR image fusion with adversarial learning,” IEEE/CAA Journal of Automatica Sinica, vol. 9, no. 8, pp. 1528–1531, 2022. doi: 10.1109/JAS.2022.105770
[21] Y. Liu, X. Chen, R. K. Ward, and Z. J. Wang, “Image fusion with convolutional sparse representation,” IEEE Signal Processing Letters, vol. 23, no. 12, pp. 1882–1886, 2016. doi: 10.1109/LSP.2016.2618776
[22] Y. Liu, C. Yu, J. Cheng, Z. J. Wang, and X. Chen, “Mm-net: A mixformer-based multi-scale network for anatomical and functional image fusion,” IEEE Trans. Image Processing, vol. 33, pp. 2197–2212, 2024. doi: 10.1109/TIP.2024.3374072
[23] Z. Zhao, H. Bai, J. Zhang, Y. Zhang, S. Xu, Z. Lin, R. Timofte, and L. Van Gool, “Cddfuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2023, pp. 5906–5916.
[24] R. Liu, Z. Liu, J. Liu, X. Fan, and Z. Luo, “A task-guided, implicitly-searched and meta-initialized deep model for image fusion,” IEEE Trans. Pattern Analysis and Machine Intelligence, 2024.
[25] J. Liu, X. Fan, Z. Huang, G. Wu, R. Liu, W. Zhong, and Z. Luo, “Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2022, pp. 5802–5811.
[26] W. Zhao, S. Xie, F. Zhao, Y. He, and H. Lu, “Metafusion: Infrared and visible image fusion via meta-feature embedding from object detection,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2023, pp. 13955–13965.
[27] J. Liu, Z. Liu, G. Wu, L. Ma, R. Liu, W. Zhong, Z. Luo, and X. Fan, “Multi-interactive feature learning and a full-time multi-modality benchmark for image fusion and segmentation,” in Proc. the IEEE/CVF Int. Conf. on Computer Vision, 2023, pp. 8115–8124.
[28] Z. Liu, J. Liu, G. Wu, L. Ma, X. Fan, and R. Liu, “Bi-level dynamic learning for jointly multi-modality image fusion and beyond,” arXiv preprint arXiv:2305.06720, 2023.
[29] J. Li, J. Chen, J. Liu, and H. Ma, “Learning a graph neural network with cross modality interaction for image fusion,” in Proc. the 31st ACM Int. Conf. on Multimedia, 2023, pp. 4471–4479.
[30] D. Wang, J. Liu, R. Liu, and X. Fan, “An interactively reinforced paradigm for joint infrared-visible image fusion and saliency object detection,” Information Fusion, vol. 98, p. 101828, 2023. doi: 10.1016/j.inffus.2023.101828
[31] Z. Zhao, S. Xu, J. Zhang, C. Liang, C. Zhang, and J. Liu, “Efficient and model-based infrared and visible image fusion via algorithm unrolling,” IEEE Trans. Circuits and Systems for Video Technology, vol. 32, no. 3, pp. 1186–1196, 2021.
[32] R. Nie, C. Ma, J. Cao, H. Ding, and D. Zhou, “A total variation with joint norms for infrared and visible image fusion,” IEEE Trans. Multimedia, vol. 24, pp. 1460–1472, 2021.
[33] R. Liu, Z. Liu, J. Liu, and X. Fan, “Searching a hierarchically aggregated fusion architecture for fast multi-modality image fusion,” in Proc. the 29th ACM Int. Conf. on Multimedia, 2021, pp. 1600–1608.
[34] X. Tian, W. Zhang, D. Yu, and J. Ma, “Sparse tensor prior for hyperspectral, multispectral, and panchromatic image fusion,” IEEE/CAA Journal of Automatica Sinica, vol. 10, no. 1, pp. 284–286, 2022.
[35] J. Ma, L. Tang, F. Fan, J. Huang, X. Mei, and Y. Ma, “Swinfusion: Cross-domain long-range learning for general image fusion via swin transformer,” IEEE/CAA Journal of Automatica Sinica, vol. 9, no. 7, pp. 1200–1217, 2022. doi: 10.1109/JAS.2022.105686
[36] X. Yi, H. Xu, H. Zhang, L. Tang, and J. Ma, “Text-if: Leveraging semantic text guidance for degradation-aware and interactive image fusion,” arXiv preprint arXiv:2403.16387, 2024.
[37] D. Wu, M. Han, Y. Yang, S. Zhao, Y. Rao, H. Li, X. Lin, C. Zhou, and H. Bai, “Dcfusion: A dual-frequency cross-enhanced fusion network for infrared and visible image fusion,” IEEE Trans. Instrumentation and Measurement, 2023.
[38] H. Li and X.-J. Wu, “Densefuse: A fusion approach to infrared and visible images,” IEEE Trans. Image Processing, vol. 28, no. 5, pp. 2614–2623, 2018.
[39] H. Xu, J. Ma, J. Jiang, X. Guo, and H. Ling, “U2fusion: A unified unsupervised image fusion network,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 502–518, 2020.
[40] J. Ma, W. Yu, P. Liang, C. Li, and J. Jiang, “Fusiongan: A generative adversarial network for infrared and visible image fusion,” Information Fusion, vol. 48, pp. 11–26, 2019. doi: 10.1016/j.inffus.2018.09.004
[41] D. Wang, J. Liu, X. Fan, and R. Liu, “Unsupervised misaligned infrared and visible image fusion via cross-modality image generation and registration,” arXiv preprint arXiv:2205.11876, 2022.
[42] Z. Liu, J. Liu, B. Zhang, L. Ma, X. Fan, and R. Liu, “Paif: Perception-aware infrared-visible image fusion for attack-tolerant semantic segmentation,” in Proc. the 31st ACM Int. Conf. on Multimedia, 2023, pp. 3706–3714.
[43] A. Joulin, L. Van Der Maaten, A. Jabri, and N. Vasilache, “Learning visual features from large weakly supervised data,” in Proc. the European Conf. on Computer Vision, 2016, pp. 67–84.
[44] A. Li, A. Jabri, A. Joulin, and L. Van Der Maaten, “Learning visual n-grams from web data,” in Proc. the IEEE Int. Conf. on Computer Vision, 2017, pp. 4183–4192.
[45] K. Desai and J. Johnson, “Virtex: Learning visual representations from textual annotations,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2021, pp. 11162–11173.
[46] P. Müller, G. Kaissis, C. Zou, and D. Rueckert, “Joint learning of localized representations from medical images and reports,” in Proc. the European Conf. on Computer Vision, 2022, pp. 685–701.
[47] M. B. Sariyildiz, J. Perez, and D. Larlus, “Learning visual representations with caption annotations,” in Proc. the European Conf. on Computer Vision, 2020, pp. 153–170.
[48] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “Imagenet: A large-scale hierarchical image database,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2009, pp. 248–255.
[49] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervision,” arXiv preprint arXiv:2103.00020, 2021.
[50] M. U. Khattak, H. Rasheed, M. Maaz, S. Khan, and F. S. Khan, “Maple: Multi-modal prompt learning,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2023, pp. 19113–19122.
[51] C. Ge, R. Huang, M. Xie, Z. Lai, S. Song, S. Li, and G. Huang, “Domain adaptation via prompt learning,” IEEE Trans. Neural Networks and Learning Systems, 2023.
[52] K. Zhou, J. Yang, C. C. Loy, and Z. Liu, “Conditional prompt learning for vision-language models,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2022, pp. 16816–16825.
[53] S. Ma, C.-W. Xie, Y. Wei, S. Sun, J. Fan, X. Bao, Y. Guo, and Y. Zheng, “Understanding the multi-modal prompts of the pre-trained vision-language model,” arXiv preprint arXiv:2312.11570, 2024.
[54] K. Zhou, J. Yang, C. C. Loy, and Z. Liu, “Learning to prompt for vision-language models,” Int. Journal of Computer Vision, vol. 130, no. 9, pp. 2337–2348, 2022. doi: 10.1007/s11263-022-01653-1
[55] Z. Zhao, L. Deng, H. Bai, Y. Cui, Z. Zhang, Y. Zhang, H. Qin, D. Chen, J. Zhang, P. Wang, and L. Van Gool, “Image fusion via vision-language model,” in Proc. the Int. Conf. on Machine Learning (ICML), 2024.
[56] X. Li, Y. Zou, J. Liu, Z. Jiang, L. Ma, X. Fan, and R. Liu, “From text to pixels: A context-aware semantic synergy solution for infrared and visible image fusion,” arXiv preprint arXiv:2401.00421, 2023.
[57] O. Patashnik, Z. Wu, E. Shechtman, D. Cohen-Or, and D. Lischinski, “Styleclip: Text-driven manipulation of stylegan imagery,” arXiv preprint arXiv:2103.17249, 2021.
[58] R. Gal, O. Patashnik, H. Maron, G. Chechik, and D. Cohen-Or, “Stylegan-nada: Clip-guided domain adaptation of image generators,” arXiv preprint arXiv:2108.00946, 2021.
[59] M. Liu, L. Jiao, X. Liu, L. Li, F. Liu, and S. Yang, “C-cnn: Contourlet convolutional neural networks,” IEEE Trans. Neural Networks and Learning Systems, vol. 32, no. 6, pp. 2636–2649, 2020.
[60] L. Tang, J. Yuan, H. Zhang, X. Jiang, and J. Ma, “Piafusion: A progressive infrared and visible image fusion network based on illumination aware,” Information Fusion, vol. 83, pp. 79–92, 2022.
[61] A. Toet and M. A. Hogervorst, “Progress in color night vision,” Optical Engineering, vol. 51, no. 1, p. 010901, 2012. doi: 10.1117/1.OE.51.1.010901
[62] H. Xu, J. Ma, Z. Le, J. Jiang, and X. Guo, “Fusiondn: A unified densely connected network for image fusion,” in Proc. the AAAI Conf. on Artificial Intelligence, vol. 34, no. 07, 2020, pp. 12484–12491.
[63] Z. Zhao, H. Bai, J. Zhang, Y. Zhang, K. Zhang, S. Xu, D. Chen, R. Timofte, and L. Van Gool, “Equivariant multi-modality image fusion,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2024, pp. 25912–25921.
[64] J. Liu, R. Lin, G. Wu, R. Liu, Z. Luo, and X. Fan, “Coconet: Coupled contrastive learning network with multi-level feature ensemble for multi-modality image fusion,” Int. Journal of Computer Vision, pp. 1–28, 2023.
[65] H. Li, T. Xu, X.-J. Wu, J. Lu, and J. Kittler, “Lrrnet: A novel representation learning guided fusion network for infrared and visible images,” IEEE Trans. Pattern Analysis and Machine Intelligence, 2023.
[66] Z. Zhao, H. Bai, Y. Zhu, J. Zhang, S. Xu, Y. Zhang, K. Zhang, D. Meng, R. Timofte, and L. Van Gool, “Ddfm: Denoising diffusion model for multi-modality image fusion,” in Proc. the IEEE/CVF Int. Conf. on Computer Vision, 2023, pp. 8082–8093.
[67] Z. Huang, J. Liu, X. Fan, R. Liu, W. Zhong, and Z. Luo, “Reconet: Recurrent correction network for fast and efficient multi-modality image fusion,” in Proc. the European Conf. on Computer Vision, 2022, pp. 539–555.
[68] J. Ma, Y. Ma, and C. Li, “Infrared and visible image fusion methods and applications: A survey,” Information Fusion, vol. 45, pp. 153–178, 2019. doi: 10.1016/j.inffus.2018.02.004
[69] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “Segformer: Simple and efficient design for semantic segmentation with transformers,” Proc. the Advances in Neural Information Processing Systems, vol. 34, pp. 12077–12090, 2021.