IEEE/CAA Journal of Automatica Sinica
Citation: J. Liu, X. Li, Z. Wang, Z. Jiang, W. Zhong, W. Fan, and B. Xu, “PromptFusion: Harmonized semantic prompt learning for infrared and visible image fusion,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 3, pp. 502–515, Mar. 2025. doi: 10.1109/JAS.2024.124878
[1] G. Pajares and J. M. de la Cruz, “A wavelet-based image fusion tutorial,” Pattern Recognit., vol. 37, no. 9, pp. 1855–1872, Sep. 2004. doi: 10.1016/j.patcog.2004.03.010
[2] S. Li, B. Yang, and J. Hu, “Performance comparison of different multi-resolution transforms for image fusion,” Inf. Fusion, vol. 12, no. 2, pp. 74–84, Apr. 2011. doi: 10.1016/j.inffus.2010.03.002
[3] Z. Zhang and R. S. Blum, “A categorization of multiscale-decomposition-based image fusion schemes with a performance study for a digital camera application,” Proc. IEEE, vol. 87, no. 8, pp. 1315–1326, Aug. 1999. doi: 10.1109/5.775414
[4] J. Wang, J. Peng, X. Feng, G. He, and J. Fan, “Fusion method for infrared and visible images by using non-negative sparse representation,” Infrared Phys. Technol., vol. 67, pp. 477–489, Nov. 2014. doi: 10.1016/j.infrared.2014.09.019
[5] S. Li, H. Yin, and L. Fang, “Group-sparse representation with dictionary learning for medical image denoising and fusion,” IEEE Trans. Biomed. Eng., vol. 59, no. 12, pp. 3450–3459, Dec. 2012. doi: 10.1109/TBME.2012.2217493
[6] R. Hou, D. Zhou, R. Nie, D. Liu, L. Xiong, Y. Guo, and C. Yu, “VIF-Net: An unsupervised framework for infrared and visible image fusion,” IEEE Trans. Comput. Imaging, vol. 6, pp. 640–651, Jan. 2020. doi: 10.1109/TCI.2020.2965304
[7] Z. Zhao, S. Xu, C. Zhang, J. Liu, P. Li, and J. Zhang, “DIDFuse: Deep image decomposition for infrared and visible image fusion,” in Proc. 29th Int. Joint Conf. Artificial Intelligence, Yokohama, Japan, 2020, pp. 970–976.
[8] H. Li, Y. Cen, Y. Liu, X. Chen, and Z. Yu, “Different input resolutions and arbitrary output resolution: A meta learning-based deep framework for infrared and visible image fusion,” IEEE Trans. Image Process., vol. 30, pp. 4070–4083, Apr. 2021. doi: 10.1109/TIP.2021.3069339
[9] J. Liu, X. Fan, J. Jiang, R. Liu, and Z. Luo, “Learning a deep multi-scale feature ensemble and an edge-attention guidance for image fusion,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 1, pp. 105–119, Jan. 2022. doi: 10.1109/TCSVT.2021.3056725
[10] H. Xu, J. Yuan, and J. Ma, “MURF: Mutually reinforcing multi-modal image registration and fusion,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 10, pp. 12148–12166, Oct. 2023. doi: 10.1109/TPAMI.2023.3283682
[11] H. Li, J. Liu, Y. Zhang, and Y. Liu, “A deep learning framework for infrared and visible image fusion without strict registration,” Int. J. Comput. Vis., vol. 132, no. 5, pp. 1625–1644, May 2024. doi: 10.1007/s11263-023-01948-x
[12] Y. Rao, D. Wu, M. Han, T. Wang, Y. Yang, T. Lei, C. Zhou, H. Bai, and L. Xing, “AT-GAN: A generative adversarial network with attention and transition for infrared and visible image fusion,” Inf. Fusion, vol. 92, pp. 336–349, Apr. 2023. doi: 10.1016/j.inffus.2022.12.007
[13] H. Li, J. Zhao, J. Li, Z. Yu, and G. Lu, “Feature dynamic alignment and refinement for infrared-visible image fusion: Translation robust fusion,” Inf. Fusion, vol. 95, pp. 26–41, Jul. 2023. doi: 10.1016/j.inffus.2023.02.011
[14] J. Ma, H. Xu, J. Jiang, X. Mei, and X.-P. Zhang, “DDcGAN: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion,” IEEE Trans. Image Process., vol. 29, pp. 4980–4995, Mar. 2020. doi: 10.1109/TIP.2020.2977573
[15] H. Xu, X. Wang, and J. Ma, “DRF: Disentangled representation for visible and infrared image fusion,” IEEE Trans. Instrum. Meas., vol. 70, p. 5006713, Feb. 2021.
[16] W. Tang, F. He, and Y. Liu, “YDTR: Infrared and visible image fusion via Y-shape dynamic transformer,” IEEE Trans. Multimedia, vol. 25, pp. 5413–5428, Jul. 2023. doi: 10.1109/TMM.2022.3192661
[17] M. Han, K. Yu, J. Qiu, H. Li, D. Wu, Y. Rao, Y. Yang, L. Xing, H. Bai, and C. Zhou, “Boosting target-level infrared and visible image fusion with regional information coordination,” Inf. Fusion, vol. 92, pp. 268–288, Apr. 2023. doi: 10.1016/j.inffus.2022.12.005
[18] J. Yue, L. Fang, S. Xia, Y. Deng, and J. Ma, “Dif-fusion: Toward high color fidelity in infrared and visible image fusion with diffusion models,” IEEE Trans. Image Process., vol. 32, pp. 5705–5720, Oct. 2023. doi: 10.1109/TIP.2023.3322046
[19] X. Yi, L. Tang, H. Zhang, H. Xu, and J. Ma, “Diff-IF: Multi-modality image fusion via diffusion model with fusion knowledge prior,” Inf. Fusion, vol. 110, p. 102450, Oct. 2024. doi: 10.1016/j.inffus.2024.102450
[20] Y. Liu, Y. Shi, F. Mu, J. Cheng, and X. Chen, “Glioma segmentation-oriented multi-modal MR image fusion with adversarial learning,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 8, pp. 1528–1531, Aug. 2022. doi: 10.1109/JAS.2022.105770
[21] Y. Liu, X. Chen, R. K. Ward, and Z. J. Wang, “Image fusion with convolutional sparse representation,” IEEE Signal Process. Lett., vol. 23, no. 12, pp. 1882–1886, Dec. 2016. doi: 10.1109/LSP.2016.2618776
[22] Y. Liu, C. Yu, J. Cheng, Z. J. Wang, and X. Chen, “MM-Net: A MixFormer-based multi-scale network for anatomical and functional image fusion,” IEEE Trans. Image Process., vol. 33, pp. 2197–2212, Mar. 2024. doi: 10.1109/TIP.2024.3374072
[23] Z. Zhao, H. Bai, J. Zhang, Y. Zhang, S. Xu, Z. Lin, R. Timofte, and L. Van Gool, “CDDFuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Vancouver, Canada, 2023, pp. 5906–5916.
[24] R. Liu, Z. Liu, J. Liu, X. Fan, and Z. Luo, “A task-guided, implicitly-searched and meta-initialized deep model for image fusion,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 46, no. 10, pp. 6594–6609, Oct. 2024. doi: 10.1109/TPAMI.2024.3382308
[25] J. Liu, X. Fan, Z. Huang, G. Wu, R. Liu, W. Zhong, and Z. Luo, “Target-aware dual adversarial learning and a multi-scenario multi-modality benchmark to fuse infrared and visible for object detection,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, New Orleans, USA, 2022, pp. 5802–5811.
[26] W. Zhao, S. Xie, F. Zhao, Y. He, and H. Lu, “MetaFusion: Infrared and visible image fusion via meta-feature embedding from object detection,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Vancouver, Canada, 2023, pp. 13955–13965.
[27] J. Liu, Z. Liu, G. Wu, L. Ma, R. Liu, W. Zhong, Z. Luo, and X. Fan, “Multi-interactive feature learning and a full-time multi-modality benchmark for image fusion and segmentation,” in Proc. IEEE/CVF Int. Conf. Computer Vision, Paris, France, 2023, pp. 8115–8124.
[28] Z. Liu, J. Liu, G. Wu, L. Ma, X. Fan, and R. Liu, “Bi-level dynamic learning for jointly multi-modality image fusion and beyond,” in Proc. 32nd Int. Joint Conf. Artificial Intelligence, Macao, China, 2023, pp. 1240–1248.
[29] J. Li, J. Chen, J. Liu, and H. Ma, “Learning a graph neural network with cross modality interaction for image fusion,” in Proc. 31st ACM Int. Conf. Multimedia, Ottawa, Canada, 2023, pp. 4471–4479.
[30] D. Wang, J. Liu, R. Liu, and X. Fan, “An interactively reinforced paradigm for joint infrared-visible image fusion and saliency object detection,” Inf. Fusion, vol. 98, p. 101828, Oct. 2023. doi: 10.1016/j.inffus.2023.101828
[31] Z. Zhao, S. Xu, J. Zhang, C. Liang, C. Zhang, and J. Liu, “Efficient and model-based infrared and visible image fusion via algorithm unrolling,” IEEE Trans. Circuits Syst. Video Technol., vol. 32, no. 3, pp. 1186–1196, Mar. 2022. doi: 10.1109/TCSVT.2021.3075745
[32] R. Nie, C. Ma, J. Cao, H. Ding, and D. Zhou, “A total variation with joint norms for infrared and visible image fusion,” IEEE Trans. Multimedia, vol. 24, pp. 1460–1472, Mar. 2022. doi: 10.1109/TMM.2021.3065496
[33] R. Liu, Z. Liu, J. Liu, and X. Fan, “Searching a hierarchically aggregated fusion architecture for fast multi-modality image fusion,” in Proc. 29th ACM Int. Conf. Multimedia, China, 2021, pp. 1600–1608.
[34] X. Tian, W. Zhang, D. Yu, and J. Ma, “Sparse tensor prior for hyperspectral, multispectral, and panchromatic image fusion,” IEEE/CAA J. Autom. Sinica, vol. 10, no. 1, pp. 284–286, Jan. 2023. doi: 10.1109/JAS.2022.106013
[35] J. Ma, L. Tang, F. Fan, J. Huang, X. Mei, and Y. Ma, “SwinFusion: Cross-domain long-range learning for general image fusion via Swin Transformer,” IEEE/CAA J. Autom. Sinica, vol. 9, no. 7, pp. 1200–1217, Jul. 2022. doi: 10.1109/JAS.2022.105686
[36] X. Yi, H. Xu, H. Zhang, L. Tang, and J. Ma, “Text-IF: Leveraging semantic text guidance for degradation-aware and interactive image fusion,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2024, pp. 27016–27025.
[37] D. Wu, M. Han, Y. Yang, S. Zhao, Y. Rao, H. Li, X. Lin, C. Zhou, and H. Bai, “DCFusion: A dual-frequency cross-enhanced fusion network for infrared and visible image fusion,” IEEE Trans. Instrum. Meas., vol. 72, p. 5011815, Apr. 2023.
[38] H. Li and X.-J. Wu, “DenseFuse: A fusion approach to infrared and visible images,” IEEE Trans. Image Process., vol. 28, no. 5, pp. 2614–2623, May 2019. doi: 10.1109/TIP.2018.2887342
[39] H. Xu, J. Ma, J. Jiang, X. Guo, and H. Ling, “U2Fusion: A unified unsupervised image fusion network,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 44, no. 1, pp. 502–518, Jan. 2022. doi: 10.1109/TPAMI.2020.3012548
[40] J. Ma, W. Yu, P. Liang, C. Li, and J. Jiang, “FusionGAN: A generative adversarial network for infrared and visible image fusion,” Inf. Fusion, vol. 48, pp. 11–26, Aug. 2019. doi: 10.1016/j.inffus.2018.09.004
[41] D. Wang, J. Liu, X. Fan, and R. Liu, “Unsupervised misaligned infrared and visible image fusion via cross-modality image generation and registration,” in Proc. 31st Int. Joint Conf. Artificial Intelligence, Austria, 2022, pp. 3508–3515.
[42] Z. Liu, J. Liu, B. Zhang, L. Ma, X. Fan, and R. Liu, “PAIF: Perception-aware infrared-visible image fusion for attack-tolerant semantic segmentation,” in Proc. 31st ACM Int. Conf. Multimedia, Ottawa, Canada, 2023, pp. 3706–3714.
[43] A. Joulin, L. van der Maaten, A. Jabri, and N. Vasilache, “Learning visual features from large weakly supervised data,” in Proc. 14th European Conf. Computer Vision, Amsterdam, The Netherlands, 2016, pp. 67–84.
[44] A. Li, A. Jabri, A. Joulin, and L. van der Maaten, “Learning visual N-grams from web data,” in Proc. IEEE Int. Conf. Computer Vision, Venice, Italy, 2017, pp. 4183–4192.
[45] K. Desai and J. Johnson, “VirTex: Learning visual representations from textual annotations,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Nashville, USA, 2021, pp. 11162–11173.
[46] P. Müller, G. Kaissis, C. Zou, and D. Rueckert, “Joint learning of localized representations from medical images and reports,” in Proc. 17th European Conf. Computer Vision, Tel Aviv, Israel, 2022, pp. 685–701.
[47] M. B. Sariyildiz, J. Perez, and D. Larlus, “Learning visual representations with caption annotations,” in Proc. 16th European Conf. Computer Vision, Glasgow, UK, 2020, pp. 153–170.
[48] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: A large-scale hierarchical image database,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Miami, USA, 2009, pp. 248–255.
[49] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, G. Krueger, and I. Sutskever, “Learning transferable visual models from natural language supervision,” in Proc. 38th Int. Conf. Machine Learning, 2021, pp. 8748–8763.
[50] M. U. Khattak, H. Rasheed, M. Maaz, S. Khan, and F. S. Khan, “MaPLe: Multi-modal prompt learning,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Vancouver, Canada, 2023, pp. 19113–19122.
[51] C. Ge, R. Huang, M. Xie, Z. Lai, S. Song, S. Li, and G. Huang, “Domain adaptation via prompt learning,” IEEE Trans. Neural Netw. Learn. Syst., 2023. doi: 10.1109/TNNLS.2023.3327962
[52] K. Zhou, J. Yang, C. C. Loy, and Z. Liu, “Conditional prompt learning for vision-language models,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, New Orleans, USA, 2022, pp. 16816–16825.
[53] S. Ma, C.-W. Xie, Y. Wei, S. Sun, J. Fan, X. Bao, Y. Guo, and Y. Zheng, “Understanding the multi-modal prompts of the pre-trained vision-language model,” arXiv preprint arXiv:2312.11570, 2024.
[54] K. Zhou, J. Yang, C. C. Loy, and Z. Liu, “Learning to prompt for vision-language models,” Int. J. Comput. Vis., vol. 130, no. 9, pp. 2337–2348, Jul. 2022. doi: 10.1007/s11263-022-01653-1
[55] Z. Zhao, L. Deng, H. Bai, Y. Cui, Z. Zhang, Y. Zhang, H. Qin, D. Chen, J. Zhang, P. Wang, and L. Van Gool, “Image fusion via vision-language model,” in Proc. 41st Int. Conf. Machine Learning, Vienna, Austria, 2024, pp. 60749–60765.
[56] X. Li, Y. Zou, J. Liu, Z. Jiang, L. Ma, X. Fan, and R. Liu, “From text to pixels: A context-aware semantic synergy solution for infrared and visible image fusion,” arXiv preprint arXiv:2401.00421, 2023.
[57] O. Patashnik, Z. Wu, E. Shechtman, D. Cohen-Or, and D. Lischinski, “StyleCLIP: Text-driven manipulation of StyleGAN imagery,” in Proc. IEEE/CVF Int. Conf. Computer Vision, Montreal, Canada, 2021, pp. 2065–2074.
[58] R. Gal, O. Patashnik, H. Maron, A. H. Bermano, G. Chechik, and D. Cohen-Or, “StyleGAN-NADA: CLIP-guided domain adaptation of image generators,” ACM Trans. Graphics, vol. 41, no. 4, p. 141, Jul. 2022.
[59] M. Liu, L. Jiao, X. Liu, L. Li, F. Liu, and S. Yang, “C-CNN: Contourlet convolutional neural networks,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 6, pp. 2636–2649, Jun. 2021. doi: 10.1109/TNNLS.2020.3007412
[60] L. Tang, J. Yuan, H. Zhang, X. Jiang, and J. Ma, “PIAFusion: A progressive infrared and visible image fusion network based on illumination aware,” Inf. Fusion, vol. 83–84, pp. 79–92, Jul. 2022. doi: 10.1016/j.inffus.2022.03.007
[61] A. Toet and M. A. Hogervorst, “Progress in color night vision,” Opt. Eng., vol. 51, no. 1, p. 010901, Feb. 2012. doi: 10.1117/1.OE.51.1.010901
[62] H. Xu, J. Ma, Z. Le, J. Jiang, and X. Guo, “FusionDN: A unified densely connected network for image fusion,” in Proc. 34th AAAI Conf. Artificial Intelligence, New York, USA, 2020, pp. 12484–12491.
[63] Z. Zhao, H. Bai, J. Zhang, Y. Zhang, K. Zhang, S. Xu, D. Chen, R. Timofte, and L. Van Gool, “Equivariant multi-modality image fusion,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2024, pp. 25912–25921.
[64] J. Liu, R. Lin, G. Wu, R. Liu, Z. Luo, and X. Fan, “CoCoNet: Coupled contrastive learning network with multi-level feature ensemble for multi-modality image fusion,” Int. J. Comput. Vis., vol. 132, no. 5, pp. 1748–1775, May 2024. doi: 10.1007/s11263-023-01952-1
[65] H. Li, T. Xu, X.-J. Wu, J. Lu, and J. Kittler, “LRRNet: A novel representation learning guided fusion network for infrared and visible images,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 9, pp. 11040–11052, Sep. 2023. doi: 10.1109/TPAMI.2023.3268209
[66] Z. Zhao, H. Bai, Y. Zhu, J. Zhang, S. Xu, Y. Zhang, K. Zhang, D. Meng, R. Timofte, and L. Van Gool, “DDFM: Denoising diffusion model for multi-modality image fusion,” in Proc. IEEE/CVF Int. Conf. Computer Vision, Paris, France, 2023, pp. 8082–8093.
[67] Z. Huang, J. Liu, X. Fan, R. Liu, W. Zhong, and Z. Luo, “ReCoNet: Recurrent correction network for fast and efficient multi-modality image fusion,” in Proc. 17th European Conf. Computer Vision, Tel Aviv, Israel, 2022, pp. 539–555.
[68] J. Ma, Y. Ma, and C. Li, “Infrared and visible image fusion methods and applications: A survey,” Inf. Fusion, vol. 45, pp. 153–178, Jan. 2019. doi: 10.1016/j.inffus.2018.02.004
[69] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, “SegFormer: Simple and efficient design for semantic segmentation with transformers,” in Proc. 35th Int. Conf. Neural Information Processing Systems, 2021, p. 924.