A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation
Volume 12, Issue 3, Mar. 2025

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: J. Liu, X. Li, Z. Wang, Z. Jiang, W. Zhong, W. Fan, and B. Xu, “PromptFusion: Harmonized semantic prompt learning for infrared and visible image fusion,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 3, pp. 502–515, Mar. 2025. doi: 10.1109/JAS.2024.124878

PromptFusion: Harmonized Semantic Prompt Learning for Infrared and Visible Image Fusion

doi: 10.1109/JAS.2024.124878
Funds: This work was partially supported by the China Postdoctoral Science Foundation (2023M730741) and the National Natural Science Foundation of China (U22B2052, 52102432, 52202452, 62372080, 62302078)
  • The goal of infrared and visible image fusion (IVIF) is to integrate the unique advantages of both modalities to achieve a more comprehensive understanding of a scene. However, existing methods struggle to effectively handle modal disparities, resulting in visual degradation of the details and prominent targets of the fused images. To address these challenges, we introduce PromptFusion, a prompt-based approach that harmoniously combines multi-modality images under the guidance of semantic prompts. Firstly, to better characterize the features of different modalities, a contourlet autoencoder is designed to separate and extract the high-/low-frequency components of different modalities, thereby improving the extraction of fine details and textures. We also introduce a prompt learning mechanism using positive and negative prompts, leveraging Vision-Language Models to improve the fusion model’s understanding and identification of targets in multi-modality images, leading to improved performance in downstream tasks. Furthermore, we employ bi-level asymptotic convergence optimization. This approach simplifies the intricate non-singleton non-convex bi-level problem into a series of convergent and differentiable single optimization problems that can be effectively resolved through gradient descent. Our approach advances the state-of-the-art, delivering superior fusion quality and boosting the performance of related downstream tasks. Project page: https://github.com/hey-it-s-me/PromptFusion.
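The positive/negative prompt mechanism described in the abstract can be illustrated with a short sketch. The snippet below uses the OpenAI CLIP package to encode hypothetical positive and negative text prompts and to score a fused image against them; the prompt texts, loss form, and weighting are illustrative assumptions, not the exact formulation used by PromptFusion (the authors' code is linked from the project page).

```python
# Illustrative positive/negative prompt scoring with CLIP (a sketch, not the
# authors' implementation). Assumes the OpenAI CLIP package:
#   pip install git+https://github.com/openai/CLIP.git
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-B/32", device=device)
model.eval()
for p in model.parameters():          # freeze CLIP; only the fused image receives gradients
    p.requires_grad_(False)

# Hypothetical prompt sets; PromptFusion's actual prompts may differ.
positive_prompts = ["an image with salient thermal targets and rich visible texture"]
negative_prompts = ["a washed-out image with blurred targets and lost details"]

@torch.no_grad()
def encode_prompts(prompts):
    tokens = clip.tokenize(prompts).to(device)
    feats = model.encode_text(tokens)
    return feats / feats.norm(dim=-1, keepdim=True)

pos_feats = encode_prompts(positive_prompts)    # (P, D) unit-norm text features
neg_feats = encode_prompts(negative_prompts)    # (N, D) unit-norm text features

def semantic_prompt_loss(fused):
    """fused: (B, 3, 224, 224) fused images, already resized and CLIP-normalized."""
    img_feats = model.encode_image(fused)
    img_feats = img_feats / img_feats.norm(dim=-1, keepdim=True)
    sim_pos = (img_feats @ pos_feats.T).mean()  # agreement with positive prompts
    sim_neg = (img_feats @ neg_feats.T).mean()  # agreement with negative prompts
    return sim_neg - sim_pos                    # pull toward positives, push away from negatives
```

In a training loop, such a term would be added to the usual intensity/gradient fusion losses so that the fused image is steered toward the semantics expressed by the positive prompts and away from those of the negative prompts.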

     



Figures (13) / Tables (5)

    Article Metrics

    Article views: 283 / PDF downloads: 220

    Highlights

    • To break through the bottleneck of task-oriented fusion, we propose PromptFusion, a semantic-guided fusion method that leverages textual prompts to bridge the semantic gap between modalities, improving machine perception while preserving visual fidelity
    • To capture modality-specific information, we introduce a contourlet autoencoder, a frequency-aware spectral encoder that decomposes and aggregates the low- and high-pass subbands of infrared and visible images to improve multi-modality feature integration
    • For superior downstream task performance, we develop a two-stage prompt learning framework that uses task-specific prompts to constrain the fusion process, accurately distinguishing targets and scenes by learning the typical characteristics of each modality
    • To tackle the challenge of jointly optimizing image fusion and prompt learning, we introduce a bi-level asymptotic convergence optimization method that approximates the complex bi-level problem with a sequence of single-level tasks solved efficiently by gradient descent (see the sketch below)
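As a rough illustration of the last highlight, the sketch below relaxes the bi-level problem into a sequence of single-level objectives that blend a placeholder lower-level fusion loss with a placeholder upper-level task loss under a stage-dependent weight mu_k, each stage solved by ordinary gradient descent. The loss functions, networks, and weight schedule are assumptions made for illustration; the authors' actual relaxation and update rules are given in the paper and repository.

```python
# Minimal sketch of bi-level asymptotic convergence optimization (illustrative only;
# not the authors' exact scheme). The lower-level fusion objective and upper-level
# task objective are blended with a stage-dependent weight mu_k, yielding a sequence
# of single-level problems solvable by plain gradient descent.
import torch
import torch.nn.functional as F

def fusion_loss(fusion_net, ir, vis):
    # Placeholder lower-level objective: keep the fused image close to a max-intensity proxy.
    fused = fusion_net(ir, vis)
    return F.l1_loss(fused, torch.max(ir, vis))

def task_loss(task_head, fusion_net, ir, vis, labels):
    # Placeholder upper-level objective: downstream prediction on the fused image.
    fused = fusion_net(ir, vis)
    return F.cross_entropy(task_head(fused), labels)

def asymptotic_bilevel_train(fusion_net, task_head, loader,
                             stages=5, steps_per_stage=200, lr=1e-4):
    params = list(fusion_net.parameters()) + list(task_head.parameters())
    opt = torch.optim.Adam(params, lr=lr)
    for k in range(stages):
        mu_k = 1.0 / (k + 1)  # illustrative schedule; shifts the balance between the two levels
        for step, (ir, vis, labels) in zip(range(steps_per_stage), loader):
            loss = task_loss(task_head, fusion_net, ir, vis, labels) \
                   + mu_k * fusion_loss(fusion_net, ir, vis)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return fusion_net, task_head
```

Because each stage is a differentiable single-level problem, the whole procedure avoids the non-convex, non-singleton difficulties of solving the bi-level problem directly while still coupling fusion quality with downstream task performance.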
