IEEE/CAA Journal of Automatica Sinica
Citation: Jiayi Ma, Linfeng Tang, Fan Fan, Jun Huang, Xiaoguang Mei, and Yong Ma, "SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer," IEEE/CAA J. Autom. Sinica, vol. 9, no. 7, pp. 1200–1217, Jul. 2022. doi: 10.1109/JAS.2022.105686
[1] H. Zhang, H. Xu, X. Tian, J. Jiang, and J. Ma, "Image fusion meets deep learning: A survey and perspective," Information Fusion, vol. 76, pp. 323–336, 2021. doi: 10.1016/j.inffus.2021.06.008
[2] J. Ma, L. Tang, M. Xu, H. Zhang, and G. Xiao, "Stdfusionnet: An infrared and visible image fusion network based on salient target detection," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 5009513, 2021.
[3] M. Awad, A. Elliethy, and H. A. Aly, "Adaptive near-infrared and visible fusion for fast image enhancement," IEEE Transactions on Computational Imaging, vol. 6, pp. 408–418, 2019.
[4] H. Xu and J. Ma, "Emfusion: An unsupervised enhanced medical image fusion network," Information Fusion, vol. 76, pp. 177–186, 2021. doi: 10.1016/j.inffus.2021.06.001
[5] Y. Liu, X. Chen, J. Cheng, and H. Peng, "A medical image fusion method based on convolutional neural networks," in Proceedings of the International Conference on Information Fusion, 2017, pp. 1–7.
[6] J. Ma, Z. Le, X. Tian, and J. Jiang, "Smfuse: Multi-focus image fusion via self-supervised mask-optimization," IEEE Transactions on Computational Imaging, vol. 7, pp. 309–320, 2021. doi: 10.1109/TCI.2021.3063872
[7] L. Tang, J. Yuan, and J. Ma, "Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network," Information Fusion, vol. 82, pp. 28–42, 2022. doi: 10.1016/j.inffus.2021.12.004
[8] G. Bhatnagar and Q. J. Wu, "A fractal dimension based framework for night vision fusion," IEEE/CAA Journal of Automatica Sinica, vol. 6, no. 1, pp. 220–227, 2018. doi: 10.1109/JAS.2018.7511102
[9] Y. Zhang and Q. Ji, "Active and dynamic information fusion for facial expression understanding from image sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 699–714, 2005. doi: 10.1109/TPAMI.2005.93
[10] H. Xu, J. Ma, J. Jiang, X. Guo, and H. Ling, "U2fusion: A unified unsupervised image fusion network," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 502–518, 2022. doi: 10.1109/TPAMI.2020.3012548
[11] H. Xu, X. Wang, and J. Ma, "Drf: Disentangled representation for visible and infrared image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 5006713, 2021.
[12] Q. Han and C. Jung, "Deep selective fusion of visible and near-infrared images using unsupervised u-net," IEEE Transactions on Neural Networks and Learning Systems, 2022.
[13] K. Ram Prabhakar, V. Sai Srikar, and R. Venkatesh Babu, "Deepfuse: A deep unsupervised approach for exposure fusion with extreme exposure image pairs," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4714–4722.
[14] Y. Liu, X. Chen, H. Peng, and Z. Wang, "Multi-focus image fusion with a deep convolutional neural network," Information Fusion, vol. 36, pp. 191–207, 2017. doi: 10.1016/j.inffus.2016.12.001
[15] Y. Liu, S. Liu, and Z. Wang, "A general framework for image fusion based on multi-scale transform and sparse representation," Information Fusion, vol. 24, pp. 147–164, 2015. doi: 10.1016/j.inffus.2014.09.004
[16] H. Zhang, H. Xu, Y. Xiao, X. Guo, and J. Ma, "Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 12797–12804.
[17] J. Ma, C. Chen, C. Li, and J. Huang, "Infrared and visible image fusion via gradient transfer and total variation minimization," Information Fusion, vol. 31, pp. 100–109, 2016. doi: 10.1016/j.inffus.2016.02.001
[18] H. Li, X.-J. Wu, and J. Kittler, "Mdlatlrr: A novel decomposition method for infrared and visible image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 4733–4746, 2020. doi: 10.1109/TIP.2020.2975984
[19] W. Zhao, H. Lu, and D. Wang, "Multisensor image fusion and enhancement in spectral total variation domain," IEEE Transactions on Multimedia, vol. 20, no. 4, pp. 866–879, 2017.
[20] Y. Zhang, Y. Liu, P. Sun, H. Yan, X. Zhao, and L. Zhang, "Ifcnn: A general image fusion framework based on convolutional neural network," Information Fusion, vol. 54, pp. 99–118, 2020. doi: 10.1016/j.inffus.2019.07.011
[21] F. Zhao, W. Zhao, L. Yao, and Y. Liu, "Self-supervised feature adaption for infrared and visible image fusion," Information Fusion, vol. 76, pp. 189–203, 2021. doi: 10.1016/j.inffus.2021.06.002
[22] H. Li and X.-J. Wu, "Densefuse: A fusion approach to infrared and visible images," IEEE Transactions on Image Processing, vol. 28, no. 5, pp. 2614–2623, 2019. doi: 10.1109/TIP.2018.2887342
[23] H. Xu, H. Zhang, and J. Ma, "Classification saliency-based rule for visible and infrared image fusion," IEEE Transactions on Computational Imaging, vol. 7, pp. 824–836, 2021. doi: 10.1109/TCI.2021.3100986
[24] J. Ma, H. Xu, J. Jiang, X. Mei, and X.-P. Zhang, "Ddcgan: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 4980–4995, 2020. doi: 10.1109/TIP.2020.2977573
[25] H. Xu, J. Ma, and X.-P. Zhang, "Mef-gan: Multi-exposure image fusion via generative adversarial networks," IEEE Transactions on Image Processing, vol. 29, pp. 7203–7216, 2020. doi: 10.1109/TIP.2020.2999855
[26] J. Huang, Z. Le, Y. Ma, F. Fan, H. Zhang, and L. Yang, "Mgmdcgan: Medical image fusion using multi-generator multi-discriminator conditional generative adversarial network," IEEE Access, vol. 8, pp. 55145–55157, 2020. doi: 10.1109/ACCESS.2020.2982016
[27] J. Chen, X. Li, L. Luo, X. Mei, and J. Ma, "Infrared and visible image fusion based on target-enhanced multiscale transform decomposition," Information Sciences, vol. 508, pp. 64–78, 2020. doi: 10.1016/j.ins.2019.08.066
[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, 2017, pp. 1–11.
[29] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., "An image is worth 16x16 words: Transformers for image recognition at scale," in Proceedings of the International Conference on Learning Representations, 2020, pp. 1–12.
[30] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-end object detection with transformers," in Proceedings of the European Conference on Computer Vision, 2020, pp. 213–229.
[31] S. Zheng, J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, Y. Fu, J. Feng, T. Xiang, P. H. Torr et al., "Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 6881–6890.
[32] H. Chen, Y. Wang, T. Guo, C. Xu, Y. Deng, Z. Liu, S. Ma, C. Xu, C. Xu, and W. Gao, "Pre-trained image processing transformer," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 12299–12310.
[33] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, "Swinir: Image restoration using swin transformer," in Proceedings of the IEEE International Conference on Computer Vision, 2021, pp. 1833–1844.
[34] L. Qu, S. Liu, M. Wang, and Z. Song, "Transmef: A transformer-based multi-exposure image fusion framework using self-supervised multi-task learning," arXiv preprint arXiv:2112.01030, 2021.
[35] V. VS, J. M. J. Valanarasu, P. Oza, and V. M. Patel, "Image fusion transformer," arXiv preprint arXiv:2107.09011, 2021.
[36] D. Rao, X.-J. Wu, and T. Xu, "Tgfuse: An infrared and visible image fusion approach based on transformer and generative adversarial network," arXiv preprint arXiv:2201.10147, 2022.
[37] Y. Fu, T. Xu, X. Wu, and J. Kittler, "Ppt fusion: Pyramid patch transformer for a case study in image fusion," arXiv preprint arXiv:2107.13967, 2021.
[38] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin transformer: Hierarchical vision transformer using shifted windows," in Proceedings of the IEEE International Conference on Computer Vision, 2021, pp. 10012–10022.
[39] Y. Liu, X. Chen, R. K. Ward, and Z. J. Wang, "Medical image fusion via convolutional sparsity based morphological component analysis," IEEE Signal Processing Letters, vol. 26, no. 3, pp. 485–489, 2019. doi: 10.1109/LSP.2019.2895749
[40] Y. Liu, S. Liu, and Z. Wang, "Multi-focus image fusion with dense sift," Information Fusion, vol. 23, pp. 139–155, 2015. doi: 10.1016/j.inffus.2014.05.004
[41] K. Ma, H. Li, H. Yong, Z. Wang, D. Meng, and L. Zhang, "Robust multi-exposure image fusion: A structural patch decomposition approach," IEEE Transactions on Image Processing, vol. 26, no. 5, pp. 2519–2532, 2017. doi: 10.1109/TIP.2017.2671921
[42] H. Li, L. Li, and J. Zhang, "Multi-focus image fusion based on sparse feature matrix decomposition and morphological filtering," Optics Communications, vol. 342, pp. 1–11, 2015. doi: 10.1016/j.optcom.2014.12.048
[43] H. Li, X.-J. Wu, and J. Kittler, "Infrared and visible image fusion using a deep learning framework," in Proceedings of the International Conference on Pattern Recognition, 2018, pp. 2705–2710.
[44] Y. Liu, X. Chen, H. Peng, and Z. Wang, "Multi-focus image fusion with a deep convolutional neural network," Information Fusion, vol. 36, pp. 191–207, 2017. doi: 10.1016/j.inffus.2016.12.001
[45] H. Ma, Q. Liao, J. Zhang, S. Liu, and J.-H. Xue, "An α-matte boundary defocus model-based cascaded network for multi-focus image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 8668–8679, 2020. doi: 10.1109/TIP.2020.3018261
[46] J. Li, X. Guo, G. Lu, B. Zhang, Y. Xu, F. Wu, and D. Zhang, "Drpl: Deep regression pair learning for multi-focus image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 4816–4831, 2020. doi: 10.1109/TIP.2020.2976190
[47] F. Zhao, W. Zhao, H. Lu, Y. Liu, L. Yao, and Y. Liu, "Depth-distilled multi-focus image fusion," IEEE Transactions on Multimedia, 2021.
[48] W. Zhao, B. Zheng, Q. Lin, and H. Lu, "Enhancing diversity of defocus blur detectors via cross-ensemble network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 8905–8913.
[49] W. Zhao, X. Hou, Y. He, and H. Lu, "Defocus blur detection via boosting diversity of deep ensemble networks," IEEE Transactions on Image Processing, vol. 30, pp. 5426–5438, 2021. doi: 10.1109/TIP.2021.3084101
[50] D. Han, L. Li, X. Guo, and J. Ma, "Multi-exposure image fusion via deep perceptual enhancement," Information Fusion, vol. 79, pp. 248–262, 2022. doi: 10.1016/j.inffus.2021.10.006
[51] Y. Long, H. Jia, Y. Zhong, Y. Jiang, and Y. Jia, "Rxdnfuse: A aggregated residual dense network for infrared and visible image fusion," Information Fusion, vol. 69, pp. 128–141, 2021. doi: 10.1016/j.inffus.2020.11.009
[52] H. Li, X.-J. Wu, and T. Durrani, "Nestfuse: An infrared and visible image fusion architecture based on nest connection and spatial/channel attention models," IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 12, pp. 9645–9656, 2020. doi: 10.1109/TIM.2020.3005230
[53] H. Li, X.-J. Wu, and J. Kittler, "Rfn-nest: An end-to-end residual fusion network for infrared and visible images," Information Fusion, vol. 73, pp. 72–86, 2021.
[54] L. Jian, X. Yang, Z. Liu, G. Jeon, M. Gao, and D. Chisholm, "Sedrfuse: A symmetric encoder–decoder with residual block network for infrared and visible image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 20151911, 2021.
[55] J. Ma, W. Yu, P. Liang, C. Li, and J. Jiang, "Fusiongan: A generative adversarial network for infrared and visible image fusion," Information Fusion, vol. 48, pp. 11–26, 2019. doi: 10.1016/j.inffus.2018.09.004
[56] H. Zhang, Z. Le, Z. Shao, H. Xu, and J. Ma, "Mff-gan: An unsupervised generative adversarial network with adaptive and gradient joint constraints for multi-focus image fusion," Information Fusion, vol. 66, pp. 40–53, 2021. doi: 10.1016/j.inffus.2020.08.022
[57] J. Ma, W. Yu, C. Chen, P. Liang, X. Guo, and J. Jiang, "Pan-gan: An unsupervised pan-sharpening method for remote sensing image fusion," Information Fusion, vol. 62, pp. 110–120, 2020. doi: 10.1016/j.inffus.2020.04.006
[58] J. Li, H. Huo, C. Li, R. Wang, C. Sui, and Z. Liu, "Multigrained attention network for infrared and visible image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 5002412, 2021.
[59] J. Li, H. Huo, C. Li, R. Wang, and Q. Feng, "Attentionfgan: Infrared and visible image fusion using attention-based generative adversarial networks," IEEE Transactions on Multimedia, vol. 23, pp. 1383–1396, 2020.
[60] H. Zhang and J. Ma, "Sdnet: A versatile squeeze-and-decomposition network for real-time image fusion," International Journal of Computer Vision, vol. 129, no. 10, pp. 2761–2785, 2021. doi: 10.1007/s11263-021-01501-8
[61] F. Zhao and W. Zhao, "Learning specific and general realm feature representations for image fusion," IEEE Transactions on Multimedia, vol. 23, pp. 2745–2756, 2020.
[62] H. Xu, J. Ma, Z. Le, J. Jiang, and X. Guo, "Fusiondn: A unified densely connected network for image fusion," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 12484–12491.
[63] A. Srinivas, T.-Y. Lin, N. Parmar, J. Shlens, P. Abbeel, and A. Vaswani, "Bottleneck transformers for visual recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 16519–16529.
[64] M. Chen, H. Peng, J. Fu, and H. Ling, "Autoformer: Searching transformers for visual recognition," in Proceedings of the IEEE International Conference on Computer Vision, 2021, pp. 12270–12280.
[65] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable detr: Deformable transformers for end-to-end object detection," in Proceedings of the International Conference on Learning Representations, 2020, pp. 1–11.
[66] Z. Sun, S. Cao, Y. Yang, and K. M. Kitani, "Rethinking transformer-based set prediction for object detection," in Proceedings of the IEEE International Conference on Computer Vision, 2021, pp. 3611–3620.
[67] P. Sun, J. Cao, Y. Jiang, R. Zhang, E. Xie, Z. Yuan, C. Wang, and P. Luo, "Transtrack: Multiple object tracking with transformer," arXiv preprint arXiv:2012.15460, 2020.
[68] X. Chen, B. Yan, J. Zhu, D. Wang, X. Yang, and H. Lu, "Transformer tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 8126–8135.
[69] L. Lin, H. Fan, Y. Xu, and H. Ling, "Swintrack: A simple and strong baseline for transformer tracking," arXiv preprint arXiv:2112.00995, 2021.
[70] H. Wang, Y. Zhu, H. Adam, A. Yuille, and L.-C. Chen, "Max-deeplab: End-to-end panoptic segmentation with mask transformers," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 5463–5474.
[71] F. Yang, H. Yang, J. Fu, H. Lu, and B. Guo, "Learning texture transformer network for image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 5791–5800.
[72] H. Zhao and R. Nie, "Dndt: Infrared and visible image fusion via densenet and dual-transformer," in Proceedings of the International Conference on Information Technology and Biomedical Engineering, 2021, pp. 71–75.
[73] J. Li, J. Zhu, C. Li, X. Chen, and B. Yang, "Cgtf: Convolution-guided transformer for infrared and visible image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 71, p. 5012314, 2022.
[74] T. Xiao, P. Dollar, M. Singh, E. Mintun, T. Darrell, and R. Girshick, "Early convolutions help transformers see better," in Advances in Neural Information Processing Systems, 2021, pp. 30392–30400.
[75] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004. doi: 10.1109/TIP.2003.819861
[76] L. Tang, J. Yuan, H. Zhang, X. Jiang, and J. Ma, "Piafusion: A progressive infrared and visible image fusion network based on illumination aware," Information Fusion, vol. 83, pp. 79–92, 2022.
[77] Q. Ha, K. Watanabe, T. Karasawa, Y. Ushiku, and T. Harada, "Mfnet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes," in Proceedings of the IEEE International Conference on Intelligent Robots and Systems, 2017, pp. 5108–5115.
[78] M. Brown and S. Süsstrunk, "Multi-spectral sift for scene category recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 177–184.
[79] J. Cai, S. Gu, and L. Zhang, "Learning a deep single image contrast enhancer from multi-exposure images," IEEE Transactions on Image Processing, vol. 27, no. 4, pp. 2049–2062, 2018. doi: 10.1109/TIP.2018.2794218
[80] X. Zhang, "Benchmarking and comparing multi-exposure image fusion algorithms," Information Fusion, vol. 74, pp. 111–131, 2021. doi: 10.1016/j.inffus.2021.02.005
[81] M. Nejati, S. Samavi, and S. Shirani, "Multi-focus image fusion using dictionary-based sparse representation," Information Fusion, vol. 25, pp. 72–84, 2015. doi: 10.1016/j.inffus.2014.10.004
[82] J. Ma, H. Zhang, Z. Shao, P. Liang, and H. Xu, "Ganmcc: A generative adversarial network with multiclassification constraints for infrared and visible image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 5005014, 2021.
[83] K. Ma, Z. Duanmu, H. Zhu, Y. Fang, and Z. Wang, "Deep guided learning for fast multi-exposure image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 2808–2819, 2020. doi: 10.1109/TIP.2019.2952716
[84] M. B. A. Haghighat, A. Aghagolzadeh, and H. Seyedarabi, "A non-reference image fusion metric based on mutual information of image features," Computers & Electrical Engineering, vol. 37, no. 5, pp. 744–756, 2011.
[85] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., "Pytorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, 2019, pp. 8026–8037.
[86] C. Peng, T. Tian, C. Chen, X. Guo, and J. Ma, "Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation," Neural Networks, vol. 137, pp. 188–199, 2021. doi: 10.1016/j.neunet.2021.01.021
[87] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[88] H. Zhang, J. Yuan, X. Tian, and J. Ma, "GAN-FM: Infrared and visible image fusion using gan with full-scale skip connection and dual Markovian discriminators," IEEE Transactions on Computational Imaging, vol. 7, pp. 1134–1147, 2021. doi: 10.1109/TCI.2021.3119954
[89] S. F. Bhat, I. Alhashim, and P. Wonka, "Adabins: Depth estimation using adaptive bins," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 4009–4018.