IEEE/CAA Journal of Automatica Sinica
Citation: Jiayi Ma, Linfeng Tang, Fan Fan, Jun Huang, Xiaoguang Mei, and Yong Ma, "SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer," IEEE/CAA J. Autom. Sinica, vol. 9, no. 7, pp. 1200–1217, Jul. 2022. doi: 10.1109/JAS.2022.105686
[1] H. Zhang, H. Xu, X. Tian, J. Jiang, and J. Ma, "Image fusion meets deep learning: A survey and perspective," Information Fusion, vol. 76, pp. 323–336, 2021. doi: 10.1016/j.inffus.2021.06.008
[2] J. Ma, L. Tang, M. Xu, H. Zhang, and G. Xiao, "Stdfusionnet: An infrared and visible image fusion network based on salient target detection," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 5009513, 2021.
[3] M. Awad, A. Elliethy, and H. A. Aly, "Adaptive near-infrared and visible fusion for fast image enhancement," IEEE Transactions on Computational Imaging, vol. 6, pp. 408–418, 2019.
[4] H. Xu and J. Ma, "Emfusion: An unsupervised enhanced medical image fusion network," Information Fusion, vol. 76, pp. 177–186, 2021. doi: 10.1016/j.inffus.2021.06.001
[5] Y. Liu, X. Chen, J. Cheng, and H. Peng, "A medical image fusion method based on convolutional neural networks," in Proceedings of the International Conference on Information Fusion, 2017, pp. 1–7.
[6] J. Ma, Z. Le, X. Tian, and J. Jiang, "Smfuse: Multi-focus image fusion via self-supervised mask-optimization," IEEE Transactions on Computational Imaging, vol. 7, pp. 309–320, 2021. doi: 10.1109/TCI.2021.3063872
[7] L. Tang, J. Yuan, and J. Ma, "Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network," Information Fusion, vol. 82, pp. 28–42, 2022. doi: 10.1016/j.inffus.2021.12.004
[8] G. Bhatnagar and Q. J. Wu, "A fractal dimension based framework for night vision fusion," IEEE/CAA Journal of Automatica Sinica, vol. 6, no. 1, pp. 220–227, 2018. doi: 10.1109/JAS.2018.7511102
[9] Y. Zhang and Q. Ji, "Active and dynamic information fusion for facial expression understanding from image sequences," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, no. 5, pp. 699–714, 2005. doi: 10.1109/TPAMI.2005.93
[10] H. Xu, J. Ma, J. Jiang, X. Guo, and H. Ling, "U2fusion: A unified unsupervised image fusion network," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 44, no. 1, pp. 502–518, 2022. doi: 10.1109/TPAMI.2020.3012548
[11] H. Xu, X. Wang, and J. Ma, "Drf: Disentangled representation for visible and infrared image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 5006713, 2021.
[12] Q. Han and C. Jung, "Deep selective fusion of visible and near-infrared images using unsupervised u-net," IEEE Transactions on Neural Networks and Learning Systems, 2022.
[13] K. Ram Prabhakar, V. Sai Srikar, and R. Venkatesh Babu, "Deepfuse: A deep unsupervised approach for exposure fusion with extreme exposure image pairs," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 4714–4722.
[14] Y. Liu, X. Chen, H. Peng, and Z. Wang, "Multi-focus image fusion with a deep convolutional neural network," Information Fusion, vol. 36, pp. 191–207, 2017. doi: 10.1016/j.inffus.2016.12.001
[15] Y. Liu, S. Liu, and Z. Wang, "A general framework for image fusion based on multi-scale transform and sparse representation," Information Fusion, vol. 24, pp. 147–164, 2015. doi: 10.1016/j.inffus.2014.09.004
[16] H. Zhang, H. Xu, Y. Xiao, X. Guo, and J. Ma, "Rethinking the image fusion: A fast unified image fusion network based on proportional maintenance of gradient and intensity," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 12797–12804.
[17] J. Ma, C. Chen, C. Li, and J. Huang, "Infrared and visible image fusion via gradient transfer and total variation minimization," Information Fusion, vol. 31, pp. 100–109, 2016. doi: 10.1016/j.inffus.2016.02.001
[18] H. Li, X.-J. Wu, and J. Kittler, "Mdlatlrr: A novel decomposition method for infrared and visible image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 4733–4746, 2020. doi: 10.1109/TIP.2020.2975984
[19] W. Zhao, H. Lu, and D. Wang, "Multisensor image fusion and enhancement in spectral total variation domain," IEEE Transactions on Multimedia, vol. 20, no. 4, pp. 866–879, 2017.
[20] Y. Zhang, Y. Liu, P. Sun, H. Yan, X. Zhao, and L. Zhang, "Ifcnn: A general image fusion framework based on convolutional neural network," Information Fusion, vol. 54, pp. 99–118, 2020. doi: 10.1016/j.inffus.2019.07.011
[21] F. Zhao, W. Zhao, L. Yao, and Y. Liu, "Self-supervised feature adaption for infrared and visible image fusion," Information Fusion, vol. 76, pp. 189–203, 2021. doi: 10.1016/j.inffus.2021.06.002
[22] H. Li and X.-J. Wu, "Densefuse: A fusion approach to infrared and visible images," IEEE Transactions on Image Processing, vol. 28, no. 5, pp. 2614–2623, 2019. doi: 10.1109/TIP.2018.2887342
[23] H. Xu, H. Zhang, and J. Ma, "Classification saliency-based rule for visible and infrared image fusion," IEEE Transactions on Computational Imaging, vol. 7, pp. 824–836, 2021. doi: 10.1109/TCI.2021.3100986
[24] J. Ma, H. Xu, J. Jiang, X. Mei, and X.-P. Zhang, "Ddcgan: A dual-discriminator conditional generative adversarial network for multi-resolution image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 4980–4995, 2020. doi: 10.1109/TIP.2020.2977573
[25] H. Xu, J. Ma, and X.-P. Zhang, "Mef-gan: Multi-exposure image fusion via generative adversarial networks," IEEE Transactions on Image Processing, vol. 29, pp. 7203–7216, 2020. doi: 10.1109/TIP.2020.2999855
[26] J. Huang, Z. Le, Y. Ma, F. Fan, H. Zhang, and L. Yang, "Mgmdcgan: Medical image fusion using multi-generator multi-discriminator conditional generative adversarial network," IEEE Access, vol. 8, pp. 55145–55157, 2020. doi: 10.1109/ACCESS.2020.2982016
[27] J. Chen, X. Li, L. Luo, X. Mei, and J. Ma, "Infrared and visible image fusion based on target-enhanced multiscale transform decomposition," Information Sciences, vol. 508, pp. 64–78, 2020. doi: 10.1016/j.ins.2019.08.066
[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," in Advances in Neural Information Processing Systems, 2017, pp. 1–11.
[29] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly et al., "An image is worth 16x16 words: Transformers for image recognition at scale," in Proceedings of the International Conference on Learning Representations, 2020, pp. 1–12.
[30] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-end object detection with transformers," in Proceedings of the European Conference on Computer Vision, 2020, pp. 213–229.
[31] S. Zheng, J. Lu, H. Zhao, X. Zhu, Z. Luo, Y. Wang, Y. Fu, J. Feng, T. Xiang, P. H. Torr et al., "Rethinking semantic segmentation from a sequence-to-sequence perspective with transformers," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 6881–6890.
[32] H. Chen, Y. Wang, T. Guo, C. Xu, Y. Deng, Z. Liu, S. Ma, C. Xu, C. Xu, and W. Gao, "Pre-trained image processing transformer," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 12299–12310.
[33] J. Liang, J. Cao, G. Sun, K. Zhang, L. Van Gool, and R. Timofte, "Swinir: Image restoration using swin transformer," in Proceedings of the IEEE International Conference on Computer Vision, 2021, pp. 1833–1844.
[34] L. Qu, S. Liu, M. Wang, and Z. Song, "Transmef: A transformer-based multi-exposure image fusion framework using self-supervised multi-task learning," arXiv preprint arXiv:2112.01030, 2021.
[35] V. VS, J. M. J. Valanarasu, P. Oza, and V. M. Patel, "Image fusion transformer," arXiv preprint arXiv:2107.09011, 2021.
[36] D. Rao, X.-J. Wu, and T. Xu, "Tgfuse: An infrared and visible image fusion approach based on transformer and generative adversarial network," arXiv preprint arXiv:2201.10147, 2022.
[37] Y. Fu, T. Xu, X. Wu, and J. Kittler, "Ppt fusion: Pyramid patch transformer for a case study in image fusion," arXiv preprint arXiv:2107.13967, 2021.
[38] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, "Swin transformer: Hierarchical vision transformer using shifted windows," in Proceedings of the IEEE International Conference on Computer Vision, 2021, pp. 10012–10022.
[39] Y. Liu, X. Chen, R. K. Ward, and Z. J. Wang, "Medical image fusion via convolutional sparsity based morphological component analysis," IEEE Signal Processing Letters, vol. 26, no. 3, pp. 485–489, 2019. doi: 10.1109/LSP.2019.2895749
[40] Y. Liu, S. Liu, and Z. Wang, "Multi-focus image fusion with dense sift," Information Fusion, vol. 23, pp. 139–155, 2015. doi: 10.1016/j.inffus.2014.05.004
[41] K. Ma, H. Li, H. Yong, Z. Wang, D. Meng, and L. Zhang, "Robust multi-exposure image fusion: A structural patch decomposition approach," IEEE Transactions on Image Processing, vol. 26, no. 5, pp. 2519–2532, 2017. doi: 10.1109/TIP.2017.2671921
[42] H. Li, L. Li, and J. Zhang, "Multi-focus image fusion based on sparse feature matrix decomposition and morphological filtering," Optics Communications, vol. 342, pp. 1–11, 2015. doi: 10.1016/j.optcom.2014.12.048
[43] H. Li, X.-J. Wu, and J. Kittler, "Infrared and visible image fusion using a deep learning framework," in Proceedings of the International Conference on Pattern Recognition, 2018, pp. 2705–2710.
[44] Y. Liu, X. Chen, H. Peng, and Z. Wang, "Multi-focus image fusion with a deep convolutional neural network," Information Fusion, vol. 36, pp. 191–207, 2017. doi: 10.1016/j.inffus.2016.12.001
[45] H. Ma, Q. Liao, J. Zhang, S. Liu, and J.-H. Xue, "An α-matte boundary defocus model-based cascaded network for multi-focus image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 8668–8679, 2020. doi: 10.1109/TIP.2020.3018261
[46] J. Li, X. Guo, G. Lu, B. Zhang, Y. Xu, F. Wu, and D. Zhang, "Drpl: Deep regression pair learning for multi-focus image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 4816–4831, 2020. doi: 10.1109/TIP.2020.2976190
[47] F. Zhao, W. Zhao, H. Lu, Y. Liu, L. Yao, and Y. Liu, "Depth-distilled multi-focus image fusion," IEEE Transactions on Multimedia, 2021.
[48] W. Zhao, B. Zheng, Q. Lin, and H. Lu, "Enhancing diversity of defocus blur detectors via cross-ensemble network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 8905–8913.
[49] W. Zhao, X. Hou, Y. He, and H. Lu, "Defocus blur detection via boosting diversity of deep ensemble networks," IEEE Transactions on Image Processing, vol. 30, pp. 5426–5438, 2021. doi: 10.1109/TIP.2021.3084101
[50] D. Han, L. Li, X. Guo, and J. Ma, "Multi-exposure image fusion via deep perceptual enhancement," Information Fusion, vol. 79, pp. 248–262, 2022. doi: 10.1016/j.inffus.2021.10.006
[51] Y. Long, H. Jia, Y. Zhong, Y. Jiang, and Y. Jia, "Rxdnfuse: A aggregated residual dense network for infrared and visible image fusion," Information Fusion, vol. 69, pp. 128–141, 2021. doi: 10.1016/j.inffus.2020.11.009
[52] H. Li, X.-J. Wu, and T. Durrani, "Nestfuse: An infrared and visible image fusion architecture based on nest connection and spatial/channel attention models," IEEE Transactions on Instrumentation and Measurement, vol. 69, no. 12, pp. 9645–9656, 2020. doi: 10.1109/TIM.2020.3005230
[53] H. Li, X.-J. Wu, and J. Kittler, "Rfn-nest: An end-to-end residual fusion network for infrared and visible images," Information Fusion, vol. 73, pp. 72–86, 2021.
[54] L. Jian, X. Yang, Z. Liu, G. Jeon, M. Gao, and D. Chisholm, "Sedrfuse: A symmetric encoder–decoder with residual block network for infrared and visible image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 20151911, 2021.
[55] J. Ma, W. Yu, P. Liang, C. Li, and J. Jiang, "Fusiongan: A generative adversarial network for infrared and visible image fusion," Information Fusion, vol. 48, pp. 11–26, 2019. doi: 10.1016/j.inffus.2018.09.004
[56] H. Zhang, Z. Le, Z. Shao, H. Xu, and J. Ma, "Mff-gan: An unsupervised generative adversarial network with adaptive and gradient joint constraints for multi-focus image fusion," Information Fusion, vol. 66, pp. 40–53, 2021. doi: 10.1016/j.inffus.2020.08.022
[57] J. Ma, W. Yu, C. Chen, P. Liang, X. Guo, and J. Jiang, "Pan-gan: An unsupervised pan-sharpening method for remote sensing image fusion," Information Fusion, vol. 62, pp. 110–120, 2020. doi: 10.1016/j.inffus.2020.04.006
[58] J. Li, H. Huo, C. Li, R. Wang, C. Sui, and Z. Liu, "Multigrained attention network for infrared and visible image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 5002412, 2021.
[59] J. Li, H. Huo, C. Li, R. Wang, and Q. Feng, "Attentionfgan: Infrared and visible image fusion using attention-based generative adversarial networks," IEEE Transactions on Multimedia, vol. 23, pp. 1383–1396, 2020.
[60] H. Zhang and J. Ma, "Sdnet: A versatile squeeze-and-decomposition network for real-time image fusion," International Journal of Computer Vision, vol. 129, no. 10, pp. 2761–2785, 2021. doi: 10.1007/s11263-021-01501-8
[61] F. Zhao and W. Zhao, "Learning specific and general realm feature representations for image fusion," IEEE Transactions on Multimedia, vol. 23, pp. 2745–2756, 2020.
[62] H. Xu, J. Ma, Z. Le, J. Jiang, and X. Guo, "Fusiondn: A unified densely connected network for image fusion," in Proceedings of the AAAI Conference on Artificial Intelligence, 2020, pp. 12484–12491.
[63] A. Srinivas, T.-Y. Lin, N. Parmar, J. Shlens, P. Abbeel, and A. Vaswani, "Bottleneck transformers for visual recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 16519–16529.
[64] M. Chen, H. Peng, J. Fu, and H. Ling, "Autoformer: Searching transformers for visual recognition," in Proceedings of the IEEE International Conference on Computer Vision, 2021, pp. 12270–12280.
[65] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable detr: Deformable transformers for end-to-end object detection," in Proceedings of the International Conference on Learning Representations, 2020, pp. 1–11.
[66] Z. Sun, S. Cao, Y. Yang, and K. M. Kitani, "Rethinking transformer-based set prediction for object detection," in Proceedings of the IEEE International Conference on Computer Vision, 2021, pp. 3611–3620.
[67] P. Sun, J. Cao, Y. Jiang, R. Zhang, E. Xie, Z. Yuan, C. Wang, and P. Luo, "Transtrack: Multiple object tracking with transformer," arXiv preprint arXiv:2012.15460, 2020.
[68] X. Chen, B. Yan, J. Zhu, D. Wang, X. Yang, and H. Lu, "Transformer tracking," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 8126–8135.
[69] L. Lin, H. Fan, Y. Xu, and H. Ling, "Swintrack: A simple and strong baseline for transformer tracking," arXiv preprint arXiv:2112.00995, 2021.
[70] H. Wang, Y. Zhu, H. Adam, A. Yuille, and L.-C. Chen, "Max-deeplab: End-to-end panoptic segmentation with mask transformers," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 5463–5474.
[71] F. Yang, H. Yang, J. Fu, H. Lu, and B. Guo, "Learning texture transformer network for image super-resolution," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2020, pp. 5791–5800.
[72] H. Zhao and R. Nie, "Dndt: Infrared and visible image fusion via densenet and dual-transformer," in Proceedings of the International Conference on Information Technology and Biomedical Engineering, 2021, pp. 71–75.
[73] J. Li, J. Zhu, C. Li, X. Chen, and B. Yang, "Cgtf: Convolution-guided transformer for infrared and visible image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 71, p. 5012314, 2022.
[74] T. Xiao, P. Dollar, M. Singh, E. Mintun, T. Darrell, and R. Girshick, "Early convolutions help transformers see better," in Advances in Neural Information Processing Systems, 2021, pp. 30392–30400.
[75] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004. doi: 10.1109/TIP.2003.819861
[76] L. Tang, J. Yuan, H. Zhang, X. Jiang, and J. Ma, "Piafusion: A progressive infrared and visible image fusion network based on illumination aware," Information Fusion, vol. 83, pp. 79–92, 2022.
[77] Q. Ha, K. Watanabe, T. Karasawa, Y. Ushiku, and T. Harada, "Mfnet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes," in Proceedings of the IEEE International Conference on Intelligent Robots and Systems, 2017, pp. 5108–5115.
[78] M. Brown and S. Süsstrunk, "Multi-spectral sift for scene category recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 177–184.
[79] J. Cai, S. Gu, and L. Zhang, "Learning a deep single image contrast enhancer from multi-exposure images," IEEE Transactions on Image Processing, vol. 27, no. 4, pp. 2049–2062, 2018. doi: 10.1109/TIP.2018.2794218
[80] X. Zhang, "Benchmarking and comparing multi-exposure image fusion algorithms," Information Fusion, vol. 74, pp. 111–131, 2021. doi: 10.1016/j.inffus.2021.02.005
[81] M. Nejati, S. Samavi, and S. Shirani, "Multi-focus image fusion using dictionary-based sparse representation," Information Fusion, vol. 25, pp. 72–84, 2015. doi: 10.1016/j.inffus.2014.10.004
[82] J. Ma, H. Zhang, Z. Shao, P. Liang, and H. Xu, "Ganmcc: A generative adversarial network with multiclassification constraints for infrared and visible image fusion," IEEE Transactions on Instrumentation and Measurement, vol. 70, p. 5005014, 2021.
[83] K. Ma, Z. Duanmu, H. Zhu, Y. Fang, and Z. Wang, "Deep guided learning for fast multi-exposure image fusion," IEEE Transactions on Image Processing, vol. 29, pp. 2808–2819, 2020. doi: 10.1109/TIP.2019.2952716
[84] M. B. A. Haghighat, A. Aghagolzadeh, and H. Seyedarabi, "A non-reference image fusion metric based on mutual information of image features," Computers & Electrical Engineering, vol. 37, no. 5, pp. 744–756, 2011.
[85] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga et al., "Pytorch: An imperative style, high-performance deep learning library," in Advances in Neural Information Processing Systems, 2019, pp. 8026–8037.
[86] C. Peng, T. Tian, C. Chen, X. Guo, and J. Ma, "Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation," Neural Networks, vol. 137, pp. 188–199, 2021. doi: 10.1016/j.neunet.2021.01.021
[87] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.
[88] H. Zhang, J. Yuan, X. Tian, and J. Ma, "GAN-FM: Infrared and visible image fusion using gan with full-scale skip connection and dual Markovian discriminators," IEEE Transactions on Computational Imaging, vol. 7, pp. 1134–1147, 2021. doi: 10.1109/TCI.2021.3119954
[89] S. F. Bhat, I. Alhashim, and P. Wonka, "Adabins: Depth estimation using adaptive bins," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2021, pp. 4009–4018.