A joint journal of IEEE and CAA that publishes high-quality papers in English on original theoretical and experimental research and development in all areas of automation
Volume 8 Issue 9
Sep. 2021

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: K. Zhang, Y. K. Su, X. W. Guo, L. Qi, and Z. B. Zhao, "MU-GAN: Facial Attribute Editing Based on Multi-Attention Mechanism," IEEE/CAA J. Autom. Sinica, vol. 8, no. 9, pp. 1614-1626, Sep. 2021. doi: 10.1109/JAS.2020.1003390

MU-GAN: Facial Attribute Editing Based on Multi-Attention Mechanism

doi: 10.1109/JAS.2020.1003390
Funds:  This work was supported in part by the National Natural Science Foundation of China (NSFC) (62076093, 61871182, 61302163, 61401154), the Beijing Natural Science Foundation (4192055), the Natural Science Foundation of Hebei Province of China (F2015502062, F2016502101, F2017502016), the Fundamental Research Funds for the Central Universities (2020YJ006, 2020MS099), and the Open Project Program of the National Laboratory of Pattern Recognition (NLPR) (201900051)
Abstract
Facial attribute editing has two main objectives: 1) translating an image from a source domain to a target one, and 2) changing only the facial regions related to a target attribute while preserving the attribute-excluding details. In this work, we propose a multi-attention U-Net-based generative adversarial network (MU-GAN). First, we replace the classic convolutional encoder-decoder with a symmetric U-Net-like structure in the generator, and then apply an additive attention mechanism to build attention-based U-Net connections that adaptively transfer encoder representations to complement the decoder with attribute-excluding details and enhance attribute-editing ability. Second, a self-attention (SA) mechanism is incorporated into the convolutional layers to model long-range, multi-level dependencies across image regions. Experimental results indicate that our method balances attribute-editing ability against detail-preservation ability, and can decouple the correlations among attributes. It outperforms state-of-the-art methods in terms of attribute manipulation accuracy and image quality. Our code is available at https://github.com/SuSir1996/MU-GAN.
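
For intuition, here is a minimal PyTorch sketch of the kind of additive attention gate such attention-based U-Net connections rely on: encoder features are re-weighted by a gate computed from the sum of projected encoder and decoder features, in the spirit of Attention U-Net. The class name, channel sizes, and exact layer layout are illustrative assumptions, not MU-GAN's actual implementation; see the linked repository for that.

```python
# Sketch of an additive attention gate for a U-Net skip connection.
# Names and channel sizes are illustrative, not taken from MU-GAN.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttentionGate(nn.Module):
    """Re-weights encoder features using the decoder state as a query,
    so the skip link can pass attribute-excluding detail selectively."""
    def __init__(self, enc_channels, dec_channels, inter_channels):
        super().__init__()
        self.w_enc = nn.Conv2d(enc_channels, inter_channels, kernel_size=1)
        self.w_dec = nn.Conv2d(dec_channels, inter_channels, kernel_size=1)
        self.psi = nn.Conv2d(inter_channels, 1, kernel_size=1)

    def forward(self, enc_feat, dec_feat):
        # Match spatial sizes, project both maps into a shared space,
        # add them (the "additive" part), and squash to a per-pixel
        # gate in [0, 1].
        if dec_feat.shape[-2:] != enc_feat.shape[-2:]:
            dec_feat = F.interpolate(dec_feat, size=enc_feat.shape[-2:],
                                     mode="bilinear", align_corners=False)
        attn = torch.sigmoid(self.psi(F.relu(self.w_enc(enc_feat) +
                                             self.w_dec(dec_feat))))
        return enc_feat * attn  # gated encoder features for the decoder

# Usage: gate 64-channel encoder features with a 128-channel decoder state.
gate = AdditiveAttentionGate(enc_channels=64, dec_channels=128, inter_channels=32)
enc = torch.randn(1, 64, 32, 32)
dec = torch.randn(1, 128, 16, 16)
out = gate(enc, dec)  # shape: (1, 64, 32, 32)
```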

     



    Article Metrics

    Article views: 1473 | PDF downloads: 96

    Highlights

    • A symmetric U-Net-like generator architecture built on an additive attention mechanism effectively enhances detail-preservation and attribute-manipulation abilities.
    • Incorporating a self-attention mechanism into the existing encoder-decoder architecture effectively enforces geometric constraints on the generated results (a sketch follows this list).
    • A multi-attention mechanism aids attribute decoupling, i.e., it handles interference among attributes and changes only the attributes that need to be changed.
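
    The self-attention mechanism in question follows the self-attention GAN (SAGAN) formulation of Zhang et al., in which every spatial position attends to every other position. Below is a compact PyTorch sketch under that assumption; the channel reduction (c/8) and the zero-initialized residual weight gamma follow SAGAN conventions and are not claimed to match MU-GAN's exact code.

```python
# Sketch of a SAGAN-style 2D self-attention layer for convolutional features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention2d(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # zero init: starts as identity

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).flatten(2).transpose(1, 2)  # (b, hw, c//8)
        k = self.key(x).flatten(2)                    # (b, c//8, hw)
        v = self.value(x).flatten(2)                  # (b, c, hw)
        # Each position attends to every other position, which is what
        # captures the long-range dependencies across image regions.
        attn = F.softmax(torch.bmm(q, k), dim=-1)     # (b, hw, hw)
        out = torch.bmm(v, attn.transpose(1, 2)).view(b, c, h, w)
        return self.gamma * out + x                   # learnable residual mix

x = torch.randn(2, 64, 16, 16)
print(SelfAttention2d(64)(x).shape)  # torch.Size([2, 64, 16, 16])
```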
