A journal of IEEE and CAA, publishing high-quality papers in English on original theoretical/experimental research and development in all areas of automation.

IEEE/CAA Journal of Automatica Sinica

  • JCR Impact Factor: 15.3, Top 1 (SCI Q1)
  • CiteScore: 23.5, Top 2% (Q1)
  • Google Scholar h5-index: 77, Top 5
Citation: Y. Yuan, G. Yang, James Z. Wang, H. Zhang, H. Shan, F. Wang, and J. Zhang, “Dissecting and mitigating semantic discrepancy in stable diffusion for image-to-image translation,” IEEE/CAA J. Autom. Sinica, 2024. doi: 10.1109/JAS.2024.124800

Dissecting and Mitigating Semantic Discrepancy in Stable Diffusion for Image-to-Image Translation

doi: 10.1109/JAS.2024.124800
Funds: This work was supported in part by the National Natural Science Foundation of China (No. 62176059). The work of James Z. Wang was supported by The Pennsylvania State University.
  • Finding suitable initial noise that retains the original image’s information is crucial for image-to-image (I2I) translation using text-to-image (T2I) diffusion models. A common approach is to add random noise directly to the original image, as in SDEdit. However, we have observed that this can cause “semantic discrepancy,” wherein T2I diffusion models misinterpret semantic relationships and generate content not present in the original image. We identify that the noise introduced by SDEdit disrupts the semantic integrity of the image, leading to unintended associations between unrelated regions after U-Net upsampling. Building on the widely used latent diffusion model Stable Diffusion, we propose a training-free, plug-and-play method to alleviate semantic discrepancy and enhance the fidelity of the translated image. By leveraging the deterministic nature of denoising diffusion implicit model (DDIM) inversion, we correct the erroneous features and correlations from the original generative process with accurate ones from DDIM inversion. This approach alleviates semantic discrepancy and surpasses recent DDIM-inversion-based methods such as PnP with fewer priors, achieving an 11.2-fold speedup in experiments on the COCO, ImageNet, and ImageNet-R datasets across multiple I2I translation tasks. The code is available at https://github.com/Sherlockyyf/Semantic_Discrepancy.
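The abstract contrasts two ways of obtaining the initial noisy latent: SDEdit’s stochastic noising of the clean latent versus deterministic DDIM inversion. The following is a minimal sketch, not the authors’ released implementation: `eps_model` is a hypothetical stand-in for the Stable Diffusion U-Net noise predictor (which in practice is also text-conditioned), and a standard linear beta schedule is assumed.

```python
# Illustrative sketch only: contrasts SDEdit-style stochastic noising with
# deterministic DDIM inversion in a diffusion model's latent space.
# `eps_model` is a hypothetical stand-in for the Stable Diffusion U-Net
# noise predictor; the real one is also conditioned on a text embedding.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)               # assumed linear schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t


def sdedit_init(z0, t):
    """SDEdit initialization: perturb the clean latent z0 with random noise,
    z_t = sqrt(abar_t) * z0 + sqrt(1 - abar_t) * eps, eps ~ N(0, I)."""
    abar_t = alphas_cumprod[t]
    eps = torch.randn_like(z0)
    return abar_t.sqrt() * z0 + (1.0 - abar_t).sqrt() * eps


@torch.no_grad()
def ddim_invert(z0, eps_model, num_steps=50, t_end=600):
    """Deterministic DDIM inversion: run the (eta = 0) DDIM update backwards
    so the resulting noisy latent denoises back to z0 under the same model."""
    timesteps = torch.linspace(0, t_end, num_steps + 1).long()
    z = z0
    for i in range(num_steps):
        t_cur, t_next = timesteps[i], timesteps[i + 1]
        abar_cur, abar_next = alphas_cumprod[t_cur], alphas_cumprod[t_next]
        eps = eps_model(z, t_cur)                                     # predicted noise
        z0_hat = (z - (1 - abar_cur).sqrt() * eps) / abar_cur.sqrt()  # predicted clean latent
        z = abar_next.sqrt() * z0_hat + (1 - abar_next).sqrt() * eps  # step to t_next
    return z
```

As described in the abstract, the features and correlations computed along this deterministic inversion trajectory serve as the “accurate” references used to correct the erroneous ones arising in the SDEdit-initialized generative pass.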

     

References
  • [1]
    I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,” in Advances in Neural Information Processing Systems, vol. 27, 2014.
    [2]
    Y. Chen, Y. Lv, and F.-Y. Wang, “Traffic flow imputation using parallel data and generative adversarial networks,” IEEE Trans. Intelligent Transportation Systems, vol. 21, no. 4, pp. 1624–1630, 2019.
    [3]
    J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in Advances in Neural Information Processing Systems, vol. 33, 2020, pp. 6840–6851.
    [4]
    P. Dhariwal and A. Nichol, “Diffusion models beat GANs on image synthesis,” in Advances in Neural Information Processing Systems, 2021, pp. 8780–8794.
    [5]
    C. Wang, T. Chen, Z. Chen, Z. Huang, T. Jiang, Q. Wang, and H. Shan, “FLDM-VTON: Faithful latent diffusion model for virtual try-on,” in IJCAI, 2024.
    [6]
    Q. Gao, Z. Li, J. Zhang, Y. Zhang, and H. Shan, “CoreDiff: Contextual error-modulated generalized diffusion model for low-dose CT denoising and generalization,” IEEE Trans. Medical Imaging, vol. 43, no. 2, pp. 745–759, 2024. doi: 10.1109/TMI.2023.3320812
    [7]
    R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2022, pp. 10684–10695.
    [8]
    A. Q. Nichol, P. Dhariwal, A. Ramesh, P. Shyam, P. Mishkin, B. McGrew, I. Sutskever, and M. Chen, “GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models,” in Int. Conf. on Machine Learning, 2022, pp. 16784–16804.
    [9]
    C. Saharia, W. Chan, S. Saxena, L. Li, J. Whang, E. L. Denton, K. Ghasemipour, R. Gontijo Lopes, B. Karagol Ayan, T. Salimans et al., “Photorealistic text-to-image diffusion models with deep language understanding,” in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 36479–36494.
    [10]
    J. Yu, Y. Xu, J. Y. Koh, T. Luong, G. Baid, Z. Wang, V. Vasudevan, A. Ku, Y. Yang, B. K. Ayan et al., “Scaling autoregressive models for content-rich text-to-image generation,” Trans. Machine Learning Research, 2022.
    [11]
    A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever, “Zero-shot text-to-image generation,” in Int. Conf. on Machine Learning, 2021, pp. 8821–8831.
    [12]
    A. Ramesh, P. Dhariwal, A. Nichol, C. Chu, and M. Chen, “Hierarchical text-conditional image generation with CLIP latents,” arXiv preprint arXiv:2204.06125, 2022.
    [13]
    J. Betker, G. Goh, L. Jing et al., “Improving image generation with better captions,” 2023, https://cdn.openai.com/papers/dall-e-3.pdf.
    [14]
    K. Wang, C. Gou, N. Zheng, J. M. Rehg, and F.-Y. Wang, “Parallel vision for perception and understanding of complex scenes: methods, framework, and perspectives,” Artificial Intelligence Review, vol. 48, pp. 299–329, 2017. doi: 10.1007/s10462-017-9569-z
    [15]
    H. Zhang, G. Luo, Y. Li, and F.-Y. Wang, “Parallel vision for intelligent transportation systems in metaverse: Challenges, solutions, and potential applications,” IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 53, pp. 3400–3413, 2022.
    [16]
    H. Zhang, Y. Tian, K. Wang, W. Zhang, and F.-Y. Wang, “Mask SSD: An effective single-stage approach to object instance segmentation,” IEEE Trans. Image Processing, vol. 29, pp. 2078–2093, 2019.
    [17]
    B. Kawar, S. Zada, O. Lang, O. Tov, H. Chang, T. Dekel, I. Mosseri, and M. Irani, “Imagic: Text-based real image editing with diffusion models,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2023, pp. 6007–6017.
    [18]
    T. Brooks, A. Holynski, and A. A. Efros, “InstructPix2Pix: Learning to follow image editing instructions,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2023, pp. 18392–18402.
    [19]
    X. Li, J. Thickstun, I. Gulrajani, P. S. Liang, and T. B. Hashimoto, “Diffusion-LM improves controllable text generation,” Advances in Neural Information Processing Systems, vol. 35, pp. 4328–4343, 2022.
    [20]
    S. Ge, S. Nah, G. Liu, T. Poon, A. Tao, B. Catanzaro, D. Jacobs, J.-B. Huang, M.-Y. Liu, and Y. Balaji, “Preserve your own correlation: A noise prior for video diffusion models,” in Proc. the IEEE/CVF Int. Conf. on Computer Vision, 2023, pp. 22930–22941.
    [21]
    A. Lugmayr, M. Danelljan, A. Romero, F. Yu, R. Timofte, and L. Van Gool, “RePaint: Inpainting using denoising diffusion probabilistic models,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2022, pp. 11461–11471.
    [22]
    L. Zhang, A. Rao, and M. Agrawala, “Adding conditional control to text-to-image diffusion models,” in Proc. the IEEE/CVF Int. Conf. on Computer Vision, 2023, pp. 3836–3847.
    [23]
    J. Z. Wu, Y. Ge, X. Wang, S. W. Lei, Y. Gu, Y. Shi, W. Hsu, Y. Shan, X. Qie, and M. Z. Shou, “Tune-A-Video: One-shot tuning of image diffusion models for text-to-video generation,” in Proc. the IEEE/CVF Int. Conf. on Computer Vision, 2023, pp. 7623–7633.
    [24]
    C. Meng, Y. He, Y. Song, J. Song, J. Wu, J.-Y. Zhu, and S. Ermon, “SDEdit: Guided image synthesis and editing with stochastic differential equations,” in Int. Conf. on Learning Representations, 2021.
    [25]
    J. Song, C. Meng, and S. Ermon, “Denoising diffusion implicit models,” in Int. Conf. on Learning Representations, 2020.
    [26]
    S. Witteveen and M. Andrews, “Investigating prompt engineering in diffusion models,” arXiv preprint arXiv:2211.15462, 2022.
    [27]
    N. Tumanyan, M. Geyer, S. Bagon, and T. Dekel, “Plug-and-play diffusion features for text-driven image-to-image translation,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2023, pp. 1921–1930.
    [28]
    M. Kwon, J. Jeong, and Y. Uh, “Diffusion models already have a semantic latent space,” arXiv preprint arXiv:2210.10960, 2022.
    [29]
    R. Mokady, A. Hertz, K. Aberman, Y. Pritch, and D. Cohen-Or, “Null-text inversion for editing real images using guided diffusion models,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2023, pp. 6038–6047.
    [30]
    D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.
    [31]
    D. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in Int. Conf. on Machine Learning, 2015, pp. 1530–1538.
    [32]
    L. Dinh, D. Krueger, and Y. Bengio, “NICE: Non-linear independent components estimation,” arXiv preprint arXiv:1410.8516, 2014.
    [33]
    D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,” in Advances in Neural Information Processing Systems, vol. 31, 2018.
    [34]
    Z. Huang, S. Chen, J. Zhang, and H. Shan, “AgeFlow: Conditional age progression and regression with normalizing flows,” in IJCAI, 2021, pp. 743–750.
    [35]
    P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. the IEEE Conf. on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.
    [36]
    J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. the IEEE Int. Conf. on Computer Vision, 2017, pp. 2223–2232.
    [37]
    Z. Yi, H. Zhang, P. Tan, and M. Gong, “DualGAN: Unsupervised dual learning for image-to-image translation,” in Proc. the IEEE Int. Conf. on Computer Vision, 2017, pp. 2849–2857.
    [38]
    Y. Choi, M. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo, “StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation,” in Proc. the IEEE Conf. on Computer Vision and Pattern Recognition, 2018, pp. 8789–8797.
    [39]
    A. Pumarola, A. Agudo, A. M. Martinez, A. Sanfeliu, and F. Moreno-Noguer, “GANimation: Anatomically-aware facial animation from a single image,” in Proc. the European Conf. on Computer Vision, 2018, pp. 818–833.
    [40]
    Z. He, W. Zuo, M. Kan, S. Shan, and X. Chen, “AttGAN: Facial attribute editing by only changing what you want,” IEEE Trans. Image Processing, vol. 28, no. 11, pp. 5464–5478, 2019. doi: 10.1109/TIP.2019.2916751
    [41]
    T. Park, M.-Y. Liu, T.-C. Wang, and J.-Y. Zhu, “Semantic image synthesis with spatially-adaptive normalization,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2019, pp. 2337–2346.
    [42]
    K. Zhang, Y. Su, X. Guo, L. Qi, and Z. Zhao, “MU-GAN: Facial attribute editing based on multi-attention mechanism,” IEEE/CAA Journal of Automatica Sinica, vol. 8, no. 9, pp. 1614–1626, 2021. doi: 10.1109/JAS.2020.1003390
    [43]
    M. Liu, Y. Ding, M. Xia, X. Liu, E. Ding, W. Zuo, and S. Wen, “STGAN: A unified selective transfer network for arbitrary image attribute editing,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2019, pp. 3673–3682.
    [44]
    E. Richardson, Y. Alaluf, O. Patashnik, Y. Nitzan, Y. Azar, S. Shapiro, and D. Cohen-Or, “Encoding in style: A StyleGAN encoder for image-to-image translation,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2021, pp. 2287–2296.
    [45]
    H. Lin, Y. Liu, S. Li, and X. Qu, “How generative adversarial networks promote the development of intelligent transportation systems: A survey,” IEEE/CAA Journal of Automatica Sinica, vol. 10, no. 9, pp. 1781–1796, 2023. doi: 10.1109/JAS.2023.123744
    [46]
    K. Wang, C. Gou, Y. Duan, Y. Lin, X. Zheng, and F.-Y. Wang, “Generative adversarial networks: introduction and outlook,” IEEE/CAA Journal of Automatica Sinica, vol. 4, no. 4, pp. 588–598, 2017. doi: 10.1109/JAS.2017.7510583
    [47]
    Y. Yuan, S. Ma, and J. Zhang, “VR-FAM: Variance-reduced encoder with nonlinear transformation for facial attribute manipulation,” in ICASSP 2022-2022 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022, pp. 1755–1759.
    [48]
    Y. Yuan, S. Ma, H. Shan, and J. Zhang, “DO-FAM: Disentangled nonlinear latent navigation for facial attribute manipulation,” in ICASSP 2023-2023 IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2023, pp. 1–5.
    [49]
    A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in Int. Conf. on Machine Learning, 2021, pp. 8748–8763.
    [50]
    T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners,” Advances in Neural Information Processing Systems, vol. 33, pp. 1877–1901, 2020.
    [51]
    G. Couairon, J. Verbeek, H. Schwenk, and M. Cord, “DiffEdit: Diffusion-based semantic image editing with mask guidance,” arXiv preprint arXiv:2210.11427, 2022.
    [52]
    B. Wallace, A. Gokul, and N. Naik, “EDICT: Exact diffusion inversion via coupled transformations,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2023, pp. 22532–22541.
    [53]
    L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using Real NVP,” arXiv preprint arXiv:1605.08803, 2016.
    [54]
    A. Hertz, R. Mokady, J. Tenenbaum, K. Aberman, Y. Pritch, and D. Cohen-or, “Prompt-to-prompt image editing with cross-attention control,” in Int. Conf. on Learning Representations, 2022.
    [55]
    B. Liu, C. Wang, T. Cao, K. Jia, and J. Huang, “Towards understanding cross and self-attention in stable diffusion for text-guided image editing,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2024, pp. 7817–7826.
    [56]
    N. Ruiz, Y. Li, V. Jampani, Y. Pritch, M. Rubinstein, and K. Aberman, “DreamBooth: Fine tuning text-to-image diffusion models for subject-driven generation,” in Proc. the IEEE/CVF Conf. on Computer Vision and Pattern Recognition, 2023, pp. 22500–22510.
    [57]
    R. Gal, Y. Alaluf, Y. Atzmon, O. Patashnik, A. H. Bermano, G. Chechik, and D. Cohen-Or, “An image is worth one word: Personalizing text-to-image generation using textual inversion,” arXiv preprint arXiv:2208.01618, 2022.
    [58]
    H. Ye, J. Zhang, S. Liu, X. Han, and W. Yang, “IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models,” arXiv preprint arXiv:2308.06721, 2023.
    [59]
    A. Van Den Oord, O. Vinyals et al., “Neural discrete representation learning,” in Advances in Neural Information Processing Systems, vol. 30, 2017.
    [60]
    D. Hendrycks, S. Basart, N. Mu, S. Kadavath, F. Wang, E. Dorundo, R. Desai, T. Zhu, S. Parajuli, M. Guo et al., “The many faces of robustness: A critical analysis of out-of-distribution generalization,” in Proc. the IEEE/CVF Int. Conf. on Computer Vision, 2021, pp. 8340–8349.
    [61]
    O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. on Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany. Springer, 2015, pp. 234–241.
    [62]
    A. Maćkiewicz and W. Ratajczak, “Principal components analysis (PCA),” Computers & Geosciences, vol. 19, no. 3, pp. 303–342, 1993.
    [63]
    C. Si, Z. Huang, Y. Jiang, and Z. Liu, “FreeU: Free lunch in diffusion U-Net,” arXiv preprint arXiv:2309.11497, 2023.
    [64]
    T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in Proc. the European Conf. on Computer Vision, 2014, pp. 740–755.
    [65]
    O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein, A. C. Berg, and L. Fei-Fei, “ImageNet large scale visual recognition challenge,” Int. Journal of Computer Vision, vol. 115, no. 3, pp. 211–252, 2015. doi: 10.1007/s11263-015-0816-y
    [66]
    J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 6, pp. 679–698, 1986.
    [67]
    Z. Wang, E. P. Simoncelli, and A. C. Bovik, “Multiscale structural similarity for image quality assessment,” in Asilomar Conf. on Signals, Systems, and Computers, vol. 2, 2003, pp. 1398–1402.
    [68]
    S. Xie and Z. Tu, “Holistically-nested edge detection,” in Proc. the IEEE Int. Conf. on Computer Vision, 2015, pp. 1395–1403.
