UNet-like network fused swin transformer and CNN for semantic image synthesis.

Aihua Ke, Jian Luo, Bo Cai
Author Information
  1. Aihua Ke: School of Cyber Science and Engineering, Wuhan University, Wuhan, 430072, China.
  2. Jian Luo: School of Cyber Science and Engineering, Wuhan University, Wuhan, 430072, China.
  3. Bo Cai: School of Cyber Science and Engineering, Wuhan University, Wuhan, 430072, China. caib@whu.edu.cn.

Abstract

Semantic image synthesis has been dominated by models built on Convolutional Neural Networks (CNNs). Because of their limited local perception, performance improvements appear to have plateaued in recent years. To tackle this issue, we propose SC-UNet, a UNet-like network that fuses the Swin Transformer and CNN for semantic image synthesis. Photorealistic image synthesis conditioned on a given semantic layout depends on both high-level semantics and low-level positions. To improve synthesis performance, we design a novel conditional residual fusion module for the model decoder that efficiently fuses the hierarchical feature maps extracted at different scales. This module also combines an opposition-based learning mechanism with a weight assignment mechanism to enhance and attend to semantic information. Compared to pure CNN-based models, SC-UNet combines local and global perception to better extract high- and low-level features and to better fuse multi-scale features. We have conducted extensive comparison experiments, both quantitative and qualitative, to validate the effectiveness of the proposed SC-UNet for semantic image synthesis. The results show that SC-UNet distinctly outperforms state-of-the-art models on three benchmark datasets (Cityscapes, ADE20K, and COCO-Stuff) comprising numerous real-scene images.
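The conditional residual fusion step described above can be sketched in minimal form. The names, the reflection interval, and the weighting formula below are illustrative assumptions, not the authors' implementation: opposition-based learning is modeled as reflecting an activation within a bounded range, and weight assignment as a per-position blend between the original and opposite signals before adding the residual decoder path.

```python
# Hypothetical sketch of a conditional residual fusion step combining
# opposition-based enhancement with a simple weight-assignment mechanism.
# Feature maps are flattened to 1-D lists of activations for clarity.

def opposition_enhance(x, lo=0.0, hi=1.0):
    """Opposition-based learning: reflect each activation within [lo, hi]."""
    return [lo + hi - v for v in x]

def fuse(encoder_feat, decoder_feat):
    """Blend a skip-connection (encoder) feature with an upsampled
    (decoder) feature.

    The opposite of the encoder feature highlights weakly activated
    regions; a per-position weight then attends to whichever signal is
    stronger, and the decoder path is added as a residual connection.
    """
    opposite = opposition_enhance(encoder_feat)
    fused = []
    for e, o, d in zip(encoder_feat, opposite, decoder_feat):
        w = e / (e + o + 1e-8)                 # weight assignment in [0, 1]
        fused.append(w * e + (1 - w) * o + d)  # residual fusion
    return fused

fused = fuse([0.8, 0.2], [0.1, 0.3])
```

In a full decoder this fusion would run once per scale, so the hierarchical feature maps from the Swin Transformer/CNN encoder are merged progressively rather than in a single step.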


Grants

  1. 61971316/National Natural Science Foundation of China

