Citation: K. H. Liu, Z. H. Ye, H. Y. Guo, D. P. Cao, L. Chen, and F.-Y. Wang, "FISS GAN: A Generative Adversarial Network for Foggy Image Semantic Segmentation," IEEE/CAA J. Autom. Sinica, vol. 8, no. 8, pp. 1428-1439, Aug. 2021. doi: 10.1109/JAS.2021.1004057

FISS GAN: A Generative Adversarial Network for Foggy Image Semantic Segmentation

doi: 10.1109/JAS.2021.1004057
Funds:  This work was supported in part by the National Key Research and Development Program of China (2018YFB1305002), the National Natural Science Foundation of China (62006256), the Postdoctoral Science Foundation of China (2020M683050), the Key Research and Development Program of Guangzhou (202007050002), and the Fundamental Research Funds for the Central Universities (67000-31610134)
  • Abstract: Because pixel values of foggy images are irregularly higher than those of images captured in normal weather (clear images), it is difficult to extract and express their texture. No method has previously been developed to directly explore the relationship between foggy images and semantic segmentation images. We investigated this relationship and propose a generative adversarial network (GAN) for foggy image semantic segmentation (FISS GAN), which contains two parts: an edge GAN and a semantic segmentation GAN. The edge GAN is designed to generate edge information from foggy images to provide auxiliary information to the semantic segmentation GAN. The semantic segmentation GAN is designed to extract and express the texture of foggy images and generate semantic segmentation images. Experiments on foggy cityscapes datasets and foggy driving datasets indicated that FISS GAN achieved state-of-the-art performance.

     

  • ENVIRONMENTAL perception plays a vital role in the fields of autonomous driving [1], robotics [2], etc., and this perception influences the subsequent decisions and control of such systems [3]-[5]. Fog is a common form of weather, and when fog exists, the pixel values of foggy images are irregularly higher than those of clear images. As a result, foggy images contain less texture than clear images. There are already many methods for the semantic segmentation of clear images; they can extract and express the features of clear images and achieve good semantic segmentation results. However, the performance of these methods on foggy images is poor, because they cannot efficiently extract and express the features of foggy images. Moreover, foggy image data are not sparse, so the current excellent work [6], [7] on sparse data cannot be used. Therefore, to date, researchers have developed two ways to address this problem:

    The first is the defogging-segmentation method. Here, a foggy image is first converted to a fog-free image by a defogging algorithm, and the restored image is then segmented by a semantic segmentation algorithm. The defogging-segmentation method can therefore be separated into two steps.

    Step 1: Fog removal. According to the classic atmospheric scattering model [8], [9], a fog-free image can be recovered from a foggy image as

    $J(x) = \dfrac{I(x)-A}{t(x)} + A$ (1)

    where J(x) is the fog-free image; I(x) is the foggy image; t(x) is the transmission map; and A is the global atmospheric light.
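    For illustration, the following is a minimal NumPy sketch of inverting the scattering model in (1), assuming estimates of A and t(x) are available; the clamping of t(x) is a common numerical safeguard rather than part of the model, and the function name is ours.

```python
import numpy as np

def recover_fog_free(I, A, t, t_min=0.1):
    """Invert the atmospheric scattering model of (1): J(x) = (I(x) - A) / t(x) + A.

    I: foggy image as a float array in [0, 1], shape (H, W, 3)
    A: global atmospheric light, scalar or length-3 array
    t: transmission map, shape (H, W) or (H, W, 1)
    t_min: lower clamp on t to avoid amplifying noise (a common heuristic,
           not specified by the model itself)
    """
    t = np.clip(np.atleast_3d(t), t_min, 1.0)
    return np.clip((I - A) / t + A, 0.0, 1.0)
```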

    Step 2: Semantic segmentation of fog-free images. When semantic segmentation is performed, the algorithms’ inputs may be the fog-free image and its auxiliary information or only the fog-free image. Therefore, the problem of semantic image segmentation after defogging can be expressed as

    S(x)=F(f(J(x),g(x))) (2)

    where g(x) is auxiliary information (if there is no auxiliary information, g(x) is a self-mapping); f() is the relation between J(x) and g(x); F() is the mapping from f(J(x),g(x)) to S(x); and S(x) is the semantic segmentation image.

    The second is the semantic segmentation method based on transfer learning. Here, a semantic segmentation model is first trained on clear images. Then, based on the trained semantic segmentation model and transfer learning, the semantic segmentation model is trained on foggy images. This method can also be separated into two steps.

    Step 1: Training the semantic segmentation model with clear images. The method used to obtain the semantic segmentation model is the same as that shown in (2). However, the inputs for this method are clear images and their auxiliary information or only clear images. The training model can be expressed as

    M=F(f(C(x),g(x))) (3)

    where C(x) are the clear images, M is the semantic segmentation model of clear images, and g(x) is the auxiliary information mentioned above.

    Step 2: Training the transfer learning model with foggy images. Using the clear images as the source domain and foggy images as the target domain, the semantic segmentation model can be trained with foggy images based on the model above

    S(x)=T(M,I(x))=T(F(f(C(x),g(x))),I(x)) (4)

    where T() is a transfer learning method, and the other terms are the same as defined above.

    These two methods can achieve semantic segmentation results for foggy images; however, they rely on defogged images or on semantic segmentation models trained with clear images, and without such prerequisites they cannot be applied. This study focuses on a new semantic segmentation method that directly explores the mapping relationship between foggy images and the resulting semantic segmentation images. The mathematical model can be expressed as follows:

    S(x)=F(f(I(x),g(x))). (5)

    Solving (5) is challenging. The motivation of this paper is to explore a semantic segmentation method that can efficiently solve (5), i.e., that can efficiently express the mapping relationship between foggy images and the resulting semantic segmentation images.

    A generative adversarial network (GAN) is an efficient semantic segmentation method. Luc et al. [10] first explored the use of a GAN for clear image semantic segmentation because a GAN could enforce forms of higher-order consistency [11]. Subsequently, [12] and [13] also provided GANs for the semantic segmentation of clear images and achieved state-of-the-art performance. In this paper, we also explore the semantic segmentation method for foggy images based on a GAN. Additionally, based on the “lines first, color next” approach, edge images are used to provide auxiliary information for clear image inpainting [14]. This method has been shown to greatly improve the quality of clear image inpainting. In this paper, we also analyze the foggy image semantic segmentation (FISS) problem using the “lines first, color next” approach and use edge images as auxiliary information. Specifically, we first obtain the edge information of foggy images and then obtain the semantic segmentation results for foggy images under the guidance of this edge information. Based on the above ideas, a two-stage FISS GAN is provided in this paper. The main contributions of this paper are as follows:

    1) We propose a novel efficient network architecture based on a combination of concepts from U_Net [15], called a dilated convolution U_Net. By incorporating dilated convolution layers and adjusting the feature size in the convolutional layer, dilated convolution U_Net has shown improved feature extraction and expression ability.

    2) A direct FISS method (FISS GAN) that generates semantic segmentation images under edge information guidance is proposed. We show our method’s effectiveness through extensive experiments on foggy cityscapes datasets and foggy driving datasets and achieve state-of-the-art performance. To the best of our knowledge, this is the first paper to explore a direct FISS method.

    The structure of this paper is as follows: Section I is the introduction; Section II introduces the work related to foggy images and semantic segmentation methods; Section III describes FISS GAN in detail; Section IV describes the experiments designed to verify the performance of FISS GAN; and Section V summarizes the full paper.

    Most studies on foggy images are based on defogging methods. Image defogging methods can be divided into traditional defogging methods and deep learning-based defogging methods. According to the type of processing, traditional defogging methods can be further divided into image enhancement defogging methods and physical model-based defogging methods. The methods based on image enhancement [16]-[18] do not consider the fog in the image; they directly improve contrast or highlight image features to make the image clearer and thereby achieve the purpose of image defogging. However, when contrast is improved or image features are highlighted, some image information is lost, and images defogged by this method are obviously distorted.

    The methods based on atmospheric scattering models [19]-[25] consider the fog in the image and study the image defogging mechanism or add other prior knowledge (e.g., scene depth information [26], [27]) to produce a clear image. Among these methods, the classic algorithms are the dark channel defogging method proposed by He et al. [23], an approach based on Markov random fields presented by Tan [21], and a visibility restoration algorithm proposed by Tarel and Hautière [28]. The image defogging methods based on atmospheric scattering models provide better defogging results than those obtained by image enhancement. However, the parameters used in these methods, such as the defogging coefficient and transmittance, are selected according to experience, so the resulting image still exhibits some distortion.

    With the development of deep learning (DL), recent research has increasingly explored defogging methods based on DL. Some researchers obtained the transmission map of a fog image through a DL network and then defogged the image based on an atmospheric scattering model [29]-[32]. This kind of method does not need prior knowledge, but its dependence on parameters and models will also cause slight image distortion. Other researchers designed neural networks to study end-to-end defogging methods [33]-[38]. Moreover, with the development of GANs in image inpainting and image enhancement, researchers have also proposed image defogging methods based on GANs [39]-[44], which greatly improve the quality of image defogging. In addition to studies on defogging, researchers have studied methods for obtaining optical flow data from foggy images [45].

    Semantic segmentation is a high-level perception task for robotics and autonomous driving. Prior semantic segmentation methods include color slices and conditional random fields (CRFs). With the development of DL, traditional DL-based semantic segmentation methods have greatly improved the accuracy of semantic segmentation. The fully convolutional network (FCN) [46] was the first semantic segmentation method based on traditional DL. However, due to its pooling operation, some information may be lost, so the accuracy of semantic segmentation with this method is low. To increase the accuracy of semantic segmentation, many improved semantic segmentation frameworks [15], [47]-[56] and improved loss functions [51] were subsequently proposed. Most traditional DL-based semantic segmentation methods are supervised. Supervised semantic segmentation methods can achieve good segmentation results, but they require a large amount of segmentation data. To solve this problem, Hoffman et al. [57] and Zhang et al. [58] proposed training semantic segmentation models on a synthetic dataset and adapting them to predict on real data through transfer learning.

    Luc et al. [10] introduced GANs into the field of semantic segmentation. The generator’s input is the image that needs to be segmented, and the output is the semantic segmentation classification of the image. The discriminator’s input is the ground truth semantic segmentation classification or the generated semantic segmentation classification, and the output is the judgment of whether the input is the true value. In addition, considering GANs’ outstanding performance in transfer learning, researchers have proposed a series of semantic segmentation GANs based on transfer learning. Pix2Pix [12] is a typical GAN model for semantic segmentation that treats semantic segmentation as an image-to-image translation problem and builds a general conditional GAN to solve it. Because domain adaptation cannot capture pixel-level and low-level domain shifts, Hoffman et al. [13] proposed cycle-consistent adversarial domain adaptation (CYCADA), which can adapt representations at both the pixel level and the feature level and improves the precision of semantic segmentation.

    An unsupervised general framework that extracts the same features from the source domain and target domain was proposed by Murez et al. [59]. To address the domain mismatch problem between real images and synthetic images, Hong et al. [60] proposed a network that integrates a GAN into the FCN framework to reduce the gap between the source and target domains; Luo et al. [61] proposed a category-level adversarial network that enforces local semantic consistency during the trend of global alignment. To improve performance and address the limited-dataset problem of domain adaptation, Li et al. [62] presented a bidirectional learning framework for semantic segmentation in which the image translation model and the segmentation adaptation model are trained alternately and promote each other.

    The approaches above can directly address clear images and achieve state-of-the-art performance. However, these methods cannot handle foggy images very well because of their weak texture characteristics. To the best of our knowledge, there has been no research on a direct semantic segmentation method for foggy images.

    Unlike current semantic segmentation GANs [10], [12], which handle clear images and consist of a single part, FISS GAN (Fig. 1) handles foggy images and contains two parts: the edge GAN and the semantic segmentation GAN. The purpose of the edge GAN is to obtain the edge information of foggy images to assist with the semantic segmentation tasks. Edges obtained directly from foggy images contain all detailed edge information, whereas the edge information used for semantic segmentation is only the boundary information. Therefore, we use the edges obtained from the ground truth semantic segmentation images as the ground truth in our edge GAN instead of the edge information from the clear images.

    Figure  1.  The pipeline of FISS GAN.

    To clarify, we extracted both kinds of edges with the Canny algorithm [63]. The visual differences between the two edges are shown in Fig. 2. As seen in Fig. 2, the edges obtained directly from the foggy image contain too much information that is useless for semantic segmentation. In contrast, the edges obtained from the semantic segmentation ground truth are exactly its boundaries, which is appropriate for semantic segmentation.

    Figure  2.  The visual differences.
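    For reference, both kinds of edge maps can be produced with the Canny detector as in the following sketch; the file names and thresholds are illustrative assumptions, since the paper does not report the Canny parameters it used.

```python
import cv2

# Edges directly from the foggy image: dense, texture-heavy, mostly useless for segmentation.
foggy_gray = cv2.imread("foggy_image.png", cv2.IMREAD_GRAYSCALE)        # hypothetical file name
edges_from_foggy = cv2.Canny(foggy_gray, 100, 200)

# Edges from the color semantic segmentation ground truth: clean class boundaries,
# used as the ground truth of the edge GAN.
seg_gt_gray = cv2.imread("seg_gt_color.png", cv2.IMREAD_GRAYSCALE)      # hypothetical file name
edges_from_seg_gt = cv2.Canny(seg_gt_gray, 100, 200)
```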

    The purpose of the semantic segmentation GAN is to accomplish the semantic segmentation of foggy images. The inputs of the semantic segmentation GAN are foggy images and the edge images obtained from the edge GAN, and its outputs are the semantic segmentation results of the foggy images. Therefore, based on the mathematical model for the semantic segmentation of foggy images (formula (5)), the mathematical model of FISS GAN can be expressed as follows:

    S(x)=F(f(I(x),Egan(x))) (6)

    where F() is the semantic segmentation GAN; f() is the concatenation function; I(x) is the foggy image; and Egan(x) is the edge information obtained from the edge GAN.
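    In implementation terms, the concatenation f() in (6) simply stacks the foggy image and the single-channel edge map along the channel dimension before they are fed to G2; the tensor shapes below are illustrative.

```python
import torch

foggy = torch.rand(1, 3, 256, 256)   # I(x): foggy RGB image (illustrative tensor)
edge = torch.rand(1, 1, 256, 256)    # Egan(x): edge map produced by the edge GAN, values in [0, 1]

g2_input = torch.cat([foggy, edge], dim=1)   # f(I(x), Egan(x)): a 4-channel input to G2
```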

    To further improve feature extraction and expression abilities, we learn convolution and deconvolution features by drawing on ideas from U_Net [15] and propose a new network architecture, namely, dilated convolution U_Net (Fig. 3). Dilated convolution U_Net consists of three convolution layers (C1, C2, and C3), four dilated convolution layers (DC), and three fusion layers (f(C3,DC), f(C2,CT1), and f(C1,CT2)). Its four dilated convolution layers yield a receptive field corresponding to a dilation factor of 19. Fusion layers are the layers that concatenate features from the dilated convolution results or transposed convolution results with the corresponding convolution layer. Similar to the fusion approach of U_Net [15], we divide the fusion operation into three steps:

    Figure  3.  Structure of dilated convolution U_Net.

    Step 1: Fuse C3 and DC to obtain f(C3,DC) and deconvolute f(C3,DC) to obtain CT1;

    Step 2: Fuse C2 and CT1 to obtain f(C2,CT1) and deconvolute f(C2,CT1) to obtain CT2;

    Step 3: Fuse C1 and CT2 to obtain f(C1,CT2).

    The fusion approach of this paper is a concatenation operation. Three convolution layers and four dilated convolution layers are used to extract input features, and two deconvolution layers are used to express the extracted features. The size of each layer feature is shown in Fig. 3.

    The differences between dilated convolution U_Net and U_Net [15] are as follows: 1) Dilated convolution U_Net incorporates dilated convolution layers to improve feature extraction ability. 2) In feature fusion, because the feature sizes of the convolution layers and deconvolution layers in U_Net [15] are different, the features of the convolution layers are randomly cropped, and this operation leads to features that do not correspond; thus, some information from the fused image might be lost in the fusion step. In the dilated convolution U_Net proposed in this study, the feature sizes of the convolution layers and their corresponding deconvolution layers are the same, which means that the features of the convolution layers can be directly fused with the features of the deconvolution layers, and no information is lost in the fusion step. 3) U_Net achieves image feature extraction and expression with convolution layers, maximum pooling layers, and upsampling layers (bilinear upsampling followed by convolution, or transposed convolution layers). U_Net consists of 23 convolution layers, 4 maximum pooling layers and 4 upsampling layers. According to the convolution kernel and step size of U_Net, the number of parameters that need to be trained is 17 268 563. The dilated convolution U_Net proposed in this paper achieves image feature extraction and expression with convolution layers, dilated convolution layers, and transposed convolution layers; it consists of 3 convolution layers, 4 dilated convolution layers and 2 transposed convolution layers. With the convolution kernel and step size of dilated convolution U_Net (Table I), the number of parameters that need to be trained is 4 335 424. The more parameters that need to be trained, the more computation is required. Therefore, dilated convolution U_Net has fewer network layers, fewer parameters, and less computation than U_Net.
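    The parameter counts quoted above can be reproduced for any PyTorch implementation with a small helper such as the following (a generic utility, not code from the paper).

```python
def count_trainable_parameters(model):
    # Total number of trainable parameters of a PyTorch module, e.g., roughly
    # 17.27 M for U_Net versus 4.34 M for dilated convolution U_Net as reported in the text.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
```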

    Table  I.  Parameters of G1 and G2
    Input                        Kernel        Stride   Padding
    Dilated convolution U_Net:
      C1                         [7×7, 64]     1        3
      C2                         [4×4, 128]    2        1
      C3                         [4×4, 256]    2        1
      DC                         [3×3, 256]    1        1
      CT1                        [4×4, 128]    2        1
      CT2                        [4×4, 64]     2        1
    G1_C3                        [7×7, 1]      1        3
    G2_C3                        [7×7, 19]     1        3
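    A minimal PyTorch sketch of the dilated convolution U_Net wired according to Table I is given below. Kernel sizes, strides, and paddings follow the table; the dilation rates of the four DC layers and the number of input channels are not listed there, so the values used here are illustrative assumptions, with padding set equal to the dilation to preserve the spatial size.

```python
import torch
import torch.nn as nn

class DilatedConvUNet(nn.Module):
    """Sketch of the dilated convolution U_Net backbone shared by G1 and G2."""

    def __init__(self, in_channels=4):  # e.g., foggy RGB image + edge map for G2 (assumed)
        super().__init__()
        self.c1 = nn.Sequential(nn.Conv2d(in_channels, 64, 7, stride=1, padding=3), nn.ReLU(True))
        self.c2 = nn.Sequential(nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(True))
        self.c3 = nn.Sequential(nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(True))
        # Four 3x3 dilated convolution layers (rates assumed, not given in Table I).
        self.dc = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(256, 256, 3, stride=1, padding=d, dilation=d), nn.ReLU(True))
            for d in (2, 4, 8, 16)
        ])
        # Transposed convolutions whose inputs are the concatenated (fused) features.
        self.ct1 = nn.Sequential(nn.ConvTranspose2d(256 + 256, 128, 4, stride=2, padding=1), nn.ReLU(True))
        self.ct2 = nn.Sequential(nn.ConvTranspose2d(128 + 128, 64, 4, stride=2, padding=1), nn.ReLU(True))

    def forward(self, x):
        c1 = self.c1(x)                              # H x W, 64 channels
        c2 = self.c2(c1)                             # H/2 x W/2, 128 channels
        c3 = self.c3(c2)                             # H/4 x W/4, 256 channels
        dc = self.dc(c3)                             # H/4 x W/4, 256 channels
        ct1 = self.ct1(torch.cat([c3, dc], 1))       # Step 1: fuse C3 and DC, then deconvolve
        ct2 = self.ct2(torch.cat([c2, ct1], 1))      # Step 2: fuse C2 and CT1, then deconvolve
        return torch.cat([c1, ct2], 1)               # Step 3: fuse C1 and CT2 (128 channels)
```

    Per Table I, G1 then appends G1_C3 (a 7×7 convolution producing a single-channel edge map) and G2 appends G2_C3 (a 7×7 convolution producing 19 class maps); as noted in the training details below, a sigmoid follows the final layer of each generator.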

    The architecture of the edge GAN, as shown in Fig. 1, includes the edge generator G1 and the edge discriminator D1. The purpose of G1 is to generate an edge image similar to the ground truth edge image. G1 is composed of the dilated convolution U_Net and one convolution layer (G1_C3). Because the edge image is a set of 0 or 255 pixel values, it can be expressed as single-channel image data; therefore, the output size of G1_C3 is 1×H×W. The purpose of D1 is to determine whether the generated edge image is the ground truth image and to provide feedback (please refer to “the false binary cross entropy (BCE) loss from D1Loss” below) to the edge generator G1 to improve the accuracy of the generated image. The design of D1 is similar to that of PatchGAN [64], which contains five standard convolution layers.

    The loss function plays an important role in a neural network model; it determines whether the model converges and achieves good accuracy. The edge GAN includes G1 and D1, so its loss function includes the loss function of G1 and that of D1. The inputs of D1 are the ground truth edge images and the edge images generated by G1, where the ground truth edge image is obtained from the semantic segmentation image by the Canny algorithm [63]. The output of D1 indicates whether its input is real; specifically, the output is a probability matrix with values in [0, 1].

    The value of the probability matrix is expected to be close to 1 after the ground truth passes through D1, which means that the edge image is judged to be the ground truth (the label matrix has the same size as the output matrix, with label value 1). In contrast, the value of the probability matrix of the generated edge image after passing through D1 is expected to be close to 0, which means that the edge image is judged to be a generated edge image (the label matrix has the same size as the output matrix, with label value 0). Therefore, the discriminator loss function of the edge GAN (D1 loss) is designed as the BCE loss between the discriminator output and its corresponding label.

    Since the output of D1 includes the true value probability obtained by taking the ground truth of the edge image as the input and the false value probability obtained by taking the generated edge image as the input, the D1 loss has two parts: the BCE loss between the true value probability and 1, namely, true BCE loss, and the BCE loss between the false value probability and 0, namely, false BCE loss. Specifically, D1 loss is the average of true BCE loss and false BCE loss.

    Let the foggy image be F, and the generated edge image be ~FE; let the ground truth of the edge images be FE, and the true value probability be FEL; let the false value probability be ~FEL, and let the D1 loss be D1Loss. D1Loss can be formulated as

    $D_{1Loss} = \dfrac{BCELoss(F_{EL},1)+BCELoss(\tilde{F}_{EL},0)}{2} = \dfrac{-\log(F_{EL})-\log(\tilde{F}_{EL})}{2}.$ (7)

    The features of the D1 convolution layers can adequately express the ground truth edge image or the generated edge image. Therefore, we improve G1’s ability to generate images by narrowing the gap between the features of the ground truth edge image and the features of the generated edge image; this gap is measured by L1 losses. Meanwhile, the false BCE loss from D1Loss indicates the quality of the image generated by G1. A large false BCE loss indicates that the generated edge image is different from the ground truth image, whereas a small false BCE loss indicates that the generated edge image is close to the ground truth image. The false BCE loss thus partly reflects the quality of the generator, and its optimization goal is consistent with that of the generator, which is to reduce its value. Therefore, it is considered part of the generator loss function.

    Hence, the G1 loss is the sum of the L1 losses over each layer’s features and the false BCE loss. If the input is FE, the convolution layer outputs of D1 are DCi (i=1,2,3,4,5); if the input is ~FE, the convolution layer outputs of D1 are ~DCi (i=1,2,3,4,5). The G1 loss is expressed as G1Loss, and it can be formulated as

    $G_{1Loss} = \sum\nolimits_{i=1}^{5} L1Loss(D_{Ci},\tilde{D}_{Ci}) + BCELoss(\tilde{F}_{EL},0) = \sum\nolimits_{i=1}^{5} |D_{Ci}-\tilde{D}_{Ci}| - \log(\tilde{F}_{EL}).$ (8)
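    A sketch of these two losses in PyTorch is shown below, computing the true/false BCE terms against labels of 1 and 0 as described above together with the per-layer L1 feature-matching terms; real_prob/fake_prob denote D1's outputs for the ground truth and generated edge images, real_feats/fake_feats the corresponding lists of its five convolution-layer features, and all names are ours rather than from the authors' code.

```python
import torch
import torch.nn.functional as F

def d1_loss(real_prob, fake_prob):
    # (7): average of the true BCE (ground truth edge vs. label 1)
    # and the false BCE (generated edge vs. label 0).
    true_bce = F.binary_cross_entropy(real_prob, torch.ones_like(real_prob))
    false_bce = F.binary_cross_entropy(fake_prob, torch.zeros_like(fake_prob))
    return 0.5 * (true_bce + false_bce)

def g1_loss(real_feats, fake_feats, fake_prob):
    # (8): per-layer L1 feature-matching losses plus the false BCE term.
    # The real-branch features are treated as constants when updating G1.
    feature_matching = sum(F.l1_loss(f, r.detach()) for r, f in zip(real_feats, fake_feats))
    false_bce = F.binary_cross_entropy(fake_prob, torch.zeros_like(fake_prob))
    return feature_matching + false_bce
```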

    Similar to the edge GAN, the semantic segmentation GAN includes the semantic segmentation generator G2 and the semantic segmentation discriminator D2. The goal of G2 is to generate semantic segmentation classifications that match the ground truth semantic segmentation classifications. G2 is composed of the dilated convolution U_Net and one convolution layer (G2_C3). The goal of the semantic segmentation GAN is to divide the foggy images into n classes; therefore, the output size of G2_C3 is n×H×W. The purpose of D2 is to judge whether the generated semantic segmentation image is the ground truth image and to provide feedback (please refer to “the false BCE loss from D2Loss” below) to the semantic segmentation generator G2 so that it can improve the accuracy of the generated image. The structure of D2 is the same as that of D1, which contains 5 standard convolution layers.

    The inputs of D2 are the ground truth semantic segmentation image of the foggy image and the semantic segmentation image generated by G2, and its output is a probability matrix (0 ~ 1) indicating whether the input is the ground truth. Therefore, similar to the D1 loss of the edge GAN, the discriminator loss function of the semantic segmentation GAN (D2 loss) includes two parts: the BCE loss between the true value probability and 1, namely, the true BCE loss, and the BCE loss between the false value probability and 0, namely, the false BCE loss. Specifically, the D2 loss is the average of the true BCE loss and the false BCE loss.

    Let the semantic segmentation image generated by G2 be ~FSF, and the generated semantic segmentation classification be ~FS, ~FS = [~FS0, ~FS1, …, ~FSj, …, ~FS(n−1)], where ~FSj is the jth generated semantic segmentation classification; let the ground truth of the semantic segmentation (n-class) classification be FS, FS = [FS0, FS1, …, FSj, …, FS(n−1)], where FSj is the jth ground truth semantic segmentation classification; let the ground truth of the semantic segmentation image be FSF, and the label obtained by FS be FSL; let the label obtained by ~FS be ~FSL, and the discriminator loss function of the semantic segmentation GAN be D2Loss, which can also be formulated as

    $D_{2Loss} = \dfrac{BCELoss(F_{SL},1)+BCELoss(\tilde{F}_{SL},0)}{2} = \dfrac{-\log(F_{SL})-\log(\tilde{F}_{SL})}{2}.$ (9)

    In this paper, the loss function of G2 is designed as the sum of three parts: L1 loss between FSL and ~FSL (L1Loss(FSL,~FSL)), cross-entropy loss between FS and ~FS (CrossEntropyLoss(~FS,FS)), and false BCE loss from D2Loss. Let the loss function of G2 be G2Loss, which can be formulated as

    $G_{2Loss} = L1Loss(F_{SL},\tilde{F}_{SL}) + BCELoss(\tilde{F}_{SL},0) + CrossEntropyLoss(\tilde{F}_{S},F_{S}) = |F_{SL}-\tilde{F}_{SL}| - \sum\nolimits_{j=0}^{n-1}\left(\log\left(\dfrac{e^{\tilde{F}_{Sj}}}{\sum\nolimits_{j=0}^{n-1} e^{\tilde{F}_{Sj}}}\right)\times F_{Sj}\right) - \log(\tilde{F}_{SL}).$ (10)
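    Analogously, (9) and (10) can be sketched as follows; fake_prob/real_prob are D2's outputs for the generated and ground truth semantic segmentation images, class_scores is G2's n-channel output (treated here as logits by cross_entropy), target is the ground truth label map, and the names are ours.

```python
import torch
import torch.nn.functional as F

def d2_loss(real_prob, fake_prob):
    # (9): average of the true and false BCE terms, as for D1.
    return 0.5 * (F.binary_cross_entropy(real_prob, torch.ones_like(real_prob))
                  + F.binary_cross_entropy(fake_prob, torch.zeros_like(fake_prob)))

def g2_loss(real_prob, fake_prob, class_scores, target):
    # (10): L1 between the two discriminator outputs, the false BCE term, and the
    # pixel-wise cross entropy between generated class scores and ground truth labels.
    l1 = F.l1_loss(fake_prob, real_prob.detach())
    false_bce = F.binary_cross_entropy(fake_prob, torch.zeros_like(fake_prob))
    ce = F.cross_entropy(class_scores, target)   # class_scores: N x 19 x H x W, target: N x H x W (long)
    return l1 + false_bce + ce
```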

    The loss functions of the edge GAN are linear combinations of several existing loss functions, each of which was proven to be convergent when proposed and is commonly used in GANs. Therefore, these linear combinations are also convergent, as are the loss functions of the semantic segmentation GAN.

    The foggy cityscapes dataset [65] is a synthetic foggy dataset with 19 classifications (road, sidewalk, building, wall, etc.) for semantic foggy scene understanding (SFSU). It contains 2975 training images and 500 validation images for each of three attenuation coefficients: β = 0.005, β = 0.01, and β = 0.02 (β is the attenuation coefficient; the higher the attenuation coefficient is, the more fog there is in the image). Due to the differences in the attenuation coefficients, we separate the foggy cityscapes dataset into three datasets: Dataset 1 is composed of the 2975 training images and 500 validation images with β = 0.005, Dataset 2 of those with β = 0.01, and Dataset 3 of those with β = 0.02. The corresponding semantic segmentation ground truth contains semantic segmentation images with color, semantic segmentation images with labels, images with instance labels and label files with polygon data. The ground truth edge images are obtained from the color semantic segmentation images by the Canny algorithm [63].

    The foggy driving dataset [65] contains 101 real-world images and can be used to evaluate the trained models. We separately use Dataset 1, Dataset 2, and Dataset 3 to train the models and use the foggy driving dataset [65] as the test set. Because training and validation data are limited, we apply random flipping, random cropping, rotation, and translation to the data during training and validation to avoid overfitting.
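    A sketch of such paired augmentation is shown below; for segmentation, the same randomly sampled flip, rotation, translation, and crop must be applied to the image and its label map, so the parameters are drawn once and reused. The specific ranges are our assumptions, not values reported in the paper.

```python
import random
import torchvision.transforms.functional as TF
from torchvision.transforms import InterpolationMode

def paired_augment(image, label, out_size=(256, 256)):
    # image: float tensor (C, H, W); label: integer tensor (1, H, W).
    if random.random() < 0.5:                                    # random horizontal flip
        image, label = TF.hflip(image), TF.hflip(label)
    angle = random.uniform(-10, 10)                              # illustrative rotation range
    dx, dy = random.randint(-20, 20), random.randint(-20, 20)    # illustrative translation range
    image = TF.affine(image, angle=angle, translate=(dx, dy), scale=1.0, shear=0.0,
                      interpolation=InterpolationMode.BILINEAR)
    label = TF.affine(label, angle=angle, translate=(dx, dy), scale=1.0, shear=0.0,
                      interpolation=InterpolationMode.NEAREST)   # nearest keeps label values valid
    i = random.randint(0, image.shape[-2] - out_size[0])         # shared random crop
    j = random.randint(0, image.shape[-1] - out_size[1])
    return TF.crop(image, i, j, *out_size), TF.crop(label, i, j, *out_size)
```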

    The activation function of the dilated convolution U_Net is ReLU [66], while that of G1_C3 and G2_C3 is sigmoid. The activation function of the first four layers in D1 and D2 is leaky ReLU [67] with a slope parameter of 0.25, while that of the last layer is sigmoid. The optimization algorithm of the edge GAN and the semantic segmentation GAN is Adam [68]. The input size in the experiments is 256 × 256, and the number of training epochs is 100. The architecture parameters of the edge GAN and the semantic segmentation GAN are shown in Tables I and II.

    Table  II.  Parameters of D1 and D2
    Input            Kernel        Stride   Padding
    D1_C1, D2_C1     [4×4, 64]     2        1
    D1_C2, D2_C2     [4×4, 128]    2        1
    D1_C3, D2_C3     [4×4, 256]    2        1
    D1_C4, D2_C4     [4×4, 512]    1        1
    D1_C5, D2_C5     [4×4, 1]      1        1
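    A PatchGAN-style discriminator matching Table II and the activation settings above can be sketched as follows; the number of input channels depends on what is fed to D1/D2 and is left as a parameter, since the text does not state it explicitly.

```python
import torch.nn as nn

def make_discriminator(in_channels):
    # D1/D2 per Table II: five 4x4 convolutions, leaky ReLU (slope 0.25) on the
    # first four layers and a sigmoid on the last, producing a probability map.
    return nn.Sequential(
        nn.Conv2d(in_channels, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.25, True),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.25, True),
        nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.25, True),
        nn.Conv2d(256, 512, 4, stride=1, padding=1), nn.LeakyReLU(0.25, True),
        nn.Conv2d(512, 1, 4, stride=1, padding=1), nn.Sigmoid(),
    )
```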

    To the best of our knowledge, there is no direct semantic segmentation method for foggy images available for comparison; however, OCR [48] and HANet [49] have achieved remarkable results on the public Cityscapes benchmark without additional training data, with HANet [49] achieving the best performance. To verify the performance of FISS GAN, we compare it with OCR [48] and HANet [49]. Our training and validation data come from the foggy cityscapes dataset mentioned above, and we separately train OCR [48], HANet [49] and FISS GAN on Dataset 1, Dataset 2, and Dataset 3. Meanwhile, we use the foggy driving dataset as the test data.

    The qualitative experimental results on Dataset 1, Dataset 2, and Dataset 3 are shown separately in Figs. 4-6. The semantic segmentation results of FISS GAN are better than those of OCR [48] and HANet [49] on each dataset. To further assess the performance of each model, the mean intersection over union (IoU) score of each model is calculated (Table III). As shown in Table III, the mean IoU scores of FISS GAN on Dataset 1, Dataset 2, and Dataset 3 are 69.37%, 65.94%, and 64.01%, respectively, which are all higher than the corresponding scores of OCR [48] and HANet [49]; FISS GAN thus achieved state-of-the-art performance. These results indicate that FISS GAN can extract more features from a foggy image than OCR [48] and HANet [49]. Meanwhile, regardless of the method, the mean IoU score on Dataset 1 was higher than those on Dataset 2 and Dataset 3. According to our analysis, the reason for this difference is that the images in Dataset 1 have a smaller attenuation coefficient, which means their pixel values are lower than those of Dataset 2 and Dataset 3, and the images in Dataset 1 retain more texture than those in Dataset 2 and Dataset 3. Therefore, it is easier to extract and express the features of images in Dataset 1 than those of Dataset 2 and Dataset 3.

    Figure  4.  The qualitative experimental results of each model on Dataset 1.
    Figure  5.  The qualitative experimental results of each model on Dataset 2.
    Figure  6.  The qualitative experimental results of each model on Dataset 3.
    Table  III.  The Mean IoU Score of Each Model
    Dataset      Model         Mean IoU (%)
    Dataset 1    OCR [48]      24.44
                 HANet [49]    43.85
                 FISS GAN      69.37
    Dataset 2    OCR [48]      20.88
                 HANet [49]    41.45
                 FISS GAN      65.94
    Dataset 3    OCR [48]      19.94
                 HANet [49]    39.73
                 FISS GAN      64.01
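    For reference, the mean IoU values in Table III can be computed from predicted and ground truth label maps with a routine like the one below; the handling of an ignore label is our assumption, as the paper does not describe its evaluation code.

```python
import numpy as np

def mean_iou(pred, target, num_classes=19, ignore_index=255):
    # Per-class intersection over union, averaged over classes present in either map.
    valid = target != ignore_index
    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        t = (target == c) & valid
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both prediction and ground truth
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious))
```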

    Additionally, we test the pixel accuracy of the edge GAN on each dataset. The qualitative experimental results are shown in Fig. 7, and the quantitative experimental results are shown in Table IV. The pixel accuracy on Dataset 1 is 87.79%, slightly higher than that on Dataset 2 and Dataset 3, which indicates that the edge GAN can efficiently generate edge images and that more edge features can be extracted from the dataset with less fog.

    Figure  7.  The qualitative experimental results of each dataset.
    Table  IV.  The Quantitative Experimental Results of Each Dataset
    Dataset      Pixel accuracy (%)
    Dataset 1    87.79
    Dataset 2    87.35
    Dataset 3    86.92

    We use the validation data of OCR [48], HANet [49] and FISS GAN to create a mean IoU diagram (Fig. 8) and a loss diagram (Fig. 9) for each model. The X-axis of both Fig. 8 and Fig. 9 is the epoch; the Y-axis of Fig. 8 is the mean IoU value, while the Y-axis of Fig. 9 is the loss value. More specifically, the loss values of OCR [48] and HANet [49] were obtained from their open-source code, while the loss value of FISS GAN is G2Loss. As seen in Fig. 8, the mean IoU on the validation data is not significantly different from that on the test data. Meanwhile, Fig. 9 shows that the losses of OCR [48], HANet [49] and FISS GAN tend to decrease or stabilize. Therefore, the OCR model [48], the HANet model [49] and the FISS GAN model are all convergent models.

    Figure  8.  Validation mean IoU for OCR [48], HANet [49] and FISS GAN.
    Figure  9.  Validation loss for OCR [48], HANet [49], and FISS GAN.

    To verify that the dilated convolution in the dilated convolution U_Net can extract more features than the standard convolution, we separately use the dilated convolution and standard convolution (standard convolution U_Net) to train and test the FISS GAN (edge GAN and semantic segmentation GAN). The datasets (training and test datasets), FISS GAN parameters, and epoch numbers are the same as in the above experiments. The pixel accuracy and mean IoU are shown in Table V. As seen in Table V, regardless of the dataset, the pixel accuracy and the mean IoU achieved through dilated convolution U_Net are higher than those of standard convolution U_Net.

    Table  V.  Comparison Results of Standard Convolution U_Net and Dilated Convolution U_Net (%)
                 Pixel accuracy                                 Mean IoU
    Dataset      Standard conv. U_Net   Dilated conv. U_Net     Standard conv. U_Net   Dilated conv. U_Net
    Dataset 1    66.35                  87.79                   58.95                  69.37
    Dataset 2    65.97                  87.35                   56.49                  65.94
    Dataset 3    64.91                  86.92                   54.36                  64.01

    Additionally, to verify the effect of the edges on FISS GAN, we replace the edges obtained from the semantic segmentation images with edges obtained from the foggy images and train FISS GAN (edge GAN and semantic segmentation GAN) with the same experimental settings as above. The pixel accuracy and mean IoU are shown in Table VI. As seen in Table VI, on the same dataset, the pixel accuracy and mean IoU achieved with edges from semantic segmentation images are higher than those achieved with edges from foggy images. This experiment indicates that the edges obtained from the semantic segmentation images provide more guidance information than the edges obtained from foggy images.

    Table  VI.  Comparison Results of Different Edges (%)
                 Pixel accuracy                                                   Mean IoU
    Dataset      Edges of foggy images   Edges of semantic segmentation images    Edges of foggy images   Edges of semantic segmentation images
    Dataset 1    74.41                   87.79                                    60.07                   69.37
    Dataset 2    74.25                   87.35                                    58.42                   65.94
    Dataset 3    73.68                   86.92                                    57.85                   64.01

    Current semantic segmentation methods for foggy images are based on fog-free images or clear images and do not explore the relation between foggy images and their semantic segmentation images. A semantic segmentation method (FISS GAN) that can directly process foggy images has been proposed in this paper. FISS GAN is composed of an edge GAN and a semantic segmentation GAN. Specifically, FISS GAN first obtains edge information from foggy images with the edge GAN and then achieves semantic segmentation results with the semantic segmentation GAN, using foggy images and their edge information as inputs. Experiments based on the foggy cityscapes and foggy driving datasets have shown that FISS GAN can directly extract features from foggy images and achieve state-of-the-art semantic segmentation results. Although FISS GAN can directly extract the features of a foggy image and realize its semantic segmentation, it cannot accurately segment foggy images with limited texture. In the future, we will focus on designing a more efficient feature extraction network to improve the accuracy of the semantic segmentation of foggy images.

  • [1]
    L. Chen, W. J. Zhan, W. Tian, Y. H. He, and Q. Zou, “Deep integration: A multi-label architecture for road scene recognition,” IEEE Trans. Image Process., vol. 28, no. 10, pp. 4883–4898, Oct. 2019. doi: 10.1109/TIP.2019.2913079
    [2]
    K. Wada, K. Okada, and M. Inaba, “Joint learning of instance and semantic segmentation for robotic pick-and-place with heavy occlusions in clutter,” in Proc. IEEE Int. Conf. Robotics and Autom., Montreal, Canada, 2019, pp. 9558–9564.
    [3]
    Y. C. Ouyang, L. Dong, L. Xue and C. Y. Sun, “Adaptive control based on neural networks for an uncertain 2-DOF helicopter system with input deadzone and output constraints,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 3, pp. 807–815, May 2019. doi: 10.1109/JAS.2019.1911495
    [4]
    Y. H. Luo, S. N. Zhao, D. S. Yang, and H. W. Zhang, “A new robust adaptive neural network backstepping control for single machine infinite power system with TCSC,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 1, pp. 48–56, Jan. 2020.
    [5]
    N. Zerari, M. Chemachema, and N. Essounbouli, “Neural network based adaptive tracking control for a class of pure feedback nonlinear systems with input saturation,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 1, pp. 278–290, Jan. 2019. doi: 10.1109/JAS.2018.7511255
    [6]
    D. Wu and X. Luo, “Robust latent factor analysis for precise representation of high-dimensional and sparse data,” IEEE/CAA J. Autom. Sinica, pp. 766–805, Dec. 2019.
    [7]
    X. Luo, Y. Yuan, S. L. Chen, N. Y. Zeng, and Z. D. Wang, “Position-transitional particle swarm optimization-incorporated latent factor analysis,” IEEE Trans. Knowl. Data. En., pp. 1–1, Oct. 2019.
    [8]
    A. Cantor, “Optics of the atmosphere: Scattering by molecules and particles,” IEEE J. Quantum. Elect., vol. 14, no. 9, pp. 698–699, Sept. 1978.
    [9]
    S. G. Narasimhan and S. K. Nayar, “Vision and the atmosphere,” Int. J. Comput. Vision, vol. 48, no. 3, pp. 233–254, Jul. 2002. doi: 10.1023/A:1016328200723
    [10]
    P. Luc, C. Couprie, S. Chintala, and J. Verbeek, “Semantic segmentation using adversarial networks,” arXiv preprint arXiv: 1611.08408, Dec. 2016.
    [11]
    A. Arnab, S. Jayasumana, S. Zheng, and P. H. S. Torr, “Higher order conditional random fields in deep neural networks,” in Proc. European Conf. Computer Vision, B. Leibe, J. Matas, N. Sebe and M. Welling Eds. Cham, Germany: Springer, 2016, pp. 524–540.
    [12]
    P. Isola, J. Y. Zhu, T. H. Zhou, and A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE. Conf. Computer Vision and Pattern Recognition, Honolulu, USA, 2017, pp. 1125–1134.
    [13]
    J. Hoffman, E. Tzeng, T. Park, Y. J. Zhu, P. Isola, K. Saenko, A. Efros, and T. Darrell, “Cycada: Cycle-consistent adversarial domain adaptation,” in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 1989–1998.
    [14]
    K. Nazeri, E. Ng, T. Joseph, F. Z. Qureshi, and M. Ebrahimi, “Edgeconnect: Generative image inpainting with adversarial edge learning,” arXiv preprint arXiv: 1901.00212, Jan. 2019.
    [15]
    O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Medical Image Computing and Computer-assisted Intervention. Cham, Germany: Springer, 2015, pp. 234–241.
    [16]
    J. Y. Kim, L. S. Kim, and S. H. Hwang, “An advanced contrast enhancement using partially overlapped sub-block histogram equalization,” in Proc. Int. Conf. IEEE Symposium on Circuits and Systems, Geneva, Switzerland, 2000, pp. 537–540.
    [17]
    A. Eriksson, G. Capi, and K. Doya, “Evolution of meta-parameters in reinforcement learning algorithm,” in Proc. IEEE/RSJ. Int. Conf. Intelligent Robots and System, Las Vegas, USA, 2003, pp. 412–417.
    [18]
    M. J. Seow and V. K. Asari, “Ratio rule and homomorphic filter for enhancement of digital colour image,” Neurocomputing, vol. 69, no. 7–9, pp. 954–958, Mar. 2006. doi: 10.1016/j.neucom.2005.07.003
    [19]
    S. Shwartz, E. Namer, and Y. Y. Schechner, “Blind haze separation,” in Proc. IEEE. Int. Conf. Computer Vision and Pattern Recognition, New York, USA, 2006, pp. 1984–1991.
    [20]
    Y. Y. Schechner and Y. Averbuch, “Regularized image recovery in scattering media,” IEEE Trans. Pattern Anal., vol. 29, no. 9, pp. 1655–1660, Sept. 2007. doi: 10.1109/TPAMI.2007.1141
    [21]
    R. T. Tan, “Visibility in bad weather from a single image,” in Proc. IEEE. Int. Conf. Computer Vision and Pattern Recognition, Anchorage, USA, 2008, pp. 1–8.
    [22]
    R. Fattal, “Single image dehazing,” ACM Trans. Graphic., vol. 27, no. 3, pp. 1–9, Aug. 2008.
    [23]
    K. M. He, J. Sun, and X. O. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. Pattern Anal., vol. 33, no. 12, pp. 2341–2353, Dec. 2011. doi: 10.1109/TPAMI.2010.168
    [24]
    K. B. Gibson and T. Q. Nguyen, “On the effectiveness of the dark channel prior for single image dehazing by approximating with minimum volume ellipsoids,” in Proc. IEEE. Int. Conf. Acoustics, Speech and Signal Processing, Prague, Czech Republic, 2011, pp. 1253–1256.
    [25]
    D. F. Shi, B. Li, W. Ding, and Q. M. Chen, “Haze removal and enhancement using transmittance-dark channel prior based on object spectral characteristic,” Acta Autom. Sinica, vol. 39, no. 12, pp. 2064–2070, Dec. 2013.
    [26]
    S. G. Narasimhan and S. K. Nayar, “Interactive (de) weathering of an image using physical models,” in Proc. IEEE Workshop on color and photometric Methods in computer Vision, vol. 6, no. 4, Article No. 1, Jan. 2003.
    [27]
    S. G. Narasimhan and S. K. Nayar, “Chromatic framework for vision in bad weather,” in Proc. IEEE. Int. Conf. Computer Vision and Pattern Recognition, Hilton Head, USA, 2000, pp. 598–605.
    [28]
    J. Tarel and N. Hautière, “Fast visibility restoration from a single color or gray level image,” in Proc. 12th IEEE Int. Conf. Computer Vision, Kyoto, Japan, 2009, pp. 2201–2208.
    [29]
    H. Zhang, V. Sindagi, and V. M. Patel, “Joint transmission map estimation and dehazing using deep networks,” IEEE Trans. Circ. Syst. Vid., vol. 30, no. 7, Jul. 2020.
    [30]
    W. Q. Ren, S. Liu, H. Zhang, J. S. Pan, X. C. Cao, and M. H. Yang, “Single image dehazing via multi-scale convolutional neural networks,” in Proc. European Conf. Computer Vision, B. Leibe, J. Matas, N. Sebe and M. Welling Eds. Cham, Germany: Springer, 2016, pp. 154–169.
    [31]
    H. Zhang and V. M. Patel, “Densely connected pyramid defogging network,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 3194–3203.
    [32]
    B. L. Cai, X. M. Xu, K. Jia, C. M. Qing, and D. C. Tao, “DehazeNet: An end-to-end system for single image haze removal,” IEEE Trans. Image Process., vol. 25, no. 11, pp. 5187–5198, Nov. 2016. doi: 10.1109/TIP.2016.2598681
    [33]
    D. D. Chen, M. M. He, Q. N. Fan, J. Liao, L. H. Zhang, D. D. Hou, L. Yuan, and G. Hua, “Gated context aggregation network for image dehazing and deraining,” in Proc. IEEE Winter Conf. Applications of Computer Vision, Waikoloa, USA, 2019, pp. 1375–1383.
    [34]
    S. Y. Huang, H. X. Li, Y. Yang, B. Wang, and N. N. Rao, “An end-to-end dehazing network with transitional convolution layer,” Multidim. Syst. Sign P., vol. 31, no. 4, pp. 1603–1623, Mar. 2020. doi: 10.1007/s11045-020-00723-2
    [35]
    H. Zhang, V. Sindagi, and V. M. Patel, “Multi-scale single image dehazing using perceptual pyramid deep network,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Salt Lake City, USA, 2018, pp. 902–911.
    [36]
    X. Qin, Z. L. Wang, Y. C. Bai, X. D. Xie, and H. Z. Jia, “FFA-Net: Feature fusion attention network for single image defogging,” arXiv preprint arXiv: 1911.07559, Nov. 2019.
    [37]
    Q. S. Yi, A. W. Jiang, J. C. Li, J. Y. Wan, and M. W. Wang, “Progressive back-traced dehazing network based on multi-resolution recurrent reconstruction,” IEEE Access, vol. 8, pp. 54514–54521, Mar. 2020. doi: 10.1109/ACCESS.2020.2981491
    [38]
    B. Y. Li, X. L. Peng, Z. Y. Wang, J. Z. Xu, and D. Feng, “An all-in-one network for defogging and beyond,” arXiv preprint arXiv: 1707.06543, Jul. 2017.
    [39]
    H. Y. Zhu, X. Peng, V. Chandrasekhar, L. Y. Li, and J. H. Lim, “DehazeGAN: When image dehazing meets differential programming,” in Proc. 27th Int. Joint Conf. Artificial Intelligence, Stockholm, Sweden, 2018, pp. 1234–1240.
    [40]
    R. D. Li, J. S. Pan, Z. C. Li, and J. H. Tang, “Single image dehazing via conditional generative adversarial network,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 8202–8211.
    [41]
    D. Engin, A. Genc, and H. K. Ekenel, “Cycle-dehaze: enhanced cycleGAN for single image dehazing,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Salt Lake City, USA, 2018, pp. 825–833.
    [42]
    G. Kim, J. Park, S. Ha, and J. Kwon, “Bidirectional deep residual learning for haze removal,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Long Beach Convention and Entertainment Center, USA, 2018, pp. 46–54.
    [43]
    A. Dudhane and S. Murala, “CDNet: Single image de-hazing using unpaired adversarial training,” in Proc. IEEE Winter Conf. Applications of Computer Vision, Waikoloa, USA, 2019, pp. 1147–1155.
    [44]
    P. Sharma, P. Jain, and A. Sur, “Scale-aware conditional generative adversarial network for image defogging,” in Proc. IEEE Winter Conf. Applications of Computer Vision, Snowmass, USA, 2020, pp. 2355–2365.
    [45]
    W. D. Yan, A. Sharma, and R. T. Tan, “Optical flow in dense foggy scenes using semi-supervised learning,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2020, pp. 13259–13268.
    [46]
    J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Boston, USA, 2015, pp. 3431–3440.
    [47]
    L. Chen, W. J. Zhan, J. J. Liu, W. Tian, and D. P. Cao, “Semantic segmentation via structured refined prediction and dual global priors,” in Proc. IEEE Int. Conf. Advanced Robotics and Mechatronics, Toyonaka, Japan, 2019, pp. 53–58.
    [48]
    Y. H. Yuan, X. L. Chen, and J. D. Wang, “Object-contextual representations for semantic segmentation,” arXiv preprint arXiv: 1909.11065, Sept. 2019.
    [49]
    S. Choi, J. T. Kim, and J. Choo, “Cars can’t fly up in the sky: Improving urban-scene segmentation via height-driven attention networks,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2020, pp. 9373–9383.
    [50]
    M. H. Yin, Z. L. Yao, Y. Cao, X. Li, Z. Zhang, S. Lin, and H. Hu, “Disentangled non-local neural networks,” arXiv preprint arXiv: 2006.06668, Sept. 2020.
    [51]
    H. S. Zhao, J. P. Shi, X. J. Qi, X. G. Wang, and J. Y. Jia, “Pyramid scene parsing network,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Honolulu, USA, 2017, pp. 2881–2890.
    [52]
    L. C. Chen, Y. K. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proc. European Conf. Computer Vision, Munich, Germany, 2018, pp. 801–818.
    [53]
    H. Y. Chen, L. H. Tsai, S. C. Chang, J. Y. Pan, Y. T. Chen, W. Wei, and D. C. Juan, “Learning with hierarchical complement objective,” arXiv preprint arXiv: 1911.07257, Nov, 2019.
    [54]
    J. Fu, J. Liu, H. J. Tian, Y. Li, Y. J. Bao, Z. W. Fang, and H. Q. Lu, “Dual attention network for scene segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 3146–3154.
    [55]
    C. Zhang, G. S. Lin, F. Y. Liu, R. Yao, and C. H. Chen, “Canet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 5217–5226.
    [56]
    J. J. He, Z. Y. Deng, L. Zhou, Y. L. Wang, and Y. Qiao, “Adaptive pyramid context network for semantic segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 7519–7528.
    [57]
    J. Hoffman, D. Q. Wang, F. Yu, and T. Darrell, “FCNs in the wild: Pixel-level adversarial and constraint-based adaptation,” arXiv preprint arXiv: 1612.02649, Dec, 2016.
    [58]
    Y. H. Zhang, Z. F. Qiu, T. Yao, D. Liu, and T. Mei, “Fully convolutional adaptation networks for semantic segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 6810–6818.
    [59]
    Z. Murez, S. Kolouri, D. Kriegman, R. Ramamoorthi, and K. Kim, “Image to image translation for domain adaptation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 4500–4509.
    [60]
    W. X. Hong, Z. Z. Wang, M. Yang, and J. S. Yuan, “Conditional generative adversarial network for structured domain adaptation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 1335–1344.
    [61]
    Y. W. Luo, L. Zheng, T. Guan, J. Q. Yu, and Y. Yang, “Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 2502–2511.
    [62]
    Y. S. Li, L. Yuan, and N. Vasconcelos, “Bidirectional learning for domain adaptation of semantic segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 6936–6945.
    [63]
    J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal., vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986. doi: 10.1109/TPAMI.1986.4767851
    [64]
    J. Y. Zhu, T. Park, P. Isola and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Computer Vision, Venice, Italy, 2017, pp. 2223–2232.
    [65]
    C. Sakaridis, D. X. Dai, and L. Van Gool, “Semantic foggy scene understanding with synthetic data,” Int. J. Comput. Vision, vol. 126, no. 9, pp. 973–992, Mar. 2018. doi: 10.1007/s11263-018-1072-8
    [66]
    V. Nair and G. E. Hinton, “Rectified linear units improve restricted boltzmann machines,” in Proc. 27th Int. Conf. on Int. Conf. on Machine Learning, F. Johannes and J. Thorsten, Eds, 2010, pp. 807–814.
    [67]
    A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proc. ICML, vol. 30, no. 1, pp. 3, 2013.
    [68]
    D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv: 1412.6980, Dec. 2014.