
ENVIRONMENTAL perception plays a vital role in fields such as autonomous driving [1] and robotics [2], and this perception influences the subsequent decisions and control of such devices [3]-[5]. Fog is a common form of weather, and in foggy conditions the pixel values of an image are irregularly elevated relative to the corresponding clear image; as a result, foggy images contain less texture than clear images. Many methods already exist for the semantic segmentation of clear images; they can extract and express the features of clear images and achieve good segmentation results. However, their performance on foggy images is poor, because they cannot efficiently extract and express the features of foggy images. Moreover, foggy image data are not sparse, so existing work on sparse data [6], [7] cannot be applied. To date, researchers have developed two approaches to this problem:
In the first approach, a foggy image is first converted to a fog-free image by a defogging algorithm, and the restored image is then segmented by a semantic segmentation algorithm. The defogging-segmentation method can therefore be separated into two steps.
Step 1: Fog removal. According to the classic atmospheric scattering model [8], [9], a fog-free image can be recovered from a foggy image as
J(x) = (I(x) − A) / t(x) + A | (1) |
where J(x) is the restored fog-free image, I(x) is the observed foggy image, A is the global atmospheric light, and t(x) is the medium transmission.
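As a concrete illustration of (1), the restoration can be written in a few lines of NumPy. This is a minimal sketch under the assumption that the atmospheric light A and the transmission map t(x) have already been estimated (for example, by a dark-channel-style method); the function name and the clipping threshold are illustrative choices, not part of the original formulation.

```python
import numpy as np

def restore_fog_free(I: np.ndarray, A: float, t: np.ndarray, t_min: float = 0.1) -> np.ndarray:
    """Recover J(x) = (I(x) - A) / t(x) + A from a foggy image.

    I : foggy image, float array in [0, 1], shape (H, W, 3)
    A : estimated global atmospheric light (scalar or per-channel)
    t : estimated transmission map, shape (H, W)
    """
    t = np.clip(t, t_min, 1.0)           # avoid division by (near-)zero transmission
    J = (I - A) / t[..., None] + A       # invert the scattering model per pixel
    return np.clip(J, 0.0, 1.0)          # keep the result in the valid image range
```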
Step 2: Semantic segmentation of fog-free images. When semantic segmentation is performed, the algorithms’ inputs may be the fog-free image and its auxiliary information or only the fog-free image. Therefore, the problem of semantic image segmentation after defogging can be expressed as
S(x)=F(f(J(x),g(x))) | (2) |
where S(x) is the semantic segmentation result, F(⋅) denotes the semantic segmentation model, f(⋅) denotes the fusion of its inputs, J(x) is the fog-free image obtained in Step 1, and g(x) is the auxiliary information; when no auxiliary information is used, f(J(x), g(x)) reduces to J(x).
In the second approach, a semantic segmentation model is first trained on clear images. Then, based on the trained model and transfer learning, a semantic segmentation model is trained for foggy images. The semantic segmentation method based on transfer learning can also be separated into two steps.
Step 1: Training the semantic segmentation model with clear images. The method used to obtain the semantic segmentation model is the same as that shown in (2), but the inputs are clear images and their auxiliary information or only clear images. The trained model can be expressed as
M=F(f(C(x),g(x))) | (3) |
where M is the semantic segmentation model trained on clear images and C(x) is a clear image; F(⋅), f(⋅), and g(x) are defined as in (2).
Step 2: Training the transfer learning model with foggy images. Using the clear images as the source domain and foggy images as the target domain, the semantic segmentation model can be trained with foggy images based on the model above
S(x)=T(M,I(x))=T(F(f(C(x),g(x))),I(x)) | (4) |
where T(⋅) denotes the transfer learning process that adapts the model M trained on clear images to the foggy image I(x), yielding the semantic segmentation result S(x).
These two methods can produce semantic segmentation results for foggy images; however, they rely on defogged images or on semantic segmentation models trained with clear images, and without this information they cannot be applied. This study focuses on a new semantic segmentation method that directly explores the mapping relationship between foggy images and the resulting semantic segmentation images. The mathematical model can be expressed as follows:
S(x)=F(f(I(x),g(x))). | (5) |
Solving (5) is challenging. The motivation of this paper is to explore a semantic segmentation method that can efficiently solve (5), i.e., that efficiently expresses the mapping relationship between foggy images and the resulting semantic segmentation images.
A generative adversarial network (GAN) is an efficient semantic segmentation method. Luc et al. [10] first explored the use of a GAN for clear image semantic segmentation because a GAN can enforce forms of higher-order consistency [11]. Subsequently, [12] and [13] also provided GANs for the semantic segmentation of clear images and achieved state-of-the-art performance. In this paper, we also explore a semantic segmentation method for foggy images based on a GAN. Additionally, based on the "lines first, color next" approach, edge images have been used to provide auxiliary information for clear image inpainting [14], which has been shown to greatly improve inpainting quality. In this paper, we also analyze the foggy image semantic segmentation (FISS) problem using the "lines first, color next" approach and use edge images as auxiliary information. Specifically, we first obtain the edge information of foggy images and then obtain the semantic segmentation results for foggy images under the guidance of this edge information. Based on the above ideas, a two-stage FISS GAN is proposed in this paper. The main contributions of this paper are as follows:
1) We propose a novel and efficient network architecture, called dilated convolution U_Net, that builds on concepts from U_Net [15]. By incorporating dilated convolution layers and adjusting the feature sizes of the convolution layers, dilated convolution U_Net shows improved feature extraction and expression ability.
2) A direct FISS method (FISS GAN) that generates semantic segmentation images under the guidance of edge information is proposed. We show our method's effectiveness through extensive experiments on the foggy cityscapes dataset and the foggy driving dataset and achieve state-of-the-art performance. To the best of our knowledge, this is the first paper to explore a direct FISS method.
The structure of this paper is as follows: Section I is the introduction; Section II introduces the work related to foggy images and semantic segmentation methods; Section III describes FISS GAN in detail; Section IV describes the experiments designed to verify the performance of FISS GAN; and Section V summarizes the full paper.
Most studies on foggy images are based on defogging. Image defogging methods can be divided into traditional defogging methods and deep learning-based defogging methods. According to their processing principles, traditional defogging methods can be further divided into image enhancement methods and physical model-based methods. Methods based on image enhancement [16]-[18] do not model the fog in the image; they directly improve contrast or highlight image features to make the image clearer and thereby achieve the purpose of defogging. However, when contrast is improved or image features are highlighted, some image information is lost, and the defogged images are noticeably distorted.
Methods based on atmospheric scattering models [19]-[25] model the fog in the image and study the defogging mechanism or add other prior knowledge (e.g., scene depth information [26], [27]) to produce a clear image. Among these methods, the classic algorithms are the dark channel defogging method proposed by He et al. [23], an approach based on Markov random fields presented by Tan [21], and a visibility restoration algorithm proposed by Tarel and Hautière [28]. Image defogging methods based on atmospheric scattering models provide better defogging results than image enhancement. However, the parameters used in these methods, such as the defogging coefficient and the transmittance, are selected empirically, so the resulting images still exhibit some distortion.
With the development of deep learning (DL), recent research has increasingly explored defogging methods based on DL. Some researchers obtain the transmission map of a foggy image through a DL network and then defog the image based on an atmospheric scattering model [29]-[32]. This kind of method does not need prior knowledge, but its dependence on parameters and models still causes slight image distortion. Other researchers have designed neural networks for end-to-end defogging [33]-[38]. Moreover, with the success of GANs in image inpainting and image enhancement, researchers have also proposed GAN-based image defogging methods [39]-[44], which greatly improve the quality of defogging. In addition to studies on defogging, researchers have studied methods for obtaining optical flow from foggy images [45].
Semantic segmentation is a high-level perception task for robotics and autonomous driving. Early semantic segmentation methods include color slices and conditional random fields (CRFs). With the development of DL, DL-based semantic segmentation methods have greatly improved segmentation accuracy. The fully convolutional network (FCN) [46] was the first DL-based semantic segmentation method; however, because of its pooling operations, some information is lost, and its segmentation accuracy is relatively low. To increase accuracy, many improved semantic segmentation frameworks [15], [47]-[56] and improved loss functions [51] were subsequently proposed. Most DL-based semantic segmentation methods are supervised. Supervised methods can achieve good segmentation results, but they require a large amount of annotated data. To address this problem, Hoffman et al. [57] and Zhang et al. [58] proposed training semantic segmentation models on synthetic datasets and adapting them to real data via transfer learning.
Luc et al. [10] introduced GANs into the field of semantic segmentation. The generator's input is the image to be segmented, and its output is the semantic segmentation classification of the image. The discriminator's input is either the ground truth semantic segmentation classification or the generated one, and its output is a judgment of whether the input is the true value. In addition, considering GANs' outstanding performance in transfer learning, researchers have proposed a series of semantic segmentation GANs based on transfer learning. Pix2Pix [12] is a typical GAN model for semantic segmentation that treats semantic segmentation as an image-to-image translation problem and builds a general conditional GAN to solve it. Because domain adaptation cannot capture pixel-level and low-level domain shifts, Hoffman et al. [13] proposed cycle-consistent adversarial domain adaptation (CyCADA), which adapts representations at both the pixel level and the feature level and improves the precision of semantic segmentation.
An unsupervised general framework that extracts features shared by the source domain and target domain was proposed by Murez et al. [59]. To address the domain mismatch between real and synthetic images, Hong et al. [60] proposed a network that integrates a GAN into the FCN framework to reduce the gap between the source and target domains, and Luo et al. [61] proposed a category-level adversarial network that enforces local semantic consistency during global alignment. To improve performance and address the limited-dataset problem of domain adaptation, Li et al. [62] presented a bidirectional learning framework for semantic segmentation in which the image translation model and the segmentation adaptation model are trained alternately and promote each other.
The approaches above can directly handle clear images and achieve state-of-the-art performance. However, they cannot handle foggy images well because of the weak texture of such images. To the best of our knowledge, there has been no research on a direct semantic segmentation method for foggy images.
Unlike current semantic segmentation GANs [10], [12], which handle clear images and consist of a single stage, FISS GAN (Fig. 1) handles foggy images and contains two parts: the edge GAN and the semantic segmentation GAN. The purpose of the edge GAN is to obtain the edge information of foggy images to assist the semantic segmentation task. Edges extracted directly from foggy images contain all detailed edge information, whereas the edge information needed for semantic segmentation is only the boundary information between semantic regions. Therefore, we use the edge information extracted from the ground truth semantic segmentation image as the ground truth in our edge GAN instead of the edge information from the clear image.
To clarify, we extracted both kinds of edges with the Canny algorithm [63]. The visual differences between the two edges are shown in Fig. 2. As seen in Fig. 2, the edge extracted directly from the foggy image contains too much information that is useless for semantic segmentation. In contrast, the edge extracted from the semantic segmentation ground truth is just the boundary of the semantic regions, which is appropriate for semantic segmentation.
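The two kinds of edges discussed above can be reproduced with OpenCV's Canny detector. The sketch below is illustrative only; the file names and thresholds are placeholders rather than the settings used in the paper.

```python
import cv2

# Edges extracted directly from the foggy image: many fog-corrupted details survive.
foggy = cv2.imread("foggy_image.png", cv2.IMREAD_GRAYSCALE)        # placeholder path
edges_from_foggy = cv2.Canny(foggy, 100, 200)

# Edges extracted from the color semantic segmentation ground truth:
# only the boundaries between semantic regions remain.
label_color = cv2.imread("semantic_gt_color.png")                   # placeholder path
label_gray = cv2.cvtColor(label_color, cv2.COLOR_BGR2GRAY)
edges_from_labels = cv2.Canny(label_gray, 1, 1)    # any label change produces an edge

cv2.imwrite("edges_from_foggy.png", edges_from_foggy)
cv2.imwrite("edges_from_labels.png", edges_from_labels)
```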
The purpose of the semantic segmentation GAN is to accomplish the semantic segmentation of foggy images. The inputs of the semantic segmentation GAN are foggy images and the edge images generated by the edge GAN, and its outputs are the semantic segmentation results of the foggy images. Therefore, based on the mathematical model for the semantic segmentation of foggy images (formula (5)), the mathematical model of FISS GAN can be expressed as follows:
S(x)=F(f(I(x),Egan(x))) | (6) |
where Egan(x) denotes the edge image generated by the edge GAN from the foggy image I(x); S(x), F(⋅), and f(⋅) are defined as in (5).
To further improve feature extraction and expression abilities, we combine ideas from U_Net [15] for learning convolution and deconvolution features and propose a new network architecture, namely, dilated convolution U_Net (Fig. 3). Dilated convolution U_Net consists of three convolution layers (C1, C2, and C3), four dilated convolution layers (DC), two transposed convolution layers (CT1 and CT2), and three feature fusion operations, which are carried out in the following three steps:
Step 1: Fuse C3 and DC; the fused feature is fed to CT1.
Step 2: Fuse C2 and CT1; the fused feature is fed to CT2.
Step 3: Fuse C1 and CT2; the fused feature is fed to the subsequent output convolution layer (G1_C3 or G2_C3).
The fusion operation used in this paper is concatenation. The three convolution layers and four dilated convolution layers extract the input features, and the two deconvolution layers express the extracted features. The size of each layer's features is shown in Fig. 3.
The differences between dilated convolution U_Net and U_Net [15] are as follows: 1) Dilated convolution U_Net incorporates dilated convolution layers to improve feature extraction ability. 2) In feature fusion, because the feature sizes of the convolution layers and deconvolution layers in U_Net [15] differ, the features of the convolution layers are cropped, and this operation leads to features that do not correspond exactly; thus, some information may be lost in the fusion step. In the dilated convolution U_Net proposed in this study, the feature sizes of the convolution layers and their corresponding deconvolution layers are the same, so the features of the convolution layers can be fused directly with the features of the deconvolution layers, and no information is lost in the fusion step. 3) U_Net performs image feature extraction and expression with convolution layers, maximum pooling layers, and upsampling layers (a bilinear layer followed by a convolution layer, or transposed convolution layers); it consists of 23 convolution layers, 4 maximum pooling layers, and 4 upsampling layers, and with its convolution kernels and step sizes, the number of parameters that need to be trained is 17,268,563. The dilated convolution U_Net proposed in this paper performs feature extraction and expression with convolution layers, dilated convolution layers, and transposed convolution layers; it consists of 3 convolution layers, 4 dilated convolution layers, and 2 transposed convolution layers, and with its convolution kernels and step sizes (Table I), the number of parameters that need to be trained is 4,335,424. The more parameters that need to be trained, the more computation is required. Therefore, dilated convolution U_Net has fewer network layers, fewer parameters, and less computation than U_Net.
Network | Layer | Kernel | Stride | Padding
Dilated convolution U_Net | C1 | [7×7, 64] | 1 | 3
| C2 | [4×4, 128] | 2 | 1
| C3 | [4×4, 256] | 2 | 1
| DC | [3×3, 256] | 1 | 1
| CT1 | [4×4, 128] | 2 | 1
| CT2 | [4×4, 64] | 2 | 1
| G1_C3 | [7×7, 1] | 1 | 3
| G2_C3 | [7×7, 19] | 1 | 3
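For reference, the layer specification in Table I can be assembled into a compact PyTorch module. The sketch below follows Table I and the fusion steps described above (concatenation of C3 with DC, C2 with CT1, and C1 with CT2). The activation functions, dilation rates of the four DC layers, and per-layer padding of the dilated convolutions are not fully specified in the table, so the choices here (ReLU, dilation 2, padding equal to the dilation so that feature sizes stay aligned for fusion) are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn

class DilatedConvUNet(nn.Module):
    """Generator backbone following Table I (activations and dilation rates assumed)."""

    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.c1 = nn.Sequential(nn.Conv2d(in_ch, 64, 7, stride=1, padding=3), nn.ReLU(inplace=True))
        self.c2 = nn.Sequential(nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.c3 = nn.Sequential(nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        # Four dilated 3x3 convolutions; padding = dilation preserves the spatial size
        # so that DC can be concatenated with C3 (Table I lists padding 1).
        self.dc = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(256, 256, 3, stride=1, padding=2, dilation=2), nn.ReLU(inplace=True))
            for _ in range(4)
        ])
        self.ct1 = nn.Sequential(nn.ConvTranspose2d(256 + 256, 128, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.ct2 = nn.Sequential(nn.ConvTranspose2d(128 + 128, 64, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        # Output layer: G1_C3 (out_ch = 1, edge map) or G2_C3 (out_ch = 19 classes).
        self.out = nn.Conv2d(64 + 64, out_ch, 7, stride=1, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        c1 = self.c1(x)
        c2 = self.c2(c1)
        c3 = self.c3(c2)
        dc = self.dc(c3)
        f1 = torch.cat([c3, dc], dim=1)        # Step 1: fuse C3 and DC
        ct1 = self.ct1(f1)
        f2 = torch.cat([c2, ct1], dim=1)       # Step 2: fuse C2 and CT1
        ct2 = self.ct2(f2)
        f3 = torch.cat([c1, ct2], dim=1)       # Step 3: fuse C1 and CT2
        return torch.sigmoid(self.out(f3))     # sigmoid output, as described in Section IV
```

Counting parameters on such a sketch with `sum(p.numel() for p in net.parameters())` gives a figure in the same few-million range as the 4,335,424 reported above; the exact number depends on the assumed input channels and fusion details.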
The architecture of the edge GAN, as shown in Fig. 1, includes the edge generator G1 and the edge discriminator D1. The purpose of G1 is to generate an edge image similar to the ground truth edge image. G1 is composed of the dilated convolution U_Net and one convolution layer (G1_C3). Because an edge image contains only the pixel values 0 and 255, it can be expressed as single-channel image data; therefore, the output size of G1_C3 is 1×H×W. The purpose of D1 is to determine whether its input edge image is the ground truth or a generated image and to provide feedback to G1 (see the false binary cross entropy (BCE) loss term in (8)).
The loss function plays an important role in a neural network model, as it determines whether the model converges and how accurate it becomes. Since the edge GAN includes G1 and D1, its loss consists of the loss function of G1 and that of D1. The inputs of D1 are the ground truth edge images and the edge images generated by G1, where the ground truth edge image is obtained by applying the Canny algorithm [63] to the semantic segmentation image. The output of D1 indicates whether its input is real; specifically, it is a probability matrix with values in [0, 1].
The probability matrix is expected to be close to 1 when the ground truth edge image passes through D1, indicating that the edge image is the ground truth (the label matrix has the same size as the output matrix, with all values equal to 1). In contrast, the probability matrix of a generated edge image passing through D1 should be close to 0, indicating that the edge image is generated (the label matrix again has the same size as the output matrix, with all values equal to 0). Therefore, the discriminator loss function of the edge GAN (D1 loss) is designed as the BCE loss between the discriminator output and its corresponding label.
Since the output of D1 includes the true value probability obtained by taking the ground truth of the edge image as the input and the false value probability obtained by taking the generated edge image as the input, the D1 loss has two parts: the BCE loss between the true value probability and 1, namely, true BCE loss, and the BCE loss between the false value probability and 0, namely, false BCE loss. Specifically, D1 loss is the average of true BCE loss and false BCE loss.
Let the foggy image be F, the ground truth edge image be FE, and the edge image generated by G1 from F be ~FE. Let FE_L and ~FE_L denote the output probabilities of D1 for FE and ~FE, respectively. The D1 loss can then be expressed as
$$D1_{Loss} = \frac{BCELoss(FE_L, 1) + BCELoss(\widetilde{FE}_L, 0)}{2} = \frac{-\log(FE_L) - \log(1 - \widetilde{FE}_L)}{2} \tag{7}$$
The features of the D1 convolution layers can adequately express the ground truth edge image or the generated edge image. Therefore, G1's generation ability is improved by narrowing the gap between the features of the ground truth edge image and those of the generated edge image, and this gap is measured by L1 losses. Meanwhile, the false BCE loss from ~FE_L is fed back to G1 as adversarial guidance.
Hence, the G1 loss is the sum of the L1 losses over the layer features and the false BCE loss. If the input is FE, the outputs of the convolution layers of D1 are DC_i (i = 1, ..., 5); if the input is ~FE, the corresponding outputs are ~DC_i. The G1 loss can then be expressed as
$$G1_{Loss} = \sum_{i=1}^{5} L1Loss(DC_i, \widetilde{DC}_i) + BCELoss(\widetilde{FE}_L, 0) = \sum_{i=1}^{5} \left|DC_i - \widetilde{DC}_i\right| - \log(\widetilde{FE}_L) \tag{8}$$
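A minimal PyTorch sketch of the loss terms in (7) and (8) is given below. It assumes the discriminator returns both its final probability map and its five intermediate convolution-layer features; the adversarial term for G1 is implemented in the standard non-saturating form, i.e., −log of the discriminator output on the generated edge, which matches the closed-form expression in (8).

```python
import torch
import torch.nn.functional as F

def d1_loss(real_prob: torch.Tensor, fake_prob: torch.Tensor) -> torch.Tensor:
    """Discriminator loss in (7): average of the true and false BCE terms."""
    true_bce = F.binary_cross_entropy(real_prob, torch.ones_like(real_prob))
    false_bce = F.binary_cross_entropy(fake_prob, torch.zeros_like(fake_prob))
    return 0.5 * (true_bce + false_bce)

def g1_loss(real_feats, fake_feats, fake_prob: torch.Tensor) -> torch.Tensor:
    """Generator loss in (8): feature-matching L1 terms plus an adversarial term."""
    fm = sum(F.l1_loss(rf.detach(), ff) for rf, ff in zip(real_feats, fake_feats))
    adv = -torch.log(fake_prob + 1e-8).mean()   # -log(~FE_L), the closed form in (8)
    return fm + adv
```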
Similar to the edge GAN, the semantic segmentation GAN includes the semantic segmentation generator G2 and the semantic segmentation discriminator D2. The goal of G2 is to generate semantic segmentation classifications that match the ground truth classifications. G2 is composed of the dilated convolution U_Net and one convolution layer (G2_C3). The goal of the semantic segmentation GAN is to divide the foggy images into n classes; therefore, the output size of G2_C3 is n×H×W. The purpose of D2 is to judge whether the generated semantic segmentation image is the ground truth image and to provide feedback to G2 (see the false BCE loss term in (10)).
The inputs of D2 are the ground truth semantic segmentation image of the foggy image and the semantic segmentation image generated by G2, and its output is a probability matrix with values in [0, 1] indicating whether the input is the ground truth. Therefore, similar to the D1 loss of the edge GAN, the discriminator loss function of the semantic segmentation GAN (D2 loss) includes two parts: the BCE loss between the true value probability and 1, namely, the true BCE loss, and the BCE loss between the false value probability and 0, namely, the false BCE loss. Specifically, the D2 loss is the average of the true BCE loss and the false BCE loss.
Let the ground truth semantic segmentation image be FS and the semantic segmentation image generated by G2 be ~FS, and let FS_L and ~FS_L denote the output probabilities of D2 for FS and ~FS, respectively. The D2 loss can then be expressed as
$$D2_{Loss} = \frac{BCELoss(FS_L, 1) + BCELoss(\widetilde{FS}_L, 0)}{2} = \frac{-\log(FS_L) - \log(1 - \widetilde{FS}_L)}{2} \tag{9}$$
In this paper, the loss function of G2 is designed as the sum of three parts: the L1 loss between FS_L and ~FS_L, the false BCE loss from ~FS_L, and the cross-entropy loss between the generated semantic segmentation image ~FS and the ground truth FS. It can be expressed as
$$G2_{Loss} = L1Loss(FS_L, \widetilde{FS}_L) + BCELoss(\widetilde{FS}_L, 0) + CrossEntropyLoss(\widetilde{FS}, FS) = \left|FS_L - \widetilde{FS}_L\right| - \sum_{j=0}^{n-1}\left(\log\frac{e^{\widetilde{FS}_j}}{\sum_{j=0}^{n-1} e^{\widetilde{FS}_j}} \times FS_j\right) - \log(\widetilde{FS}_L) \tag{10}$$
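The semantic segmentation losses in (9) and (10) can be sketched similarly. In the sketch below, the generated segmentation is assumed to be an n-channel score map (n = 19 for the foggy cityscapes dataset) and the ground truth a map of integer class labels, so the cross-entropy term uses PyTorch's `F.cross_entropy`, which applies the softmax in (10) internally; the adversarial term again uses the −log(~FS_L) closed form. The D2 loss in (9) is identical in form to `d1_loss` above.

```python
import torch
import torch.nn.functional as F

def g2_loss(real_prob: torch.Tensor,   # D2 output on the ground truth segmentation (FS_L)
            fake_prob: torch.Tensor,   # D2 output on the generated segmentation (~FS_L)
            fake_seg: torch.Tensor,    # generated score map, shape (B, n, H, W)
            gt_labels: torch.Tensor    # ground truth class indices (long), shape (B, H, W)
            ) -> torch.Tensor:
    """Generator loss in (10): L1 between D2 outputs + adversarial term + cross entropy."""
    l1_term = F.l1_loss(fake_prob, real_prob.detach())
    adv_term = -torch.log(fake_prob + 1e-8).mean()     # -log(~FS_L), the closed form in (10)
    ce_term = F.cross_entropy(fake_seg, gt_labels)     # softmax + negative log likelihood
    return l1_term + adv_term + ce_term
```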
The loss functions of the edge GAN are linear combinations of several existing loss functions, each of which has been shown to converge and is commonly used in GANs. Therefore, these linear combinations also converge, and the same argument applies to the loss functions of the semantic segmentation GAN.
The foggy cityscapes dataset [65] is a synthetic foggy dataset with 19 classes (road, sidewalk, building, wall, etc.) for semantic foggy scene understanding (SFSU). For each attenuation coefficient β ∈ {0.005, 0.01, 0.02}, it contains 2975 training images and 500 validation images (β is the attenuation coefficient; the higher β is, the denser the fog in the image). Owing to the different attenuation coefficients, we separate the foggy cityscapes dataset into three datasets: Dataset 1 (β = 0.005), Dataset 2 (β = 0.01), and Dataset 3 (β = 0.02), each composed of 2975 training images and 500 validation images. The corresponding semantic segmentation ground truth contains color semantic segmentation images, label semantic segmentation images, instance-label images, and label files with polygon data. The ground truth edge images are obtained from the color semantic segmentation images with the Canny algorithm [63].
The foggy driving dataset [65] is a dataset of 101 real-world images that can be used to evaluate trained models. We separately use Dataset 1, Dataset 2, and Dataset 3 to train models and use the foggy driving dataset [65] as the test set to evaluate them. Because the training and validation data are limited, we apply random flip, random crop, rotation, and translation operations to the data during training and validation to avoid overfitting.
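Because these augmentations must keep the foggy image, its edge map, and its label map aligned, a paired transform is needed. The sketch below uses torchvision's functional API with shared random parameters on PIL images; the flip probability, crop size, rotation range, and translation range are illustrative values rather than the paper's exact settings.

```python
import random
import torchvision.transforms.functional as TF

def paired_augment(image, label, crop_size=(256, 256), max_angle=10, max_shift=0.1):
    """Apply the same random flip / rotation / translation / crop to image and label."""
    if random.random() < 0.5:                               # random horizontal flip
        image, label = TF.hflip(image), TF.hflip(label)

    angle = random.uniform(-max_angle, max_angle)           # random rotation + translation
    dx = int(random.uniform(-max_shift, max_shift) * image.width)
    dy = int(random.uniform(-max_shift, max_shift) * image.height)
    image = TF.affine(image, angle=angle, translate=(dx, dy), scale=1.0, shear=0.0)
    label = TF.affine(label, angle=angle, translate=(dx, dy), scale=1.0, shear=0.0)

    top = random.randint(0, image.height - crop_size[0])    # random crop
    left = random.randint(0, image.width - crop_size[1])
    image = TF.crop(image, top, left, crop_size[0], crop_size[1])
    label = TF.crop(label, top, left, crop_size[0], crop_size[1])
    return image, label
```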
The activation function of the dilated convolution U_Net is ReLU [66], while that of the output layers G1_C3 and G2_C3 is sigmoid. The activation function of the first four layers in D1 and D2 is LeakyReLU [67] with a negative slope of 0.25, while that of the last layer is sigmoid. The optimization algorithm of both the edge GAN and the semantic segmentation GAN is Adam [68]. The input size is 256 × 256, and the number of training epochs is 100. The architecture parameters of the edge GAN and semantic segmentation GAN are shown in Tables I and II.
Layer | Kernel | Stride | Padding
D1_C1, D2_C1 | [4×4, 64] | 2 | 1 |
D1_C2, D2_C2 | [4×4, 128] | 2 | 1 |
D1_C3, D2_C3 | [4×4, 256] | 2 | 1 |
D1_C4, D2_C4 | [4×4, 512] | 1 | 1 |
D1_C5, D2_C5 | [4×4, 1] | 1 | 1 |
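Table II resembles a PatchGAN-style discriminator. A PyTorch sketch consistent with the table and with the activation choices stated above (LeakyReLU with slope 0.25 on the first four layers, sigmoid on the last) is shown below; the input channel count is left as a parameter, since D1 judges single-channel edge maps while D2 judges segmentation images, and the exact channel counts of the discriminator inputs are an assumption here.

```python
import torch.nn as nn

def make_discriminator(in_ch: int) -> nn.Sequential:
    """Discriminator D1/D2 following Table II (layer sizes from the table, activations as stated)."""
    def block(cin, cout, stride):
        return [nn.Conv2d(cin, cout, 4, stride=stride, padding=1),
                nn.LeakyReLU(0.25, inplace=True)]

    return nn.Sequential(
        *block(in_ch, 64, 2),      # D*_C1
        *block(64, 128, 2),        # D*_C2
        *block(128, 256, 2),       # D*_C3
        *block(256, 512, 1),       # D*_C4
        nn.Conv2d(512, 1, 4, stride=1, padding=1),   # D*_C5
        nn.Sigmoid(),              # probability map in [0, 1]
    )
```

For the feature-matching term in (8), the intermediate activations of this network would additionally have to be exposed, for example by running the blocks one at a time and collecting their outputs.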
To the best of our knowledge, there is no direct semantic segmentation method for foggy images to compare against; however, OCR [48] and HANet [49] have achieved remarkable results on the public cityscapes dataset without additional training data, with HANet [49] achieving the best performance. To verify the performance of FISS GAN, we compare it with OCR [48] and HANet [49]. Our training and validation data come from the foggy cityscapes dataset described above, and we separately train OCR [48], HANet [49], and FISS GAN on Dataset 1, Dataset 2, and Dataset 3. Meanwhile, we use the foggy driving dataset as the test data.
The qualitative experimental results on Dataset 1, Dataset 2, and Dataset 3 are shown in Figs. 4-6, respectively. The semantic segmentation results of FISS GAN are better than those of OCR [48] and HANet [49] on each dataset. To further quantify the performance of each model, the mean intersection over union (IoU) score of each model is reported in Table III. As shown in Table III, the mean IoU scores of FISS GAN on Dataset 1, Dataset 2, and Dataset 3 are 69.37%, 65.94%, and 64.01%, respectively, all higher than the corresponding scores of OCR [48] and HANet [49]; thus, FISS GAN achieves state-of-the-art performance. These results indicate that FISS GAN can extract more features from a foggy image than OCR [48] and HANet [49]. Meanwhile, regardless of the method, the mean IoU score on Dataset 1 is higher than that on Dataset 2 and Dataset 3. According to our analysis, the reason is that the images in Dataset 1 have a smaller attenuation coefficient, so the fog-induced increase in pixel values is smaller and the images retain more texture than those in Dataset 2 and Dataset 3; therefore, it is easier to extract and express the features of images in Dataset 1.
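For completeness, the mean IoU reported in Table III and the pixel accuracy reported in the next paragraph can be computed from a confusion matrix accumulated over the evaluation set. This is a generic sketch of those metrics, not code from the paper.

```python
import numpy as np

def confusion_matrix(pred: np.ndarray, gt: np.ndarray, n_classes: int = 19) -> np.ndarray:
    """Accumulate an n_classes x n_classes confusion matrix from two label maps."""
    mask = (gt >= 0) & (gt < n_classes)                        # ignore invalid labels
    idx = n_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

def mean_iou_and_pixel_acc(conf: np.ndarray):
    tp = np.diag(conf)
    iou = tp / (conf.sum(axis=0) + conf.sum(axis=1) - tp + 1e-10)  # per-class IoU
    pixel_acc = tp.sum() / (conf.sum() + 1e-10)
    return iou.mean(), pixel_acc
```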
Additionally, we test the pixel accuracy of the edge GAN on each dataset. The qualitative results are shown in Fig. 7, and the quantitative results are shown in Table IV. The pixel accuracy on Dataset 1 is 87.79%, slightly higher than on Dataset 2 and Dataset 3, which indicates that the edge GAN can efficiently generate edge images and that more edge features can be extracted from the dataset with less fog.
Datasets | Pixel accuracy (%) |
Dataset 1 | 87.79 |
Dataset 2 | 87.35 |
Dataset 3 | 86.92 |
We use the validation data of OCR [48], HANet [49], and FISS GAN to create a mean IoU diagram (Fig. 8) and a loss diagram (Fig. 9) for each model. The X-axis of both Fig. 8 and Fig. 9 is the epoch; the Y-axis of Fig. 8 is the mean IoU value, while the Y-axis of Fig. 9 is the loss value. More specifically, the loss values of OCR [48] and HANet [49] were obtained from their open-source code, while the loss value of FISS GAN is computed with the loss functions defined in Section III.
To verify that the dilated convolutions in dilated convolution U_Net can extract more features than standard convolutions, we separately use dilated convolution U_Net and a standard convolution U_Net (with the dilated convolution layers replaced by standard convolution layers) to train and test FISS GAN (the edge GAN and the semantic segmentation GAN). The datasets (training and test datasets), FISS GAN parameters, and epoch numbers are the same as in the above experiments. The pixel accuracy and mean IoU are shown in Table V. As seen in Table V, regardless of the dataset, the pixel accuracy and mean IoU achieved with dilated convolution U_Net are higher than those of standard convolution U_Net.
Datasets | Pixel accuracy (%) | Mean IoU (%)
| Standard convolution U_Net | Dilated convolution U_Net | Standard convolution U_Net | Dilated convolution U_Net
Dataset 1 | 66.35 | 87.79 | 58.95 | 69.37
Dataset 2 | 65.97 | 87.35 | 56.49 | 65.94
Dataset 3 | 64.91 | 86.92 | 54.36 | 64.01
Additionally, to verify the effect of the edges on FISS GAN, we replace the edges obtained from the semantic segmentation images with edges obtained from the foggy images and train FISS GAN (the edge GAN and the semantic segmentation GAN) with the same experimental settings as above. The pixel accuracy and mean IoU are shown in Table VI. As seen in Table VI, on the same dataset, the pixel accuracy and mean IoU achieved with edges from the semantic segmentation images are clearly higher than those achieved with edges from the foggy images. This experiment indicates that the edges obtained from the semantic segmentation images provide more guidance than the edges obtained from foggy images.
Datasets | Pixel accuracy (%) | Mean IoU (%)
| Edges of foggy images | Edges of semantic segmentation images | Edges of foggy images | Edges of semantic segmentation images
Dataset 1 | 74.41 | 87.79 | 60.07 | 69.37
Dataset 2 | 74.25 | 87.35 | 58.42 | 65.94
Dataset 3 | 73.68 | 86.92 | 57.85 | 64.01
Current semantic segmentation methods for foggy images are based on defogged images or on models trained with clear images and do not explore the relationship between foggy images and their semantic segmentation images. In this paper, a semantic segmentation method (FISS GAN) that can directly process foggy images has been proposed. FISS GAN is composed of an edge GAN and a semantic segmentation GAN. Specifically, FISS GAN first obtains edge information from foggy images with the edge GAN and then produces semantic segmentation results with the semantic segmentation GAN, using foggy images and their edge information as inputs. Experiments on the foggy cityscapes and foggy driving datasets have shown that FISS GAN can directly extract features from foggy images and achieves state-of-the-art semantic segmentation results. Although FISS GAN can directly extract features from a foggy image and realize its semantic segmentation, it cannot accurately segment foggy images with limited texture. In the future, we will focus on designing a more efficient feature extraction network to improve the accuracy of the semantic segmentation of foggy images.
[1] L. Chen, W. J. Zhan, W. Tian, Y. H. He, and Q. Zou, “Deep integration: A multi-label architecture for road scene recognition,” IEEE Trans. Image Process., vol. 28, no. 10, pp. 4883–4898, Oct. 2019. doi: 10.1109/TIP.2019.2913079
[2] K. Wada, K. Okada, and M. Inaba, “Joint learning of instance and semantic segmentation for robotic pick-and-place with heavy occlusions in clutter,” in Proc. IEEE Int. Conf. Robotics and Autom., Montreal, Canada, 2019, pp. 9558–9564.
[3] Y. C. Ouyang, L. Dong, L. Xue, and C. Y. Sun, “Adaptive control based on neural networks for an uncertain 2-DOF helicopter system with input deadzone and output constraints,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 3, pp. 807–815, May 2019. doi: 10.1109/JAS.2019.1911495
[4] Y. H. Luo, S. N. Zhao, D. S. Yang, and H. W. Zhang, “A new robust adaptive neural network backstepping control for single machine infinite power system with TCSC,” IEEE/CAA J. Autom. Sinica, vol. 7, no. 1, pp. 48–56, Jan. 2020.
[5] N. Zerari, M. Chemachema, and N. Essounbouli, “Neural network based adaptive tracking control for a class of pure feedback nonlinear systems with input saturation,” IEEE/CAA J. Autom. Sinica, vol. 6, no. 1, pp. 278–290, Jan. 2019. doi: 10.1109/JAS.2018.7511255
[6] D. Wu and X. Luo, “Robust latent factor analysis for precise representation of high-dimensional and sparse data,” IEEE/CAA J. Autom. Sinica, pp. 766–805, Dec. 2019.
[7] X. Luo, Y. Yuan, S. L. Chen, N. Y. Zeng, and Z. D. Wang, “Position-transitional particle swarm optimization-incorporated latent factor analysis,” IEEE Trans. Knowl. Data Eng., pp. 1–1, Oct. 2019.
[8] A. Cantor, “Optics of the atmosphere: Scattering by molecules and particles,” IEEE J. Quantum Elect., vol. 14, no. 9, pp. 698–699, Sept. 1978.
[9] S. G. Narasimhan and S. K. Nayar, “Vision and the atmosphere,” Int. J. Comput. Vision, vol. 48, no. 3, pp. 233–254, Jul. 2002. doi: 10.1023/A:1016328200723
[10] P. Luc, C. Couprie, S. Chintala, and J. Verbeek, “Semantic segmentation using adversarial networks,” arXiv preprint arXiv: 1611.08408, Dec. 2016.
[11] A. Arnab, S. Jayasumana, S. Zheng, and P. H. S. Torr, “Higher order conditional random fields in deep neural networks,” in Proc. European Conf. Computer Vision, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham, Germany: Springer, 2016, pp. 524–540.
[12] P. Isola, J. Y. Zhu, T. H. Zhou, and A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Honolulu, USA, 2017, pp. 1125–1134.
[13] J. Hoffman, E. Tzeng, T. Park, Y. J. Zhu, P. Isola, K. Saenko, A. Efros, and T. Darrell, “CyCADA: Cycle-consistent adversarial domain adaptation,” in Proc. 35th Int. Conf. Machine Learning, Stockholm, Sweden, 2018, pp. 1989–1998.
[14] K. Nazeri, E. Ng, T. Joseph, F. Z. Qureshi, and M. Ebrahimi, “EdgeConnect: Generative image inpainting with adversarial edge learning,” arXiv preprint arXiv: 1901.00212, Jan. 2019.
[15] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in Proc. Int. Conf. Medical Image Computing and Computer-Assisted Intervention. Cham, Germany: Springer, 2015, pp. 234–241.
[16] J. Y. Kim, L. S. Kim, and S. H. Hwang, “An advanced contrast enhancement using partially overlapped sub-block histogram equalization,” in Proc. IEEE Int. Symposium on Circuits and Systems, Geneva, Switzerland, 2000, pp. 537–540.
[17] A. Eriksson, G. Capi, and K. Doya, “Evolution of meta-parameters in reinforcement learning algorithm,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Las Vegas, USA, 2003, pp. 412–417.
[18] M. J. Seow and V. K. Asari, “Ratio rule and homomorphic filter for enhancement of digital colour image,” Neurocomputing, vol. 69, no. 7–9, pp. 954–958, Mar. 2006. doi: 10.1016/j.neucom.2005.07.003
[19] S. Shwartz, E. Namer, and Y. Y. Schechner, “Blind haze separation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, New York, USA, 2006, pp. 1984–1991.
[20] Y. Y. Schechner and Y. Averbuch, “Regularized image recovery in scattering media,” IEEE Trans. Pattern Anal., vol. 29, no. 9, pp. 1655–1660, Sept. 2007. doi: 10.1109/TPAMI.2007.1141
[21] R. T. Tan, “Visibility in bad weather from a single image,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Anchorage, USA, 2008, pp. 1–8.
[22] R. Fattal, “Single image dehazing,” ACM Trans. Graphic., vol. 27, no. 3, pp. 1–9, Aug. 2008.
[23] K. M. He, J. Sun, and X. O. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. Pattern Anal., vol. 33, no. 12, pp. 2341–2353, Dec. 2011. doi: 10.1109/TPAMI.2010.168
[24] K. B. Gibson and T. Q. Nguyen, “On the effectiveness of the dark channel prior for single image dehazing by approximating with minimum volume ellipsoids,” in Proc. IEEE Int. Conf. Acoustics, Speech and Signal Processing, Prague, Czech Republic, 2011, pp. 1253–1256.
[25] D. F. Shi, B. Li, W. Ding, and Q. M. Chen, “Haze removal and enhancement using transmittance-dark channel prior based on object spectral characteristic,” Acta Autom. Sinica, vol. 39, no. 12, pp. 2064–2070, Dec. 2013.
[26] S. G. Narasimhan and S. K. Nayar, “Interactive (de)weathering of an image using physical models,” in Proc. IEEE Workshop on Color and Photometric Methods in Computer Vision, vol. 6, no. 4, Article No. 1, Jan. 2003.
[27] S. G. Narasimhan and S. K. Nayar, “Chromatic framework for vision in bad weather,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Hilton Head, USA, 2000, pp. 598–605.
[28] J. Tarel and N. Hautière, “Fast visibility restoration from a single color or gray level image,” in Proc. 12th IEEE Int. Conf. Computer Vision, Kyoto, Japan, 2009, pp. 2201–2208.
[29] H. Zhang, V. Sindagi, and V. M. Patel, “Joint transmission map estimation and dehazing using deep networks,” IEEE Trans. Circ. Syst. Vid., vol. 30, no. 7, Jul. 2020.
[30] W. Q. Ren, S. Liu, H. Zhang, J. S. Pan, X. C. Cao, and M. H. Yang, “Single image dehazing via multi-scale convolutional neural networks,” in Proc. European Conf. Computer Vision, B. Leibe, J. Matas, N. Sebe, and M. Welling, Eds. Cham, Germany: Springer, 2016, pp. 154–169.
[31] H. Zhang and V. M. Patel, “Densely connected pyramid defogging network,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 3194–3203.
[32] B. L. Cai, X. M. Xu, K. Jia, C. M. Qing, and D. C. Tao, “DehazeNet: An end-to-end system for single image haze removal,” IEEE Trans. Image Process., vol. 25, no. 11, pp. 5187–5198, Nov. 2016. doi: 10.1109/TIP.2016.2598681
[33] D. D. Chen, M. M. He, Q. N. Fan, J. Liao, L. H. Zhang, D. D. Hou, L. Yuan, and G. Hua, “Gated context aggregation network for image dehazing and deraining,” in Proc. IEEE Winter Conf. Applications of Computer Vision, Waikoloa, USA, 2019, pp. 1375–1383.
[34] S. Y. Huang, H. X. Li, Y. Yang, B. Wang, and N. N. Rao, “An end-to-end dehazing network with transitional convolution layer,” Multidim. Syst. Sign. Process., vol. 31, no. 4, pp. 1603–1623, Mar. 2020. doi: 10.1007/s11045-020-00723-2
[35] H. Zhang, V. Sindagi, and V. M. Patel, “Multi-scale single image dehazing using perceptual pyramid deep network,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Salt Lake City, USA, 2018, pp. 902–911.
[36] X. Qin, Z. L. Wang, Y. C. Bai, X. D. Xie, and H. Z. Jia, “FFA-Net: Feature fusion attention network for single image defogging,” arXiv preprint arXiv: 1911.07559, Nov. 2019.
[37] Q. S. Yi, A. W. Jiang, J. C. Li, J. Y. Wan, and M. W. Wang, “Progressive back-traced dehazing network based on multi-resolution recurrent reconstruction,” IEEE Access, vol. 8, pp. 54514–54521, Mar. 2020. doi: 10.1109/ACCESS.2020.2981491
[38] B. Y. Li, X. L. Peng, Z. Y. Wang, J. Z. Xu, and D. Feng, “An all-in-one network for defogging and beyond,” arXiv preprint arXiv: 1707.06543, Jul. 2017.
[39] H. Y. Zhu, X. Peng, V. Chandrasekhar, L. Y. Li, and J. H. Lim, “DehazeGAN: When image dehazing meets differential programming,” in Proc. 27th Int. Joint Conf. Artificial Intelligence, Stockholm, Sweden, 2018, pp. 1234–1240.
[40] R. D. Li, J. S. Pan, Z. C. Li, and J. H. Tang, “Single image dehazing via conditional generative adversarial network,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 8202–8211.
[41] D. Engin, A. Genc, and H. K. Ekenel, “Cycle-dehaze: Enhanced CycleGAN for single image dehazing,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Salt Lake City, USA, 2018, pp. 825–833.
[42] G. Kim, J. Park, S. Ha, and J. Kwon, “Bidirectional deep residual learning for haze removal,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Long Beach, USA, 2018, pp. 46–54.
[43] A. Dudhane and S. Murala, “CDNet: Single image de-hazing using unpaired adversarial training,” in Proc. IEEE Winter Conf. Applications of Computer Vision, Waikoloa, USA, 2019, pp. 1147–1155.
[44] P. Sharma, P. Jain, and A. Sur, “Scale-aware conditional generative adversarial network for image defogging,” in Proc. IEEE Winter Conf. Applications of Computer Vision, Snowmass, USA, 2020, pp. 2355–2365.
[45] W. D. Yan, A. Sharma, and R. T. Tan, “Optical flow in dense foggy scenes using semi-supervised learning,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2020, pp. 13259–13268.
[46] J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Boston, USA, 2015, pp. 3431–3440.
[47] L. Chen, W. J. Zhan, J. J. Liu, W. Tian, and D. P. Cao, “Semantic segmentation via structured refined prediction and dual global priors,” in Proc. IEEE Int. Conf. Advanced Robotics and Mechatronics, Toyonaka, Japan, 2019, pp. 53–58.
[48] Y. H. Yuan, X. L. Chen, and J. D. Wang, “Object-contextual representations for semantic segmentation,” arXiv preprint arXiv: 1909.11065, Sept. 2019.
[49] S. Choi, J. T. Kim, and J. Choo, “Cars can’t fly up in the sky: Improving urban-scene segmentation via height-driven attention networks,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Seattle, USA, 2020, pp. 9373–9383.
[50] M. H. Yin, Z. L. Yao, Y. Cao, X. Li, Z. Zhang, S. Lin, and H. Hu, “Disentangled non-local neural networks,” arXiv preprint arXiv: 2006.06668, Sept. 2020.
[51] H. S. Zhao, J. P. Shi, X. J. Qi, X. G. Wang, and J. Y. Jia, “Pyramid scene parsing network,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Honolulu, USA, 2017, pp. 2881–2890.
[52] L. C. Chen, Y. K. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proc. European Conf. Computer Vision, Munich, Germany, 2018, pp. 801–818.
[53] H. Y. Chen, L. H. Tsai, S. C. Chang, J. Y. Pan, Y. T. Chen, W. Wei, and D. C. Juan, “Learning with hierarchical complement objective,” arXiv preprint arXiv: 1911.07257, Nov. 2019.
[54] J. Fu, J. Liu, H. J. Tian, Y. Li, Y. J. Bao, Z. W. Fang, and H. Q. Lu, “Dual attention network for scene segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 3146–3154.
[55] C. Zhang, G. S. Lin, F. Y. Liu, R. Yao, and C. H. Chen, “CANet: Class-agnostic segmentation networks with iterative refinement and attentive few-shot learning,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 5217–5226.
[56] J. J. He, Z. Y. Deng, L. Zhou, Y. L. Wang, and Y. Qiao, “Adaptive pyramid context network for semantic segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 7519–7528.
[57] J. Hoffman, D. Q. Wang, F. Yu, and T. Darrell, “FCNs in the wild: Pixel-level adversarial and constraint-based adaptation,” arXiv preprint arXiv: 1612.02649, Dec. 2016.
[58] Y. H. Zhang, Z. F. Qiu, T. Yao, D. Liu, and T. Mei, “Fully convolutional adaptation networks for semantic segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 6810–6818.
[59] Z. Murez, S. Kolouri, D. Kriegman, R. Ramamoorthi, and K. Kim, “Image to image translation for domain adaptation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 4500–4509.
[60] W. X. Hong, Z. Z. Wang, M. Yang, and J. S. Yuan, “Conditional generative adversarial network for structured domain adaptation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Salt Lake City, USA, 2018, pp. 1335–1344.
[61] Y. W. Luo, L. Zheng, T. Guan, J. Q. Yu, and Y. Yang, “Taking a closer look at domain shift: Category-level adversaries for semantics consistent domain adaptation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 2502–2511.
[62] Y. S. Li, L. Yuan, and N. Vasconcelos, “Bidirectional learning for domain adaptation of semantic segmentation,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition, Long Beach, USA, 2019, pp. 6936–6945.
[63] J. Canny, “A computational approach to edge detection,” IEEE Trans. Pattern Anal., vol. PAMI-8, no. 6, pp. 679–698, Nov. 1986. doi: 10.1109/TPAMI.1986.4767851
[64] J. Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Computer Vision, Venice, Italy, 2017, pp. 2223–2232.
[65] C. Sakaridis, D. X. Dai, and L. Van Gool, “Semantic foggy scene understanding with synthetic data,” Int. J. Comput. Vision, vol. 126, no. 9, pp. 973–992, Mar. 2018. doi: 10.1007/s11263-018-1072-8
[66] V. Nair and G. E. Hinton, “Rectified linear units improve restricted Boltzmann machines,” in Proc. 27th Int. Conf. Machine Learning, F. Johannes and J. Thorsten, Eds., 2010, pp. 807–814.
[67] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proc. ICML, vol. 30, no. 1, p. 3, 2013.
[68] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv: 1412.6980, Dec. 2014.