
Citation: K. Mao, P. Wei, Y. Wang, M. Liu, S. Wang, and N. Zheng, "CSDD: A benchmark dataset for casting surface defect detection and segmentation," IEEE/CAA J. Autom. Sinica, vol. 12, no. 5, pp. 1–13, May 2025.
SURFACE defect detection is an essential step in industrial production, as surface defects adversely affect the quality of industrial products, especially in casting production. Traditional manual inspection is time-consuming and labor-intensive, leading to inconsistent and unreliable detection results. Automated surface defect detection instruments are therefore crucial, as they offer high efficiency, consistency and accuracy, enabling timely identification of defects. Most automated surface defect detection instruments first capture images of industrial products using visual sensors, and then apply various algorithms to detect defects. High-precision defect detection methods play an important role in these instruments.
Nowadays, defect detection methods using deep learning models have been receiving growing research attention [1]–[12]. While these methods have achieved remarkable progress, casting surface defect detection still has considerable room for improvement. The lack of sufficient, high-quality data has become one of the most challenging problems. Developing methods for casting surface defect detection requires substantial defect data. However, existing datasets [13]–[19] suffer from inherent limitations, including low image resolution, simple background structure and constrained data volume. There is an urgent need for a benchmark dataset that better aligns with real-world industrial production conditions.
Collecting defective samples in industrial scenes is a challenging task. This is because, in many situations, the vast majority of products produced during the manufacturing process are defect-free. The probability of encountering a defective sample is extremely low, with some defects only appearing once every few months. This makes it challenging to gather a large number of defective samples.
Castings are a significant category of industrial products and are widely used in various industries, as depicted in Fig. 1. During production or usage, the casting surface may exhibit various defects, such as cracks, shrinkage cavities and inclusions, which significantly degrade performance and compromise the mechanical properties of the castings.
In this paper, we build a benchmark Casting Surface Defect Dataset (CSDD). The dataset contains
We conduct a comprehensive series of comparative experiments, leveraging well-established detection and segmentation methods to rigorously compare the performance of various methods. Moreover, we propose a new defect detection method by integrating the global attention mechanism and partial convolution into YOLOv5 [20]. Experimental results demonstrate that the proposed method significantly improves the defect detection performance.
Our contributions are summarized as follows.
1) We construct a casting surface defect dataset, which can serve as a novel benchmark for casting surface defect detection and segmentation.
2) We extensively evaluate existing methods of defect detection and segmentation on the CSDD, providing comprehensive baselines for future research.
3) We propose a defect detection method that incorporates the global attention mechanism and partial convolution, further improving the detection performance.
Datasets with sufficient samples are vital for surface defect detection. Collecting defective samples is a nontrivial task since most samples in industrial production are defect-free products. For this reason, the scale, diversity and volume of existing defect datasets are inferior to those of general object detection datasets. Nevertheless, many studies have striven to build reasonable defect datasets and have achieved considerable progress over the past decade.
NEU-DET [13] comprises
BSData [16] collects
The above datasets contribute greatly to the research community and industrial applications. However, a common limitation is that the images in these datasets either have simple inner structures or smooth textures, which make the defects more distinct. Our CSDD is larger in scale than most existing datasets. More importantly, our images feature complex inner structures and backgrounds that intertwine with the defects, presenting significant challenges for defect detection and segmentation.
Object detection aims to identify and localize objects within an image, using algorithms to classify each object and predict its bounding box. Recently, deep learning-based object detection methods [20]–[28] have garnered significant attention due to their superior accuracy and robustness. These methods can also be applied to defect detection, enabling the identification of defects in various materials and products with high precision.
Faster RCNN [21] is a classic two-stage object detection method. It consists of a feature extractor, a region proposal network (RPN) and an object classifier. Faster RCNN can achieve high accuracy, but its prediction speed is slower than that of one-stage object detection methods. On the basis of Faster RCNN, Cascade RCNN [22] adopts a cascaded detector and RPN structure. It improves detection accuracy and robustness, but the training time and computing resource requirements also increase accordingly.
SSD [23] is a one-stage object detection method that detects multi-scale objects by predicting on feature maps at different scales. Compared to other methods, SSD is faster and performs better on small objects. However, it has disadvantages in detecting large objects and in dealing with the class imbalance between positive and negative samples. On the basis of SSD, RetinaNet [24] proposes the focal loss to address this class imbalance. It handles class imbalance better, but its detection accuracy is relatively low.
YOLOv5 [20] consists of two parts: a feature extraction network and a detection network. Compared with other object detection models, YOLOv5 has a faster detection speed, but it faces challenges in detecting small and overlapped objects. YOLOv8 [29], building on the foundation of YOLOv5, incorporates a more advanced backbone and an anchor-free head, further enhancing detection efficiency. YOLOv10 [30] introduces consistent dual assignments for NMS-free training and a holistic efficiency-accuracy driven model design strategy to better achieve real-time object detection. CenterNet [25] detects objects by classifying center points and regressing bounding boxes on feature maps at different scales, which gives it advantages in both speed and accuracy. However, it faces challenges in detecting small objects and handling overlapped ones. FCOS [26] is a fully convolutional, single-stage object detection framework that facilitates efficient detection of multi-scale objects. It obviates the need for the intricate anchor box design and sample selection stages inherent in conventional object detection methodologies.
Deformable DETR [27] is a variant of DETR [31] object detection method, which improves the detection performance through the incorporation of a deformable attention mechanism. It enhances the precision and adaptability of feature representations. Conditional DETR [28] learns conditional spatial queries from embeddings in the decoder and implements a multi-head cross-attention mechanism, which achieves faster convergence than conventional DETR [31].
In recent years, surface defect detection has received more and more attention for its importance to industrial production. Many researchers have studied this problem and proposed impressive approaches.
Traditional methods extract hand-crafted features from images and use statistical learning methods to detect surface defects. Tsanakas et al. [32] utilize edge detection operators for fault diagnosis of operating photovoltaic modules. Yuan et al. [33] introduce a thresholding method for defect detection based on the Otsu algorithm [34]. Li and Tsai [35] propose an automatic visual inspection method based on Fourier image reconstruction. Cen et al. [36] propose a defect inspection method for TFT-LCD panels based on a low-rank matrix model. Jian et al. [37] propose a method for detecting possible defects in the manufacturing process of mobile phone screens. They employ a contour-based registration method to generate template images for aligning the images.
However, these traditional methods rely on hand-crafted features, which are less effective in complex environments or with intricate defects.
Recently, many researchers have applied deep learning methods to defect detection. SDDNet [7] integrates a feature retaining block to preserve texture information and a skip densely connected module to propagate fine-grained details for improved prediction of defects. Zeng et al. [38] present the atrous spatial pyramid pooling-balanced-feature pyramid network, a novel feature fusion method for enhancing small object detection, particularly validated on tiny defect detection in printed circuit boards. Jain et al. [39] introduce a framework employing generative adversarial networks for synthetic data augmentation, significantly boosting surface defect detection performance. Wang et al. [9] introduce a novel method for surface defect detection by integrating global context-based self-similarity feature augmentation with a multi-scale bidirectional feature fusion module. ETDNet [10] incorporates a lightweight vision transformer, a channel-modulated feature pyramid network and a task-oriented decoupled head to improve classification and regression task performance. Li et al. [11] propose a method for detecting micro-defects on printed circuit board assemblies, which integrates a semantic segmentation network with rule-based detection algorithms. Han et al. [12] propose an improved YOLOv5 model based on the Wasserstein GAN algorithm for small-sample defect detection.
While these existing approaches have made great progress in defect detection, there is still considerable room for improvement in addressing the challenges of inner structure interference, illumination changes and defect shape variation. To overcome these problems, we propose an improved method with the global attention mechanism and partial convolution.
Semantic segmentation methods can be broadly categorized into two main classes: CNN-based methods and Transformer-based methods.
1) CNN-Based Methods: FCN [40] replaces fully connected layers with fully convolutional layers, enabling pixel-wise segmentation. UNet [41] employs a U-shaped encoder-decoder structure with skip connections. The skip connections facilitate the fusion of low-level and high-level features, enabling precise segmentation of objects and achieving remarkable segmentation accuracy. DeepLabv3 [42] employs atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates, significantly improving over previous DeepLab versions [43] without DenseCRF post-processing. DeepLabv3plus [44] adds a decoder module to refine the segmentation results of [42] and applies depthwise separable convolution to both the atrous spatial pyramid pooling and decoder modules. FastFCN [45] introduces the Joint Pyramid Upsampling module to replace dilated convolutions, reducing computation complexity by more than three times without performance loss. CGNet [46] is a lightweight and efficient semantic segmentation network. It introduces the Context Guided block to capture joint features of local and global contexts, enhancing segmentation accuracy. DDRNet [47] proposes a family of efficient backbones specially designed for real-time semantic segmentation, achieving an impressive balance between accuracy and speed. PIDNet [48] is an innovative three-branch network for real-time semantic segmentation, combining detail, context and boundary information for high accuracy and speed.
2) Transformer-Based Methods: Segmenter [49] extends the Vision Transformer (ViT) to semantic segmentation, utilizing output embeddings of image patches and employing either a point-wise linear decoder or a mask transformer decoder for class label prediction. SegFormer [50] introduces a novel hierarchically structured transformer encoder, generating multi-scale features without positional encoding. It achieves significantly better performance and efficiency compared to previous methods. MaskFormer [51] is a novel semantic and instance level segmentation method that predicts a set of binary masks, each associated with a single global class label, handling semantic and panoptic segmentation tasks in a unified manner. Mask2Former [52] employs the masked attention to extract localized features within predicted mask regions, further enhancing the performance.
Since there was no suitable dataset for casting surface defects with complex casting structures, we have created a new Casting Surface Defect Dataset. We believe this dataset will benefit both industrial vision research and manufacturing applications.
In practical production processes, most samples are defect-free, making it challenging to collect a sufficient number of defective samples. To overcome this problem, we adopt a manufacturing-simulation approach for defective data collection.
Firstly, we machine a large number of metal plates, each measuring 30 centimeters in length and width. We design the surface structures based on actual castings, such as automotive engines shown in Fig. 1. Then, we use machines to create more than ten cavities of different sizes and layouts on each metal plate. The radius of these cavities ranges from 0.4 to 4 centimeters. These metal plates with cavities are intended to simulate the actual casting surfaces, such as those found in automotive engines.
Secondly, we use production tools to create defects of three types on the metal plates: scratches, spots and rusts. These defects can be very small and their sizes vary significantly. To ensure that the casting surface and defects closely resemble those found in practical products, we produce them under the guidance of professional production and quality inspection engineers.
Finally, the metal plates are captured using a visual sensor. We manually annotate each image with the defect type and the corresponding defect region. For the annotation of the CSDD, we first randomly divide the images into six groups of approximately equal size, with each group assigned to an independent annotator who labels the defects in each image. After completing this process, each annotator double-checks and amends the annotations made by the other five annotators to ensure the accuracy and consistency of the labels.
Due to the reflective nature of metal plates, it is extremely difficult to obtain uniformly illuminated images. We therefore carefully design an imaging scheme. We use four strip lights to surround and illuminate the metal plates, and their intensity and angle can be adjusted to capture the highest quality images. In addition, to obtain images with uniform brightness, we apply gamma correction to the captured images. After adjusting the exposure time and gain, we successfully obtain images with uniform illumination.
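As a concrete illustration of the brightness equalization step, the following is a minimal sketch of gamma correction on an 8-bit image using OpenCV; the gamma value of 0.6 and the lookup-table approach are illustrative assumptions, not the exact parameters used in our imaging pipeline.

```python
import cv2
import numpy as np

def gamma_correct(image: np.ndarray, gamma: float = 0.6) -> np.ndarray:
    """Map every 8-bit pixel through (v/255)**gamma * 255; gamma < 1 brightens
    dark regions, gamma > 1 darkens bright ones."""
    table = np.array([((i / 255.0) ** gamma) * 255.0 for i in range(256)],
                     dtype=np.uint8)
    return cv2.LUT(image, table)  # apply the lookup table to all pixels

# Toy example: a synthetic dark gray plate image becomes visibly brighter.
plate = np.full((64, 64, 3), 40, dtype=np.uint8)
print(int(gamma_correct(plate, 0.6).mean()))  # 40 -> 83
```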
Through these procedures, the Casting Surface Defect Dataset is built. It contains
Dataset | Objects | Structure | Images | Anno. images | Anno. defects | Color mode | Resolution | Anno. type | Defect types |
NEU-DET [13] | Steel Strip | 200×200 | Bbox | 6 | |||||
GC10-DET [17] | Metal | 2048×1000 | Bbox | 10 | |||||
Severstal [19] | Steel | 18074 | 12568 | 1600×256 | Pixel | 4 | |||
KolektorSDD [14] | Commutator | 399 | 399 | 56 | 1270×500 | Pixel | 1 | ||
KolektorSDD2 [15] | Commutator | 394 | √ | 649×241 | Pixel | 1 | |||
SD-saliency-900 [18] | Steel Strip | 900 | 900 | 200×200 | Pixel | 3 | |||
BSData [16] | Spindle | √ | 485 | √ | 2816×1016 | Pixel | 1 | ||
CSDD(Ours) | Casting | √ | √ | 3648×3648 | Pixel | 3 |
Type | Number | Width (Avg) | Height (Avg) |
Scratch | 71.77 | 39.23 | |
Spot | 24.40 | 20.83 | |
Rust | 71.55 | 41.34 |
The comparison between our CSDD and the existing datasets introduced in Section II-A is depicted in Table I and Fig. 3. In comparison to the existing datasets, our CSDD demonstrates several notable complexities.
First, it has a large number of images and defects. As shown in Table I, our CSDD has
Second, the images in our CSDD have complex inner structures. As shown in Fig. 2, these inner structures interfere with defect detection and make the task more challenging, which is closer to practical applications.
Besides, the size of the defect is extremely small compared to the size of the image, as shown in Fig. 2. The size of the casting surface in our CSDD is 30 centimeters long while the sizes of some defects are smaller than 0.1 centimeters. The image resolution is 3648×3648, while most defects are smaller than 100 pixels, with some even less than 50 pixels. This significant difference in size presents challenges associated with detecting “small objects”.
In this section, we present a method for defect detection by incorporating the global attention mechanism and partial convolution into YOLOv5 [20]. The method, along with the approaches introduced in Section II-B, can serve as baselines for defect detection on our CSDD.
We enhance the standard YOLOv5 [20] by incorporating both the global attention mechanism (GAM) [53] and partial convolution (PConv) [54]. The overall architecture is depicted in Fig. 4. The convolution (Conv), bottleneck with 3 convolutions (C3), spatial pyramid pooling (SPPF), upsample and detect modules remain consistent with the standard YOLOv5 [20]. We incorporate GAM preceding the SPPF module and substitute two of the C3 modules with P3 modules in the backbone part. We provide detailed introductions to GAM and P3 in Sections IV-B and IV-C, respectively.
As shown in Fig. 4, our model comprises three main components: the backbone, the neck and the head. The backbone functions as the feature extractor, utilizing Conv, C3, P3, GAM and SPPF to extract features from the input images. The Conv module is primarily used for downsampling the input, while the C3 module performs feature extraction and fusion to obtain feature maps containing rich semantic information. The SPPF module further enhances the semantic information contained in the deepest feature maps through pooling and feature fusion. We replace two C3 modules in the standard YOLOv5 with two P3 modules and add a GAM module before the SPPF module. The neck integrates feature maps of three different scales extracted from various stages of the backbone. By merging shallow features, it ensures that the resulting feature maps possess both rich semantic information and precise positional details. The head conducts predictions on the feature maps of three different scales obtained from the neck, and consolidates the prediction results to form the final detection outcome.
Attention mechanisms can effectively enhance the representational capacity of feature maps, thereby contributing to detection tasks. The Convolutional Block Attention Module (CBAM) [55] sequentially infers attention maps along channel and spatial dimensions, then utilizes these attention maps to adaptively refine feature maps. The global attention mechanism [53] redesigns the sequential channel-spatial attention mechanism from CBAM and achieves better performance. We use the global attention mechanism [53] in our method to enhance the model’s ability to perceive defect details, thereby improving its detection performance.
Given the input feature map $F_1$, it sequentially passes through the channel attention module and the spatial attention module. The intermediate state $F_2$ and the output $F_3$ are defined as

$$F_2 = M_c(F_1) \otimes F_1 \tag{1}$$

$$F_3 = M_s(F_2) \otimes F_2 \tag{2}$$

where $\otimes$ denotes element-wise multiplication, $M_c$ is the channel attention submodule and $M_s$ is the spatial attention submodule. The channel attention submodule of GAM employs 3D permutation, while the spatial attention submodule utilizes two convolutional layers for spatial information fusion.
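To make Eqs. (1) and (2) concrete, the sketch below is a minimal PyTorch implementation of a GAM-style block. The channel reduction ratio of 4 and the 7×7 spatial convolutions are assumptions following common GAM implementations [53]; this is an illustration, not the exact module used in our network.

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Minimal sketch of the global attention mechanism (GAM) [53].
    `channels` is assumed to be divisible by `rate`."""
    def __init__(self, channels: int, rate: int = 4):
        super().__init__()
        hidden = channels // rate
        # Channel attention submodule: 3D permutation + two-layer MLP.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
        )
        # Spatial attention submodule: two convolutional layers for spatial fusion.
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=7, padding=3),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        # Eq. (1): F2 = Mc(F1) * F1 (element-wise).
        att = self.channel_mlp(x.permute(0, 2, 3, 1).reshape(b, h * w, c))
        mc = torch.sigmoid(att.reshape(b, h, w, c).permute(0, 3, 1, 2))
        f2 = mc * x
        # Eq. (2): F3 = Ms(F2) * F2 (element-wise).
        ms = torch.sigmoid(self.spatial(f2))
        return ms * f2

# Example: a GAM block on a 512-channel feature map keeps the shape unchanged.
feat = torch.randn(1, 512, 20, 20)
out = GAM(512)(feat)  # (1, 512, 20, 20)
```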
Fast neural networks with low latency and high throughput are crucial, especially for neural networks deployed in real-world scenarios. PConv [54] applies a regular convolution to only a portion of the input channels, leaving the rest untouched. Building upon PConv, fast neural network (FasterNet) [54] achieves the state-of-the-art performance in classification and detection tasks, with significantly lower latency and higher throughput.
We make modifications to the C3 module of the standard YOLOv5 [20], as illustrated in Fig. 5. The conv, bn and relu blocks represent the standard convolution, batch normalization and ReLU operations, respectively. We name the modified C3 module the P3 module. The P3 module comprises three FasterNet Blocks [54], with each FasterNet Block containing a PConv.
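The following is a minimal sketch of PConv and a simplified FasterNet-style block, illustrating how only a fraction of the channels is convolved while the rest pass through unchanged. The split ratio of 1/4 and the 1×1 expand/reduce layers are assumptions borrowed from the FasterNet paper [54]; the actual P3 module stacks three such blocks inside the C3 structure.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Sketch of partial convolution (PConv) [54]: a regular 3x3 convolution is
    applied to only the first 1/n_div of the channels, the rest are untouched."""
    def __init__(self, channels: int, n_div: int = 4):  # n_div=4 is an assumed default
        super().__init__()
        self.dim_conv = channels // n_div
        self.dim_keep = channels - self.dim_conv
        self.conv = nn.Conv2d(self.dim_conv, self.dim_conv, 3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.dim_conv, self.dim_keep], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)

class FasterNetBlock(nn.Module):
    """Simplified FasterNet-style block: PConv followed by a 1x1 expand/reduce
    pair with a residual connection (an illustrative stand-in for the blocks
    stacked inside the P3 module)."""
    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        hidden = channels * expansion
        self.pconv = PConv(channels)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.mlp(self.pconv(x))

# Example: one block on a 256-channel feature map keeps the shape unchanged.
feat = torch.randn(1, 256, 40, 40)
out = FasterNetBlock(256)(feat)  # (1, 256, 40, 40)
```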
We incorporate GAM preceding the SPPF module and substitute two of the C3 modules with P3 in the backbone part. With GAM, the model becomes more adept at capturing features specific to defects themselves, thereby aiding downstream detection tasks. Through the integration of PConv, the model further enhances detection performance while maintaining a reduced parameter count.
In this section, we perform a comprehensive evaluation of several state-of-the-art methods for defect detection and segmentation on our CSDD, respectively. We also evaluate the method proposed in this paper and compare it with other approaches. These results can serve as baselines for future research.
The CSDD contains
For defect detection, we use the average precision (AP) and the mean average precision (mAP) as the evaluation metrics. They are defined as follows:
$$\text{Precision} = \frac{TP_{bbox}}{TP_{bbox} + FP_{bbox}}, \quad \text{Recall} = \frac{TP_{bbox}}{TP_{bbox} + FN_{bbox}} \tag{3}$$

where $TP_{bbox}$, $FP_{bbox}$ and $FN_{bbox}$ are the numbers of true positives, false positives and false negatives at the bounding box level. The AP is computed as the area under the precision-recall curve, while the mAP represents the average AP across all defect types.
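As an illustration of how the AP is obtained from Eq. (3), the sketch below computes the area under the precision-recall curve with all-point interpolation. It assumes the precision and recall arrays are ordered by decreasing detection confidence, and is a simplification of the evaluation code in toolkits such as MMDetection rather than the exact routine used in our experiments.

```python
import numpy as np

def average_precision(precision: np.ndarray, recall: np.ndarray) -> float:
    """Area under the precision-recall curve (all-point interpolation)."""
    p = np.concatenate(([0.0], precision, [0.0]))
    r = np.concatenate(([0.0], recall, [1.0]))
    # Make precision monotonically non-increasing from right to left.
    p = np.maximum.accumulate(p[::-1])[::-1]
    # Sum the rectangle areas where recall changes.
    idx = np.where(r[1:] != r[:-1])[0]
    return float(np.sum((r[idx + 1] - r[idx]) * p[idx + 1]))

# Toy example with four detections of one defect class.
prec = np.array([1.00, 0.50, 0.67, 0.75])
rec  = np.array([0.25, 0.25, 0.50, 0.75])
print(round(average_precision(prec, rec), 3))  # 0.625 for this toy curve
```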
For defect segmentation, the Intersection over Union (IoU) and mean Intersection over Union (mIoU) are used as the evaluation metrics. The IoU is defined as follows:
$$\text{IoU} = \frac{TP_{pixel}}{TP_{pixel} + FP_{pixel} + FN_{pixel}} \tag{4}$$

where $TP_{pixel}$, $FP_{pixel}$ and $FN_{pixel}$ are the numbers of true positives, false positives and false negatives at the pixel level. The mIoU is the mean value of the IoU across all defect types.
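A minimal sketch of Eq. (4) on label maps is given below. It assumes integer class maps in which 0 is background and classes 1-3 correspond to scratch, spot and rust, and it skips the background class when averaging, which may differ from the exact evaluation protocol of MMSegmentation.

```python
import numpy as np

def iou_per_class(pred: np.ndarray, gt: np.ndarray, num_classes: int = 4) -> list:
    """Pixel-level IoU for each defect class following Eq. (4).
    `pred` and `gt` are integer label maps of identical shape."""
    ious = []
    for c in range(1, num_classes):           # class 0 (background) is skipped
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        denom = tp + fp + fn
        ious.append(tp / denom if denom > 0 else float("nan"))
    return ious

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int = 4) -> float:
    """mIoU: mean of the per-class IoUs over all defect types."""
    return float(np.nanmean(iou_per_class(pred, gt, num_classes)))

# Toy example on random label maps.
rng = np.random.default_rng(0)
pred = rng.integers(0, 4, size=(256, 256))
gt   = rng.integers(0, 4, size=(256, 256))
print(round(mean_iou(pred, gt), 2))  # about 0.14 for random predictions
```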
Additionally, we utilize giga floating point operations (GFLOPs) [56] to evaluate the complexity of different methods for both defect detection and segmentation. GFLOPs quantifies the total number of floating-point operations required for a single forward pass. A lower GFLOPs indicates a lower complexity of the method.
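For intuition, the GFLOPs of a single standard convolution layer can be estimated by hand as shown below, counting one multiplication and one addition per weight application; profiling toolkits sum this over the whole model, so this is only a back-of-the-envelope sketch with hypothetical layer sizes.

```python
def conv2d_flops(c_in: int, c_out: int, k: int, h_out: int, w_out: int) -> int:
    """FLOPs of one k x k convolution: every output element costs
    c_in * k * k multiply-add pairs (2 FLOPs each)."""
    return 2 * c_in * k * k * c_out * h_out * w_out

# Hypothetical layer: 3x3 conv, 64 -> 128 channels, 160x160 output feature map.
flops = conv2d_flops(64, 128, 3, 160, 160)
print(f"{flops / 1e9:.2f} GFLOPs")  # about 3.77 GFLOPs for this single layer
```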
For defect detection, we implement the CNN-based and Transformer-based methods mentioned in Section II-B using MMDetection [56], excluding YOLOv5, YOLOv8 and YOLOv10. For YOLOv5, YOLOv8 and YOLOv10, we utilize their official open-source project packages [20], [29], [30]. Most experimental parameters are kept consistent with the default parameters in MMDetection, and we employ the default image augmentation configurations from MMDetection [56]. To provide the Transformer-based detectors with scale invariance and stronger generalization ability, data augmentation methods such as random choice resize are widely used in MMDetection [56], so the input size of the Transformer-based methods varies over a dynamic range. We therefore follow the default settings in MMDetection [56] without forcing all image inputs to be scaled to 1024×1024. We use 4 NVIDIA Tesla
Methods | Optimizer | Learning rate | Weight decay | Momentum | Augmentation | Loss | Scheduler | Batch size | Epochs |
Faster R-CNN [21] | SGD | 0.02 | | 0.9 | Random flip | L1 loss (bbox), CE loss (cls) | Linear LR | 4 | 48 |
Cascade R-CNN [22] | SGD | 0.02 | | 0.9 | Random flip | Smooth L1 loss (bbox), CE loss (cls) | Linear LR | 4 | 48 |
SSD [23] | SGD | 0.002 | | 0.9 | Random crop, random flip | Smooth L1 loss (bbox), CE loss (cls) | Linear LR | 4 | 48 |
RetinaNet [24] | SGD | 0.02 | | 0.9 | Random flip | L1 loss (bbox), Focal loss (cls) | Linear LR | 4 | 48 |
YOLOv5 [20] | SGD | 0.01 | | 0.8 | Mosaic, mixup, HSV color-space, random flip | CIoU loss (location), BCE loss (class+object) | Lambda LR | 4 | 300 |
YOLOv8 [29] | SGD | 0.01 | | 0.8 | Mosaic, mixup, HSV color-space, random flip | CIoU loss (location), BCE loss (class+object) | Lambda LR | 4 | 300 |
YOLOv10 [30] | SGD | 0.01 | | 0.8 | Mosaic, mixup, HSV color-space, random flip | CIoU loss (location), BCE loss (class+object) | Lambda LR | 4 | 300 |
CenterNet [25] | SGD | 0.01 | | 0.9 | Random flip | GIoU loss (bbox), Gaussian focal loss (cls) | Linear LR | 4 | 48 |
FCOS [26] | SGD | 0.01 | | 0.9 | Random flip | IoU loss (bbox), CE loss (cls) | Constant LR, MultiStep LR | 4 | 48 |
Deformable DETR [27] | AdamW | | | − | Random flip, random crop, random choice resize | L1 loss (bbox), Focal loss (cls), GIoU loss (IoU) | MultiStep LR | 4 | 500 |
Conditional DETR [28] | AdamW | | | − | Random flip, random crop, random choice resize | L1 loss (bbox), Focal loss (cls), GIoU loss (IoU) | MultiStep LR | 4 | 500 |
The results of the CNN-based and Transformer-based methods described in Section II-B on the CSDD for defect detection are shown in Table IV. Among all the baseline methods, YOLOv5 achieves the best results, with an mAP of 69.5%, while FCOS performs the worst. The SSD model has the highest complexity, whereas YOLOv5 has the lowest. Rusts are the most difficult to detect among all the defect types. All the methods we implement perform worse on rusts than on scratches and spots, likely due to the more complex shapes and larger size variations of rusts, as shown in Fig. 2. The challenging nature of detecting rusts also highlights the value of our CSDD for advancing defect detection techniques.
Method | Scratch (AP% ↑) | Spot (AP% ↑) | Rust (AP% ↑) | Average (mAP% ↑) | Complexity (GFLOPs ↓) |
Faster R-CNN [21] | 78.4 | 63.5 | 36.0 | 59.3 | 220 | ||
Cascade R-CNN [22] | 77.8 | 63.5 | 38.0 | 60.0 | 248 | ||
SSD [23] | 79.8 | 69.4 | 47.2 | 65.5 | 351 | ||
RetinaNet [24] | 69.9 | 55.6 | 21.6 | 49.0 | 210 | ||
YOLOv5 [20] | 83.6 | 71.3 | 53.7 | 69.5 | 15.8 | ||
YOLOv8 [29] | 84.1 | 63.6 | 52.1 | 66.6 | 28.4 | ||
YOLOv10 [30] | 81.6 | 60.1 | 51.5 | 64.4 | 24.5 | ||
CenterNet [25] | 77.1 | 55.9 | 32.6 | 55.2 | 201 | ||
FCOS [26] | 72.6 | 37.1 | 26.6 | 45.4 | 201 | ||
Deformable DETR [27] | 74.0 | 57.5 | 26.2 | 52.6 | 200 | ||
Conditional DETR [28] | 75.4 | 58.0 | 25.3 | 52.9 | 102 | ||
Ours | 84.4 | 73.2 | 55.8 | 71.1 | 16.2 |
For our proposed method, we employ the same training parameter configurations as YOLOv5. The results are presented in Table IV. Our method achieves the best results among all compared approaches. The mAP of our method reaches 71.1%, outperforming the best existing method by 1.6%. For each individual defect type, our method also performs the best, achieving an AP of 84.4% for scratches, 73.2% for spots, and 55.8% for rusts.
We also validate the effectiveness of the global attention mechanism and partial convolution, as shown in Table V. The base model refers to the standard YOLOv5. Both the GAM and PConv can effectively enhance the performance of detection. With GAM, the model can more effectively capture features related to defects, thus aiding in defect detection. By employing PConv, the model achieves higher detection accuracy with lower model complexity. The highest detection performance is achieved when both global attention mechanism and partial convolution are utilized simultaneously.
Combination | mAP↑ | GFLOPs ↓ |
Base model | 69.5 | 15.8 |
Base model + GAM | 70.0 | 17.2 |
Base model + PConv | 70.6 | 14.6 |
Base model + GAM + PConv | 71.1 | 16.2 |
The visualization of the detection results is shown in Fig. 6. Our method is capable of accurately detecting defects, even in cases of complex defects, such as small defects and those with significant variations in size.
For defect segmentation, we implement the CNN-based and Transformer-based methods mentioned in Section II-D using MMSegmentation [57]. Most experimental parameters are kept consistent with the default parameters in MMSegmentation, and we employ the default image augmentation configurations from MMSegmentation. During training, random flip and photometric distortion are used to augment the data. When multiple losses are used, the weights of CrossEntropyLoss and DiceLoss are 1.0 and 3.0, respectively. We use 4 NVIDIA GeForce RTX
Methods | Loss | Optimizer | Scheduler | Learning rate | Weight decay | Momentum | Batch size | Iterations |
FCN [40] | CE + Dice | SGD | PolyLR | 0.01 | 0.9 | 4 | 80 000 | |
UNet [41] | CE + Dice | SGD | PolyLR | 0.01 | 0.9 | 2 | 160 000 | |
DeepLabv3 [42] | CE + Dice | SGD | PolyLR | 0.01 | 0.9 | 4 | 80 000 | |
DeepLabv3plus [44] | CE + Dice | SGD | PolyLR | 0.01 | 0.9 | 2 | 160 000 | |
FastFCN [45] | CE + Dice | SGD | PolyLR | 0.01 | 0.9 | 4 | 80 000 | |
CGNet [46] | CE + Dice | SGD | PolyLR | 0.01 | 0.9 | 4 | 80 000 | |
DDRNet [47] | CE + Dice | SGD | PolyLR | 0.01 | 0.9 | 4 | 80 000 | |
PIDNet [48] | CE + Dice | SGD | PolyLR | 0.01 | 0.9 | 4 | 80 000 | |
Segmenter [49] | CE + Dice | SGD | PolyLR | 0.001 | 0 | 0.9 | 1 | 320 000 |
Segformer [50] | CE + Dice | SGD | PolyLR | 0.01 | 0.9 | 4 | 80 000 | |
MaskFormer [51] | CE + Focal + Dice | AdamW | LinearLR + PolyLR | 6E−05 | 0.01 | 0.9 | 4 | 80 000 |
Mask2Former [52] | CE + Dice | AdamW | PolyLR | 1E−04 | 0.05 | 0.9 | 2 | 160 000
The performances of all the CNN-based and Transformer-based methods on our CSDD for defect segmentation are presented in Table VII. Among all the methods, UNet achieves the best performance, with an mIoU of 54.29%. CGNet has the lowest model complexity but only reaches an mIoU of 48.20%. The visualized segmentation results of UNet show that most of the defects are accurately segmented. Compared to scratches, spots and rusts are more difficult to segment, highlighting that one of the most challenging aspects lies in identifying small defects and those with significant size variation. This implies that there is still room for improvement for existing methods on our CSDD.
Method | Scratch (IoU% ↑) | Spot (IoU% ↑) | Rust (IoU% ↑) | Average (mIoU% ↑) | Complexity (GFLOPs ↓) |
FCN [40] | 55.63 | 46.80 | 42.54 | 48.32 | 791 | ||
UNet [41] | 65.73 | 50.64 | 46.49 | 54.29 | 812 | ||
DeepLabv3 [42] | 53.99 | 45.32 | 41.26 | 46.86 | |||
DeepLabv3plus [44] | 61.10 | 46.39 | 44.26 | 50.58 | 706 | ||
FastFCN [45] | 54.79 | 45.77 | 40.52 | 47.03 | 521 | ||
CGNet [46] | 55.22 | 47.33 | 42.04 | 48.20 | 13.80 | ||
DDRNet [47] | 46.21 | 35.87 | 36.79 | 39.62 | 71.73 | ||
PIDNet [48] | 46.70 | 35.99 | 38.21 | 40.30 | 23.76 | ||
Segmenter [49] | 25.90 | 24.71 | 30.18 | 26.93 | 775 | ||
SegFormer [50] | 59.13 | 43.67 | 43.89 | 48.90 | 148 | ||
MaskFormer [51] | 60.81 | 46.07 | 44.05 | 50.31 | − | ||
Mask2Former [52] | 60.76 | 45.15 | 44.00 | 49.97 | − |
To further validate the generalizability of our proposed method, we conduct experiments on the NEU-DET and GC10-DET datasets. Similar to the experimental settings employed on the CSDD, the NEU-DET and GC10-DET datasets are subdivided into training, validation, and test sets, with proportions of 64%, 16% and 20%, respectively. The images are also resized to
On the NEU-DET dataset, the standard YOLOv5 achieves an mAP of 62.0%, while our method reaches an mAP of 73.1%, a significant improvement of 11.1%. On the GC10-DET dataset, the standard YOLOv5 achieves an mAP of 59.5%, while our method achieves an mAP of 64.3%, an improvement of 4.8%. These results demonstrate that the global attention mechanism and partial convolution introduced in our method provide significant performance improvements on both NEU-DET and GC10-DET, further validating the generalizability of our method.
The employment of the GAM and PConv allows our model to capture defect features more precisely and comprehensively, thereby achieving higher defect detection performance across all defect types. Our method achieves APs of 84.4%, 73.2% and 55.8% for scratches, spots and rusts, respectively. It is noteworthy that, compared with scratches, spots generally appear in smaller sizes, while rusts tend to have more complex shapes. This indicates that there is still room for improvement in our method for detecting small-sized and complex-shaped defects, which can be a focus of future work.
For the defect detection task, seven CNN-based methods (Faster RCNN, Cascade R-CNN, SSD, YOLOv5, YOLOv8, YOLOv10 and CenterNet) outperform the best Transformer-based method (Conditional DETR). These results suggest that, for this task, the perception of local features may be more important than global context modeling. CNNs, through convolutional layers, effectively extract local features, which are crucial for capturing low-level information such as the edges and textures of defects in images. Furthermore, CNNs can effectively maintain the spatial structure of the image, making them well suited for handling spatially localized information.
1) Multimodality: Our CSDD can be expanded into a multimodal dataset. In future work, we can add corresponding textual descriptions for each type of defect, serving as prior knowledge about the defects. How to utilize this prior knowledge to further enhance the model's defect detection performance is a research topic worthy of exploration.
2) Incomplete labels: In some application scenarios, the process of annotating all defects is very labor-intensive and time-consuming. Consequently, research into achieving higher accuracy in defect detection and segmentation with only a subset of defects annotated is a highly valuable direction of study.
In this work, we construct the casting surface defect dataset (CSDD), which contains
[1] J. Masci, U. Meier, G. Fricout, and J. Schmidhuber, "Multi-scale pyramidal pooling network for generic steel defect classification," The 2013 International Joint Conference on Neural Networks (IJCNN), pp. 1–8, 2013.
[2] R. Ren, T. Hung, and K. C. Tan, "A generic deep-learning-based approach for automated surface inspection," IEEE Transactions on Cybernetics, vol. 48, no. 3, pp. 929–940, 2017.
[3] V. Natarajan, T.-Y. Hung, S. Vaikundam, and L.-T. Chia, "Convolutional networks for voting-based anomaly classification in metal surface inspection," 2017 IEEE International Conference on Industrial Technology (ICIT), pp. 986–991, 2017.
[4] X. Tao, D. Zhang, W. Ma, X. Liu, and D. Xu, "Automatic metallic surface defect detection and recognition with convolutional neural networks," Applied Sciences, vol. 8, no. 9, p. 1575, 2018. doi: 10.3390/app8091575
[5] T. Wang, Y. Chen, M. Qiao, and H. Snoussi, "A fast and robust convolutional neural network-based defect detection model in product quality control," The International Journal of Advanced Manufacturing Technology, vol. 94, pp. 3465–3471, 2018. doi: 10.1007/s00170-017-0882-0
[6] Y.-J. Cha, W. Choi, G. Suh, S. Mahmoudkhani, and O. Büyüköztürk, "Autonomous structural visual inspection using region-based deep learning for detecting multiple damage types," Computer-Aided Civil and Infrastructure Engineering, vol. 33, no. 9, pp. 731–747, 2018. doi: 10.1111/mice.12334
[7] L. Cui, X. Jiang, M. Xu, W. Li, P. Lv, and B. Zhou, "SDDNet: A fast and accurate network for surface defect detection," IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1–13, 2021.
[8] X. Jiang, F. Yan, Y. Lu, K. Wang, S. Guo, T. Zhang, Y. Pang, J. Niu, and M. Xu, "Joint attention-guided feature fusion network for saliency detection of surface defects," IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–12, 2022.
[9] H. Wang, R. Zhang, M. Feng, Y. Liu, and G. Yang, "Global context-based self-similarity feature augmentation and bidirectional feature fusion for surface defect detection," IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–12, 2023.
[10] H. Zhou, R. Yang, R. Hu, C. Shu, X. Tang, and X. Li, "ETDNet: Efficient transformer-based detection network for surface defect detection," IEEE Transactions on Instrumentation and Measurement, vol. 72, pp. 1–14, 2023.
[11] Y. Li, X. Wang, Z. He, Z. Wang, K. Cheng, S. Ding, Y. Fan, X. Li, Y. Niu, S. Xiao, et al., "Industry-oriented detection method of PCBA defects using semantic segmentation models," IEEE/CAA Journal of Automatica Sinica, vol. 11, no. 6, pp. 1438–1446, 2024. doi: 10.1109/JAS.2024.124422
[12] Y. Han, L. Wang, Y. Wang, and Z. Geng, "Intelligent small sample defect detection of concrete surface using novel deep learning integrating improved YOLOv5," IEEE/CAA Journal of Automatica Sinica, vol. 11, no. 2, pp. 545–547, 2024. doi: 10.1109/JAS.2023.124035
[13] K. Song and Y. Yan, "A noise robust method based on completed local binary patterns for hot-rolled steel strip surface defects," Applied Surface Science, vol. 285, pp. 858–864, 2013. doi: 10.1016/j.apsusc.2013.09.002
[14] D. Tabernik, S. Sela, J. Skvarč, and D. Skočaj, "Segmentation-based deep-learning approach for surface-defect detection," Journal of Intelligent Manufacturing, May 2019.
[15] J. Božič, D. Tabernik, and D. Skočaj, "Mixed supervision for surface-defect detection: From weakly to fully supervised learning," Computers in Industry, 2021.
[16] T. Schlagenhauf and M. Landwehr, "Industrial machine tool component surface defect dataset," Data in Brief, vol. 39, p. 107643, 2021. doi: 10.1016/j.dib.2021.107643
[17] X. Lv, F. Duan, J.-J. Jiang, X. Fu, and L. Gan, "Deep metallic surface defect detection: The new benchmark and detection network," Sensors, vol. 20, no. 6, 2020.
[18] G. Song, K. Song, and Y. Yan, "Saliency detection for strip steel surface defects using multiple constraints and improved texture features," Optics and Lasers in Engineering, vol. 128, p. 106000, 2020. doi: 10.1016/j.optlaseng.2019.106000
[19] A. Grishin, BorisV, et al., "Severstal: Steel defect detection," 2019. [Online]. Available: https://kaggle.com/competitions/severstal-steel-defect-detection
[20] G. Jocher, "Ultralytics YOLOv5," 2020. [Online]. Available: https://github.com/ultralytics/yolov5
[21] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," Advances in Neural Information Processing Systems, vol. 28, 2015.
[22] Z. Cai and N. Vasconcelos, "Cascade R-CNN: Delving into high quality object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 6154–6162.
[23] W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, "SSD: Single shot multibox detector," Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part I, pp. 21–37, 2016.
[24] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal loss for dense object detection," Proceedings of the IEEE International Conference on Computer Vision, pp. 2980–2988, 2017.
[25] X. Zhou, D. Wang, and P. Krähenbühl, "Objects as points," arXiv preprint arXiv:1904.07850, 2019.
[26] Z. Tian, C. Shen, H. Chen, and T. He, "FCOS: Fully convolutional one-stage object detection," Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 9627–9636, 2019.
[27] X. Zhu, W. Su, L. Lu, B. Li, X. Wang, and J. Dai, "Deformable DETR: Deformable transformers for end-to-end object detection," arXiv preprint arXiv:2010.04159, 2020.
[28] D. Meng, X. Chen, Z. Fan, G. Zeng, H. Li, Y. Yuan, L. Sun, and J. Wang, "Conditional DETR for fast training convergence," Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3651–3660, 2021.
[29] G. Jocher, A. Chaurasia, and J. Qiu, "Ultralytics YOLOv8," 2023. [Online]. Available: https://github.com/ultralytics/ultralytics
[30] A. Wang, H. Chen, L. Liu, et al., "YOLOv10: Real-time end-to-end object detection," arXiv preprint arXiv:2405.14458, 2024.
[31] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, "End-to-end object detection with transformers," European Conference on Computer Vision, pp. 213–229, 2020.
[32] J. A. Tsanakas, D. Chrysostomou, P. N. Botsaris, and A. Gasteratos, "Fault diagnosis of photovoltaic modules through image processing and Canny edge detection on field thermographic measurements," International Journal of Sustainable Energy, 2015.
[33] X.-C. Yuan, L.-S. Wu, and Q. Peng, "An improved Otsu method using the weighted object variance for defect detection," Applied Surface Science, 2015.
[34] N. Otsu, "A threshold selection method from gray-level histograms," IEEE Transactions on Systems, Man, and Cybernetics, vol. 9, no. 1, pp. 62–66, 1979. doi: 10.1109/TSMC.1979.4310076
[35] W. C. Li and D. M. Tsai, "Automatic saw-mark detection in multicrystalline solar wafer images," Solar Energy Materials and Solar Cells, 2011.
[36] Y. G. Cen, R. Z. Zhao, L. H. Cen, L. H. Cui, Z. J. Miao, and W. Zhe, "Defect inspection for TFT-LCD images based on the low-rank matrix reconstruction," Neurocomputing, 2015.
[37] C. Jian, J. Gao, and Y. Ao, "Automatic surface defect detection for mobile phone screen glass based on machine vision," Applied Soft Computing, vol. 52, pp. 348–358, 2017. doi: 10.1016/j.asoc.2016.10.030
[38] N. Zeng, P. Wu, Z. Wang, H. Li, W. Liu, and X. Liu, "A small-sized object detection oriented multi-scale feature fusion approach with application to defect detection," IEEE Transactions on Instrumentation and Measurement, vol. 71, pp. 1–14, 2022.
[39] S. Jain, G. Seth, A. Paruthi, U. Soni, and G. Kumar, "Synthetic data augmentation for surface defect detection and classification using deep learning," Journal of Intelligent Manufacturing, pp. 1–14, 2022.
[40] J. Long, E. Shelhamer, and T. Darrell, "Fully convolutional networks for semantic segmentation," Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, 2015.
[41] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015: 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III, pp. 234–241, 2015.
[42] L.-C. Chen, G. Papandreou, F. Schroff, and H. Adam, "Rethinking atrous convolution for semantic image segmentation," arXiv preprint arXiv:1706.05587, 2017.
[43] L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "DeepLab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 4, pp. 834–848, 2017.
[44] L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, "Encoder-decoder with atrous separable convolution for semantic image segmentation," Proceedings of the European Conference on Computer Vision (ECCV), pp. 801–818, 2018.
[45] H. Wu, J. Zhang, K. Huang, K. Liang, and Y. Yu, "FastFCN: Rethinking dilated convolution in the backbone for semantic segmentation," arXiv preprint arXiv:1903.11816, 2019.
[46] T. Wu, S. Tang, R. Zhang, J. Cao, and Y. Zhang, "CGNet: A light-weight context guided network for semantic segmentation," IEEE Transactions on Image Processing, vol. 30, pp. 1169–1179, 2020.
[47] H. Pan, Y. Hong, W. Sun, and Y. Jia, "Deep dual-resolution networks for real-time and accurate semantic segmentation of traffic scenes," IEEE Transactions on Intelligent Transportation Systems, 2022.
[48] J. Xu, Z. Xiong, and S. P. Bhattacharyya, "PIDNet: A real-time semantic segmentation network inspired by PID controllers," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 19529–19539, 2023.
[49] R. Strudel, R. Garcia, I. Laptev, and C. Schmid, "Segmenter: Transformer for semantic segmentation," Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 7262–7272, 2021.
[50] E. Xie, W. Wang, Z. Yu, A. Anandkumar, J. M. Alvarez, and P. Luo, "SegFormer: Simple and efficient design for semantic segmentation with transformers," Advances in Neural Information Processing Systems, vol. 34, pp. 12077–12090, 2021.
[51] B. Cheng, A. Schwing, and A. Kirillov, "Per-pixel classification is not all you need for semantic segmentation," Advances in Neural Information Processing Systems, vol. 34, 2021.
[52] B. Cheng, I. Misra, A. G. Schwing, A. Kirillov, and R. Girdhar, "Masked-attention mask transformer for universal image segmentation," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1290–1299, 2022.
[53] Y. Liu, Z. Shao, and N. Hoffmann, "Global attention mechanism: Retain information to enhance channel-spatial interactions," arXiv preprint arXiv:2112.05561, 2021.
[54] J. Chen, S.-H. Kao, H. He, W. Zhuo, S. Wen, C.-H. Lee, and S.-H. G. Chan, "Run, don't walk: Chasing higher FLOPS for faster neural networks," Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023.
[55] S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, "CBAM: Convolutional block attention module," Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19, 2018.
[56] K. Chen, J. Wang, J. Pang, Y. Cao, Y. Xiong, X. Li, S. Sun, W. Feng, Z. Liu, J. Xu, Z. Zhang, D. Cheng, C. Zhu, T. Cheng, Q. Zhao, B. Li, X. Lu, R. Zhu, Y. Wu, J. Dai, J. Wang, J. Shi, W. Ouyang, C. C. Loy, and D. Lin, "MMDetection: OpenMMLab detection toolbox and benchmark," arXiv preprint arXiv:1906.07155, 2019.
[57] MMSegmentation Contributors, "MMSegmentation: OpenMMLab semantic segmentation toolbox and benchmark," 2020. [Online]. Available: https://github.com/open-mmlab/mmsegmentation