Citation: Xiaogang Wang, Xiyu Liu and Yu Li, "An Incremental Model Transfer Method for Complex Process Fault Diagnosis," IEEE/CAA J. Autom. Sinica, vol. 6, no. 5, pp. 1268-1280, Sept. 2019. doi: 10.1109/JAS.2019.1911618

An Incremental Model Transfer Method for Complex Process Fault Diagnosis

Fault diagnosis is an important measure to ensure production safety, and fault diagnosis methods of all kinds matter in the actual production process. However, the complexity and uncertainty of the production process often lead to changes in data distribution and the emergence of new fault classes, and the number of new fault classes is unpredictable. Under these circumstances, reconstructing the fault diagnosis model and identifying new fault classes become core issues. This paper presents a fault diagnosis method based on model transfer learning, and the main contributions of the paper are as follows: 1) an incremental model transfer fault diagnosis method is proposed to reconstruct the new process diagnosis model; 2) breaking the limit of existing methods, in which the new process can have only one more fault class than the old process, the proposed method can identify M more faults in the new process by means of incremental learning; 3) the method offers a solution to a series of problems caused by the increase of fault classes. Experiments based on the Tennessee-Eastman process and an ore grinding classification process demonstrate the effectiveness and feasibility of the method.

     

With the rapid growth of the integration scale of modern industry, the production process is growing more complex owing to its dynamics, nonlinearity, time delays, large inertia, and strong coupling [1]. If even a tiny fault is ignored, it may paralyze the whole system and seriously threaten the safety of life and property. So it is crucial to detect faults of the whole production process accurately and to predict abnormal working conditions in advance to prevent further losses. This has attracted many researchers to the field, and an increasing number of fault diagnosis methods have been proposed [2], [3]. As a result of the rapid development of computer technology in recent years, a large amount of process data has been collected and stored. Thus, data-driven methods have their unique advantages: they rely solely on collected data rather than on an accurate input-output model of the controlled process, which makes them suitable for the fault diagnosis of modern complex processes whose mathematical models are difficult to establish. Therefore, data-driven methods have been widely researched and applied to complex industrial fault diagnosis [4]-[7].

With the increasing complexity of industrial processes, some traditional machine learning methods, such as the support vector machine (SVM), neural networks, and reinforcement learning, have been used for fault diagnosis [8]-[10]. These methods work well for the fault diagnosis of many processes, but there is a strict premise: the training and test process data should be independent and identically distributed. Unfortunately, in complex industrial processes there will be a distribution difference between the old process data and the new data if there are changes in the device parameters, the production environment, or even the raw materials. Since the new process does not satisfy the assumptions made for the old process, the old model cannot be directly applied to the fault diagnosis of the new process. Meanwhile, the data accumulated in the new process are too scarce to rebuild a diagnosis model from scratch. In this case, traditional machine learning methods are no longer applicable, and how to effectively use the old model and the limited new data to build a new model has become a research hotspot. Transfer learning is a learning framework designed to solve this problem [11], [12]: it relaxes the assumption that the training and testing samples must be independent and identically distributed, and it focuses on handling the different distributions of the old process and the new one. Domain adaptation of the two processes is an effective way to improve the efficiency of transfer learning [13].

Transfer learning has been applied in many fields, mainly text classification, image classification, artificial intelligence planning, and so on. In recent years, transfer learning has begun to be applied in the fault diagnosis field [14], [15], and it performs well. However, transfer learning does not account for the fact that the differences between the two processes may produce new faults in the new process, and the number of new faults cannot be estimated accurately. Using knowledge of known faults to learn the new faults is a demand that often exists but is nearly untouched in transfer learning. Among other fault diagnosis methods, several algorithms can detect unknown faults under certain conditions. Hu et al. [16] proposed a fault diagnosis method based on kernel fuzzy C-means (KFCM) for unknown faults in satellite reaction wheels, but the unknown faults identified by that method must be combinations of two or more known faults. Kuzborskij et al. [17] solved an image classification problem and recognized one new class by transfer learning; if applied to fault diagnosis, their method can identify only a single unknown fault, whereas it is necessary to predict all the new unknown faults in the actual process. However, identifying all the N+M faults is not simply a matter of iterating the N to N+1 step M times. As the number of unknown faults increases, a series of problems inevitably arises, which has hampered further study. Inspired by [17], this paper proposes an incremental model transfer fault diagnosis method for complex processes, which can identify all the faults, including the new faults in the new process. For convenience, we call our method incremental model transfer learning (IMTL). Compared with current fault diagnosis methods, the presented algorithm can identify the new faults in a new process with the advantage of transfer learning. Besides, the method obtains the results of fault detection and classification simultaneously. The main contributions of this paper are as follows:

1) Introducing model transfer into the fault diagnosis field, the method achieves fault diagnosis with very few labeled samples, and fault detection and classification are achieved at the same time.

    2) Combining model transfer with incremental learning, the proposed method achieves the extension from identifying the existing N faults in the old process to N+M faults in the new process, where M is the number of new faults in the new process.

3) Breaking the limit of [17], in which the new process can have only one more fault class than the old process, the method can identify M more faults in the new process.

    4) This method gives a solution to a series of problems caused by the increase of fault classes.

The rest of the paper is organized as follows: the related work on model transfer is introduced in Section Ⅱ; our learning framework and fault diagnosis process are presented in Section Ⅲ; we introduce typical industrial processes and describe our experimental results in Section Ⅳ; finally, we conclude the paper in Section Ⅴ.

Now we have a strategy to identify new faults, but how can we quickly establish a diagnosis model for a new industrial process with only a few labeled samples? This is a practical and meaningful question. Transfer learning solves this problem by using the data of similar processes to help learn the new process task. As long as there is similarity between the old process and the new one, the old knowledge can be used in the new process. Moreover, the transfer efficiency mainly depends on the degree of similarity: the more common features there are, the better the effect.

Lu et al. [18] first proposed the concept of model transfer learning. It is based on an established model and assumes there is similarity between the two processes; it then uses a small amount of experimental data together with the model parameters of the similar process to build a new model. Previous machine learning methods need a lot of time and samples to build a diagnosis model, while model transfer methods focus on exploiting the existing model and a few new process data to build a diagnosis model rapidly. Model transfer learning does not require remodeling and greatly reduces the time and cost of modeling, while inheriting the advantages of the old method. However, model transfer has its own drawback: its ability to capture local changes between the old model and the new one is limited, since it mainly focuses on global regularization. This paper makes up for this drawback with a local regularization term.

Most model transfer algorithms focus on visual object categorization or natural language processing. Segev et al. [19] proposed three model transfer algorithms with random forests. The structure expansion reduction (SER) algorithm uses a few samples to expand or prune the structure of the source domain (old process) decision trees to establish the target domain (new process) decision tree model; the structure transfer (STRUT) algorithm uses several samples to establish the decision model of the target domain by modifying the decision tree thresholds; the mixed SER and STRUT (MIX) algorithm combines the two random forests to obtain the final classification result by majority vote. Tommasi et al. [20] presented a multi-model knowledge transfer algorithm for binary image classification, and the method can tune its parameters automatically. On the basis of the original least squares support vector machine (LS-SVM) optimization problem, Chen et al. [21] achieved model transfer by adding a penalty function and a constraint condition for the auxiliary set to the original objective function and constraint condition, respectively. Cawley gave a detailed exposition of the leave-one-out method and cross-validation of weighted LS-SVM [22]. Luo et al. [23] used model transfer to classify objects with a method called multi-kernel transfer learning, and the results indicated the strong effect of the method. The model transfer learning methods mentioned above can only identify known classes. The method in [17] can identify one new class and is the only current model transfer method that can identify new classes, but its extendibility is limited. Compared with the method in [17], the method proposed in this paper identifies all the new classes, and it also solves the problem that the difference between the new model and the original model increases gradually with class extension, so it has higher practicability. Moreover, the proposed method achieves domain adaptation through transfer component analysis (TCA) [24], which improves the efficiency of model transfer learning. Besides, the method proposed in this paper applies model transfer to the field of fault diagnosis, which has rarely been studied until now.

1) Problem Settings and Definitions: In this paper, column vectors and matrices are denoted by small and capital bold letters, respectively, e.g., {\mathbf{m}} = [a_1, a_2, \ldots, a_d]^T. We denote the input sample set as {\mathbf{X}} = [{\mathbf{x}}_1, {\mathbf{x}}_2, \ldots, {\mathbf{x}}_m] \in {\mathbb{R}}^{d \times m}, where m is the number of samples and d is the number of features of each sample. Supposing we have m samples of n classes (m \geq n) that need to be classified, the labels are denoted as y_i \in Y = \{1, 2, \ldots, n\} with i \in \{1, 2, \ldots, m\}. For convenience, we set the label matrix as {\mathbf{Y}} \in {\mathbb{B}}^{m \times n}, in which Y_{in} = 1 if y_i = n and Y_{in} = 0 otherwise. We want to learn a function f({\mathbf{x}}) = {\mathbf{W}}^T \phi({\mathbf{x}}) + b, which assigns the class of an input vector; \phi(\cdot) is used to map {\mathbf{x}} to a high-dimensional feature space to handle nonlinear problems.
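
For concreteness, here is a minimal sketch of the label encoding just described. The function name one_vs_all_labels is our own illustrative choice, and we use ±1 targets rather than {1, 0}, since the decision values near +1/-1 reported in the experiments later suggest that convention:

```python
import numpy as np

def one_vs_all_labels(y, n_classes):
    """Build an m x n label matrix for one-vs-all training:
    row i has +1 in the column of class y_i and -1 elsewhere."""
    y = np.asarray(y)
    Y = -np.ones((len(y), n_classes))
    Y[np.arange(len(y)), y - 1] = 1.0   # classes are numbered 1..n
    return Y

# Example: 4 samples from classes {1, 3, 2, 3}
print(one_vs_all_labels([1, 3, 2, 3], 3))
```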

2) Model Transfer Learning Based on LS-SVM: Firstly, we extend LS-SVM to a multiclass form through one-vs-all (OVA) as the base model, expressed as

\begin{align} &\mathop {\min }\limits_{{\mathbf{W}}, b} \frac{1}{2}\left\| {\mathbf{W}} \right\|^2 + \frac{C}{2}\sum\limits_{i = 1}^M {\xi _i^2} \\ & {\rm s.t.}\;\;\;{y_i} = {{\mathbf{W}}^T}\phi ({{\mathbf{x}}_i}) + b + {\xi _i}\;\;\;\;\;\;\forall i \in \left\{ {1, 2, \ldots, M} \right\}. \end{align} (1)

Solving (1), we can obtain an N-hyperplane set from the N classes of the source (old process), and this hyperplane set can be expressed as the classification model. So we regard {\mathbf{W}}' = [{\mathbf{W}}_1', \ldots, {\mathbf{W}}_N'] as the source optimal model. When facing a target (new process) task, we can add a global regularization term to (1) to make the new hyperplane set {\mathbf{W}} close to {\mathbf{W}}', thus the result is

\begin{align} &\mathop {\min }\limits_{{\mathbf{W}}, b} \frac{1}{2}\left\| {{\mathbf{W}} - {\mathbf{W}}'} \right\|_F^2 + \frac{1}{2}\left\| {\mathbf{W}} \right\|_F^2 + \frac{C}{2}\sum\limits_{i = 1}^M {\xi _i^2} \\ & {\rm s.t.}\;\;\;{y_i} = {{\mathbf{W}}^T}\phi ({{\mathbf{x}}_i}) + b + {\xi _i}\;\;\;\;\;\;\forall i \in \left\{ {1, 2, \ldots, M} \right\}. \end{align} (2)

For a binary classifier there is only one hyperplane, so {\mathbf{W}} is a vector and the constraint term uses the L2 norm so as not to transfer too much. In the multiclass case, {\mathbf{W}} is a hyperplane set and hence a matrix. Since the F norm has the same form as the L2 norm, with the L2 norm applying to vectors and the F norm to matrices, we choose the F norm constraint. To avoid the optimization problems caused by too many parameters, we simplify the weights of the global regularization terms to 1/2, and likewise for all the following global and local regularization terms. Of course, in practical applications, parameter optimization algorithms can be designed according to the actual situation. Here, we mainly explain the usage of each global regularization term.
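
For clarity, the F norm of the hyperplane matrix is just the L2 norm applied column by column, which is why it is the natural multiclass generalization:

\begin{align*} \left\| {\mathbf{W}} \right\|_F^2 = \sum\limits_{n = 1}^N {\left\| {{{\mathbf{w}}_n}} \right\|_2^2}, \qquad {\mathbf{W}} = [{{\mathbf{w}}_1}, \ldots, {{\mathbf{w}}_N}]. \end{align*}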

Until now, we have just made a simple approximation of the source model and target model, which allows the source knowledge to transfer to the target domain as much as possible. The reason why we do not replace \frac{1}{2} \left\|\mathbf{W}\right\|_F^2 with \frac{1}{2} \left\|\mathbf{W}-\mathbf{W}'\right\|_F^2 like other model transfer methods is that minimizing \frac{1}{2} \left\|\mathbf{W}\right\|_F^2 can shrink the weight matrix, decrease the model complexity, avoid over-fitting, and improve the generalization performance. However, we cannot ignore what is unique to the target task. We will explain this in the next section.

3) From N to N+1: Incremental Transfer Learning: Incremental transfer learning aims at finding a new hyperplane set \mathbf{W} = [\mathbf{W}_1, \ldots, \mathbf{W}_N], \mathbf{w}_{N+1} to achieve two goals. The first is to identify the new class in the target domain by transferring the source models (N classes). The other is to make sure the performance on the source N classes does not decline too much compared with before; in other transfer learning methods, this second goal is often ignored. Because of the linearity of the model, we can force the two classifiers to be close to each other so that their performance is similar, and both goals can be implemented by distance-based regularization terms.

For the new hyperplane \mathbf{w}_{N+1}, we enforce it to be close to a linear combination of the source models \mathbf{W}' = [\mathbf{W}_1, \ldots, \mathbf{W}_N]. Therefore, the local regularization term \frac{1}{2} \left\|\mathbf{w}_{N+1}-\boldsymbol{\beta}\mathbf{W}'\right\|_F^2 is added to achieve the first goal. Meanwhile, the coefficient vector \boldsymbol{\beta} = [\beta_1, \ldots, \beta_N]^T is used to weigh the transfer amount of every source model, and \boldsymbol{\beta} can be calculated automatically with the algorithm described later. Similarly, we can use the global regularization term \frac{1}{2} \left\|\mathbf{W}-\mathbf{W}'\right\|_F^2 to achieve the second goal by forcing the new hyperplane set \mathbf{W} to be close to the prior hyperplanes \mathbf{W}'; it can also prevent negative transfer of the whole model. To achieve the two goals in model transfer learning with LS-SVM, we obtain the function

    \begin{align} &\mathop {\min }\limits_{{\mathbf{W}}, b} \sum\limits_{n = 1}^N {\frac{1}{2}\left\| {{{\mathbf{W}}_n} - {{\mathbf{W}}_n}'} \right\|_F^2} + \sum\limits_{n = 1}^N {\frac{1}{2}\left\| {{{\mathbf{w}}_{N + 1}} - {\boldsymbol{\beta }}{{\mathbf{W}}_n}'} \right\|_F^2} \\ &\; \; \; \; \;\;\; + \sum\limits_{n = 1}^{N + 1} {\frac{1}{2}\left\| {{{\mathbf{W}}_n}} \right\|_F^2 + \frac{C}{2}\sum\limits_{i = 1}^M {\xi _i^2} } \\ & {\rm s.t.}\;\;\;{y_i} = {{\mathbf{W}}^T} \cdot \phi ({{\mathbf{x}}_i}) + b + {\xi _i}\;\;\;\;\;\;\forall i \in \left\{ {1, 2, \ldots, M} \right\} \end{align} (3)

Let us reiterate the role of each term. The global regularization term \frac{1}{2} \left\|\mathbf{W}-\mathbf{W}'\right\|_F^2 avoids negative transfer, while the local regularization term \frac{1}{2} \left\|\mathbf{w}_{N+1}-\boldsymbol{\beta}\mathbf{W}'\right\|_F^2 allows local modification of the model and avoids the problem that model transfer usually focuses on global regularization and ignores what is unique to the target model. Minimizing \frac{1}{2} \left\|\mathbf{W}\right\|_F^2 avoids over-fitting. The last term is the slack-variable penalty of LS-SVM. The Lagrangian of (3) can be expressed as

\begin{align} L& = \mathop {\min }\limits_{{\mathbf{W}}, \mathbf{w}_{N + 1}, b} \sum\limits_{n = 1}^N {\frac{1}{2}\left\| {{{\mathbf{W}}_n} - {{\mathbf{W}}_n}'} \right\|_F^2} \\ &\;\;\;\;\;+ \sum\limits_{n = 1}^N {\frac{1}{2}\left\| {{{\mathbf{w}}_{N + 1}} - {\boldsymbol{\beta }}{{\mathbf{W}}_n}'} \right\|_F^2} \\ &\;\;\;\;\;+ \sum\limits_{n = 1}^{N + 1} {\frac{1}{2}\left\| {{{\mathbf{W}}_n}} \right\|_F^2} + \frac{C}{2}\sum\limits_{i = 1}^M {\xi _i^2} \\ &\;\;\;\;\;- \sum\limits_{i = 1}^M {{\alpha _i}} \{ {{\mathbf{W}}^T}\phi ({{\mathbf{x}}_i}) + b + {\xi _i} - {y_i}\} \end{align} (4)

    Then, we can get

\begin{align} \frac{{\partial L}}{{\partial {\mathbf{W}}}}& = {\mathbf{0}} \Rightarrow \sum\limits_{n = 1}^N {{\mathbf{W}}_n} = \frac{1}{2}\left(\sum\limits_{i = 1}^M {\sum\limits_{n = 1}^N {{{\mathbf{W}}_n}' + {\alpha _{in}}\phi ({{\mathbf{x}}_i})} } \right) \end{align} (5)
\begin{align} \frac{{\partial L}}{{\partial {{\mathbf{w}}_{N + 1}}}} & = 0 \Rightarrow {{\mathbf{w}}_{N + 1}} = \frac{1}{2}\left(\sum\limits_{n = 1}^N {\boldsymbol{\beta }}{{\mathbf{W}}_n}' + \sum\limits_{i = 1}^M {{\alpha _{i(N + 1)}}\phi ({{\mathbf{x}}_i})} \right) \end{align} (6)
\begin{align} \frac{{\partial L}}{{\partial {\xi_i}}}& = 0 \Rightarrow {\xi _i} = \frac{{{\alpha _i}}}{C} \end{align} (7)
\begin{align} \frac{{\partial L}}{{\partial {\alpha _i}}} & = 0 \Rightarrow \sum\limits_{i = 1}^M {\sum\limits_{n = 1}^{N + 1} {{y_{in}}} } = \sum\limits_{i = 1}^M {\sum\limits_{n = 1}^{N + 1} {{{\mathbf{W}}_n}^T\phi ({{\mathbf{x}}_i}) + {b_n} + {\xi _{in}}} }\\ &\;\;\;\;\;\;\;\; = \sum\limits_{i = 1}^M {\sum\limits_{n = 1}^N ( } {\mathbf{W}}_n^T\phi ({{\mathbf{x}}_i}) + {b_n} + {\xi _{in}})\\ &\;\;\;\;\;\;\;\; + \sum\limits_{i = 1}^M {{\mathbf{w}}_{N + 1}^T} \phi ({{\mathbf{x}}_i}) + {b_{N + 1}} + {\xi _{i(N + 1)}} \end{align} (8)

Substituting \mathbf{W}_n from (5), \mathbf{w}_{N+1} from (6), and \xi_i from (7) into (8), we obtain

    \begin{align} &\sum\limits_{i = 1}^M {\sum\limits_{n = 1}^{N + 1} {{Y_{in}}} } \; - \sum\limits_{i = 1}^M {\sum\limits_{n = 1}^N {\frac{1}{2}{\mathbf{W}}_n^{{\mathbf{'}}T}\phi ({{\mathbf{x}}_i}) - \sum\limits_{i = 1}^M {\sum\limits_{n = 1}^N {\frac{1}{2}\boldsymbol{\beta } {\mathbf{W}}_n^{{\mathbf{'}}T}} } \phi ({\mathbf{x}}_i)} } \\ & = \sum\limits_{i = 1}^M {\sum\limits_{n = 1}^N {\frac{1}{2}{\alpha _{in}}} } \phi ({{\mathbf{x}}_i})^T\phi ({{\mathbf{x}}_i}) + {b_n} + \frac{{{\alpha _{in}}}}{C}\\ &+ \sum\limits_{i = 1}^M {\frac{1}{2}{\alpha _{i(N + 1)}}} \phi ({{\mathbf{x}}_i})^T\phi ({{\mathbf{x}}_i}) + {b_{N + 1}} + \frac{{{\alpha _{i(N + 1)}}}}{C}. \end{align} (9)

    In matrix form we have

    \begin{align} \left[ {\begin{array}{*{20}{c}} {{\mathbf{Y}} - \mathop {\mathbf{Y}}\limits^\sim }\\ {\mathbf{0}} \end{array}} \right] = \left[ {\begin{array}{*{20}{c}} {\frac{1}{2}{\mathbf{K}} + \frac{1}{C}{\mathbf{I}}}&{\mathbf{1}}\\ {{{\mathbf{1}}^T}}&{\mathbf{0}} \end{array}} \right]\left[ {\begin{array}{*{20}{c}} {\mathbf{A}}\\ {\mathbf{b}} \end{array}} \right] \end{align} (10)

where {{\mathbf{K}}_{ki}} = \phi ^T{({{\mathbf{x}}_k})}\phi ({{\mathbf{x}}_i}) , \mathbf{\mathop Y\limits^ \sim} = [\begin{array}{*{20}{c}} {\frac{1}{2}{{\mathbf{X}}^T}{\mathbf{W}}'} & {\frac{1}{2}{{\mathbf{X}}^T}{\mathbf{W}}' {\boldsymbol{\beta }}} \end{array}], \; {\mathbf{A}} = {\mathbf{A}}' - \left[{\begin{array}{*{20}{c}} {{\mathbf{A}}''} & {{\mathbf{A}}''{\boldsymbol{\beta }}} \end{array}} \right] , and {\mathbf{b}} = {\mathbf{b}}' - \left[{\begin{array}{*{20}{c}} {{\mathbf{b}}''} & {{\mathbf{b}}''{\boldsymbol{\beta }}} \end{array}} \right] . Let {\mathbf{M}} = \left[{\begin{array}{*{20}{c}} {\frac{1}{2}{\mathbf{K}} + \frac{1}{C}{\mathbf{I}}} & {\mathbf{1}}\\ {{{\mathbf{1}}^T}} & {\mathbf{0}} \end{array}} \right] ; then

    \begin{align} \left[ {\begin{array}{*{20}{c}} {{\mathbf{A}}'}\\ {{\mathbf{b}}{'^T}} \end{array}} \right] = {{\mathbf{M}}^{ - 1}}\left[ {\begin{array}{*{20}{c}} {\mathbf{Y}}\\ {\mathbf{0}} \end{array}} \right] \end{align} (11)
    \begin{align} \left[ {\begin{array}{*{20}{c}} {{\mathbf{A}}''}\\ {{\mathbf{b}}'{'^T}} \end{array}} \right] = {{\mathbf{M}}^{ - 1}}\left[ {\begin{array}{*{20}{c}} {\frac{1}{2}{{\mathbf{X}}^T}{\mathbf{W}}'}\\ {\mathbf{0}} \end{array}} \right]. \end{align} (12)
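
As a sketch, the two solves in (11) and (12) amount to building the block matrix {\mathbf{M}} once and back-substituting two right-hand sides; the second right-hand side is \frac{1}{2}{{\mathbf{X}}^T}{\mathbf{W}}' per (12), and the {\boldsymbol{\beta}}-dependent combination of (10) is applied afterwards. The numpy helper below is a minimal version; the RBF kernel, the function names, and the parameter values (gamma, C) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def rbf_kernel(X1, X2, gamma=0.5):
    """K[k, i] = exp(-gamma * ||x_k - x_i||^2); one common choice of phi."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def transfer_solve(K, Y, Y_tilde, C=10.0):
    """Solve (11)-(12): [A'; b'^T] = M^{-1}[Y; 0] and
    [A''; b''^T] = M^{-1}[Y_tilde; 0], with
    M = [[K/2 + I/C, 1], [1^T, 0]].  K can be formed as rbf_kernel(X, X)."""
    m = K.shape[0]
    M = np.zeros((m + 1, m + 1))
    M[:m, :m] = 0.5 * K + np.eye(m) / C
    M[:m, m] = 1.0
    M[m, :m] = 1.0
    sol1 = np.linalg.solve(M, np.vstack([Y, np.zeros((1, Y.shape[1]))]))
    sol2 = np.linalg.solve(M, np.vstack([Y_tilde, np.zeros((1, Y_tilde.shape[1]))]))
    A1, b1 = sol1[:m], sol1[m]   # A', b'  from (11)
    A2, b2 = sol2[:m], sol2[m]   # A'', b'' from (12)
    return A1, b1, A2, b2
```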

Through (5), (6), (11), and (12), we can get the new hyperplane set \mathbf{W} = [\mathbf{W}_1, \ldots, \mathbf{W}_N], \mathbf{w}_{N+1} , and the transfer learning problem can be solved as long as the parameter \boldsymbol{\beta} is determined. The main idea for tuning the value of the parameter automatically is as follows: the leave-one-out (LOO) label prediction {\mathbf{Y}}({\boldsymbol{\beta}}) of the test samples is measured by the hinge loss, and the parameter \boldsymbol{\beta} is then self-adjusted by solving the minimization problem of the LOO error. The LOO prediction is given by {\mathop {\mathbf{Y}}\limits^ \wedge _{in}} = {{\mathbf{Y}}_{in}} - {{{{\mathbf{A}}_{in}}}}/{{{\mathbf{M}}_{ii}^{ - 1}}} . The function minimizing the LOO error is as follows.

    \begin{array}{l} \mathop {\min }\limits_\beta \left\{ \begin{array}{l} \max (1 + {Y_{i(N + 1)}}(\beta ) - {Y_{i{y_i}}}(\beta ),\;0)\;\;\;\;\;{y_i} \ne N + 1\\ \mathop {\max }\limits_{r \ne {y_i}} (1 + {Y_{ir}}(\beta ) - {Y_{i{y_i}}}(\beta ),\;0)\;\;\;\;\;\;\;\;\;\;{y_i} = N + 1 \end{array} \right.\\ {\rm{s}}.{\rm{t}}.\;\;{\left\| \beta \right\|_2} \le 1,\;\;\;\;\;\;\;\;\;{\beta _i} \ge 0,i = 1, \ldots ,N \end{array} (13)
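
A rough numerical sketch of this \boldsymbol{\beta}-tuning step is given below. It assembles {\mathbf{A}}({\boldsymbol{\beta}}) from {\mathbf{A}}' and {\mathbf{A}}'' as in (10), forms the closed-form LOO decision values from the formula above, and minimizes the hinged LOO error of (13) under the constraints \|\boldsymbol{\beta}\|_2 \le 1, \beta_i \ge 0. The use of scipy.optimize.minimize (SLSQP handles these constraints) and all function names are our own illustrative choices, not the authors' implementation; M_inv can be obtained as np.linalg.inv(M):

```python
import numpy as np
from scipy.optimize import minimize

def loo_decision(Y, A, M_inv):
    """LOO decision values: Yhat[i, n] = Y[i, n] - A[i, n] / M_inv[i, i]."""
    d = np.diag(M_inv)[:Y.shape[0]]
    return Y - A / d[:, None]

def hinge_loo_error(beta, Y, y, A1, A2, M_inv, N):
    """Summed hinge LOO loss of (13).  A1 = A' from (11) (m x (N+1)),
    A2 = A'' from (12) (m x N); A(beta) = A' - [A'', A'' beta]."""
    A = A1.copy()
    A[:, :N] -= A2                 # known-class columns
    A[:, N] -= A2 @ beta           # new-class column
    Yhat = loo_decision(Y, A, M_inv)
    loss = 0.0
    for i, yi in enumerate(y):     # classes numbered 1..N+1
        if yi != N + 1:
            loss += max(1 + Yhat[i, N] - Yhat[i, yi - 1], 0)
        else:
            rivals = np.delete(Yhat[i], N)
            loss += max(1 + rivals.max() - Yhat[i, N], 0)
    return loss

def tune_beta(Y, y, A1, A2, M_inv, N):
    cons = ({'type': 'ineq', 'fun': lambda b: 1 - np.linalg.norm(b)},)
    res = minimize(hinge_loo_error, np.ones(N) / N,
                   args=(Y, y, A1, A2, M_inv, N),
                   bounds=[(0, None)] * N, constraints=cons)
    return res.x
```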

4) From N to N+M: IMTL Algorithm With Few Examples: Based on the method above and the idea of incremental learning with SVM, we complete the extension from N to N+M with LS-SVM. Incremental learning here refers to training one new class at each iteration. The steps of the incremental algorithm with LS-SVM are summarized in Algorithm 1.

    Algorithm 1: IMTL: Incremental Model Transfer Learning
Input: Training set {\mathbf{X}} with M-2 subsets {{\mathbf{X}}_1}, \ldots, {{\mathbf{X}}_{M - 2}}, each adding one class in turn; the number of classes grows from 3 to M.
Output: An LS-SVM classifier model based on {\mathbf{X}}.
Step 1: Take the subset {\mathbf{X}_1} for training: the hyperplane set {\mathbf{W}_1} classifies class 1 against the other classes, the hyperplane set {\mathbf{W}_2} classifies class 2 against the other classes, and the hyperplane set {\mathbf{W}_3} classifies class 3 against the other classes.
Step 2: Train the hyperplane sets {{\mathbf{W}}_1}, {{\mathbf{W}}_2}, {{\mathbf{W}}_3} with subset {\mathbf{X}_2} to obtain the hyperplane set {\mathbf{W}_4}; the specific algorithm has been introduced above.
Step 3: Repeat Step 2, training the newly obtained hyperplane set with the next data subset {\mathbf{X}_i}, until the subset {\mathbf{X}_{M-2}} is reached. The hyperplane set {\mathbf{W}_M} trained from the previously obtained hyperplane sets and {\mathbf{X}_{M-2}} is the classification model of the whole training set {\mathbf{X}}; output {\mathbf{W}_M}.
Step 4: End the algorithm.

From the point of view of problem solving, the proposed method aims at finding a new hyperplane set \mathbf{W} = [\mathbf{W}_1, \ldots, \mathbf{W}_N, \mathbf{w}_{N+1}, \ldots, \mathbf{w}_{N+p-1}, \mathbf{w}_{N+p}], \mathbf{w}_{N+p+1} . Then our learning framework IMTL can be expressed as

    \begin{align} \begin{array}{l} \mathop {\min }\limits_{{\mathbf{W}}, b} \sum\limits_{n = 1}^{N + p} {\frac{1}{2}\left\| {{{\mathbf{W}}_n} - {{\mathbf{W}}_n}'} \right\|_F^2} + \sum\limits_{n = 1}^{N + p} {\frac{1}{2}\left\| {{{\mathbf{w}}_{N + p + 1}} - \beta {{\mathbf{W}}_n}'} \right\|_F^2} \\ \;\;\;\;\;\; + \sum\limits_{n = 1}^{N + p} {\frac{1}{2}\left\| {{{\mathbf{W}}_n}} \right\|_F^2} + \frac{C}{2}\sum\limits_{i = 1}^M {\xi _i^2} \\ {\rm s.t.}\;\;\;{y_i} = {{\mathbf{W}}^T} \cdot \phi ({{\mathbf{x}}_i}) + b + {\xi _i}\;\;\;\;\;\;\;\forall i \in \left\{ {1, 2, \ldots, M} \right\} \end{array} \end{align} (14)

where \mathbf{W}' = [\mathbf{W}_1, \ldots, \mathbf{W}_N, \mathbf{w}_{N+1}, \ldots, \mathbf{w}_{N+p-1}, \mathbf{w}_{N+p}] and {\boldsymbol{\beta}} = {\left[{{\beta _1}, \ldots, {\beta _{N + p}}} \right]^T} .

    The solution of the problem is similar to that of (3), and it can be regarded as the iteration of the solving problem (3). In other words, we can calculate \mathbf{w}_{N+1} first, and then regard \mathbf{W} = [\mathbf{W}_1, \ldots, \mathbf{W}_N], \mathbf{w}_{N+1} as \mathbf{W}' to find \mathbf {w}_{N+2} , repeatedly and iteratively, until \mathbf {w}_{N+p+1} is calculated.
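
This iteration is essentially a driver loop: solve the N to N+1 problem, append the new hyperplane to the prior model, and repeat. A minimal sketch follows; the callback add_one_class, which should implement the single-step solve of (3) against the current prior model and the corresponding data increment, is a hypothetical placeholder:

```python
from typing import Callable, List

def incremental_transfer(W_src: List, add_one_class: Callable, M: int) -> List:
    """Driver loop for the N -> N+M extension described above.

    `add_one_class(W_prior, p)` is assumed to return the hyperplane
    w_{N+p+1} learned by the N -> N+1 step from the prior model W_prior
    and the p-th increment of new-process data."""
    W = list(W_src)                   # W' = [W_1, ..., W_N] from the old process
    for p in range(M):
        w_new = add_one_class(W, p)   # solve (3)/(14) for the next hyperplane
        W.append(w_new)               # fold it into W' for the next iteration
    return W
```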

In theory, problems may appear as the classes increase: the prediction error rate may have a corresponding cumulative effect. Even though we use the global regularization term \frac{1}{2} \left\|\mathbf{W}-\mathbf{W}'\right\|_F^2 to correct the model in each iteration when solving \mathbf {w}_{N+p+1} , as the classes in the target domain increase, the difference between the new model and the initial model grows, which lowers the accuracy. The number of classes that the model can identify effectively depends on the complexity of the process data. To weaken these negative effects, when the accuracy drops suddenly we replace \sum\limits_{n = 1}^{N + p} {\frac{1}{2}\Big\| {{{\mathbf{W}}_n} - {{\mathbf{W}}_n}'} \Big\|_F^2} with \sum\limits_{n = 1}^{N + p} {\sum\limits_{t = 1}^n {\frac{1}{2}\Big\| {{{\mathbf{W}}_n} - {\mathbf{W}}_{n - t}'} \Big\|_F^2} } , and replace \sum\limits_{n = 1}^{N + p} {\frac{1}{2}\Big\| {{{\mathbf{w}}_{N + p + 1}} - \boldsymbol{\beta} {{\mathbf{W}}_n}'} \Big\|_F^2} with \sum\limits_{n = 1}^{N + p} \sum\limits_{t = 1}^n \frac{1}{2}\Big\| {{\mathbf{w}}_{N + p + 1}} - \boldsymbol{\beta} {\mathbf{W}}_{n - t}' \Big\|_F^2 , to decrease the difference among the new hyperplanes, the original model hyperplanes, and every pre-model.

\begin{align} &\mathop {\min }\limits_{{\mathbf{W}}, b} \sum\limits_{n = 1}^{N + p} {\sum\limits_{t = 1}^n {\frac{1}{2}\left\| {{{\mathbf{W}}_n} - {\mathbf{W}}_{n - t}'} \right\|_F^2} } \\& \qquad + \sum\limits_{n = 1}^{N + p} {\frac{1}{2}\left\| {{{\mathbf{W}}_n}} \right\|_F^2} + \frac{C}{2}\sum\limits_{i = 1}^M {\xi _i^2} \\& \qquad + \sum\limits_{n = 1}^{N + p} {\sum\limits_{t = 1}^n {\frac{1}{2}\left\| {{{\mathbf{w}}_{N + p + 1}} - \boldsymbol{\beta} {\mathbf{W}}_{n - t}'} \right\|_F^2} } \\ & {\rm s.t.}\;\;\;{y_i} = {{\mathbf{W}}^T}\phi ({{\mathbf{x}}_i}) + b + {\xi _i}\;\;\;\;\;\;\;\forall i \in \left\{ {1, 2, \ldots, M} \right\}\; \end{align} (15)

The problem is the combination of (3) and (14); solving it in the same way, we get

\sum\limits_{n = 1}^{N + p} {{{\mathbf{W}}_n}} = \frac{1}{{2n}}\left({ {\sum\limits_{n = 1}^{N + p} {\sum\limits_{t = 0}^n {{\mathbf{W}'}_{n - t}} } { + \sum\limits_{i = 1}^m {{\alpha _{in}}} \phi ({\mathbf{x}_i})}}} \right)
{{\mathbf{w}}_{N + p + 1}} = \frac{1}{{2n}}\left({ {\sum\limits_{n = 1}^{N + p} {\sum\limits_{t = 0}^n {{\boldsymbol{\beta} {\mathbf{W}'}_{n - t}} }} {+ \sum\limits_{i = 1}^m {{\alpha _{i(N + p + 1)}}} \phi ({\mathbf{x}_i})} }}\right)

    Then

\begin{align*} &\sum\limits_{i = 1}^M {\sum\limits_{n = 1}^{N + p + 1} {{Y_{in}}} } \; - \sum\limits_{i = 1}^M {\sum\limits_{n = 1}^{N + p} {\sum\limits_{t = 1}^n {\frac{1}{{2n}} {\mathbf{W}}_{n - t}^{{\mathbf{'}}T}\phi ({{\mathbf{x}}_i})} }} \nonumber\\& \qquad - \sum\limits_{i = 1}^M {\sum\limits_{n = 1}^{N + p} {\sum\limits_{t = 1}^n {\frac{1}{{2n}}\boldsymbol{\beta} {\mathbf{W}}_{n - t}^{{\mathbf{'}}T}\phi ({{\mathbf{x}}_i})} } } \nonumber\\& \quad = \sum\limits_{i = 1}^M {\sum\limits_{n = 1}^{N + p} {\frac{1}{{2n}}{\alpha _{in}}\phi ^T {{({{\mathbf{x}}_k})}}} } \phi ({{\mathbf{x}}_i}) + {b_n} + \frac{{{\alpha _{in}}}}{C}\nonumber\\& \qquad + \sum\limits_{i = 1}^M {\frac{1}{{2n}}{\alpha _{i(N + p + 1)}}} \phi ^T{({{\mathbf{x}}_k})}\phi ({{\mathbf{x}}_i})\nonumber\\& \qquad + {b_{N + p + 1}} + \frac{{{\alpha _{i(N + p + 1)}}}}{C}. \end{align*}

Then \mathbf{W} = [\mathbf{W}_1, \ldots, \mathbf{W}_N, \mathbf{w}_{N+1}, \ldots, \mathbf{w}_{N+p}], \mathbf{w}_{N+p+1} and the prediction matrix \mathbf{Y}' can be obtained. As for the computational complexity, it increases linearly as the classes increase. However, we can also use the learning framework presented in (14); it works well within a certain range and is easier to implement, since it does not need to be iterated. So we use the method presented in (15) to correct the diagnosis when the accuracy of the method presented in (14) suddenly drops a lot.

The old fault diagnosis model has been built from the old process data {\mathbf{X}}_s and its corresponding label matrix {{\mathbf{Y}}_s} \in \{ {\rm normal}, {\rm fault}_1, \ldots, {\rm fault}_{N - 1}\} . Considering the fault diagnosis of the new process, we use the old model and the new process data to build a new model in the new complex process, where {{\mathbf{Y}}_t} \in \{ {\rm normal}, {\rm fault}_1, \ldots, {\rm fault}_{N + M - 1}\} is the label matrix of the new process data. The procedures are as follows.

1) TCA Process: As mentioned earlier, the efficiency of transfer learning increases as the distribution difference between the old process data and the new process data decreases. So we need to seek a mapping that minimizes the difference in a subspace for a good transfer effect. First, the Maximum Mean Discrepancy (MMD) based on a Reproducing Kernel Hilbert Space (RKHS) is used to represent the distribution difference of two datasets. Let the dataset {\mathbf{X}^s} = \{ {\mathbf{x}}_1^s, \ldots, {\mathbf{x}}_{{n_1}}^s\} satisfy the distribution P and the dataset {\mathbf{X}^t} = \{ {\mathbf{x}}_1^t, \ldots, {\mathbf{x}}_{{n_2}}^t\} satisfy the distribution Q. When {n_1} and {n_2} tend to infinity, the MMD between the datasets {{\mathbf{X}}^s} and {{\mathbf{X}}^t} can be expressed as

    \begin{align} f({{\mathbf{X}}_S}, {{\mathbf{X}}_T}) = \left\| {\frac{1}{{{n_1}}}\sum\limits_{i = 1}^{{n_1}} {\phi ({\mathbf{x}}_i^s) - } \frac{1}{{{n_2}}}\sum\limits_{i = 1}^{{n_2}} {\phi ({\mathbf{x}}_i^t)} } \right\| \end{align} (16)

    As for process data, the distribution difference between the old process data and the new process data can be expressed as

    \begin{align} {D_{ist}}({\mathbf{X}'}_S, {\mathbf{X}'}_T) = \left\| {\frac{1}{{{m_s}}}\sum\limits_{i = 1}^{{m_s}} {\phi ({\mathbf{x}}_i^s) - } \frac{1}{{{m_t}}}\sum\limits_{i = 1}^{{m_t}} {\phi ({\mathbf{x}}_i^t)} } \right\|_H^2 \end{align} (17)
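
Expanding the squared RKHS norm in (17) with the kernel trick gives the usual three-term estimator, sketched below; the RBF kernel and the gamma value are illustrative assumptions:

```python
import numpy as np

def rbf(X, Y, gamma=0.5):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(Xs, Xt, gamma=0.5):
    """Squared MMD of (17): ||mean phi(xs) - mean phi(xt)||_H^2
    = mean K(Xs,Xs) - 2 mean K(Xs,Xt) + mean K(Xt,Xt)."""
    return (rbf(Xs, Xs, gamma).mean()
            - 2 * rbf(Xs, Xt, gamma).mean()
            + rbf(Xt, Xt, gamma).mean())
```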

    The nonlinear mapping function can be obtained by minimizing (17). With the method presented in [24], (17) could be converted into

    \begin{align} \begin{array}{l} \min {\rm tr}({{\mathbf{W}}^T}{\mathbf{W}}) + \mu {\rm tr}({{\mathbf{W}}^T}{\mathbf{KLKW}})\\ {\rm s.t.}\;\;{{\mathbf{W}}^T}{\mathbf{KHKW}} = {\mathbf{I}} \end{array} \end{align} (18)

    where {\mathbf{K}} = \left[{\begin{array}{*{20}{c}} {{{\mathbf{K}}_{S, S}}} & {{{\mathbf{K}}_{S, T}}}\\ {{{\mathbf{K}}_{T, S}}} & {{{\mathbf{K}}_{T, T}}} \end{array}} \right] .

{{\mathbf{L}}_{i, j}} = \left\{ {\begin{array}{*{20}{c}} {\frac{1}{{m_s^2}}, }&{{{\mathbf{x}}_i}, {{\mathbf{x}}_j} \in {{\mathbf{X}}_{src}}}\\ {\frac{1}{{m_t^2}}, }&{{{\mathbf{x}}_i}, {{\mathbf{x}}_j} \in {{\mathbf{X}}_{tar}}}\\ { - \frac{1}{{{m_s}{m_t}}}, }&{{\rm otherwise}} \end{array}} \right.

    \mu : Trade-off parameter.

    {\mathbf{H}} = {{\mathbf{I}}_{{n_1} + {n_2}}} - \frac{1}{{{n_1} + {n_2}}}{\mathbf{1}}{{\mathbf{1}}^T} .

{{\mathbf{I}}_{{n_1} + {n_2}}} : the ({n_1} + {n_2}) \times ({n_1} + {n_2}) identity matrix.

The solution of (18) is given by the d leading eigenvectors of {({\mathbf{I}} + \mu {\mathbf{KLK}})^{ - 1}}{\mathbf{KHK}} . The transformed process data {\mathbf{X}} \in {\mathbb{R}^{d \times m}} are obtained after this treatment, and the transformed data of the old process and the new process are used in the subsequent modeling.
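
A compact numpy sketch of this TCA step under the paper's formulation of (18) follows; the kernel choice and the values of gamma, mu, and dim are illustrative assumptions, while {\mathbf{L}} and {\mathbf{H}} are built exactly as defined above:

```python
import numpy as np
import scipy.linalg

def tca(Xs, Xt, dim=5, mu=1.0, gamma=0.5):
    """Take the dim leading eigenvectors of (I + mu*K L K)^{-1} K H K
    as the mapping W and return the transformed samples K W."""
    X = np.vstack([Xs, Xt])
    ms, mt = len(Xs), len(Xt)
    n = ms + mt
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * d2)
    e = np.concatenate([np.ones(ms) / ms, -np.ones(mt) / mt])
    L = np.outer(e, e)                   # L_{i,j} exactly as defined above
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix H
    A = np.linalg.solve(np.eye(n) + mu * K @ L @ K, K @ H @ K)
    vals, vecs = scipy.linalg.eig(A)
    idx = np.argsort(-vals.real)[:dim]   # dim leading eigenvectors
    W = vecs[:, idx].real
    return K @ W                         # transformed data, (ms+mt) x dim
```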

2) Fault Diagnosis Model of the Old Process: Since the LS-SVM framework makes it possible to write the LOO error in closed form, we use it to express the old process model. The LOO error is an unbiased estimator of the classifier generalization error and can be used in model selection [20]. Consider the LS-SVM problem

    \begin{array}{l} \mathop {\min }\limits_{{\mathbf{W}}, b} \; \; \frac{1}{2}{\left\| {\mathbf{W}} \right\|^2} + \frac{C}{2}\sum\limits_{i = 1}^M {{\xi _i}^2} \\ {\rm s.t.}\; \; \; {y_i} = {{\mathbf{W}}^T} \cdot \phi ({{\mathbf{x}}_i}) + b + {\xi _i}\; \; \; \; \; \; \forall i \in \left\{ {1, 2, \ldots, M} \right\} \end{array}

where {\xi _i} is a slack variable; minimizing \frac{C}{2}\sum\limits_{i = 1}^M {{\xi _i}^2} allows as many samples as possible to satisfy the constraint, and the model parameters ({\mathbf{W}}, b) are found by introducing the Lagrangian function and solving the optimization problem. The optimal {\mathbf{W}} is expressed as {\mathbf{W}} = \sum\limits_{i = 1}^M {{\boldsymbol{\alpha}}_i} \phi ({{\mathbf{x}}_{i}}) , and ({\boldsymbol{\alpha }}, b) are found by solving \left[{\begin{array}{*{20}{c}} {{\mathbf{K}} + \frac{1}{C}{\mathbf{I}}} & {\mathbf{1}}\\ {{{\mathbf{1}}^T}} & 0 \end{array}} \right]\; \left[{\begin{array}{*{20}{c}} {\mathbf{A}}\\ {\mathbf{b}} \end{array}} \right] = \left[{\begin{array}{*{20}{c}} {\mathbf{Y}}\\ {\mathbf{0}} \end{array}} \right] . Then we obtain the optimal hyperplane parameters {\mathbf{W'}} of the old process.
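
The linear system above has a direct numpy translation; a minimal sketch (the function name and the value of C are our own choices) is:

```python
import numpy as np

def lssvm_fit(K, Y, C=10.0):
    """Solve [[K + I/C, 1], [1^T, 0]] [alpha; b] = [Y; 0] for the
    one-vs-all LS-SVM of the old process (one column per class)."""
    m = K.shape[0]
    M = np.zeros((m + 1, m + 1))
    M[:m, :m] = K + np.eye(m) / C
    M[:m, m] = 1.0
    M[m, :m] = 1.0
    rhs = np.vstack([Y, np.zeros((1, Y.shape[1]))])
    sol = np.linalg.solve(M, rhs)
    return sol[:m], sol[m]      # alpha (m x n), b (n,)
```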

3) Build A New Diagnosis Model in the New Process: After the two steps above, we have obtained what we need to build the new diagnosis model as in (15). After solving the optimization problem (15), the optimal hyperplane parameters {\mathbf{W}} and {\mathbf{b}} of the new process are obtained.

4) Get Fault Diagnosis Results: The diagnosis results of the new process data are given by \mathop {\mathbf{Y}}\limits^ \wedge = {{\mathbf{W}}^T}{\mathbf{X}} + {\mathbf{b}} . For each sample {{\mathbf{x}}_i} , {y_i} is the corresponding class, such as normal, fault 2, fault N, and so on. Therefore, we achieve fault detection and fault classification simultaneously.
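
In kernel form, the decision rule \mathop {\mathbf{Y}}\limits^ \wedge = {{\mathbf{W}}^T}{\mathbf{X}} + {\mathbf{b}} becomes a kernel matrix against the training samples times the dual coefficients; a sketch of this final step (names are illustrative) is:

```python
import numpy as np

def diagnose(K_test, alpha, b):
    """Decision values Yhat = K_test @ alpha + b; each test sample is
    assigned the class whose decision value is largest (close to +1).
    K_test[i, j] = k(x_test_i, x_train_j)."""
    Yhat = K_test @ alpha + b
    return Yhat.argmax(axis=1) + 1   # classes numbered 1..N (1 = normal)
```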

The experiments are divided into two parts. First, we validate the effectiveness and feasibility of the method on an accepted simulation platform (the TE process). Then we verify the performance of the algorithm on an actual complex process (an ore grinding classification process). In the simulation, all experiments are based on class expansion. Taking 4 classes as an example, each class is used in turn as the unknown class of the new process, with the other three classes used as known classes of the old process, and the algorithm is verified by cyclic cross-validation. Similarly, we also expand 3 classes in the old process to 4, 5, and 6 classes in the new process.

The TE process is a realistic simulation of the manufacturing facilities of the Tennessee Eastman chemical company. It is often used to test and verify fault detection and diagnosis methods. The process is subjected to a Gaussian noise perturbation so that the control of the process can be studied. The TE process consists of 12 operation variables and 41 measurement variables, and a total of 21 preset faults are included. The disturbance module provides a vector that can generate 20 kinds of faults (the last fault is not included). According to [25], [26], faults 3, 4, 5, 9, and 15 are usually difficult to detect. Since the TE process models an actual industrial production process, it has six modes of operation to meet market demand. The data generated by different operation modes are used to represent the different distributions of the old process and the new process. The classes of each expanded experimental process are shown in Table Ⅰ.

Table Ⅰ. CLASS DISTRIBUTION OF EACH CASE IN THE TE EXPERIMENTS

Total number of classes | Class distribution
3 classes | normal, fault 1, fault 2
4 classes | normal, fault 1, fault 2, fault 3
5 classes | normal, fault 1, fault 2, fault 3, fault 4
6 classes | normal, fault 1, fault 2, fault 3, fault 4, fault 5

This experiment compares the performance of IMTL and the traditional fault diagnosis method KPCA (kernel principal component analysis). In both the old process and the new process, fault 1 is introduced from the 100th sample point and fault 2 from the 200th sample point. Fig. 1 shows the result of fault diagnosis by KPCA in the old process, and Fig. 2 shows the diagnosis result of the new process using the old KPCA model. SPE and T^2 are the monitoring indices of KPCA, and the dotted lines are the control limits: if a monitoring index does not exceed its control limit, the process is normal; otherwise, the process is in some fault state. Obviously, the method has a good fault detection effect in the old process. However, fault 2 is poorly discerned in the new process because of the distribution difference, and fault 1 is not detected exactly either. Thus, the traditional fault diagnosis method is no longer applicable when a distribution difference exists. The simulation result of IMTL is shown in Fig. 3. In the simulation graph, the y-axis represents the fault diagnosis result matrix, and the 3 kinds of lines represent the distributions of the 3 columns of the result matrix. Because our model is based on LS-SVM, for each sample the predictive value of the correct class should be close to 1 and the wrong ones close to -1. So in the figure, the highest point indicates the current class of each sample.

    Figure  1.  KPCA detecting results of the old process.
    Figure  2.  KPCA detecting results of the new process.
    Figure  3.  The diagnosis result of IMTL.

300 new process examples and 300 old process examples are selected to verify the superiority of IMTL over the method MULTIpLE presented in [17]. When MULTIpLE is used to classify the process data, we must first extend it with a kernel function because of the nonlinearity of process data. The final simulation results are shown in Fig. 4, and the result of IMTL has been shown in Fig. 3. We can see that MULTIpLE can diagnose faults in general, but the boundary fault points cannot be distinguished very well, whereas IMTL works well without this problem. Figs. 5 and 6 show the results of merely adding a global regularization term and a TCA term to MULTIpLE, respectively. After domain adaptation by TCA, the classification graph is smoother and the accuracy is higher, so it is necessary to preprocess the data with TCA in model transfer. The global regularization term reduces graphic fluctuation and optimizes the whole algorithm.

    Figure  4.  The diagnosis result of MULTIpLE.
    Figure  5.  The result of adding global regularization.
    Figure  6.  The result of adding TCA.

The old process model and a few labeled samples of the new process are used to train the new process model. We use 100 sets of examples to test the model; Figs. 7-9 show the results of using 10, 20, and 100 new examples to train the new model, respectively. We run the verification experiments ten times, and the line chart of the accuracy rate is shown in Fig. 10. Comparing the three groups of experiments, we conclude that the more new process examples there are, the better the results. But even when the new process has only 10 or 20 labeled examples, the diagnosis results of IMTL still provide valuable guidance.

    Figure  7.  The result with 10 labeled samples.
    Figure  8.  The result with 20 labeled samples.
    Figure  9.  The result with 100 labeled samples.
    Figure  10.  The accuracy rate of 10 experiments.

We perform a series of experiments on multi-fault classification and obtain the simulation results when there are new faults in the new process. There are 3 classes in the old process; Fig. 11 shows the diagnosis effect on the original faults in the new process. Figs. 12-14 show the fault classes in the new process extending from 3 to 4, 5, and 6, respectively. For the TE process, with 3 known classes in the old process, the fault classification effect is not very desirable when the number of classes in the new process increases to 6. In this case, we use the method described in (15) to adjust the model, and Fig. 15 shows the adjusted effect. In addition, for faults 3 and 4 in the TE process, which are difficult to distinguish with many other methods, our algorithm can separate them correctly.

    Figure  11.  The diagnosis result of 3 classes in the new process.
    Figure  12.  The diagnosis result of 4 classes in the new process.
    Figure  13.  The diagnosis result of 5 classes in the new process.
    Figure  14.  The diagnosis result of 6 classes in the new process.
    Figure  15.  The diagnosis effect after the model adjustment.

Hydro-cyclone grading is an important part of the ore grinding classification process. Because of the nonlinearity and the serious coupling among the variables, the hydro-cyclone grading process often has a high failure rate and must be monitored. A typical process is shown in Fig. 16. During the process, the crushed raw ore is sent to the ball mill by adjusting the feeding device. At the same time, supplementary water and steel balls are added in a certain proportion. After the ball mill rotates, the ore pulp is discharged and enters the slime dam together with the supplementary water. The mixed ore pulp is conveyed to the hydro-cyclone by the slurry pump at a certain hydro-cyclone pressure. The ore pulp passes into the hydro-cyclone and forms a rotary current inside it. The tiny particles are discharged through the overflow of the hydro-cyclone and become the final products. The coarse particles (grit) are discharged from the ore discharge outlets of the hydro-cyclone and returned to the ball mill for cyclic grinding.

    Figure  16.  Hydro-Cyclone classification technological process.

In the ore grinding classification process, the detectable variables mainly include the ore feeding flow (FIT2003), the ore feed pressure of the hydro-cyclones (PIT2001), the pump sump level (LIT2004 and LIT2005), the ore feed concentration of the hydro-cyclones (DIT2001), the overflow granularity (AT2001), the overflow concentration (DIT2002), the flow rate of the overflow (FIT2005), and the amount of returning sand (RS). The controllable variables mainly include the amount of supplementary water (FIT2004) in the slime dam and the rotational speed of the turning pump (ZJB2DL). The 11 main variables above are selected as monitored data in the experiments. Because there is serious coupling and nonlinearity among the liquid level control of the slime dam, the control of the hydro-cyclone pressure, and the control of the cyclone concentration, many kinds of faults occur frequently during the process. The typical faults and their causes are shown in Table Ⅱ.

Table Ⅱ. TYPICAL FAULTS AND CAUSES

Monitoring variable | Observed phenomenon | Possible reasons
Pressure of hydro-cyclone | Data abnormal | a) Hydro-cyclone blocking; b) Liquid level of slime dam abnormal; c) Concentration abnormal; d) Slurry pump abnormal
Feed concentration of hydro-cyclone | Data abnormal | a) Supplementary water abnormal; b) Liquid level of slime dam abnormal
Liquid level of slime dam | Data abnormal | a) Pressure of hydro-cyclone abnormal; b) Supplementary water abnormal; c) Production load abnormal
Supplementary water of slime dam | Data abnormal | a) Fault of water supply valve; b) Fault of pipeline
Overflow granularity of hydro-cyclone | Data abnormal | a) Pressure of hydro-cyclone abnormal; b) Feed concentration of hydro-cyclone abnormal; c) Change of ore quality; d) Fault of granularity analyzer

In our experiments, the normal data are collected in the regular producing process and the fault data are collected in the debugging state. The experiments use the data of 5 kinds of fault states and 1 normal state. Fault 1 is caused by blockage of the outlet of the hydro-cyclone; fault 2 by an abnormality of the supplementary water; fault 3 by an abnormality of the production load; fault 4 by a fault of the feed water valve; and fault 5 by a granularity analyzer fault. The labels are defined according to experienced workers. The empirical values of each variable in the six typical states are shown in Table Ⅲ.

Table Ⅲ. THE EMPIRICAL VALUES OF VARIABLES IN EACH MODE

Variable | Normal | Fault 1 | Fault 2 | Fault 3 | Fault 4 | Fault 5
ZJB2DL | 235-308 | 0 | 218-253 | 0 | 240-308 | 218-321
FIT2003 | 587-694 | 0.029-523 | 259-336 | 0.058-326 | 671.3-676 | 327-521
DIT2001 | 56-60 | 0.028-52.8 | 0.028-52.8 | 57.4-57.9 | 57.47-58.71 | 57.80-60.01
PIT2001 | 0.049-0.070 | 0.002-0.007 | 0.003-0.078 | 0-0.02 | 0-0.04 | 0.03-0.06
FIT2005 | 301-438 | 12.73-451 | 227-338 | 12.73-338 | 0.05-0.70 | 87.0-247.7
DIT2002 | 30.1-42.39 | 9.08-57.84 | 9.14-24.65 | 99.83 | 308-335 | 44.0-58.07
LIT2004 | 4.16-4.20 | 4.17-4.2 | 4.16-4.18 | 1.25-4.17 | 1.15-4.16 | 1.66-2.34
LIT2005 | 1.10-1.34 | 1.10-1.34 | 1.18-4.18 | 2.72-2.77 | 2.73-2.77 | 1.79-1.85
FIT2004 | 93.66-95.25 | 93.66-95.25 | 93.66-94.10 | 99.83 | 99.83 | 98.0-99.57
AT2001 | 89.76-90.10 | 89.76-90.10 | 89.76-90.2 | 0.29 | 98.0-99.6 | 0.29
RS | 280-300 | 430-480 | 280-300 | 280-300 | 280-300 | 280-300

First, we use the MULTIpLE and IMTL methods to diagnose the ore grinding classification process. The effects are shown in Figs. 17 and 18. Even though the MULTIpLE method is extended to a nonlinear method, its effect is much worse than that of IMTL, which fully demonstrates the necessity of our improvements.

    Figure  17.  The diagnosis result of MULTIpLE.
    Figure  18.  The diagnosis result of IMTL.

Figs. 19 and 20 show the results of adding the following two terms to the MULTIpLE objective function: term 1 is a global regularization term and term 2 is a TCA term. After adding the two terms, the figures have smaller fluctuations and smoother waveforms compared with Fig. 17. Fig. 21 shows the contribution of the two terms to the error rate. From the results, there is a marked drop in the error rate when adding only term 1 or only term 2, and the effect of the two terms is more obvious in the TE process. The error rate of our method IMTL is much lower than that of MULTIpLE with the two terms added.

    Figure  19.  The result of adding global regularization.
    Figure  20.  The result of adding TCA.
    Figure  21.  Error rate comparison.

In the simulation results, Figs. 22-24 show the results of training the new model using the old model and a small number of samples of the new process, with 10, 20, and 100 labeled samples, respectively, and Fig. 25 is a line chart of the corresponding accuracy. We can see that the accuracy is high even though labeled samples are rare in the new process. So the algorithm can be applied to a new process without much labeled data, which is a further advantage of model transfer learning; it greatly shortens the modeling time and cost.

    Figure  22.  The result with 10 labeled samples.
    Figure  23.  The result with 20 labeled samples.
    Figure  24.  The result with 100 labeled samples.
    Figure  25.  The accuracy rate of 10 experiments.

Similar to the TE process simulation, we also use the ore grinding classification process data for a series of fault classification experiments. When the old process has 3 known classes, the simulation results of 3, 4, 5, and 6 classes in the new process are shown in Figs. 26-29, respectively.

    Figure  26.  The diagnosis result of 3 classes in new process.
    Figure  27.  The diagnosis result of 4 classes in new process.
    Figure  28.  The diagnosis result of 5 classes in new process.
    Figure  29.  The diagnosis result of 6 classes in new process.

The classification effect for the first 5 classes is good, and as new classes are added the overall performance remains almost unchanged. After adding the sixth class, the overall performance decreases, which indicates that the data of fault 5 is similar to the data of other faults in some variables. Some classes are not affected, which indicates that these classes are less coupled with fault 5. In these simulation results, the more new classes the new process owns, or the more complex the couplings among the faults, the worse the overall fault separation. At this point, we need to modify the model to decrease the coupling. The adjusted effect is shown in Fig. 30: each fault can be diagnosed accurately after adjustment.

    Figure  30.  The diagnosis effect after the model adjustment.

A promising fault diagnosis method, IMTL, is presented in this paper. For process data with different distributions, an effective domain adaptation strategy is used, which creates good conditions for model reconstruction. Aiming at the problem that new faults may occur in the new process, and because the feature space is shared, new faults can be regarded as linear combinations of the original faults; thus we obtain a representation of each new fault and realize the diagnosis of all faults in the new process. Moreover, the proposed method realizes fault detection and fault classification synchronously. The results based on complex process data demonstrate the effectiveness and practicability of the method. In future research, we expect to explore setting and optimization strategies for the weights of the regularization terms in the model.

References

[1] S. W. Choi, J. H. Park, and I.-B. Lee, "Process monitoring using a Gaussian mixture model via principal component analysis and discriminant analysis," Computers & Chemical Engineering, vol. 28, no. 8, pp. 1377-1387, 2004.
[2] V. Venkatasubramanian, R. Rengaswamy, K. Yin, and S. N. Kavuri, "A review of process fault detection and diagnosis: part Ⅰ: quantitative model-based methods," Computers & Chemical Engineering, vol. 27, no. 3, pp. 293-311, 2003.
[3] C. Tong and X. Yan, "Statistical process monitoring based on a multimanifold projection algorithm," Chemometrics & Intelligent Laboratory Systems, vol. 130, no. 2, pp. 20-28, 2014.
[4] X. Wang, H. Feng, and Y. Fan, "Fault detection and classification for complex processes using semi-supervised learning algorithm," Chemometrics & Intelligent Laboratory Systems, vol. 149, pp. 24-32, 2015.
[5] H. V. Khang, R. Puche-Panadero, J. S. L. Senanayaka, and K. G. Robbersmyr, "Bearing fault detection of gear-box drive train using active filters," in Proc. Int. Conf. on Electrical Machines & Systems, 2016.
[6] D. Aguado and C. Rosen, "Multivariate statistical monitoring of continuous wastewater treatment plants," Engineering Applications of Artificial Intelligence, vol. 21, no. 7, pp. 1080-1091, 2008. doi: 10.1016/j.engappai.2007.08.004
[7] Y. Hu, H. Ma, and H. Shi, "Local model based KPLS with application to fault detection of batch process," in Proc. Control and Decision Conf., 2013, pp. 1097-1103.
[8] K. H. Hui, H. L. Meng, M. S. Leong, and S. M. Al-Obaidi, "Dempster-Shafer evidence theory for multi-bearing faults diagnosis," Engineering Applications of Artificial Intelligence, vol. 57, pp. 160-170, 2017. doi: 10.1016/j.engappai.2016.10.017
[9] T. Sorsa and H. N. Koivo, "Neural networks in process fault diagnosis," IEEE Transactions on Systems, Man & Cybernetics, vol. 21, no. 4, pp. 815-825, 1991.
[10] H. Wang, S. Fan, J. Song, Y. Gao, and X. Chen, "Reinforcement learning transfer based on subgoal discovery and subtask similarity," IEEE/CAA Journal of Automatica Sinica, vol. 1, no. 3, pp. 257-266, 2014.
[11] S. T. E., "Transfer learning progress and potential," AI Magazine, vol. 32, no. 1, pp. 84-86, 2011.
[12] S. J. Pan and Q. Yang, "A survey on transfer learning," IEEE Transactions on Knowledge & Data Engineering, vol. 22, no. 10, pp. 1345-1359, 2010.
[13] X. Wang, J. Ren, and S. Liu, "Distribution adaptation and manifold alignment for complex processes fault diagnosis," Knowledge-Based Systems, 2018.
[14] S. Shao, S. McAleer, R. Yan, and P. Baldi, "Highly-accurate machine fault diagnosis using deep transfer learning," IEEE Transactions on Industrial Informatics, 2018.
[15] L. Wen, L. Gao, and X. Li, "A new deep transfer learning based on sparse auto-encoder for fault diagnosis," IEEE Transactions on Systems, Man & Cybernetics: Systems, pp. 1-9, 2017.
[16] D. Hu, A. Sarosh, and Y. F. Dong, "A novel KFCM based fault diagnosis method for unknown faults in satellite reaction wheels," ISA Transactions, vol. 51, no. 2, pp. 309-316, 2012. doi: 10.1016/j.isatra.2011.10.005
[17] I. Kuzborskij, F. Orabona, and B. Caputo, "From N to N+1: multiclass transfer incremental learning," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2013, pp. 3358-3365.
[18] J. Lu, K. Yao, and F. Gao, "Process similarity and developing new process models through migration," AIChE Journal, vol. 55, no. 10, pp. 2318-2328, 2010.
[19] N. Segev, M. Harel, S. Mannor, K. Crammer, and R. El-Yaniv, "Learn on source, refine on target: a model transfer learning framework with random forests," IEEE Transactions on Pattern Analysis & Machine Intelligence, 2016.
[20] T. Tommasi, F. Orabona, and B. Caputo, "Safety in numbers: learning categories from few examples with multi model knowledge transfer," in Proc. IEEE Conf. on Computer Vision and Pattern Recognition, 2010, pp. 3081-3088.
[21] C. Chen, F. Shen, and R. Yan, "Enhanced least squares support vector machine-based transfer learning strategy for bearing fault diagnosis," Chinese Journal of Scientific Instrument, vol. 38, no. 1, pp. 33-40, 2017.
[22] G. C. Cawley, "Leave-one-out cross-validation based model selection criteria for weighted LS-SVMs," in Proc. IEEE Int. Joint Conf. on Neural Networks, 2006, pp. 1661-1668.
[23] J. Luo, T. Tommasi, and B. Caputo, "Multiclass transfer learning from unconstrained priors," in Proc. Int. Conf. on Computer Vision, 2011, pp. 1863-1870.
[24] S. J. Pan, I. W. Tsang, J. T. Kwok, and Q. Yang, "Domain adaptation via transfer component analysis," in Proc. Int. Joint Conf. on Artificial Intelligence, 2009, pp. 1187-1192.
[25] Z. Wu, "Research on fault detection method based on Tennessee-Eastman process," Ph.D. dissertation, East China Jiaotong University, 2016.
[26] N. Lv, "Fault diagnosis of TE process based on two order mutual information feature selection," Journal of Chemical Industry, vol. 60, no. 9, pp. 2252-2258, 2009.
