IEEE/CAA Journal of Automatica Sinica

Multiagent Reinforcement Learning: Rollout and Policy Iteration

Dimitri Bertsekas

2021, 8(2): 249-272. doi: 10.1109/JAS.2021.1003814

Abstract:
We discuss the solution of complex multistage decision problems using methods that are based on the idea of policy iteration (PI), i.e., start from some base policy and generate an improved policy. Rollout is the simplest method of this type, where just one improved policy is generated. We can view PI as repeated application of rollout, where the rollout policy at each iteration serves as the base policy for the next iteration. In contrast with PI, rollout has a robustness property: it can be applied on-line and is suitable for on-line replanning. Moreover, rollout can use as base policy one of the policies produced by PI, thereby improving on that policy. This is the type of scheme underlying the prominently successful AlphaZero chess program. In this paper we focus on rollout and PI-like methods for problems where the control consists of multiple components each selected (conceptually) by a separate agent. This is the class of multiagent problems where the agents have a shared objective function, and shared and perfect state information. Based on a problem reformulation that trades off control space complexity with state space complexity, we develop an approach whereby, at every stage, the agents sequentially (one-at-a-time) execute a local rollout algorithm that uses a base policy, together with some coordinating information from the other agents. The amount of total computation required at every stage grows linearly with the number of agents. By contrast, in the standard rollout algorithm, the amount of total computation grows exponentially with the number of agents. Despite the dramatic reduction in required computation, we show that our multiagent rollout algorithm has the fundamental cost improvement property of standard rollout: it guarantees an improved performance relative to the base policy.
We also discuss autonomous multiagent rollout schemes that allow the agents to make decisions autonomously through the use of precomputed signaling information, which is sufficient to maintain the cost improvement property, without any on-line coordination of control selection between the agents. For discounted and other infinite horizon problems, we also consider exact and approximate PI algorithms involving a new type of one-agent-at-a-time policy improvement operation. For one of our PI algorithms, we prove convergence to an agent-by-agent optimal policy, thus establishing a connection with the theory of teams. For another PI algorithm, which is executed over a more complex state space, we prove convergence to an optimal policy. Approximate forms of these algorithms are also given, based on the use of policy and value neural networks. These PI algorithms, in both their exact and their approximate form, are strictly off-line methods, but they can be used to provide a base policy for use in an on-line multiagent rollout scheme.
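The contrast between exponential and linear computation described in this abstract can be illustrated with a minimal sketch. It is not the paper's algorithm in full: `q_factor` is a hypothetical stand-in for "stage cost plus the cost of following the base policy afterwards", `toy_q` is an invented example cost, and the base controls are assumed to belong to the common choice set so that each agent can always fall back on its base choice.

```python
from itertools import product


def standard_rollout_step(q_factor, choices, num_agents):
    """Minimize over the joint control space: len(choices) ** num_agents evaluations."""
    evaluations = 0
    best_u, best_q = None, float("inf")
    for u in product(choices, repeat=num_agents):
        evaluations += 1
        q = q_factor(u)
        if q < best_q:
            best_u, best_q = u, q
    return best_u, best_q, evaluations


def multiagent_rollout_step(q_factor, choices, base_controls):
    """One-agent-at-a-time minimization: len(choices) * num_agents evaluations.

    Agent i optimizes its own component with the components of agents
    0..i-1 fixed at their already chosen values and the components of
    agents i+1.. fixed at the base policy's values.
    """
    u = list(base_controls)
    evaluations = 0
    q_current = None
    for agent in range(len(u)):
        best_c, best_q = u[agent], float("inf")
        for c in choices:
            trial = tuple(u[:agent] + [c] + u[agent + 1:])
            evaluations += 1
            q = q_factor(trial)
            if q < best_q:
                best_c, best_q = c, q
        u[agent] = best_c
        q_current = best_q
    return tuple(u), q_current, evaluations


# Hypothetical toy Q-factor: per-agent tracking cost plus a coupling term.
TARGETS = (2, 1, 2, 0)


def toy_q(u):
    tracking = sum((x - t) ** 2 for x, t in zip(u, TARGETS))
    coupling = sum(u[i] * u[i + 1] for i in range(len(u) - 1))
    return tracking + coupling
```

With 4 agents and 3 choices each, the joint minimization evaluates 81 candidate controls while the sequential version evaluates 12. Because every agent's candidate set contains its base control, the sequential result can never cost more than the base controls in this sketch, which mirrors the cost improvement property stated above.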

A Survey on Smart Agriculture: Development Modes, Technologies, and Security and Privacy Challenges

Xing Yang, Lei Shu, Jianing Chen, Mohamed Amine Ferrag, Jun Wu, Edmond Nurellari, Kai Huang

2021, 8(2): 273-302. doi: 10.1109/JAS.2020.1003536

Abstract:
With the deep integration of modern information technology and traditional agriculture, the era of agriculture 4.0, which takes the form of smart agriculture, has come. Smart agriculture provides solutions for agricultural intelligence and automation. However, the information security issues that accompany the adoption of modern information technology in agriculture cannot be ignored. In this paper, three typical development modes of smart agriculture (precision agriculture, facility agriculture, and order agriculture) are presented. Then, 7 key technologies and 11 key applications are derived from the above modes. Based on the above technologies and applications, 6 security and privacy countermeasures (authentication and access control, privacy-preserving, blockchain-based solutions for data integrity, cryptography and key management, physical countermeasures, and intrusion detection systems) are summarized and discussed. Moreover, the security challenges of smart agriculture are analyzed and organized into two aspects: 1) agricultural production, and 2) information technology. Most current research projects have not considered agricultural equipment as a potential security threat. Therefore, we conducted additional experiments based on the solar insecticidal lamps Internet of Things, and the results indicate that agricultural equipment has an impact on agricultural security. Finally, more technologies (5G communication, fog computing, Internet of Everything, renewable energy management system, software defined network, virtual reality, augmented reality, and cyber security datasets for smart agriculture) are described as the future research directions of smart agriculture.

A Survey of Evolutionary Algorithms for Multi-Objective Optimization Problems With Irregular Pareto Fronts

Yicun Hua, Qiqi Liu, Kuangrong Hao, Yaochu Jin

2021, 8(2): 303-318. doi: 10.1109/JAS.2021.1003817

Abstract:
Evolutionary algorithms have been shown to be very successful in solving multi-objective optimization problems (MOPs). However, their performance often deteriorates when solving MOPs with irregular Pareto fronts. To remedy this issue, a large body of research has been performed in recent years and many new algorithms have been proposed. This paper provides a comprehensive survey of the research on MOPs with irregular Pareto fronts. We start with a brief introduction to the basic concepts, followed by a summary of the benchmark test problems with irregular Pareto fronts, an analysis of the causes of the irregularity, and real-world optimization problems with irregular Pareto fronts. Then, a taxonomy of the existing methodologies for handling irregular problems is given and representative algorithms are reviewed with a discussion of their strengths and weaknesses. Finally, open challenges are pointed out and a few promising future directions are suggested.

Physical Safety and Cyber Security Analysis of Multi-Agent Systems: A Survey of Recent Advances

Dan Zhang, Gang Feng, Yang Shi, Dipti Srinivasan

2021, 8(2): 319-333. doi: 10.1109/JAS.2021.1003820

Abstract:
Multi-agent systems (MASs) are typically composed of multiple smart entities with independent sensing, communication, computing, and decision-making capabilities. Nowadays, MASs have a wide range of applications in smart grids, smart manufacturing, sensor networks, and intelligent transportation systems. Control of MASs is often coordinated through information interaction among agents, which is one of the most important factors affecting coordination and cooperation performance. However, unexpected physical faults and cyber attacks on a single agent may spread to other agents via information interaction very quickly, and thus could lead to severe degradation of the whole system performance and even destruction of MASs. This paper is concerned with the safety/security analysis and synthesis of MASs arising from physical faults and cyber attacks, and our goal is to present a comprehensive survey of recent results on fault estimation, detection, diagnosis and fault-tolerant control of MASs, and cyber attack detection and secure control of MASs subject to two typical cyber attacks. Finally, the paper concludes with some potential future research topics on the security issues of MASs.

Digital Twin for Human-Robot Interactive Welding and Welder Behavior Analysis

Qiyue Wang, Wenhua Jiao, Peng Wang, YuMing Zhang

2021, 8(2): 334-343. doi: 10.1109/JAS.2020.1003518

Abstract:
This paper presents an innovative investigation on prototyping a digital twin (DT) as the platform for human-robot interactive welding and welder behavior analysis. This human-robot interaction (HRI) working style helps to enhance human users’ operational productivity and comfort, while data-driven welder behavior analysis benefits novice welder training. This HRI system includes three modules: 1) a human user who demonstrates the welding operations offsite with her/his operations recorded by the motion-tracked handles; 2) a robot that executes the demonstrated welding operations to complete the physical welding tasks onsite; 3) a DT system that is developed based on virtual reality (VR) as a digital replica of the physical human-robot interactive welding environment. The DT system bridges a human user and robot through a bi-directional information flow: a) transmitting demonstrated welding operations in VR to the robot in the physical environment; b) displaying the physical welding scenes to human users in VR. Compared to existing DT systems reported in the literature, the developed one provides better capability in engaging human users in interacting with welding scenes, through an augmented VR. To verify the effectiveness, six welders, three skilled with certain manual welding training and three unskilled without any training, tested the system by completing the same welding job; the three skilled welders produced satisfactory welded workpieces, while the three unskilled welders did not. A data-driven approach combining fast Fourier transform (FFT), principal component analysis (PCA), and support vector machine (SVM) is developed to analyze their behaviors. Given an operation sequence, i.e., motion speed sequence of the welding torch, frequency features are first extracted by FFT and then reduced in dimension through PCA, and are finally fed into an SVM for classification. The trained model demonstrates a 94.44% classification accuracy on the testing dataset. The successful pattern recognition in skilled welder operations should help accelerate novice welder training.

Visual Object Tracking and Servoing Control of a Nano-Scale Quadrotor: System, Algorithms, and Experiments

Yuzhen Liu, Ziyang Meng, Yao Zou, Ming Cao

2021, 8(2): 344-360. doi: 10.1109/JAS.2020.1003530

Abstract:
There are two main trends in the development of unmanned aerial vehicle (UAV) technologies: miniaturization and intellectualization, in which realizing object tracking capabilities for a nano-scale UAV is one of the most challenging problems. In this paper, we present a visual object tracking and servoing control system utilizing a tailor-made 38 g nano-scale quadrotor. A lightweight visual module is integrated to enable object tracking capabilities, and a micro positioning deck is mounted to provide accurate pose estimation. In order to be robust against object appearance variations, a novel object tracking algorithm, denoted by RMCTer, is proposed, which integrates a powerful short-term tracking module and an efficient long-term processing module. In particular, the long-term processing module can provide additional object information and modify the short-term tracking model in a timely manner. Furthermore, a position-based visual servoing control method is proposed for the quadrotor, where an adaptive tracking controller is designed by leveraging backstepping and adaptive techniques. Stable and accurate object tracking is achieved even under disturbances. Experimental results are presented to demonstrate the high accuracy and stability of the whole tracking system.

Dependent Randomization in Parallel Binary Decision Fusion

Weiqiang Dong, Moshe Kam

2021, 8(2): 361-376. doi: 10.1109/JAS.2021.1003823

Abstract:
We consider a parallel decentralized detection system employing a bank of local detectors (LDs) to access a commonly-observed phenomenon. The system makes a binary decision about the phenomenon, accepting one of two hypotheses ($H_0$ (“absent”) or $H_1$ (“present”)). The $k$th LD uses a local decision rule to compress its local observations $y_k$ into a binary local decision $u_k$; $u_k=0$ if the $k$th LD accepts $H_0$ and $u_k=1$ if it accepts $H_1$. The $k$th LD sends its decision $u_k$ over a noiseless dedicated channel to a Data Fusion Center (DFC). The DFC combines the local decisions it receives from $n$ LDs ($u_1, u_2,\ldots, u_n$) into a single binary global decision $u_0$ ($u_0=0$ for accepting $H_0$ or $u_0=1$ for accepting $H_1$). If each LD uses a single deterministic local decision rule (calculating $u_k$ from the local observations $y_k$) and the DFC uses a single deterministic global decision rule (calculating $u_0$ from the $n$ local decisions), the team receiver operating characteristic (ROC) curve is in general non-concave. The system’s performance under a Neyman-Pearson criterion may then be suboptimal in the sense that a mixed strategy may yield a higher probability of detection when the probability of false alarm is constrained not to exceed a certain value, $\alpha>0$. Specifically, a “dependent randomization” detection scheme can be applied in certain circumstances to improve the system’s performance by making the ROC curve concave. This scheme requires a coordinated and synchronized action between the DFC and the LDs. In this study, we specify when dependent randomization is needed, and discuss the proper response of the detection system if synchronization between the LDs and the DFC is temporarily lost.
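The claim that a mixed strategy can beat every single deterministic rule under a false-alarm constraint can be checked numerically. The sketch below is a generic illustration of randomizing between two team operating points, not the paper's scheme: the operating points `(P_f, P_d)` are invented numbers, and the whole team (LDs and DFC together) is assumed to switch between two deterministic configurations with a shared random draw.

```python
def best_deterministic(points, alpha):
    """Highest detection probability among operating points with P_f <= alpha."""
    feasible = [pd for pf, pd in points if pf <= alpha]
    return max(feasible) if feasible else 0.0


def best_mixture(points, alpha):
    """Best detection probability when the team may randomize between two points.

    Using the higher point with probability w and the lower point otherwise
    gives P_f = (1 - w) * pf_lo + w * pf_hi, and likewise for P_d; w is set
    so that the mixed false-alarm probability equals alpha.
    """
    best = best_deterministic(points, alpha)
    for pf_lo, pd_lo in points:
        for pf_hi, pd_hi in points:
            if pf_lo <= alpha < pf_hi:
                w = (alpha - pf_lo) / (pf_hi - pf_lo)
                best = max(best, (1 - w) * pd_lo + w * pd_hi)
    return best


# Hypothetical operating points (P_f, P_d) of deterministic team strategies.
EXAMPLE_POINTS = [(0.0, 0.0), (0.1, 0.5), (0.3, 0.9), (1.0, 1.0)]
```

For these example points and a constraint of 0.2 on the false-alarm probability, no deterministic point does better than a detection probability of 0.5, whereas mixing the points at 0.1 and 0.3 with equal weight reaches about 0.7. The mixture traces the chord between the two points, which is how randomization fills in the non-concave part of the ROC curve.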

Set-Membership Filtering Subject to Impulsive Measurement Outliers: A Recursive Algorithm

Lei Zou, Zidong Wang, Hang Geng, Xiaohui Liu

2021, 8(2): 377-388. doi: 10.1109/JAS.2021.1003826

Abstract:
This paper is concerned with the set-membership filtering problem for a class of linear time-varying systems with norm-bounded noises and impulsive measurement outliers. A new representation is proposed to model the measurement outlier by an impulsive signal whose minimum interval length (i.e., the minimum duration between two adjacent impulsive signals) and minimum norm (i.e., the minimum of the norms of all impulsive signals) are larger than certain thresholds that are adjustable according to engineering practice. In order to guarantee satisfactory filtering performance, a so-called parameter-dependent set-membership filter is put forward that is capable of generating a time-varying ellipsoidal region containing the true system state. First, a novel outlier detection strategy is developed, based on a dedicatedly constructed input-output model, to examine whether the received measurement is corrupted by an outlier. Then, through the outcome of the outlier detection, the gain matrix of the desired filter and the corresponding ellipsoidal region are calculated by solving two recursive difference equations. Furthermore, the ultimate boundedness issue on the time-varying ellipsoidal region is thoroughly investigated. Finally, a simulation example is provided to demonstrate the effectiveness of our proposed parameter-dependent set-membership filtering strategy.

Automated Silicon-Substrate Ultra-Microtome for Automating the Collection of Brain Sections in Array Tomography

Long Cheng, Weizhou Liu, Chao Zhou, Yongxiang Zou, Zeng-Guang Hou

2021, 8(2): 389-401. doi: 10.1109/JAS.2021.1003829

Abstract:
Understanding the structure and working principle of brain neural networks requires three-dimensional reconstruction of brain tissue samples using the array tomography method. In order to improve the reconstruction performance, the sequence of brain sections should be collected with silicon wafers for subsequent electron microscopic imaging. However, the current collection of brain sections on silicon substrates is mainly manual, which calls for automation techniques to increase collection efficiency. This paper presents the design of an automatic collection device for brain sections. First, a novel mechanism based on circular silicon substrates is proposed for collection of brain sections; second, an automatic collection system based on microscopic object detection and feedback control strategy is proposed. Experimental results verify the function of the proposed collection device. Three objects (brain section, left baffle, right baffle) can be detected from microscopic images by the proposed detection method. Collection efficiency can be further improved with position feedback of brain sections. It has been experimentally verified that the proposed device can fulfill the task of automatic collection of brain sections. With the help of the proposed automatic collection device, human operators can be partially liberated from the tedious manual collection process and collection efficiency can be improved.

Efficient and High-quality Recommendations via Momentum-incorporated Parallel Stochastic Gradient Descent-Based Learning

Xin Luo, Wen Qin, Ani Dong, Khaled Sedraoui, MengChu Zhou

2021, 8(2): 402-411. doi: 10.1109/JAS.2020.1003396

Abstract:
A recommender system (RS) relying on latent factor analysis usually adopts stochastic gradient descent (SGD) as its learning algorithm. However, owing to its serial mechanism, an SGD algorithm suffers from low efficiency and scalability when handling large-scale industrial problems. To address this issue, this study proposes a momentum-incorporated parallel stochastic gradient descent (MPSGD) algorithm, whose main idea is two-fold: a) implementing parallelization via a novel data-splitting strategy, and b) accelerating convergence by integrating momentum effects into its training process. With it, an MPSGD-based latent factor (MLF) model is obtained, which is capable of performing efficient and high-quality recommendations. Experimental results on four high-dimensional and sparse matrices generated by industrial RS indicate that, owing to the MPSGD algorithm, an MLF model outperforms the existing state-of-the-art ones in both computational efficiency and scalability.
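To make the momentum idea concrete, here is a minimal serial sketch of SGD with classical momentum on a latent factor model. It does not reproduce MPSGD: the paper's data-splitting and parallelization are omitted, and the function names, hyperparameters, and sample ratings are all assumptions chosen for illustration.

```python
import random


def squared_error(ratings, P, Q):
    """Sum of squared errors over the known (user, item, rating) entries."""
    return sum(
        (r - sum(pk * qk for pk, qk in zip(P[u], Q[i]))) ** 2
        for u, i, r in ratings
    )


def train_momentum_sgd(ratings, num_users, num_items, rank=2, lr=0.01,
                       momentum=0.5, reg=0.01, epochs=200, seed=0):
    """Serial SGD with momentum for the factorization R ~ P Q^T."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(num_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(num_items)]
    # One velocity entry per latent factor, initially zero.
    VP = [[0.0] * rank for _ in range(num_users)]
    VQ = [[0.0] * rank for _ in range(num_items)]
    history = [squared_error(ratings, P, Q)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - sum(pk * qk for pk, qk in zip(P[u], Q[i]))
            for k in range(rank):
                p, q = P[u][k], Q[i][k]
                # Velocity = decayed previous velocity + current descent step.
                VP[u][k] = momentum * VP[u][k] + lr * (err * q - reg * p)
                VQ[i][k] = momentum * VQ[i][k] + lr * (err * p - reg * q)
                P[u][k] = p + VP[u][k]
                Q[i][k] = q + VQ[i][k]
        history.append(squared_error(ratings, P, Q))
    return P, Q, history


# Hypothetical sparse rating matrix: (user, item, rating) triples.
SAMPLE_RATINGS = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 2), (2, 2, 5)]
```

Calling `train_momentum_sgd(SAMPLE_RATINGS, 3, 3)` returns the factor matrices and the training error before training and after each epoch. Setting `momentum=0.0` recovers plain SGD, which makes it easy to compare convergence speed on the same data.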

A Risk-Averse Remaining Useful Life Estimation for Predictive Maintenance

Chuang Chen, Ningyun Lu, Bin Jiang, Cunsong Wang

2021, 8(2): 412-422. doi: 10.1109/JAS.2021.1003835

Abstract:
Remaining useful life (RUL) prediction is an advanced technique for system maintenance scheduling. Most existing RUL prediction methods are only interested in the precision of RUL estimation; the adverse impact of over-estimated RUL on maintenance scheduling is not of concern. In this work, an RUL estimation method with risk-averse adaptation is developed which can reduce the over-estimation rate while maintaining a reasonable under-estimation level. The proposed method includes a module of degradation feature selection to obtain crucial features which reflect system degradation trends. Then, the latent structure between the degradation features and the RUL labels is modeled by a support vector regression (SVR) model and a long short-term memory (LSTM) network, respectively. To enhance the prediction robustness and increase its marginal utility, the SVR model and the LSTM model are integrated to generate a hybrid model via three connection parameters. By designing a cost function with a penalty mechanism, the three parameters are determined using a modified grey wolf optimization algorithm. In addition, a cost metric is proposed to measure the benefit of such a risk-averse predictive maintenance method. Verification is done using an aero-engine data set from NASA. The results show the feasibility and effectiveness of the proposed RUL estimation method and the predictive maintenance strategy.
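The abstract does not give the form of the penalized cost function, so the following is only an illustrative sketch of what a risk-averse cost could look like: a squared error that weights over-estimation more heavily than under-estimation. The function name and both penalty weights are assumptions, not values from the paper.

```python
def risk_averse_cost(predicted, actual, over_penalty=3.0, under_penalty=1.0):
    """Mean squared error with a heavier weight on over-estimated RUL.

    Over-estimation (predicted > actual) suggests the equipment will last
    longer than it really does, so it is penalized more strongly here.
    """
    total = 0.0
    for p, a in zip(predicted, actual):
        error = p - a
        weight = over_penalty if error > 0 else under_penalty
        total += weight * error ** 2
    return total / len(actual)
```

Under these assumed weights, over-estimating a true RUL of 100 cycles by 10 costs three times as much as under-estimating it by 10, so an optimizer tuning model parameters against this cost is pushed toward conservative predictions.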

Consensus Control of Leader-Following Multi-Agent Systems in Directed Topology With Heterogeneous Disturbances

Qinglai Wei, Xin Wang, Xiangnan Zhong, Naiqi Wu

2021, 8(2): 423-431. doi: 10.1109/JAS.2021.1003838

Abstract:
This paper investigates the consensus problem for linear multi-agent systems with heterogeneous disturbances generated by Brownian motion. Its main contribution is that a control scheme is designed to achieve dynamic consensus for multi-agent systems in a directed topology subject to stochastic noise. Traditionally, the coupling weights, which depend on the communication structure, are static. A new distributed controller is designed based on Riccati inequalities, while updating the coupling weights associated with the gain matrix by state errors between adjacent agents. By introducing time-varying coupling weights into this novel control law, the state errors between leader and followers asymptotically converge to the minimum value utilizing the local interaction. Through the Lyapunov direct method and the Itô formula, the stability of the closed-loop system with the proposed control law is analyzed. Two simulation results conducted by the new and traditional schemes are presented to demonstrate the effectiveness and advantage of the developed control method.

Coherent H^∞ Control for Linear Quantum Systems With Uncertainties in the Interaction Hamiltonian

Chengdi Xiang, Shan Ma, Sen Kuang, Daoyi Dong

2021, 8(2): 432-440. doi: 10.1109/JAS.2020.1003429

Abstract:
This work conducts robust H^∞ analysis for a class of quantum systems subject to perturbations in the interaction Hamiltonian. A necessary and sufficient condition for the robustly strict bounded real property of this type of uncertain quantum system is proposed. This paper focuses on the study of coherent robust H^∞ controller design for quantum systems with uncertainties in the interaction Hamiltonian. The desired controller is connected with the uncertain quantum system through direct and indirect couplings. A necessary and sufficient condition is provided to build a connection between the robust H^∞ control problem and the scaled H^∞ control problem. A numerical procedure is provided to obtain coefficients of a coherent controller. An example is presented to illustrate the controller design method.

Output Constrained Adaptive Controller Design for Nonlinear Saturation Systems

Yongliang Yang, Zhijie Liu, Qing Li, Donald C. Wunsch

2021, 8(2): 441-454. doi: 10.1109/JAS.2020.1003524

Abstract:
This paper considers an adaptive neuro-fuzzy control scheme to solve the output tracking problem for a class of strict-feedback nonlinear systems. Both asymmetric output constraints and input saturation are considered. An asymmetric barrier Lyapunov function with time-varying prescribed performance is presented to tackle the output-tracking error constraints. A high-gain observer is employed to relax the Lipschitz continuity requirement on the nonlinear dynamics. To avoid the “explosion of complexity”, the dynamic surface control (DSC) technique is employed to filter the virtual control signal of each subsystem. To deal with the actuator saturation, an additional auxiliary dynamical system is designed. It is shown theoretically that the parameter estimation and output tracking errors are semi-globally uniformly ultimately bounded. Two simulation examples are conducted to verify the presented adaptive fuzzy controller design.

Using Event-Based Method to Estimate Cybersecurity Equilibrium

Zhaofeng Liu, Ren Zheng, Wenlian Lu, Shouhuai Xu

2021, 8(2): 455-467. doi: 10.1109/JAS.2020.1003527

Abstract:
Estimating the global state of a networked system is an important problem in many application domains. The classical approach to tackling this problem is the periodic (observation) method, which is inefficient because it often observes states at a very high frequency. This inefficiency has motivated the idea of the event-based method, which leverages the evolution dynamics in question and makes observations only when some rules are triggered (i.e., only when certain conditions hold). This paper initiates the investigation of using the event-based method to estimate the equilibrium in the new application domain of cybersecurity, where equilibrium is an important metric that has no closed-form solutions. More specifically, the paper presents an event-based method for estimating cybersecurity equilibrium in the preventive and reactive cyber defense dynamics, which has been proven globally convergent. The presented study proves that the estimated equilibrium from our trigger rule i) indeed converges to the equilibrium of the dynamics and ii) is Zeno-free, which assures the usefulness of the event-based method. Numerical examples show that the event-based method can reduce 98% of the observation cost incurred by the periodic method. In order to use the event-based method in practice, this paper investigates how to bridge the gap between i) the continuous state in the dynamics model, which is dubbed probability-state because it measures the probability that a node is in the secure or compromised state, and ii) the discrete state that is often encountered in practice, dubbed sample-state because it is sampled from some nodes. This bridge may be of independent value because probability-state models have been widely used to approximate exponentially-many discrete state systems.

Boundary Gap Based Reactive Navigation in Unknown Environments

Zhao Gao, Jiahu Qin, Shuai Wang, Yaonan Wang

2021, 8(2): 468-477. doi: 10.1109/JAS.2021.1003841

Abstract:
Due to the requirements for mobile robots to search or rescue in unknown environments, reactive navigation which plays an essential role in these applications has attracted increasing interest. However, most existing reactive methods are vulnerable to local minima in the absence of prior knowledge about the environment. This paper aims to address the local minimum problem by employing the proposed boundary gap (BG) based reactive navigation method. Specifically, the narrowest gap extraction algorithm (NGEA) is proposed to eliminate the improper gaps. Meanwhile, we present a new concept called boundary gap which enables the robot to follow the obstacle boundary and then get rid of local minima. Moreover, in order to enhance the smoothness of generated trajectories, we take the robot dynamics into consideration by using the modified dynamic window approach (DWA). Simulation and experimental results show the superiority of our method in avoiding local minima and improving the smoothness.

Joint Algorithm of Message Fragmentation and No-Wait Scheduling for Time-Sensitive Networks

Xi Jin, Changqing Xia, Nan Guan, Peng Zeng

2021, 8(2): 478-490. doi: 10.1109/JAS.2021.1003844

Abstract:
Time-sensitive networks (TSNs) support not only traditional best-effort communications but also deterministic communications, which send each packet at a deterministic time so that the data transmissions of networked control systems can be precisely scheduled to guarantee hard real-time constraints. No-wait scheduling is suitable for such TSNs and generates the schedules of deterministic communications with the minimal network resources so that all of the remaining resources can be used to improve the throughput of best-effort communications. However, due to inappropriate message fragmentation, the real-time performance of no-wait scheduling algorithms is reduced. Therefore, in this paper, joint algorithms of message fragmentation and no-wait scheduling are proposed. First, a specification for the joint problem based on optimization modulo theories is proposed so that off-the-shelf solvers can be used to find optimal solutions. Second, to improve the scalability of our algorithm, the worst-case delay of messages is analyzed, and then, based on the analysis, a heuristic algorithm is proposed to construct low-delay schedules. Finally, we run extensive test cases to evaluate our proposed algorithms. The evaluation results indicate that, compared to existing algorithms, the proposed joint algorithm improves schedulability by up to 50%.

Supplementary Material for “A Survey of Evolutionary Algorithms for Multi-Objective Optimization Problems With Irregular Pareto Fronts”

Yicun Hua, Qiqi Liu, Kuangrong Hao, Yaochu Jin

2021, 8(2): 1-4. doi: 10.1109/JAS.2021.1003817

Vol. 8, No. 2, 2021
