Yuzhen Liu, Ziyang Meng, Yao Zou and Ming Cao, "Visual Object Tracking and Servoing Control of a Nano-Scale Quadrotor: System, Algorithms, and Experiments," IEEE/CAA J. Autom. Sinica, vol. 8, no. 2, pp. 344-360, Feb. 2021. doi: 10.1109/JAS.2020.1003530

Visual Object Tracking and Servoing Control of a Nano-Scale Quadrotor: System, Algorithms, and Experiments

Funds:  This work was supported in part by the Institute for Guo Qiang of Tsinghua University (2019GQG1023), in part by Graduate Education and Teaching Reform Project of Tsinghua University (202007J007), in part by National Natural Science Foundation of China (U19B2029, 62073028, 61803222), and in part by the Independent Research Program of Tsinghua University (2018Z05JDX002)
Abstract: There are two main trends in the development of unmanned aerial vehicle (UAV) technologies: miniaturization and intellectualization, in which realizing object tracking capabilities for a nano-scale UAV is one of the most challenging problems. In this paper, we present a visual object tracking and servoing control system utilizing a tailor-made 38 g nano-scale quadrotor. A lightweight visual module is integrated to enable object tracking capabilities, and a micro positioning deck is mounted to provide accurate pose estimation. In order to be robust against object appearance variations, a novel object tracking algorithm, denoted by RMCTer, is proposed, which integrates a powerful short-term tracking module and an efficient long-term processing module. In particular, the long-term processing module can provide additional object information and modify the short-term tracking model in a timely manner. Furthermore, a position-based visual servoing control method is proposed for the quadrotor, where an adaptive tracking controller is designed by leveraging backstepping and adaptive techniques. Stable and accurate object tracking is achieved even under disturbances. Experimental results are presented to demonstrate the high accuracy and stability of the whole tracking system.

     

In recent years, the study of UAVs has attracted increasing attention from both industrial and academic communities. Miniaturization is one of the major trends in the development of UAV technologies. In 2019, DJI-Innovations launched a small quadrotor named the “Mavic Mini”, the smallest of its products. However, it still weighs $ 249 $ g with a volume of $ 245 $ mm $\times\,290$ mm $\times\,55$ mm. Most commercial UAVs, even micro UAVs (with weights between $ 0.4 $–2 lbs or 180–907 g [1]), can easily cause damage to their surroundings. In contrast, nano-scale UAVs [2]–[6] (with weights below $ 0.4 $ lb or $ 180 $ g) have advantages including safe and silent operation, and are suitable for flying in tightly constrained environments and in batches [7]. In 2005, the defense advanced research projects agency (DARPA) announced the nano air vehicle (NAV) program, whose goal was to design a nano unmanned air vehicle (size under $ 75 $ mm, total mass under 10 g) with capabilities including continuous hovering, forward flight, and performing indoor and outdoor missions.

In addition to miniaturization, intellectualization is another major trend in the development of UAV technologies. In particular, in order to execute various tasks in many practical applications (e.g., surveillance, augmented reality, environmental monitoring, behavior modeling, rescue, and search), the capability of autonomous object tracking is indispensable [5]–[14]. For example, the authors of [13] apply an open-TLD (tracking-learning-detection) [15] tracker on a UAV platform, i.e., AR Drone 2.0, to accomplish the task of tracking non-artificial targets including people or moving cars. However, the TLD-based approach cannot achieve comparable performance to state-of-the-art tracking algorithms. Li et al. [14] realize a vision-based pedestrian tracking system on the UAV platform DJI Matrice100 by exploiting the well-known correlation filter-based tracker. In particular, a frequency domain correlation filter acts as the base tracker, and its online training model is further transformed to the spatial domain to obtain the re-detection model. Since the detection classifier depends on the accuracy of the base tracking model and the generic object proposal [16], this method still cannot efficiently deal with situations where there are variations in object appearance. The above research works focus on relatively large UAV platforms. In contrast, few research works involving visual object tracking use nano-scale UAV platforms due to their very limited volume, payload capacity and power budget. First, a majority of existing onboard localization and navigation sensors, including GPS modules, laser range finders, and standard cameras, cannot be mounted on nano-scale UAVs. Moreover, most existing state-of-the-art visual object tracking solutions require a significant amount of computational resources, and thus they cannot be directly employed on nano-scale UAVs.
Last but not least, nano-scale UAVs are more susceptible to external disturbances in flight due to their light weight and small volume. Srisamosorn et al. [4] use a nano-scale quadrotor to realize person tracking with the use of multiple environmental cameras. In [5], Briod proposes an ego-motion estimation algorithm for a $ 46 $ g nano-scale UAV using an optical flow sensor. However, the position error drifts to $ 50 $ cm within two minutes. In [6], Palossi et al. use a nano-scale quadrotor to track a red target, but the average tracking error is shown to be larger than $ 30 $ cm. In addition, since the adopted object tracking method depends on color features, it is easily disturbed by the external environment. In summary, due to volume and weight limitations, as well as the difficulty of designing an effective visual tracking algorithm and a robust visual servoing controller, it is still a challenging task to obtain stable and accurate visual object tracking with a nano-scale UAV weighing less than $ 40 $ g. We now provide a detailed survey on the study of visual object tracking algorithms and visual servoing control algorithms.

Typical visual object tracking approaches have been recently proposed in [11], [14], [15], and [17]–[24]. In particular, early template-based methods selectively update the template and use multiple key frames to find an optimal patch to describe object appearance [14]. Moreover, discriminative approaches [11], [17], [18] have been proposed, in which both the foreground and background information is considered from sequential images. In [11], two linear support vector machine (SVM) classifiers are trained with simple local binary pattern (LBP) features to detect and track a drogue object during aerial refueling. In addition, the correlation filter-based tracker [19]–[22] emerges as one of the most successful and popular tracking frameworks due to its promising performance and computational efficiency. In particular, Bolme et al. [19] leverage the correlation filter-based method to achieve object tracking, in which the minimum output sum of squared error (MOSSE) filter is proposed with an operation speed of 669 frames per second (FPS). Henriques et al. [20] propose a popular correlation filter-based object tracking algorithm, denoted by kernelized correlation filter (KCF), where the kernel technique is used to improve performance by allowing classification in a rich, high-dimensional feature space. The fact that cyclic matrices can be diagonalized in the Fourier space is leveraged to significantly speed up the matrix operations. However, because most of the aforementioned tracking methods employ relatively risky update schemes and rely on the spatiotemporal consistency of visual cues, they can only be used to handle short-term tracking.
In other words, tracking errors inevitably accumulate during long-term tracking, and the lack of robustness against variations of object appearance (e.g., changes in geometry/photometry, different camera viewpoints, partial occlusion, or object disappearance) usually leads to tracking failure in challenging scenarios. On the other hand, some state-of-the-art object detection methods based on deep learning (e.g., YOLOv3 [23], Mask R-CNN [24]) have been proposed and shown to be robust with excellent detection accuracy. However, the required computations are significant and may lead to poor real-time performance on CPUs or mid-range GPUs.

In parallel, visual servoing control methods [25]–[29] can be classified into two major categories: image-based visual servoing (IBVS) and position-based visual servoing (PBVS), depending on whether the image measurements from the camera are used directly in the control loop. In the IBVS scheme, the controller is designed using two-dimensional pixel coordinates from the image plane. In [25], Zheng et al. propose an IBVS controller for a quadrotor. The trajectories of the image moment features are first designed, following the definition in the virtual image plane. Then, a feature trajectory tracking controller is proposed to track the designed trajectories. In contrast, the PBVS controller requires the reconstruction of the three-dimensional Cartesian coordinates of the observed object from visual data. The main advantage of PBVS is that the control law is formulated precisely in the Cartesian coordinate space, such that the control problem (computation of the feedback signal) is separated from the pose estimation problem. In general, PBVS control can be resolved into two sub-tasks: relative position estimation (between the object and the UAV) and trajectory tracking controller design. In particular, most existing methods require prior model information of the object or depth data measured by sensors (e.g., an RGB-D sensor or a stereo sensor). In [28], Popova and Liu develop a PBVS control method based on cascaded proportional-integral-derivative (PID) controllers for a quadrotor to track a ground moving object. In this work, the relative position between the object and the quadrotor is estimated based on the assumption that the object is always on flat ground during the whole tracking process. However, this assumption is not general enough for most practical tracking cases. Also, extensive trajectory tracking control techniques have been proposed [30]–[37].
Some traditional linear control methods [30], [31] are employed to stabilize the quadrotor in a small neighborhood around the equilibrium by linearizing the dynamic model of the quadrotor. The authors of [12], [14], and [32] adopt cascaded PID controllers consisting of an inner-loop attitude subsystem and outer-loop position/velocity subsystems. However, in real-world applications, the control system usually suffers from input dead-zones, parametric uncertainties and external disturbances. Hence, nonlinear adaptive control methods have been widely proposed [33]–[37]. In [33], the authors propose a neural network-based adaptive control method for a ship-mounted crane to achieve boom/rope positioning while simultaneously ensuring payload swing suppression in the presence of external disturbances and input dead-zone effects. In [35], an asymptotic tracking controller is presented for a quadrotor using the robust integral of the signum of the error (RISE) method and the immersion and invariance (I&I)-based adaptive control method. In [36], an adaptive backstepping control algorithm is proposed to drive a helicopter to achieve trajectory tracking. However, only numerical simulation results are presented in this work, and the control strategy is not verified in real-world flight experiments. In addition, although some research works have been conducted on trajectory tracking control or visual servoing control for quadrotors, most of them have been performed on platforms that typically weigh more than $ 500 $ g, and few studies consider nano-scale quadrotors weighing less than $ 40 $ g.

    Mainly motivated by the above references, in this paper, we develop a $ 38 $ g nano-scale UAV including both its hardware and software, to realize a monocular vision-based adaptive object tracking system. This system has been preliminarily verified in our previous work [32]. The present work improves the original version significantly. First, we have developed a new positioning deck to provide more stable and accurate pose estimation for a nano-scale UAV. Second, we present a novel object tracking algorithm, denoted by a robust multi-collaborative tracker (RMCTer), to realize more robust and accurate object tracking. This is in contrast with the hand tracking method proposed in [32], where the object’s color and shape features are critical and therefore, performance is sensitive to environmental variations. Third, we present a new estimation method to obtain the relative position between the object and the UAV. In the proposed system, it is not essential to know the prior model information of the tracking object. Fourth, we propose an adaptive tracking controller, where the uncertain model parameters of the quadrotor and the existence of external disturbances are both considered.

To the best of our knowledge, few works have reported stable and robust object tracking on a UAV weighing less than $ 40 $ g in the presence of disturbances. The contributions of this paper are threefold. First, we propose a complete visual object tracking and servoing control system using a tailor-made $ 38 $ g nano-scale quadrotor platform. This tracking system is composed of a versatile and robust visual object tracking module and an efficient PBVS control module. In addition, the control module consists of a two-stage relative position estimator and a nonlinear adaptive tracking controller. Due to the limited payload, a lightweight monocular visual module is integrated to equip the quadrotor with the capability of object tracking. Additionally, we present a micro positioning deck to provide stable and accurate pose estimation for the quadrotor. The complete prototype is shown in Fig. 1, and the overview of its hardware configuration is illustrated in Fig. 2. Second, we propose a novel object tracking algorithm, i.e., RMCTer, where a two-stage short-term tracking module and an efficient long-term processing module are tightly integrated to collaboratively process the input frames. Compared with the tracking algorithms proposed in [19]–[22], the proposed tracker is more applicable in the presence of variations in object appearance and can effectively compensate for visual tracking errors thanks to adequate model modification provided by the long-term processing module. Third, we propose an adaptive PBVS control algorithm by leveraging backstepping and adaptation techniques. Compared with [14], [25], and [28], the proposed controller is robust against uncertain model parameters and external disturbances, and their exact model information is not needed in the design of the controller.

    Figure  1.  The developed nano-scale UAV platform. For the convenience of presentation, the body frame {B} is assumed to be coincident with the camera frame {C}. If it is not the case, these two frames can be related to a constant transformation matrix according to their relative pose.
    Figure  2.  Hardware configuration.

The rest of this paper is organized as follows. In Section II, we give an overview of the overall system and present the hardware structure and system flow chart. Sections III and IV, respectively, show the key components of the proposed tracking system: the visual object tracking module and the position-based visual servoing control module. In Section V, implementation details and experimental results are presented. In addition, the main notations used in this paper are given in Table I.

    Table  I.  Nomenclature
    Notation | Definition
    {I}, {B}, {C} | Inertial frame, body frame, and camera frame, respectively; {B} is assumed to be coincident with {C}
    $(u, v)$ | The coordinates of a point in the image plane
    $\phi,\theta,\psi$ | Euler angles, i.e., the roll, pitch, and yaw
    $^A{\bf{x}}$ | The vector ${\bf{x}}$ expressed in the frame $\{A\}$
    $^A_B{\bf{R}}$ | Rotation matrix from the frame $\{B\}$ to the frame $\{A\}$
    $^A{\bf{p}}_B$ | The position of the origin of the frame $\{B\}$ expressed in the frame $\{A\}$
    ${\bf{S}}({{{\omega}}})$ | Skew-symmetric matrix of the vector ${{\omega}}=[\omega_x,\omega_y,\omega_z]^T$, i.e., $\left[ \begin{array}{ccc} 0 & -\omega_z & \omega_y \\ \omega_z & 0 & -\omega_x \\ -\omega_y & \omega_x & 0 \end{array} \right]$
    ${\bf{I}}_n$ | $n \times n$ identity matrix

    In this section, we first introduce the hardware structure of the developed nano-scale UAV platform. Then, the flow chart of the proposed tracking system is presented.

1) Nano-Scale UAV Platform: Our UAV platform is the Crazyflie 2.0 (Fig. 1), a nano-scale quadrotor that weighs only 27 g with a size of 92 mm $\times$ 92 mm $\times$ 29 mm (length, width, and height). Its maximum flight time is 7 minutes and its maximum payload is 15 g. The on-board microcontroller running the main flight control algorithm is an STM32F405, and the radio communication module is an NRF51822. Moreover, the core on-board sensors include a pressure sensor (LPS25H) and an inertial measurement unit (IMU, MPU9250) composed of a three-axis gyroscope, a three-axis accelerometer and a three-axis magnetometer. The pressure sensor and IMU are disabled in our implementation due to their poor measurement accuracy. Additionally, there is a flow deck consisting of a ToF (time-of-flight) sensor (VL53L0X) and an optical flow sensor (PMW3901), which measure the vertical distance and the horizontal velocity relative to the ground, respectively. However, the upper measurement limit of the VL53L0X is only 2 m; therefore, this flow deck is not used in our developed platform.

2) Positioning Deck: In order to provide the UAV platform with accurate pose estimation, we integrate a new positioning deck (Fig. 2) (with a weight of $ 4.9 $ g and a size of $ 28 $ mm $\times$ $ 28 $ mm $\times$ $ 2 $ mm), designed with reference to the schematics of the aforementioned flow deck. This positioning deck is composed of an MCU (micro-controller unit, STM32F130), a higher-quality IMU (LSM6DSOX), a newer version of the ToF sensor (VL53L1X), and a new optical flow sensor (PMW3901). Compared with the MPU9250, the LSM6DSOX has better zero-bias stability and provides more accurate acceleration and angular velocity measurements. Moreover, the upper measurement limit of the VL53L1X is $ 4\;{\rm{m}} $, twice that of the VL53L0X. In addition, an extended Kalman filter is implemented on the MCU STM32F130 to fuse measurements from the IMU, optical flow sensor and ToF sensor to estimate the pose of the UAV in real time [38], [39].

3) Visual Module: In order to provide the considered nano-scale UAV with the capability of object tracking, a lightweight visual module is integrated, and correspondingly, the ground station is equipped with an image acquisition card to receive image frames in real time (30 Hz). This visual module weighs only 4.7 g and includes a camera and an analogue video transmitter, as shown in Fig. 2. In particular, the camera consists of a complementary metal oxide semiconductor (CMOS) image sensor with an image resolution of $ 1500 $ TV lines (TVL) and a wide-angle lens with a field of view of $ 60^{\circ}\times95^{\circ}\times125^{\circ} $ (vertical, horizontal, and diagonal). In order to ensure low transmission latency and sufficient communication distance, we chose the Kingkong Q25 5.8 GHz video transmitter with an image transmission delay of 35–45 ms and a maximum transmission distance of 150–200 m (without occlusion).

The flow chart of the proposed tracking system is shown in Fig. 3. First, the nano-scale quadrotor takes off from the ground. The visual object tracking module processes the received images to estimate the location of the object in the image plane. Then, the relative position between the object and the quadrotor in the inertial frame is calculated based on the visual tracking results. Finally, the adaptive tracking controller calculates the corresponding control input to track the object. In addition, if the visual object tracking module fails to track the object, the yaw angle of the quadrotor is adjusted in time to search for the missing target again. If the object is not found within a preset period of time, the quadrotor lands automatically for safety reasons.
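The supervisory logic described above (track, search by yawing, land after a timeout) can be sketched as a small state machine. This is a minimal illustration, not the paper's implementation; the mode names, the 5 s search timeout, and the 30 Hz frame period are illustrative assumptions:

```python
from enum import Enum, auto

class Mode(Enum):
    TRACK = auto()   # visual tracker is locked on the object
    SEARCH = auto()  # target lost: adjust yaw to re-acquire it
    LAND = auto()    # search timed out: land for safety

def step(mode, target_visible, search_time, search_timeout=5.0, dt=1/30):
    """Advance the supervisory state machine by one frame period.

    Returns the next mode and the accumulated search time.
    search_timeout and dt are illustrative placeholders."""
    if mode is Mode.TRACK:
        if target_visible:
            return Mode.TRACK, 0.0
        return Mode.SEARCH, 0.0            # target lost: start yaw search
    if mode is Mode.SEARCH:
        if target_visible:
            return Mode.TRACK, 0.0         # target re-acquired
        search_time += dt
        if search_time >= search_timeout:
            return Mode.LAND, search_time  # give up and land safely
        return Mode.SEARCH, search_time
    return Mode.LAND, search_time          # landing is terminal
```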

    Figure  3.  Flow chart of the proposed object tracking system.

    As shown in Fig. 3, the proposed tracking system mainly consists of two key components: the visual object tracking module (Section III) and position-based visual servoing control module (Section IV), in which the considered PBVS control is resolved into two sub-tasks: relative position estimation (Section IV-A) and adaptive tracking controller design (Section IV-B).

    In this section, we will detail the proposed visual object tracking module, where two operation modes are included, i.e., tracking an artificial object (the concentric circles), and tracking a non-artificial object (a pedestrian).

In the object tracking problem, the considered objects are usually divided into artificial and non-artificial objects. Generally speaking, the case of artificial objects is relatively simple, since such objects have obvious features (e.g., a special geometric shape or color), so one can directly leverage a frame-by-frame detection method based on these simple features to achieve tracking. In comparison, the case of non-artificial objects, e.g., pedestrians, vehicles, and UAVs, is relatively complex, since the selection and extraction of features are difficult. Therefore, for non-artificial objects, simple feature-based detection methods cannot be directly applied to realize stable and accurate tracking.

In this section, we first introduce an artificial object tracking algorithm based on frame-by-frame detection, with concentric circles as an example. Then, we propose RMCTer, a dual-component object tracking algorithm, where a pedestrian is regarded as a tracking example of a non-artificial object. It is important to point out that the proposed RMCTer algorithm is general and can be easily applied to track other artificial or non-artificial objects.

The details are described in Algorithm 1 below, where $ P $ represents a mask of the same size as the input image, $ F $ represents a function, $ C $ represents all the closed contours in the image, $C[i]\in C$ and $C[j]\in C$ represent the $ i $-th and $ j $-th closed contours, which are stored as point sets, and $ B $ and $ B_s $ are arrays containing the parameters of the fitted ellipses, including the coordinates of the center point, the height, the width, the angle, and the fitting error. Subscripts are used to explain the meaning of variables or functions. For example, $ P_{\rm{raw}} $ is the original image, $ F_{\rm{pre}} $ represents the image pre-processing function, i.e., it denoises and drops corrupted frames, $ F_{\rm{binary}} $ is the image binarization function, $ F_{\rm{contour}} $ extracts closed contours, and $ F_{\rm{ellipse}} $ is the ellipse-fitting function.

During flight, the concentric circles may distort into two concentric ellipses in the image, while the height (or width) ratio of the two ellipses ($ B.height/B_s.height $) stays invariant. In view of this fact, the target can be uniquely identified. Furthermore, the coordinates of the center point ($ (u_c,v_c) $) and the area of the outer circle ($ Area $) can also be calculated. Different from [40], the inclusion relationship of the concentric circles is leveraged in the detection. In particular, we first find a closed contour, and then only test its parent or child contours instead of all other closed contours in the image. In this way, the computation is simplified and the processing speed is much improved.

Algorithm 1: Tracking the concentric circles

    Input: $P_{\rm{{raw}}}$

    Output: $ B $, $ B_s $, $ (u_c,v_c) $, $ Area $

1: $P_{\rm{pre}}=F_{\rm{pre}}(P_{\rm{raw}})$

    2: $P_{\rm{binary}}=F_{\rm{binary}}(P_{\rm{pre}})$

    3: $C=F_{\rm{contour}}(P_{\rm{binary}})$

    4: for $ C[i] $ in $ C $ do

5:  if (the number of points in $ C[i] $) $>$ $ threshold_1 $ and $ C[i] $ has a sub-contour $C[j] \in C$ then

    6:   $B =F_{\rm{ellipse}}(C[i])$, $B_s=F_{\rm{ellipse}}(C[j])$

7:   if $ B.error < threshold_2 $, $ B_s.error < threshold_2 $ and $ (B.height/B_s.height)\in(threshold_3,threshold_4) $ then

    8:    $ u_c=(B.center.u+B_s.center.u)/2 $

    9:    $ v_c=(B.center.v+B_s.center.v)/2 $

    10:    $Area=\pi/4 \times B.width\times B.height$

    11:    return TRUE

    12:   end if

    13:  end if

    14: end for

    15: return FALSE
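The pair-validation step of Algorithm 1 (lines 7–10) can be sketched in Python. This is an illustrative sketch only: the `Ellipse` container and all numeric threshold values below are hypothetical stand-ins for the fitted-ellipse parameters $B$, $B_s$ and the thresholds $threshold_2$–$threshold_4$, whose actual values the paper does not state here:

```python
import math
from dataclasses import dataclass

@dataclass
class Ellipse:            # parameters produced by an ellipse-fitting routine
    cu: float             # center u-coordinate (px)
    cv: float             # center v-coordinate (px)
    width: float          # px
    height: float         # px
    error: float          # fitting residual

def validate_pair(outer, inner, err_thresh=2.0, ratio_lo=1.8, ratio_hi=2.2):
    """Accept an outer/inner ellipse pair if both fits are tight and the
    height ratio matches the known concentric circles (the ratio is
    invariant under perspective distortion). Thresholds are illustrative.

    Returns (u_c, v_c, area) on success, None on rejection."""
    if outer.error >= err_thresh or inner.error >= err_thresh:
        return None                        # poor ellipse fit
    ratio = outer.height / inner.height
    if not (ratio_lo < ratio < ratio_hi):
        return None                        # not the target's known ratio
    u_c = (outer.cu + inner.cu) / 2        # center: mean of both centers
    v_c = (outer.cv + inner.cv) / 2
    area = math.pi / 4 * outer.width * outer.height   # outer ellipse area
    return u_c, v_c, area
```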

In this section, we propose a robust multi-collaborative tracker (RMCTer) to achieve stable and accurate pedestrian tracking in real time. As shown in Fig. 4, the RMCTer consists of a short-term tracking module and a long-term processing module. In particular, a two-stage correlation filter-based tracker consisting of a translation filter and a scale filter is employed for the short-term tracking module. This component generally works accurately and efficiently in relatively stable scenarios. Along with this short-term module, in order to be robust against variations in the appearance of the object and to compensate for visual tracking errors, an efficient long-term processing module is presented. It is composed of a learning-based multi-object detection network (i.e., YOLOv3), and a two-stage template scoring component based on the image histogram and SURF (speeded up robust features) matching. Note that, although the detection network (YOLOv3) requires a relatively large amount of computational resources, our short-term module is computationally efficient with reliable short-term tracking performance. A periodic takeover control strategy is adopted for the proposed tracker to ensure both effectiveness and real-time performance. For every group of $ N $ received frames, the first $ N-1 $ continuous frames are processed directly by the short-term tracking module, while the $ N $-th frame is fed to the long-term processing module, whose output is leveraged to modify the current tracking model in a timely manner. Obviously, a smaller $ N $ makes the proposed tracker more robust to visual tracking errors and object appearance changes, but excessively decreasing $ N $ increases the amount of computation and thereby degrades real-time performance. Based on experimental experience, we set $N \in \left[{f_i}/3,\;{f_i}/2\right]$ with $ f_i $ being the image frequency, with which satisfactory tracking performance can be achieved.
In addition, in order to observe the tracking state and determine whether to re-initialize, the short-term tracking state and the long-term processing state are defined in terms of the peak-to-sidelobe ratio (PSR) value and the evaluation scores based on image histogram and SURF matching, respectively.
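The periodic takeover strategy can be sketched as a simple frame dispatcher. The function and variable names are illustrative; $N = 10$ is one value consistent with the stated range $[f_i/3,\; f_i/2]$ for a 30 Hz stream:

```python
def dispatch(frame_idx, N):
    """Periodic takeover strategy: within each group of N frames, the first
    N-1 go to the short-term tracker; every N-th frame is routed to the
    long-term processing module, whose output refreshes the tracking model.
    frame_idx is 1-based."""
    return "long_term" if frame_idx % N == 0 else "short_term"

# With a 30 Hz stream and N chosen in [f_i/3, f_i/2], e.g. N = 10:
schedule = [dispatch(i, 10) for i in range(1, 21)]
```

Over these 20 frames, exactly frames 10 and 20 are handed to the long-term module; all others stay on the cheap short-term path, which is what keeps the tracker real-time.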

    Figure  4.  Overview of the proposed robust multi-collaborative tracker: RMCTer, which mainly consists of a short-term tracking module and a long-term processing module.

    The following subsections detail successively all the components of RMCTer: the two-stage short-term tracking module, long-term processing module and initialization/re-initialization approach.

    1) Two-Stage Short-Term Tracking Module: In order to obtain accurate and efficient short-term tracking performance, we leverage a correlation filter-based tracker, where a two-stage filter process is performed to obtain translation estimation and scale estimation. In particular, the employed tracker is based on the KCF [20] and discriminative scale space correlation filter (DSSCF) [22]. Compared with [20]–[22] where only the current frame is used to train the discriminative model such that the temporal correlation is inevitably neglected, we use the recent frames in a sliding window to train our discriminative model. In this way, the tracking model is weighted by time order and inherently contains a temporal context, leading to a better description of the current object appearance.

    For translation estimation, the goal of training is to find a classifier $ f({\bf{x}}) = \langle{\bf{w}},\phi({\bf{x}})\rangle $ which minimizes the squared error over samples $ {\bf{x}}_i $ and their regression targets $ y_i $

    $$\min_{{\bf{w}}}\;\sum_{m=m_0}^{k}\beta_m\sum_{i}\left(\langle{\bf{w}},\phi({\bf{x}}_i)\rangle-y_i\right)^2+\lambda\|{\bf{w}}\|^2 \tag{1}$$

where $ \langle\cdot,\cdot\rangle $ denotes the dot product, $ \phi({\cdot}) $ is the mapping to a Hilbert space, $ \lambda $ is a regularization parameter that controls overfitting, $m_0 = \max\{k-M+1,1\}$ represents the index of the first frame in the sliding window, $ k $ denotes the current frame index, $M $ is the number of total frames in the sliding window, and $\beta_m={\frac{1-\eta}{1-\eta^{\min\{M,k\}}}}\eta^{k-m}$ is the weight for the $m$-th frame with $ \eta\in(0,1) $. It can be seen that $\beta_m$ increases with $m$, so the correlation filter model is better able to reflect the temporal variation of the tracked object. In addition, it follows from [20] that the classifier uses all the cyclic shift versions of $ {\bf{x}} $ (i.e., $ {\bf{x}}_i $, $i\in\{0,\dots,I-1\}$) to train the model instead of using dense sliding windows to extract training samples. Each example $ {\bf{x}}_i $ is assigned a score $ y_i $ ($ y_i\in[0,1] $) generated by a Gaussian function in terms of the shifted distance. Employing a kernel $ \kappa({\bf{x}},{\bf{x}}') = \langle \phi({\bf{x}}),\phi({\bf{x}}')\rangle $, the classifier is further derived as $ f({\bf{x}}) = \langle{\bf{w}},\phi({\bf{x}})\rangle = \displaystyle\sum\nolimits_{i}{\alpha_i\kappa({\bf{x}},{\bf{x}}_i)} $, where ${{{\alpha}}}$ is the dual variable of $ {\bf{w}} $, i.e., $ {\bf{w}} = \displaystyle\sum\nolimits_{i}\alpha_i\phi({\bf{x}}_i) $ [20].

    The solution to (1) is given by

    $$\hat{{{\alpha}}}=\frac{\displaystyle\sum_{m=m_0}^{k}\beta_m\hat{{\bf{y}}}_m}{\displaystyle\sum_{m=m_0}^{k}\beta_m\hat{{\bf{k}}}^{{\bf{x}}{\bf{x}}}_m+\lambda} \tag{2}$$

where $ \hat{{{\alpha}}} $, $ \hat{{\bf{y}}}_m $, and $ \hat{{\bf{k}}}^{{\bf{x}}{\bf{x}}}_m $ represent the discrete Fourier transforms of $ {{\alpha}} $, $ {{\bf{y}}}_m $, and $ {{\bf{k}}}^{{\bf{x}}{\bf{x}}}_m $, and $ {{\bf{k}}}^{{\bf{x}}{\bf{x}}} $ is a vector whose $ i $-th element is $ \kappa({\bf{x}}_i, {\bf{x}}) $. In particular, for image data with $ C $ feature channels, a concatenation ${\bf{x}} = [_1{\bf{x}};\dots; {_C{\bf{x}}}]$ is first constructed, and then the kernel correlation $ {\bf{k}}^{{\bf{x}}{\bf{x}}} $ derived from a selected kernel can be efficiently computed by element-wise products and simple summation over the feature channels in the Fourier domain. For example, based on the Gaussian kernel, $ {\bf{k}}^{{\bf{x}}{\bf{x}}'} $ is calculated by

    $${\bf{k}}^{{\bf{x}}{\bf{x}}'}=\exp\left(-\frac{1}{\sigma^2}\left(\|{\bf{x}}\|^2+\|{\bf{x}}'\|^2-2{\cal{F}}^{-1}\left(\sum_{c=1}^{C}{_c\hat{{\bf{x}}}^{*}}\odot\,{_c\hat{{\bf{x}}}'}\right)\right)\right) \tag{3}$$

where $ \odot $ denotes the element-wise product, $ (\cdot)^{*} $ denotes the complex conjugate, $ {\cal{F}}^{-1} $ is the inverse discrete Fourier transform, and $ c $ is the index of the feature channels.

    During the detection, given a candidate image patch $ {\bf{z}} $ which is determined by the tracking result of the last frame, all cyclic patches of $ {\bf{z}} $ are evaluated via

    $$\hat{{\bf{f}}}({\bf{z}})=\hat{{\bf{k}}}^{{\bf{x}}{\bf{z}}}\odot\hat{{{\alpha}}}.\tag{4}$$

    Then, we take the inverse Fourier transform of $\hat{{\bf{f}}}({\bf{z}})$ and find the maximum response. The corresponding patch is the result. In order to provide the tracker with better memory, the filter coefficients of $ {{\alpha}} $ and the target template $ {\bf{x}} $ are updated in an interpolating manner with a learning rate $ \iota $.
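To make the training and detection steps concrete, the following sketch implements the weighted solution (2), the Gaussian kernel correlation (3), and the response map (4) with NumPy FFTs. This is an illustrative reconstruction, not the authors' code: the function names, the handling of the sliding window, the kernel bandwidth `sigma`, and the normalization by the number of elements (a common KCF convention) are our own assumptions.

```python
import numpy as np

def gaussian_kernel_correlation(x, xp, sigma=0.5):
    """Kernel correlation of (3): element-wise products over the C
    feature channels in the Fourier domain. x, xp: (C, H, W) arrays."""
    xf, xpf = np.fft.fft2(x), np.fft.fft2(xp)
    # cross-correlation term, summed over the feature channels
    cross = np.real(np.fft.ifft2(np.sum(np.conj(xf) * xpf, axis=0)))
    d2 = np.sum(x ** 2) + np.sum(xp ** 2) - 2.0 * cross
    # normalize by the number of elements (a common KCF convention)
    return np.exp(-np.maximum(d2, 0.0) / (sigma ** 2 * x.size))

def train_alpha(samples, ys, lam=1e-4, eta=0.8):
    """Weighted solution (2) over a sliding window; the most recent
    sample (last in the list) receives the largest weight beta_m."""
    k = len(samples)
    w = np.array([eta ** (k - 1 - m) for m in range(k)])
    w *= (1 - eta) / (1 - eta ** k)
    num = sum(b * np.fft.fft2(y) for b, y in zip(w, ys))
    den = sum(b * np.fft.fft2(gaussian_kernel_correlation(x, x))
              for b, x in zip(w, samples)) + lam
    return num / den

def detect(alpha_f, x_template, z):
    """Response map (4); its peak gives the estimated translation."""
    kzf = np.fft.fft2(gaussian_kernel_correlation(x_template, z))
    return np.real(np.fft.ifft2(kzf * alpha_f))
```

With `z` equal to the template itself, the response peaks at zero shift, which is a quick sanity check of the pipeline.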

    In order to handle scale variation, following [22], a second-stage one-dimensional correlation filter is applied. To evaluate the scale filter, scale pyramids are built around the tracking target. In particular, the image patches centered around the location obtained by the first-stage translation filter are cropped from the image. The sizes of the image patches are $ a^nP\times a^nR $, where $ P\times R $ is the current size of the target, $ a $ is the scale factor, and $n\in\left\{-\frac{S-1}{2},\dots,\frac{S-1}{2}\right\}$ with $ S $ denoting the size of the scale filter. Then, the cropped image patches are resized to the template size for feature extraction. The scale with the maximum filtering response is selected as the current tracking scale. Similar to the first-stage translation filter, the model parameters are also updated by using a linear interpolation method. More details can be found in [22].
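The scale-pyramid sizes described above can be sketched as a small helper; the default scale factor `a` and filter size `S` below are illustrative values of our choosing, not necessarily the paper's configuration:

```python
def scale_candidates(P, R, a=1.02, S=33):
    """Candidate patch sizes a^n P x a^n R for the scale filter,
    with n in {-(S-1)/2, ..., (S-1)/2} (S assumed odd)."""
    ns = range(-(S - 1) // 2, (S - 1) // 2 + 1)
    return [(a ** n * P, a ** n * R) for n in ns]
```

Each candidate patch is then resized to the template size before feature extraction, so the filter itself always operates on a fixed-size input.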

    In addition, the short-term tracking state is estimated by the value of PSR [19], which is defined as

    $$\varepsilon=\frac{g_{\max}-\mu}{\sigma}\tag{5}$$

    where $g_{\max}$ is the maximal value on the correlation map ${{\bf{f}}}({\bf{z}})$, and $ \mu $ and $ \sigma $ are the mean and standard deviation of the sidelobe. Given $ \varepsilon $, we simply define three short-term tracking states $ S_{t} $ based on two predefined thresholds $ \tau_1 $ and $ \tau_2 $

    $$S_t=\begin{cases}excellence, & \varepsilon\geq\tau_2\\ success, & \tau_1\leq\varepsilon<\tau_2\\ failure, & \varepsilon<\tau_1.\end{cases}\tag{6}$$

    If the state is $ success $ or $ excellence $, the current tracking model is updated in a general manner. If the state is $ failure $, the re-initialization module starts. In addition, if the state is $ excellence $, the current tracking patch will be selected as the template update candidates leveraged in the long-term processing module.
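A minimal sketch of the PSR computation (5) and the three-way state decision (6) follows. The size of the sidelobe exclusion window around the peak is our assumption; the default thresholds are the values reported later in the experiments.

```python
import numpy as np

def psr(response, exclude=5):
    """Peak-to-sidelobe ratio (5): peak minus sidelobe mean, divided
    by the sidelobe standard deviation. The region within `exclude`
    pixels of the peak is excluded from the sidelobe (our choice)."""
    peak = response.max()
    py, px = np.unravel_index(np.argmax(response), response.shape)
    mask = np.ones_like(response, dtype=bool)
    mask[max(0, py - exclude):py + exclude + 1,
         max(0, px - exclude):px + exclude + 1] = False
    side = response[mask]
    return (peak - side.mean()) / (side.std() + 1e-12)

def short_term_state(eps, tau1=0.6, tau2=0.88):
    """Three-way short-term tracking state of (6)."""
    if eps >= tau2:
        return "excellence"
    return "success" if eps >= tau1 else "failure"
```

A sharp, isolated peak yields a large PSR; a flat or multi-modal response map pushes the state toward $failure$ and triggers re-initialization.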

    2) Long-Term Processing Module: As described in Section I, a single short-term tracking module cannot effectively adapt to the variations of object appearance and handle tracking errors. Therefore, we propose a long-term processing module, which is composed of a state-of-the-art deep learning-based object detection network (i.e., YOLOv3) and a two-stage template scoring component.

    In particular, the input frame is first processed by the YOLOv3 network, whose output consists of multiple bounding boxes containing various objects and the corresponding labels. Then, the image patches corresponding to the bounding boxes with pedestrian labels are cropped from the image and selected as the candidate image patches. If there are no bounding boxes with the pedestrian label, this frame is directly processed by the short-term tracking module, and the next frame is input to the long-term processing module. If this condition occurs in $ E $ ($ E = 4 $ in our implementation) consecutive frames, then re-initialization is started. Note that if our objective is to track other targets, the selected labels can be easily modified. Next, the two-stage template scoring component is employed to determine the unique tracking object from the candidates. In the first stage, we perform similarity testing in terms of an image histogram between the candidate image patches and template images. In the second stage, SURF matching is performed between the template images and the top three candidate image patches in similarity testing. Based on the above two-stage results, these three candidate image patches are assigned comprehensive evaluation scores. In particular, we denote the highest score as

    $$\delta_{\max}=\max_i\left\{H_i+\frac{\ell}{A}\times K_i\right\}\tag{7}$$

    where $ i $ represents the index of the top three candidate image patches in similarity testing, $ H_i $ denotes the similarity value based on the image histogram, $ A $ represents the area size of the image patch, $ \ell $ represents an adjustable coefficient, and $ K_i $ represents the number of matched SURF points. If $ \delta_{\max} $ is less than the predefined threshold $ \tau_d $, the long-term processing state is $ failure $, and the re-initialization module is activated. If $ \delta_{\max} $ exceeds $ \tau_d $, the state is set to $ success $. The corresponding image patch with $ \delta_{\max} $ is the result of the long-term processing module, denoted by $ {\bf{x}}_{l} $. Then, we use it to modify the current short-term tracking model. In particular, the tracking patch $ {\bf{x}} $ is first updated according to

    $${\bf{x}}=\rho{\bf{x}}_l+(1-\rho){\bf{x}}_s\tag{8}$$

    where $\, \rho\in[0,1] $ denotes the weight coefficient and $ {\bf{x}}_{s} $ represents the current tracking result from the short-term tracking component. Then, the model parameter $ \hat{{\alpha}} $ is correspondingly updated in terms of (2), (3), and (8).

    As the tracking process proceeds, the template images should be updated in real time due to possible variations in the object appearance. Therefore, we adopt a fast template updating method. In particular, the tracking patches with short-term tracking state $ S_t $ being $ excellence $ are first selected as template candidates, and then, the candidate which has a certain parallax with the latest template image is selected as a new template. In order to guarantee bounded computational complexity for template scoring, the total number of templates is set to 3–5 in our implementation.
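The scoring of (7) can be sketched as a small combiner over candidate tuples $(H_i, A_i, K_i)$ supplied by the histogram and SURF stages. How exactly $\ell$ and the patch area enter the score is our reconstruction of (7), so treat the weighting below as illustrative rather than the authors' exact rule:

```python
def long_term_score(candidates, ell=20.0, tau_d=0.95):
    """Comprehensive score per candidate: delta_i = H_i + (ell / A_i) * K_i,
    where H_i is the histogram similarity, A_i the patch area, and K_i
    the number of matched SURF points (all produced upstream).
    Returns (long-term state, index of the best candidate)."""
    scores = [H + (ell / A) * K for (H, A, K) in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    state = "success" if scores[best] >= tau_d else "failure"
    return state, best
```

Normalizing the SURF match count by the patch area prevents large candidates from winning merely because they contain more keypoints; this normalization is our assumption.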

    3) Initialization/Re-Initialization: The initialization/re-initialization approach becomes simple thanks to the well-established long-term processing module. To be specific, the processing result of the long-term processing module serves as the initial bounding box of the object after the long-term processing state stays as $ success $ for $ L $ successive frames. The bounding box is then leveraged to activate the short-term tracking module.

    Based on the visual object tracking results (Section III), we propose a PBVS control method for a nano-scale quadrotor to achieve tracking. In this section, we first introduce a method of estimating the relative position between the object and the quadrotor. Then, a nonlinear adaptive tracking controller is proposed by taking into account the dynamics of the nano-scale quadrotor and the presence of internal and external disturbances.

    In order to perform the PBVS task, the relative position between the object and the quadrotor in the inertial frame must be calculated. Therefore, we propose a feasible two-stage estimation method, and the following assumptions are first imposed.

    Assumption 1: The object height is relatively stable.

    Assumption 2: In the initialization stage, the altitude of the quadrotor is available and the tracking object is on flat ground.

    Remark 1: Assumption 1 is reasonable since most common practical tracking objects satisfy it, e.g., rigid bodies, cars, and standing pedestrians. Note that Assumption 2 is only needed in the initialization stage instead of being required throughout the whole tracking process as in [9] and [28]. Therefore, Assumption 2 can be more easily guaranteed and is more applicable to practical tracking assignments (e.g., sloped ground, or an object moving up and down during tracking).

    As shown in Fig. 5, we denote $ f_A $ and $ f_B $ as the highest and lowest points of the tracking object, and $ A' $ and $ B' $ as their projection points on the image plane, respectively. In our implementation, the projection points $ A' $ and $ B' $ are determined by the midpoints of the upper bound and the lower bound of the tracking bounding box, respectively. According to the standard pinhole imaging model, the positions of $ f_A $ and $ f_B $ in the inertial frame are given by

    Figure  5.  Illustration of relative position estimation between the object and the nano-scale quadrotor.
    $$^I{\bf{p}}_{f_A}=s_A{^I_C{\bf{R}}}{\bf{K}}^{-1}\left[\begin{matrix}u_A\\v_A\\1\end{matrix}\right]+{^I{\bf{p}}_C}=s_A\left[\begin{matrix}x_{f_A}\\y_{f_A}\\z_{f_A}\end{matrix}\right]+{^I{\bf{p}}_C},\quad {^I{\bf{p}}_{f_B}}=s_B{^I_C{\bf{R}}}{\bf{K}}^{-1}\left[\begin{matrix}u_B\\v_B\\1\end{matrix}\right]+{^I{\bf{p}}_C}=s_B\left[\begin{matrix}x_{f_B}\\y_{f_B}\\z_{f_B}\end{matrix}\right]+{^I{\bf{p}}_C}\tag{9}$$

    where $ [u_A, v_A]^T $ and $ [u_B, v_B]^T $, obtained by the proposed visual object tracking module in Section III, represent the coordinates of $ A' $ and $ B' $ on the image plane, $ {^I_C{\bf{R}}} $ represents the rotation matrix from the camera frame to the inertial frame, $ ^I{\bf{p}}_C $ is the position of the origin of the camera frame expressed in the inertial frame, $ {\bf{K}} $ is the intrinsic matrix of the camera, ${\left[ x_{f_i}, y_{f_i}, z_{f_i} \right]^T} = {^I_C{\bf{R}}}{\bf{K}}^{-1}\left[u_{i}, v_{i}, 1 \right]^T$, $i\in\{A, B\}$, and $ s_{A} $ and $ s_{B} $ represent the unknown scale factors.

    By denoting the height of the tracking object as $ h_p $, we have that

    $$^I{\bf{p}}_{f_B}-{^I{\bf{p}}_{f_A}}=\left[\begin{matrix}0\\0\\-h_p\end{matrix}\right]\tag{10}$$

    i.e.,

    $$s_Ax_{f_A}=s_Bx_{f_B}\tag{11}$$
    $$s_Ay_{f_A}=s_By_{f_B}\tag{12}$$
    $$h_p=s_Az_{f_A}-s_Bz_{f_B}.\tag{13}$$

    In terms of (11) and (12), we have that

    $$s_B=\frac{s_A}{2}\left(\frac{x_{f_A}}{x_{f_B}}+\frac{y_{f_A}}{y_{f_B}}\right).\tag{14}$$

    Substituting (14) into (13) yields

    $$s_A=\frac{h_p}{z_{f_A}-\dfrac{z_{f_B}}{2}\left(\dfrac{x_{f_A}}{x_{f_B}}+\dfrac{y_{f_A}}{y_{f_B}}\right)}.\tag{15}$$

    In the tracking stage (Fig. 5(a)), the relative position between the object and the quadrotor in the inertial frame, i.e., $ ^I{\bf{d}} = [d_x, d_y, d_{\textit{z}}]^T $, is defined by

    $$^I{\bf{d}}=\left[\begin{matrix}d_x\\d_y\\d_z\end{matrix}\right]=\left[\begin{matrix}{^I{\bf{p}}_C}(1)-{^I{\bf{p}}_{f_A}}(1)\\{^I{\bf{p}}_C}(2)-{^I{\bf{p}}_{f_A}}(2)\\{^I{\bf{p}}_C}(3)-{^I{\bf{p}}_{f_A}}(3)\end{matrix}\right]=\left[\begin{matrix}-s_Ax_{f_A}\\-s_Ay_{f_A}\\-s_Az_{f_A}\end{matrix}\right]\tag{16}$$

    where $ {^I{\bf{p}}_C}(i) $ represents the $ i $-th element of the vector $ {^I{\bf{p}}_C} $. Then, substituting (15) into (16) yields

    $$^I{\bf{d}}=\left[\begin{matrix}\dfrac{h_px_{f_A}}{\dfrac{z_{f_B}}{2}\left(\dfrac{x_{f_A}}{x_{f_B}}+\dfrac{y_{f_A}}{y_{f_B}}\right)-z_{f_A}}\\[4mm] \dfrac{h_py_{f_A}}{\dfrac{z_{f_B}}{2}\left(\dfrac{x_{f_A}}{x_{f_B}}+\dfrac{y_{f_A}}{y_{f_B}}\right)-z_{f_A}}\\[4mm] \dfrac{h_pz_{f_A}}{\dfrac{z_{f_B}}{2}\left(\dfrac{x_{f_A}}{x_{f_B}}+\dfrac{y_{f_A}}{y_{f_B}}\right)-z_{f_A}}\end{matrix}\right].\tag{17}$$

    However, the value of object height cannot be directly obtained since there is no prior model information of the tracking object. We therefore design a feasible method to estimate the object height.

    In the initialization stage, as shown in Fig. 5(b), it follows from Assumption 2 that:

    $$h_{u_o}={^I{\bf{p}}_C}(3)-{^I{\bf{p}}_{f_B}}(3)=-s_Bz_{f_B}\tag{18}$$

    where $ h_{u_o} $ represents the altitude of the quadrotor retrieved by fusing the data of the altitude sensor (e.g., the ToF sensor) and the IMU in the initialization stage. According to (18), the scale factor ${s_B}$ can be calculated by

    $$s_B=-\frac{h_{u_o}}{z_{f_B}}.\tag{19}$$

    Substituting (14) and (19) into (13) yields

    $$h_p=h_{u_o}\left[1-\frac{z_{f_A}}{\dfrac{z_{f_B}}{2}\left(\dfrac{x_{f_A}}{x_{f_B}}+\dfrac{y_{f_A}}{y_{f_B}}\right)}\right].\tag{20}$$

    Based on (9) and (20), the height of the object $ h_p $ is obtained. Therefore, for every upcoming frame in the tracking stage, it follows from Assumption 1 that the relative position between the object and the quadrotor is calculated by implementing (9) and (17) with the obtained $ h_p $.
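The two-stage estimation above can be sketched end-to-end: back-project the bounding-box midpoints via (9), recover the object height once with (20), and then evaluate (15)-(17) for every frame. The following is a hypothetical NumPy sketch under the frame conventions used here (inertial $z$ pointing up; the scale factors cancel, so any positive scaling of the back-projected directions gives the same result):

```python
import numpy as np

def ray_dirs(uv_a, uv_b, R_ci, K):
    """Back-projected directions [x, y, z]_{f_i} of (9) for the
    highest (A') and lowest (B') bounding-box midpoints."""
    Kinv = np.linalg.inv(K)
    fa = R_ci @ Kinv @ np.array([uv_a[0], uv_a[1], 1.0])
    fb = R_ci @ Kinv @ np.array([uv_b[0], uv_b[1], 1.0])
    return fa, fb

def object_height(fa, fb, h_uo):
    """Initialization stage, (20): flat ground, known altitude h_uo."""
    ratio = 0.5 * (fa[0] / fb[0] + fa[1] / fb[1])
    return h_uo * (1.0 - fa[2] / (fb[2] * ratio))

def relative_position(fa, fb, h_p):
    """Tracking stage, (15)-(17): d = p_C - p_fA = -s_A * [x,y,z]_fA."""
    ratio = 0.5 * (fa[0] / fb[0] + fa[1] / fb[1])
    s_a = h_p / (fa[2] - fb[2] * ratio)
    return -s_a * fa
```

After initialization, only `relative_position` is evaluated per frame, which is why neither the altitude sensor nor the flat-ground assumption is needed during tracking.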

    Note that after initialization, our estimation method does not depend on the flat ground assumption and does not need the altitude information of the UAV. It is especially useful in the cases where the object moves up or moves down during tracking, or the altitude information of the UAV is not reliable (note that the upper measurement limit of ToF sensor is only $ 4 $ m).

    Next, we present a nonlinear adaptive tracking controller for the nano-scale quadrotor to achieve stable object tracking with consideration of the presence of internal and external disturbances.

    For notation simplicity, $ {\bf{p}} $, $ {\bf{v}} $, $ {\bf{R}} $, $ {{\omega}} $, $ {\bf{f}} $, $ {\bf{\Delta f}} $, $ {{\tau}} $, $ {\bf{\Delta}} {{\tau}} $, $ {\bf{p}}^r $, $ {\bf{d}} $, $ {\bf{\Delta d}} $, and ${\bf{d}}^{\rm {exp} }$ denote $ ^I{\bf{p}}_B $, $ ^I{\bf{v}}_B $, $ ^I_B{\bf{R}} $, $ ^B{{\omega}} $, $ ^B{\bf{f}} $, $ ^I{\bf{\Delta f}} $, $ ^B{{\tau}} $, $ ^B{\bf{\Delta}} {{\tau}} $, $ ^I({\bf{p}}^r)_B $, $ ^I{\bf{d}} $, $ ^I{\bf{\Delta d}} $, and $^I{\bf{d}}^{\rm {exp} }$, respectively.

    1) The Nano-Scale Quadrotor Model: By considering the nano-scale quadrotor model as a rigid body, its kinematics and dynamics equations are described as follows [41]:

    $$\dot{{\bf{p}}}={\bf{v}}\tag{21}$$
    $$m\dot{{\bf{v}}}=-mg{\bf{e}}_3+{\bf{R}}{\bf{f}}+{\bf{\Delta f}}\tag{22}$$
    $${\bf{f}}=F{\bf{e}}_3\tag{23}$$
    $$\dot{{\bf{R}}}={\bf{R}}S({{\omega}})\tag{24}$$
    $${\bf{J}}\dot{{{\omega}}}=-S({{\omega}}){\bf{J}}{{\omega}}+{{\tau}}+{\bf{\Delta}}{{\tau}}\tag{25}$$

    where $ {\bf{p}} = [p_x,p_y,p_{\textit{z}}]^T $ and $ {\bf{v}} = [v_x,v_y,v_{\textit{z}}]^T $ represent the position and velocity of the nano-scale quadrotor in the inertial frame, respectively, $ m $ denotes the mass of the quadrotor, $ g $ is the gravitational acceleration, $ {\bf{e_3}} = [0,0,1]^T $, $ {{\omega}} = [\omega_x,\omega_y,\omega_{\textit{z}}]^T $ represents the angular velocity of the quadrotor in the body frame, $ {\bf{f}} $ and $ {{\tau}} $ are the force and torque vectors in the body frame, respectively, $F$ is the total thrust force from the four rotors, $ {\bf{\Delta{f}}} $ and $ {\bf{\Delta}} {{\tau}} $ are the external and internal disturbance forces and torques, $ {\bf{R}} $ denotes the rotation matrix from the body frame to the inertial frame, i.e.,

    $${\bf{R}}=\left[\begin{matrix}c_\theta c_\psi & c_\psi s_\theta s_\phi-c_\phi s_\psi & c_\phi c_\psi s_\theta+s_\phi s_\psi\\ c_\theta s_\psi & s_\psi s_\theta s_\phi+c_\phi c_\psi & c_\phi s_\psi s_\theta-c_\psi s_\phi\\ -s_\theta & c_\theta s_\phi & c_\theta c_\phi\end{matrix}\right]$$

    with $ \phi,\theta,\psi $ representing the corresponding Euler angles, i.e., the roll, pitch, and yaw, respectively, where $ c(\cdot) $ and $ s(\cdot) $ are short for $ \cos(\cdot) $ and $ \sin(\cdot) $, and $ {\bf{J}} = {\rm{diag}}(J_x,J_y,J_{\textit{z}}) $ denotes the inertial matrix with respect to the body frame.

    The applied lift force $F$ and the rotation torque $ {{\tau}} = [\tau_x, \tau_y, \tau_{\textit{z}}]^T $ are determined by the propeller thrusts. In particular, define $ l_i(i = 1, 2, 3, 4) $ as the thrust generated by each propeller. $F$ and $ {{\tau}} $ are computed by [42]

    $$\left[\begin{matrix}F\\\tau_x\\\tau_y\\\tau_z\end{matrix}\right]=\left[\begin{matrix}1&1&1&1\\0&-d&0&d\\d&0&-d&0\\-k&k&-k&k\end{matrix}\right]\left[\begin{matrix}l_1\\l_2\\l_3\\l_4\end{matrix}\right]\tag{26}$$

    where $ d $ denotes the distance from the quadrotor center of gravity (c.g.) to the rotation axis of each propeller, and $ k $ is the anti-torque coefficient. Since the thrust of each propeller $ l_i (i = 1, 2, 3, 4) $ can be evaluated with the applied thrust $F$ and torque $ {{\tau}} $ explicitly via (26), $F$ and $ {{\tau}} $ will be used for the subsequent controller development.
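As a sketch, the mapping (26) and its inversion can be written as follows. The sign pattern of the mixer and the numeric values of `d` and `k` below are our assumptions for illustration, not the paper's hardware parameters:

```python
import numpy as np

def mixer(d=0.033, k=0.006):
    """Mixer matrix of (26): maps per-propeller thrusts l_1..l_4 to
    [F, tau_x, tau_y, tau_z]. d: arm length, k: anti-torque coeff."""
    return np.array([[ 1.0,  1.0,  1.0,  1.0],
                     [ 0.0, -d,    0.0,  d  ],
                     [ d,    0.0, -d,    0.0],
                     [-k,    k,   -k,    k  ]])

def propeller_thrusts(F, tau, d=0.033, k=0.006):
    """Invert (26): recover l_i from the commanded thrust F and torque tau."""
    return np.linalg.solve(mixer(d, k), np.concatenate(([F], tau)))
```

In hover (zero torque), the inversion distributes the total thrust equally over the four propellers, which is a convenient sanity check of the sign pattern.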

    Assumption 3: The disturbance forces $ {\bf{\Delta{f}}} $ and torques $ {\bf{\Delta}} {{\tau}} $ are bounded. In particular,

    $$|\Delta f_i|\leq\varpi_i,\quad|\Delta\tau_i|\leq\Theta_i,\quad i=x,y,z.\tag{27}$$

    Assumption 4: During flight, the pitch angle $ \theta $ and the roll angle $ \phi $ both remain in the region $\left( { - \pi /2,\pi /2} \right)$.

    Remark 2: Assumption 3 is valid because the forces ${\bf{ \Delta}} {\bf{f}}$ and torques ${\bf{\Delta}} {{\tau}}$ are always bounded in practical applications. Assumption 4 is reasonable since our nano-scale quadrotor is supposed to operate only under hovering or low velocity flight modes. Agile motions of quadrotors are beyond the scope of this research. The same assumption is also utilized in [37], [43], [44].

    2) Simplified Model: Due to the strong coupling of the above quadrotor model (21)–(25), it should be decoupled to facilitate the control design. In particular, the quadrotor model can be divided into two subsystems [44]

    a) Altitude system

    $$\dot{p}_z=v_z\tag{28}$$
    $$m\dot{v}_z=-mg+c_\theta c_\phi F+\Delta f_z.\tag{29}$$

    b) Longitudinal-lateral subsystem

    $$\dot{\bar{{\bf{p}}}}=\bar{{\bf{v}}}\tag{30}$$
    $$m\dot{\bar{{\bf{v}}}}=F\bar{{\bf{R}}}+{\bf{\Delta}}\bar{{\bf{f}}}\tag{31}$$
    $$\dot{\bar{{\bf{R}}}}=\check{{\bf{R}}}\bar{{{\omega}}}\tag{32}$$
    $$\dot{\psi}=\frac{s_\phi}{c_\theta}\omega_y+\frac{c_\phi}{c_\theta}\omega_z\tag{33}$$
    $${\bf{J}}\dot{{{\omega}}}=-S({{\omega}}){\bf{J}}{{\omega}}+{{\tau}}+{\bf{\Delta}}{{\tau}}\tag{34}$$

    where $ \bar{{\bf{p}}} = [p_x,p_y]^T $, $ \bar{{\bf{v}}} = [v_x,v_y]^T $, ${\bf{\Delta}} \bar{{\bf{f}}} = [\Delta f_x, \Delta f_y]^T$, $\bar{{{\omega}}} = [\omega_x,\omega_y]^T$, $ \bar{{\bf{R}}} = [R_{13},R_{23}]^T $, and the invertible matrix $ \check{{\bf{R}}} = \left[\begin{matrix}-R_{12} & R_{11}\\ -R_{22} & R_{21}\end{matrix}\right] $ with $ R_{ij} $ denoting the element in the $ i $-th row and the $ j $-th column of the rotation matrix $ {\bf{R}} $.

    3) Desired Trajectory: We define the desired trajectories of the nano-scale quadrotor in the inertial frame as

    $$\left[\begin{matrix}{\bf{p}}^r\\\psi^r\end{matrix}\right]=\left[\begin{matrix}{\bf{p}}+{\bf{\Delta d}}\\\psi+\Delta\psi\end{matrix}\right]\tag{35}$$
    $${\bf{\Delta d}}={\bf{d}}-{\bf{d}}^{\rm{exp}}\tag{36}$$

    where ${\bf{p}}^r = [p_x^r, p_y^r, p_{\textit{z}}^r]^T$ and ${\bf{p}} = [p_x, p_y, p_{\textit{z}}]^T$ represent the desired and the current positions of the quadrotor, and $ \psi^r $ and $ \psi $ denote the desired and the current yaw angles. Note that both $ {\bf{p}} $ and $ \psi $ are known, which are estimated by fusing multi-sensor data (i.e., ToF sensor, IMU, and optical flow sensor) with the EKF algorithm. In addition, $ {\bf{d}} $ represents the relative position between the object and the quadrotor in the inertial frame (calculated in Section IV-A), $ {\bf{d}}^{\exp} $ denotes the expected relative position, and $ \Delta\psi $ represents the yaw angle that the quadrotor needs to adjust. In particular, $ \Delta\psi $ is generally set to $ 0 $ in the tracking process, and it is adjusted to drive the UAV to search for the object again when it disappears.

    4) Control Law Design: Before designing the controller, a lemma is first introduced.

    Lemma 1: Given any scalar positive function $\varepsilon(t): [0,+\infty)\rightarrow {\mathbb{R}}_{+}$, the following inequality holds:

    $$|\eta|-\frac{\eta^{2}}{\sqrt{\eta^{2}+\varepsilon(t)}}<\sqrt{\varepsilon(t)},\quad\forall\eta\in{\mathbb{R}}.\tag{37}$$

    Proof: Since $\varepsilon(t)>0$, we have

    $$|\eta|-\frac{\eta^{2}}{\sqrt{\eta^{2}+\varepsilon(t)}}=\frac{|\eta|\left(\sqrt{\eta^{2}+\varepsilon(t)}-|\eta|\right)}{\sqrt{\eta^{2}+\varepsilon(t)}}<\sqrt{\eta^{2}+\varepsilon(t)}-|\eta|=\frac{\varepsilon(t)}{\sqrt{\eta^{2}+\varepsilon(t)}+|\eta|}\leq\sqrt{\varepsilon(t)}.\tag{38}$$
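Lemma 1 can also be spot-checked numerically; the sweep below verifies that the gap is non-negative and strictly below $\sqrt{\varepsilon(t)}$ for several values of $\varepsilon$:

```python
import numpy as np

def lemma1_gap(eta, eps):
    """Left-hand side of (37): |eta| - eta^2 / sqrt(eta^2 + eps)."""
    return abs(eta) - eta ** 2 / np.sqrt(eta ** 2 + eps)

etas = np.linspace(-50.0, 50.0, 10001)
for eps in (1e-3, 0.1, 2.0):
    gaps = lemma1_gap(etas, eps)
    assert np.all(gaps >= 0.0)          # the surrogate never overshoots
    assert np.all(gaps < np.sqrt(eps))  # inequality (37)
```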

    Our control objective is to develop control thrust $F$ and torque $ {{\tau}} $ such that the nano-scale quadrotor tracks a time-varying desired trajectory $ [ {({\bf{p}}^r)}^T, \psi^r]^T $, and the tracking error is bounded and driven to an arbitrarily small neighborhood of the origin. The control scheme is designed in terms of the backstepping methodology.

    We first consider the altitude subsystem, and derive a control thrust $F$ such that the altitude tracking error is bounded. Define the altitude tracking error as $ \tilde{p}_{\textit{z}} = {p}_{\textit{z}}-{p}_{\textit{z}}^r $. It follows from (28) that:

    $$\dot{\tilde{p}}_z=\dot{p}_z-\dot{p}_z^r=v_z-\dot{p}_z^r.\tag{39}$$

    Define a Lyapunov function $ V_{p_{\textit{z}}} = \dfrac{1}{2}{\tilde{p}_{\textit{z}}}^2 $; its derivative along (39) is

    $$\dot{V}_{p_z}=\tilde{p}_z(v_z-\dot{p}_z^r).\tag{40}$$

    Design a virtual control law $ \alpha_{{\textit{z}}} $ as

    $$\alpha_z=-k_{p_z}\tilde{p}_z+\dot{p}_z^r,\quad k_{p_z}>0.\tag{41}$$

    Substituting (41) into (40) yields

    $$\dot{V}_{p_z}=-k_{p_z}\tilde{p}_z^2+\tilde{p}_z\tilde{v}_z\tag{42}$$

    where $ \tilde{v}_{{\textit{z}}} = v_{{\textit{z}}}-\alpha_{{\textit{z}}} $. Next, in terms of (29), it follows that:

    $$m\dot{\tilde{v}}_z=m(\dot{v}_z-\dot{\alpha}_z)=-mg+c_\theta c_\phi F+\Delta f_z-m\dot{\alpha}_z.\tag{43}$$

    Assign a Lyapunov function $ V_{v_{\textit{z}}} = \dfrac{m}{2}\tilde{v}_{{\textit{z}}}^2 $. The derivative of $ V_{v_{\textit{z}}} $ along (43), using (27), satisfies

    $$\dot{V}_{v_z}=\tilde{v}_z(-mg+c_\theta c_\phi F+\Delta f_z-m\dot{\alpha}_z)\leq\tilde{v}_z(-mg+c_\theta c_\phi F-m\dot{\alpha}_z)+|\tilde{v}_z|\varpi_z.$$

    In view of Lemma 1, it follows that:

    $$\dot{V}_{v_z}<\tilde{v}_z\left(-mg+c_\theta c_\phi F-m\dot{\alpha}_z+\frac{\tilde{v}_z\varpi_z}{\sqrt{\tilde{v}_z^2+\varepsilon_{v_z}}}\right)+\sqrt{\varepsilon_{v_z}}\varpi_z.\tag{44}$$

    Under Assumption 4, we design $F$ as

    $$F=\frac{1}{c_\theta c_\phi}\left(m(g+\dot{\alpha}_z)-k_{v_z}\tilde{v}_z-\tilde{p}_z-\frac{\tilde{v}_z\hat{\varpi}_z}{\sqrt{\tilde{v}_z^2+\varepsilon_{v_z}}}\right)\tag{45}$$

    where $ k_{v_{\textit{z}}}>0 $ and $ {\hat{\varpi}}_{\textit{z}} $ is the estimated value of $ {\varpi}_{\textit{z}} $. Choose an augmented Lyapunov function $V_{v_{\textit{z}}}^* = V_{v_{\textit{z}}}+\dfrac{1}{2\varsigma_{v_z}}\tilde{\varpi}_z^2$ with ${\varsigma _{{v_z}}} > 0$, where $ \tilde{\varpi}_{\textit{z}} = {\varpi}_{\textit{z}}-\hat{\varpi}_{\textit{z}} $. Substituting (45) into (44) and differentiating $ V_{v_{\textit{z}}}^* $ yield

    $$\dot{V}_{v_z}^*<-k_{v_z}\tilde{v}_z^2+\tilde{v}_z\left(\frac{\tilde{v}_z\tilde{\varpi}_z}{\sqrt{\tilde{v}_z^2+\varepsilon_{v_z}}}-\tilde{p}_z\right)+\sqrt{\varepsilon_{v_z}}\varpi_z-\frac{1}{\varsigma_{v_z}}\tilde{\varpi}_z\dot{\hat{\varpi}}_z.$$

    Then, the update law for $ {\hat{\varpi}}_{\textit{z}} $ is designed as

    $$\dot{\hat{\varpi}}_z=\varsigma_{v_z}\left(\frac{\tilde{v}_z^2}{\sqrt{\tilde{v}_z^2+\varepsilon_{v_z}}}-\sigma_{v_z}\hat{\varpi}_z\right),\quad\sigma_{v_z}>0\tag{46}$$

    and it follows that:

    $$\dot{V}_{v_z}^*<-k_{v_z}\tilde{v}_z^2-\tilde{v}_z\tilde{p}_z+\sqrt{\varepsilon_{v_z}}\varpi_z+\sigma_{v_z}\tilde{\varpi}_z\hat{\varpi}_z.\tag{47}$$
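To illustrate the altitude loop, the sketch below integrates (28)-(29) under the virtual control (41), the thrust law (45), and the update law (46), with level attitude ($c_\theta c_\phi = 1$) and a bounded sinusoidal disturbance. The mass, gains, leakage coefficient, and disturbance below are our own illustrative numbers, not the paper's experimental configuration:

```python
import numpy as np

def simulate_altitude(T=20.0, dt=1e-3):
    """Euler simulation of the adaptive altitude loop; returns the
    final altitude tracking error for a 1 m hover setpoint."""
    m, g = 0.038, 9.81                            # 38 g quadrotor
    kp, kv, eps, sigma, var_sig = 1.0, 0.6, 0.1, 0.01, 0.15
    pz, vz, hat_w = 0.0, 0.0, 0.0
    pz_r, dpz_r, ddpz_r = 1.0, 0.0, 0.0           # hover at 1 m
    for t in np.arange(0.0, T, dt):
        dist = 0.05 * np.sin(2.0 * t)             # bounded disturbance [N]
        p_t = pz - pz_r                           # altitude error
        alpha = -kp * p_t + dpz_r                 # virtual control (41)
        v_t = vz - alpha
        dalpha = -kp * (vz - dpz_r) + ddpz_r      # derivative of (41)
        F = m * (g + dalpha) - kv * v_t - p_t \
            - v_t * hat_w / np.sqrt(v_t ** 2 + eps)       # thrust (45)
        hat_w += dt * var_sig * (v_t ** 2 / np.sqrt(v_t ** 2 + eps)
                                 - sigma * hat_w)         # update (46)
        vz += dt * (-g + F / m + dist / m)        # dynamics (29)
        pz += dt * vz                             # dynamics (28)
    return abs(pz - pz_r)
```

Consistent with Theorem 1 below, the error does not vanish exactly but settles in a small neighborhood of the origin whose size shrinks with $\varepsilon_{v_z}$.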

    We next consider the longitudinal-lateral subsystem, and design $ {{\tau}} $ with the designed $F$, such that the longitudinal-lateral position tracking and yaw tracking are achieved without destroying the tracking performance of the altitude subsystem.

    Define the longitudinal-lateral position error $ \tilde{\bar{{\bf{p}}}} = \bar{{\bf{p}}}-\bar{{\bf{p}}}^r $, where $ \bar{{\bf{p}}}^r = [p_x^r,p_y^r]^T $. Following (30), we get $ \dot{\tilde{\bar{{\bf{p}}}}} = \dot{\bar{{\bf{p}}}}-\dot{\bar{{\bf{p}}}}^r = \bar{{\bf{v}}}-\dot{\bar{{\bf{p}}}}^r $. Define a Lyapunov function $ V_{\bar{p}} = \dfrac{1}{2}\tilde{\bar{{\bf{p}}}}^T\tilde{\bar{{\bf{p}}}} $, and its derivative satisfies

    $$\dot{V}_{\bar{p}}=\tilde{\bar{{\bf{p}}}}^T(\bar{{\bf{v}}}-\dot{\bar{{\bf{p}}}}^r).\tag{48}$$

    We then design a virtual control law $ {{\alpha}}_{\bar{p}} $

    $${{\alpha}}_{\bar{p}}=-k_{\bar{p}}\tilde{\bar{{\bf{p}}}}+\dot{\bar{{\bf{p}}}}^r,\quad k_{\bar{p}}>0.\tag{49}$$

    Substituting (49) into (48) yields

    $$\dot{V}_{\bar{p}}=-k_{\bar{p}}\|\tilde{\bar{{\bf{p}}}}\|^2+\tilde{\bar{{\bf{p}}}}^T\tilde{\bar{{\bf{v}}}}\tag{50}$$

    where $ \tilde{\bar{{\bf{v}}}} = \bar{{\bf{v}}}-{{\alpha}}_{\bar p} $. Next, in view of (31), we have

    $$m\dot{\tilde{\bar{{\bf{v}}}}}=m\dot{\bar{{\bf{v}}}}-m\dot{{{\alpha}}}_{\bar{p}}=F\bar{{\bf{R}}}+{\bf{\Delta}}\bar{{\bf{f}}}-m\dot{{{\alpha}}}_{\bar{p}}.\tag{51}$$

    Assign a Lyapunov function $ V_{\bar{v}} = \dfrac{m}{2}\tilde{\bar{{\bf{v}}}}^T\tilde{\bar{{\bf{v}}}} $. Considering (27) and (51), the derivative of $ V_{\bar{v}} $ satisfies

    $$\dot{V}_{\bar{v}}=\tilde{\bar{{\bf{v}}}}^T(F\bar{{\bf{R}}}+{\bf{\Delta}}\bar{{\bf{f}}}-m\dot{{{\alpha}}}_{\bar{p}})\leq\tilde{\bar{{\bf{v}}}}^T(F\bar{{\bf{R}}}-m\dot{{{\alpha}}}_{\bar{p}})+\sum_{i=x,y}|\tilde{\bar{v}}_i|\varpi_i.\tag{52}$$

    In view of Lemma 1, it follows that:

    $$\dot{V}_{\bar{v}}<\tilde{\bar{{\bf{v}}}}^T\left(F\bar{{\bf{R}}}-m\dot{{{\alpha}}}_{\bar{p}}+{\bf{F}}_{\varepsilon_{\bar{v}}}(\tilde{\bar{{\bf{v}}}})\bar{{{\varpi}}}\right)+\sqrt{\varepsilon_{\bar{v}}}\sum_{i=x,y}\varpi_i\tag{53}$$

    where $ {\bf{F}}_{\varepsilon_{\bar{v}}}(\tilde{\bar{{\bf{v}}}}) = {{\rm{diag}}}{\left(\dfrac{\tilde{\bar{ v}}_x}{\sqrt{\tilde{\bar{ v}}_x^2+\varepsilon_{\bar{v}}}},\dfrac{\tilde{\bar{ v}}_y}{\sqrt{\tilde{\bar{ v}}_y^2+\varepsilon_{\bar{v}}}}\right)} $ and $ \bar{{{\varpi}}} = [\varpi_x, \varpi_y]^T $. Design a virtual control $ {{\alpha}}_{\bar{v}} $

    $${{\alpha}}_{\bar{v}}=\frac{1}{F}\left(-k_{\bar{v}}\tilde{\bar{{\bf{v}}}}+m\dot{{{\alpha}}}_{\bar{p}}-{\bf{F}}_{\varepsilon_{\bar{v}}}(\tilde{\bar{{\bf{v}}}})\hat{\bar{{{\varpi}}}}-\tilde{\bar{{\bf{p}}}}\right)\tag{54}$$

    where $ k_{\bar{v}}>0 $ and $ \hat{\bar{{{\varpi}}}} $ is the estimated value of $ \bar{{{\varpi}}} $. Choose an augmented Lyapunov function

    $$V_{\bar{v}}^*=V_{\bar{v}}+\frac{1}{2}\tilde{\bar{{{\varpi}}}}^T{\bf{\Xi}}_{\bar{v}}^{-1}\tilde{\bar{{{\varpi}}}}\tag{55}$$

    where $ {\bf{\Xi}}_{\bar v} $ is a positive definite matrix and $ \tilde{\bar{{{\varpi}}}} = \bar{{{\varpi}}}-\hat{\bar{{{\varpi}}}} $. Substituting (54) into (53) and differentiating $ V_{\bar v}^* $ yields

    $$\dot{V}_{\bar{v}}^*<-k_{\bar{v}}\|\tilde{\bar{{\bf{v}}}}\|^2+\tilde{\bar{{\bf{v}}}}^T\left(F\tilde{\bar{{\bf{R}}}}-\tilde{\bar{{\bf{p}}}}+{\bf{F}}_{\varepsilon_{\bar{v}}}(\tilde{\bar{{\bf{v}}}})\tilde{\bar{{{\varpi}}}}\right)+\sqrt{\varepsilon_{\bar{v}}}\sum_{i=x,y}\varpi_i-\tilde{\bar{{{\varpi}}}}^T{\bf{\Xi}}_{\bar{v}}^{-1}\dot{\hat{\bar{{{\varpi}}}}}\tag{56}$$

    where $ \tilde{\bar{{\bf{R}}}} = \bar{{\bf{R}}}-{{\alpha}}_{\bar{v}} $. Design the update law for $ \hat{\bar{{{\varpi}}}} $

    $$\dot{\hat{\bar{{{\varpi}}}}}={\bf{\Xi}}_{\bar{v}}\left({\bf{F}}_{\varepsilon_{\bar{v}}}(\tilde{\bar{{\bf{v}}}})\tilde{\bar{{\bf{v}}}}-\sigma_{\bar{v}}\hat{\bar{{{\varpi}}}}\right),\quad\sigma_{\bar{v}}>0\tag{57}$$

    such that

    $$\dot{V}_{\bar{v}}^*<-k_{\bar{v}}\|\tilde{\bar{{\bf{v}}}}\|^2+\tilde{\bar{{\bf{v}}}}^T(F\tilde{\bar{{\bf{R}}}}-\tilde{\bar{{\bf{p}}}})+\sqrt{\varepsilon_{\bar{v}}}\sum_{i=x,y}\varpi_i+\sigma_{\bar{v}}\tilde{\bar{{{\varpi}}}}^T\hat{\bar{{{\varpi}}}}.\tag{58}$$

    In view of (32), the error dynamics of $ \tilde{\bar{{\bf{R}}}} $ is $ \dot{\tilde{\bar{{\bf{R}}}}} = \dot{\bar{{\bf{R}}}}-\dot{{{\alpha}}}_{\bar v} = \check{{\bf{R}}}\bar{{{\omega}}}-\dot{{{\alpha}}}_{\bar v} $. Assign a Lyapunov function $ V_{\bar{R}} = \dfrac{1}{2}\tilde{\bar{{\bf{R}}}}^T\tilde{\bar{{\bf{R}}}} $, and its derivative satisfies

    $$\dot{V}_{\bar{R}}=\tilde{\bar{{\bf{R}}}}^T(\check{{\bf{R}}}\bar{{{\omega}}}-\dot{{{\alpha}}}_{\bar{v}}).\tag{59}$$

    Design a virtual control $ {{\alpha}}_{\bar{R}} $ as

    $${{\alpha}}_{\bar{R}}=\check{{\bf{R}}}^{-1}\left(-k_{\bar{R}}\tilde{\bar{{\bf{R}}}}-F\tilde{\bar{{\bf{v}}}}+\dot{{{\alpha}}}_{\bar{v}}\right).\tag{60}$$

    Substituting (60) into (59) yields

    $$\dot{V}_{\bar{R}}=-k_{\bar{R}}\|\tilde{\bar{{\bf{R}}}}\|^2+\tilde{\bar{{\bf{R}}}}^T(-F\tilde{\bar{{\bf{v}}}}+\check{{\bf{R}}}\tilde{\bar{{{\omega}}}})\tag{61}$$

    where $ \tilde{\bar{{{\omega}}}} = \bar{{{\omega}}}-{{\alpha}}_{\bar{R}} $.

    Define the yaw error $ \tilde{\psi} = \psi-\psi^r $. In terms of (33), the error dynamics is $\dot{\tilde{\psi}} = \dot{\psi}-\dot{\psi}^r = \dfrac{s\phi}{c\theta}\omega_y+\dfrac{c\phi}{c\theta}\omega_z-\dot{\psi}^r$. Choose a Lyapunov function $ V_{\psi} = \dfrac{1}{2}\tilde{\psi}^2 $, and its derivative is

    $$\dot{V}_\psi=\tilde{\psi}\left(\frac{s_\phi}{c_\theta}\omega_y+\frac{c_\phi}{c_\theta}\omega_z-\dot{\psi}^r\right).\tag{62}$$

    Design a virtual control law $ \alpha_\psi $

    $$\alpha_\psi=\frac{c_\theta}{c_\phi}\left(-k_\psi\tilde{\psi}+\dot{\psi}^r-\frac{s_\phi}{c_\theta}\omega_y\right),\quad k_\psi>0.\tag{63}$$

    Substituting (63) into (62) yields

    $$\dot{V}_\psi=-k_\psi\tilde{\psi}^2+\frac{c_\phi}{c_\theta}\tilde{\psi}\tilde{\omega}_z\tag{64}$$

    where $\tilde{\omega}_{\textit{z}} = \omega_{\textit{z}}-\alpha_\psi$.

    Define $ {{\alpha}}_{R} = [{{\alpha}}_{\bar{R}}^T,\alpha_\psi]^T $ and $ \tilde{\omega} = {{\omega}}-{{\alpha}}_{R} $. From (34), the error dynamics of the angular velocity is

    $${\bf{J}}\dot{\tilde{{{\omega}}}}={\bf{J}}\dot{{{\omega}}}-{\bf{J}}\dot{{{\alpha}}}_R=-S({{\omega}}){\bf{J}}{{\omega}}+{{\tau}}+{\bf{\Delta}}{{\tau}}-{\bf{J}}\dot{{{\alpha}}}_R.\tag{65}$$

    In view of Assumption 3, choose a Lyapunov function $ V_\omega = \dfrac{1}{2}\tilde{{{{\omega}}}}^T{{\bf{J}}}\tilde{{{{\omega}}}} $. Its derivative satisfies

    $$\dot{V}_\omega=\tilde{{{\omega}}}^T(-S({{\omega}}){\bf{J}}{{\omega}}+{{\tau}}+{\bf{\Delta}}{{\tau}}-{\bf{J}}\dot{{{\alpha}}}_R)\leq\tilde{{{\omega}}}^T(-S({{\omega}}){\bf{J}}{{\omega}}+{{\tau}}-{\bf{J}}\dot{{{\alpha}}}_R)+\sum_{i=x,y,z}|\tilde{\omega}_i|\Theta_i.\tag{66}$$

    From Lemma 1, it follows that:

    $$\dot{V}_\omega<\tilde{{{\omega}}}^T\left[-S({{\omega}}){\bf{J}}{{\omega}}+{{\tau}}-{\bf{J}}\dot{{{\alpha}}}_R+{\bf{F}}_{\varepsilon_\omega}(\tilde{{{\omega}}}){\bf{\Theta}}\right]+\sqrt{\varepsilon_\omega}\sum_{i=x,y,z}\Theta_i\tag{67}$$

    where ${{\bf{F}}}_{\varepsilon_\omega}(\tilde{{{{\omega}}}}) = {{\rm{diag}}}{\left(\dfrac{\tilde{ \omega}_x}{\sqrt{\tilde{{{{\omega}}}}_x^2+\varepsilon_\omega}},\dfrac{\tilde{{{{\omega}}}}_y}{\sqrt{\tilde{{{{\omega}}}}_y^2+\varepsilon_\omega}},\dfrac{\tilde{{{{\omega}}}}_{\textit{z}}}{\sqrt{\tilde{{{{\omega}}}}_{\textit{z}}^2+\varepsilon_\omega}}\right)}$. Finally, we can design $ {{\tau}} $ as

    $${{\tau}}=-k_\omega\tilde{{{\omega}}}-{{\varsigma}}+S({{\omega}}){\bf{J}}{{\omega}}+{\bf{J}}\dot{{{\alpha}}}_R-{\bf{F}}_{\varepsilon_\omega}(\tilde{{{\omega}}})\hat{{\bf{\Theta}}}\tag{68}$$

    where $k_\omega > 0\; {\rm{and}}\ {{\varsigma}} = \left[\tilde{\bar{{\bf{R}}}}^T\check{{\bf{R}}},\dfrac{c\phi}{c\theta}\tilde{\psi}\right]^T$. Choose an augmented Lyapunov function as $ V_\omega^* = V_\omega+\dfrac{1}{2}\tilde{{\bf{\Theta}}}^T{\bf{\Xi}}_\omega^{-1}\tilde{{\bf{\Theta}}} $, where $ \tilde{{\bf{\Theta}}} = {\bf{\Theta}}-\hat{{\bf{\Theta}}} $, and $ {\bf{\Xi}}_\omega $ is a positive definite matrix. The derivative of $ V_\omega^* $ satisfies

    $$\dot{V}_\omega^*<-k_\omega\|\tilde{{{\omega}}}\|^2+\tilde{{{\omega}}}^T\left[{\bf{F}}_{\varepsilon_\omega}(\tilde{{{\omega}}})\tilde{{\bf{\Theta}}}-{{\varsigma}}\right]+\sqrt{\varepsilon_\omega}\sum_{i=x,y,z}\Theta_i-\tilde{{\bf{\Theta}}}^T{\bf{\Xi}}_\omega^{-1}\dot{\hat{{\bf{\Theta}}}}.\tag{69}$$

    In addition, the update law for $ \hat{{\bf{\Theta}}} $ is designed as

    $$\dot{\hat{{\bf{\Theta}}}}={\bf{\Xi}}_\omega\left({\bf{F}}_{\varepsilon_\omega}(\tilde{{{\omega}}})\tilde{{{\omega}}}-\sigma_\omega\hat{{\bf{\Theta}}}\right),\quad\sigma_\omega>0\tag{70}$$

    then it follows that:

    $$\dot{V}_\omega^*<-k_\omega\|\tilde{{{\omega}}}\|^2-\tilde{{{\omega}}}^T{{\varsigma}}+\sqrt{\varepsilon_\omega}\sum_{i=x,y,z}\Theta_i+\sigma_\omega\tilde{{\bf{\Theta}}}^T\hat{{\bf{\Theta}}}.\tag{71}$$

    Theorem 1: Consider the nano-scale quadrotor system (21)–(25). Suppose Assumptions 3 and 4 hold. The proposed control laws (45), (68), and the parameter update laws (46), (57), (70) guarantee that the tracking errors of the closed-loop system remain uniformly bounded and converge to a small compact set containing the origin.

    Proof: Consider the Lyapunov function

    $$V=V_{p_z}+V_{v_z}^*+V_{\bar{p}}+V_{\bar{v}}^*+V_{\bar{R}}+V_\psi+V_\omega^*.$$

    In terms of (42), (47), (50), (58), (61), (64), and (71), the derivative of $ V $ satisfies

    $$\begin{aligned}\dot{V}<&-k_{p_z}\tilde{p}_z^2-k_{v_z}\tilde{v}_z^2-k_{\bar{p}}\|\tilde{\bar{{\bf{p}}}}\|^2-k_{\bar{v}}\|\tilde{\bar{{\bf{v}}}}\|^2-k_{\bar{R}}\|\tilde{\bar{{\bf{R}}}}\|^2-k_\psi\tilde{\psi}^2-k_\omega\|\tilde{{{\omega}}}\|^2\\&+\sqrt{\varepsilon_{v_z}}\varpi_z+\sqrt{\varepsilon_{\bar{v}}}\sum_{i=x,y}\varpi_i+\sqrt{\varepsilon_\omega}\sum_{i=x,y,z}\Theta_i+\sigma_{v_z}\tilde{\varpi}_z\hat{\varpi}_z+\sigma_{\bar{v}}\tilde{\bar{{{\varpi}}}}^T\hat{\bar{{{\varpi}}}}+\sigma_\omega\tilde{{\bf{\Theta}}}^T\hat{{\bf{\Theta}}}\\ \leq&-k_{p_z}\tilde{p}_z^2-k_{v_z}\tilde{v}_z^2-k_{\bar{p}}\|\tilde{\bar{{\bf{p}}}}\|^2-k_{\bar{v}}\|\tilde{\bar{{\bf{v}}}}\|^2-k_{\bar{R}}\|\tilde{\bar{{\bf{R}}}}\|^2-k_\psi\tilde{\psi}^2-k_\omega\|\tilde{{{\omega}}}\|^2-\frac{\sigma_{v_z}}{2}\tilde{\varpi}_z^2-\frac{\sigma_{\bar{v}}}{2}\|\tilde{\bar{{{\varpi}}}}\|^2-\frac{\sigma_\omega}{2}\|\tilde{{\bf{\Theta}}}\|^2\\&+\sqrt{\varepsilon_{v_z}}\varpi_z+\sqrt{\varepsilon_{\bar{v}}}\sum_{i=x,y}\varpi_i+\sqrt{\varepsilon_\omega}\sum_{i=x,y,z}\Theta_i+\frac{\sigma_{v_z}\varpi_z^2}{2}+\frac{\sigma_{\bar{v}}\|\bar{{{\varpi}}}\|^2}{2}+\frac{\sigma_\omega\|{\bf{\Theta}}\|^2}{2}\\ \leq&-2k_{\min}V+C\end{aligned}$$

    where $k_{\min} = {k_1}/{k_2}$ with $k_1 = \min\left\{k_{p_{\textit{z}}},k_{v_{\textit{z}}},k_{\bar p},k_{\bar v},k_{\bar{R}}, k_\psi,k_{\omega}, {\sigma _{{v_z}}}/2,{\sigma _{\bar v}}/2,{\sigma _\omega }/2\right\}$ and $k_2 = \max\{1,\varsigma_{v_{\textit{z}}}^{-1},m,\lambda_{\min}({\bf{\Xi}}_{\bar v})^{-1}, \lambda_{\max}({\bf{J}}), \lambda_{\min} ({\bf{\Xi}}_{\omega})^{-1}\} $ and $C = \sqrt{\varepsilon_{v_{\textit{z}}}}\varpi_{\textit{z}}+\!\sqrt{\varepsilon_{\bar v}}\displaystyle\sum\nolimits_{i = x,y} {} \;{\varpi _i}+ \sqrt{\varepsilon_\omega}\displaystyle\sum\nolimits_{i = x,y,z\;} {\Theta _i} + \sigma_{v_{\textit{z}}} \dfrac{\|{\varpi}_{\textit{z}}\|^2}{2}+ \sigma_{\bar{v}}\dfrac{\|\bar{{{\varpi}}}\|^2}{2}+\sigma_{\omega}\dfrac{\|{\bf{\Theta}}\|^2}{2}$. According to [45], $ V $ remains uniformly bounded and converges to the compact set $ \Omega = \left\{V|V\leq\sqrt{\frac{C}{k_{\min}}}\right\} $. Therefore, the tracking error $\tilde{{{\zeta}}} = [\tilde{\bar{{\bf{p}}}}^T, \tilde{p}_{\textit{z}}, \tilde{\psi}]^T$ satisfies $ \|\tilde{{{\zeta}}}\|\leq\sqrt{2V}\leq\sqrt{\dfrac{2C}{k_{\min}}} $ ultimately.■

    Remark 3: It follows from (60) and (68) that the derivatives of $ {{{\alpha}}_{\bar v}} $ and $ {{\alpha}}_{R} $ are required to be available to calculate the applied torque $ {{\tau}} $. However, it is complicated to give explicit expressions of $ \dot{{{\alpha}}}_{\bar v} $ and $ \dot{{{\alpha}}_{R}} $ according to (54) and (60). Instead, a high-order Levant differentiator proposed in [46] can be used to estimate $ \dot{{{\alpha}}}_{\bar v} $ and $ \dot{{{\alpha}}_{R}} $. We take the estimation of $ \dot{{{\alpha}}}_R $ as an example

    $$\dot{{{\alpha}}}={\bf{h}},\quad{\bf{h}}=-\lambda_0|{{\alpha}}-{{\alpha}}_R|^{\frac{1}{2}}{\rm{sign}}({{\alpha}}-{{\alpha}}_R)+{{\alpha}}',\quad\dot{{{\alpha}}}'=-\lambda_1{\rm{sign}}({{\alpha}}'-{\bf{h}})\tag{72}$$

    where $ {{\alpha}} $ and $ {{\alpha'}} $ are the estimations of $ {{\alpha}}_R $ and $ \dot{{{\alpha}}}_R $, respectively, and $ \lambda_0 $ and $ \lambda_1 $ are properly specified positive constants.
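A discrete-time sketch of the differentiator (72) applied to a sampled scalar signal follows; the tuning constants `lam0` and `lam1` are our own choices, not values reported in the paper:

```python
import numpy as np

def levant_differentiate(signal, dt, lam0=6.0, lam1=8.0):
    """Euler discretization of the Levant differentiator (72):
    z0 tracks the signal, z1 converges to its derivative."""
    z0, z1 = signal[0], 0.0
    deriv = []
    for f in signal:
        # sliding-mode correction toward the measured signal
        v0 = -lam0 * np.sqrt(abs(z0 - f)) * np.sign(z0 - f) + z1
        z0 += dt * v0
        z1 += dt * (-lam1 * np.sign(z1 - v0))
        deriv.append(z1)
    return np.array(deriv)
```

After a short finite-time transient, the output chatters within a small band around the true derivative, with accuracy governed by the sampling step `dt`.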

    Remark 4: Compared with the visual servoing controllers adopted in [14], [25], and [28], the proposed algorithm considers the existence of disturbances, where we leverage the smooth function $F(\eta) = \eta /\sqrt {{\eta ^2} + \varepsilon (t)}$ to replace the traditional ${\rm sign(\cdot)}$ function, so that non-smooth and discontinuous control forces are avoided. In this way, the standard Lyapunov method can be directly used for stability analysis and control law design.

    In this section, we demonstrate the capabilities of the proposed tracking system with the tailor-made nano-scale quadrotor platform. In particular, the experiments consist of four parts. First, we demonstrate the visual object tracking module (Section III), which includes two operation modes: tracking an artificial object (the concentric circles), and tracking a non-artificial object (a pedestrian). Second, the method of estimating the relative position between the object and the quadrotor (Section IV-A) is verified. Third, hovering control tests in the presence of periodic disturbances are performed to demonstrate the proposed adaptive tracking controller (Section IV-B). Last but not least, real-world object tracking flight tests are performed to validate the system. The video of the experiments is available online at https://www.youtube.com/watch?v=o7yIqGukQvE.

    We then present the implementation details. In particular, the parameters for RMCTer are set as follows: the number of frames used for initialization is $ L = 10 $, and the periodic frame number used for switching modules is $ N = 11 $. The frame number of the sliding window and the weight adjustment factor used in the short-term tracking module are $M = 4$ and $ \eta = 0.8 $ in (1). The threshold values used for determining the tracking states are, respectively, $ \tau_1 = 0.6 $, $ \tau_2 = 0.88 $, $ \tau_d = 0.95 $. The scaling factor used for calculating the long-term processing scores is $\ell = 20$ in (7), and the weight coefficient used for updating the tracking patch is $ 0.85 $ ($ \rho = 0.85 $ in (8)). The control and update parameters for the tracking controller are set as follows: $ k_{p_{\textit{z}}} = k_{\bar{p}} = 1 $, $ k_{v_{\textit{z}}} = k_{\bar v} = 0.6 $, $ {k_{\bar{R}}} = 2.2 $, $ {k_\psi} = 4.4 $, $ {k_\omega} = 12.4 $, $ {\varepsilon_{v_{\textit{z}}}} = {\varepsilon_{\bar {v}}} = 0.1 $, $ \varepsilon_{\omega} = 0.15 $, $ {\varsigma_{v_z}} = 0.15 $, $ {\Xi_{\bar v}} = 0.15 \times {\bf{I}}_3 $, $ {\Xi_{\bar \omega}} = 0.15 \times {\bf{I}}_3 $, where $ {\bf{I}}_3 $ denotes a $ 3 \times 3 $ identity matrix. Note that the on-board processor of the proposed system is an STM32, which has very limited computing power. Therefore, we deploy the visual object tracking module on the ground station, and the on-board processor is responsible for the visual servoing control and pose estimation. According to practical testing, the maximum stable communication distance between the nano-scale UAV and the ground station is about 90–100 m, so our system with this configuration can work stably in most indoor environments. The ground station runs in real time on an i7-8750H CPU and a GTX 1060 Max-Q GPU, where the GPU is only used for the deep-learning-based detection network YOLOv3.

    As described in Section III, the proposed visual object tracking module includes two operation modes: tracking an artificial object (the concentric circles) and tracking a non-artificial object (a pedestrian).

    First, we compare the developed concentric circle tracking algorithm (Section III-A) with the method adopted in [40]. The result is shown in Fig. 6. The left image is the result of the method of [40], and the right image shows the result of the proposed algorithm. The time consumption of each algorithm is marked in yellow text in the lower left corner of the images. It can be seen that, although both methods successfully identify the concentric circles, the proposed algorithm uses less time due to the use of the inclusion relationship of concentric circles.

    Figure  6.  The result of concentric circle tracking.

    Next, we validate the capabilities of the proposed tracking algorithm RMCTer (Section III-B) on public datasets and in real-world environments. We conduct the evaluation on the popular benchmark proposed by [47], which contains $ 100 $ fully annotated sequences. For the comparison tracking algorithms, we consider several state-of-the-art approaches including the well-known long-term tracker TLD, the discriminative tracker Struck, and the correlation filter-based trackers CSK [21], KCF, and the discriminative scale space tracker (DSST) [22]. The comparison results on the representative videos/frames are shown in Fig. 7. If a tracker is missing, no tracking bounding box is drawn in the corresponding frame. In particular, we focus on testing the datasets with tracking objects being pedestrians or vehicles because they are included in the detection categories of YOLOv3 (used in the long-term processing module of our tracker). It can be seen that RMCTer exhibits better robustness and accuracy compared with the aforementioned tracking algorithms. In particular, the bounding box of RMCTer tightly encloses the tracking object in all frames, while other trackers do not consistently resize the bounding boxes when the scale or shape of the object changes and even fail in several cases due to occlusion, out-of-view motion, and deformation of objects. For our real-world experiment, we collect two challenging datasets denoted as dataset_1 and dataset_2, where dataset_1 is collected in a bright environment and dataset_2 is collected in low light. In addition, both datasets also reflect other challenges in visual tracking including scale variation, occlusion, deformation, out-of-plane rotation, and out-of-view motion. For our evaluation, RMCTer is compared with state-of-the-art tracking algorithms, i.e., DSST, KCF, and CSK.
We also compare the proposed algorithm with the detection network YOLOv3 adopted in the long-term processing module, so as to further demonstrate the efficiency of the proposed tracker. Note that the above baseline trackers require manual initialization, i.e., a bounding box that tightly contains the object must be drawn by hand. In contrast, our method does not require this step thanks to the proposed automatic initialization mechanism. The experimental results are shown in Fig. 8. First, RMCTer exhibits much better robustness and accuracy than the other trackers. In particular, the proposed algorithm tracks the target accurately in all frames and successfully handles object disappearance (7th frame in dataset_1 and 5th frame in dataset_2), poor illumination (dataset_2), and occlusion (3rd frame in dataset_1 and 8th frame in dataset_2), whereas KCF, CSK, and DSST may fail to track the object under these conditions and cannot recover. Second, compared with the object detection network YOLOv3, RMCTer is computationally more efficient with comparable accuracy. Specifically, the average speed of RMCTer reaches approximately 60/63 FPS on dataset_1/dataset_2, while YOLOv3 achieves only 10/12 FPS under the same parameter configuration (both implemented in Python). In addition, Fig. 8(b) shows that YOLOv3, as a multi-target detection network, cannot distinguish the specific object of interest from other candidates belonging to the same category.

    Figure  7.  Tracking results for the proposed tracking algorithm in representative videos/frames on the public datasets. The illustrative examples are from sequences $ Bolt $, $ Car4 $, $ Woman $, $ Human6 $, $ Couple $, $ Jump $, and $ Human2 $.
    Figure  8.  Single pedestrian tracking in real environment. In dataset_1, the tracking object is the right pedestrian of the first frame; in dataset 2, the tracking target is the left pedestrian of the first frame.
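Tracker accuracy on benchmarks such as [47] is commonly scored by bounding-box overlap between the tracker output and the annotation. As a minimal sketch (the function names and the 0.5 threshold are our illustrative choices, not the paper's evaluation code):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes in (x, y, w, h) form."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # overlap extents along each axis (clamped at zero if disjoint)
    ix = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def success_rate(pred_boxes, gt_boxes, threshold=0.5):
    """Fraction of frames whose overlap with groundtruth exceeds threshold."""
    hits = sum(iou(p, g) > threshold for p, g in zip(pred_boxes, gt_boxes))
    return hits / len(gt_boxes)

print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # half-overlapping boxes
```

Sweeping the threshold from 0 to 1 and plotting the resulting success rates yields the standard success plot used to rank trackers on such benchmarks.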

    To evaluate the accuracy of the relative position estimation introduced in Section IV-A, we conduct two indoor experiments, in which the object is fixed and we manually fly the quadrotor to change the relative position between the object and the quadrotor in real time. Meanwhile, the OptiTrack motion capture system shown in Fig. 9 records the three-dimensional positions of the object and the quadrotor to obtain the groundtruth relative position. The experimental results are shown in Figs. 10 and 11: Fig. 10 compares the estimated and groundtruth relative positions between the nano-scale quadrotor and the concentric circles, and Fig. 11 shows the corresponding comparison for the quadrotor and the pedestrian. The estimates achieve good accuracy. In particular, for Fig. 10, the estimation errors are $ 2.52 $ cm, $ 1.17 $ cm, and $ 2.17 $ cm for the X, Y, and Z axes, respectively; for Fig. 11, the errors are $ 2.74 $ cm, $ 4.49 $ cm, and $ 2.53 $ cm.

    Figure  9.  The OptiTrack motion capture system.
    Figure  10.  Relative position estimation between the nano-scale quadrotor and the concentric circles.
    Figure  11.  Relative position estimation between the nano-scale quadrotor and the pedestrian.
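The per-axis errors reported above summarize the deviation between the estimated and motion-capture trajectories. A per-axis mean absolute error over synchronized samples is one natural way to compute such a summary; the helper below is a hypothetical sketch, not the paper's evaluation script:

```python
def mean_abs_error(est, gt):
    """Per-axis mean absolute error between two synchronized trajectories.

    est, gt: equal-length lists of (x, y, z) positions in meters.
    Returns a (ex, ey, ez) tuple of mean absolute errors.
    """
    n = len(gt)
    return tuple(
        sum(abs(e[k] - g[k]) for e, g in zip(est, gt)) / n
        for k in range(3)
    )

# Toy trajectories: two samples, estimate off by 2 m on x and z in one frame
est = [(1.0, 2.0, 3.0), (2.0, 3.0, 4.0)]
gt = [(1.0, 2.0, 3.0), (0.0, 3.0, 2.0)]
print(mean_abs_error(est, gt))  # -> (1.0, 0.0, 1.0)
```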

    To verify the proposed tracking controller (Section IV-B), an adaptive hovering control experiment is performed. The control objective is to keep the nano-scale quadrotor hovering at the origin even under a periodic disturbance. The desired horizontal position is set to $ x = 0 $ m, $ y = 0 $ m, and the desired altitude to $ z = 0.55 $ m. The external disturbance is applied in the vertical direction, with an amplitude of 0.1 N and a frequency of 1 Hz; we therefore mainly observe the altitude of the nano-scale UAV in this experiment. For comparison, triple cascaded PID controllers, consisting of an outer-loop position subsystem and inner-loop attitude and angular rate subsystems, are also implemented on the nano-scale quadrotor platform. The OptiTrack system records the altitude of the quadrotor. Fig. 12 compares the altitude variation under the proposed controller and the cascaded PID controllers. Both algorithms achieve hovering with bounded tracking errors; however, the hovering error with the proposed adaptive mechanism is much smaller than with the cascaded PID mechanism. In particular, the average error decreases from 4.24 cm to 1.69 cm, and the maximum error decreases from 9.44 cm to 5.05 cm. This indicates that the proposed adaptive control design is robust to the disturbance and achieves better control performance than the cascaded PID controller.

    Figure  12.  Real-world hovering experiment in the presence of periodic disturbance.
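The bounded-error hovering behavior under the 0.1 N, 1 Hz vertical disturbance can be reproduced qualitatively in a toy simulation. The sketch below uses a single-loop altitude PID on a point mass, not the paper's triple cascade or the adaptive backstepping controller; the gains, time step, and 38 g mass (taken from the platform weight) are illustrative assumptions.

```python
import math

def simulate_hover(kp=8.0, ki=2.0, kd=4.0, m=0.038, g=9.81,
                   z_des=0.55, dt=0.002, t_end=10.0):
    """Simulate vertical hover of a point mass under a 0.1 N, 1 Hz
    sinusoidal disturbance; return the max altitude error after transients."""
    z, vz, integ = z_des, 0.0, 0.0  # start at the desired altitude
    max_err = 0.0
    for k in range(int(t_end / dt)):
        t = k * dt
        e = z_des - z
        integ += e * dt
        # PID thrust command with gravity feedforward
        u = m * g + kp * e + ki * integ - kd * vz
        d = 0.1 * math.sin(2.0 * math.pi * 1.0 * t)  # periodic disturbance
        az = (u + d) / m - g
        vz += az * dt
        z += vz * dt
        if t > 2.0:  # record error only after start-up transients settle
            max_err = max(max_err, abs(e))
    return max_err

print(f"max steady-state altitude error: {simulate_hover():.4f} m")
```

Even this simple loop keeps the altitude error bounded under the periodic disturbance, which is the qualitative behavior both controllers in Fig. 12 exhibit; the paper's contribution is that the adaptive design shrinks that bound substantially.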

    To demonstrate the complete tracking system, real-world object tracking flight tests are performed. In the experiment, the tracking object moves randomly in three-dimensional space, and the disturbance is the same as in the hovering control test (Section V-C). The three-dimensional positions of the object and the quadrotor are recorded by the OptiTrack system. Figs. 13 and 14 show the tracking performance for concentric circle tracking and single pedestrian tracking, respectively. The trajectories of the nano-scale quadrotor are consistent with those of the objects. In particular, in Fig. 13, the average tracking errors are 6.78 cm, 8.65 cm, and 5.19 cm for the $ X $, $ Y $, and $ Z $ axes, respectively, and in Fig. 14, they are 4.84 cm, 9.48 cm, and 7.06 cm. The results indicate that the proposed visual object tracking algorithm and position-based visual servoing control algorithm are effective and can be implemented on the developed nano-scale quadrotor platform to achieve stable, accurate, and real-time object tracking in the presence of disturbances.

    Figure  13.  Real-world flight experiment of concentric circle tracking in the presence of disturbance.
    Figure  14.  Real-world flight experiment of single pedestrian tracking in the presence of disturbance.

    In this paper, we propose a monocular vision-based object tracking and servoing control system using a tailor-made nano-scale UAV platform. To guarantee tracking robustness and accuracy, we propose a novel visual object tracking algorithm, RMCTer, which tightly integrates a powerful short-term tracking module and an efficient long-term processing module. Moreover, an adaptive PBVS control method is proposed to achieve stable and accurate object tracking in the presence of internal and external disturbances. The experimental results demonstrate that each component of the proposed system is effective and that the whole tracking system achieves high accuracy, strong stability, and low latency. An interesting direction for future work is the realization of a lightweight and robust state estimation algorithm (e.g., visual-inertial odometry or simultaneous localization and mapping) on the nano-scale UAV platform, so that stable object tracking can be achieved in more complex and challenging environments.

  • 1 https://www.dji.com/cn
    2 https://wiki.bitcraze.io/
    3 In Algorithm 1, we test the sub profiles.
    4 http://www.optitrack.com/
  • [1]
    K. Fregene, “Unmanned aerial vehicles and control: Lockheed martin advanced technology laboratories,” IEEE Control Syst. Mag., vol. 32, no. 5, pp. 32–34, Oct. 2012. doi: 10.1109/MCS.2012.2205474
    [2]
    X. Zhang, B. Xian, B. Zhao, and Y. Zhang, “Autonomous flight control of a nano quadrotor helicopter in a GPS-denied environment using on-board vision,” IEEE Trans. Ind. Electron., vol. 62, no. 10, pp. 6392–6403, Oct. 2015. doi: 10.1109/TIE.2015.2420036
    [3]
    G. W. Cai, J. Dias, and L. Seneviratne, “A survey of small-scale unmanned aerial vehicles: Recent advances and future development trend,” Unmanned Syst., vol. 2, no. 2, pp. 175–199, Apr. 2014. doi: 10.1142/S2301385014300017
    [4]
    V. Srisamosorn, N. Kuwahara, A. Yamashita, T. Ogata, and J. Ota, “Human-tracking system using quadrotors and multiple environmental cameras for face-tracking application,” Int. J. Adv. Robot. Syst., vol. 14, no. 5, pp. 1–18, Oct. 2017.
    [5]
    A. Briod, J. C. Zufferey, and D. Floreano, “Optic-flow based control of a 46 g quadrotor,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Tokyo, Japan, 2013, pp. 149–158.
    [6]
    D. Palossi, J. Singh, M. Magno, and L. Benini, “Target following on nano-scale unmanned aerial vehicles,” in Proc. 7th IEEE Int. Workshop on Advances in Sensors and Interfaces, Vieste, Italy, 2017, pp. 170–175.
    [7]
    D. Floreano and R. J. Wood, “Science, technology and the future of small autonomous drones,” Nature, vol. 521, no. 7553, pp. 460–466, May 2015. doi: 10.1038/nature14542
    [8]
    P. Serra, R. Cunha, T. Hamel, D. Cabecinhas, and C. Silvestre, “Landing of a quadrotor on a moving target using dynamic image-based visual servo control,” IEEE Trans. Robot., vol. 32, no. 6, pp. 1524–1535, Dec. 2016. doi: 10.1109/TRO.2016.2604495
    [9]
    H. Cheng, L. S. Lin, Z. Q. Zheng, Y. W. Guan, and Z. C. Liu, “An autonomous vision-based target tracking system for rotorcraft unmanned aerial vehicles,” in Proc. IEEE/RSJ Int. Conf. Intelligent Robots and Systems, Vancouver, Canada, 2017, pp. 1732–1738.
    [10]
    A. Rodriguez-Ramos, A. Alvarez-Fernandez, H. Bavle, P. Campoy, and J. P. How, “Vision-based multirotor following using synthetic learning techniques,” Sensors, vol. 19, no. 21, pp. 4794, Nov. 2019. doi: 10.3390/s19214794
    [11]
    Y. J. Yin, X. G. Wang, D. Xu, F. F. Liu, Y. L. Wang, and W. Q. Wu, “Robust visual detection-learning-tracking framework for autonomous aerial refueling of UAVs,” IEEE Trans. Instrum. Meas., vol. 65, no. 3, pp. 510–521, Mar. 2016. doi: 10.1109/TIM.2015.2509318
    [12]
    P. Campoy, J. F. Correa, I. Mondragón, C. Martínez, M. Olivares, L. Mejías, and J. Artieda, “Computer vision onboard UAVs for civilian tasks,” J. Intell. Robot. Syst., vol. 54, no. 1–3, pp. 105–134, 2009. doi: 10.1007/s10846-008-9256-z
    [13]
    J. Pestana, J. L. Sanchez-Lopez, S. Saripalli, and P. Campoy, “Computer vision based general object following for GPS-denied multirotor unmanned vehicles,” in Proc. American Control Conf., Portland, USA, 2014, pp. 1886–1891.
    [14]
    R. Li, M. Pang, C. Zhao, G. Y. Zhou, and L. Fang, “Monocular long-term target following on UAVs,” in Proc. IEEE Conf. Computer Vision and Pattern Recognition Workshops, Las Vegas, USA, 2016, pp. 29–37.
    [15]
    Z. Kalal, K. Mikolajczyk, and J. Matas, “Tracking-learning-detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 7, pp. 1409–1422, Jul. 2012. doi: 10.1109/TPAMI.2011.239
    [16]
    C. L. Zitnick and P. Dollár, “Edge boxes: Locating object proposals from edges,” in Proc. 13th European Conf. Computer Vision, Zurich, Switzerland, 2014, pp. 391–405.
    [17]
    B. Babenko, M. H. Yang, and S. Belongie, “Robust object tracking with online multiple instance learning,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 8, pp. 1619–1632, Aug. 2011. doi: 10.1109/TPAMI.2010.226
    [18]
    S. Hare, S. Golodetz, A. Saffari, V. Vineet, M. M. Cheng, S. L. Hicks, and P. H. S. Torr, “Struck: Structured output tracking with kernels,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 10, pp. 2096–2109, Oct. 2016. doi: 10.1109/TPAMI.2015.2509974
    [19]
    D. S. Bolme, J. R. Beveridge, B. A. Draper, and Y. M. Lui, “Visual object tracking using adaptive correlation filters,” in Proc. IEEE Computer Society Conf. Computer Vision and Pattern Recognition, San Francisco, USA, 2010, pp. 2544–2550.
    [20]
    J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, “High-speed tracking with kernelized correlation filters,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 3, pp. 583–596, Mar. 2015. doi: 10.1109/TPAMI.2014.2345390
    [21]
    J. F. Henriques, R. Caseiro, P. Martins, and J. Batista, “Exploiting the circulant structure of tracking-by-detection with kernels,” in Proc. 12th European Conf. Computer Vision, Florence, Italy, 2012, pp. 702–715.
    [22]
    M. Danelljan, G. Häger, F. S. Khan, and M. Felsberg, “Discriminative scale space tracking,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 8, pp. 1561–1575, Aug. 2017. doi: 10.1109/TPAMI.2016.2609928
    [23]
    J. Redmon and A. Farhadi, “YOLOv3: An incremental improvement,” arXiv preprint arXiv:1804.02767, 2018.
    [24]
    K. M. He, G. Gkioxari, P. Dollár, and R. Girshick, “Mask R-CNN,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 42, no. 2, pp. 386–397, Feb. 2020. doi: 10.1109/TPAMI.2018.2844175
    [25]
    D. L. Zheng, H. S. Wang, W. D. Chen, and Y. Wang, “Planning and tracking in image space for image-based visual servoing of a quadrotor,” IEEE Trans. Ind. Electron., vol. 65, no. 4, pp. 3376–3385, Apr. 2018. doi: 10.1109/TIE.2017.2752124
    [26]
    F. Chaumette and S. Hutchinson, “Visual servo control. I. Basic approaches,” IEEE Robot. & Autom. Mag., vol. 13, no. 4, pp. 82–90, Dec. 2006.
    [27]
    N. Guenard, T. Hamel, and R. Mahony, “A practical visual servo control for an unmanned aerial vehicle,” IEEE Trans. Robot., vol. 24, no. 2, pp. 331–340, Apr. 2008. doi: 10.1109/TRO.2008.916666
    [28]
    M. G. Popova and H. H. Liu, “Position-based visual servoing for target tracking by a quadrotor UAV,” in Proc. AIAA Guidance, Navigation, and Control Conf., San Diego, USA, 2016, pp. 2092–2103.
    [29]
    W. B. Zhao, H. Liu, F. L. Lewis, K. P. Valavanis, and X. L. Wang, “Robust visual servoing control for ground target tracking of quadrotors,” IEEE Trans. Control Syst. Technol., vol. 28, no. 5, pp. 1980–1987, Sept. 2020. doi: 10.1109/TCST.2019.2922159
    [30]
    F. Rinaldi, S. Chiesa, and F. Quagliotti, “Linear quadratic control for quadrotors UAVs dynamics and formation flight,” J. Intell. & Robot. Syst., vol. 70, no. 1–4, pp. 203–220, Apr. 2013.
    [31]
    D. Mellinger, M. Shomin, and V. Kumar, “Control of quadrotors for robust perching and landing,” in Proc. Int. Powered Lift Conf., Philadelphia, USA, 2010, pp. 205–225.
    [32]
    Y. Z. Liu and Z. Y. Meng, “Visual object tracking for a nano-scale quadrotor,” in Proc. 15th Int. Conf. Control, Automation, Robotics and Vision, Singapore, 2018, pp. 843–847.
    [33]
    T. Yang, N. Sun, H. Chen, and Y. C. Fang, “Neural network-based adaptive antiswing control of an underactuated ship-mounted crane with roll motions and input dead zones,” IEEE Trans. Neural Netw. Learn. Syst., vol. 31, no. 3, pp. 901–914, Mar. 2020. doi: 10.1109/TNNLS.2019.2910580
    [34]
    N. Sun, D. K. Liang, Y. M. Wu, Y. H. Chen, Y. D. Qin, and Y. C. Fang, “Adaptive control for pneumatic artificial muscle systems with parametric uncertainties and unidirectional input constraints,” IEEE Trans. Ind. Informatics, vol. 16, no. 2, pp. 969–979, Feb. 2020. doi: 10.1109/TII.2019.2923715
    [35]
    B. Zhao, B. Xian, Y. Zhang, and X. Zhang, “Nonlinear robust adaptive tracking control of a quadrotor UAV via immersion and invariance methodology,” IEEE Trans. Ind. Electron., vol. 62, no. 5, pp. 2891–2902, May 2015. doi: 10.1109/TIE.2014.2364982
    [36]
    Y. Zou and W. Huo, “Adaptive tracking control for a model helicopter with disturbances,” in Proc. American Control Conf., Chicago, USA, 2015, pp. 3824–3829.
    [37]
    P. Marantos, C. P. Bechlioulis, and K. J. Kyriakopoulos, “Robust trajectory tracking control for small-scale unmanned helicopters with model uncertainties,” IEEE Trans. Control Syst. Technol., vol. 25, no. 6, pp. 2010–2021, Nov. 2017. doi: 10.1109/TCST.2016.2642160
    [38]
    M. W. Mueller, M. Hamer, and R. D’Andrea, “Fusing ultra-wideband range measurements with accelerometers and rate gyroscopes for quadrocopter state estimation,” in Proc. IEEE Int. Conf. Robotics & Automation, Seattle, USA, 2015, pp. 1730–1736.
    [39]
    M. Greiff, “Modelling and control of the crazyflie quadrotor for aggressive and autonomous flight by optical flow driven state estimation,” M.S. thesis, Lund University, Sweden, 2017.
    [40]
    Y. J. Li, “The design and implementation of four rotor UAV fixed landing system based on machine vision,” M.S. thesis, South China University of Technology, Guangzhou, China, 2015.
    [41]
    R. Mahony, V. Kumar, and P. Corke, “Multirotor aerial vehicles: Modeling, estimation, and control of quadrotor,” IEEE Robot. & Autom. Mag., vol. 19, no. 3, pp. 20–32, Sept. 2012.
    [42]
    Y. Zou, “Trajectory tracking controller for quadrotors without velocity and angular velocity measurements,” IET Control Theory & Appl., vol. 11, no. 1, pp. 101–109, Jan. 2017.
    [43]
    M. Huang, B. Xian, C. Diao, K. Y. Yang, and Y. Feng, “Adaptive tracking control of underactuated quadrotor unmanned aerial vehicles via backstepping,” in Proc. American Control Conf., Baltimore, USA, 2010, pp. 2076–2081.
    [44]
    B. Zhu and W. Huo, “Robust nonlinear control for a model-scaled helicopter with parameter uncertainties,” Nonlinear Dyn., vol. 73, no. 1–2, pp. 1139–1154, Jul. 2013. doi: 10.1007/s11071-013-0858-z
    [45]
    M. Krstić, I. Kanellakopoulos, and P. Kokotović, Nonlinear and Adaptive Control Design. New York, NY, USA: Wiley, 1995.
    [46]
    A. Levant, “Higher-order sliding modes, differentiation and output-feedback control,” Int. J. Control, vol. 76, no. 9–10, pp. 924–941, 2003. doi: 10.1080/0020717031000099029
    [47]
    Y. Wu, J. Lim, and M. H. Yang, “Object tracking benchmark,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1834–1848, Sept. 2015. doi: 10.1109/TPAMI.2014.2388226

    Highlights

    • This paper proposes a complete visual object tracking and servoing control system using a tailor-made 38 g nano-scale quadrotor platform. The tracking system is composed of a versatile and robust visual object tracking module and an efficient PBVS control module. Due to the limited payload, a lightweight monocular visual module is integrated to equip the quadrotor with object tracking capability. Additionally, we present a micro positioning deck that provides more stable and accurate pose estimation for the quadrotor.
    • This paper proposes a novel object tracking algorithm, i.e., RMCTer, in which a two-stage short-term tracking module and an efficient long-term processing module are tightly integrated to collaboratively process the input frames. Compared with tracking algorithms such as Struck, DSST, and KCF, the proposed tracker is more applicable in the presence of object appearance variations and can effectively compensate for visual tracking errors thanks to the model modification provided by the long-term processing module.
    • This paper proposes an adaptive PBVS control algorithm by leveraging backstepping and adaptation techniques. The proposed controller is robust against uncertain model parameters and external disturbances, and their exact values are not needed in the controller design.
