IEEE/CAA Journal of Automatica Sinica, 2014, Vol. 1, Issue (4): 435-448
Reinforcement Learning Based Controller Synthesis for Flexible Aircraft Wings
Manoj Kumar, Karthikeyan Rajagopal, Sivasubramanya Nadar Balakrishnan, Nhan T. Nguyen    
1. Missouri University of Science & Technology, Rolla, Missouri 65409, USA;
2. NASA Ames Research Center, Moffett Field, CA 94035, USA
Abstract: Aeroelastic study of flight vehicles has been a subject of great interest and research in the last several years. Aileron reversal and flutter related problems are due in part to the elasticity of a typical airplane. Structural dynamics of an aircraft wing due to its aeroelastic nature are characterized by partial differential equations. Controller design for these systems is much more complex than for lumped parameter systems defined by ordinary differential equations. In this paper, a stabilizing state-feedback controller design approach is presented for the heave dynamics of a wing-fuselage model. In this study, a continuous actuator in the spatial domain is assumed. A control methodology is developed by combining the technique of "proper orthogonal decomposition" and approximate dynamic programming. The proper orthogonal decomposition technique is used to obtain a low-order nonlinear lumped parameter model of the infinite dimensional system. Then a near optimal controller is designed using the single-network-adaptive-critic technique. Furthermore, to add robustness to the nominal single-network-adaptive-critic controller against matched uncertainties, an identifier based adaptive controller is proposed. Simulation results demonstrate the effectiveness of the single-network-adaptive-critic controller augmented with the adaptive controller for infinite dimensional systems.
Key words: single-network-adaptive-critic, proper orthogonal decomposition, partial differential equation, adaptive critic, uncertainty, adaptive control, flexible wing
Ⅰ. INTRODUCTION

MANY real-world engineering problems are distributed in nature and can be described by a set of partial differential equations (PDEs). A lumped parameter approximation is often adequate in many cases, such as in the study of stability and control of airplanes, rockets, and missiles, and in the design of cooling systems. There is a broad class of problems, however (e.g., fluid flow, aeroelastic structures), for which one must take the spatial distribution into account. These systems are also characterized as distributed parameter systems (DPS). Analysis and controller design for such systems are often far more complex than for lumped parameter systems (which are governed by a finite number of ordinary differential equations).

Control of distributed parameter systems has been studied in the past from both mathematical and engineering points of view. An interesting but brief historical perspective of the control of such systems can be found in [1]. In a broad sense, control design techniques for distributed parameter systems can be classified as either "approximate-then-design (ATD)" or "design-then-approximate (DTA)" approaches. An interested reader can refer to [2] for a discussion on the relative merits and limitations of the two approaches. In the DTA approach, the usual procedure is to use infinite-dimensional operator theory to design the control in the infinite-dimensional space[3]. While there are many advantages, these approaches are mainly confined to linear systems and have difficulties in control implementation through an infinite dimensional operator. To overcome this difficulty in an interesting way, the study in [4] presents two techniques of control design based on the DTA approach. One combines the method of dynamic inversion with variational optimization theory and is applicable when there is a continuous actuator in the spatial domain. The other technique, which can be applied when there are a number of discrete actuators located at distinct places in the spatial domain, combines dynamic inversion with static optimization theory.

In the ATD approach, the idea is to first design a low-dimensional reduced model which retains the dominant modes of the system. A popular DPS analysis and synthesis technique is to use orthogonal basis functions in a Galerkin procedure to first create an approximate finite-dimensional system of ordinary differential equations (ODEs). This lumped parameter model is then used for controller design. If arbitrary basis functions (e.g., Fourier and Chebyshev polynomials) are used in the Galerkin procedure, they can result in a high-dimensional ODE system. A better and more powerful basis function design is obtained when the proper orthogonal decomposition (POD) technique is used with a Galerkin approximation. In the POD technique, a set of problem oriented basis functions is first obtained by generating a set of snapshot solutions through simulations or from the actual process. Using these orthogonal basis functions in a Galerkin procedure, a low-dimensional ODE system can be developed. The same idea is used in this study to obtain a lumped parameter model. Interested readers can further refer to [5, 6, 7, 8].

After obtaining a lumped parameter model, the issue of optimal control synthesis should be addressed next. It is well known that the dynamic programming formulation offers the most comprehensive solution approach to nonlinear optimal control in a state feedback form[9, 10]. However, solving the associated Hamilton-Jacobi-Bellman (HJB) equation demands a large number of computations and storage space dedicated to this purpose. Werbos[11] proposed a means to get around this numerical complexity by using "approximate dynamic programming" (ADP) formulations. This ADP process, through the nonlinear function approximation capabilities of neural networks, overcomes the computational complexity that plagued the dynamic programming formulation of optimal control problems. More importantly, this solution can be implemented online, since the control computation requires only a few multiplications of the network weights, which are trained offline. This technique was used in [12] to solve an aircraft control problem. The solution to the ADP formulation is obtained through a dual neural network approach called "adaptive critics (AC)". In one version of the AC approach, called heuristic dynamic programming (HDP), one network, namely the action network, represents the mapping between the state and control variables, while a second network, called the critic network, represents the mapping between the state and the cost function to be minimized. In the dual heuristic programming (DHP) class of ACs, while the action network remains the same as in HDP, the critic network outputs the costates with the current states as inputs. The single-network-adaptive-critic (SNAC)[13] was developed to eliminate the need for the second network. Advantages of SNAC include a considerable decrease in the offline training effort and simplicity of the online implementation, which results in less computational resources and storage. In [14], SNAC was implemented on a distributed parameter model for beaver population control. Another application was presented in [15], in which a robust optimal controller was designed to obtain a desired temperature profile of a re-entry vehicle. A heat diffusion problem was solved in [16].

In this paper, the POD technique and approximate dynamic programming are combined to develop a generic control design approach for a class of nonlinear DPS. A system with continuous control actuation is studied in terms of performance analysis. This approach is applied to control the heave dynamics of an aircraft's flexible wing model. This application comes under the field of aeroelastic studies, which deals with the interaction of structural, inertial and aerodynamic forces. The current model is inspired by a "beam-mass-beam (BMB)" system[17, 18], where two Euler-Bernoulli beams connected to a rigid mass represent a flexible wing with fuselage. A schematic of the BMB system is shown in Fig. 1.

Fig. 1. Aircraft flexible wing model (BMB system).

In real-world applications, the physical parameters used in the PDE models are often unknown. Online adaptive controllers can be used to estimate these parametric uncertainties and mitigate the loss in performance. Although the theory of adaptive controllers for finite dimensional systems is well developed[19, 20], for PDEs adaptive controllers exist only for a few classes of problems. Most of the available adaptive control schemes are for parabolic type PDEs. See [21, 22, 23] for some of the adaptive control techniques proposed for linear parabolic systems. In general, the techniques for adaptive control of PDEs can be classified as reference model based or identifier based. The objective of model reference adaptive control techniques is to determine a feedback control law which directs the state to follow a reference model asymptotically. On the other hand, identifier based techniques estimate the unknown system parameters using an approximate model of the system[24, 25], and use them in the design of adaptive controllers. In this paper, an identifier based adaptive control technique is proposed which can handle matched parametric uncertainties. The structure of the PDEs governing the dynamics of the flexible aircraft wings lends itself to the use of a parabolic type passive identifier for estimating the uncertainties. The proposed identifier assumes that all system states are measured. Simulations are carried out by assuming parametric uncertainties which can make the inherently stable BMB system unstable.

The rest of the paper is organized as follows: Section II describes the class of DPS problems and the control objective. Section III discusses the process of obtaining a finite-dimensional approximate model using the POD technique. Section IV briefly describes the optimal control synthesis using SNAC. Section V discusses the aircraft flexible wing model. In Section VI, the theory behind augmenting the SNAC controller with an adaptive controller is discussed. Simulation results are presented in Section VII. Conclusions are drawn in Section VIII.

Ⅱ. PROBLEM DESCRIPTION

The class of PDEs considered in this paper describing second-order system dynamics with appropriate boundary conditions is given by

$$\dot{x}_1 = f_1(x, x^{(1)}, x^{(2)}, \ldots),\qquad \dot{x}_2 = f_2(x, x^{(1)}, x^{(2)}, \ldots) + g(x, x^{(1)}, x^{(2)}, \ldots)\,u,$$
(1)
where the state $x = [x_1(t,y),\; x_2(t,y)]^{\rm T}$ and control $u(t,y)$ are continuous functions of time $t \ge 0$ and the spatial variable $y \in [0, l]$, with $l \in \mathbb{R}^{+}$ as the length. In (1), $\dot{x}_i$ represents the partial derivative with respect to time, $\partial x_i/\partial t$, and $x^{(i)}$ represents the spatial partial derivatives, $x^{(i)} = [\partial^i x_1/\partial y^i(t,y),\; \partial^i x_2/\partial y^i(t,y)]^{\rm T}$.
Here $f_1$ and $f_2$ are nonlinear functions of the state and the spatial partial derivatives of the state. The control $u(t,y)$ is considered to be a scalar function. Note that the system dynamics is described in control-affine form. Furthermore, the function $g(x, x^{(1)}, x^{(2)}, \ldots)$ is assumed to be bounded away from zero, i.e., $g(x, x^{(1)}, x^{(2)}, \ldots) \neq 0$, $\forall\, t, y$. In this paper, situations where the control action enters the system dynamics through the boundary (i.e., boundary control problems) are not considered.

The objective is to find the optimal control u(t,y) which minimizes the quadratic cost function,defined as

$$J = \frac{1}{2}\int_{t_0}^{t_f}\!\int_{y_0}^{y_f}\left(x^{\rm T} q\, x + r u^2\right){\rm d}y\,{\rm d}t,$$
(2)
where $q$ is the weighting matrix on the state variables and $r$ is the weighting factor on the control variable. $t_0$ and $t_f$ are the initial and final times, while $y_0$ and $y_f$ are the initial and final points on the spatial coordinate axis.

Ⅲ. LUMPED PARAMETER APPROXIMATION

This section discusses the development of a low order finite dimensional model for control synthesis.

A. Generating Snapshot Solutions

Snapshot solution generation is one of the important steps in generating problem oriented basis functions. A low-order lumped parameter approximate model is obtained using these basis functions by applying the Galerkin procedure. The first step is to generate the snapshot solutions. These are representative solutions of the states at arbitrary instants of time that one may possibly encounter when the system operates. Intuition must be used to determine at what times the snapshots must be taken in order to obtain linearly independent basis functions[16].

B. Proper Orthogonal Decomposition (POD): a Brief Review

POD is a technique for determining an optimal set of basis functions which results in a low order finite dimensional model for an infinite dimensional system when used in a Galerkin procedure. The set of snapshot solutions obtained by the method described in Section III-A is used to generate the basis functions.

Let $\{x^i(y): 1 \le i \le N,\; y\in[0,l]\}$ be a set of $N$ snapshot solutions of a system over the domain $[0,l]$ at arbitrary instants of time. The goal of the POD technique is to design a coherent structure that has the largest mean square projection on the snapshots, i.e., to find a basis function $\varphi$ such that $I$ is maximized.

$$I = \frac{1}{N}\sum_{i=1}^{N}\frac{\langle x^i, \varphi\rangle^2}{\langle\varphi, \varphi\rangle}$$
(3)
As a standard notation, the $L_2$ inner product over the domain $[0,l]$ is defined as $\langle\varphi, \Psi\rangle = \int_0^l \varphi\,\Psi\,{\rm d}y$. It has been shown[26] that when the number of degrees of freedom required to describe $x^i$ is larger than the number of snapshots $N$, it is sufficient to express the basis functions as linear combinations of the snapshots,
$$\varphi = \sum_{i=1}^{N} p_i\, x^i,$$
(4)
here the coefficients $p_i$ are to be determined such that $\varphi$ maximizes $I$. The steps involved are as follows:

1) Construct an eigenvalue problem

$$CP = \lambda P,$$
(5)
where $C = [c_{ij}]_{N\times N}$, $c_{ij} = \frac{1}{N}\int_0^l x^i(y)\,x^j(y)\,{\rm d}y$.

2) Obtain N eigenvalues and corresponding eigenvectors of C. Note that C is symmetric and hence the eigenvalues are real. Moreover,it can be shown [27] that it is a positive semi-definite matrix as well. So,all of the eigenvalues are non-negative.

3) Arrange the eigenvalues $[\bar\lambda_1, \bar\lambda_2, \ldots, \bar\lambda_N]$ of matrix $C$ in descending order such that $\bar\lambda_1 \ge \bar\lambda_2 \ge \cdots \ge \bar\lambda_N \ge 0$.

4) Find the corresponding eigenvectors; let $P^1 = [p_{11}, p_{12}, \ldots, p_{1N}]^{\rm T}$, $P^2 = [p_{21}, p_{22}, \ldots, p_{2N}]^{\rm T}$, $\ldots$, $P^N = [p_{N1}, p_{N2}, \ldots, p_{NN}]^{\rm T}$.

5) Normalize the eigenvectors to satisfy $\langle P^l, P^l\rangle = (P^l)^{\rm T}P^l = \frac{1}{N\bar\lambda_l}$.

This will ensure that the POD basis functions are orthonormal.

6) Truncate the eigen-spectrum such that $\sum_{j=1}^{\tilde N}\bar\lambda_j \approx \sum_{j=1}^{N}\bar\lambda_j$.

Usually, it turns out that $\tilde N \ll N$.

7) Finally,construct the ˜N basis functions as

$$\varphi_j(y) = \sum_{i=1}^{N} p_{ji}\, x^i(y),\quad j = 1, 2, \ldots, \tilde N.$$
(6)

The method of snapshot generation and development of proper orthogonal decomposition based model reduction, as applicable to the present problem, are covered in detail in [28, 29]. For a detailed discussion of this procedure, an interested reader can refer to [8, 27, 30]. An important property of the POD technique is that the basis functions capture the greatest possible energy on average and therefore, the POD technique leads to an optimally reduced system in general. For a proof of this claim along with some other important properties, the reader can refer to [8].
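To make the seven-step procedure above concrete, the following NumPy sketch computes a snapshot-based POD basis under the stated assumptions. The function name, the trapezoidal inner product, and the direct unit-norm scaling of each $\varphi_j$ (which plays the role of Step 5) are illustrative choices rather than the authors' implementation.

```python
import numpy as np

def pod_basis(snapshots, dy, energy_fraction=0.99):
    """Snapshot POD (Steps 1)-7) above).

    snapshots       : (N, n_y) array; row i holds the snapshot x^i(y) on a uniform grid
    dy              : grid spacing used in the L2 inner product <f, g> = int f g dy
    energy_fraction : fraction of the eigenvalue sum retained by the truncation (Step 6)
    Returns a (n_y, N_tilde) array whose columns are orthonormal basis functions phi_j(y).
    """
    N = snapshots.shape[0]
    inner = lambda f, g: np.trapz(f * g, dx=dy)
    # Step 1): correlation matrix c_ij = (1/N) <x^i, x^j>
    C = np.array([[inner(snapshots[i], snapshots[j]) for j in range(N)]
                  for i in range(N)]) / N
    # Steps 2)-4): C is symmetric positive semi-definite, so eigh applies;
    # sort eigenvalues and eigenvectors in descending order
    lam, P = np.linalg.eigh(C)
    idx = np.argsort(lam)[::-1]
    lam, P = lam[idx], P[:, idx]
    # Step 6): keep just enough eigenvalues to capture the requested energy
    N_tilde = int(np.searchsorted(np.cumsum(lam) / np.sum(lam), energy_fraction)) + 1
    # Steps 5) and 7): phi_j = sum_i p_ji x^i, scaled to unit L2 norm
    basis = np.zeros((snapshots.shape[1], N_tilde))
    for j in range(N_tilde):
        phi = snapshots.T @ P[:, j]
        basis[:, j] = phi / np.sqrt(inner(phi, phi))
    return basis
```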

C. Lumped Parameter System

The reduction of the infinite-dimensional problem to a finite set of ordinary differential equations and the related cost function is explained in this subsection. After obtaining the basis functions, the state $x(t,y)$ and control $u(t,y)$ are expanded as follows: $x(t,y) \approx \sum_{j=1}^{\tilde N}\hat x_j(t)\varphi_j(y)$,

$u(t,y) \approx \sum_{j=1}^{\tilde N}\hat u_j(t)\varphi_j(y)$,
here $\hat x_j(t)$ and $\hat u_j(t)$ ($j = 1, 2, \ldots, \tilde N$) are the auxiliary state and control variables of the lumped parameter model. One can notice that both $x(t,y)$ and $u(t,y)$ have the same basis functions. We assume that the state feedback controller spans a subspace of the state variables and hence, the basis functions for the state are assumed to be capable of spanning the controller as well.

For our control problem, the basis functions for states $x_1$ and $x_2$ are designed separately. The state variables $x_1(t,y)$ and $x_2(t,y)$ can be written in terms of the basis functions $\varphi_{1j}(y)$ and $\varphi_{2j}(y)$, $j = 1, 2, \ldots, \tilde N$, respectively as

$$x_1(t,y) = \sum_{j=1}^{\tilde N}\hat x_{1j}(t)\varphi_{1j}(y),\qquad x_2(t,y) = \sum_{j=1}^{\tilde N}\hat x_{2j}(t)\varphi_{2j}(y).$$
(7)
Similarly, the expression for the control is written as
$$u(t,y) = \sum_{j=1}^{\tilde N}\hat u_{1j}(t)\varphi_{1j}(y) + \sum_{j=1}^{\tilde N}\hat u_{2j}(t)\varphi_{2j}(y).$$
(8)

For convenience, we define $\hat X_1 = [\hat x_{11}, \ldots, \hat x_{1\tilde N}]^{\rm T}$, $\hat X_2 = [\hat x_{21}, \ldots, \hat x_{2\tilde N}]^{\rm T}$, $\hat U_1 = [\hat u_{11}, \ldots, \hat u_{1\tilde N}]^{\rm T}$, $\hat U_2 = [\hat u_{21}, \ldots, \hat u_{2\tilde N}]^{\rm T}$. Next, the finite dimensional approximation of the system is obtained using the Galerkin procedure. Substituting the expansions of the state (7) and control (8) in (1),

$$\sum_{i=1}^{\tilde N}\dot{\hat x}_{1i}(t)\varphi_{1i}(y) = f_1\!\left(\left[\sum_{i=1}^{\tilde N}\hat x_{1i}(t)\varphi_{1i}(y),\; \sum_{i=1}^{\tilde N}\hat x_{2i}(t)\varphi_{2i}(y)\right]^{\rm T}\!,\; \left[\sum_{i=1}^{\tilde N}\hat x_{1i}(t)\varphi^{(1)}_{1i}(y),\; \sum_{i=1}^{\tilde N}\hat x_{2i}(t)\varphi^{(1)}_{2i}(y)\right]^{\rm T}\!,\; \ldots\right),$$
(9)
and
$$\begin{aligned}\sum_{i=1}^{\tilde N}\dot{\hat x}_{2i}(t)\varphi_{2i}(y) ={}& f_2\!\left(\left[\sum_{i=1}^{\tilde N}\hat x_{1i}(t)\varphi_{1i}(y),\; \sum_{i=1}^{\tilde N}\hat x_{2i}(t)\varphi_{2i}(y)\right]^{\rm T}\!,\; \ldots\right)+{}\\ & g\!\left(\left[\sum_{i=1}^{\tilde N}\hat x_{1i}(t)\varphi_{1i}(y),\; \sum_{i=1}^{\tilde N}\hat x_{2i}(t)\varphi_{2i}(y)\right]^{\rm T}\!,\; \ldots\right)\times\left(\sum_{j=1}^{\tilde N}\hat u_{1j}(t)\varphi_{1j}(y) + \sum_{j=1}^{\tilde N}\hat u_{2j}(t)\varphi_{2j}(y)\right).\end{aligned}$$
(10)

Next, taking the Galerkin projection of (9) with $\varphi_{1j}(y)$ and of (10) with $\varphi_{2j}(y)$ and using the fact that these basis functions are orthonormal, it yields

$$\dot{\hat x}_{1j} = \hat f_{1j}(\hat X_1, \hat X_2),\quad j = 1, 2, \ldots, \tilde N,$$
(11)
and
$$\dot{\hat x}_{2j} = \hat f_{2j}(\hat X_1, \hat X_2) + \hat g_j(\hat X_1, \hat X_2)\,[\hat U_1^{\rm T}, \hat U_2^{\rm T}]^{\rm T},\quad j = 1, 2, \ldots, \tilde N,$$
(12)
where
$$\hat f_{1j}(\hat X_1, \hat X_2) = \langle f_1(x, x^{(1)}, x^{(2)}, \ldots),\, \varphi_{1j}\rangle = \int_0^{l} f_1(x, x^{(1)}, x^{(2)}, \ldots)\,\varphi_{1j}\,{\rm d}y,$$
(13)
$$\hat f_{2j}(\hat X_1, \hat X_2) = \langle f_2(x, x^{(1)}, x^{(2)}, \ldots),\, \varphi_{2j}\rangle = \int_0^{l} f_2(x, x^{(1)}, x^{(2)}, \ldots)\,\varphi_{2j}\,{\rm d}y,$$
(14)
$$\hat g_j(\hat X_1, \hat X_2)\,[\hat U_1^{\rm T}, \hat U_2^{\rm T}]^{\rm T} = \langle g(x, x^{(1)}, x^{(2)}, \ldots)\,u,\, \varphi_{2j}\rangle = \int_0^{l} g(x, x^{(1)}, x^{(2)}, \ldots)\,u\,\varphi_{2j}\,{\rm d}y.$$
(15)

Next,in the cost function (2),the terms can be represented as

$$\int_{y_0}^{y_f} x^{\rm T} q\, x\,{\rm d}y = \int_{y_0}^{y_f}\left[\sum_{i=1}^{\tilde N}\hat x_{1i}(t)\varphi_{1i}(y),\; \sum_{i=1}^{\tilde N}\hat x_{2i}(t)\varphi_{2i}(y)\right] q \left[\sum_{i=1}^{\tilde N}\hat x_{1i}(t)\varphi_{1i}(y),\; \sum_{i=1}^{\tilde N}\hat x_{2i}(t)\varphi_{2i}(y)\right]^{\rm T}{\rm d}y = [\hat X_1^{\rm T}, \hat X_2^{\rm T}]\, Q\, [\hat X_1^{\rm T}, \hat X_2^{\rm T}]^{\rm T},$$
(16)
and
$$\int_{y_0}^{y_f} r u^2\,{\rm d}y = \int_{y_0}^{y_f} r\left[\sum_{i=1}^{\tilde N}\hat u_{1i}(t)\varphi_{1i}(y) + \sum_{i=1}^{\tilde N}\hat u_{2i}(t)\varphi_{2i}(y)\right]^2{\rm d}y = [\hat U_1^{\rm T}, \hat U_2^{\rm T}]\, R\, [\hat U_1^{\rm T}, \hat U_2^{\rm T}]^{\rm T}.$$
(17)

Let $q = \begin{bmatrix} q_{11} & q_{12}\\ q_{21} & q_{22}\end{bmatrix}_{2\times 2}$, $Q = \begin{bmatrix} Q_{11} & Q_{12}\\ Q_{21} & Q_{22}\end{bmatrix}_{2\tilde N\times 2\tilde N}$, $R = r\begin{bmatrix} R_{11} & R_{12}\\ R_{21} & R_{22}\end{bmatrix}_{2\tilde N\times 2\tilde N}$,

where $R_{11}(i,j) = \int\varphi_{1i}(y)\varphi_{1j}(y)\,{\rm d}y$, $R_{12}(i,j) = \int\varphi_{1i}(y)\varphi_{2j}(y)\,{\rm d}y$, $R_{21}(i,j) = \int\varphi_{2i}(y)\varphi_{1j}(y)\,{\rm d}y$, $R_{22}(i,j) = \int\varphi_{2i}(y)\varphi_{2j}(y)\,{\rm d}y$, $Q_{11}(i,j) = q_{11}R_{11}(i,j)$, $Q_{12}(i,j) = q_{12}R_{12}(i,j)$, $Q_{21}(i,j) = q_{21}R_{21}(i,j)$, $Q_{22}(i,j) = q_{22}R_{22}(i,j)$,
with $i = 1, 2, \ldots, \tilde N$ and $j = 1, 2, \ldots, \tilde N$. Thus the cost function in (2) can be written as

$$J = \frac{1}{2}\int_{t_0}^{t_f}\left([\hat X_1^{\rm T}, \hat X_2^{\rm T}]\, Q\, [\hat X_1^{\rm T}, \hat X_2^{\rm T}]^{\rm T} + [\hat U_1^{\rm T}, \hat U_2^{\rm T}]\, R\, [\hat U_1^{\rm T}, \hat U_2^{\rm T}]^{\rm T}\right){\rm d}t.$$
(18)
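As an illustration of how (16)-(18) translate into matrices, the sketch below assembles $Q$ and $R$ from discretized basis functions with a trapezoidal approximation of the spatial integrals; the function and variable names are hypothetical.

```python
import numpy as np

def lumped_weights(phi1, phi2, q, r, dy):
    """Build Q and R of (16)-(18).

    phi1, phi2 : (n_y, N_tilde) arrays of basis functions for x1 and x2 on the grid
    q          : 2x2 state weighting matrix of (2);  r : scalar control weight
    """
    # Gram matrices of the basis functions: R_ab(i, j) = int phi_ai(y) phi_bj(y) dy
    gram = lambda A, B: np.trapz(A[:, :, None] * B[:, None, :], dx=dy, axis=0)
    R11, R12 = gram(phi1, phi1), gram(phi1, phi2)
    R21, R22 = gram(phi2, phi1), gram(phi2, phi2)
    R = r * np.block([[R11, R12], [R21, R22]])
    Q = np.block([[q[0, 0] * R11, q[0, 1] * R12],
                  [q[1, 0] * R21, q[1, 1] * R22]])
    return Q, R
```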

Ⅳ. OPTIMAL CONTROL FORMULATION

In this section,a fairly general discussion on optimal control of distributed parameter systems is presented in an ADP framework. Detailed derivations of these conditions can also be found in [11, 12],which are repeated here for the sake of completeness. Since the optimal controller is to be implemented online,the problem is formulated in discrete domain.

A. Problem Description and Optimality Conditions

Consider a system given by

$$\hat X_{k+1} = F_k(\hat X_k, \hat U_k),$$
(19)
where $\hat X_k$ and $\hat U_k$ represent the $n\times 1$ state vector and $m\times 1$ control vector, respectively, at time step $k$. The objective is to find a control $\hat U_k$ that minimizes the scalar cost function given as
$$J = \sum_{k=1}^{N-1}\Psi_k(\hat X_k, \hat U_k),$$
(20)
where $N$ represents the number of discrete time steps. Note that when $N$ is large, (20) represents the cost function for an infinite horizon problem. Following (20), the cost function at time step $k$ can be written as
$$J_k = \sum_{\bar k = k}^{N-1}\Psi_{\bar k}(\hat X_{\bar k}, \hat U_{\bar k}).$$
(21)
Splitting off the cost from step $k+1$ to the end, the cost function can be written as
$$J_k = \Psi_k + J_{k+1},$$
(22)
where $\Psi_k$ represents any linear or nonlinear utility function; it is the cost to go from $k$ to $k+1$. We now define the costate vector at time step $k$ as
$$\lambda_k := \frac{\partial J_k}{\partial \hat X_k}.$$
(23)

Then, the necessary condition of optimality is

$$\frac{\partial J_k}{\partial \hat U_k} = 0.$$
(24)

After some manipulations,the optimal control equation is obtained as

$$\left(\frac{\partial \Psi_k}{\partial \hat U_k}\right) + \left(\frac{\partial \hat X_{k+1}}{\partial \hat U_k}\right)^{\rm T}\lambda_{k+1} = 0.$$
(25)
The costate propagation equation is obtained as $\lambda_k = \frac{\partial J_k}{\partial \hat X_k} = \left(\frac{\partial \Psi_k}{\partial \hat X_k}\right) + \left(\frac{\partial J_{k+1}}{\partial \hat X_k}\right)$.

By using (23) in the equation above

$$\lambda_k = \left(\frac{\partial \Psi_k}{\partial \hat X_k}\right) + \left(\frac{\partial \hat X_{k+1}}{\partial \hat X_k}\right)^{\rm T}\lambda_{k+1}.$$
(26)
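For a quadratic utility $\Psi_k = \frac{1}{2}(\hat X_k^{\rm T}Q\hat X_k + \hat U_k^{\rm T}R\hat U_k)$ and control-affine discrete dynamics $\hat X_{k+1} = F(\hat X_k) + B\hat U_k$, conditions (25) and (26) reduce to closed-form expressions. The sketch below evaluates them; it is a specialization for illustration, not the general formulation.

```python
import numpy as np

def optimal_control(lam_next, B, R):
    """Equation (25) with Psi_k = 0.5*(X'QX + U'RU) and dX_{k+1}/dU_k = B:
    R U_k + B' lambda_{k+1} = 0  =>  U_k = -R^{-1} B' lambda_{k+1}."""
    return -np.linalg.solve(R, B.T @ lam_next)

def costate(x_k, lam_next, A_k, Q):
    """Equation (26) with dPsi_k/dX_k = Q X_k and state Jacobian A_k = dX_{k+1}/dX_k:
    lambda_k = Q X_k + A_k' lambda_{k+1}."""
    return Q @ x_k + A_k.T @ lam_next
```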

B. Single-network-adaptive-critic (SNAC) Controller Synthesis

The adaptive-critic synthesis for optimal controller design of a lumped parameter system is discussed in detail in [12]. The core of the technique lies in the iterative training between the critic and action networks. The action network captures the relationship between the states and control at stage $k$, and the critic network captures the relationship between the states and the costates at stage $k$. By contrast, the SNAC captures the relationship between the state at $k$ and the costate at $k+1$ and uses only one neural network. We use this approach to design the optimal control for the lumped parameter system.

In the present paper, the networks are split internally into $2\tilde N$ sub-networks, assuming one network for each channel of the costate vector. The input to each sub-network, however, is the entire state vector. This is done to speed up the training process, since cross coupling of weights for different components of the output vector is absent in such a framework. The first step of the training procedure is state generation for neural network training. Note that the lumped parameter states can be computed from $x(t,y)$ as $\hat x_{1j}(t) = \langle x_1(t,y), \varphi_{1j}(y)\rangle$

and $\hat x_{2j}(t) = \langle x_2(t,y), \varphi_{2j}(y)\rangle$,
where $j = 1, 2, \ldots, \tilde N$. One can generate a state profile $x(t,y)$ as described in Section III-A and then use it to construct the lumped parameter states, which can subsequently be used for training the networks. However, due to the requirement of a large number of state profiles, this process would slow down the program significantly. Therefore, an alternative method of generating the lumped parameter state vector for training the network is followed. All generated snapshots are used to get the minimum and maximum values of the individual elements of the state vector. Training is first carried out by generating states close to zero, and then the training set is expanded to include more data points. Let $[\hat X_{1\min}, \hat X_{2\min}]$ and $[\hat X_{1\max}, \hat X_{2\max}]$ denote the vectors of minimum and maximum values of the elements of $[\hat X_1, \hat X_2]$. The initial training set is obtained by setting a constant $C_i \in [0, 1]$, $C_1 = 0.05$, and generating training points in the domain $S_i := C_i\,[[\hat X_{1\min}, \hat X_{2\min}],\, [\hat X_{1\max}, \hat X_{2\max}]]$. Once the network is trained on this set, $C_i$ is changed as $C_i = C_1 + 0.05(i-1)$ for $i = 2, 3, \ldots$, and the network is trained again. This process is repeated until $C_i = 1$.

The steps for training (offline) the critic network which captures the relationships between state at time k and co-state at time k+1 are as follows:

1) Fix Ci and generate training domain Si;

2) For each element $\hat X_k$ of $S_i$, follow the steps below:

a) Input $\hat X_k$ to the networks to get $\lambda_{k+1}$. Let us denote it as $\lambda^a_{k+1}$.

b) Calculate $\hat U_k$, knowing $\hat X_k$ and $\lambda_{k+1}$, from the optimal control equation (25);

c) Get $\hat X_{k+1}$ from the state equation (19), using $\hat X_k$ and $\hat U_k$.

d) Input $\hat X_{k+1}$ to the networks to get $\lambda_{k+2}$.

e) Calculate $\lambda_{k+1}$ from the costate equation (26). Denote this as $\lambda^t_{k+1}$.

3) Train the networks, with all $\hat X_k$ as inputs and corresponding $\lambda^t_{k+1}$ as outputs.

4) If proper convergence is achieved, go back to Step 1) with the expanded domain $S_{i+1}$; if not, go to Step 1) and retrain the networks with the same $S_i$.

For faster convergence, one can take a convex combination $[\beta\lambda^t_{k+1} + (1-\beta)\lambda^a_{k+1}]$ as the target output for training, where $0 < \beta \le 1$ is the learning rate for the neural network training. We set a tolerance value to check the convergence of the network. If $\frac{\|\lambda^t_{k+1} - \lambda^a_{k+1}\|_2}{\|\lambda^t_{k+1}\|_2} <$ tolerance, then it can be said that the networks have converged. For the present problem, the values of the learning rate and tolerance are selected as 0.8 and $10^{-7}$, respectively.
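A compact sketch of the offline training loop in Steps 1)-4) is given below for the quadratic-utility case discussed earlier. The `critic` object (a callable with a `fit` method), `dynamics`, and `jac_x` are hypothetical placeholders for the sub-network ensemble, the lumped model (19), and its state Jacobian; they are not part of the original formulation.

```python
import numpy as np

def train_snac(critic, dynamics, jac_x, B, Q, R, x_min, x_max,
               n_samples=500, beta=0.8, tol=1e-7, max_sweeps=200):
    """Offline SNAC training: critic maps X_k -> lambda_{k+1}."""
    for Ci in np.arange(0.05, 1.0 + 1e-9, 0.05):        # expanding domains S_i
        for _ in range(max_sweeps):
            # Step 1): sample training states in S_i = Ci * [x_min, x_max]
            Xk = np.random.uniform(Ci * x_min, Ci * x_max,
                                   size=(n_samples, len(x_min)))
            lam_a = np.array([critic(x) for x in Xk])                     # Step a)
            Uk = np.array([-np.linalg.solve(R, B.T @ l) for l in lam_a])  # Step b), eq. (25)
            Xk1 = np.array([dynamics(x, u) for x, u in zip(Xk, Uk)])      # Step c), eq. (19)
            lam_k2 = np.array([critic(x) for x in Xk1])                   # Step d)
            lam_t = np.array([Q @ x + jac_x(x).T @ l                      # Step e), eq. (26)
                              for x, l in zip(Xk1, lam_k2)])
            # Step 3): train toward the convex combination of target and current output
            critic.fit(Xk, beta * lam_t + (1.0 - beta) * lam_a)
            # Step 4): convergence check before moving on to S_{i+1}
            if np.linalg.norm(lam_t - lam_a) / (np.linalg.norm(lam_t) + 1e-12) < tol:
                break
```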

Fig. 2. Schematic of neural network synthesis.
C. Implementation of the Control Solution

After the offline controller synthesis,the following steps are used for online control computation. Fig. 3 shows the control solution implementation scheme. The offline and online operations are as follows:

Fig. 3. Implementation of control solution.

1) Offline operations

a) Train adaptive critic network controller for the system.

2) Online operations

a) Measure the state variables at time $t_k$ across the spatial dimension.

b) Compute $\hat X_k$ using the basis functions.

c) Using $\hat X_k$ in the neural network, compute the control $\hat U_k$.

d) Get the desired control profile $u(t_k, y)$ from $\hat U_k$, using the basis functions.
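The online steps a)-d) amount to two projections and one network evaluation per sampling instant. A minimal sketch, again specialized to the quadratic utility and assuming a trained `critic`, the discretized basis functions, and the input matrix `B` of the lumped model, is given below; all names are illustrative.

```python
import numpy as np

def online_control(x1_profile, x2_profile, phi1, phi2, critic, B, R, dy):
    """One online control update: measured profiles -> u(t_k, y)."""
    # a)-b): lumped states from the L2 projections <x_1, phi_1j>, <x_2, phi_2j>
    X1 = np.trapz(x1_profile[:, None] * phi1, dx=dy, axis=0)
    X2 = np.trapz(x2_profile[:, None] * phi2, dx=dy, axis=0)
    Xk = np.concatenate([X1, X2])
    # c): costate from the trained SNAC network, then U_k from (25)
    lam_next = critic(Xk)
    Uk = -np.linalg.solve(R, B.T @ lam_next)
    # d): rebuild the spatial control profile from the same basis functions, eq. (8)
    N_tilde = phi1.shape[1]
    return phi1 @ Uk[:N_tilde] + phi2 @ Uk[N_tilde:]
```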

Ⅴ. A FLEXIBLE WING AIRCRAFT MODEL

A flexible wing aircraft model is represented by using two Euler-Bernoulli beams connected to a rigid mass, as shown in Fig. 1. The BMB system primarily represents the heave dynamics of an aircraft model, which is initially assumed to be in level flight with its wings straight and the lift force balancing the weight. Any perturbation in the wing's shape (defined by the transverse displacement $w(t,y)$) causes a change in the local angle-of-attack distribution over the wing, and this in turn leads to a perturbation in the lift distribution. The objective is to recover the level flight condition ($w(t,y) = 0$) using the proposed control action.

The basic dynamic model of the BMB system is taken from [17], in which a set of only two control actions is considered. In this study, however, the model is further generalized with a continuous form of control action over the left and right beams. First the dynamic model is presented and then the control design is developed. The discussion in Section III is used to obtain a finite-dimensional approximation, and the SNAC described in Section IV is used to design the optimal control action. The transverse displacements for the left and right beams are described by $w_L(t,y)$ and $w_R(t,y)$, respectively.

The dynamic equation for the left beam is given as

$$\rho a\frac{\partial^2 w_L(t,y)}{\partial t^2} + EI\frac{\partial^4 w_L(t,y)}{\partial y^4} + \gamma_1\frac{\partial w_L(t,y)}{\partial t} + \gamma_2 I\frac{\partial^5 w_L(t,y)}{\partial t\,\partial y^4} = -\frac{\Delta L(t,y)}{l/2} + b_L(y)\,u_L(t,y),$$
(27)
for $0 \le y \le \frac{l}{2}$, $t > 0$, and for the right beam as
$$\rho a\frac{\partial^2 w_R(t,y)}{\partial t^2} + EI\frac{\partial^4 w_R(t,y)}{\partial y^4} + \gamma_1\frac{\partial w_R(t,y)}{\partial t} + \gamma_2 I\frac{\partial^5 w_R(t,y)}{\partial t\,\partial y^4} = -\frac{\Delta L(t,y)}{l/2} + b_R(y)\,u_R(t,y),$$
(28)
for $\frac{l}{2} < y \le l$, $t > 0$, subject to a set of boundary conditions
$$EI\frac{\partial^2 w_L(t,0)}{\partial y^2} + \gamma_2 I\frac{\partial^3 w_L(t,0)}{\partial t\,\partial y^2} = 0,\quad EI\frac{\partial^3 w_L(t,0)}{\partial y^3} + \gamma_2 I\frac{\partial^4 w_L(t,0)}{\partial t\,\partial y^3} = 0,\quad EI\frac{\partial^2 w_R(t,l)}{\partial y^2} + \gamma_2 I\frac{\partial^3 w_R(t,l)}{\partial t\,\partial y^2} = 0,$$
$$EI\frac{\partial^3 w_R(t,l)}{\partial y^3} + \gamma_2 I\frac{\partial^4 w_R(t,l)}{\partial t\,\partial y^3} = 0,\quad w_L\!\left(t,\tfrac{l}{2}\right) = w_R\!\left(t,\tfrac{l}{2}\right),\quad \frac{\partial w_L(t,\frac{l}{2})}{\partial y} = \frac{\partial w_R(t,\frac{l}{2})}{\partial y},$$
(29)
$$EI\frac{\partial^3 w_L(t,\frac{l}{2})}{\partial y^3} + \gamma_2 I\frac{\partial^4 w_L(t,\frac{l}{2})}{\partial t\,\partial y^3} - EI\frac{\partial^3 w_R(t,\frac{l}{2})}{\partial y^3} - \gamma_2 I\frac{\partial^4 w_R(t,\frac{l}{2})}{\partial t\,\partial y^3} = m\frac{\partial^2 w_L(t,\frac{l}{2})}{\partial t^2},\qquad EI\frac{\partial^2 w_L(t,\frac{l}{2})}{\partial y^2} + \gamma_2 I\frac{\partial^3 w_L(t,\frac{l}{2})}{\partial t\,\partial y^2} - EI\frac{\partial^2 w_R(t,\frac{l}{2})}{\partial y^2} - \gamma_2 I\frac{\partial^3 w_R(t,\frac{l}{2})}{\partial t\,\partial y^2} = I_Z\frac{\partial^3 w_R(t,\frac{l}{2})}{\partial t^2\,\partial y},$$
(30)
and the initial conditions
$$w_L(0,y) = f_L(y),\qquad \frac{\partial w_L}{\partial t}(0,y) = g_L(y),$$
(31)
and
$$w_R(0,y) = f_R(y),\qquad \frac{\partial w_R}{\partial t}(0,y) = g_R(y),$$
(32)
are such that they satisfy the boundary conditions in (29) and (30). Here, $f_L$, $f_R$, $g_L$, $g_R$ are continuous bounded functions, $E$ is Young's modulus, $I$ is the area moment of inertia of the beam, $I_Z$ is the mass moment of inertia of the rigid mass, $\gamma_1$ is the coefficient of viscous damping, $\gamma_2$ is the coefficient of Kelvin-Voigt damping, $m$ is the mass of the rigid connection between the beams, $\rho$ is the density of the beam material, and $a$ is the cross-sectional area of the beam. For simplicity, we only consider symmetric profiles of the beam system in this paper. This is done to reduce the dimension of the system. Symmetry of the left and right beams is used in applying appropriate boundary conditions at the midpoint in the reduced order system. By considering only the left beam, new state variables are defined as
$$w_1(t,y) = w_L(t,y),\qquad w_2(t,y) = \frac{\partial w_L(t,y)}{\partial t},$$
(33)
and using these definitions, the system of equations is rewritten in the form of (1) as
$$\frac{\partial w_1(t,y)}{\partial t} = w_2(t,y),\qquad \frac{\partial w_2(t,y)}{\partial t} = -\frac{EI}{\rho a}\frac{\partial^4 w_1(t,y)}{\partial y^4} - \frac{\gamma_1}{\rho a}w_2(t,y) - \frac{\gamma_2 I}{\rho a}\frac{\partial^4 w_2(t,y)}{\partial y^4} - \frac{\Delta L(t,y)}{\frac{l}{2}\rho a} + \frac{b(y)u(t,y)}{\rho a}.$$
(34)

Boundary conditions (29) remain the same, and the conditions at the midpoint become

$$EI\frac{\partial^3 w_1(t,\frac{l}{2})}{\partial y^3} + \gamma_2 I\frac{\partial^3 w_2(t,\frac{l}{2})}{\partial y^3} = \frac{m}{2}\frac{\partial w_2(t,\frac{l}{2})}{\partial t},\qquad EI\frac{\partial^2 w_1(t,\frac{l}{2})}{\partial y^2} + \gamma_2 I\frac{\partial^2 w_2(t,\frac{l}{2})}{\partial y^2} = 0.$$
(35)

The state variables $w_1(t,y)$, $w_2(t,y)$ and the control variable $u(t,y)$ can be written in terms of the basis functions as

$$w_1(t,y) = \sum_{j=1}^{\tilde N}\hat w_{1j}(t)\varphi_{1j}(y),\qquad w_2(t,y) = \sum_{j=1}^{\tilde N}\hat w_{2j}(t)\varphi_{2j}(y),$$
(36)
$$u(t,y) = \sum_{j=1}^{\tilde N}\hat u_{1j}(t)\varphi_{1j}(y) + \sum_{j=1}^{\tilde N}\hat u_{2j}(t)\varphi_{2j}(y).$$
(37)
Here, the auxiliary state variables are defined as $\hat X_1 = [\hat w_{11}, \ldots, \hat w_{1\tilde N}]^{\rm T}$, $\hat X_2 = [\hat w_{21}, \ldots, \hat w_{2\tilde N}]^{\rm T}$, $\hat U_1 = [\hat u_{11}, \ldots, \hat u_{1\tilde N}]^{\rm T}$, $\hat U_2 = [\hat u_{21}, \ldots, \hat u_{2\tilde N}]^{\rm T}$. Next, the Galerkin procedure on (34) is carried out with the basis functions $\varphi_{1j}(y)$ and $\varphi_{2j}(y)$ to yield
$$\begin{aligned}
\dot{\hat w}_{1j} ={}& \sum_{i=1}^{\tilde N}\left(\int_{y=0}^{l/2}\varphi_{2i}\varphi_{1j}\,{\rm d}y\right)\hat w_{2i},\\
\dot{\hat w}_{2j} ={}& -\frac{EI}{\rho a}\sum_{i=1}^{\tilde N}\left(\int_{y=0}^{l/2}\varphi^{(2)}_{1i}\varphi^{(2)}_{2j}\,{\rm d}y\right)\hat w_{1i} - \frac{\gamma_2 I}{\rho a}\sum_{i=1}^{\tilde N}\left(\int_{y=0}^{l/2}\varphi^{(2)}_{2i}\varphi^{(2)}_{2j}\,{\rm d}y\right)\hat w_{2i} - \frac{\gamma_1}{\rho a}\hat w_{2j}\\
&- \int_{y=0}^{l/2}\frac{\Delta L(t,y)}{\frac{l}{2}\rho a}\varphi_{2j}\,{\rm d}y + \frac{1}{\rho a}\sum_{i=1}^{\tilde N}\left(\int_{y=0}^{l/2}b(y)\varphi_{1i}\varphi_{2j}\,{\rm d}y\right)\hat u_{1i} + \frac{1}{\rho a}\sum_{i=1}^{\tilde N}\left(\int_{y=0}^{l/2}b(y)\varphi_{2i}\varphi_{2j}\,{\rm d}y\right)\hat u_{2i}\\
&+ \frac{1}{\rho a}\Big[\big(EI\,w^{(2)}_1 + \gamma_2 I\,w^{(2)}_2\big)\varphi^{(1)}_{2j}\Big]\Big|_{\frac{l}{2}} - \frac{1}{\rho a}\Big[\big(EI\,w^{(3)}_1 + \gamma_2 I\,w^{(3)}_2\big)\varphi_{2j}\Big]\Big|_{\frac{l}{2}}.
\end{aligned}$$
(38)

Notice the superscript notation for spatial derivatives as defined in Section II. $\Delta L(t,y)$ is the change in the lift distribution from the equilibrium position. The following expression for $\Delta L(t,y)$ is assumed:

$$\Delta L(t,y) = \frac{1}{2}\rho_a V_0^2\, c\, C_{l_\alpha}\Delta\alpha,$$
(39)
where $\rho_a$ is the density of air at sea level, $V_0$ is the airspeed, $c$ is the chord length, $C_{l_\alpha}$ is the lift-curve slope, and the change in angle-of-attack $\Delta\alpha$ is defined as
$$\Delta\alpha = \sin^{-1}\!\left(\frac{w_2}{V_0}\right).$$
(40)
In this study, a third-order approximation of the inverse sine function is used:
$$\Delta\alpha \approx \frac{w_2}{V_0} + \frac{1}{6}\left(\frac{w_2}{V_0}\right)^3.$$
(41)
By using (35) and (41), $\dot{\hat w}_{2j}$ in (38) can be simplified as
$$\begin{aligned}
\dot{\hat w}_{2j} ={}& -\frac{EI}{\rho a}\sum_{i=1}^{\tilde N}\left(\int_{y=0}^{l/2}\varphi^{(2)}_{1i}\varphi^{(2)}_{2j}\,{\rm d}y\right)\hat w_{1i} - \frac{\gamma_2 I}{\rho a}\sum_{i=1}^{\tilde N}\left(\int_{y=0}^{l/2}\varphi^{(2)}_{2i}\varphi^{(2)}_{2j}\,{\rm d}y\right)\hat w_{2i} - \frac{\gamma_1}{\rho a}\hat w_{2j}\\
&- \frac{\rho_a V_0^2\, c\, C_{l_\alpha}}{l\rho a}\int_{y=0}^{l/2}\left(\frac{w_2}{V_0} + \frac{1}{6}\left(\frac{w_2}{V_0}\right)^3\right)\varphi_{2j}\,{\rm d}y + \frac{1}{\rho a}\sum_{i=1}^{\tilde N}\left(\int_{y=0}^{l/2}b(y)\varphi_{1i}\varphi_{2j}\,{\rm d}y\right)\hat u_{1i}\\
&+ \frac{1}{\rho a}\sum_{i=1}^{\tilde N}\left(\int_{y=0}^{l/2}b(y)\varphi_{2i}\varphi_{2j}\,{\rm d}y\right)\hat u_{2i} - \frac{m}{2\rho a}\left[\left(\sum_{i=1}^{\tilde N}\dot{\hat w}_{2i}(t)\varphi_{2i}(y)\right)\varphi_{2j}\right]\Bigg|_{\frac{l}{2}},\\
\dot{\hat w}_{2j} ={}& -\frac{EI}{\rho a}\sum_{i=1}^{\tilde N}\left(\int_{y=0}^{l/2}\varphi^{(2)}_{1i}\varphi^{(2)}_{2j}\,{\rm d}y\right)\hat w_{1i} - \frac{\gamma_2 I}{\rho a}\sum_{i=1}^{\tilde N}\left(\int_{y=0}^{l/2}\varphi^{(2)}_{2i}\varphi^{(2)}_{2j}\,{\rm d}y\right)\hat w_{2i} - \frac{\gamma_1}{\rho a}\hat w_{2j} - \frac{\rho_a V_0\, c\, C_{l_\alpha}}{l\rho a}\hat w_{2j}\\
&- \frac{\rho_a\, c\, C_{l_\alpha}}{6\,l\rho a\,V_0}\int_{y=0}^{l/2}\left[\sum_{k_1+k_2+\cdots+k_{\tilde N}=3}\left(\frac{3!}{k_1!\,k_2!\cdots k_{\tilde N}!}\right)\prod_{i=1}^{\tilde N}\big(\hat w_{2i}(t)\varphi_{2i}(y)\big)^{k_i}\right]\varphi_{2j}\,{\rm d}y\\
&+ \frac{1}{\rho a}\sum_{i=1}^{\tilde N}\left(\int_{y=0}^{l/2}b(y)\varphi_{1i}\varphi_{2j}\,{\rm d}y\right)\hat u_{1i} + \frac{1}{\rho a}\sum_{i=1}^{\tilde N}\left(\int_{y=0}^{l/2}b(y)\varphi_{2i}\varphi_{2j}\,{\rm d}y\right)\hat u_{2i} - \frac{m}{2\rho a}\left[\left(\sum_{i=1}^{\tilde N}\dot{\hat w}_{2i}(t)\varphi_{2i}(y)\right)\varphi_{2j}\right]\Bigg|_{\frac{l}{2}}.
\end{aligned}$$
(42)

Note that the first equation of (38) is in the form of (11), and (42) is in the form of (12).

Ⅵ. SNAC CONTROLLER AUGMENTED WITH ADAPTIVE CONTROLLER

To compensate for the real-life uncertainties arising from unknown physical parameters in the system, an adaptive controller using a passive identifier is proposed. Let the actual flexible aircraft system model be in the form given in (43). Note that (43) has an extra term $h(w_1, w_2)$ as compared to (34), which represents the uncertainties.

$$\frac{\partial w_2(t,y)}{\partial t} = -\frac{EI}{\rho a}\frac{\partial^4 w_1(t,y)}{\partial y^4} - \frac{\gamma_1}{\rho a}w_2(t,y) - \frac{\gamma_2 I}{\rho a}\frac{\partial^4 w_2(t,y)}{\partial y^4} - \frac{\Delta L(t,y)}{\frac{l}{2}\rho a} + \frac{b(y)u(t,y)}{\rho a} + h(w_1, w_2).$$
(43)
It is also assumed that the uncertainties can be parameterized by using a linear-in-parameter neural network structure as
$$h(w_1, w_2) = W_N^{\rm T}\sigma(w_1, w_2),$$
(44)
where $W_N \in \mathbb{R}^{p\times 1}$ are the ideal weights of the neural network and $\sigma(w_1, w_2) \in \mathbb{R}^{p\times 1}$ are the basis functions of the linear-in-parameter neural network. To estimate the unknown ideal weights, a passive identifier of the following form is considered:
$$\frac{\partial \hat w_2(t,y)}{\partial t} = -\frac{EI}{\rho a}\frac{\partial^4 w_1(t,y)}{\partial y^4} - \frac{\gamma_1}{\rho a}w_2(t,y) - \frac{\gamma_2 I}{\rho a}\frac{\partial^4 w_2(t,y)}{\partial y^4} - \frac{\Delta L(t,y)}{\frac{l}{2}\rho a} + \frac{b(y)u(t,y)}{\rho a} + \hat W_N^{\rm T}\sigma(w_1, w_2) + a_2(w_2 - \hat w_2),$$
(45)
where $a_2 > 0$ is a design parameter and $\hat w_2$ is the estimated state variable. Note that the identifier equation uses all available information to estimate the rate of beam displacement. The error signal is given by $e = w_2 - \hat w_2$, and by using (43) and (45) we can obtain the error dynamics as
$$\frac{\partial e(t,y)}{\partial t} = -a_2\, e(t,y) + \tilde W_N^{\rm T}\sigma(w_1, w_2),$$
(46)
here $\tilde W_N = W_N - \hat W_N$ is the error in the weight estimation. Now, to derive the weight update law, consider the Lyapunov function
$$V = \frac{1}{2}\left(\int_0^l e(t,y)^2\,{\rm d}y + \tilde W_N^{\rm T}F^{-1}\tilde W_N\right).$$
(47)

The time derivative of the Lyapunov function (47) is written as

$$\dot V = \int_0^l e(t,y)\frac{\partial e}{\partial t}\,{\rm d}y + \tilde W_N^{\rm T}F^{-1}\dot{\tilde W}_N.$$
(48)
By using the error dynamics (46), (48) is rewritten as
$$\dot V = -a_2\int_0^l e(t,y)^2\,{\rm d}y + \int_0^l e(t,y)\,\tilde W_N^{\rm T}\sigma(w_1, w_2)\,{\rm d}y + \tilde W_N^{\rm T}F^{-1}\dot{\tilde W}_N.$$
(49)
By using the weight update law
$$\dot{\tilde W}_N = -F\int_0^l e(t,y)\,\sigma(w_1, w_2)\,{\rm d}y,$$
(50)
equation (49) can be simplified as
$$\dot V = -a_2\int_0^l e(t,y)^2\,{\rm d}y \le 0.$$
(51)
Note that (51) ensures that $\dot V$ is negative semi-definite. By using the extension of Barbalat's lemma to infinite dimensional systems[31], it can be shown that $\lim_{t\to\infty}\|e(t)\|_2 = 0$ and $\tilde W_N \in \mathcal{L}_\infty$.
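The identifier (45) together with the update law (50) can be marched forward in time alongside the plant. The explicit-Euler sketch below illustrates one such step under the stated assumptions; `nominal_rhs` stands for the known right-hand side of (34) evaluated on the measured profiles, and all names are hypothetical.

```python
import numpy as np

def identifier_step(w1, w2, w2_hat, W_hat, u_profile, nominal_rhs,
                    sigma, F, a2, dy, dt):
    """One time step of the passive identifier (45) and weight update (50).

    w1, w2, w2_hat : state and identifier profiles sampled on the spatial grid
    W_hat          : (p,) current weight estimate;  F : (p, p) adaptation gain
    sigma(w1, w2)  : returns the (p, n_y) regressor matrix evaluated on the grid
    """
    Sigma = sigma(w1, w2)                 # (p, n_y) regressors
    e = w2 - w2_hat                       # identification error e(t, y)
    # identifier dynamics (45): nominal model + estimated uncertainty + a2 * e
    w2_hat_dot = nominal_rhs(w1, w2, u_profile) + W_hat @ Sigma + a2 * e
    # weight update (50): since W_tilde = W - W_hat, W_hat_dot = F * int e sigma dy
    W_hat_dot = F @ np.trapz(e * Sigma, dx=dy, axis=1)
    return w2_hat + dt * w2_hat_dot, W_hat + dt * W_hat_dot
```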

Ⅶ. SIMULATION RESULTS

Numerical simulations are performed in Matlab. All the simulation parameter values used in this study for the flexible beam problem are listed in Tables I and II. Ten basis functions (for each state) were found to be sufficient to "approximately" describe the flexible model in finite dimensions. These basis functions were obtained from 3 840 snapshots, which were captured by simulating the BMB system with random initial state profiles. Here, the term "approximately" indicates that these basis functions are able to capture nearly 99% of the energy (given by (3)) in the beam displacement and the rate of the beam displacement. Using these basis functions, a lumped parameter model is obtained and the controller is designed using SNAC. The following basis functions were used in the linear-in-parameter neural network for each costate of the lumped parameter model: $[\hat w_{11}, \hat w_{12}, \ldots, \hat w_{1\,10}, \hat w_{21}, \ldots, \hat w_{2\,10}, (\hat w_{21})^2, (\hat w_{21})^3, (\hat w_{21})^4, \hat w_{11}\hat w_{21}]^{\rm T}_{24\times 1}$.

TABLE Ⅰ
SYSTEM PARAMETERS FOR FLEXIBLE BEAM PROBLEM

TABLE Ⅱ
NUMERICAL PARAMETERS FOR FLEXIBLE BEAM PROBLEM

Simulation of the system dynamics was carried out using a finite difference approach. To proceed with this approach,the beam was discretized into spatial nodes with a value of Δy (the distance between neighboring nodes). A uniform discretization is kept over the beams. To get the numerical values of spatial partial derivatives at node (i),a central difference scheme was used.

$$\left(\frac{\partial^2 x}{\partial y^2}\right)_{\!(i)} = \frac{x_{(i+1)} - 2x_{(i)} + x_{(i-1)}}{(\Delta y)^2},$$

$$\left(\frac{\partial^3 x}{\partial y^3}\right)_{\!(i)} = \frac{x_{(i+2)} - 2x_{(i+1)} + 2x_{(i-1)} - x_{(i-2)}}{2(\Delta y)^3},\qquad \left(\frac{\partial^4 x}{\partial y^4}\right)_{\!(i)} = \frac{x_{(i+2)} - 4x_{(i+1)} + 6x_{(i)} - 4x_{(i-1)} + x_{(i-2)}}{(\Delta y)^4}.$$
(52)
A simulation time step $\Delta t$ of the order of $(\Delta y)^2$ was chosen to obtain a numerically stable solution[32]. In order to compute the integrals in the POD formulation, a trapezoidal rule was used. The BMB system was simulated by perturbing it from the level flight condition. A sudden gust (for example) is assumed to cause the wing to deform from its straight position. The desired (equilibrium) state of the system is taken as zero. In this study, the initial state profile was assumed as
$$w_1(0,y) = 0.01\sin^5\!\left(\frac{\pi y}{l}\right),\qquad w_2(0,y) = 0.1\sin^5\!\left(\frac{\pi y}{l}\right).$$
(53)
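A minimal sketch of the central-difference operators in (52), assuming a uniform grid and evaluating only interior nodes (the boundary nodes are handled through (29)-(30) and (35)), is given below.

```python
import numpy as np

def spatial_derivatives(x, dy):
    """Central differences of (52); x is a profile sampled on a uniform grid.

    d2 is defined at nodes 1..n-2, while d3 and d4 use a wider stencil
    and are defined at nodes 2..n-3."""
    d2 = (x[2:] - 2.0 * x[1:-1] + x[:-2]) / dy**2
    d3 = (x[4:] - 2.0 * x[3:-1] + 2.0 * x[1:-3] - x[:-4]) / (2.0 * dy**3)
    d4 = (x[4:] - 4.0 * x[3:-1] + 6.0 * x[2:-2] - 4.0 * x[1:-3] + x[:-4]) / dy**4
    return d2, d3, d4
```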

Figs. 4 and 5 show the results of the SNAC controller implementation. It can be observed that the control action is able to bring the beam system to the desired equilibrium state, i.e., both the beam displacement and the rate of the beam displacement go to zero. Analysis of the validity of the POD approximation of the underlying PDE system is carried out by comparing the actual state profiles to the approximated profiles generated by using (36). Fig. 6 shows that the basis functions are able to capture the state profile quite well at the different time instants considered. Note that the SNAC controller is developed based on the approximated profile in the lumped parameter model. Due to the close approximation shown in Fig. 6, control actions calculated from the low-dimensional approximated system are effective in controlling the flexible wing, which is a distributed system.

Fig. 4. Time history of state variables.

Fig. 5. Time history of continuous control action profile.

Fig. 6. Comparison of actual profile and approximate profile.

In order to show the versatility of the SNAC approach with respect to initial conditions, state histories obtained using a different initial condition are provided. These simulation results, shown in Figs. 7 and 8, clearly indicate that the system is stabilized. Fig. 7 shows the system trajectories with respect to time. Fig. 8 shows the stabilizing feedback control action, which is able to direct the system towards the desired equilibrium condition.

Fig. 7. Time histories of beam displacement profile and rate of beam displacement profile.

Fig. 8. Time history of continuous control action profile.

Next, the need for designing the online adaptive control is shown by simulating the flexible wing motion in the presence of uncertainties. For this purpose, the exact values of a few system parameters were assumed to be unknown, which is the equivalent of parametric uncertainties in a system. Furthermore, a bias term is added to the actual system dynamics to make it inherently unstable. The chord length $c$ and lift-curve slope $C_{l_\alpha}$ are taken as 1 m and 3 rad$^{-1}$, respectively, which differ from the actual values listed in Table I. While designing the nominal (offline) control, as discussed in Sections III and IV, 10 basis functions are considered to obtain an approximate lumped parameter model. The same neural network architecture as before is taken for SNAC, and the nominal control is designed. A bias term $\frac{100}{\rho a}w_2(t,y)$ is added to the right hand side of the second equation of (34), which is treated as an uncertainty in the controller development. The procedure described in Section VI is adopted to design an identifier and online adaptive control system.

Figs. 9-11 show the results of the system dynamics with uncertainties. The system uncertainty is estimated with a linear-in-parameter neural network. The following basis functions are used to estimate the uncertainty: $\sigma(w_1, w_2) = [w_1, w_2, (w_1)^2, (w_2)^2, w_1w_2, (w_1)^3, (w_2)^3, (w_1)^2w_2, w_1(w_2)^2]^{\rm T}$.

Fig. 9. Time histories of (a) beam displacement profile and (b) rate of beam displacement profile.

Fig. 10. Time histories of total control and adaptive control.

Fig. 11. Time histories of actual uncertainty and estimated uncertainty.

Identifier states and neural network weights are initialized as zero. Fig. 9 shows the system state histories with respect to time under the control action shown in Fig. 10 (a). Fig. 10 (b) shows the extra (adaptive) control needed to compensate for the estimated uncertainty. Figs. 11 (a) and (b) show the actual (modeled) and estimated uncertainties with respect to time, respectively. To see the importance of the adaptive control shown in Fig. 10 (b), the system dynamics was also simulated with the nominal (offline) control architecture alone.

Fig. 12 shows that the system is unstable when just the nominal control, designed using SNAC, is applied to the system with uncertainty. Fig. 13 shows the corresponding nominal control action. Figs. 14-16 compare the results of the system trajectory and control action at different spatial points. The control action augmented with adaptive control stabilizes the system, whereas the nominal control alone is not sufficient to do so, which clearly shows the need for adaptive control.

Fig. 12. Time histories of beam displacement profile and rate of beam displacement profile.

Fig. 13. Time history of continuous control action profile.

Fig. 14. Beam displacements at y = 0.2L and y = 0.4L.

Fig. 15. Rate of beam displacements at y = 0.2L and y= 0.4L.

Fig. 16. Control actions at y=0.2L and at y=0.4L.
Ⅷ. CONCLUSION

In this paper, a stabilizing state-feedback control design approach for controlling the heave dynamics of an aircraft's flexible wing was developed using reinforcement learning and adaptive control techniques. The proper orthogonal decomposition technique was used to get a finite dimensional approximate model (lumped parameter model) of a flexible wing by using a set of basis functions. Then, the optimal controller was synthesized using the single-network-adaptive-critic technique for this lumped system. Simulation results show the effectiveness of the designed single-network-adaptive-critic controller for different initial conditions. Further, to mitigate the loss in performance due to uncertainties, the single-network-adaptive-critic controller was augmented with a passive identifier based adaptive controller. Numerical results show that under destabilizing uncertainties the single-network-adaptive-critic controller augmented with the adaptive controller is able to stabilize the system. Since the controller development is fairly general, it is possible to extend its applicability to the control of other distributed parameter systems under uncertainties.

References
[1] Lasiecka I. Control of systems governed by partial differential equations: a historical perspective. In: Proceedings of the 34th Conference on Decision and Control. New Orleans, LA: IEEE, 1995. 2792-2796
[2] Burns J A, King B B. Optimal sensor location for robust control of distributed parameter systems. In: Proceedings of the 1994 Conference on Decision and Control. Lake Buena Vista, FL, USA: IEEE, 1994. 3967-3972
[3] Curtain R F, Zwart H J. An Introduction to Infinite Dimensional Linear Systems Theory. New York: Springer-Verlag, 1995.
[4] Padhi R, Balakrishnan S N. Optimal dynamic inversion control design for a class of nonlinear distributed parameter systems with continuous and discrete actuators. IET Control Theory and Applications, 2007, 1(6): 1662-1671
[5] Haykin S. Neural Networks. New York: Macmillan College Publishing Company, 1994.
[6] Gupta S K. Numerical Methods for Engineers. New Delhi: New Age International Ltd., 1995.
[7] Christofides P D. Nonlinear and Robust Control of PDE Systems: Methods and Applications to Transport-Reaction Processes. Boston: Birkhauser Boston Inc., 2001.
[8] Holmes P, Lumley J L, Berkooz G. Turbulence, Coherent Structures, Dynamical Systems and Symmetry. Cambridge: Cambridge University Press, 1996. 87-154
[9] Bryson A E, Ho Y C. Applied Optimal Control: Optimization, Estimation, and Control. Washington: Taylor and Francis, 1975.
[10] Lewis F. Applied Optimal Control and Estimation. New Jersey: Prentice-Hall, 1992.
[11] Werbos P J. Neurocontrol and supervised learning: an overview and evaluation. In: Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. New York: Van Nostrand Reinhold, 1992.
[12] Balakrishnan S N, Biega V. Adaptive-critic based neural networks for aircraft optimal control. Journal of Guidance, Control, and Dynamics, 1996, 19(4): 893-898
[13] Padhi R, Unnikrishnan N, Wang X, Balakrishnan S N. A single network adaptive critic (SNAC) architecture for optimal control synthesis for a class of nonlinear systems. Neural Networks, 2006, 19: 1648-1660
[14] Padhi R, Balakrishnan S N. Optimal beaver population management using reduced-order distributed parameter model and single network adaptive critics. In: Proceedings of the 2004 American Control Conference. Boston, MA, 2004. 1598-1603
[15] Yadav V, Padhi R, Balakrishnan S N. Robust/optimal temperature profile control of a re-entry vehicle using neural networks. In: Proceedings of the 2006 AIAA Atmospheric Flight Mechanics Conference and Exhibit. Keystone, Colorado: AIAA, 2006. 21-24
[16] Prabhat P, Balakrishnan S N, Look D C Jr. Experimental implementation of adaptive-critic based infinite time optimal neurocontrol for a heat diffusion system. In: Proceedings of the 2002 American Control Conference. Alaska, USA: IEEE, 2002. 2671-2676
[17] Chakravarthy A, Evans K A, Evers J. Sensitivity & functional gains for a flexible aircraft-inspired model. In: Proceedings of the 2010 American Control Conference. Baltimore, MD, USA: IEEE, 2010. 4893-4898
[18] Chakravarthy A, Evans K A, Evers J, Kuhn L M. Target tracking strategies for a nonlinear, flexible aircraft-inspired model. In: Proceedings of the 2011 American Control Conference. San Francisco: IEEE, 2011. 1783-1788
[19] Narendra K S, Annaswamy A M. Stable Adaptive Systems. Englewood Cliffs, NJ: Prentice-Hall, 1989.
[20] Ioannou P A, Sun J. Robust Adaptive Control. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[21] Böhm M, Demetriou M A, Reich S, Rosen I G. Model reference adaptive control of distributed parameter systems. SIAM Journal on Control and Optimization, 1998, 36(1): 33-81
[22] Hong K S, Bentsman J. Direct adaptive control of parabolic systems: algorithm synthesis and convergence and stability analysis. IEEE Transactions on Automatic Control, 1994, 39(10): 2018-2033
[23] Hong K S, Yang K J, Kang D H. Model reference adaptive control of a time-varying parabolic system. KSME International Journal, 2000, 14(2): 168-176
[24] Krstic M, Smyshlyaev A. Adaptive control of PDEs. Annual Reviews in Control, 2008, 32(2): 149-160
[25] Demetriou M A, Rosen I G. On-line robust parameter identification for parabolic systems. International Journal of Adaptive Control and Signal Processing, 15(6): 615-631
[26] Sirovich L. Turbulence and the dynamics of coherent structures. Quarterly of Applied Mathematics, 1987, 45(3): 561-590
[27] Ravindran S S. Proper Orthogonal Decomposition in Optimal Control of Fluids. NASA/TM-1999-209113, USA, 1999.
[28] Padhi R, Balakrishnan S N. Proper orthogonal decomposition based optimal control design of heat equation with discrete actuators using neural networks. In: Proceedings of the 2002 American Control Conference. ACC02-IEEE 1545, 2002.
[29] Padhi R, Balakrishnan S N. Proper orthogonal decomposition based feedback optimal control synthesis of distributed parameter systems using neural networks. In: Proceedings of the 15th American Control Conference. Anchorage, AK: IEEE, 2002. 4389-4394
[30] Ravindran S S. Adaptive reduced-order controllers for a thermal flow system using proper orthogonal decomposition. SIAM Journal on Scientific Computing, 2002, 23(6): 1924-1942
[31] Popov V M. Hyperstability of Control Systems. Berlin: Springer, 1973.
[32] Olver P J. Introduction to Partial Differential Equations. New York: Springer-Verlag, 2013.