
IEEE/CAA Journal of Automatica Sinica
Citation: Teng Liu, Bin Tian, Yunfeng Ai, and Fei-Yue Wang, "Parallel Reinforcement Learning-Based Energy Efficiency Improvement for a Cyber-Physical System," IEEE/CAA J. Autom. Sinica, vol. 7, no. 2, pp. 617–626, Mar. 2020. doi: 10.1109/JAS.2020.1003072
CYBER physical systems (CPSs) are defined as systems wherein the physical components are deeply intertwined with the software components so as to exhibit various and distinct behavioral patterns [1]. Recently, increased performance demands and complex usage patterns have accelerated the development and research of CPSs [2]–[4]. As a typical application of CPS in green transportation, hybrid electric vehicles (HEVs) show great potential to reduce energy consumption and air pollution [5], [6]. In such a system, the hybrid electric powertrain and driving environments constitute the physical resources, while communication and control data compose the cyber part [7], [8]. Strong nonlinearities and uncertainties in the interactions between the cyber and physical resources complicate the control, management, and optimization of HEVs [9], [10]. In particular, energy management of HEVs is critical, and several challenges remain to be resolved, such as optimization, computation time, and adaptability [11], [12].
Energy management strategies (EMSs) have been an active research topic for decades because they can achieve remarkable energy efficiency. Existing EMSs are generally classified into three categories: rule-based, optimization-based, and learning-based. Rule-based strategies depend on a set of predefined criteria without knowledge of real-world driving conditions [13], [14]. Binary control, a typical example, adjusts the power split between the battery and engine when the state of charge (SOC) exceeds threshold values. When the trip information is known a priori, many approaches have been applied to search for the optimal control strategy, such as dynamic programming (DP) [15], stochastic dynamic programming (SDP) [16], Pontryagin's minimum principle (PMP) [17], model predictive control (MPC) [18], and the equivalent consumption minimization strategy (ECMS) [19]. However, these strategies are usually inappropriate for varying driving environments [20]. Owing to the rapid development of computing capability, learning-based methods show great potential for learning control strategies from recorded historical driving data [21], [22]. This type of method needs to be developed further.
As a complex CPS, the hybrid electric powertrain still faces several issues in handling the energy management problem. The first is the lack of data [23]: the controller needs to collect new data and learn new model parameters to derive different strategies for new driving conditions. The second is data inefficiency [24]: the large-dimensional actions and states of a complex CPS need to be calibrated and scheduled reasonably to guide the controller. The final one is universality: adaptive and efficient control strategies need to be generated to accommodate dynamic real-world driving conditions.
To address these difficulties, we develop a novel bidirectional long short-term memory (LSTM) network based parallel reinforcement learning (PRL) framework to construct the EMS for a hybrid tracked vehicle (HTV), see Fig. 1 as an illustration. This framework involves two levels. In the higher level, an artificial vehicle powertrain system is built analogously to the real vehicle to constitute the parallel powertrain system. The large volume of synthesized data from this parallel system is utilized to relieve the lack-of-data problem. A bidirectional LSTM network is proposed to represent the dependence between multiple actions and states. This network can capture more details of the interactions between multi-action embeddings to mitigate the data inefficiency problem. In the lower level, a model-free reinforcement learning (RL) algorithm is finally used to compute the adaptive control strategy based on the trained data.
This paper makes three contributions: 1) a parallel system of the HTV is constructed to generate large synthesized data based on the limited real historical data; 2) a bidirectional LSTM network is proposed to train the available data so as to model the action value function effectively; 3) a model-free RL technique is applied to derive an adaptive EMS that accommodates different driving conditions. Experimental results illustrate that the proposed EMS achieves considerable energy efficiency improvement compared with the conventional RL approach and deep RL.
The remainder of this paper is organized as follows. Section II describes the high-level architecture of a deep neural network for data estimation and the bidirectional LSTM network framework. Section III describes the modeling of the hybrid electric powertrain, wherein the optimal control problem is constructed, and the structure of the lower-level model-free RL algorithms are also introduced. In Section IV, the data collection in real vehicle tests and synthesized data processing are elaborated, and experiment results of three control strategies comparison are presented. Key takeaways are summarized in Section V.
This section introduces the bidirectional LSTM network framework for action value function estimation. First, a multilayer deep neural network is constructed with the powertrain states and actions as inputs. The states are the battery SOC and the generator speed, and the actions are the engine torque, power demand, and motor speed. Based on this network, the bidirectional LSTM formulation is used to approximate the action value function. The detailed components are illustrated as follows.
A deep neural network is a logical-mathematical model that seeks to simulate the behavior and function of a biological neuron [25]. Three layers, named the input layer, hidden layer, and output layer, are included in this network, see Fig. 2(a) as an illustration. The input vector z = [z1, z2, …, zN] is weighted by elements ω1, ω2, …, ωN, then summed with a bias b, and an activation function f is imposed to generate the neuron output as follows:
$$
\begin{cases}
x = \displaystyle\sum_{j=1}^{N} \omega_j z_j + b \\[4pt]
y = f(x)
\end{cases} \tag{1}
$$
where x denotes the net input, y is the neuron output, N is the total number of inputs, and zj is the jth input.
The log-sigmoid activation function is adopted in this paper, and thus the output of the overall networks is depicted as
$$
\begin{cases}
f(x) = \dfrac{1}{1 + e^{-x}} \\[6pt]
y_{all} = f^{2}\!\left( \displaystyle\sum_{i=1}^{S} \omega^{2}_{1i}\, f^{1}\!\left( \sum_{j=1}^{N} \omega^{1}_{ij} z_j + b^{1}_i \right) + b^{2}_{all} \right)
\end{cases} \tag{2}
$$
where f^1 and f^2 represent the activation functions of the hidden layer and the output layer, respectively, and S is the number of neurons in the hidden layer.
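As a concrete illustration of (1) and (2), the following is a minimal NumPy sketch of such a single-hidden-layer network with log-sigmoid activations; the layer sizes, random weights, and variable names are illustrative assumptions rather than the configuration used in the paper.

```python
import numpy as np

def logsig(x):
    """Log-sigmoid activation, f(x) = 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def forward(z, W1, b1, W2, b2):
    """Single-hidden-layer network of (1)-(2).

    z  : (N,)   input vector
    W1 : (S, N) hidden-layer weights, b1 : (S,) hidden biases
    W2 : (1, S) output-layer weights, b2 : (1,) output bias
    """
    hidden = logsig(W1 @ z + b1)       # f^1 applied to the hidden net input
    y_all = logsig(W2 @ hidden + b2)   # f^2 applied to the output net input
    return y_all

# Example: five inputs (SOC, generator speed, engine torque, power demand, motor speed)
rng = np.random.default_rng(0)
z = rng.random(5)
W1, b1 = rng.standard_normal((8, 5)), np.zeros(8)
W2, b2 = rng.standard_normal((1, 8)), np.zeros(1)
print(forward(z, W1, b1, W2, b2))
```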
A memory block is the key constituent part of an LSTM network. For each block, three adaptive and multiplicative gating units are shared by multiple cells, as shown in Fig. 2(b). Furthermore, a recurrently self-connected linear unit called constant error carousel (CEC) is the core of each block. The CEC can provide short-term memory storage for extended time periods by recirculating activation and error signals indefinitely. The three gating units are able to be trained to recognize, store and read information from the memory block. All the cells are combined into the block to share the same gates and reduce the number of adaptive parameters [26].
In this paper, the LSTM network is operated in a bidirectional manner and the time steps are discretized as t = 0, 1, 2, … . The two courses are named the forward pass and the backward pass, which refer to updating the units' activations and calculating the error signals for the weights, respectively. The notation in the following is defined as: j is the index of a memory block and v denotes the index of a cell within block j, so that c_j^v refers to the vth cell of block j.
1) Input: In the forward pass, the cell input is first computed as
$$
z_{c_j^v}(t) = \sum_{m} \omega_{c_j^v m}\, y^{m}(t-1). \tag{3}
$$
This variable is affected by the input squashing function g to generate the new cell state next.
The input gate activation yin is derived by applying a logistic sigmoid squashing function fin with range [0, 1] to the gate’s net input zin
$$
\begin{cases}
z_{in_j}(t) = \displaystyle\sum_{m} \omega_{in_j m}\, y^{m}(t-1) \\[4pt]
y_{in_j}(t) = f_{in_j}\big(z_{in_j}(t)\big)
\end{cases} \tag{4}
$$
where yin ≈ 1 means the input gate is open and the relevant information can be stored in the block, whereas yin ≈ 0 indicates the gate is closed to shield out irrelevant information.
2) Cell State: The memory cell state sc is initialized to zero when t = 0, and then it accumulates based on the input and discounted factor of the forget gate. First, the forget gate activation is defined as
$$
\begin{cases}
z_{\theta_j}(t) = \displaystyle\sum_{m} \omega_{\theta_j m}\, y^{m}(t-1) \\[4pt]
y_{\theta_j}(t) = f_{\theta_j}\big(z_{\theta_j}(t)\big)
\end{cases} \tag{5}
$$
where fθ represents a logistic sigmoid function and ranges from 0 to 1. Then, the new cell state is derived as follows:
$$
s_{c_j^v}(t) = y_{\theta_j}(t)\, s_{c_j^v}(t-1) + y_{in_j}(t)\, g\big(z_{c_j^v}(t)\big), \qquad s_{c_j^v}(0) = 0. \tag{6}
$$
What information to store in the memory block is decided by the input gate and when to erase the outdated information is determined by the forget gate. By doing this, the memory block can retain fresh data and the cell state cannot grow to infinity.
3) Output: The read access to the information is controlled by the output gate via multiplying the output from the CEC. The relevant activation is calculated by applying the squashing function ([0, 1]) into the net input
$$
\begin{cases}
z_{out_j}(t) = \displaystyle\sum_{m} \omega_{out_j m}\, y^{m}(t-1) \\[4pt]
y_{out_j}(t) = f_{out_j}\big(z_{out_j}(t)\big).
\end{cases} \tag{7}
$$
Then, the cell output yc is described by the cell state and the output gate activation as follows:
$$
y_{c_j^v}(t) = y_{out_j}(t)\, s_{c_j^v}(t). \tag{8}
$$
Finally, the activation of the output units k is depicted as
$$
y_k(t) = f_k\big(z_k(t)\big), \qquad z_k(t) = \sum_{m} \omega_{k m}\, y^{m}(t) \tag{9}
$$
where m ranges over all units, and fk is the output squashing function.
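To make the forward pass concrete, the following is a minimal NumPy sketch of one time step of a single-cell memory block implementing (3)–(8); taking the input squashing function g as tanh and using random weights are assumptions made only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_forward_step(y_prev, s_prev, W):
    """One forward step of a single-cell memory block, following (3)-(8).

    y_prev : (M,) activations y^m(t-1) of all source units
    s_prev : float, previous cell state s_c(t-1)
    W      : dict of weight vectors for the cell input and the three gates
    """
    z_c   = W["c"]   @ y_prev                      # (3) cell net input
    y_in  = sigmoid(W["in"]  @ y_prev)             # (4) input-gate activation
    y_fgt = sigmoid(W["fgt"] @ y_prev)             # (5) forget-gate activation
    s_c   = y_fgt * s_prev + y_in * np.tanh(z_c)   # (6) new cell state, g = tanh (assumed)
    y_out = sigmoid(W["out"] @ y_prev)             # (7) output-gate activation
    y_c   = y_out * s_c                            # (8) cell output
    return y_c, s_c

M = 6
rng = np.random.default_rng(1)
W = {k: rng.standard_normal(M) for k in ("c", "in", "fgt", "out")}
y_c, s_c = lstm_forward_step(rng.random(M), 0.0, W)  # s_c(0) = 0 per (6)
```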
LSTM’s backward pass is a truncated version of real-time recurrent learning (RTRL) for weights to cell input, input gates, and forget gates. Also, it fuses the error of back-propagation (BP) in the output units and output gates efficiently.
1) Output Units and Gates: Based on the target tk, the squared error objective function is depicted as
$$
E(t) = \frac{1}{2} \sum_{k} e_k(t)^2, \qquad e_k(t) = t_k(t) - y_k(t) \tag{10}
$$
where ek is the externally injected error. Gradient descent algorithm is used to minimize the objective function. The weight ωlm is decided by the variation Δωlm, which is calculated via the negative gradient of E times the learning rate α. Hence, the standard BP weight changes of output units are
$$
\Delta \omega_{k m}(t) = \alpha\, \delta_k(t)\, y^{m}(t-1), \qquad \delta_k(t) = -\frac{\partial E(t)}{\partial z_k(t)}. \tag{11}
$$
The standard BP is also utilized to compute the weight changes for connections to the output gate from source units m
$$
\begin{cases}
\Delta \omega_{out_j m}(t) = \alpha\, \delta_{out_j}(t)\, y^{m}(t) \\[4pt]
\delta_{out_j}(t) \overset{tr}{=} f'_{out_j}\big(z_{out_j}(t)\big) \left( \displaystyle\sum_{v=1}^{S_j} s_{c_j^v}(t) \sum_{k} \omega_{k c_j^v}\, \delta_k(t) \right)
\end{cases} \tag{12}
$$
where S_j is the number of cells in memory block j, and the superscript tr over the equality sign indicates that the relation holds under the truncated-gradient approximation.
2) Truncated RTRL Partials: In RTRL, the partial derivatives must be propagated forward in time. These partials for the weights to the cell (c_j^v), input gate (in), and forget gate (θ) are updated as follows:
$$
\frac{\partial s_{c_j^v}(t)}{\partial \omega_{c_j^v m}} \overset{tr}{=} \frac{\partial s_{c_j^v}(t-1)}{\partial \omega_{c_j^v m}}\, y_{\theta_j}(t) + g'\big(z_{c_j^v}(t)\big)\, y_{in_j}(t)\, y^{m}(t-1) \tag{13}
$$
$$
\frac{\partial s_{c_j^v}(t)}{\partial \omega_{in_j m}} \overset{tr}{=} \frac{\partial s_{c_j^v}(t-1)}{\partial \omega_{in_j m}}\, y_{\theta_j}(t) + g\big(z_{c_j^v}(t)\big)\, f'_{in_j}\big(z_{in_j}(t)\big)\, y^{m}(t-1) \tag{14}
$$
$$
\frac{\partial s_{c_j^v}(t)}{\partial \omega_{\theta_j m}} \overset{tr}{=} \frac{\partial s_{c_j^v}(t-1)}{\partial \omega_{\theta_j m}}\, y_{\theta_j}(t) + s_{c_j^v}(t-1)\, f'_{\theta_j}\big(z_{\theta_j}(t)\big)\, y^{m}(t-1) \tag{15}
$$
where these partials are initialized to zero at t = 0.
3) RTRL Weight Changes: In backward pass, the RTRL partials are employed to compute weight changes Δωlm for connections to the forget gate, cell and input gate as
$$
\Delta \omega_{c_j^v m}(t) = \alpha\, e_{s_{c_j^v}}(t)\, \frac{\partial s_{c_j^v}(t)}{\partial \omega_{c_j^v m}} \tag{16}
$$
$$
\Delta \omega_{in_j m}(t) = \alpha \sum_{v=1}^{S_j} e_{s_{c_j^v}}(t)\, \frac{\partial s_{c_j^v}(t)}{\partial \omega_{in_j m}} \tag{17}
$$
$$
\Delta \omega_{\theta_j m}(t) = \alpha \sum_{v=1}^{S_j} e_{s_{c_j^v}}(t)\, \frac{\partial s_{c_j^v}(t)}{\partial \omega_{\theta_j m}}. \tag{18}
$$
At each memory cell, the internal state error e_{s_{c_j^v}} is computed as
$$
e_{s_{c_j^v}}(t) \overset{tr}{=} y_{out_j}(t) \left( \sum_{k} \omega_{k c_j^v}\, \delta_k(t) \right). \tag{19}
$$
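The weight changes of (13)–(19) can be sketched as follows for a single-cell block; the dictionary field names and the example numbers are illustrative assumptions, and the internal state error e_s is presumed to have been computed from (19) beforehand.

```python
import numpy as np

def rtrl_weight_changes(p_c, p_in, p_fgt, cache, e_s, alpha):
    """Truncated-RTRL updates for one single-cell block, following (13)-(19).

    p_c, p_in, p_fgt : (M,) running partials ds_c/dw for the cell, input-gate,
                       and forget-gate weights (zero at t = 0)
    cache            : dict with the forward-pass quantities of this step
    e_s              : float, internal state error from (19)
    alpha            : learning rate
    """
    y_m = cache["y_prev"]          # source-unit activations y^m(t-1)
    # (13)-(15): propagate the partials one step forward in time
    p_c   = p_c   * cache["y_fgt"] + cache["g_prime"]    * cache["y_in"]       * y_m
    p_in  = p_in  * cache["y_fgt"] + cache["g_z"]        * cache["f_in_prime"] * y_m
    p_fgt = p_fgt * cache["y_fgt"] + cache["s_prev"]     * cache["f_fgt_prime"] * y_m
    # (16)-(18): weight changes driven by the internal state error
    dW_c, dW_in, dW_fgt = alpha * e_s * p_c, alpha * e_s * p_in, alpha * e_s * p_fgt
    return (p_c, p_in, p_fgt), (dW_c, dW_in, dW_fgt)

M = 4
cache = dict(y_prev=np.ones(M), y_fgt=0.9, y_in=0.5, g_z=0.2,
             g_prime=0.96, f_in_prime=0.25, f_fgt_prime=0.09, s_prev=0.3)
partials = (np.zeros(M), np.zeros(M), np.zeros(M))
partials, deltas = rtrl_weight_changes(*partials, cache, e_s=0.1, alpha=0.01)
```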
In bidirectional recurrent nets, the forward and backward sequences of each training example are regarded as two independent recurrent nets that are connected to the same output layer. Taking the time sequence from t – 1 to t as an example, the outline that combines the bidirectional scheme with the LSTM is as follows (a code sketch follows the list).
1) Forward Pass: Feed all input data of the sequence into the LSTM and decide all the output units.
a) For the forward states (from time t – 1 to t) and backward states (from time t to t – 1), realize the forward pass process in Section II-B;
b) For the output layer, realize the forward pass process in Section II-B.
2) Backward Pass: Compute the relevant partial derivatives of error for the sequence used in the forward pass.
a) For the output neurons, achieve the backward pass process introduced in Section II-C;
b) For the forward states (from time t to t – 1) and backward states (from time t – 1 to t), achieve the backward pass process discussed in Section II-C.
3) Update Weight Changes: Finally, (16) to (19) are used to update RTRL weight changes.
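As a sketch of how the two directions share one output layer in step 1), the snippet below runs an LSTM once over the sequence in its given order and once over the reversed order, then feeds both hidden sequences to a single output layer; the callables and array shapes are assumptions, and the RTRL/BP updates of steps 2) and 3) are not shown.

```python
import numpy as np

def bidirectional_outputs(seq, lstm_fwd, lstm_bwd, W_out):
    """Combine forward and backward LSTM passes at a shared output layer.

    seq      : (T, M) input sequence
    lstm_fwd : callable returning (T, H) hidden outputs for the given order
    lstm_bwd : callable with the same signature, run on the reversed order
    W_out    : (K, 2H) shared output-layer weights
    """
    h_f = lstm_fwd(seq)                 # forward states, t-1 -> t
    h_b = lstm_bwd(seq[::-1])[::-1]     # backward states, t -> t-1, re-aligned in time
    h = np.concatenate([h_f, h_b], axis=1)
    return h @ W_out.T                  # output units y_k(t), as in (9)
```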
In this section, the energy management of the hybrid tracked vehicle (HTV) is formulated as an optimal control problem. The models of the battery pack and the engine-generator set (EGS), together with the optimization objective, are first introduced. To resolve the lack-of-data problem of a complex CPS, a parallel system of the hybrid electric powertrain is then proposed to generate artificial data. The real and artificial driving data constitute the synthesized data, which is trained to approximate the action value function. Finally, the Q-learning algorithm is applied to compute the optimal control action according to the trained data from the bidirectional LSTM network.
The studied complex CPS is a self-built HTV and Fig. 3 depicts the sketch of the powertrain architecture. The main energy sources to propel the powertrain are the EGS and battery [10]. Table I lists the key characteristics of the HTV powertrain.
Symbol | Value | Unit |
Vehicle mass Mv | 2500 | kg |
Inertia of generator Jg | 0.1 | kg·m2 |
Inertia of engine Je | 0.2 | kg·m2 |
Gear ratio parameter ieg | 1.2 | / |
Electromotive force parameter Ke | 0.8092 | Vsrad−2 |
Electromotive force parameter Kx | 0.0005295 | NmA−2 |
Minimum state of charge SOCmin | 0.5 | / |
Maximum state of charge SOCmax | 0.9 | /
Battery capacity Cbat | 37.5 | Ah |
For EGS, the rated engine power is 52 kW at the speed 6200 rpm. The rated generator output power is 40 kW within the speed range from 3000 rpm to 3500 rpm. The generator speed is the first state variable and is computed based on the torque equilibrium restraint
$$
\begin{cases}
\dfrac{d n_g}{dt} = \dfrac{T_e\, i_{eg} - T_g}{0.1047\,\big(J_e\, i_{eg}^2 + J_g\big)} \\[8pt]
n_e = n_g\, i_{eg}
\end{cases} \tag{20}
$$
where ng and ne are the rotational speeds, and Tg and Te are the torques of the generator and engine, respectively. Te is one of the control variables in this work. Je and Jg are the rotational moments of inertia of the engine and generator, respectively. ieg is the gear ratio connecting the generator and engine, and 0.1047 is the conversion factor from r/min to rad/s (1 r/min = 0.1047 rad/s).
The output voltage and torque of the generator are derived as follows:
$$
\begin{cases}
T_g = K_e I_g - K_x I_g^2 \\[4pt]
U_g = K_e n_g - K_x n_g I_g
\end{cases} \tag{21}
$$
where Ke is the electromotive force coefficient, and Ug and Ig are the generator voltage and current, respectively. Kxng is the electromotive force, and Kx = 3PLg/π, in which Lg is the armature synchronous inductance and P is the number of poles.
In the hybrid electric powertrain, SOC of battery is selected as another state variable. The output voltage and derivative of SOC in the battery are depicted via the equivalent first-order model
$$
\begin{cases}
\dfrac{d\,SOC}{dt} = -\dfrac{I_{bat}(t)}{C_{bat}} = -\dfrac{V_{oc} - \sqrt{V_{oc}^2 - 4\, r_{in}\, P_{bat}(t)}}{2\, C_{bat}\, r_{in}} \\[10pt]
U_{bat} = \begin{cases} V_{oc} - I_{bat}\, r_{ch}(SOC), & I_{bat} \ge 0 \\ V_{oc} - I_{bat}\, r_{dis}(SOC), & I_{bat} < 0 \end{cases}
\end{cases} \tag{22}
$$
where Ibat and Cbat are the battery current and capacity, respectively. Pbat is the battery power, rin is the battery internal resistance and Voc is the open circuit voltage. Ubat is the output voltage of battery, rdis(SOC) and rch(SOC) describe the internal resistance during discharging and charging, respectively.
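A discrete-time sketch of (20) and (22) is given below, using the parameters of Table I; the open-circuit voltage, internal resistance, forward-Euler integration, and the conversion of the battery capacity from Ah to A·s are illustrative assumptions, since the paper does not provide those maps and values.

```python
import numpy as np

# Parameters from Table I (units as listed there)
J_e, J_g, i_eg = 0.2, 0.1, 1.2
C_bat_As = 37.5 * 3600.0   # battery capacity converted from Ah to A·s (assumed conversion)

def generator_speed_step(n_g, T_e, T_g, dt):
    """Integrate (20) with a forward-Euler step; n_g and n_e in r/min."""
    dng_dt = (T_e * i_eg - T_g) / (0.1047 * (J_e * i_eg**2 + J_g))
    n_g = n_g + dng_dt * dt
    return n_g, n_g * i_eg            # (n_g, n_e)

def soc_step(soc, P_bat, V_oc, r_in, dt):
    """Integrate the SOC dynamics of (22) with a forward-Euler step."""
    I_bat = (V_oc - np.sqrt(V_oc**2 - 4.0 * r_in * P_bat)) / (2.0 * r_in)
    return soc - I_bat / C_bat_As * dt

n_g, n_e = generator_speed_step(n_g=1200.0, T_e=80.0, T_g=60.0, dt=1.0)
soc = soc_step(soc=0.7, P_bat=5000.0, V_oc=320.0, r_in=0.1, dt=1.0)  # V_oc, r_in assumed
```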
The optimization control goal to be minimized is a trade-off between the charge sustaining constraint and fuel consumption over a finite horizon as
$$
\begin{cases}
J = \displaystyle\int_{t_0}^{t_f} \big[\dot{m}_f(t) + \lambda\, (\Delta SOC)^2\big]\, dt \\[8pt]
\Delta SOC = \begin{cases} SOC(t) - SOC_{ref}, & SOC(t) < SOC_{ref} \\ 0, & SOC(t) \ge SOC_{ref} \end{cases}
\end{cases} \tag{23}
$$
where [t0, tf] denotes the given time horizon, ṁf(t) is the instantaneous fuel consumption rate, λ is a positive weighting factor on the SOC deviation, and SOCref is the reference SOC for charge sustaining.
Furthermore, the instantaneous physical limits need to be observed to guarantee the reliability and safety of the powertrain:
$$
\begin{cases}
SOC_{min} \le SOC(t) \le SOC_{max} \\
n_{g,min} \le n_g(t) \le n_{g,max} \\
T_{e,min} \le T_e(t) \le T_{e,max} \\
n_{e,min} \le n_e(t) \le n_{e,max} \\
P_{dem,min} \le P_{dem}(t) \le P_{dem,max} \\
n_{m,min} \le n_m(t) \le n_{m,max}
\end{cases} \tag{24}
$$
where ne,min, ne,max, Te,min, and Te,max are the permitted lower and upper bounds of the engine speed and torque, respectively. nm is the motor speed, and nm,min and nm,max are its boundary values. Pdem,min and Pdem,max are the bounds of the admissible power demand set, and ng,min and ng,max are defined analogously for the generator speed.
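The discretized stage cost of (23) and the box constraints of (24) can be evaluated as in the following sketch; the weighting factor, reference SOC, and time step are example values, not the ones used in the experiments.

```python
def stage_cost(m_dot_f, soc, soc_ref, lam, dt):
    """Discretized integrand of (23): fuel rate plus charge-sustaining penalty."""
    d_soc = soc - soc_ref if soc < soc_ref else 0.0
    return (m_dot_f + lam * d_soc**2) * dt

def within_limits(x, limits):
    """Check the box constraints of (24); limits maps a name to (min, max)."""
    return all(limits[k][0] <= v <= limits[k][1] for k, v in x.items())

limits = {"SOC": (0.5, 0.9)}   # e.g., the SOC bounds of Table I
print(stage_cost(m_dot_f=1.2, soc=0.65, soc_ref=0.7, lam=100.0, dt=1.0),
      within_limits({"SOC": 0.65}, limits))
```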
Since the core of this article is the PRL technique for a complex CPS, the traction motors are assumed to be power conversion devices with identical efficiency, and battery aging is not considered in this study [9], [10].
Fei-Yue Wang first proposed the parallel system theory in 2004 [28], [29], in which the ACP method was introduced to deal with complex CPS problems. The ACP approach comprises artificial societies (A) for modeling, computational experiments (C) for analysis, and parallel execution (P) for control. An artificial system is usually built by modeling, to explore data and knowledge in the same way as the real system does. By executing independently and complementarily in these two systems, the learning model can be made more efficient and less data-hungry. The ACP approach has been employed in several fields to address different problems in complex CPSs [30]–[32].
For a self-built HTV, there are not sufficient operating environments available to collect enough actual data. Hence, we build an artificial powertrain system in MATLAB/Simulink to address the lack-of-data problem in action value function training. This artificial system combined with the real powertrain system constitutes the parallel system, see Fig. 4(a) as an illustration. Since the steering power is negligible, the speeds of the two tracks are approximated by their average. By taking a few field-test data as guidance and adjusting the parameters of the powertrain model and the environments, a large amount of artificial data is acquired, including SOC, generator speed, engine torque, engine speed, power demand, battery current, battery voltage, and the two motor speeds. The synthesized data from the parallel system is collected and calibrated to derive the optimal EMS using the bidirectional LSTM network and reinforcement learning.
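The artificial powertrain itself is built in MATLAB/Simulink; purely as an illustration of the idea, the sketch below generates perturbed speed profiles around a recorded cycle, which could then be fed through the powertrain model of this section to log artificial state and action trajectories. The smoothing window, noise level, and function names are assumptions rather than the paper's procedure.

```python
import numpy as np

def artificial_cycles(real_speed, n_cycles, noise_std=0.5, seed=0):
    """Generate artificial speed profiles around a recorded driving cycle.

    Each cycle perturbs the recorded profile with smoothed Gaussian noise;
    running the powertrain model on these profiles would yield artificial
    SOC, speed, torque, and power-demand data for the parallel system.
    """
    rng = np.random.default_rng(seed)
    cycles = []
    for _ in range(n_cycles):
        noise = np.convolve(rng.normal(0.0, noise_std, real_speed.size),
                            np.ones(10) / 10.0, mode="same")
        cycles.append(np.clip(real_speed + noise, 0.0, None))
    return cycles

cycles = artificial_cycles(np.linspace(0.0, 10.0, 600), n_cycles=20)
```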
A learning agent interacts with a stochastic environment in the reinforcement learning (RL) framework, and this interaction is modeled as a discrete discounted Markov decision process (MDP). The MDP is expressed as a quintuple (S, A, Π, R, γ), where A and S are the sets of control actions and state variables, Π is the transition probability matrix (TPM), R is the reward function, and γ ∈ (0, 1) is the discount factor.
The value function is defined as the expected future reward
$$
V(s) = E\!\left( \sum_{t=t_0}^{t_f} \gamma^{\,t - t_0}\, r(s_t) \right). \tag{25}
$$
Then, the finite expected discounted and accumulated reward is summarized as the optimal value function
$$
V^*(s) = \min_{\pi} E\!\left( \sum_{t=t_0}^{t_f} \gamma^{\,t - t_0}\, r(s_t) \right) \tag{26}
$$
where π is the control policy, which depicts the control action distribution with the time sequence. To deduce the optimal control action at each time instant, (26) is reformulated recursively as
$$
V^*(s) = \min_{a}\left( r(s,a) + \gamma \sum_{s' \in S} p^{a}_{s,s'}\, V^*(s') \right), \qquad \forall s \in S. \tag{27}
$$
The optimal control policy is determined based on the optimal value function in (27)
$$
\pi^*(s) = \arg\min_{a}\left( r(s,a) + \gamma \sum_{s' \in S} p^{a}_{s,s'}\, V^*(s') \right). \tag{28}
$$
Furthermore, the action value function and its corresponding optimal measure are described as follows:
$$
\begin{cases}
Q(s,a) = r(s,a) + \gamma \displaystyle\sum_{s' \in S} p^{a}_{s,s'}\, Q(s',a') \\[8pt]
Q^*(s,a) = r(s,a) + \gamma \displaystyle\sum_{s' \in S} p^{a}_{s,s'}\, \min_{a'} Q^*(s',a').
\end{cases} \tag{29}
$$
Fig. 5 shows the bidirectional LSTM-based deep reinforcement network utilized to estimate the action value function in RL. This structure includes two deep neural networks, one for the state variables and the other for the control actions.
The inner product is used to compute new Q(st, at) through combining the states and sub-actions neuron output as
$$
Q(s_t, a_t) = \sum_{j=1}^{K_1} \sum_{i=1}^{K_2} Q\big(s_t^j, a_t^i\big) \tag{30}
$$
where K1 and K2 are the numbers of states and sub-actions, respectively, and Q(s_t^j, a_t^i) denotes the action value associated with the jth state element and the ith sub-action.
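Reading the combination in (30) as an outer product of the two branch outputs summed over all state/sub-action pairs, a minimal sketch is given below; this interpretation of the "inner product" combination and the array shapes are assumptions.

```python
import numpy as np

def combined_q(state_out, action_out):
    """Combine state-branch and action-branch outputs into Q(s_t, a_t) per (30).

    state_out  : (K1,) neuron outputs for the K1 state elements
    action_out : (K2,) neuron outputs for the K2 sub-actions
    The outer product gives Q(s_t^j, a_t^i); summing over j and i yields Q(s_t, a_t).
    """
    return np.outer(state_out, action_out).sum()

print(combined_q(np.array([0.2, 0.4]), np.array([0.1, 0.3])))
```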
Finally, the action value function corresponding to an optimal control policy can be computed using the Q-learning algorithm as [33]
$$
Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \mu\left( r(s_t, a_t) + \gamma \min_{a'} Q(s'_t, a'_t) - Q(s_t, a_t) \right) \tag{31}
$$
where μ is the learning rate (decaying factor).
Algorithm 1 describes the pseudo-code of the Q-learning algorithm. The discount factor γ is set to 0.96, and the decaying factor μ is a function of the time instant k.
Algorithm 1: Q-learning Algorithm
1. Extract Q(s, a) from training and initialize iteration number Nit
2. Repeat time instant k = 1, 2, 3, …
3. Based on Q(s, .), choose action a (ε-greedy policy)
4. Execute action a and observe r, s'
5. Define a* = arg mina Q(s', a)
6. Q(s, a)←Q(s, a) + μ(r(s, a) + γmin a' Q(s', a') – Q(s, a))
7. s←s'
8. until s is terminal
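A tabular sketch of Algorithm 1 in its cost-minimizing form is given below; the ε-greedy exploration parameter, the 1/k decay of μ, and the reward/transition callables are assumptions made for illustration, since the paper specifies γ = 0.96 but not the exact μ schedule.

```python
import numpy as np

def q_learning(Q, reward, transition, n_actions, gamma=0.96, epsilon=0.1,
               n_iter=5000, seed=0):
    """Tabular Q-learning following Algorithm 1 (cost-minimizing form).

    Q          : (n_states, n_actions) table initialized from the trained network
    reward     : callable r(s, a) -> float (fuel cost plus SOC penalty)
    transition : callable (s, a) -> s'
    """
    rng = np.random.default_rng(seed)
    s = 0
    for k in range(1, n_iter + 1):
        # epsilon-greedy over costs: explore, otherwise take the cost-minimizing action
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmin(Q[s]))
        r, s_next = reward(s, a), transition(s, a)
        mu = 1.0 / k                                   # decaying learning rate (assumed form)
        Q[s, a] += mu * (r + gamma * Q[s_next].min() - Q[s, a])
        s = s_next
    return Q
```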
The TPMs of the power demand and the vehicle model are inputs to the RL technique for computing the optimal EMS. The RL algorithm is realized in MATLAB via the Markov decision process (MDP) toolbox presented in [34] on a microprocessor with an Intel quad-core CPU.
The proposed bidirectional LSTM-enabled PRL-based energy management strategy (EMS) is assessed on the self-built HTV powertrain in this section. First, data collection and processing are introduced in detail. We operate the HTV in real scenarios to collect real driving data. Based on these data, we generate synthesized data from the parallel system for action value function estimation, including all the states and control variables. Then, the presented PRL-based EMS is compared with the conventional RL and deep RL approaches to evaluate its availability and optimality. Simulation results indicate that the proposed strategy is superior to the two benchmark techniques in control performance.
The real vehicle experiment is implemented on the self-built HTV in a suburb to represent cross-country scenarios, and the real and target driving cycles are depicted in Fig. 6. The vehicle data and powertrain states are collected at a fixed sampling frequency.
Furthermore, to eliminate the influence of different variable units on training, the input state variables and control actions of the network are scaled to the range from 0 to 1.
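A simple min–max scaling of the training matrix to [0, 1], as described above, can be sketched as follows; storing the column-wise extremes so that new data can be rescaled consistently is an implementation assumption.

```python
import numpy as np

def min_max_scale(x):
    """Scale each column of the training matrix to [0, 1]; return the scaled
    data plus the column-wise extremes needed to rescale new samples."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / (x_max - x_min), (x_min, x_max)

scaled, (lo, hi) = min_max_scale([[0.6, 1200.0], [0.9, 3000.0]])
```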
Based on the trained action-value function, the proposed bidirectional LSTM enabled PRL-based EMS is compared with the conventional RL and deep RL controls to certify its availability and optimality in this section. In the energy management problem, the simulation cycle is a real vehicle driving cycle, and the initial values of the state variable SOC and generator speed are 0.7 and 1200 rpm, respectively.
The SOC trajectories for a certain driving cycle and the corresponding generator speed are illustrated in Fig. 8. It can be discerned that the SOC trajectory under the proposed model-free EMS is close to that under deep RL control, and both differ from that under conventional RL control. This can be explained by the different power split between the EGS and battery, which is decided by the action value functions. It demonstrates that the training process in the deep neural network can improve the accuracy and optimality of the control policy derived by the Q-learning algorithm. An analogous result for the generator speed trajectory is also shown in Fig. 8.
Taking engine torque as an example, the above observation can be explained by the different distribution of engine torque with the state variables. Being a control variable, different values of engine torque decide multiple operative modes of the powertrain, as shown in Fig. 9.
The convergence processes of the action value function in the proposed EMS, conventional RL, and deep RL are illustrated in Fig. 10. The mean discrepancy depicts the deviation between two action value functions per 100 iterations. Note that the mean discrepancy decreases as the number of iterations increases, which reflects the convergence characteristic of the Q-learning algorithm.
Fig. 10 also shows that the proposed control is superior to the conventional RL and deep RL controls in control performance, although its convergence rate is slightly slower. This can be explained by the additional training process of the action value function in the bidirectional LSTM network. With acceptable computation time, the proposed EMS adapts to real-time driving conditions more readily than the conventional RL and deep RL controls, which demonstrates its availability.
Table II lists the fuel consumption after SOC correction and the computation time of the three control strategies. The fuel consumption under the PRL-enabled EMS is lower than those of the conventional RL-based and deep RL controls, which demonstrates its optimality. Also, the time consumed by PRL is lower than that of deep RL and conventional RL, which implies its potential for real-time application.
Algorithms | Consumed fuel (g) | Time cost (s) |
PRL | 416.8 | 46.8 |
Deep RL | 441.3 | 61.5 |
Conventional RL | 465.7 | 53.7 |
We propose a novel bidirectional LSTM network based PRL framework to construct the EMS for an HTV in this paper. First, the upper level builds an artificial vehicle powertrain system analogous to the real vehicle to constitute the parallel powertrain system. Second, a bidirectional LSTM network is proposed to train the large synthesized data from this parallel system and to represent the dependence between multiple actions and states. Third, in the lower level, a model-free RL algorithm is used to compute the adaptive control strategy based on the trained data.
Tests prove the optimality and availability of the proposed energy management strategy. In addition, the advantages in control performance and energy efficiency imply that the proposed adaptive control can be applied in real situations.
The proposed combination of bidirectional LSTM network and RL is indeed a simplified specification of the so-called parallel learning [35] which aims to build a more general framework for data-driven intelligent control. Future work focuses on applying the parallel learning and PRL framework into different research fields of automated vehicles, such as driving style recognition [36], braking intensity estimation [37], [38], and lane changing intention prediction [39], [40]. The parallel system could generate abundant driving data and evaluate the performance of different controllers easily.
[1] F.-Y. Wang, "The emergence of intelligent enterprises: from CPS to CPSS," IEEE Trans. Intell. Transp. Syst., vol. 25, no. 4, pp. 85–88, 2010.
[2] F.-Y. Wang, "Control 5.0: from Newton to Merton in Popper's cyber-social-physical spaces," IEEE/CAA J. Autom. Sinica, vol. 3, no. 3, pp. 233–234, 2016. doi: 10.1109/JAS.2016.7508796
[3] X. L. Tang, X. S. Hu, W. Yang, and H. S. Yu, "Novel torsional vibration modeling and assessment of a power-split hybrid electric vehicle equipped with a dual mass flywheel," IEEE Trans. Veh. Technol., vol. 67, no. 3, pp. 1900–2000, 2018.
[4] T. Liu, X. S. Hu, W. H. Hu, and Y. Zou, "A heuristic planning reinforcement learning-based energy management for power-split plug-in hybrid electric vehicles," IEEE Trans. Industrial Informatics, Mar. 2019.
[5] T. Liu, X. S. Hu, S. E. Li, and D. P. Cao, "Reinforcement learning optimized look-ahead energy management of a parallel hybrid electric vehicle," IEEE/ASME Trans. Mechatronics, vol. 22, no. 4, pp. 1497–1507, 2017. doi: 10.1109/TMECH.2017.2707338
[6] Y. Zou, T. Liu, D. X. Liu, and F. C. Sun, "Reinforcement learning-based real-time energy management for a hybrid tracked vehicle," Applied Energy, vol. 171, pp. 372–382, 2016. doi: 10.1016/j.apenergy.2016.03.082
[7] C. Lv, Y. H. Liu, X. S. Hu, H. Guo, D. P. Cao, and F.-Y. Wang, "Simultaneous observation of hybrid states for cyber-physical systems: a case study of electric vehicle powertrain," IEEE Trans. Cybernetics, vol. 48, no. 8, pp. 2357–2367, 2018.
[8] X. S. Hu, H. Wang, and X. L. Tang, "Cyber-physical control for energy-saving vehicle following with connectivity," IEEE Trans. Indus. Electron., vol. 64, no. 11, pp. 8578–8587, 2017.
[9] Y. Zou, Z. H. Kong, T. Liu, and D. X. Liu, "A real-time Markov chain driver model for tracked vehicles and its validation: its adaptability via stochastic dynamic programming," IEEE Trans. Veh. Technol., vol. 66, no. 5, pp. 3571–3582, 2017.
[10] T. Liu, Y. Zou, D. X. Liu, and F. C. Sun, "Reinforcement learning of adaptive energy management with transition probability for a hybrid electric tracked vehicle," IEEE Trans. Ind. Electron., vol. 62, no. 12, pp. 7837–7846, 2015.
[11] C. M. Martinez, X. S. Hu, D. P. Cao, E. Velenis, B. Gao, and M. Wellers, "Energy management in plug-in hybrid electric vehicles: recent progress and a connected vehicles perspective," IEEE Trans. Veh. Technol., vol. 66, no. 6, pp. 4534–4549, 2017. doi: 10.1109/TVT.2016.2582721
[12] Y. C. Qin, F. Zhao, Z. F. Wang, L. Gu, and M. M. Dong, "Comprehensive analysis for influence of controllable damper time delay on semi-active suspension control strategies," J. Vibration and Acoustics-Trans. ASME, vol. 139, no. 3, pp. 031006-1–031006-12, 2017. doi: 10.1115/1.4035700
[13] T. Liu, B. Wang, and C. L. Yang, "Online Markov chain-based energy management for a hybrid tracked vehicle with speedy Q-learning," Energy, vol. 160, pp. 544–555, 2018. doi: 10.1016/j.energy.2018.07.022
[14] H. S. Ramadan, M. Becherif, and F. Claude, "Energy management improvement of hybrid electric vehicles via combined GPS/rule-based methodology," IEEE Trans. Autom. Sci. Eng., vol. 14, no. 2, pp. 586–597, 2017. doi: 10.1109/TASE.2017.2650146
[15] K. Li, F. C. Chou, and J. Y. Yen, "Real-time, energy-efficient traction allocation strategy for the compound electric propulsion system," IEEE/ASME Trans. Mechatronics, vol. 22, no. 3, pp. 1371–1380, 2017. doi: 10.1109/TMECH.2017.2667725
[16] M. Muratori and G. Rizzoni, "Residential demand response: dynamic energy management and time-varying electricity pricing," IEEE Trans. Power Syst., vol. 31, no. 2, pp. 1108–1117, 2016. doi: 10.1109/TPWRS.2015.2414880
[17] S. Delprat, T. Hofman, and S. Paganelli, "Hybrid vehicle energy management: singular optimal control," IEEE Trans. Veh. Technol., vol. 66, no. 6, pp. 9654–9666, 2017. doi: 10.1109/TVT.2017.2746181
[18] L. L. Guo, B. Z. Gao, Q. F. Liu, J. H. Tang, and H. Chen, "On-line optimal control of the gearshift command for multispeed electric vehicles," IEEE/ASME Trans. Mechatronics, vol. 22, no. 4, pp. 1519–1530, 2017. doi: 10.1109/TMECH.2017.2716340
[19] J. H. Han, D. Kum, and Y. Park, "Synthesis of predictive equivalent consumption minimization strategy for hybrid electric vehicles based on closed-form solution of optimal equivalence factor," IEEE Trans. Veh. Technol., 2017.
[20] P. Nyberg, E. Frisk, and L. D. Nielsen, "Using real-world driving databases to generate driving cycles with equivalence properties," IEEE Trans. Veh. Technol., vol. 65, no. 6, pp. 4095–4105, Jun. 2016. doi: 10.1109/TVT.2015.2502069
[21] T. Liu, X. L. Tang, H. Wang, H. Yu, and X. S. Hu, "Adaptive hierarchical energy management design for a plug-in hybrid electric vehicle," IEEE Trans. Veh. Technol., Jul. 2019.
[22] T. Liu, Y. Zou, D. X. Liu, and F. C. Sun, "Reinforcement learning-based energy management strategy for a hybrid electric tracked vehicle," Energies, vol. 8, no. 7, pp. 7243–7260, 2015. doi: 10.3390/en8077243
[23] M. Deniša, A. Gams, A. Ude, and T. Petric, "Learning compliant movement primitives through demonstration and statistical generalization," IEEE/ASME Trans. Mechatronics, vol. 21, no. 5, pp. 2581–2594, 2017.
[24] V. Mnih, K. Kavukcuoglu, D. Silver, and A. Graves, "Playing Atari with deep reinforcement learning," arXiv preprint, arXiv:1312.5602, 2013.
[25] M. Hagan, H. Demuth, M. Beale, and O. De Jess, Neural Network Design, Boston, MA: Martin Hagan, 2014.
[26] F. A. Gers, N. N. Schraudolph, and J. Schmidhuber, "Learning precise timing with LSTM recurrent networks," J. Machine Learning Research, vol. 3, no. 1, pp. 115–143, 2002.
[27] L. Li, S. X. You, C. Yang, B. J. Yan, J. Song, and Z. Chen, "Driving-behavior-aware stochastic model predictive control for plug-in hybrid electric buses," Appl. Energy, vol. 162, pp. 868–879, 2016. doi: 10.1016/j.apenergy.2015.10.152
[28] F.-Y. Wang, "Artificial societies, computational experiments, and parallel systems: a discussion on computational theory of complex social-economic systems," Complex Syst. Complex. Sci., vol. 1, no. 4, pp. 25–35, Oct. 2004.
[29] F.-Y. Wang, "Toward a paradigm shift in social computing: the ACP approach," IEEE Intell. Syst., vol. 22, no. 5, pp. 65–67, Sept.–Oct. 2007. doi: 10.1109/MIS.2007.4338496
[30] F.-Y. Wang, "Parallel control and management for intelligent transportation systems: concepts, architectures, and applications," IEEE Trans. Intell. Transp. Syst., vol. 11, no. 3, pp. 630–638, Sep. 2010. doi: 10.1109/TITS.2010.2060218
[31] F.-Y. Wang and S. N. Tang, "Artificial societies for integrated and sustainable development of metropolitan systems," IEEE Intell. Syst., vol. 19, no. 4, pp. 82–87, Jul.–Aug. 2004. doi: 10.1109/MIS.2004.22
[32] F.-Y. Wang, H. G. Zhang, and D. R. Liu, "Adaptive dynamic programming: an introduction," IEEE Comput. Intell. Magazine, vol. 4, no. 2, pp. 39–47, Jun. 2009.
[33] T. Liu, H. L. Yu, H. Y. Guo, Y. C. Qin, and Y. Zou, "Online energy management for multimode plug-in hybrid electric vehicles," IEEE Trans. Industrial Informatics, vol. 15, no. 7, pp. 4352–4361, Jul. 2019.
[34] P. Shan, R. Li, S. H. Ning, and Q. Yang, "Markov decision process toolbox," in Proc. IEEE Int. Workshop on Open-Source Software for Scientific Computation (OSSC), Sep. 2009.
[35] L. Li, Y. L. Lin, N. N. Zheng, and F.-Y. Wang, "Parallel learning: a perspective and a framework," IEEE/CAA J. Autom. Sinica, vol. 4, no. 3, pp. 389–395, 2017. doi: 10.1109/JAS.2017.7510493
[36] C. Lv, X. S. Hu, A. Sangiovanni-Vincentelli, Y. T. Li, C. M. Martinez, and D. P. Cao, "Driving-style-based codesign optimization of an automated electric vehicle: a cyber-physical system approach," IEEE Trans. Indus. Electron., vol. 66, no. 4, pp. 2965–2975, 2018.
[37] C. Lv, Y. Xing, C. Lu, Y. H. Liu, H. Y. Guo, H. B. Gao, and D. P. Cao, "Hybrid-learning-based classification and quantitative inference of driver braking intensity of an electrified vehicle," IEEE Trans. Veh. Technol., vol. 67, no. 7, pp. 5718–5729, 2018.
[38] C. Lv, Y. Xing, J. Z. Zhang, X. X. Na, Y. T. Li, T. Liu, D. P. Cao, and F.-Y. Wang, "Levenberg-Marquardt backpropagation training of multilayer neural networks for state estimation of a safety-critical cyber-physical system," IEEE Trans. Industrial Informatics, vol. 14, no. 8, pp. 3436–3446, 2017.
[39] Y. Xing, C. Lv, H. J. Wang, D. P. Cao, E. Velenis, and F.-Y. Wang, "Driver lane change intention inference for intelligent vehicles: framework, survey, and challenges," IEEE Trans. Veh. Technol., vol. 68, no. 5, pp. 4377–4390, 2019.
[40] T. Liu and X. S. Hu, "A bi-level control for energy efficiency improvement of a hybrid tracked vehicle," IEEE Trans. Industrial Informatics, vol. 14, no. 4, pp. 1616–1625, 2018. doi: 10.1109/TII.2018.2797322