Teng Liu, Bin Tian, Yunfeng Ai, Li Li, Dongpu Cao and Fei-Yue Wang, "Parallel Reinforcement Learning: A Framework and Case Study," IEEE/CAA J. Autom. Sinica, vol. 5, no. 4, pp. 827-835, July 2018. doi: 10.1109/JAS.2018.7511144

Parallel Reinforcement Learning: A Framework and Case Study

doi: 10.1109/JAS.2018.7511144
Funds:

the National Natural Science Foundation of China 61503380

the Natural Science Foundation of Guangdong Province, China 2015A030310187

  • In this paper, a new machine learning framework, called parallel reinforcement learning, is developed for complex system control. To overcome the data deficiency of current data-driven algorithms, a parallel system is built to improve the complex learning system by self-guidance. Based on Markov chain (MC) theory, we combine transfer learning, predictive learning, deep learning and reinforcement learning to tackle the data and action processes and to express the knowledge. The parallel reinforcement learning framework is formulated, and several case studies for real-world problems are introduced.

     

  • Machine learning, especially deep reinforcement learning (DRL), has experienced ultrafast development in recent years [1], [2]. Whether in traditional visual detection [3], dexterous manipulation in robotics [4], energy efficiency improvement [5], object localization [6], Atari games [7], [8], Leduc poker [9], the Doom game [10] or text-based games [11], these data-driven learning approaches show great potential in improving performance and accuracy. However, several issues still impede researchers from applying DRL to real complex system problems.

    One issue is the lack of generalization capability to new goals [3]. DRL agents need to collect new data and learn new model parameters for each new target, and retraining the learning model is computationally expensive. Hence, the limited data must be utilized well so that the learning system can accommodate new environments.

    Another issue is data inefficiency [8]. Acquiring large-scale action and interaction data from real complex systems is arduous, and it is very difficult for learning systems to explore control policies entirely by themselves. Thus, it is necessary to create a large number of observations for action and knowledge from the historically available data.

    The final issue is data dependency and distribution. In practical systems, the dependency among data samples is often uncertain and the probability distribution usually varies. It is therefore hard for DRL agents to consider the state, action and knowledge of a learning system in an integrated way.

    To address these difficulties, we develop a new parallel reinforcement learning framework for complex system control in this paper. We construct an artificial system analogous to the real system via modelling to constitute a parallel system. Based on Markov chain (MC) theory, transfer learning, predictive learning, deep learning and reinforcement learning are combined to tackle the data and action processes and to express knowledge. Furthermore, several application cases of parallel reinforcement learning are introduced to illustrate its usability. Note that the technique proposed in this paper can be regarded as a specification of the parallel learning framework in [12].

    Fei-Yue Wang first proposed parallel system theory in 2004 [13], [14]. In [13] and [14], the ACP method was proposed to deal with complex system problems. The ACP approach comprises artificial societies (A) for modelling, computational experiments (C) for analysis, and parallel execution (P) for control. An artificial system is usually built by modelling to explore the data and knowledge as the real system does. Through executing independently and complementarily in these two systems, the learning model can be made more efficient and less data-hungry. The ACP approach has been applied in several fields to address different problems in complex systems [15]-[17].

    Transfer learning focuses on storing knowledge gained while solving one problem and applying it to a different but related problem. Taking vehicle driving cycles as an example, we introduce mean tractive force (MTF) components to achieve an equivalent transformation of them. By transferring the limited data via the MTF, the generalization capability problem can be relieved.

    Predictive learning tries to use prior knowledge to build a model of the environment by trying out different actions in various circumstances. Taking power demand as an example, we introduce a fuzzy encoding predictor to forecast the future power demand over different time steps. Based on the MC, historically available data can be used to alleviate data inefficiency.

    Deep learning learns data representations using multiple layers of nonlinear processing units, with supervised or unsupervised learning of the feature representations in each layer. Reinforcement learning is concerned with how agents ought to take actions in an environment so as to maximize some notion of cumulative reward. The main contribution of this paper is combining the parallel system with transfer learning, predictive learning, deep learning and reinforcement learning to formulate the parallel reinforcement learning framework and address the data dependency and distribution problems in real-world complex systems.

    The rest of this paper is organized as follows. Section Ⅱ introduces the parallel reinforcement learning framework and relevant components, then several case studies for real-world complex system problems are described in Section Ⅲ. Finally, we conclude the paper in Section Ⅳ.

    The purpose of parallel reinforcement learning is to build a closed loop of data and knowledge in the parallel system that determines the next operation in each system, as shown in Fig. 1. The data represent the inputs and parameters of the artificial and real systems. The knowledge means the records mapping the state space to the action space, which we call experience in the real system and policy in the artificial system. The experience can be used to rectify the artificial model, and the updated policy is utilized to guide the real actor along with feedback from the environment.

    Figure  1.  Parallel reinforcement learning framework.

    Cyber-physical systems have attracted increasing attention over the past two decades for their potential to fuse computational processes with the physical world. Furthermore, cyber-physical-social systems (CPSS) augment cyber-physical system capacity by integrating human and social characteristics to achieve more effective design and operation [18]. The ACP-driven parallel system framework is depicted in Fig. 2. The integration of the real and artificial systems as a whole is called a parallel system.

    Figure  2.  ACP-driven parallel system framework.

    In this framework, the physically-defined real system interacts with the software-defined artificial system via three coupled modules within the CPSS: control and management, experiment and evaluation, and learning and training. The first module acts as the decision maker in the two systems, the second as the evaluator, and the third as the learning controller.

    ACP = Artificial societies + Computational experiments + Parallel execution. The artificial system is often constructed by descriptive learning based on observations of the real system, enabled by developments in information and communication technologies. It can help the learning controller store more computing results and make more flexible decisions. Thus, the artificial system is parallel to the real system and runs asynchronously to stabilize the learning process and extend the learning capability.

    In the computational experiment stage, the specifications of transfer learning, predictive learning and deep learning are formulated using MC theory, as discussed later. For the parallel system, combining these learning processes with reinforcement learning yields parallel reinforcement learning, which derives the experience and policy and clarifies the interaction between them. For a general parallel intelligent system, such knowledge can be applied to different tasks because the learning controller can handle several tasks via rational reasoning [19].

    Finally, parallel execution between the artificial and real systems is expected to enable optimal operation of both systems [20]. Although the artificial system is built from prior data of the real system, it is rectified and improved by further observation. The continuously updated knowledge in the artificial system is, in turn, used to instruct the real system operation in an efficient way. Owing to this communication of data and knowledge through parallel execution, the two systems improve each other by self-guidance.
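    To make this loop concrete, the following minimal Python sketch outlines the parallel execution cycle; the `real_sys`, `artificial_sys` and `policy` objects are hypothetical placeholders introduced only for illustration, not components defined in the paper:

```python
# Minimal sketch of the parallel-execution loop between the real and artificial
# systems. The object interfaces below are illustrative placeholders only.

def parallel_execution(real_sys, artificial_sys, policy, iterations=100):
    """Closed loop of data and knowledge in the parallel system."""
    for _ in range(iterations):
        # The current policy guides the real actor; the run yields experience
        # (state-to-action records plus environmental feedback).
        experience = real_sys.run(policy)

        # Experience rectifies and improves the artificial model.
        artificial_sys.update_model(experience)

        # The artificial system explores cheaply and returns an updated policy.
        policy = artificial_sys.improve_policy(policy)
    return policy
```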

    In this paper, we choose driving cycles as an example to introduce transfer learning; the approach can be easily generalized to other data in the MC domain. A general driving cycle transformation methodology based on the mean tractive force (MTF) components is introduced in this section. This transformation converts the existing driving cycle database into an equivalent one with a real MTF value to relieve the data scarcity problem.

    MTF is defined as the tractive energy divided by the distance traveled over a whole driving cycle, integrated over the entire time interval $[0, T]$ as follows:

$$\bar{F} = \frac{1}{x_T}\int_{0}^{T} F(t)\,v(t)\,dt \tag{1}$$

    where $x_T$ is the total distance traveled in a certain driving cycle, calculated as $x_T = \int_{0}^{T} v(t)\,dt$, and $v$ is the vehicle speed of that driving cycle. $F$ is the longitudinal force propelling the vehicle, computed as

$$\begin{cases} F = F_a + F_r + F_m \\ F_a = \dfrac{1}{2}\rho_a C_d A v^2, \quad F_r = M_v g f, \quad F_m = M_v a \end{cases} \tag{2}$$

    where $F_a$ is the aerodynamic drag, $F_r$ the rolling resistance and $F_m$ the inertial force. $\rho_a$ is the air density, $C_d$ the aerodynamic coefficient, and $A$ the frontal area. $M_v$ is the curb weight, $g$ the gravitational acceleration, $f$ the rolling friction coefficient and $a$ the acceleration.

    The vehicle operating modes are divided into traction, coasting, braking and idling according to the force imposed on the vehicle powertrain [21]. Hence, the time interval is partitioned as

$$\begin{cases} T = T_{tr} \cup T_{co} \cup T_{br} \cup T_{id} \\ T_{tr} = \{t \mid F(t) > 0,\ v(t) \neq 0\}, \quad T_{co} = \{t \mid F(t) = 0,\ v(t) \neq 0\} \\ T_{br} = \{t \mid F(t) < 0,\ v(t) \neq 0\}, \quad T_{id} = \{t \mid v(t) = 0\} \end{cases} \tag{3}$$

    where $T_{tr}$ and $T_{co}$ are the traction-mode and coasting-mode regions, respectively, $T_{br}$ represents the braking region and $T_{id}$ is the idling set.

    From (3), it is obvious that the powertrain provides positive power to the wheels only in the traction region. The MTF in (1) is then specialized as follows:

$$\bar{F} = \frac{1}{x_L}\int_{t \in T_{tr}} F(t)\,v(t)\,dt = \bar{F}_a + \bar{F}_r + \bar{F}_m. \tag{4}$$

    Then, the MTF components $(\alpha, \beta, \gamma)$ are statistical characteristic measures of a driving cycle, defined as [22]

$$\begin{cases} \alpha = \dfrac{\bar{F}_a}{\frac{1}{2}\rho_a C_d A} = \dfrac{1}{x_L}\displaystyle\int_{t \in T_{tr}} v^3(t)\,dt \\[2mm] \beta = \dfrac{\bar{F}_r}{M_v g f} = \dfrac{1}{x_L}\displaystyle\int_{t \in T_{tr}} v(t)\,dt \\[2mm] \gamma = \dfrac{\bar{F}_m}{M_v} = \dfrac{1}{x_L}\displaystyle\int_{t \in T_{tr}} a(t)\,v(t)\,dt. \end{cases} \tag{5}$$

    Note that the MTF components are determined by the speed and acceleration of a specific driving cycle. These measures are employed as the constraints for driving cycle transformation.
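    As a numerical illustration of (1)-(5), the following sketch computes the MTF components from a sampled speed trace; the vehicle parameters are illustrative placeholders, not values taken from the paper or [21], [22]:

```python
import numpy as np

def mtf_components(v, dt, rho_a=1.2, Cd=0.3, A=2.3, Mv=1500.0, g=9.81, f=0.012):
    """Compute the MTF components (alpha, beta, gamma) of a driving cycle per (1)-(5).

    v  : sampled vehicle speed [m/s]
    dt : sampling interval [s]
    The vehicle parameters are illustrative defaults, not values from the paper.
    """
    a = np.gradient(v, dt)                                    # acceleration
    F = 0.5 * rho_a * Cd * A * v**2 + Mv * g * f + Mv * a     # longitudinal force, (2)
    x_total = np.sum(v) * dt                                  # distance travelled

    traction = (F > 0) & (v > 0)                              # traction region T_tr, (3)
    alpha = np.sum(v[traction]**3) * dt / x_total             # aerodynamic measure
    beta = np.sum(v[traction]) * dt / x_total                 # rolling-resistance measure
    gamma = np.sum(a[traction] * v[traction]) * dt / x_total  # inertial measure
    return alpha, beta, gamma

# Example: a synthetic 60 s speed trace sampled at 1 Hz
t = np.arange(0, 60, 1.0)
v = 10.0 + 5.0 * np.sin(0.1 * t)
print(mtf_components(v, dt=1.0))
```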

    By definition, the MTF is unique for a specific driving cycle; thus, inequality and equality constraints are employed to determine the transferred driving cycle. A cost function can be defined by the designer to choose an optimal equivalent cycle from the set of feasible solutions. The transformation is formulated as a nonlinear program (NLP) as

$$\begin{aligned} \min_{\tilde{v}} \quad & f(\tilde{v}) \\ \text{s.t.} \quad & g_i(\tilde{v}, T_{tr}, \alpha, \beta \ \text{or}\ \gamma) = 0, \quad i = 1, 2, 3 \\ & h_1(\tilde{v}, T_{tr}, v_{coast}) < 0 \\ & h_2(\tilde{v}, T_{co} \cup T_{br}, v_{coast}) \geq 0 \end{aligned} \tag{6}$$

    where $\tilde{v}$ is the transferred driving cycle, $(\alpha, \beta, \gamma)$ are the target MTF components, $v_{coast}$ is the vehicle coasting speed, and $g_i$ and $h_j$ are the constraints. Through this process, the transferred driving cycle related to the real conditions can be determined and afterwards used for other operations, such as control and management [21], [22]; see Fig. 3 for an illustration.

    Figure  3.  Transfer learning for driving cycles transformation.

    The purpose of transfer learning is to convert historically available data into equivalent data to expand the database. The transferred data are strongly associated with the real environment and can therefore be used to generate adaptive control and operations in complex systems, so as to solve the generalization capability and data-hunger problems.

    Taking the power demand of a vehicle as an example, we introduce predictive learning to forecast the future power demand based on the observed data and processes in the parallel system. A better understanding of the real system can then be obtained and applied to update the artificial system from these new experiences. A power demand prediction technique based on the fuzzy encoding predictor (FEP) is illustrated in this section. This approach can also be used to draw more future knowledge from experience for other parameters in complex systems.

    Power demand is modelled as a finite-state MC [23] and described as $P_{dem} = \{p_j \mid j = 1, \ldots, M\} \subset X$, where $X \subset \mathbb{R}$ is bounded. The transition probability of the power demand is calculated by the maximum likelihood estimator as

$$\begin{cases} \pi_{ij} = P(p^{+} = p_j \mid p = p_i) = \dfrac{N_{ij}}{N_i} \\ N_i = \displaystyle\sum_{j=1}^{M} N_{ij} \end{cases} \tag{7}$$

    where $\pi_{ij}$ is the transition probability from $p_i$ to $p_j$, and $p$ and $p^{+}$ are the present and next one-step-ahead power demands, respectively. Furthermore, $N_{ij}$ is the transition count from $p_i$ to $p_j$, and $N_i$ is the total transition count initiated from $p_i$.
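    As a minimal illustration of (7), the following NumPy sketch estimates the TPM from a sequence of quantized power-demand levels (the quantization into $M$ levels is assumed to have been done beforehand):

```python
import numpy as np

def estimate_tpm(state_sequence, M):
    """Maximum likelihood estimate of the transition probability matrix, per (7).

    state_sequence : sequence of integer state indices in {0, ..., M-1}
                     (the quantized power-demand levels).
    """
    counts = np.zeros((M, M))
    for i, j in zip(state_sequence[:-1], state_sequence[1:]):
        counts[i, j] += 1                       # N_ij: transitions from p_i to p_j
    totals = counts.sum(axis=1, keepdims=True)  # N_i
    # States never visited get a uniform row to avoid division by zero.
    return np.divide(counts, totals,
                     out=np.full_like(counts, 1.0 / M), where=totals > 0)
```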

    All elements $\pi_{ij}$ constitute the transition probability matrix $\Pi$. For the fuzzy encoding technique, $X$ is divided into a finite set of fuzzy subsets $\Phi_j,\ j = 1, \ldots, M$, where $\Phi_j$ is a pair $(X, \mu_j(\cdot))$ and $\mu_j(\cdot)$ is a Lebesgue-measurable membership function defined as

$$\mu_j : X \to [0, 1] \quad \text{s.t.} \quad \forall p \in X,\ \exists j,\ 1 \leq j \leq M,\ \mu_j(p) > 0 \tag{8}$$

    where $\mu_j(p)$ reflects the membership degree of $p \in X$ in $\mu_j$. Note that a continuous state $p \in X$ in the fuzzy encoding may be associated with several states $p_j$ of the underlying finite-state MC model [24].

    Two transformations are involved in the FEP. The first transformation allocates an $M$-dimensional possibility (not probability) vector to each $p \in X$ as

$$\tilde{O}^{T}(p) = \mu^{T}(p) = [\mu_1(p), \mu_2(p), \ldots, \mu_M(p)]. \tag{9}$$

    This transformation is called fuzzification and maps the power demand in the space $X$ to a vector in the $M$-dimensional possibility vector space $\tilde{X}$. Note that the elements of the possibility vector $\tilde{O}(p)$ need not sum to 1.

    The second transformation is the proportional possibility-to-probability transformation, in which the possibility vector $\tilde{O}(p)$ is converted into a probability vector $O(p)$ by normalization [23], [24]:

$$O(p) = \frac{\tilde{O}(p)}{\sum_{j=1}^{M} \tilde{O}_j(p)} \tag{10}$$

    which maps $\tilde{X}$ to an $M$-dimensional probability vector space $\bar{X}$. The element $\pi_{ij}$ of the transition probability matrix (TPM) $\Pi$ is interpreted as the transition probability between $\Phi_i$ and $\Phi_j$. To decode vectors in $\bar{X}$ back to $X$, the probability distribution $O^{+}(p)$ is used to aggregate the membership functions $\mu(p)$ and encode the probability vector of the next state in $X$:

$$w^{+}(p) = (O^{+}(p))^{T}\mu(p) = (O(p))^{T}\,\Pi\,\mu(p). \tag{11}$$

    The expected value over the possibility vector leads to the next one-step ahead power demand in FEP:

$$\begin{cases} p^{+} = \displaystyle\int_X w^{+}(y)\,y\,dy \Big/ \displaystyle\int_X w^{+}(y)\,dy \\[2mm] \displaystyle\int_X w^{+}(y)\,y\,dy = \sum_{i=1}^{M} O_i(p)\sum_{j=1}^{M}\pi_{ij}\int_X y\,\mu_j(y)\,dy \\[2mm] \displaystyle\int_X w^{+}(y)\,dy = \sum_{i=1}^{M} O_i(p)\sum_{j=1}^{M}\pi_{ij}\int_X \mu_j(y)\,dy. \end{cases} \tag{12}$$

    The centroid and volume of the membership function $\mu_j(\cdot)$ are expressed as

$$\begin{cases} \bar{c}_j = \dfrac{\int_X y\,\mu_j(y)\,dy}{\int_X \mu_j(y)\,dy} \\[2mm] V_j = \displaystyle\int_X \mu_j(y)\,dy. \end{cases} \tag{13}$$

    Thus, (12) is reformulated as

$$p^{+} = \frac{\sum_{i=1}^{M} O_i(p)\sum_{j=1}^{M}\pi_{ij}V_j\bar{c}_j}{\sum_{i=1}^{M} O_i(p)\sum_{j=1}^{M}\pi_{ij}V_j} \tag{14}$$

    Expression (14) gives the predicted one-step-ahead power demand using the FEP. Fig. 4 shows an example of predictive learning used for power demand prediction. In this way, the future power demand of the vehicle over different time steps can be determined, and these data are then used to improve the management and operation of the parallel system by self-guidance.

    Figure  4.  Predictive learning for future power demand prediction.
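    Pulling (9)-(14) together, the sketch below is one possible NumPy implementation of the FEP one-step-ahead prediction; the triangular membership functions are an assumption made for illustration, not a choice prescribed by the paper:

```python
import numpy as np

def triangular_memberships(p, centers):
    """Illustrative triangular membership functions mu_j(p) over the grid `centers`."""
    width = centers[1] - centers[0]
    return np.clip(1.0 - np.abs(p - centers) / width, 0.0, 1.0)

def fep_predict(p, centers, Pi):
    """One-step-ahead power-demand prediction with the fuzzy encoding predictor.

    Implements (9)-(14): fuzzification, possibility-to-probability normalization,
    propagation through the TPM `Pi`, and defuzzification with centroids/volumes.
    """
    mu = triangular_memberships(p, centers)       # (9)  possibility vector O~(p)
    O = mu / mu.sum()                             # (10) probability vector O(p)
    c_bar = centers                               # centroids of symmetric memberships
    V = np.full(len(centers), 1.0)                # volumes (equal for identical shapes)
    w = O @ Pi                                    # O(p)^T * Pi, cf. (11)
    return np.sum(w * V * c_bar) / np.sum(w * V)  # (14)
```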

    The goal of predictive learning is to generate reasonable data from previously existing data and real-time observations of the real world. We aim to minimize the differences between real samples and generated samples by tuning the parameters of the predictive learning methodology. These generated data are then responsible for deriving various experiences and guiding the complex system through the learning process, so as to settle the data inefficiency and distribution problems.

    In the reinforcement learning framework, a learning agent interacts with a stochastic environment. We model the interaction as the quintuple $(S, A, \Pi, R, \gamma)$, where $s \in S$ and $a \in A$ are the state variable and control action sets, $\Pi$ is the transition probability matrix, $r \in R$ is the reward function, and $\gamma \in (0, 1)$ denotes a discount factor.

    The action value function $Q(s, a)$ is defined as the expected reward starting from $s$ and taking action $a$:

$$Q(s, a) = E\left\{\sum_{l=0}^{\infty}\gamma^{l}\,r_{t+1+l}\ \Big|\ s_t = s,\ a_t = a\right\}. \tag{15}$$

    The action value function associated with an optimal policy can be found by the Q-learning algorithm as in [25]

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta\left(r + \gamma\max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\right). \tag{16}$$
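    For concreteness, a minimal tabular sketch of the update rule in (16), with the learning rate $\eta$ and discount factor $\gamma$ chosen purely for illustration:

```python
import numpy as np

def q_learning_update(Q, s, a, r, s_next, eta=0.1, gamma=0.95):
    """One step of the tabular Q-learning update in (16).

    Q : 2-D array of action values, Q[state, action].
    """
    td_target = r + gamma * np.max(Q[s_next])   # r + gamma * max_a' Q(s', a')
    Q[s, a] += eta * (td_target - Q[s, a])      # move Q(s, a) toward the target
    return Q
```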

    When the state and action spaces are large, for example when the action $a_t$ consists of several sub-actions, modelling the Q-values $Q(s, a)$ becomes difficult. In this situation, we use both state and action representations as inputs to a deep neural network to approximate the action value function.

    A deep neural network is composed of an input layer, one or more hidden layers and an output layer. As shown in Fig. 5(a), the input vector $g = [g_1, g_2, \ldots, g_R]$ is weighted by the elements $w_1, w_2, \ldots, w_R$ and then summed with a bias $b$ to form the net input $n$ as

    Figure  5.  Deep neural network and bidirectional long short-term memory.
$$n = \sum_{i=1}^{R} w_i g_i + b. \tag{17}$$

    Then, the net input $n$ is passed through an activation function $h$ to generate the neuron output $d$:

$$d = h(n) \tag{18}$$

    where the activation function generally differs between the hidden layers ($h_1$) and the output layer ($h_2$).

    In this paper, we propose a bidirectional long short-term memory [26] based deep reinforcement network (BiLSTM-DRN) to approximate the action value function in reinforcement learning; see Fig. 5(b) for an illustration. This structure consists of a pair of deep neural networks, one for the state variable $s_t$ embedding and the other for the control sub-action $c_t^{i}$ embeddings. As the bidirectional LSTM has a larger capacity due to its nonlinear structure, we expect it to capture more detail on how the embeddings of the sub-actions are combined into an action embedding. Finally, a pairwise interaction function (e.g., inner product) is used to compute the new $Q(s_t, a_t)$ by combining the state and sub-action neuron outputs as

$$Q(s_t, a_t) = \sum_{i=1}^{K} Q(s_t, c_t^{i}) \tag{19}$$

    where $K$ is the number of sub-actions, and $Q(s_t, c_t^{i})$ represents the expected accumulated future reward obtained by including this sub-action.
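    The paper does not specify the exact network configuration; the PyTorch sketch below is one plausible reading of Fig. 5(b) and (19), in which a bidirectional LSTM embeds the sequence of sub-actions, a feed-forward branch embeds the state, and an inner product yields each $Q(s_t, c_t^{i})$ before the summation in (19):

```python
import torch
import torch.nn as nn

class BiLSTMDRN(nn.Module):
    """Illustrative BiLSTM-based deep reinforcement network for (19)."""

    def __init__(self, state_dim, sub_action_dim, embed_dim=64):
        super().__init__()
        # State branch: feed-forward embedding of s_t.
        self.state_net = nn.Sequential(
            nn.Linear(state_dim, embed_dim), nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )
        # Sub-action branch: bidirectional LSTM over the K sub-actions c_t^i.
        self.action_lstm = nn.LSTM(sub_action_dim, embed_dim // 2,
                                   batch_first=True, bidirectional=True)

    def forward(self, state, sub_actions):
        # state:       (batch, state_dim)
        # sub_actions: (batch, K, sub_action_dim)
        s_embed = self.state_net(state)             # (batch, embed_dim)
        a_embed, _ = self.action_lstm(sub_actions)  # (batch, K, embed_dim)
        # Inner product gives Q(s_t, c_t^i) for each sub-action ...
        q_sub = torch.einsum('bd,bkd->bk', s_embed, a_embed)
        # ... and (19) sums them into Q(s_t, a_t).
        return q_sub.sum(dim=1)                     # (batch,)
```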

    Combining the ideas of the parallel system, transfer learning, predictive learning, deep learning and reinforcement learning, we can formulate a closed loop of data and knowledge, named parallel reinforcement learning, as described in Fig. 1. Several case studies for real-world complex system problems are introduced and discussed in the next section.

    Parallel reinforcement learning serves as a reasonable and suitable framework for analysing real-world complex systems. It consists of a self-boosting process in the parallel system, a self-adaptive process through transfer learning, a self-guided process through predictive learning, and a big-data screening and generation process through the BiLSTM-DRN. The learning process becomes more efficient and continuous in the parallel reinforcement learning framework.

    Several complex systems have been researched and analysed from the perspective of parallel reinforcement learning, such as transportation systems [27], [28] and vision systems [29]. A traffic flow prediction system was designed in [27], which inherently considered the spatial and temporal correlations. First, an artificial system, a stacked autoencoder model, was built to learn generic traffic flow features. Second, the synthetic data were trained with a layer-wise greedy method in the deep learning architecture. Finally, predictive learning was used to achieve traffic flow prediction and self-guidance for the parallel system. A survey on the development of the data-driven intelligent transportation system (D-DITS) was introduced in [28], which addressed the functionality of the key components of D-DITS and some deployment issues for its future research.

    A parallel reinforcement learning framework has also been applied to address problems in visual perception and understanding [29]. After an artificial vision system is drawn from observations of real scenes, the synthetic data can be used for feature analysis, object analysis and scene analysis. This research methodology, named parallel vision, was proposed for the perception and understanding of complex scenes.

    Furthermore, the autonomous learning system for vehicle energy efficiency improvement in [5] can also be cast in the parallel reinforcement learning framework. First, a plug-in hybrid electric vehicle was simulated to construct the parallel system. Then, historical driving records of the real vehicle were collected to autonomously learn the optimal fuel use via a deep neural network and reinforcement learning. Finally, the trained policy can guide real vehicle operation and improve control performance. A better understanding of the real vehicle can then be obtained and used to adjust the artificial system from these new experiences.

    Recently, we designed a driving cycle transformation based adaptive energy management system for a hybrid electric vehicle (HEV). There are two major difficulties in the energy management problem of an HEV. First, most energy management strategies or predefined rules cannot adapt to changing driving conditions. Second, the model-based approaches used in energy management require accurate vehicle models, which entail a considerable model parameter calibration cost. Hence, we apply the parallel reinforcement learning framework to the energy management problem of the HEV, as depicted in Fig. 6. More precisely, the core idea of this methodology is bi-level.

    Figure  6.  Parallel reinforcement learning for energy management of HEV.

    The upper level characterizes how to transform driving cycles using transfer learning by considering the induced matrix norm (IMN). Specifically, the TPMs of power demand are computed, and the IMN is employed as a critical criterion to identify differences between TPMs and to determine when the control strategy should be altered. The lower level determines the corresponding control strategies for the transferred driving cycle using a model-free reinforcement learning algorithm. In other words, we simulate the HEV as an artificial system to sample possible energy management solutions, use transfer learning to make the computed strategies adaptive to real-world driving conditions, and use reinforcement learning to generate the corresponding controls. Tests demonstrate that the proposed strategy outperforms the conventional reinforcement learning approach in both calculation speed and control performance.
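    The particular induced norm is not stated here; as a minimal sketch, assuming the induced 2-norm, the drift test on the power-demand TPM could look as follows (the threshold is purely illustrative):

```python
import numpy as np

def tpm_drift_exceeds(Pi_old, Pi_new, threshold=0.2):
    """Induced-matrix-norm criterion for updating the control strategy.

    The induced 2-norm (largest singular value) of the TPM difference is one
    possible choice; the paper does not state which induced norm it uses, and
    the threshold here is purely illustrative.
    """
    imn = np.linalg.norm(Pi_new - Pi_old, ord=2)  # induced 2-norm of the difference
    return imn > threshold
```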

    Furthermore, we construct an energy efficiency improvement system within the parallel reinforcement learning framework for a hybrid tracked vehicle (HTV). Specifically, we combine the simulated artificial vehicle with the real vehicle to constitute the parallel system, use predictive learning to realize power demand prediction for further self-guidance, and use reinforcement learning to compute the control policy. This approach also includes two layers; see Fig. 7 for a visualization of the idea. The first layer addresses how to accurately forecast the future power demand using the FEP based on MC theory, with the Kullback-Leibler (KL) divergence rate employed to quantify the differences between TPMs and to decide when the control strategy should be updated. The second layer computes the relevant control policy based on the predicted power demand and the reinforcement learning technique. Finally, comparisons show that the proposed control policy is superior to the conventional reinforcement learning approach in both energy efficiency improvement and computational speed.

    Figure  7.  Parallel reinforcement learning for energy efficiency of HTV.
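    Similarly, the KL divergence rate between two power-demand TPMs can serve as the update criterion in the first layer; the sketch below uses the standard stationary-distribution-weighted form, which may differ in detail from the exact definition used in our implementation:

```python
import numpy as np

def kl_divergence_rate(Pi_p, Pi_q, eps=1e-12):
    """KL divergence rate between two Markov chains with TPMs Pi_p and Pi_q.

    Uses the standard form sum_i mu_i * sum_j Pi_p[i, j] * log(Pi_p[i, j] / Pi_q[i, j]),
    where mu is the stationary distribution of Pi_p. Illustrative only.
    """
    # Stationary distribution of Pi_p: left eigenvector for eigenvalue 1.
    vals, vecs = np.linalg.eig(Pi_p.T)
    mu = np.real(vecs[:, np.argmin(np.abs(vals - 1.0))])
    mu = np.abs(mu) / np.abs(mu).sum()

    ratio = np.log((Pi_p + eps) / (Pi_q + eps))
    return float(np.sum(mu[:, None] * Pi_p * ratio))
```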

    In the future, we plan to apply the BiLSTM-DRN to process and train on large-scale real vehicle data for optimal energy management strategy computation. The objective is to realize real-time control using the parallel reinforcement learning method in our self-built tracked vehicle. More importantly, we will apply the parallel reinforcement learning framework to multiple missions of automated vehicles [30], such as decision making and trajectory planning. By addressing the existing disadvantages of traditional data-driven methods, we expect that parallel reinforcement learning can promote the development of machine learning.

    The general framework and case studies of parallel reinforcement learning for complex systems are introduced in this paper. The purpose is to build a closed loop of data and knowledge in the parallel system to guide the real system operation and improve the precision of the artificial system. In particular, the ACP approach is used to construct the parallel system, which contains an artificial system and a real system. Transfer learning is utilized to achieve driving cycle transformation by means of the mean tractive force components. Predictive learning is applied to forecast the future power demand via the fuzzy encoding predictor. To train on data with large action and state spaces, we introduce the BiLSTM-DRN to approximate the action value function in reinforcement learning.

    Data-driven models are usually viewed as components independent of the data in the learning process, which results in large-scale exploration and observation-insufficiency problems. Furthermore, the data in these models tend to be inadequate, and a general principle for organizing these models remains absent. By combining the parallel system, transfer learning, predictive learning, deep learning and reinforcement learning, we believe that parallel reinforcement learning can effectively address these problems and promote the development of machine learning.

  • [1]
    V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu, J. Veness, M. G. Bellemare, A. Graves, M. Riedmiller, A. K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, "Human-level control through deep reinforcement learning, " Nature, vol. 518, no. 7540, pp. 529-533, Feb. 2015. http://europepmc.org/abstract/med/25719670
    [2]
    D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, "Mastering the game of Go with deep neural networks and tree search, " Nature, vol. 529, no. 7587, pp. 484-489, Jan. 2016. http://www.ncbi.nlm.nih.gov/pubmed/26819042
    [3]
    Y. K. Zhu, R. Mottaghi, E. Kolve, J. J. Lim, A. Gupta, F. -F. Li, and A. Farhadi, "Target-driven visual navigation in indoor scenes using deep reinforcement learning, " in Proc. 2017 IEEE Int. Conf. Robotics and Automation (ICRA), Singapore, pp. 3357-3364. http://arxiv.org/abs/1609.05143
    [4]
    I. Popov, N. Heess, T. Lillicrap, R. Hafner, G. Barth-Maron, M. Vecerik, T. Lampe, Y. Tassa, T. Erez, and M. Riedmiller, "Data-efficient deep reinforcement learning for dexterous manipulation, " arXiv: 1704.03073, 2017. http://arxiv.org/abs/1704.03073
    [5]
    X. W. Qi, Y. D. Luo, G. Y. Wu, K. Boriboonsomsin, and M. J. Barth, "Deep reinforcement learning-based vehicle energy efficiency autonomous learning system, " in Proc. Intelligent Vehicles Symp. (Ⅳ), Los Angeles, CA, USA, pp. 1228-1233, 2017. http://www.researchgate.net/publication/318800742_Deep_reinforcement_learning-based_vehicle_energy_efficiency_autonomous_learning_system
    [6]
    J. C. Caicedo and S. Lazebnik, "Active object localization with deep reinforcement learning, " in Proc. IEEE Int. Conf. Computer Vision, Santiago, Chile, 2015, pp. 2488-2496. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7410643
    [7]
    X. X. Guo, S. Singh, R. Lewis, and H. Lee, "Deep learning for reward design to improve Monte Carlo tree search in Atari games, " arXiv: 1604.07095, 2016. http://arxiv.org/abs/1604.07095
    [8]
    V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, "Playing Atari with deep reinforcement learning, " arXiv: 1312.5602, 2013.
    [9]
    J. Heinrich and D. Silver, "Deep reinforcement learning from self-play in imperfect-information games, " arXiv: 1603.01121, 2016. http://arxiv.org/abs/1603.01121
    [10]
    D. Hafner, "Deep reinforcement learning from raw pixels in doom, " arXiv: 1610.02164, 2016. http://arxiv.org/abs/1610.02164
    [11]
    K. Narasimhan, T. Kulkarni, and R. Barzilay, "Language understanding for text-based games using deep reinforcement learning, " arXiv: 1506.08941, 2015. http://arxiv.org/abs/1506.08941
    [12]
    L. Li, Y. L. Lin, N. N. Zheng, and F. Y. Wang, "Parallel learning: a perspective and a framework, " IEEE/CAA J. of Autom. Sinica, vol. 4, no. 3, pp. 389-395, Jul. 2017. http://ieeexplore.ieee.org/xpl/articleDetails.jsp?reload=true&arnumber=7974888
    [13]
    F. Y. Wang, "Artificial societies, computational experiments, and parallel systems: a discussion on computational theory of complex social-economic systems, " Complex Syst. Complex. Sci., vol. 1, no. 4, pp. 25-35, Oct. 2004. http://en.cnki.com.cn/Article_en/CJFDTOTAL-FZXT200404001.htm
    [14]
    F. Y. Wang, "Toward a paradigm shift in social computing: the ACP approach, " IEEE Intell. Syst., vol. 22, no. 5, pp. 65-67, Sep. -Oct. 2007. http://ieeexplore.ieee.org/document/4338496/
    [15]
    F. Y. Wang, "Parallel control and management for intelligent transportation systems: concepts, architectures, and applications, " IEEE Trans. Intell. Transp. Syst., vol. 11, no. 3, pp. 630-638, Sep. 2010. http://ieeexplore.ieee.org/document/5549912/
    [16]
    F. Y. Wang and S. N. Tang, "Artificial societies for integrated and sustainable development of metropolitan systems, " IEEE Intell. Syst., vol. 19, no. 4, pp. 82-87, Jul. -Aug. 2004. http://ieeexplore.ieee.org/abstract/document/1333039/
    [17]
    F. Y. Wang, H. G. Zhang, and D. R. Liu, "Adaptive dynamic programming: an introduction, " IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39-47, May 2009. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=4840325
    [18]
    F. Y. Wang, "The emergence of intelligent enterprises: From CPS to CPSS, " IEEE Intell. Syst., vol. 25, no. 4, pp. 85-88, Jul. -Aug. 2010. http://ieeexplore.ieee.org/document/5552591/
    [19]
    F. Y. Wang, N. N. Zheng, D. P. Cao, C. M. Martinez, L. Li, and T. Liu, "Parallel driving in CPSS: a unified approach for transport automation and vehicle intelligence, " IEEE/CAA J. of Autom. Sinica, vol. 4, no. 4, pp. 577-587, Oct. 2017. http://ieeexplore.ieee.org/document/8039015/
    [20]
    K. F. Wang, C. Gou, and F. Y. Wang, "Parallel vision: an ACP-based approach to intelligent vision computing, " Acta Automat. Sin., vol. 42, no. 10, pp. 1490-1500, Oct. 2016. http://www.aas.net.cn/EN/Y2016/V42/I10/1490
    [21]
    P. Nyberg, E. Frisk, and L. Nielsen, "Driving cycle equivalence and transformation, " IEEE Trans. Veh. Technol., vol. 66, no. 3, pp. 1963-1974, Mar. 2017. http://ieeexplore.ieee.org/document/7493605/
    [22]
    P. Nyberg, E. Frisk, and L. Nielsen, "Driving cycle adaption and design based on mean tractive force, " in Proc. 7th IFAC Symp. Advanced Automatic Control, Tokyo, Japan, vol. 7, no. 1, pp. 689-694, 2013. http://www.researchgate.net/publication/271479464_Driving_Cycle_Adaption_and_Design_Based_on_Mean_Tractive_Force?ev=auth_pub
    [23]
    D. P. Filev and I. Kolmanovsky, "Generalized markov models for real-time modeling of continuous systems, " IEEE Trans. Fuzzy Syst., vol. 22, no. 4, pp. 983-998, Aug. 2014. http://ieeexplore.ieee.org/document/6588289/
    [24]
    D. P. Filev and I. Kolmanovsky, "Markov chain modeling approaches for on board applications, " in Proc. 2010 American Control Conf., Baltimore, MD, USA, pp. 4139-4145. http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=5530610
    [25]
    T. Liu, X. S. Hu, S. E. Li, and D. P. Cao, "Reinforcement learning optimized look-ahead energy management of a parallel hybrid electric vehicle, " IEEE/ASME Trans. Mechatron., vol. 22, no. 4, pp. 1497-1507, Aug. 2017. http://ieeexplore.ieee.org/document/7932983/
    [26]
    A. Graves and J. Schmidhuber, "Framewise phoneme classification with bidirectional LSTM and other neural network architectures, " Neural Netw., vol. 18, no. 5-6, pp. 602-610, Jul. -Aug. 2005. http://www.ncbi.nlm.nih.gov/pubmed/16112549
    [27]
    Y. S. Lv, Y. J. Duan, W. W. Kang, Z. X. Li, and F. Y. Wang, "Traffic flow prediction with big data: a deep learning approach, " IEEE Trans. Intell. Transp. Syst., vol. 16, no. 2, pp. 865-873, Apr. 2015. http://ieeexplore.ieee.org/document/6894591/
    [28]
    J. P. Zhang, F. Y. Wang, K. F. Wang, W. H. Lin, X. Xu, and C. Chen, "Data-driven intelligent transportation systems: a survey, " IEEE Trans. Intell. Transp. Syst., vol. 12, no. 4, pp. 1624-1639, Dec. 2011. http://ieeexplore.ieee.org/document/5959985/
    [29]
    K. F. Wang, C. Gou, N. N. Zheng, J. M. Rehg, and F. Y. Wang, "Parallel vision for perception and understanding of complex scenes: methods, framework, and perspectives, " Artif. Intell. Rev., vol. 48, no. 3, pp. 299-329, Oct. 2017. doi: 10.1007%2Fs10462-017-9569-z
    [30]
    W. Liu, Z. H. Li, L. Li, and F. Y. Wang, "Parking like a human: A direct trajectory planning solution, " IEEE Trans. Intell. Transp. Syst., vol. 18, no. 12, pp. 3388-3397, Dec. 2017. http://ieeexplore.ieee.org/document/7902173/