Adaptive Pinpoint and Fuel Efficient Mars Landing Using Reinforcement Learning

Brian Gaudet; Roberto Furfaro

Volume 1 Issue 4

Oct. 2014

IEEE/CAA Journal of Automatica Sinica

JCR Impact Factor: 15.3, Top 1 (SCI Q1)

CiteScore: 23.5, Top 2% (Q1)
Google Scholar h5-index: 77， TOP 5

Turn off MathJax

Article Contents

Article Navigation > IEEE/CAA Journal of Automatica Sinica > 2014 > 1(4): 397-411

Brian Gaudet and Roberto Furfaro, "Adaptive Pinpoint and Fuel Efficient Mars Landing Using Reinforcement Learning," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 4, pp. 397-411, 2014.

Citation:

Brian Gaudet and Roberto Furfaro, "Adaptive Pinpoint and Fuel Efficient Mars Landing Using Reinforcement Learning," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 4, pp. 397-411, 2014.

Brian Gaudet and Roberto Furfaro, "Adaptive Pinpoint and Fuel Efficient Mars Landing Using Reinforcement Learning," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 4, pp. 397-411, 2014.

Citation:

Brian Gaudet and Roberto Furfaro, "Adaptive Pinpoint and Fuel Efficient Mars Landing Using Reinforcement Learning," IEEE/CAA J. of Autom. Sinica, vol. 1, no. 4, pp. 397-411, 2014.

PDF( 2354 KB)

Adaptive Pinpoint and Fuel Efficient Mars Landing Using Reinforcement Learning

Brian Gaudet¹,
Roberto Furfaro²

1. Research Engineer, Department of Systems and Industrial Engineering, University of Arizona, 1127 E. Roger Way, Tucson Arizona, 85721, USA;
2. Assistant Professor, Department of Systems and Industrial Engineering, Department of Aerospace and Mechanical Engineering, University of Arizona, 1127 E. James E. Rogers Way, Tucson, AZ 85721, USA

Abstract

Abstract

Future unconstrained and science-driven missions to Mars will require advanced guidance algorithms that are able to adapt to more demanding mission requirements, e.g. landing on selected locales with pinpoint accuracy while autonomously flying fuel-efficient trajectories. In this paper, a novel guidance algorithm designed by applying the principles of reinforcement learning (RL) theory is presented. The goal is to devise an adaptive guidance algorithm that enables robust, fuel efficient, and accurate landing without the need for off line trajectory generation and real-time tracking. Results from a Monte Carlo simulation campaign show that the algorithm is capable of autonomously following trajectories that are close to the optimal minimum-fuel solutions with an accuracy that surpasses that of past and future Mars missions. The proposed RL-based guidance algorithm exhibits a high degree of flexibility and can easily accommodate autonomous retargeting while maintaining accuracy and fuel efficiency. Although reinforcement learning and other similar machine learning techniques have been previously applied to aerospace guidance and control problems (e.g., autonomous helicopter control), this appears, to the best of the authors knowledge, to be the first application of reinforcement learning to the problem of autonomous planetary landing.
- Mars landing guidance,
- reinforcement learning,
- policy iteration,
- Markov decision process

FullText(HTML)

References(36)

References

[1]	Steltzner A, Kipp D, Chen A, Burkhart D, Guernsey C, Mendeck G,Mitcheltree R, Powell R, Rivellini T, San Martin M, Way D. Marsscience laboratory entry, descent, and landing system. In: Proceedingsof the 2006 IEEE Aerospace Conference. Big Sky, MT: IEEE, 2006.
[2]	Shotwell R. Phoenix-the first Mars scout mission. Acta Astronautica,2005, 57(2-8): 121-134
[3]	Singh G, SanMartin A M, Wong E C. Guidance and control design forpowered descent and landing on Mars. In: Proceedings of the 2006 IEEEAerospace Conference. Big Sky, MT: IEEE, 2007. 1-8
[4]	Klumpp A R. Apollo Guidance, Navigation, and Control: Apollo Lunar-Descent Guidance. Massachusetts Institute of Technology, Charles StarkDraper Lab, TR R-695, Cambridge, MA, 1971.
[5]	Klumpp A R. Apollo lunar descent guidance. Automatica, 1974, 10(2):133-146
[6]	Chomel C T, Bishop R H. Analytical lunar descent guidance algorithm.Journal of Guidance, Control, and Dynamics, 2009, 32(3): 915-926
[7]	Furfaro R, Selnick S, Cupples M L, Cribb M W. Non-linear slidingguidance algorithms for precision lunar landing (AAS 11-167). In:Proceedings of the 21st AAS/AIAA Space Flight Mechanics Meeting.San Diego, CA: American Astronautical Society by Univelt, 2011.945-964
[8]	Bishop C M. Pattern Recognition and Machine Learning. Berlin, Heidelberg:Springer, 2006.
[9]	Calise A J, Rysdyk R T. Nonlinear adaptive flight control using neuralnetworks. IEEE Control Systems Magazine, 1998, 18(6): 14-25
[10]	Sutton R S, Barto A G. Reinforcement Learning: An Introduction.Cambridge, MA: MIT Press, 1998. 100-103
[11]	Gaudet B, Furfaro R. Adaptive Pinpoint and Fuel Efficient Mars Landingusing Reinforcement Learning (AAS 12-191). In: Proceeding of the 22ndSpaceflight Mechanics Meeting. San Diego, CA: American AstronauticalSociety by Univelt, 2012. 1309-1328
[12]	Ng A Y, Kim H J, Jordan M I, Sastry S. Autonomous helicopter flightvia reinforcement learning. Advances in Neural Information ProcessingSystems 16. Cambridge, MA: MIT Press, 2004.
[13]	Munos R, Szepesv´ari C. Finite-time bounds for fitted value iteration.Journal of Machine Learning Research, 2008, 1: 815-857
[14]	Powell W B. Approximate Dynamic Programming: Solving the Cursesof Dimensionality (Second edition). Hoboken, N.J.: Wiley, 2011.
[15]	Bishop C M. Pattern Recognition and Machine Learning. Berlin, Heidelberg:Springer, 2006.
[16]	Koller D, Friedman N. Probabilistic Graphical Models. Massachusetts:MIT Press, 2009.
[17]	Tuckness D G. Analysis of a terminal landing on Mars. Journal ofSpacecraft and Rockets, 1995, 32(1): 142-148
[18]	Coates A, Abbeel P, Ng A Y. Learning for control from multipledemonstrations. In: Proceedings of the 25th International Conferenceon Machine Learning. New York, USA: ACM, 2008. 144-151
[19]	Huntington G T. Advancement and Analysis of Gauss PseudospectralTranscription for Optimal Control Problems [Ph. D. dissertation], MassachusettsInstitute of Technology, Cambridge MA, 2007
[20]	Françolin C C, Benson D A, Hager W W, Rao A V. Costate approximationin optimal control using integral Gaussian quadrature orthogonalcollocation methods. Optimal Control Applications and Methods, to bepublished
[21]	Patterson M A, Hager W W, Rao A V. A ph mesh refinement methodfor optimal control. Optimal Control Applications and Methods, to bepublished
[22]	Patterson M A, Rao A V. GPOPS-II: a MATLAB software for solvingmultiple-phase optimal control problems using Hp-adaptive Gaussianquadrature collocation methods and sparse nonlinear programming.ACM Transactions on Mathematical Software, 2013, 39(3), Article 1
[23]	Patterson M A, Rao A V. Exploiting sparsity in direct collocationpseudospectral methods for solving optimal control problems. Journalof Spacecraft and Rockets, 2012, 49(2): 354-377
[24]	Darby C L, Garg D, Rao A V. Costate estimation using multiple-intervalpseudospectral methods. Journal of Spacecraft and Rockets, 2011, 48(5):856-866
[25]	Darby C L, Hager W W, Rao A V. An Hp-adaptive pseudospectralmethod for solving optimal control problems. Optimal Control Applicationsand Methods, 2011, 32(4): 476-502
[26]	Garg D, Patterson M A, Françolin C, Darby C L, Huntington G T,Hager W W, Rao A V. Direct trajectory optimization and costate estimationof finite-horizon and infinite-horizon optimal control problemsusing a Radau pseudospectral method. Computational Optimization andApplications, 2011, 49(2): 335-358
[27]	Darby C L, Hager W W, Rao A V. Direct trajectory optimization using avariable low-order adaptive pseudospectral method. Journal of Spacecraftand Rockets, 2011, 48(3): 433-445
[28]	Garg D, Hager W W, Rao A V. Pseudospectral methods for solvinginfinite-horizon optimal control problems. Automatica, 2011, 47(4):829-837
[29]	Rao A V, Benson D A, Darby C, Patterson M A, Francolin C, SandersI, Huntington G T. Algorithm 902: GPOPS, A MATLAB softwarefor solving multiple-phase optimal control problems using the Gausspseudospectral method. ACM Transactions on Mathematical Software,2010, 37(2), Article 22
[30]	MacKay D J C. Bayesian interpolation. Neural Computation, 1992, 4(3):415-447
[31]	Rumelhart D E, Hinton G E, Williams R J. Learning representations byback-propagating errors. Nature, 1986, 323(6088): 533-536
[32]	Gonz´alez R. Neural Networks for Variational Problems in Engineering[Ph.D dissertation], University of Catalonia, Spain, 2008
[33]	Spall J C. Introduction to Stochastic Search and Optimization. New York:Wiley, 2003.
[34]	Fu M C. What you should know about simulation and derivatives. NavalResearch Logistics (NRL), 2008, 55(8): 723-736
[35]	Acikmese B, Ploen S R. Convex programming approach to powereddescent guidance for mars landing. Journal of Guidance, Control, andDynamics, 2007, 30(5): 1353-1366
[36]	Gill P E, Murray W, Saunders M A. SNOPT: an SQP algorithm for largescaleconstrained optimization. SIAM Review, 2005, 47(1): 99-131

Supplements(0)

Cited By

Proportional views

Proportional views

通讯作者: 陈斌, bchen63@163.com

1.
沈阳化工大学材料科学与工程学院沈阳 110142

Get Citation

PDF

XML

Article Metrics

Article views (1304) PDF downloads(22)

Adaptive Pinpoint and Fuel Efficient Mars Landing Using Reinforcement Learning

Abstract

References

Proportional views

Catalog

通讯作者: 陈斌, bchen63@163.com

Article Metrics

Export File

Citation

Format

Content