Flight Control & Detection: 2021, (5): 34-43
Guidance Method of Planetary Soft Landing with GPS Model-Based Reinforcement Learning
ZHANG Yangkang1,2, SUN Chen1,2, PAN Binfeng1,2
(1. School of Astronautics, Northwestern Polytechnical University; 2. National Key Laboratory of Aerospace Flight Dynamics)
Abstract: Planetary landers operate far from Earth, where measurement and control links suffer large delays and the flight environment is complex and hard to predict in advance. As a result, autonomous guidance for planetary soft landing still faces challenges such as difficult horizontal position estimation, scarce navigation reference information, and landing on complex terrain. To address these challenges, a model-based reinforcement learning guidance method built on guided policy search (GPS) is proposed. With this method, a lander whose initial state is perturbed can still land at the designated position while satisfying the constraints, without replanning. First, an iterative linear quadratic regulator (iLQR) serves as the controller and generates the initial trajectory; second, a multi-layer neural network is used to fit the guidance policy; finally, the controller supervises policy learning until it converges to a feasible policy. Simulations of soft landing on a planetary surface show that, after only a few training iterations, the algorithm achieves rapid soft landing under varied initial states. The results indicate the data efficiency of model-based reinforcement learning and also demonstrate that reinforcement learning has broad application prospects in deep space exploration.
Keywords: iterative linear quadratic regulator (iLQR); guided policy search (GPS); model-based reinforcement learning; planetary soft landing
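The three steps named in the abstract (a controller generates trajectories, a policy is fitted to them, and the fitted policy then handles perturbed initial states without replanning) can be illustrated with a heavily simplified sketch. This is not the paper's implementation. It assumes a hypothetical 1-D vertical descent with linear dynamics, so a plain LQR gain stands in for iLQR; a linear least-squares fit stands in for the multi-layer neural network; and constraints and the iterative GPS update are omitted. All numerical values are illustrative.

```python
import numpy as np

# Assumed toy model: state x = [altitude error, vertical velocity],
# control u = net vertical acceleration (thrust minus gravity).
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.5 * dt**2], [dt]])
Q = np.diag([1.0, 1.0])   # illustrative state cost
R = np.array([[0.1]])     # illustrative control cost


def lqr_gain(A, B, Q, R, iters=2000):
    """Iterate the discrete Riccati recursion and return the feedback gain K."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K


def rollout(policy, x0, steps=300):
    """Simulate the closed loop; return visited states, controls, final state."""
    xs, us = [], []
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        u = policy(x)
        xs.append(x)
        us.append(u)
        x = A @ x + B @ u
    return np.array(xs), np.array(us), x


# Step 1: the controller ("teacher") generates trajectories
# from perturbed initial states.
K = lqr_gain(A, B, Q, R)


def teacher(x):
    return -K @ x


rng = np.random.default_rng(0)
states, controls = [], []
for _ in range(5):
    x0 = np.array([100.0, -10.0]) + rng.normal(scale=[10.0, 2.0])
    xs, us, _ = rollout(teacher, x0)
    states.append(xs)
    controls.append(us)
X = np.vstack(states)
U = np.vstack(controls)

# Step 2: fit the policy to the teacher's (state, control) pairs.
# A linear map replaces the neural network in this sketch.
W, *_ = np.linalg.lstsq(X, U, rcond=None)


def student(x):
    return W.T @ x


# Step 3: run the fitted policy from an unseen initial state, no replanning.
_, _, x_final = rollout(student, [120.0, -15.0])
print("final state:", x_final)
```

Because the teacher here is itself linear, the fitted weights reproduce the LQR gain almost exactly; with nonlinear dynamics and a neural-network policy, as in the paper, the fit is only approximate, which is why the supervision loop has to be iterated.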
Article ID: 20210505     CLC number: TN911.73; TP391.9     Document code: A
Funding: Equipment Pre-research Laboratory Fund (6142210200312)
Author Name: Affiliation
ZHANG Yangkang: School of Astronautics, Northwestern Polytechnical University; National Key Laboratory of Aerospace Flight Dynamics
SUN Chen: School of Astronautics, Northwestern Polytechnical University; National Key Laboratory of Aerospace Flight Dynamics
PAN Binfeng: School of Astronautics, Northwestern Polytechnical University; National Key Laboratory of Aerospace Flight Dynamics
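Cite this article:
ZHANG Yangkang, SUN Chen, PAN Binfeng. Guidance Method of Planetary Soft Landing with GPS Model-Based Reinforcement Learning[J]. Flight Control & Detection, 2021, (5): 34-43.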
