Article: 2024, Vol. 42, Issue 1: 117-128
Cite this article:
焦杰, 苟永杰, 吴文博, 泮斌峰. 基于自适应增强随机搜索的航天器追逃博弈策略研究[J]. 西北工业大学学报, 2024, 42(1): 117-128
JIAO Jie, GOU Yongjie, WU Wenbo, PAN Binfeng. Research on game strategy of spacecraft chase and escape based on adaptive augmented random search[J]. Journal of Northwestern Polytechnical University, 2024, 42(1): 117-128

基于自适应增强随机搜索的航天器追逃博弈策略研究
焦杰1,2, 苟永杰3, 吴文博1,2, 泮斌峰1,2
1. 西北工业大学 航天学院, 陕西 西安 710072;
2. 航天飞行动力学技术国家级重点实验室, 陕西 西安 710072;
3. 上海宇航系统工程研究所, 上海 201108
摘要:
针对航天器与非合作目标追逃博弈的生存型微分对策拦截问题,基于强化学习研究了追逃博弈策略,提出了自适应增强随机搜索(adaptive-augmented random search,A-ARS)算法。针对序贯决策的稀疏奖励难题,设计了基于策略参数空间扰动的探索方法,加快策略收敛速度;针对可能过早陷入局部最优问题设计了新颖度函数并引导策略更新,可提升数据利用效率;通过数值仿真验证并与增强随机搜索(augmented random search,ARS)、近端策略优化算法(proximal policy optimization,PPO)以及深度确定性策略梯度下降算法(deep deterministic policy gradient,DDPG)进行对比,验证了此方法的有效性和先进性。
关键词:    非合作目标    追逃博弈    微分对策    强化学习    稀疏奖励   
Research on game strategy of spacecraft chase and escape based on adaptive augmented random search
JIAO Jie1,2, GOU Yongjie3, WU Wenbo1,2, PAN Binfeng1,2
1. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China;
2. National Key Laboratory of Aerospace Flight Dynamics, Xi'an 710072, China;
3. Shanghai Aerospace Systems Engineering Institute, Shanghai 201108, China
Abstract:
To solve the survival-type differential game interception problem in the pursuit-evasion game between a spacecraft and a non-cooperative target, the pursuit-evasion policy is studied based on reinforcement learning, and an adaptive-augmented random search (A-ARS) algorithm is proposed. Firstly, to address the sparse-reward difficulty of sequential decision making, an exploration method based on perturbations in the policy parameter space is designed, which accelerates policy convergence. Secondly, to avoid falling into a local optimum prematurely, a novelty function is designed to guide the policy update, improving the efficiency of data utilization. Finally, the effectiveness and advantages of the method are verified with numerical simulations and comparisons against the augmented random search (ARS), proximal policy optimization (PPO) and deep deterministic policy gradient (DDPG) algorithms.
Key words:    non-cooperative target    pursuit game    differential game theory    reinforcement learning    sparse reward   
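The paper's A-ARS algorithm itself is not reproduced here, but the baseline it extends, augmented random search (ARS), can be sketched to illustrate what "exploration by perturbing the policy parameter space" means: sample random directions in parameter space, evaluate the return on both sides of each perturbation, and step along the reward-weighted average direction. The toy quadratic reward, step sizes, and direction count below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def ars_step(theta, reward_fn, n_dirs=8, step=0.02, noise=0.03, rng=None):
    """One update of basic augmented random search (ARS, Mania et al., 2018).

    Perturbs the policy parameters theta in random directions, evaluates the
    return on both sides of each perturbation, and moves theta along the
    reward-weighted average direction, scaled by the reward standard deviation.
    """
    rng = np.random.default_rng() if rng is None else rng
    deltas = rng.standard_normal((n_dirs, theta.size))
    r_plus = np.array([reward_fn(theta + noise * d) for d in deltas])
    r_minus = np.array([reward_fn(theta - noise * d) for d in deltas])
    sigma = np.concatenate([r_plus, r_minus]).std() + 1e-8  # reward scaling
    grad = ((r_plus - r_minus)[:, None] * deltas).mean(axis=0)
    return theta + step / sigma * grad

# Toy stand-in for an episode return: maximized at theta = [1, -2].
target = np.array([1.0, -2.0])
reward = lambda th: -np.sum((th - target) ** 2)

theta = np.zeros(2)
rng = np.random.default_rng(0)
for _ in range(300):
    theta = ars_step(theta, reward, rng=rng)
```

Because the update needs only episode returns, never gradients of the policy, the same loop applies when `reward_fn` rolls out a pursuit-evasion episode; the paper's novelty function would enter as an additional term steering updates away from already-visited behaviors.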
Received: 2022-12-29     Revised:
DOI: 10.1051/jnwpu/20244210117
Corresponding author: PAN Binfeng (b. 1981), professor. E-mail: panbinfeng@nwpu.edu.cn
About the first author: JIAO Jie (b. 1999), Ph.D. candidate
