Article: 2022, Vol. 40, Issue 1: 47-55
Cite this article:
FU Xiaowei, XU Zhe, WANG Hui. Generalization strategy design of UAVs pursuit evasion game based on DDPG[J]. Journal of Northwestern Polytechnical University, 2022, 40(1): 47-55

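Generalization strategy design of UAVs pursuit evasion game based on DDPG
FU Xiaowei, XU Zhe, WANG Hui
School of Electronics and Information, Northwestern Polytechnical University, Xi'an, Shaanxi 710129, China
Abstract:
The UAV pursuit-evasion game is a current research hotspot in air combat. Traditional solutions to this problem have several limitations. Their models adapt poorly to complex, dynamic environments and so cannot make decisions quickly, and they generalize poorly across different mission scenarios. This paper designs a UAV pursuit-evasion strategy based on the DDPG (deep deterministic policy gradient) algorithm. On this basis, several counter-maneuver strategies are designed for the evading UAV. Following the idea of curriculum learning, the evading UAV's intelligence is raised step by step during DDPG training, so that the pursuing UAV's strategy is trained progressively. Simulation results show that, compared with direct training, the pursuit strategy trained with curriculum learning converges faster and pursues the enemy aircraft more effectively. It also remains applicable to enemy aircraft that use a variety of counter-maneuver strategies, which effectively improves the generalization of the UAV pursuit-evasion decision model.
Keywords:    UAV    pursuit-evasion game    deep reinforcement learning    DDPG    curriculum learning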
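The curriculum idea in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation, and the abstract does not specify their evader maneuvers or their rule for moving between stages. Everything named here is hypothetical:

- three evader policies of increasing difficulty: `evader_straight`, `evader_random` and `evader_flee`;
- the promotion rule in `EvaderCurriculum`, which advances once the pursuer's capture rate over the last `window` episodes reaches `threshold`;
- the 2D point-mass kinematics, speeds and capture radius;
- `pure_pursuit`, which stands in for the trained DDPG actor.

The helpers `soft_update` and `td_target` show the standard DDPG target-network update and critic target. They operate on plain lists rather than real networks.

```python
import math
import random
from collections import deque
from typing import Callable, Deque, List, Tuple

# (pursuer_x, pursuer_y, evader_x, evader_y) in a 2D plane (hypothetical model)
State = Tuple[float, float, float, float]
Policy = Callable[[State], float]  # maps a state to a heading in radians


# Hypothetical evader maneuvers, ordered from easiest to hardest.
def evader_straight(s: State) -> float:
    return 0.0  # ignores the pursuer and flies due east


def evader_random(s: State) -> float:
    return random.uniform(-math.pi, math.pi)  # new random heading each step


def evader_flee(s: State) -> float:
    px, py, ex, ey = s
    return math.atan2(ey - py, ex - px)  # flies straight away from the pursuer


def pure_pursuit(s: State) -> float:
    """Placeholder for the trained DDPG actor: head directly at the evader."""
    px, py, ex, ey = s
    return math.atan2(ey - py, ex - px)


def step(s: State, hp: float, he: float,
         vp: float = 1.2, ve: float = 1.0, dt: float = 1.0) -> State:
    """Point-mass kinematics; the pursuer is assumed faster than the evader."""
    px, py, ex, ey = s
    return (px + vp * dt * math.cos(hp), py + vp * dt * math.sin(hp),
            ex + ve * dt * math.cos(he), ey + ve * dt * math.sin(he))


def run_episode(pursuer: Policy, evader: Policy,
                max_steps: int = 200, capture_radius: float = 1.0) -> bool:
    """Returns True if the pursuer gets within capture_radius in time."""
    s: State = (0.0, 0.0, 10.0, 0.0)
    for _ in range(max_steps):
        s = step(s, pursuer(s), evader(s))
        if math.hypot(s[2] - s[0], s[3] - s[1]) <= capture_radius:
            return True
    return False


class EvaderCurriculum:
    """Moves to the next, harder evader once the pursuer's capture rate
    over the last `window` episodes reaches `threshold`."""

    def __init__(self, stages: List[Policy], threshold: float = 0.8, window: int = 100):
        self.stages = stages
        self.threshold = threshold
        self.results: Deque[bool] = deque(maxlen=window)
        self.level = 0

    @property
    def evader(self) -> Policy:
        return self.stages[self.level]

    def record(self, captured: bool) -> None:
        self.results.append(captured)
        full = len(self.results) == self.results.maxlen
        rate = sum(self.results) / len(self.results)
        if full and rate >= self.threshold and self.level < len(self.stages) - 1:
            self.level += 1
            self.results.clear()  # re-measure success on the new, harder stage


def soft_update(target: List[float], online: List[float], tau: float) -> List[float]:
    """DDPG target update: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * w + (1.0 - tau) * wt for w, wt in zip(online, target)]


def td_target(reward: float, done: bool, q_next: float, gamma: float = 0.99) -> float:
    """Critic target y = r + gamma * (1 - d) * Q'(s', mu'(s'))."""
    return reward + gamma * (0.0 if done else 1.0) * q_next


if __name__ == "__main__":
    curriculum = EvaderCurriculum([evader_straight, evader_random, evader_flee],
                                  threshold=0.8, window=10)
    for episode in range(60):
        captured = run_episode(pure_pursuit, curriculum.evader)
        curriculum.record(captured)
    print("curriculum level:", curriculum.level)
```

With `window=10` and `threshold=0.8`, ten consecutive captures against `evader_straight` move training on to `evader_random`. The success window is cleared at each promotion, so the pursuer must prove itself again on the harder evader. In a full DDPG setup, an actor network trained with updates like `soft_update` and `td_target` would take the place of `pure_pursuit`. Each episode would then also feed transitions to the replay buffer rather than only reporting capture.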
Received: 2021-06-11
DOI: 10.1051/jnwpu/20224010047
Funding: Supported by the Aeronautical Science Foundation of China (2020Z023053001)
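About the author: FU Xiaowei (born 1976) is an associate professor at Northwestern Polytechnical University. His research focuses on UAV control, management and decision-making, and airborne fire control. e-mail: fxw@nwpu.edu.cn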