Article: 2023, Vol. 41, Issue 1: 56-64
Cite this article:
LI Yongfeng, LYU Yongxi, SHI Jingping, LI Weihua. UAV's air combat decision-making based on deep deterministic policy gradient and prediction[J]. Journal of Northwestern Polytechnical University, 2023, 41(1): 56-64

UAV's air combat decision-making based on deep deterministic policy gradient and prediction
LI Yongfeng1, LYU Yongxi1,2, SHI Jingping1,2, LI Weihua1
1. School of Automation, Northwestern Polytechnical University, Xi'an 710129, China;
2. Shaanxi Provincial Key Laboratory of Flight Control and Simulation Technology, Xi'an 710129, China
Abstract:
To address uncertain enemy maneuvering during autonomous UAV air combat maneuver decision-making, this paper proposes an autonomous maneuver decision-making method that combines target maneuver command prediction with the deep deterministic policy gradient (DDPG) algorithm. The situation data of both sides are fused and processed, and a six-degree-of-freedom UAV model and a maneuver library are built. During air combat, the target generates commands from the maneuver library through the deep Q-network (DQN) algorithm, while our UAV predicts the target's maneuver with a probabilistic neural network (PNN). A DDPG reinforcement learning method that takes into account both the situation information of the two aircraft and the predicted enemy maneuver is proposed, so that the UAV can select an appropriate maneuver for the current air combat situation. Simulation results show that the method makes effective use of the air combat situation information and the target maneuver prediction, improving the effectiveness of the reinforcement learning algorithm for autonomous UAV air combat decision-making while preserving convergence.
Key words: UAV; air combat maneuver decision-making; prediction; deep deterministic policy gradient
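The abstract describes feeding the predicted target maneuver, together with the two-aircraft situation information, into the DDPG agent. The following is a minimal sketch of that idea only, not the authors' implementation: it assumes a textbook Parzen-window probabilistic neural network with a Gaussian kernel, and the feature vectors, the three maneuver classes, the smoothing parameter `sigma`, and the function names `pnn_predict` and `augment_state` are all hypothetical choices made for illustration.

```python
import math


def pnn_predict(x, samples, labels, n_classes, sigma=0.5):
    """Class probabilities from a basic Parzen-window (PNN) classifier.

    Pattern layer: one Gaussian kernel per stored sample.
    Summation layer: average kernel response per class.
    Output: per-class densities normalized to sum to 1.
    """
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for s, c in zip(samples, labels):
        d2 = sum((a - b) ** 2 for a, b in zip(x, s))
        sums[c] += math.exp(-d2 / (2.0 * sigma ** 2))
        counts[c] += 1
    dens = [s / n if n else 0.0 for s, n in zip(sums, counts)]
    total = sum(dens)
    if total == 0.0:
        # All kernels underflowed: fall back to a uniform guess.
        return [1.0 / n_classes] * n_classes
    return [d / total for d in dens]


def augment_state(situation, maneuver_probs):
    """Concatenate situation features with the predicted maneuver
    distribution to form the (assumed) input of the DDPG actor/critic."""
    return list(situation) + list(maneuver_probs)


# Hypothetical training data: 2-D target features, 3 maneuver classes.
samples = [[0.0, 0.0], [0.1, -0.1], [1.0, 1.0], [0.9, 1.1], [-1.0, 1.0]]
labels = [0, 0, 1, 1, 2]

probs = pnn_predict([0.95, 1.0], samples, labels, n_classes=3)
state = augment_state([0.3, -0.2, 0.8], probs)
print(probs)
print(state)
```

Passing the full probability vector rather than only the most likely class is one possible design choice; it lets the downstream policy see how uncertain the prediction is.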
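Received: 2022-04-25     Revised:
DOI: 10.1051/jnwpu/20234110056
Funding: Supported by the National Natural Science Foundation of China (62173277, 61573286), the Natural Science Foundation of Shaanxi Province (2019JM-163, 2020JQ-218, 2022JM-011), and the Aeronautical Science Foundation of China (20180753006, 201905053004)
Corresponding author: LYU Yongxi (b. 1990), assistant researcher at Northwestern Polytechnical University; research interests: flight control and control methods. E-mail: yongxilyu@nwpu.edu.cn
About the author: LI Yongfeng (b. 1995), PhD candidate at Northwestern Polytechnical University; research interests: flight control and UAV air combat methods.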
