Paper: 2022, Vol. 40, Issue (5): 970-979
Cite this article:
WEI Yao, LIU Zhicheng, CAI Bin, CHEN Jiaxin, YANG Yao, ZHANG Kai. Study on UAV obstacle avoidance algorithm based on deep recurrent double Q network[J]. Journal of Northwestern Polytechnical University

Study on UAV obstacle avoidance algorithm based on deep recurrent double Q network
WEI Yao1, LIU Zhicheng2, CAI Bin3,4, CHEN Jiaxin3,4, YANG Yao5, ZHANG Kai5
1. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China;
2. The Third Military Representative Office of Beijing Military Representative Office of Air Force Equipment Department in Tianjin, Tianjin 300000, China;
3. Shanghai Aerospace Control Technology Institute, Shanghai 201109, China;
4. Infrared Detection Technology R & D Center of China Aerospace Science and Technology Corporation, Shanghai 201109, China;
5. Unmanned System Research Institute, Northwestern Polytechnical University, Xi'an 710072, China
Abstract:
Traditional reinforcement learning methods for robot motion planning, and for the UAV obstacle avoidance problem in particular, suffer from overestimation of the value function and from partial observability, both of which lead to long training times and poor convergence during network training. This paper proposes a UAV obstacle avoidance algorithm based on a deep recurrent double Q-network. Transforming the single-network structure into a double-network structure decouples optimal action selection from action-value estimation, reducing overestimation of the value function. A GRU recurrent neural network module is introduced into the fully connected layer of the double-network structure; the GRU processes information along the time dimension, which improves the analyzability of the network and its performance in partially observable environments. On this basis, a prioritized experience replay mechanism is incorporated to accelerate network convergence. Finally, the original and improved algorithms are tested in a simulation environment; the experimental results show that the proposed algorithm performs better in terms of training time, obstacle avoidance success rate, and robustness.
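The decoupling of action selection from action-value estimation described above can be sketched with the standard double-Q target computation. This is a minimal illustration of the general technique, not the paper's exact implementation; the function name, array arguments, and discount value are illustrative assumptions:

```python
import numpy as np

def ddqn_target(q_online_next, q_target_next, reward, done, gamma=0.99):
    """Double-Q target: the online network picks the greedy action,
    but the target network evaluates it. This curbs the overestimation
    that arises from taking a max over a single noisy value estimate."""
    a_star = int(np.argmax(q_online_next))   # action selection (online net)
    q_eval = float(q_target_next[a_star])    # action evaluation (target net)
    return reward + (0.0 if done else gamma * q_eval)

# A vanilla DQN would instead bootstrap from max(q_target_next), so any
# upward noise in the target network's estimates inflates the target.
```

For example, with next-state values `q_online_next = [1.0, 3.0, 2.0]` and `q_target_next = [0.5, 1.0, 4.0]`, the online network selects action 1, the target network evaluates it as 1.0, and the double-Q target stays well below the 4.0 a single-network max would have used.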
Key words: deep reinforcement learning; UAV; obstacle avoidance; recurrent neural network; DDQN
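The role of the GRU module — folding a history of partial observations into a hidden state that the Q-value layer can act on — can be illustrated with a single GRU step written out in NumPy. The weight names and shapes below are generic textbook notation, not taken from the paper:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU update. The gates decide how much of the previous hidden
    state h to keep, so a recurrent Q-network can act on an internal
    summary of past observations rather than the current frame alone."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_cand         # convex mix of old and new

def gru_encode(xs, h0, params):
    """Fold a sequence of observations into a final hidden state."""
    h = h0
    for x in xs:
        h = gru_step(x, h, *params)
    return h
```

In a deep recurrent double Q-network of the kind the abstract describes, a hidden state produced this way replaces the single-frame features fed to the final Q-value layer, which is what lets the agent cope with partial observability.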
Received: 2021-12-17
DOI: 10.1051/jnwpu/20224050970
Foundation item: Shanghai Aerospace Science and Technology Innovation Fund (SAST2020-070)
Biography: WEI Yao (1998-), master's candidate at Northwestern Polytechnical University; research interests: reinforcement learning and UAV obstacle avoidance. E-mail: weiyaonwpu@163.com
