Paper: 2022, Vol. 40, Issue 5: 1055-1064
Cite this article:
ZHANG Yunyan, WEI Yao, LIU Hao, YANG Yao. End-to-end UAV obstacle avoidance decision based on deep reinforcement learning[J]. Journal of Northwestern Polytechnical University

End-to-end UAV obstacle avoidance decision based on deep reinforcement learning
ZHANG Yunyan1, WEI Yao2, LIU Hao2, YANG Yao3
1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, China;
2. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China;
3. Unmanned System Research Institute, Northwestern Polytechnical University, Xi'an 710072, China
Abstract:
To address the problems of traditional UAV obstacle avoidance algorithms, namely the need to build offline three-dimensional maps, discontinuous speed control, and a limited choice of speed direction, we study an end-to-end obstacle avoidance decision method with continuous action output for UAVs, based on the deep deterministic policy gradient (DDPG) deep reinforcement learning algorithm. First, an end-to-end decision control model based on the DDPG algorithm is established; from the continuous state information perceived, the model outputs continuous control variables, namely UAV obstacle avoidance actions. Second, training and verification are carried out on the UE4 + AirSim platform; the results show that the model can realize end-to-end UAV obstacle avoidance decisions. Finally, the model is compared with a three-dimensional vector field histogram (3DVFH) obstacle avoidance algorithm using the same data source; the experiments show that the DDPG algorithm optimizes the UAV's obstacle avoidance trajectory more effectively.
Key words:    UAV    obstacle avoidance    deep deterministic policy gradient (DDPG)    reinforcement learning   
Received: 2021-12-14     Revised:
DOI: 10.1051/jnwpu/20224051055
Funding: Supported by the Youth Project of the Natural Science Foundation of Shaanxi Province (2021JQ-075) and the Shanghai Aerospace Science and Technology Innovation Fund (SAST2020-070)
About the author: ZHANG Yunyan (b. 1982), assistant research fellow at Northwestern Polytechnical University; her research interests include unmanned system control and decision-making. E-mail: yyzhangppwk@163.com
