Paper: 2021, Vol. 39, Issue 5: 1057-1063
Cite this article:
HU Chunyang, WANG Heng, SHI Haobin. Robotic arm reinforcement learning control method based on autonomous visual perception[J]. Journal of Northwestern Polytechnical University, 2021, 39(5): 1057-1063

Robotic arm reinforcement learning control method based on autonomous visual perception
HU Chunyang1, WANG Heng2, SHI Haobin2
1. School of Computer, Hubei University of Arts and Science, Xiangyang 441053, China;
2. School of Computer, Northwestern Polytechnical University, Xi'an 710129, China
Abstract:
Traditional robotic arm control methods follow artificially preset fixed trajectories to complete specific tasks; they rely on an accurate environment model, and the control process lacks adaptability. To address this problem, we propose an end-to-end intelligent robotic arm control method that combines autonomous visual perception with reinforcement learning. The visual perception module uses the YOLO algorithm, and the policy control module uses the DDPG reinforcement learning algorithm, enabling the robotic arm to learn an autonomous control policy in a complex environment. In addition, imitation learning and hindsight experience replay are used during training, which accelerates the robotic arm's learning process. Experimental results show that the algorithm converges in a shorter time and performs well, in the simulation environment, at both autonomously perceiving the target position and overall policy control.
Key words:    machine vision    reinforcement learning    imitation learning    system simulation    intelligent control   
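The hindsight experience replay (HER) step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration of goal relabeling, not the paper's implementation; the function name `her_relabel`, the `reward_fn` callback, and the transition layout are assumptions for the sketch.

```python
import numpy as np

def her_relabel(episode, reward_fn, k=4, rng=None):
    """Relabel transitions with 'future' goals sampled from the same episode.

    episode: list of (state, action, next_state, achieved_goal, goal) tuples.
    reward_fn(achieved_goal, goal): reward under a (possibly relabeled) goal.
    Returns the original transitions plus up to k relabeled copies of each.
    """
    rng = rng or np.random.default_rng(0)
    out = []
    for t, (s, a, s2, ag, g) in enumerate(episode):
        # keep the original transition with its original goal
        out.append((s, a, s2, g, reward_fn(ag, g)))
        # sample up to k goals achieved later in the same episode
        future = list(range(t, len(episode)))
        for idx in rng.choice(future, size=min(k, len(future)), replace=True):
            new_g = episode[idx][3]  # achieved goal at a later step
            out.append((s, a, s2, new_g, reward_fn(ag, new_g)))
    return out
```

Because a failed episode's own achieved goals become substitute targets, some relabeled transitions receive a success reward, which densifies the otherwise sparse reward signal that DDPG trains on.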
Received: 2021-06-03     Revised:
DOI: 10.1051/jnwpu/20213951057
Foundation item: Supported by the Key R&D Project of the Hubei Provincial Department of Science and Technology (2020BBB092) and the Key Project of the Scientific Research Program of the Hubei Provincial Department of Education (D20192602)
Corresponding author: SHI Haobin (1978-), professor at Northwestern Polytechnical University; research interests: intelligent robots, swarm-robot cooperation, and robot path planning and navigation. E-mail: shihaobin@nwpu.edu.cn
Biography: HU Chunyang (1975-), associate professor and Ph.D. at Hubei University of Arts and Science; research interests: cloud computing, big data, and machine learning.
