Paper: 2021, Vol. 39, Issue 5: 1057-1063
Cite this article:
HU Chunyang, WANG Heng, SHI Haobin. Robotic arm reinforcement learning control method based on autonomous visual perception[J]. Journal of Northwestern Polytechnical University, 2021, 39(5): 1057-1063

Robotic arm reinforcement learning control method based on autonomous visual perception
HU Chunyang1, WANG Heng2, SHI Haobin2
1. School of Computer, Hubei University of Arts and Science, Xiangyang 441053, China;
2. School of Computer, Northwestern Polytechnical University, Xi'an 710129, China
Abstract:
Traditional robotic arm control methods follow artificially preset fixed trajectories to complete specific tasks; they rely on an accurate environment model, and the control process lacks adaptability. To address this problem, we propose an end-to-end intelligent robotic arm control method that combines autonomous visual perception with reinforcement learning. The visual perception module uses the YOLO algorithm, and the policy control module uses the DDPG reinforcement learning algorithm, enabling the robotic arm to learn an autonomous control policy in a complex environment. In addition, imitation learning and hindsight experience replay are used during training, which accelerates the robotic arm's learning process. Experimental results show that the algorithm converges in a shorter time and performs well, in the simulation environment, at both autonomously perceiving the target position and overall policy control.
Key words:    machine vision    reinforcement learning    imitation learning    system simulation    intelligent control   
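The hindsight experience replay (HER) step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration of goal relabeling, not the paper's implementation; the function name `her_relabel`, the `reward_fn` callback, and the transition layout are assumptions for the sketch.

```python
import numpy as np

def her_relabel(episode, reward_fn, k=4, rng=None):
    """Relabel transitions with 'future' goals sampled from the same episode.

    episode: list of (state, action, next_state, achieved_goal, goal) tuples.
    reward_fn(achieved_goal, goal): reward under a (possibly relabeled) goal.
    Returns the original transitions plus up to k relabeled copies of each.
    """
    rng = rng or np.random.default_rng(0)
    out = []
    for t, (s, a, s2, ag, g) in enumerate(episode):
        # keep the original transition with its original goal
        out.append((s, a, s2, g, reward_fn(ag, g)))
        # sample up to k goals achieved later in the same episode
        future = list(range(t, len(episode)))
        for idx in rng.choice(future, size=min(k, len(future)), replace=True):
            new_g = episode[idx][3]  # achieved goal at a later step
            out.append((s, a, s2, new_g, reward_fn(ag, new_g)))
    return out
```

Because a failed episode's own achieved goals become substitute targets, some relabeled transitions receive a success reward, which densifies the otherwise sparse reward signal that DDPG trains on.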
Received: 2021-06-03     Revised:
DOI: 10.1051/jnwpu/20213951057
Foundation item: Supported by the Key R&D Project of the Hubei Provincial Department of Science and Technology (2020BBB092) and the Key Project of the Scientific Research Program of the Hubei Provincial Department of Education (D20192602)
Corresponding author: SHI Haobin (1978-), professor at Northwestern Polytechnical University; research interests: intelligent robots, swarm-robot cooperation, and robot path planning and navigation. E-mail: shihaobin@nwpu.edu.cn
Biography: HU Chunyang (1975-), associate professor and Ph.D. at Hubei University of Arts and Science; research interests: cloud computing, big data, and machine learning.
