Paper: 2023, Vol. 41, Issue 2: 389-399
Cite this article:
HU Penglin, PAN Quan, GUO Yaning, ZHAO Chunhui. Study on learning algorithm of transfer reinforcement for multi-agent formation control[J]. Journal of Northwestern Polytechnical University, 2023, 41(2): 389-399

Study on learning algorithm of transfer reinforcement for multi-agent formation control
HU Penglin, PAN Quan, GUO Yaning, ZHAO Chunhui
School of Automation, Northwestern Polytechnical University, Xi'an 710129, China
Abstract:
To address cooperative formation, obstacle avoidance and inter-agent collision avoidance for multi-agent systems in multi-obstacle environments, a formation control algorithm combining transfer learning and reinforcement learning is proposed. First, in the source task learning stage, a value function approximation method replaces the Q-table solution, which avoids the large storage space a Q-table requires and improves the solving speed of the algorithm. Second, in the target task learning stage, a Gaussian clustering algorithm classifies the source tasks; according to the distance between each cluster center and the target task, the optimal source task class is selected for target task learning, which effectively avoids negative transfer and improves the generalization ability and convergence speed of the reinforcement learning algorithm. Finally, simulation results show that the proposed method enables the multi-agent system to form and maintain a formation configuration in a complex obstacle environment while achieving obstacle avoidance and collision avoidance.
Key words: multi-agent system; transfer reinforcement learning; value function approximation; formation control; Gaussian clustering
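To make the two stages described in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation: a linear value-function approximation stands in for the Q-table in the source task stage, and scikit-learn's GaussianMixture stands in for the Gaussian clustering used to pick the source-task class whose center is nearest to the target task. The feature map phi, the task descriptors and all numeric values below are illustrative assumptions.

# Minimal sketch (assumptions only, not the paper's code):
# (1) linear value-function approximation replacing a Q-table;
# (2) Gaussian clustering of source tasks and nearest-center selection
#     to decide which source knowledge to transfer to the target task.
import numpy as np
from sklearn.mixture import GaussianMixture  # stands in for the Gaussian clustering step

N_FEATURES = 8   # assumed length of the state-action feature vector phi(s, a)
N_ACTIONS = 4    # assumed discrete action set (e.g. four movement directions)

def phi(state, action):
    """Hypothetical state-action feature map; a real one would encode
    relative positions of neighbours, obstacles and the formation target."""
    rng = np.random.default_rng(abs(hash((tuple(state), action))) % (2**32))
    return rng.normal(size=N_FEATURES)

def q_value(w, state, action):
    # Value function approximation: Q(s, a) ~ w^T phi(s, a),
    # so only a weight vector is stored instead of a full Q-table.
    return w @ phi(state, action)

def td_update(w, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One Q-learning style update on the approximation weights."""
    target = r + gamma * max(q_value(w, s_next, b) for b in range(N_ACTIONS))
    return w + alpha * (target - q_value(w, s, a)) * phi(s, a)

# ---- target-task stage: choose a source-task class by Gaussian clustering ----
rng = np.random.default_rng(0)
source_descriptors = rng.normal(size=(12, 2))                       # assumed per-task features
source_weights = [rng.normal(size=N_FEATURES) for _ in range(12)]   # learned in the source stage

gmm = GaussianMixture(n_components=3, random_state=0).fit(source_descriptors)
labels = gmm.predict(source_descriptors)

target_descriptor = np.array([0.3, -0.1])                           # assumed target-task features
best = int(np.argmin(np.linalg.norm(gmm.means_ - target_descriptor, axis=1)))

# Transfer only from the nearest cluster, the step intended to avoid negative
# transfer; here the target weights are initialised as that cluster's mean.
selected = [w for w, c in zip(source_weights, labels) if c == best] or source_weights
w_target = np.mean(selected, axis=0)
print("selected cluster:", best, "initial weight norm:", np.linalg.norm(w_target))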
Received: 2022-06-15     Revised:
DOI: 10.1051/jnwpu/20234120389
Foundation item: Supported by the National Natural Science Foundation of China (61790552, 62073264)
Corresponding author: PAN Quan (1961-), professor at Northwestern Polytechnical University; his research focuses on UAV information security and multi-source information fusion. E-mail: quanpan@nwpu.edu.cn
About the author: HU Penglin (1996-), PhD candidate at Northwestern Polytechnical University; his research focuses on game theory, reinforcement learning and multi-agent optimal control.
