Paper: 2021, Vol. 39, Issue 5: 1077-1086
Cite this article:
RAO Ning, XU Hua, QI Zisen, SONG Bailin, SHI Yunhao. Allocation method of communication interference resource based on deep reinforcement learning of maximum policy entropy[J]. Journal of Northwestern Polytechnical University, 2021, 39(5): 1077-1086

Allocation method of communication interference resource based on deep reinforcement learning of maximum policy entropy
RAO Ning, XU Hua, QI Zisen, SONG Bailin, SHI Yunhao
College of Information and Navigation, Air Force Engineering University, Xi'an 710077, China
Abstract:
To solve the optimization problem of interference resource allocation in communication network countermeasures, an interference resource allocation method based on maximum policy entropy deep reinforcement learning (MPEDRL) is proposed. The method introduces deep reinforcement learning into communication-countermeasure resource allocation, and it strengthens policy exploration and accelerates convergence to the global optimum by adding a maximum policy entropy criterion and adaptively adjusting the entropy coefficient. The method models interference resource allocation as a Markov decision process, establishes an interference policy network that outputs allocation schemes, and constructs a clipped twin-structured interference effect evaluation network to assess the efficacy of each scheme; the policy and evaluation networks are trained with the joint objective of maximizing policy entropy and cumulative interference efficacy, from which the optimal interference resource allocation scheme is decided. Simulation results show that the proposed method effectively solves the interference resource allocation problem in networked confrontation; compared with existing deep reinforcement learning methods, it learns faster and fluctuates less during training, and it achieves 15% higher jamming efficacy than the DDPG-based method.
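For context, the "policy entropy maximization plus cumulative interference efficacy maximization" objective described above matches the standard maximum-entropy reinforcement learning objective of [19-20]. In the notation of those references (the paper body may use different symbols), with per-step interference efficacy reward $r(s_t, a_t)$, entropy coefficient $\alpha$, and policy $\pi$:

$$ J(\pi) = \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}\left[ r(s_t, a_t) + \alpha\, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \right] $$

where $\mathcal{H}$ denotes policy entropy and $\alpha$ is the entropy coefficient that MPEDRL adapts automatically during training.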
Key words: interference resource allocation; deep reinforcement learning; maximum policy entropy; deep neural network
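To make the training procedure concrete, below is a minimal sketch of a maximum-policy-entropy update with clipped twin evaluation (Q) networks and an adaptive entropy coefficient, in the style of soft actor-critic [20], clipped double Q-learning [22], and the reparameterization trick [24] that the paper cites. This is an illustration, not the authors' implementation: the state/action dimensions, network sizes, and hyperparameters are assumptions, and PyTorch is used only as an example framework.

```python
# A minimal sketch (not the authors' code) of the MPEDRL-style update:
# maximum policy entropy, clipped twin critics, adaptive entropy coefficient.
# STATE_DIM/ACTION_DIM, network widths, and hyperparameters are illustrative.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 16, 8        # assumed sizes of allocation state/action
GAMMA, TAU, LR = 0.99, 0.005, 3e-4
TARGET_ENTROPY = -float(ACTION_DIM)  # common heuristic entropy target

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

policy = mlp(STATE_DIM, 2 * ACTION_DIM)          # outputs mean and log-std
q1, q2 = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
q1_t, q2_t = mlp(STATE_DIM + ACTION_DIM, 1), mlp(STATE_DIM + ACTION_DIM, 1)
q1_t.load_state_dict(q1.state_dict()); q2_t.load_state_dict(q2.state_dict())
log_alpha = torch.zeros(1, requires_grad=True)   # adaptive entropy coefficient

pi_opt = torch.optim.Adam(policy.parameters(), lr=LR)
q_opt = torch.optim.Adam(list(q1.parameters()) + list(q2.parameters()), lr=LR)
a_opt = torch.optim.Adam([log_alpha], lr=LR)

def sample_action(state):
    """Squashed Gaussian policy with the reparameterization trick [24]."""
    mean, log_std = policy(state).chunk(2, dim=-1)
    std = log_std.clamp(-20, 2).exp()
    u = mean + std * torch.randn_like(std)
    a = torch.tanh(u)                            # bound allocation to [-1, 1]
    # Gaussian log-prob plus tanh change-of-variables correction
    logp = (-0.5 * ((u - mean) / std) ** 2 - log_std - 0.9189385).sum(-1)
    logp -= torch.log(1 - a.pow(2) + 1e-6).sum(-1)
    return a, logp

def update(s, a, r, s2, done):
    alpha = log_alpha.exp().detach()
    # Clipped twin target [22]: minimum of the two target critics.
    with torch.no_grad():
        a2, logp2 = sample_action(s2)
        q_t = torch.min(q1_t(torch.cat([s2, a2], -1)),
                        q2_t(torch.cat([s2, a2], -1))).squeeze(-1)
        y = r + GAMMA * (1 - done) * (q_t - alpha * logp2)  # soft Bellman target
    q_loss = ((q1(torch.cat([s, a], -1)).squeeze(-1) - y) ** 2).mean() + \
             ((q2(torch.cat([s, a], -1)).squeeze(-1) - y) ** 2).mean()
    q_opt.zero_grad(); q_loss.backward(); q_opt.step()
    # Policy step: maximize Q - alpha * log pi (entropy bonus) [20].
    a_new, logp = sample_action(s)
    q_new = torch.min(q1(torch.cat([s, a_new], -1)),
                      q2(torch.cat([s, a_new], -1))).squeeze(-1)
    pi_loss = (alpha * logp - q_new).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
    # Adapt alpha so the policy entropy tracks the target entropy.
    a_loss = -(log_alpha * (logp.detach() + TARGET_ENTROPY)).mean()
    a_opt.zero_grad(); a_loss.backward(); a_opt.step()
    # Polyak-average the target critics.
    for tgt, net in ((q1_t, q1), (q2_t, q2)):
        for pt, pn in zip(tgt.parameters(), net.parameters()):
            pt.data.mul_(1 - TAU).add_(TAU * pn.data)

# Example call with a random batch (illustrative only); in the paper's setting,
# s would encode the jamming/communication channel state and a the jammers'
# power and channel assignments:
update(torch.randn(32, STATE_DIM), torch.rand(32, ACTION_DIM) * 2 - 1,
       torch.randn(32), torch.randn(32, STATE_DIM), torch.zeros(32))
```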
Received: 2021-01-20     Revised:
DOI: 10.1051/jnwpu/20213951077
Foundation item: supported by the National Natural Science Foundation of China (61601500)
About the author: RAO Ning (b. 1997), master's student, College of Information and Navigation, Air Force Engineering University; research interests: communication countermeasures and intelligent decision-making.

References:
[1] BAO J J, JI L J. Frequency hopping sequences with optimal partial Hamming correlation[J]. IEEE Trans on Information Theory, 2016, 62(6): 3768-3783
[2] WANG X J, LEI M J, ZHAO M J, et al. Cooperative anti-jamming strategy and outage probability optimization for multi-hop ad-hoc networks[C]//2017 IEEE 86th Vehicular Technology Conference, 2017: 24-27
[3] SUN J, LI X. Carrier frequency offset synchronization algorithm for short burst communication system[C]//Proceedings of 2016 IEEE 13th International Conference on Signal Processing, 2016: 6-10
[4] LI Dongsheng, GAO Yang, YONG Aixia. Jamming resource allocation via improved discrete cuckoo search algorithm[J]. Journal of Electronics & Information Technology, 2016, 38(4): 899-905 (in Chinese)
[5] LIU Yian, NI Tianquan, ZHANG Xiuhui, et al. Application of simulated annealing algorithm in optimizing allocation of radar jamming resources[J]. Systems Engineering and Electronics, 2009, 31(8): 1914-1917 (in Chinese)
[6] YUAN Jianguo, NAN Shuchong, ZHANG Fang, et al. Adaptive resource allocation for multi-user OFDM based on bee colony algorithm[J]. Journal of Jilin University, 2019, 49(2): 624-630 (in Chinese)
[7] LUONG N C, HOANG D T, GONG S, et al. Applications of deep reinforcement learning in communications and networking: a survey[J]. IEEE Communications Surveys & Tutorials, 2019, 21(4): 3133-3174
[8] WANG S, LIU H, GOMES P H, et al. Deep reinforcement learning for dynamic multichannel access in wireless networks[J]. IEEE Trans on Cognitive Communications and Networking, 2018, 4(2): 257-265
[9] XU Z, WANG Y, TANG J, et al. A deep reinforcement learning based framework for power-efficient resource allocation in cloud RANs[C]//2017 IEEE International Conference on Communications, 2017: 1-6
[10] LIAO Xiaomin, YAN Shaohu, SHI Jia, et al. Deep reinforcement learning based resource allocation algorithms in cellular networks[J]. Journal on Communications, 2019, 40(2): 11-18 (in Chinese)
[11] FAN Meng, CHEN Peng, WU Lenan, et al. Power allocation in multi-user cellular networks: deep reinforcement learning approaches[J]. IEEE Trans on Wireless Communications, 2020, 19(10): 6255-6267
[12] ZHAO D, QIN H, SONG B, et al. A graph convolutional network-based deep reinforcement learning approach for resource allocation in a cognitive radio network[J]. Sensors, 2020, 20(18): 5216-5239
[13] KAUR A, KUMAR K. Energy-efficient resource allocation in cognitive radio networks under cooperative multi-agent model-free reinforcement learning schemes[J]. IEEE Trans on Network and Service Management, 2020, 17(3): 1337-1348
[14] ZHANG H, YANG N, LONG K, et al. Power control based on deep reinforcement learning for spectrum sharing[J]. IEEE Trans on Wireless Communications, 2020, 19(6): 4209-4219
[15] XU Y, YANG C, HUA M, et al. Deep deterministic policy gradient (DDPG)-based resource allocation scheme for NOMA vehicular communications[J]. IEEE Access, 2020, 8: 18797-18807
[16] ZHAO N, LIANG Y, NIYATO D, et al. Deep reinforcement learning for user association and resource allocation in heterogeneous cellular networks[J]. IEEE Trans on Wireless Communications, 2019, 18(11): 5141-5152
[17] AMURU S, TEKIN C, SCHAAR M, et al. Jamming bandits: a novel learning method for optimal jamming[J]. IEEE Trans on Wireless Communications, 2016, 15(4): 2792-2808
[18] LUO Z, ZHANG S. Dynamic spectrum management: complexity and duality[J]. IEEE Journal of Selected Topics in Signal Processing, 2008, 2(1): 57-73
[19] HAARNOJA T, TANG H, ABBEEL P, et al. Reinforcement learning with deep energy-based policies[C]//Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 2017: 1352-1361
[20] HAARNOJA T, ZHOU A, ABBEEL P, et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[C]//Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 2018: 1861-1870
[21] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533
[22] FUJIMOTO S, VAN HOOF H, MEGER D. Addressing function approximation error in actor-critic methods[C]//Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 2018: 1587-1596
[23] LILLICRAP T, HUNT J, PRITZEL A, et al. Continuous control with deep reinforcement learning[C]//Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 2015: 2361-2369
[24] KINGMA D P, SALIMANS T, WELLING M. Variational dropout and the local reparameterization trick[C]//Advances in Neural Information Processing Systems, Montreal, Canada, 2015: 2575-2583