A Scheduling Optimization Method Based on Deep Reinforcement Learning -- Journal of Northwestern Polytechnical University, 2017, 35(6):1047-1053

	Article: 2017, Vol. 35, Issue 6: 1047-1053
	Cite this article:
	Deng Zhilong, Zhang Qiwei, Cao Hao, Gu Zhiyang. A Scheduling Optimization Method Based on Depth Intensive Study[J]. Journal of Northwestern Polytechnical University, 2017, 35(6):1047-1053 (in Chinese)

A Scheduling Optimization Method Based on Deep Reinforcement Learning

Deng Zhilong¹, Zhang Qiwei², Cao Hao², Gu Zhiyang²

1. School of Electronics and Information, Northwestern Polytechnical University, Xi'an 710072, China;
2. School of Automation, Northwestern Polytechnical University, Xi'an 710072, China

Abstract:

Deep reinforcement learning combines the perception capability of deep learning with the decision-making capability of reinforcement learning, so that control can be performed directly from the input; it is an artificial intelligence method that is closer to the way humans think. Building on this combination, this paper studies a design framework for a resource scheduling algorithm based on deep reinforcement learning. The framework first trains a deep learning network on a large amount of prior data collected from network nodes, then uses reinforcement learning to allocate network resources, and then learns a value network based on deep reinforcement learning through extensive self-play. Finally, an experimental scheme is designed to evaluate the algorithm's performance through simulation and comparison, in order to verify its effectiveness.

Key words: deep learning; scheduling algorithm; Monte Carlo simulation; reinforcement learning

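Received: 2017-02-01

Funding: Supported by the National Natural Science Foundation of China (U1609216)

About the author: Deng Zhilong (born 1976), PhD candidate at Northwestern Polytechnical University; his research focuses on network big data and data mining.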


