Paper: 2021, Vol. 39, Issue 3: 529-538
Cite this article as:
ZHU Ziyu, TANG Xiaochun, ZHAO Quan. Research on a distributed machine learning resource scheduling framework for CPU-GPU clusters[J]. Journal of Northwestern Polytechnical University, 2021, 39(3): 529-538
ZHU Ziyu, TANG Xiaochun, ZHAO Quan. A unified schedule policy of distributed machine learning framework for CPU-GPU cluster[J]. Journal of Northwestern Polytechnical University, 2021, 39(3): 529-538

Research on a distributed machine learning resource scheduling framework for CPU-GPU clusters
ZHU Ziyu, TANG Xiaochun, ZHAO Quan
School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China
Abstract:
With the widespread deployment of GPU hardware, more and more distributed machine learning applications use hybrid CPU-GPU cluster resources to improve algorithm efficiency. However, existing distributed machine learning scheduling frameworks consider task scheduling only on CPU resources or only on GPU resources; even when the differences between CPU and GPU resources are taken into account, it remains difficult to improve the resource utilization of the whole system. In other words, the key challenge of running distributed machine learning jobs on a CPU-GPU cluster is how to schedule the tasks of a job efficiently. After analyzing existing methods, this paper proposes a non-uniform data sharding strategy that uses linear programming to bring the execution times of CPU tasks and GPU tasks as close together as possible, thereby reducing the overall execution time of a distributed machine learning job. The paper describes the scheduling architecture of the hybrid CPU-GPU computing framework, which splits the data into shards of unequal size according to the different computing capabilities of CPUs and GPUs so that each shard matches the resource it is assigned to; it then presents the task scheduling method for hybrid CPU-GPU resources and validates the method with the K-Means algorithm. With the hybrid CPU-GPU computing framework, K-Means performance improves by a factor of about 1.5 on average, and the performance improves further as the number of GPUs increases.
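The balance condition behind the non-uniform sharding can be written out explicitly. The notation below (per-worker record rates r_c and r_g, per-task shard sizes d_c and d_g) is ours, not the paper's, and only sketches how a linear program can equalize CPU and GPU task times under the assumption of roughly constant per-record processing rates. With c CPU workers, g GPU workers and D records in total:

\min\; T \quad \text{s.t.} \quad d_c \le r_c T, \quad d_g \le r_g T, \quad c\,d_c + g\,d_g = D, \quad d_c, d_g \ge 0

At the optimum both time constraints are tight, so every task finishes at the same moment:

d_c = \frac{D\,r_c}{c\,r_c + g\,r_g}, \qquad d_g = \frac{D\,r_g}{c\,r_c + g\,r_g}, \qquad T = \frac{D}{c\,r_c + g\,r_g}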
Key words:    heterogeneous tasks    unified scheduling    clustering algorithm    distributed
A unified schedule policy of distributed machine learning framework for CPU-GPU cluster
ZHU Ziyu, TANG Xiaochun, ZHAO Quan
School of Computer Science, Northwestern Polytechnical University, Xi'an 710072, China
Abstract:
With the widespread use of GPU hardware, more and more distributed machine learning applications use hybrid CPU-GPU cluster resources to improve the efficiency of their algorithms. However, existing distributed machine learning scheduling frameworks consider task scheduling only on CPU resources or only on GPU resources, and even when the differences between CPU and GPU resources are taken into account, it is difficult to improve the resource utilization of the entire system. In other words, the key challenge in using CPU-GPU clusters for distributed machine learning jobs is how to schedule the tasks of a job efficiently. This paper proposes a hybrid CPU-GPU cluster scheduling framework. First, according to the different computing capabilities of CPUs and GPUs, the data is divided into shards of unequal size that match the available CPU and GPU computing resources. Second, the paper presents the task scheduling method for the hybrid CPU-GPU setting. Finally, the proposed method is validated with the K-Means algorithm: using the hybrid CPU-GPU computing framework increases K-Means performance by about 1.5 times, and the performance improves further as the number of GPUs increases.
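To make the unequal-shard idea concrete, here is a minimal Python sketch that computes per-task shard sizes from measured per-worker throughputs and builds a simple task plan. The function names and sample rates are hypothetical illustrations under the constant-rate assumption above, not the paper's implementation or API.

def balanced_shard_sizes(n_records, n_cpu_workers, n_gpu_workers, cpu_rate, gpu_rate):
    """cpu_rate / gpu_rate: records processed per second by one CPU / GPU worker."""
    total_rate = n_cpu_workers * cpu_rate + n_gpu_workers * gpu_rate
    est_time = n_records / total_rate            # common finishing time T
    cpu_shard = int(round(est_time * cpu_rate))  # records per CPU task
    gpu_shard = int(round(est_time * gpu_rate))  # records per GPU task
    return cpu_shard, gpu_shard, est_time


def build_task_plan(n_records, n_cpu_workers, n_gpu_workers, cpu_rate, gpu_rate):
    """Give each worker one shard; absorb rounding drift in the last task."""
    cpu_shard, gpu_shard, _ = balanced_shard_sizes(
        n_records, n_cpu_workers, n_gpu_workers, cpu_rate, gpu_rate)
    plan = [("cpu", i, cpu_shard) for i in range(n_cpu_workers)]
    plan += [("gpu", j, gpu_shard) for j in range(n_gpu_workers)]
    assigned = sum(size for _, _, size in plan)
    if plan and assigned != n_records:           # rounding left a few records over/short
        kind, idx, size = plan[-1]
        plan[-1] = (kind, idx, size + n_records - assigned)
    return plan


if __name__ == "__main__":
    # Example: 8 CPU workers at 1e5 records/s each, 2 GPU workers at 1e6 records/s each.
    for task in build_task_plan(10_000_000, 8, 2, 1e5, 1e6):
        print(task)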
Key words:    CPU-GPU tasks    unified scheduler    clustering algorithm    distribution   
Received: 2020-10-10     Revised:
DOI: 10.1051/jnwpu/20213930529
Foundation item: Supported by the Key Research and Development Fund of the Ministry of Science and Technology (2018YFB1003403)
Corresponding author:     Email:
About the author: ZHU Ziyu (b. 1996), female, master's student at Northwestern Polytechnical University; her research focuses on big data computing and cluster resource management.
