Design of Deep Learning VLIW Processor for Image Recognition -- Journal of Northwestern Polytechnical University, 2020, 38(1): 216-224

	Article: 2020, Vol. 38, Issue 1: 216-224
	Cite this article:
	LI Lin, ZHANG Shengbing, WU Juan. Design of Deep Learning VLIW Processor for Image Recognition[J]. Journal of Northwestern Polytechnical University, 2020, 38(1): 216-224


Design of Deep Learning VLIW Processor for Image Recognition

LI Lin¹,², ZHANG Shengbing¹, WU Juan³

1. School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, China;
2. Fourth Design Department, Beijing Institute of Microelectronics Technology, Beijing 100076, China;
3. School of Animation and Software, Xi'an Vocational and Technical College, Xi'an 710077, China

Abstract:

To meet the demand in aviation and aerospace for high-resolution image recognition with efficient local processing, and to address the insufficient computational parallelism of existing designs, an extensible multiprocessor-cluster deep learning processor architecture based on very long instruction word (VLIW) is designed, building on optimizations to the computation of each layer of the deep convolutional neural network model. The design combines parallel processing of feature maps and neurons, VLIW-based instruction-level parallelism, data-level parallelism across multiple processor clusters, and pipelining. Test results on an FPGA prototype system show that the processor can effectively run image classification and object detection applications. At an operating frequency of 200 MHz, the peak performance of the processor reaches 128 GOP/s. On the selected benchmarks, the processor is at least 12 times faster than the CPU and at least 7 times faster than the GPU. Compared with the results of the software framework, the average error in the test accuracy of the processor is no more than 1%.

Key words: image recognition; deep learning; convolutional neural networks; very long instruction word (VLIW); processor; extensible
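
As a rough sanity check on the peak figure quoted in the abstract, the sketch below divides 128 GOP/s by the 200 MHz clock to get the number of operations the processor would need to complete per cycle. The function name is hypothetical, and counting one multiply-accumulate (MAC) as two operations is an assumption; the abstract does not state how operations are counted or how the work is split across processor clusters.

```python
def ops_per_cycle(peak_ops_per_second: float, clock_hz: float) -> float:
    """Operations that must complete each clock cycle to reach the peak rate."""
    return peak_ops_per_second / clock_hz


# Figures taken from the abstract.
PEAK_OPS_PER_SECOND = 128e9  # 128 GOP/s
CLOCK_HZ = 200e6             # 200 MHz

per_cycle = ops_per_cycle(PEAK_OPS_PER_SECOND, CLOCK_HZ)

# Assumption (not stated in the abstract): one MAC counts as two operations.
OPS_PER_MAC = 2
macs_per_cycle = per_cycle / OPS_PER_MAC

print(f"operations per cycle: {per_cycle:.0f}")
print(f"MACs per cycle (assuming {OPS_PER_MAC} ops per MAC): {macs_per_cycle:.0f}")
```

Under these assumptions the peak rate corresponds to 640 operations per cycle, or 320 MACs per cycle if each MAC is counted as two operations.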

Received: 2019-01-08 Revised:

DOI: 10.1051/jnwpu/20203810216

Corresponding author: Email:

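About the author: LI Lin (born 1982), PhD candidate at Northwestern Polytechnical University; research focuses on microprocessor architecture and integrated circuit design.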


