Article: 2020, Vol. 38, Issue 1: 162-169
Cite this article:
ZHANG Ke, SU Yu, WANG Jingyu, WANG Sanyu, ZHANG Yanhua. Environment Sound Classification System Based on Hybrid Feature and Convolutional Neural Network[J]. Journal of Northwestern Polytechnical University

Environment Sound Classification System Based on Hybrid Feature and Convolutional Neural Network
ZHANG Ke1,2, SU Yu1,2,3, WANG Jingyu1,2, WANG Sanyu1,2, ZHANG Yanhua1,2
1. National Key Laboratory of Aerospace Flight Dynamics, Xi'an 710072, China;
2. School of Astronautics, Northwestern Polytechnical University, Xi'an 710072, China;
3. Signals, Images, and Intelligent Systems Laboratory (LISSI/EA 3956), University Paris-Est Creteil, Senart-FB Institute of Technology, 36-37 rue Charpak, 77127 Lieusaint, France
Abstract:
Environment sound recognition systems currently rely on deep neural networks and a wide variety of auditory features to classify and recognize environment sounds. It is therefore necessary to analyze which auditory features are best suited to deep-neural-network-based environment sound classification and recognition (ESCR) systems. In this paper, we select three sound features extracted with two widely used filter banks: the Mel and Gammatone filter banks. We then propose MGCC, a hybrid feature that fuses MFCC and GFCC. Finally, a deep convolutional neural network is proposed to verify which features are more suitable for environment sound classification and recognition tasks. The experimental results show that signal processing features outperform spectrogram features in the deep-neural-network-based environmental sound recognition system, and that the MGCC feature performs better than the other features. The proposed MGCC-CNN model is then compared with other state-of-the-art environmental sound classification models on the UrbanSound 8K dataset, where it achieves the best classification accuracy.
Key words:    environment sound    hybrid feature    sound classification    convolutional neural network    filter
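The abstract does not say how MFCC and GFCC are combined into MGCC, so the following is only a minimal sketch under stated assumptions: both features are taken as log-compressed filter-bank energies followed by a DCT-II, and fusion is assumed to be per-frame concatenation. The function names, the band counts, the 13-coefficient setting, and the random stand-in energies are all hypothetical and are not taken from the paper. Some GFCC formulations use cube-root rather than log compression.

```python
import numpy as np


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II along the last axis, keeping the first n_coeffs terms."""
    n_bands = x.shape[-1]
    k = np.arange(n_coeffs)[:, None]   # coefficient index, shape (n_coeffs, 1)
    n = np.arange(n_bands)[None, :]    # band index, shape (1, n_bands)
    basis = np.cos(np.pi * k * (2 * n + 1) / (2 * n_bands))
    return x @ basis.T                 # shape (frames, n_coeffs)


def cepstral_coefficients(band_energies, n_coeffs=13, eps=1e-10):
    """Log-compress filter-bank energies (frames x bands), then apply the DCT."""
    return dct_ii(np.log(band_energies + eps), n_coeffs)


def fuse_features(mfcc, gfcc):
    """Assumed fusion rule (not confirmed by the paper): concatenate per frame."""
    if mfcc.shape[0] != gfcc.shape[0]:
        raise ValueError("both features must have the same number of frames")
    return np.concatenate([mfcc, gfcc], axis=1)


# Random stand-ins for real filter-bank outputs (100 frames each).
rng = np.random.default_rng(0)
mel_energies = rng.random((100, 40)) + 0.1        # hypothetical 40 Mel bands
gammatone_energies = rng.random((100, 64)) + 0.1  # hypothetical 64 Gammatone bands

mfcc = cepstral_coefficients(mel_energies)
gfcc = cepstral_coefficients(gammatone_energies)
mgcc = fuse_features(mfcc, gfcc)
print(mgcc.shape)  # (100, 26)
```

Under these assumptions the fused matrix keeps one row per frame and doubles the coefficient axis, so it can be fed to a convolutional network in the same way as either feature alone.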
Received: 2019-01-16
DOI: 10.1051/jnwpu/20203810162
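Funding: Supported by the Major Program of the National Natural Science Foundation of China (51890884) and the National Natural Science Foundation of China (61976179, 61502391)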
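About the author: ZHANG Ke (1965-), professor at Northwestern Polytechnical University, whose research focuses on navigation, guidance, and control.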
