
Audio scene recognition of deep neural network under multiple optimization mechanisms

Cite this article:	YANG Lidong, HU Jiangtao. Audio scene recognition of deep neural network under multiple optimization mechanisms[J]. Journal of Signal Processing (信号处理), 2021, 37(10): 1969-1976. DOI: 10.16798/j.issn.1003-0530.2021.10.021

Authors:	Yang Lidong, Hu Jiangtao

Affiliation:	School of Information Engineering, Inner Mongolia University of Science and Technology

Funding:	Supported by the National Natural Science Foundation of China (61640012) and the Natural Science Foundation of Inner Mongolia (2017MS(LH)0602)

Keywords:	audio scene recognition; convolutional neural network; batch normalization mechanism; bidirectional gated recurrent unit

Received:	2021-03-02

Abstract:	With growing parallel computing power and an ever-increasing volume of audio data, audio scene recognition has become an important research topic in the field of scene understanding. To address the difficulty of modeling audio scenes and the low recognition accuracy, this paper proposes a parallel convolutional recurrent neural network model that incorporates multiple optimization mechanisms. The audio signal is first preprocessed and converted into a Mel spectrogram of a fixed size. The spectrogram is then fed into the network, which learns both spatial and temporal features, and the scene is finally recognized. To verify the model, recognition performance is tested on the DCASE2019 audio scene dataset. The results show that the model reaches a recognition accuracy of 88.84%, outperforming traditional network models, which demonstrates its effectiveness for audio scene recognition.
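The preprocessing step described in the abstract (audio signal to a Mel spectrogram of a set size) can be illustrated with a minimal NumPy sketch. The abstract does not state the parameters used in the paper, so the sample rate, FFT size, hop length, 64 Mel bands, 128 frames, the Hann window, and the HTK-style Mel formula below are all illustrative assumptions, and the function names are hypothetical rather than taken from the authors' code.

```python
import numpy as np


def hz_to_mel(hz):
    """Convert frequency in Hz to mel (HTK-style formula, an assumed choice)."""
    return 2595.0 * np.log10(1.0 + np.asarray(hz, dtype=float) / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(mel, dtype=float) / 2595.0) - 1.0)


def mel_filterbank(sr, n_fft, n_mels):
    """Triangular mel filters with shape (n_mels, n_fft // 2 + 1)."""
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    # n_mels + 2 points equally spaced on the mel scale give the band edges.
    edges = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, fft_freqs.size))
    for m in range(n_mels):
        lo, mid, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb


def log_mel_spectrogram(signal, sr, n_fft=1024, hop=512, n_mels=64, n_frames=128):
    """Return a log-mel spectrogram with the fixed shape (n_mels, n_frames)."""
    signal = np.asarray(signal, dtype=float)
    # Zero-pad very short signals so that at least one full frame exists.
    if signal.size < n_fft:
        signal = np.pad(signal, (0, n_fft - signal.size))
    window = np.hanning(n_fft)
    starts = range(0, signal.size - n_fft + 1, hop)
    frames = np.stack([signal[s:s + n_fft] * window for s in starts])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2   # (frames, n_fft // 2 + 1)
    mel = mel_filterbank(sr, n_fft, n_mels) @ power.T  # (n_mels, frames)
    log_mel = 10.0 * np.log10(mel + 1e-10)             # dB, floored to avoid log(0)
    # Crop or pad along the time axis so every clip has the same width.
    if log_mel.shape[1] >= n_frames:
        return log_mel[:, :n_frames]
    pad = n_frames - log_mel.shape[1]
    return np.pad(log_mel, ((0, 0), (0, pad)), constant_values=log_mel.min())
```

For a one-second clip sampled at 16 kHz, `log_mel_spectrogram(clip, 16000)` returns an array of shape (64, 128). An array of this kind is what a convolutional branch (spatial features) and a recurrent branch (temporal features) would take as input.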


