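Environmental sound classification with deep learning, combining MGCC features and multi-scale channel attention*
Cite this article: Yang Junjie, Ding Jiahui, Yang Liu, Feng Li, Yang Chao. Environmental sound classification with deep learning, combining MGCC features and multi-scale channel attention*[J]. Applied Acoustics (China), 2024, 43(3): 513-524
Authors: Yang Junjie  Ding Jiahui  Yang Liu  Feng Li  Yang Chao
Affiliations: Guangdong University of Technology, Guangdong University of Technology, Guangzhou University, Macau University of Science and Technology, Guangdong University of Technology
Funding: National Natural Science Foundation of China, Youth Fund (62003101); Guangdong Natural Science Foundation, General Program (2022A1515010181, 2023A1515011290)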
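Abstract: Environmental sound classification plays a key role in applications such as home security monitoring and human-machine voice interaction. However, the diversity and mixing of sound sources make the design of environmental sound classification methods highly challenging. To improve classification accuracy while saving computational resources, this paper proposes a deep learning classification model based on a multi-scale channel attention mechanism. The model consists of four parts: a feature extraction module, a multi-scale convolution module, an efficient channel attention module, and an output layer. First, weighted Mel-Gammatone frequency cepstral coefficients (MGCC) are introduced to capture the spectral magnitude and phase structure of environmental sounds. Second, multi-scale convolution kernels are combined with an efficient channel attention mechanism to select the key local details and channel features of the audio. Finally, a softmax function in the fully connected layer maps the features to the probability of each environmental sound type. The model was tested on the iFLYTEK dataset (6 environmental sound classes) and the Urbansound8k dataset (10 classes), achieving classification accuracies of 94% and 76.52%, respectively, and 79.24% on the combined iFLYTEK+Urbansound8k data. Ablation experiments further show that the multi-scale convolution module and the channel attention module contribute improvements in classification accuracy of approximately 3.77% and 1.89%, respectively. The experiments also compare the model in detail with 7 existing deep learning classification methods, among which the proposed algorithm ranks second in classification accuracy; in addition, relative to algorithms of the same class such as ResNet18 and GoogLeNet, the proposed algorithm further reduces the number of model parameters and the computational complexity.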

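Keywords: environmental sound classification; Mel-Gammatone frequency cepstrum; multi-scale kernel convolution; efficient channel attention; convolutional neural network
Received: 2023-11-30
Revised: 2024-04-30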
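
The efficient channel attention step named in the abstract can be illustrated with a small pure-Python sketch. It is not the authors' implementation, which the abstract does not show. It assumes the commonly described form of efficient channel attention: global average pooling per channel, a 1D convolution across neighbouring channels, a sigmoid gate, and channel-wise rescaling. The function name `eca` and the kernel values are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def eca(feature_maps, kernel):
    """Efficient-channel-attention-style reweighting (illustrative sketch).

    feature_maps: list of C channels, each a list of floats.
    kernel: odd-length list of 1D weights shared by all channels
            (hypothetical values; a trained model would learn them).
    Returns (rescaled feature maps, per-channel attention weights).
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    # 1) Global average pooling: one descriptor per channel.
    pooled = [sum(ch) / len(ch) for ch in feature_maps]
    # 2) Local cross-channel interaction: 1D convolution over the
    #    channel axis, zero-padded so every channel gets a weight.
    pad = len(kernel) // 2
    padded = [0.0] * pad + pooled + [0.0] * pad
    mixed = [
        sum(w * padded[c + j] for j, w in enumerate(kernel))
        for c in range(len(pooled))
    ]
    # 3) Sigmoid gate turns each mixed descriptor into a weight in (0, 1).
    weights = [sigmoid(m) for m in mixed]
    # 4) Rescale every channel by its attention weight.
    scaled = [[w * v for v in ch] for w, ch in zip(weights, feature_maps)]
    return scaled, weights
```

Calling `eca([[1.0, 1.0], [0.0, 0.0], [2.0, 4.0]], [0.5, 1.0, 0.5])` gives each channel a weight that depends on its own pooled value and on its immediate neighbours, which is how this form of attention avoids a full channel-to-channel dense layer.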

Environmental sound classification method using MGCC feature and multi-scale channel attention based deep neural network collaboration
Yang Junjie,Ding Jiahui,Yang Liu,Feng Li and Yang Chao. Environmental sound classification method using MGCC feature and multi-scale channel attention based deep neural network collaboration[J]. Applied Acoustics(China), 2024, 43(3): 513-524
Authors:Yang Junjie  Ding Jiahui  Yang Liu  Feng Li  Yang Chao
Affiliation:Guangdong University of Technology,Guangdong University of Technology,Guangzhou University,Macau University of Science and Technology,Guangdong University of Technology
Abstract:Environmental sound classification (ESC) plays an important role in various areas such as home security monitoring and human-machine voice interaction. However, the diversity and complexity of sound sources pose significant challenges to the design of ESC methods. To enhance classification accuracy and conserve computational resources, this paper proposes a deep classification model based on convolutional neural networks (CNN) with a multi-scale channel attention mechanism. The model consists of four parts: a feature extraction module, a multi-scale convolution module, an efficient channel attention module, and an output layer for final classification. First, it incorporates a weighted Mel-Gammatone frequency cepstral coefficient (MGCC) feature, designed to capture both the spectral magnitude and the phase structure of environmental sounds. Second, the model combines multi-scale kernel convolution with an efficient channel attention mechanism to extract and selectively focus on key local details and channel features of environmental sounds. Finally, the softmax function is used in the fully connected layer to map features and output the probability of each environmental sound type. Experiments on the public iFLYTEK (6 classes) and Urbansound8k (10 classes) datasets show that the proposed model achieves ESC accuracies of 94% and 76.52%, respectively, and 79.24% on the combined iFLYTEK+Urbansound8k data. Ablation experiments further indicate that the multi-scale convolution module and the channel attention module improve classification accuracy by approximately 3.77% and 1.89%, respectively. The experiments also compare the proposed algorithm with 7 existing deep learning classification methods, among which it ranks second in classification accuracy. Additionally, compared with methods of the same class such as ResNet18 and GoogLeNet, the proposed algorithm further reduces the number of model parameters and the computational complexity.
Keywords:Environmental sound classification   Mel-Gammatone frequency cepstral coefficients   multi-scale kernel convolution
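
As a rough illustration of two ideas in the abstract, multi-scale kernel convolution and the softmax output, the following pure-Python sketch may help. It is a simplification under stated assumptions, not the paper's network: it uses 1D signals instead of 2D time-frequency features, fixed hypothetical kernels instead of learned ones, and the helper names `conv1d_same`, `multi_scale_conv` and `softmax` are invented for this example.

```python
import math


def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input length."""
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(w * padded[t + j] for j, w in enumerate(kernel))
        for t in range(len(signal))
    ]


def multi_scale_conv(signal, kernels):
    """Run kernels of different lengths in parallel.

    Each kernel sees a different amount of context, and each produces
    one output channel; the channels are returned as a list.
    """
    return [conv1d_same(signal, k) for k in kernels]


def softmax(logits):
    """Map class scores to probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For the impulse `[0.0, 0.0, 1.0, 0.0, 0.0]`, an all-ones kernel of length 3 responds at three positions while one of length 5 responds at all five, which shows how the wider branch captures more context. A downstream layer would then turn the combined channels into one score per class, and `softmax` converts those scores into class probabilities.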