
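Automatic classification method for acoustic scenes based on neural networks*
Cite this article: Liang Teng, Jiang Wenzong, Wang Li, Liu Baodi, Wang Yanjiang. Automatic classification method for acoustic scenes based on neural networks*[J]. 应用声学 (Applied Acoustics), 2022, 41(3): 373-380.
Authors: Liang Teng  Jiang Wenzong  Wang Li  Liu Baodi  Wang Yanjiang
Affiliations: College of Oceanography and Space Informatics, China University of Petroleum (East China); China University of Petroleum (East China); China University of Petroleum (East China); China University of Petroleum (East China); China University of Petroleum (East China)
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)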
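Abstract: Acoustic scene detection and automatic classification help people formulate the right strategy for a given environment and therefore have important research value. With the development of convolutional neural networks, many acoustic scene classification methods based on them have emerged. Among these, the temporal-spectral convolutional neural network (TS-CNN), which uses a temporal-spectral attention module, is currently one of the best-performing networks for acoustic scene classification. To further improve classification performance while keeping network complexity unchanged, this paper proposes a temporal-spectral convolutional neural network model based on collaborative learning (TSCNN-CL). Specifically, the paper first builds an auxiliary branch with an isomorphic structure that takes part in network training. Second, it proposes a collaborative loss function based on KL divergence, which achieves knowledge collaboration between the branch and the trunk. Finally, at test time, so as not to increase the inference cost, the proposed model uses only the trunk network's predictions. Comprehensive experiments on the ESC-10, ESC-50, and UrbanSound8k datasets show that the model classifies better than TS-CNN and most current mainstream methods.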
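The abstract names a KL-divergence-based collaborative loss between a trunk and an auxiliary branch but does not give its exact form. The following is only a minimal sketch of what such a loss could look like. It assumes one cross-entropy term per head plus a symmetric KL agreement term, and the names `collaborative_loss`, `weight`, and `predict` are hypothetical, not taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions; eps guards against log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def collaborative_loss(trunk_logits, branch_logits, label, weight=1.0):
    """Assumed form: supervised loss on both heads plus symmetric KL agreement.

    The symmetric KL term and the `weight` factor are illustrative choices,
    not the formulation published by the authors.
    """
    p_trunk = softmax(trunk_logits)
    p_branch = softmax(branch_logits)
    supervised = cross_entropy(p_trunk, label) + cross_entropy(p_branch, label)
    agreement = kl_divergence(p_trunk, p_branch) + kl_divergence(p_branch, p_trunk)
    return supervised + weight * agreement


def predict(trunk_logits):
    """At test time only the trunk output is used, so inference cost is unchanged."""
    return max(range(len(trunk_logits)), key=lambda i: trunk_logits[i])
```

In this sketch the auxiliary branch influences training only through the loss; `predict` never sees the branch logits, which mirrors the abstract's statement that only the trunk is used at test time.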

Keywords: acoustic scene classification  temporal-spectral convolutional neural network  collaborative learning  acoustic signal processing
Received: 2021/4/12
Revised: 2022/5/5

Automatic classification of acoustic scene based on neural network
Liang Teng, Jiang Wenzong, Wang Li, Liu Baodi, Wang Yanjiang. Automatic classification of acoustic scene based on neural network[J]. Applied Acoustics, 2022, 41(3): 373-380.
Authors: Liang Teng  Jiang Wenzong  Wang Li  Liu Baodi  Wang Yanjiang
Abstract: Sound is one of the ways humans perceive the external world and is closely tied to daily work and life. Acoustic scene detection and automatic classification help people formulate appropriate strategies for specific environments and therefore have important research value. With the development of convolutional neural networks (CNNs), many CNN-based acoustic scene classification methods have emerged. Among them, the Temporal-Spectral CNN (TS-CNN), which adopts a temporal-spectral attention module, is currently one of the best-performing methods; however, its complex structure and low computational efficiency make it difficult to reach the best classification performance. To this end, this paper proposes a cooperative-learning-based convolutional neural network model (TSCNN-CL) to improve the computational efficiency of the acoustic scene classification model; it trains multiple classifier heads simultaneously on one network without additional cost at test time. Furthermore, TSCNN-CL reduces the occurrence of exploding and vanishing gradients. Comprehensive experiments on the ESC-10, ESC-50, and UrbanSound8k datasets show that TSCNN-CL outperforms the TS-CNN model and compares favorably with several other state-of-the-art models.
Keywords: environmental sound classification  temporal-spectral convolutional neural network  collaborative learning  sound signal processing