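Feature Selection Methods in Speech Emotion Recognition
Cite this article: Chu Yu, Li Tiangang, Ye Shuo, Ye Guangming. Feature selection methods in speech emotion recognition[J]. Applied Acoustics (China), 2020, 39(2): 223-230
Authors: Chu Yu, Li Tiangang, Ye Shuo, Ye Guangming
Affiliations: Wuhan Research Institute of Posts and Telecommunications (Chu Yu, Li Tiangang, Ye Shuo); Wuhan Fiberhome Wisdom Digital Technology Co., Ltd. (Ye Guangming)
Funding: Hubei Provincial Department of Science and Technology, 2018 Hubei Province Special Program for Technological Innovation (Major Project)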
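Abstract: Conventional convolutional neural networks show a high prediction error rate and weak generalization when recognizing Chinese speech. To address this, the work first takes the deep convolutional neural network (DCNN)-connectionist temporal classification (CTC) model as its subject and analyzes in depth how different combinations of convolutional, pooling, and fully connected layers affect its performance. Building on that model, it then proposes a multi-path convolutional neural network (MCNN)-CTC model and, by incorporating SENet, a deep SE-MCNN-CTC acoustic model. This model combines the strengths of MCNN and SENet: it strengthens the propagation of deep information through the convolutional network and avoids gradient problems, while also adaptively recalibrating the extracted feature maps. Experimental results show that SE-MCNN-CTC reduces the error rate by 13.51% relative to DCNN-CTC, reaching a final error rate of 22.21%; the improved acoustic model effectively improves generalization.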

Keywords: deep learning; speech recognition; acoustic model; SE-MCNN-CTC
Received: 2019-05-06
Revised: 2020-02-25
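To illustrate what "adaptively recalibrating the extracted feature maps" means in the abstract above, here is a minimal NumPy sketch of a standard squeeze-and-excitation (SE) block applied to a single feature map. It is not the authors' SE-MCNN-CTC code. The channel count, the reduction ratio `r`, the spectrogram-like input size and the randomly initialized (untrained) weights are all hypothetical; in a real model these weights would be learned together with the convolutional layers.

```python
import numpy as np

def se_block(x, w1, b1, w2, b2):
    """Squeeze-and-excitation recalibration of one feature map.

    x  : feature map of shape (C, H, W)
    w1 : (C // r, C) weights of the bottleneck (reduction) layer
    w2 : (C, C // r) weights of the expansion layer
    Returns the recalibrated map and the per-channel weights s.
    """
    # Squeeze: global average pooling gives one descriptor per channel, shape (C,)
    z = x.mean(axis=(1, 2))
    # Excitation: FC -> ReLU -> FC -> sigmoid yields channel weights in (0, 1)
    h = np.maximum(0.0, w1 @ z + b1)
    s = 1.0 / (1.0 + np.exp(-(w2 @ h + b2)))
    # Scale: multiply every channel of the input by its learned importance
    return x * s[:, None, None], s

rng = np.random.default_rng(0)
C, H, W, r = 8, 20, 40, 4          # hypothetical sizes: 8 channels, 20x40 time-frequency patch
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1   # untrained, random stand-in weights
b1 = np.zeros(C // r)
w2 = rng.standard_normal((C, C // r)) * 0.1
b2 = np.zeros(C)

y, s = se_block(x, w1, b1, w2, b2)
print(y.shape)  # (8, 20, 40)
```

Because each channel is only rescaled by a scalar in (0, 1), the block leaves the feature-map shape unchanged, which is what makes it easy to drop in after a convolutional layer.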

Research on feature selection method in speech emotion recognition
Chu Yu, Li Tiangang, Ye Shuo and Ye Guangming. Research on feature selection method in speech emotion recognition[J]. Applied Acoustics (China), 2020, 39(2): 223-230
Authors: Chu Yu, Li Tiangang, Ye Shuo, Ye Guangming
Affiliation: Wuhan Research Institute of Posts and Telecommunications; Wuhan Research Institute of Posts and Telecommunications; Wuhan Research Institute of Posts and Telecommunications; Wuhan Fiberhome Wisdom Digital Technology Co., Ltd.
Abstract: Speech emotion recognition is valuable in many fields. Different emotion-related acoustic features perform quite differently depending on which classifier is used. The acoustic features related to speech emotion include spectral features, prosodic (rhythmic) features and voice-quality features. This paper proposes a feature-fusion method that combines the most discriminative parts of these three feature types: all spectral features, which were stable and achieved high recognition rates in the experiments, are retained, while statistics of the prosodic and voice-quality features are extracted as auxiliary features and fused into the spectral features. Experiments show that, with the same classifier, the proposed fused feature outperforms any single feature type; across different classifiers, it keeps good recognition ability and stable performance. It achieves higher recognition rates on three datasets and largely enables cross-dataset recognition.
Keywords: speech recognition; emotion recognition; feature selection; feature fusion
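The abstract does not spell out how the auxiliary statistics are "fused into" the spectral features. The sketch below assumes one plausible reading: the frame-level spectral matrix is kept in full, and a fixed set of utterance-level statistics (mean, standard deviation, minimum, maximum, range) of each prosodic and voice-quality contour is appended to every frame. The choice of statistics, the contour names (`pitch`, `energy`, `hnr`) and the random stand-in data are hypothetical; real inputs would come from a feature extractor.

```python
import numpy as np

def contour_stats(contour):
    """Hypothetical set of summary statistics for a 1-D frame-level contour."""
    c = np.asarray(contour, dtype=float)
    return np.array([c.mean(), c.std(), c.min(), c.max(), c.max() - c.min()])

def fuse_features(spectral, prosodic, quality):
    """Append utterance-level auxiliary statistics to frame-level spectral features.

    spectral : (T, D) spectral features, kept unchanged
    prosodic : dict name -> (T,) contour (e.g. pitch, energy)
    quality  : dict name -> (T,) contour (e.g. a voice-quality measure)
    Returns a (T, D + K) matrix in which every frame carries the same K statistics.
    """
    contours = list(prosodic.values()) + list(quality.values())
    aux = np.concatenate([contour_stats(c) for c in contours])   # shape (K,)
    tiled = np.tile(aux, (spectral.shape[0], 1))                 # repeat for each of the T frames
    return np.hstack([spectral, tiled])

rng = np.random.default_rng(0)
T = 100                                         # number of frames (hypothetical)
mfcc = rng.standard_normal((T, 13))             # stand-in for a spectral feature matrix
prosodic = {"pitch": 120 + 10 * rng.standard_normal(T), "energy": rng.random(T)}
quality = {"hnr": 10 + rng.standard_normal(T)}  # hypothetical voice-quality contour

fused = fuse_features(mfcc, prosodic, quality)
print(fused.shape)  # (100, 28): 13 spectral dims + 3 contours x 5 statistics
```

If the downstream classifier expects one vector per utterance (an SVM, for instance), an alternative under the same assumptions is to pool the spectral matrix first and concatenate the 15 statistics once instead of tiling them over frames.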
This article is indexed in CNKI, VIP (Weipu) and other databases.