Speech emotion recognition using stacked generative and discriminative hybrid models



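Cite this article:	HUANG Yongming, ZHANG Guobao, DONG Fei, LI Yue. Speech emotion recognition using stacked generative and discriminative hybrid models[J]. Acta Acustica, 2013, 38(2): 231-240.

Authors:	HUANG Yongming, ZHANG Guobao, DONG Fei, LI Yue

Affiliation:	School of Automation, Southeast University, Nanjing 210096

Funding:	National 863 Program; supported by the National Natural Science Foundation of China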

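Abstract:	A speech emotion recognition method based on stacked generative/discriminative hybrid models is proposed. First, 63-dimensional utterance-level features are extracted, the 12 best utterance-level features are selected from them using the Fisher criterion, and a stacked generative model based on wavelet neural networks (WNN) is built for speech emotion recognition. Then 69-dimensional frame-level features are extracted, 8 features are selected using sequential forward selection (SFS), the Gaussian mixture model (GMM) is made to produce multidimensional probability outputs, and a stacked generative/discriminative hybrid model is built for speech emotion recognition. The experimental results show that: (1) the stacked generative/discriminative hybrid model achieves a higher recognition rate than WNN, GMM, HMM (hidden Markov model), or SVM (support vector machine) used alone; (2) the stacked generative/discriminative hybrid model achieves a higher recognition rate than the WNN-based stacked generative model; (3) the D-dimensional GMM-MAP/SVM (MAP, maximum a posteriori) serial fusion model with M=13 is the best stacked generative/discriminative hybrid model, reaching a recognition rate of up to 85.1%.
Received:	2011-10-12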
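
The abstract says the 12 utterance-level features are chosen from 63 with a Fisher criterion, but it does not give the exact formula. The sketch below is one common reading, assumed here for illustration only: each feature is scored by the ratio of between-class scatter to within-class scatter, and the top-scoring features are kept. The function names `fisher_scores` and `select_top_k` are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def fisher_scores(X, y):
    """Return one Fisher-style score per feature column.

    X is a list of rows (one row per utterance), y the class labels.
    Score = between-class scatter / within-class scatter, per feature.
    This is an assumed, commonly used form of the criterion.
    """
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    scores = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for rows in rows_by_class.values():
            values = [row[j] for row in rows]
            mean = sum(values) / len(values)
            between += len(values) * (mean - overall_mean) ** 2
            within += sum((v - mean) ** 2 for v in values)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfect separation or a constant feature
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Return the sorted indices of the k highest-scoring features."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data: feature 0 separates the classes, feature 1 does not,
# feature 2 is constant.
X = [[0.0, 5.0, 1.0], [0.1, 1.0, 1.0], [1.0, 5.1, 1.0], [1.1, 0.9, 1.0]]
y = ["a", "a", "b", "b"]
print(select_top_k(X, y, 1))
```

In the setting described by the abstract, `X` would hold the 63 utterance-level features per utterance and `k` would be 12.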
Speech emotion recognition using stacked generative and discriminative hybrid models

HUANG Yongming, ZHANG Guobao, DONG Fei, LI Yue. Speech emotion recognition using stacked generative and discriminative hybrid models[J]. Acta Acustica, 2013, 38(2): 231-240.

Authors:	HUANG Yongming, ZHANG Guobao, DONG Fei, LI Yue

Institution:	School of Automation, Southeast University, Nanjing 210096

Abstract:	Generative models and discriminative models each have advantages and disadvantages in modeling the internal distribution of the data, optimizing classification results, and capturing the dynamic variation of emotion. This paper attempts to fuse the two kinds of models and proposes speech emotion recognition based on stacked hybrid generative and discriminative models. First, we reduce the utterance-level feature vectors from 63 to 12 dimensions using the Fisher discriminant; the reduced features are used for the stacked discriminative models. Then we use Sequential Forward Selection to select 8 frame-level features from the full set of 69, and propose two kinds of GMM multidimensional likelihoods (one with the same dimension as the feature vector, the other with as many dimensions as the GMM has mixtures) for the hybrid generative and discriminative models. Experimental results on the Berlin emotional speech database show that (1) the hybrid generative and discriminative models achieve significant improvements over using WNN, GMM, HMM, or SVM alone; (2) the recognition rate of the stacked generative and discriminative hybrid models is higher than that of the stacked discriminative models; (3) the GMM-MAP/SVM series hybrid model (with 13 GMM mixtures and GMM multidimensional likelihoods of the same dimension as the feature vector) is the best stacked generative and discriminative hybrid model, with a recognition rate of up to 85.1%.
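
Sequential Forward Selection, which the abstract uses to pick 8 of the 69 frame-level features, is a greedy wrapper method: start from an empty set and repeatedly add the single feature that most improves a scoring function. The following is a minimal sketch of the generic procedure, not the authors' implementation. The scoring function is left to the caller, and `toy_score` with its weights is entirely made up for illustration; in practice it would be something like the cross-validated accuracy of the classifier on the candidate subset.

```python
def sequential_forward_selection(n_features, k, score):
    """Greedily grow a feature subset until it holds k indices.

    score(subset) is a caller-supplied function returning a number,
    larger meaning better (for example, validation accuracy).
    Returns the selected indices in the order they were added.
    """
    selected = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        # Try each remaining feature and keep the one with the best score
        best = max(remaining, key=lambda j: score(selected + [j]))
        selected.append(best)
        remaining.remove(best)
    return selected


def toy_score(subset):
    """Hypothetical score: per-feature usefulness minus a redundancy penalty."""
    usefulness = {0: 0.1, 1: 0.5, 2: 0.45, 3: 0.3}
    total = sum(usefulness[j] for j in subset)
    # Features 1 and 2 are treated as redundant in this toy setup
    if 1 in subset and 2 in subset:
        total -= 0.4
    return total


print(sequential_forward_selection(4, 2, toy_score))
```

With the toy weights, feature 2 scores well on its own but is skipped in the second round because it is redundant with feature 1. A wrapper method can capture that kind of interaction, whereas a per-feature ranking cannot.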

Keywords:
This article is indexed in CNKI and other databases.
