Cross corpus speech emotion recognition using semi-supervised discriminant analysis (半监督判别分析的跨库语音情感识别)

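Cite this article:	JIN Yun, SONG Peng, ZHENG Wenming, ZHAO Li. Cross corpus speech emotion recognition using semi-supervised discriminant analysis[J]. Acta Acustica (声学学报), 2015, 40(1): 20-27. DOI: 10.15949/j.cnki.0371-0025.2015.01.003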

Authors:	JIN Yun (金赟), SONG Peng (宋鹏), ZHENG Wenming (郑文明), ZHAO Li (赵力)

Affiliation:	1. Jiangsu Normal University, School of Physics and Electronic Engineering, Xuzhou 221116;

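Funding:	Supported by the National Natural Science Foundation of China (61231002, 61273266, 11274144, 61301295) and the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD)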

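Abstract:	When training and test samples come from different speech emotion corpora, their feature vectors follow mismatched distributions. Semi-supervised discriminant analysis is used to reduce this difference. First, an optimal projection direction is sought for the labeled training samples together with a portion of unlabeled training samples drawn from the other corpus. Under the consistency assumption, namely that nearby points are more likely to share the same class, a p-nearest-neighbor graph models the relationships among neighboring unlabeled training samples and thereby captures the distribution of the unlabeled samples. The optimal projection direction is the one that maximizes the ratio of between-class scatter to within-class scatter over all training samples while preserving the manifold structure among the unlabeled samples. Two experiments validate the approach. In the first, training on the eNTERFACE corpus and testing on the Berlin corpus gives a recognition rate of 51.41%. In the second, training on the Berlin corpus and testing on the eNTERFACE corpus gives a recognition rate of 45.76%. Compared with the results obtained without semi-supervised discriminant analysis, these are improvements of 13.72% and 22.81%, respectively, which demonstrates the effectiveness of the algorithm. Visualization of the data before and after the experiments shows that semi-supervised discriminant analysis does reduce the mismatch in feature-space distribution between corpora, and thus raises the cross-corpus speech emotion recognition rate.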
Keywords:	Cross corpus; Speech emotion; Semi-supervised learning; Discriminant analysis
Received:	2013-07-08
Revised:	2014-04-22
Cross corpus speech emotion recognition using semi-supervised discriminant analysis

JIN Yun, SONG Peng, ZHENG Wenming, ZHAO Li. Cross corpus speech emotion recognition using semi-supervised discriminant analysis[J]. ACTA ACUSTICA, 2015, 40(1): 20-27. DOI: 10.15949/j.cnki.0371-0025.2015.01.003

Authors:	JIN Yun, SONG Peng, ZHENG Wenming, ZHAO Li

Affiliation:	1. Jiangsu Normal University, School of Physics and Electronic Engineering, Xuzhou 221116; 2. Southeast University, Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Nanjing 210096; 3. Southeast University, Key Laboratory of Child Development and Learning Science, Ministry of Education, Nanjing 210096

Abstract:	To address the mismatch in feature vector distribution between training samples and testing samples drawn from different speech emotion corpora, semi-supervised discriminant analysis is adopted to reduce the mismatch. First, the optimal projection direction is determined for the labeled training samples from one corpus together with some unlabeled training samples from the other corpus. Under the consistency assumption that closer points are more likely to belong to the same class, the relationship among neighboring points is modeled with a p-nearest-neighbor graph to obtain the distribution information of the unlabeled samples. The ratio of between-class scatter to within-class scatter is maximized over all training samples while the manifold structure of the unlabeled training samples is preserved, which yields the optimal projection vector. Two classification experiments are carried out. In the first, the eNTERFACE corpus is used for training and the Berlin corpus for testing, and the recognition rate is 51.41%. In the second, the Berlin corpus is used for training and the eNTERFACE corpus for testing, and the recognition rate is 45.76%. Compared with direct classification without semi-supervised discriminant analysis, the recognition rates are improved by 13.72% and 22.81%, respectively, which demonstrates the effectiveness of the proposed method. Visualization of the data before and after the experiments shows that the mismatch between samples from different corpora is reduced, and that the recognition rate is enhanced as a result.
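The projection step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names `knn_graph_laplacian` and `sda_fit`, the trade-off weight `alpha`, the ridge term `beta`, and the default `p` are all assumptions made for illustration, since the abstract gives none of these details. The sketch builds the p-nearest-neighbor graph over the unlabeled samples only, as the abstract describes, and finds directions that maximize between-class scatter relative to within-class scatter plus a graph-smoothness penalty.

```python
import numpy as np


def knn_graph_laplacian(X, p):
    """Laplacian L = D - W of a symmetrized p-nearest-neighbor graph on rows of X."""
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * X @ X.T  # squared Euclidean distances
    np.fill_diagonal(dist, np.inf)                    # a point is not its own neighbor
    nbrs = np.argsort(dist, axis=1)[:, :p]            # indices of the p closest points
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), p), nbrs.ravel()] = 1.0
    W = np.maximum(W, W.T)                            # make the graph undirected
    return np.diag(W.sum(axis=1)) - W


def sda_fit(X_lab, y_lab, X_unl, p=5, alpha=0.1, beta=1e-3):
    """Return a (d, n_classes - 1) projection matrix.

    Assumed objective (not stated in the abstract):
        maximize  a^T S_b a / a^T (S_w + alpha * X_unl^T L X_unl + beta * I) a
    """
    classes = np.unique(y_lab)
    d = X_lab.shape[1]
    mean_all = X_lab.mean(axis=0)
    S_b = np.zeros((d, d))  # between-class scatter of the labeled samples
    S_w = np.zeros((d, d))  # within-class scatter of the labeled samples
    for c in classes:
        Xc = X_lab[y_lab == c]
        mc = Xc.mean(axis=0)
        diff = (mc - mean_all)[:, None]
        S_b += Xc.shape[0] * (diff @ diff.T)
        S_w += (Xc - mc).T @ (Xc - mc)

    # Manifold term: penalizes directions that pull neighboring unlabeled points apart.
    L = knn_graph_laplacian(X_unl, p)
    M = S_w + alpha * (X_unl.T @ L @ X_unl) + beta * np.eye(d)

    # Solve S_b a = lambda M a by whitening with the Cholesky factor of M.
    C = np.linalg.cholesky(M)                        # M = C C^T
    A = np.linalg.solve(C, np.linalg.solve(C, S_b).T)  # C^{-1} S_b C^{-T}
    A = 0.5 * (A + A.T)                              # remove round-off asymmetry
    evals, U = np.linalg.eigh(A)                     # eigenvalues in ascending order
    top = U[:, ::-1][:, : len(classes) - 1]          # keep the largest ones
    return np.linalg.solve(C.T, top)                 # map back: a = C^{-T} u
```

In a cross-corpus setting, `X_lab` and `y_lab` would hold features and emotion labels from the training corpus and `X_unl` a portion of unlabeled features from the other corpus. Both sets would then be projected with `X @ A` before a classifier is trained; the abstract does not say which classifier the authors use.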

Keywords:	Cross corpus; Speech emotion; Semi-supervised learning; Discriminant analysis
