Small sample size speech emotion recognition based on global features and weak metric learning

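Cite this article:	Huang Yongming, Zhang Guobao, Li Xiong, Da Feipeng. Small sample size speech emotion recognition based on global features and weak metric learning [J]. Acta Acustica (声学学报), 2012, 37(3): 330-338.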

Authors:	Huang Yongming, Zhang Guobao, Li Xiong, Da Feipeng

Affiliation:	1. School of Automation, Southeast University, Nanjing 210096

Funding:	Supported by the National 863 Program and the National Natural Science Foundation of China.

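Abstract:	Speech is a short-time stationary time-frequency signal, so most researchers extract emotional features after dividing the signal into frames. Features extracted from frames, however, are local features and cannot accurately reflect the dynamic characteristics of emotional speech, so a recognition system built on local features alone is often not robust. To address this problem, utterance-level global features are first extracted from the unframed speech signal by multi-scale optimal wavelet packet decomposition; 384-dimensional utterance-level local features are then extracted after framing, and the dimensionality is reduced with the Fisher criterion. Finally, a weak metric fusion strategy is proposed to fuse the two kinds of utterance-level features, and an SVM performs the emotion classification. Experimental results on the Berlin emotional speech database show that, compared with using utterance-level local features alone, the method improves the final recognition rate by 4.2% to 13.8%, and that the recognition rate fluctuates less, especially with small training sets.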
Received:	2010-12-08
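The global-feature step extracts utterance-level features from the whole, unframed signal via wavelet packet decomposition. The sketch below illustrates only the general idea, under stated assumptions: it uses a plain Haar filter, a full fixed-depth packet tree, and the relative energy of each leaf sub-band as the feature vector. The abstract does not specify the wavelet, the depth, or how the "optimal" basis is selected, so these choices and the names `haar_split` and `wavelet_packet_energies` are hypothetical, and best-basis selection is not implemented.

```python
import math
from typing import List, Sequence, Tuple


def haar_split(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One orthonormal Haar analysis step: returns (approximation, detail).
    Assumption: Haar is an illustrative choice, not the paper's wavelet."""
    x = list(x)
    if len(x) % 2:
        x.append(0.0)  # zero-pad odd lengths; padding adds no energy
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def wavelet_packet_energies(signal: Sequence[float], levels: int) -> List[float]:
    """Decompose the whole utterance (no framing) into a full Haar wavelet
    packet tree of the given depth and return the relative energy of each of
    the 2**levels leaf sub-bands. Leaves are in natural (Paley) order, not
    sorted by frequency. No best-basis ("optimal") pruning is performed."""
    nodes: List[List[float]] = [list(signal)]
    for _ in range(levels):
        next_nodes: List[List[float]] = []
        for node in nodes:
            approx, detail = haar_split(node)
            next_nodes.extend([approx, detail])
        nodes = next_nodes
    energies = [sum(v * v for v in node) for node in nodes]
    total = sum(energies) or 1.0  # guard against an all-zero signal
    return [e / total for e in energies]


if __name__ == "__main__":
    # Toy "utterance": a slow sine plus a fast alternating component.
    utterance = [math.sin(2 * math.pi * n / 64) + 0.3 * (-1) ** n for n in range(256)]
    print(wavelet_packet_energies(utterance, levels=3))
```

Because the Haar transform is orthonormal, the leaf energies sum to the signal energy, so the relative energies always sum to 1 regardless of utterance length.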

Affiliations:	1. School of Automation, Southeast University, Nanjing 210096; 2. Institute of Image Processing & Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240

Abstract:	Emotional speech is a non-stationary time-frequency signal, and local features extracted from individual frames have been shown to contribute greatly to speech emotion recognition. However, local features alone are inadequate for building a robust speech emotion classification system, because features extracted from framed speech cannot accurately reflect the dynamic characteristics of emotional speech. In this paper, utterance-level global features, extracted from the unframed speech signal by multi-scale optimal wavelet packet decomposition, and 384-dimensional utterance-level local features are extracted together to improve the robustness and recognition rate of the classification system. Because training samples are few, the dimensionality of the feature vectors is reduced with the Fisher discriminant criterion, and a fusion strategy based on metric learning, called weak metric learning in this work, is adopted to fuse the global and local utterance-level features. Experimental results with LIBSVM show that the method improves the recognition rate by about 4.2% to 13.8% compared with using local utterance-level features alone, and that the recognition rate fluctuates less, especially when the sample size is small.
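The abstract states that feature dimensionality is reduced with the Fisher discriminant criterion before fusion, but gives no details. One common reading is per-feature ranking by the Fisher ratio (between-class scatter divided by within-class scatter) followed by keeping the top-k features; the sketch below implements that reading as an assumption. The paper may instead use a Fisher/LDA projection, and the function names `fisher_scores` and `select_top_k` and the choice of `k` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fisher_scores(X: Sequence[Sequence[float]], y: Sequence[int]) -> List[float]:
    """Per-feature Fisher ratio: sum_c n_c (mu_c - mu)^2 / sum_c n_c var_c."""
    by_class: Dict[int, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    scores: List[float] = []
    for j in range(len(X[0])):
        overall_mean = sum(row[j] for row in X) / len(X)
        between = within = 0.0
        for rows in by_class.values():
            n_c = len(rows)
            class_mean = sum(r[j] for r in rows) / n_c
            class_var = sum((r[j] - class_mean) ** 2 for r in rows) / n_c
            between += n_c * (class_mean - overall_mean) ** 2
            within += n_c * class_var
        if within > 0:
            scores.append(between / within)
        else:
            # Zero within-class spread: perfectly separating if class means differ.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def select_top_k(
    X: Sequence[Sequence[float]], y: Sequence[int], k: int
) -> Tuple[List[int], List[List[float]]]:
    """Return (indices of the k highest-scoring features, reduced data)."""
    scores = fisher_scores(X, y)
    keep = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
    keep.sort()  # preserve the original feature order in the reduced vectors
    reduced = [[row[j] for j in keep] for row in X]
    return keep, reduced


if __name__ == "__main__":
    X = [[0.0, 5.0, 1.0], [0.1, 1.0, 1.2], [1.0, 3.0, 0.9], [1.1, 2.0, 1.1]]
    y = [0, 0, 1, 1]
    print(fisher_scores(X, y))
    print(select_top_k(X, y, k=2))
```

Returning the kept indices lets the same columns be taken from test utterances, so the reduction learned on the (small) training set is applied consistently at recognition time.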
