

Improved discriminative completed local binary pattern for speech emotion recognition

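Citation:	Tao Huawei, Zhang Xinran, Liang Ruiyu, Zha Cheng, Zhao Li, Wang Qingyun. Improved discriminative completed local binary pattern for speech emotion recognition[J]. Acta Acustica (声学学报), 2016, 41(6): 905-912.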

Authors:	Tao Huawei, Zhang Xinran, Liang Ruiyu, Zha Cheng, Zhao Li, Wang Qingyun

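Affiliations:	1 School of Information Science and Engineering, Southeast University; Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096; 2 School of Communication Engineering, Nanjing Institute of Technology, Nanjing 211167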

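Funding:	National Natural Science Foundation of China (61375028, 61673108), Jiangsu Province "Six Talent Peaks" Project (2016-DZXX-023), Jiangsu Province Postdoctoral Research Funding Program (1601011B), and Jiangsu Province "Qing Lan Project"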

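Abstract:	To study the relationship between speech emotion and spectrogram features, this paper proposes an improved discriminative completed local binary pattern feature for speech emotion recognition. First, from the gray-level spectrogram image, the statistical histograms of the completed local binary sign pattern (CLBP_S) and magnitude pattern (CLBP_M) are computed. Next, the CLBP_S and CLBP_M histograms are fed into a discriminative feature learning model, which is trained to obtain a global dominant pattern set. Finally, the global dominant pattern set is used to process the CLBP_S and CLBP_M histograms, and the processed features are concatenated to form the improved discriminative completed local binary pattern feature for speech emotion recognition (IDisCLBP_SER). Speech emotion recognition experiments on the Berlin database (EMO-DB) and a Chinese emotional speech database show that the recall of IDisCLBP_SER is more than 8% higher than that of features such as texture image information (TII), and on average more than 4% higher than that of acoustic spectral features. Moreover, the proposed feature fuses well with existing acoustic features: the recall of the fused features is 1% to 4% higher than that of the existing acoustic features alone.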
Received:	2015-04-04
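The first step described above, computing CLBP_S and CLBP_M histograms from the gray-level spectrogram, can be sketched as follows. This is a minimal pure-Python sketch, not the authors' implementation. It assumes the spectrogram has already been quantized into a gray-level image given as a 2-D list, uses a 3x3 square neighbourhood (P = 8, R = 1) instead of the interpolated circular neighbourhood often used for LBP, and takes the mean magnitude over the whole image as the CLBP_M threshold, as in the original CLBP formulation. The function name `clbp_histograms` and the toy image are hypothetical.

```python
def clbp_histograms(img):
    """Compute 256-bin CLBP_S and CLBP_M histograms of a gray-level image.

    img: 2-D list of gray levels (equal-length rows, at least 3x3).
    Hypothetical helper: 8 neighbours of a 3x3 window (P=8, R=1), no interpolation;
    border pixels are skipped because they lack a full neighbourhood.
    """
    rows, cols = len(img), len(img[0])
    # Neighbour offsets (row, col), counter-clockwise starting from the right.
    offsets = [(0, 1), (-1, 1), (-1, 0), (-1, -1),
               (0, -1), (1, -1), (1, 0), (1, 1)]
    signs, mags = [], []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = img[r][c]
            diffs = [img[r + dr][c + dc] - center for dr, dc in offsets]
            # Sign component s_p: 1 if g_p >= g_c, else 0 (on its own this is classic LBP).
            signs.append([1 if d >= 0 else 0 for d in diffs])
            # Magnitude component m_p = |g_p - g_c|.
            mags.append([abs(d) for d in diffs])

    # CLBP_M threshold: mean of all m_p over the whole image.
    all_m = [m for row in mags for m in row]
    thresh = sum(all_m) / len(all_m)

    hist_s, hist_m = [0] * 256, [0] * 256
    for s_bits, m_vals in zip(signs, mags):
        code_s = sum(bit << p for p, bit in enumerate(s_bits))
        code_m = sum((1 if m >= thresh else 0) << p for p, m in enumerate(m_vals))
        hist_s[code_s] += 1
        hist_m[code_m] += 1
    return hist_s, hist_m


# Toy 4x5 "spectrogram" image (made-up values, for illustration only).
image = [[10, 12, 15, 11, 9],
         [14, 20, 25, 18, 10],
         [13, 22, 30, 21, 12],
         [11, 15, 19, 14, 10]]
hist_s, hist_m = clbp_histograms(image)
print(sum(hist_s), sum(hist_m))  # 6 interior pixels -> prints "6 6"
```

Each interior pixel contributes one count to each histogram, so both histograms sum to the number of pixels with a full neighbourhood; in practice they would usually be normalized before any further processing.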
Improved discriminative completed local binary pattern for speech emotion recognition

Institution:	1 Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing 210096; 2 School of Communication Engineering, Nanjing Institute of Technology, Nanjing 211167

Abstract:	To study the relationship between speech emotion and the spectrogram, a new feature for speech emotion recognition is proposed, called the improved discriminative completed local binary pattern (IDisCLBP_SER). First, from the gray-level spectrogram image, CLBP_S and CLBP_M statistical histograms are obtained with the completed local binary pattern algorithm. Then, the CLBP_S and CLBP_M histograms are input into a discriminative feature learning model, which is trained to obtain a global dominant pattern set. Finally, the global dominant pattern set is used to process the CLBP_S and CLBP_M histograms, and the processed histograms are concatenated to obtain the IDisCLBP_SER feature. Experiments on the EMO-DB database and a Chinese emotional speech database show that the recall rate of IDisCLBP_SER is at least 8% higher than that of texture image information (TII) and on average more than 4% higher than that of speech spectral features. In addition, when IDisCLBP_SER is fused with acoustic features, the recall rates of the fused features are 1% to 4% higher than those of the acoustic features alone.
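The learning and assembly steps can be illustrated with the three-layer dominant-pattern idea of the original DisCLBP model: find the dominant patterns of each training histogram, intersect them within each emotion class, then take the union across classes. This is a hedged sketch, not the paper's method. The coverage parameter (default 0.9), the normalization by the full histogram total, and the helper names `dominant_set`, `global_dominant_set`, `select_bins` and `idisclbp_feature` are assumptions for illustration, and the specific improvements that distinguish IDisCLBP_SER are not reproduced. The same procedure would be run separately on the CLBP_S and CLBP_M histograms.

```python
def dominant_set(hist, coverage=0.9):
    """Smallest set of bins whose counts reach `coverage` of the histogram total."""
    total = sum(hist)
    order = sorted(range(len(hist)), key=lambda i: hist[i], reverse=True)
    chosen, acc = set(), 0
    for i in order:
        if acc >= coverage * total:
            break
        chosen.add(i)
        acc += hist[i]
    return chosen


def global_dominant_set(hists, labels, coverage=0.9):
    """Intersect dominant sets within each class, then take the union over classes."""
    per_class = {}
    for hist, label in zip(hists, labels):
        dom = dominant_set(hist, coverage)
        # Within-class intersection keeps patterns dominant in every sample of the class.
        per_class[label] = per_class[label] & dom if label in per_class else dom
    global_set = set()
    for dom in per_class.values():
        global_set |= dom  # across-class union
    return sorted(global_set)


def select_bins(hist, patterns):
    """Keep only the bins listed in `patterns`, normalized by the full histogram total."""
    total = sum(hist) or 1
    return [hist[i] / total for i in patterns]


def idisclbp_feature(hist_s, hist_m, patterns_s, patterns_m):
    """Concatenate the reduced CLBP_S and CLBP_M histograms into one feature vector."""
    return select_bins(hist_s, patterns_s) + select_bins(hist_m, patterns_m)


# Made-up 5-bin histograms standing in for 256-bin CLBP_S histograms.
train_s = [[55, 30, 10, 5, 0], [45, 20, 30, 5, 0], [5, 0, 35, 50, 10]]
labels = ["angry", "angry", "sad"]
patterns_s = global_dominant_set(train_s, labels, coverage=0.8)
print(patterns_s)  # [0, 1, 2, 3]
```

In this toy run, bins 0 and 1 survive the intersection for "angry", bins 2 and 3 are dominant for "sad", and bin 4 is never dominant, so it is dropped. Dropping such bins is what shortens the 256-bin histograms before concatenation.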

This article is indexed in CNKI and other databases.
