
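Whispered Speech Emotion Recognition Using Multi-Scale Decision Fusion with Embedded Markov Networks
Cite this article: HUANG Cheng-wei, JIN Yun, BAO Yong-qiang, YU Hua, ZHAO Li. Whispered Speech Emotion Recognition Using Multi-Scale Decision Fusion with Embedded Markov Networks[J]. 信号处理 (Signal Processing), 2013, 29(1): 98-106
Authors: HUANG Cheng-wei  JIN Yun  BAO Yong-qiang  YU Hua  ZHAO Li
Affiliation: Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University
Funding: National Natural Science Foundation of China (No. 61231002; No. 61273266; No. 51075068); Doctoral Fund of the Ministry of Education (No. 20110092130004); Natural Science Research Foundation of Jiangsu Higher Education Institutions (No. 10KJB510005)
Abstract: We propose a time-domain multi-scale speech emotion recognition framework that combines Gaussian mixture models with a Markov network, and apply it to whispered speech emotion recognition. To suit the characteristics of continuous speech, emotion recognition based on Gaussian mixture models is carried out separately at the short-utterance scale and the long-utterance scale of the whispered speech signal. According to the dimensional theory of emotion, the emotional information in whispered speech is continuous in time, so a third-order Markov network is used to model the contextual emotional dependencies across the multi-scale analysis. A spring model is adopted to define the high-order deformation in the two-dimensional emotion space, and a fuzzy entropy measure is used to convert the Gaussian mixture model likelihoods into the first-order (unary) energy of the Markov network. Experimental results show that the proposed algorithm achieves good recognition results on continuous whispered speech data, with a recognition rate of 64.3% for anger. The results further show that, unlike the findings for normal speech, happiness is relatively difficult to recognize in whispered speech, whereas anger and sadness are well separated, which is consistent with the human listening study carried out by Cirillo et al.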

Keywords: speech emotion recognition; multi-scale analysis; Markov network; decision fusion
Received: 2012-06-19

Whispered Speech Emotion Recognition Embedded with Markov Networks and Multi-Scale Decision Fusion
HUANG Cheng-wei, JIN Yun, BAO Yong-qiang, YU Hua, ZHAO Li. Whispered Speech Emotion Recognition Embedded with Markov Networks and Multi-Scale Decision Fusion[J]. Signal Processing(China), 2013, 29(1): 98-106
Authors:HUANG Cheng-wei    JIN Yun    BAO Yong-qiang    YU Hua    ZHAO Li
Affiliation:Key Laboratory of Underwater Acoustic Signal Processing of Ministry of Education, Southeast University, Nanjing
Abstract: In this paper we propose a multi-scale framework in the time domain that combines the Gaussian Mixture Model with the Markov Network, and apply it to whispered speech emotion recognition. Based on the Gaussian Mixture Model, emotion recognition is carried out on both long and short utterances in continuous speech signals. According to the dimensional model of emotion, emotion in whispered speech should be continuous in the time domain. We therefore model the context dependency in whispered speech using a Markov Network. A spring model is adopted to model the high-order deformation in the emotion dimensional space, and fuzzy entropy is used to calculate the unary energy in the Markov Network. Experimental results show that the recognition rate for anger reaches 64.3%. Compared with normal speech, happiness is more difficult to recognize in whispered speech, while anger and sadness are relatively easy to distinguish. This conclusion is supported by the listening experiment carried out by Cirillo and Todt.
Keywords: speech emotion recognition; multi-scale analysis; Markov network; decision fusion
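
The abstract describes the pipeline only at a high level, so the following is a minimal illustrative sketch of the general idea, not the authors' implementation. It makes several simplifying assumptions: the network is a first-order chain decoded by dynamic programming rather than the third-order network of the paper; the unary energy is a plain negative log posterior rather than the fuzzy entropy measure; and the emotion coordinates, the function names, and the `stiffness` and `long_weight` parameters are all hypothetical.

```python
import math

# Hypothetical (valence, arousal) coordinates, chosen only for illustration.
COORDS = {
    "anger": (-0.6, 0.8),
    "happiness": (0.8, 0.5),
    "sadness": (-0.7, -0.6),
    "neutral": (0.0, 0.0),
}
LABELS = list(COORDS)


def unary_energy(log_likelihoods):
    """Map per-class GMM log-likelihoods to energies (negative log posterior).

    Assumption: this stands in for the fuzzy entropy measure of the paper.
    """
    m = max(log_likelihoods.values())
    log_z = m + math.log(sum(math.exp(v - m) for v in log_likelihoods.values()))
    return {k: log_z - v for k, v in log_likelihoods.items()}


def spring_energy(a, b, stiffness):
    """Spring-like penalty: grows with the squared distance between two
    emotions in the 2-D emotion space."""
    (x1, y1), (x2, y2) = COORDS[a], COORDS[b]
    return 0.5 * stiffness * ((x1 - x2) ** 2 + (y1 - y2) ** 2)


def decode(short_scores, long_score=None, long_weight=0.5, stiffness=1.0):
    """Return the minimum-energy label sequence for a chain of short segments.

    short_scores: one dict of class log-likelihoods per short segment.
    long_score:   optional dict of class log-likelihoods for the whole long
                  utterance; its energy is added to every segment (a simple
                  stand-in for multi-scale fusion).
    """
    unaries = [unary_energy(s) for s in short_scores]
    if long_score is not None:
        long_e = unary_energy(long_score)
        unaries = [{k: u[k] + long_weight * long_e[k] for k in LABELS}
                   for u in unaries]

    cost = dict(unaries[0])
    back = []
    for u in unaries[1:]:
        new_cost, ptr = {}, {}
        for cur in LABELS:
            prev = min(LABELS,
                       key=lambda p: cost[p] + spring_energy(p, cur, stiffness))
            new_cost[cur] = cost[prev] + spring_energy(prev, cur, stiffness) + u[cur]
            ptr[cur] = prev
        cost = new_cost
        back.append(ptr)

    # Trace back from the cheapest final label.
    path = [min(LABELS, key=cost.get)]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]


# Toy scores: the middle segment slightly prefers happiness on its own.
clear_anger = {"anger": -1.0, "happiness": -5.0, "sadness": -5.0, "neutral": -5.0}
ambiguous = {"anger": -2.2, "happiness": -2.0, "sadness": -6.0, "neutral": -6.0}
segments = [clear_anger, ambiguous, clear_anger]

independent = decode(segments, stiffness=0.0)  # no context coupling
smoothed = decode(segments, stiffness=1.0)     # spring penalty between neighbours
```

In this toy input the spring term penalizes the jump from anger to happiness and back, so the coupled decoding relabels the ambiguous middle segment to match its neighbours, which is the kind of temporal continuity the abstract appeals to.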
This article is indexed in CNKI, 万方数据 (Wanfang Data), and other databases.