Speech emotion recognition based on statistical pitch model
Authors: WANG Zhiping; ZHAO Li; ZOU Cairong
Affiliation: Department of Radio Engineering, Southeast University, Nanjing 210096
Funding: This work was supported by the Doctoral Foundation of the Ministry of Education of China, the Foundation of Key Item of Science and Technology of the Ministry of Education of China (No. 03082), and the National Natural Science Foundation of China (No. 60472058).
Abstract: A modified Parzen-window method, which keeps high resolution at low frequencies and smoothness at high frequencies, is proposed to obtain a statistical pitch model. A gender classification method utilizing this statistical model is then proposed, which achieves 98% accuracy when long sentences are processed. After separating male and female voices, the mean and standard deviation of the training samples for each emotion are used to build the corresponding emotion models, and the Bhattacharyya distance between a test sample and the statistical pitch models is used for speech emotion recognition. Pitch normalization for male and female voices is also considered, in order to map both into a uniform space. Finally, a speech emotion recognition experiment based on K-Nearest Neighbor achieves a correct rate of 81%, compared with 73.85% when traditional parameters are used.
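The abstract's pipeline (a Parzen-window pitch density, then Bhattacharyya distance between densities) can be sketched as below. This is an illustrative stand-in, not the paper's exact method: the frequency-dependent Gaussian bandwidth h = h0·f is an assumption chosen to mimic "high resolution at low frequencies, smoothness at high frequencies", and the helper names `pitch_density` and `bhattacharyya` are hypothetical.

```python
import numpy as np

def pitch_density(pitch_hz, grid_hz, h0=0.05):
    """Parzen-window estimate of a pitch distribution on a frequency grid.

    Assumption (not from the paper): the Gaussian kernel bandwidth grows
    with frequency, h = h0 * f, so the estimate stays sharp at low pitch
    and smooth at high pitch.
    """
    h = h0 * grid_hz[:, None]                      # bandwidth per grid point
    z = (grid_hz[:, None] - pitch_hz[None, :]) / h  # (grid, sample) residuals
    dens = (np.exp(-0.5 * z**2) / h).sum(axis=1)    # sum of Gaussian kernels
    return dens / dens.sum()                        # discrete distribution

def bhattacharyya(p, q):
    """Bhattacharyya distance D_B = -ln sum(sqrt(p*q)) between discrete
    distributions; 0 for identical distributions, larger when disjoint."""
    return -np.log(np.sum(np.sqrt(p * q)))
```

In the same spirit as the abstract, a test utterance could first be assigned to the gender whose pitch model is nearest in Bhattacharyya distance, and then compared against that gender's per-emotion pitch models.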

Keywords: speech emotion recognition; statistical pitch model; high resolution; smoothness
Received: 2005-04-15
Revised: 2005-08-02

Indexed in CNKI, VIP, and Wanfang Data.