Speech emotion recognition based on statistical pitch model
Authors: WANG Zhiping; ZHAO Li; ZOU Cairong
Affiliation: Department of Radio Engineering, Southeast University, Nanjing 210096
Funding: This work was supported by the Doctoral Foundation of the Ministry of Education of China, the Foundation of Key Item of Science and Technology of the Ministry of Education of China (No. 03082), and the National Natural Science Foundation of China (No. 60472058).
Abstract: A modified Parzen-window method, which keeps high resolution at low frequencies and smoothness at high frequencies, is proposed to obtain a statistical pitch model. A gender classification method utilizing this statistical model is then proposed, which achieves 98% accuracy when long sentences are processed. After separating male and female voices, the mean and standard deviation of the training samples for each emotion are used to build the corresponding emotion models, and the Bhattacharyya distance between a test sample and the statistical pitch models is used for speech emotion recognition. Pitch normalization for male and female voices is also considered, in order to map both into a uniform space. Finally, a speech emotion recognition experiment based on K-Nearest Neighbor achieves a correct rate of 81%, compared with 73.85% when traditional parameters are used.
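The abstract's pipeline (a Parzen-window pitch density, then Bhattacharyya distance between densities) can be sketched as below. This is an illustrative stand-in, not the paper's exact method: the frequency-dependent Gaussian bandwidth h = h0·f is an assumption chosen to mimic "high resolution at low frequencies, smoothness at high frequencies", and the helper names `pitch_density` and `bhattacharyya` are hypothetical.

```python
import numpy as np

def pitch_density(pitch_hz, grid_hz, h0=0.05):
    """Parzen-window estimate of a pitch distribution on a frequency grid.

    Assumption (not from the paper): the Gaussian kernel bandwidth grows
    with frequency, h = h0 * f, so the estimate stays sharp at low pitch
    and smooth at high pitch.
    """
    h = h0 * grid_hz[:, None]                      # bandwidth per grid point
    z = (grid_hz[:, None] - pitch_hz[None, :]) / h  # (grid, sample) residuals
    dens = (np.exp(-0.5 * z**2) / h).sum(axis=1)    # sum of Gaussian kernels
    return dens / dens.sum()                        # discrete distribution

def bhattacharyya(p, q):
    """Bhattacharyya distance D_B = -ln sum(sqrt(p*q)) between discrete
    distributions; 0 for identical distributions, larger when disjoint."""
    return -np.log(np.sum(np.sqrt(p * q)))
```

In the same spirit as the abstract, a test utterance could first be assigned to the gender whose pitch model is nearest in Bhattacharyya distance, and then compared against that gender's per-emotion pitch models.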

Keywords: speech emotion recognition; statistical pitch model; high resolution; smoothness
Received: 2005-04-15
Revised: 2005-08-02

Indexed in CNKI, VIP, and Wanfang Data.