Human emotion recognition by optimally fusing facial expression and speech feature

Institution:	1. School of Automation, China University of Geosciences, Wuhan 430074, China; 2. Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China; 3. School of Engineering, Tokyo University of Technology, Tokyo 192-0982, Japan; 4. Tokyo Institute of Technology, Yokohama 226-8502, Japan; School of Automation, Beijing Institute of Technology, Beijing 100081, China; 5. School of Automation, Beijing Institute of Technology, Beijing 100081, China; 2. Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China; 3. Advanced Scientific Computing Division, Euro-Mediterranean Centre on Climate Change (CMCC Foundation), Lecce, Italy

Abstract:	Emotion recognition is an active research topic in modern intelligent systems, with applications in autonomous vehicles, remote medical services, and human–computer interaction (HCI). Traditional speech emotion recognition algorithms generalize poorly because they assume that the training and testing data come from the same domain and therefore share the same distribution. In practice, however, speech data are acquired with different devices and in different recording environments, so they may differ significantly in language, emotion types, and labels. To address this problem, we propose a bimodal fusion algorithm for speech emotion recognition in which facial expression and speech information are optimally fused. We first combine a CNN and an RNN to perform facial emotion recognition. Next, we use Mel-frequency cepstral coefficients (MFCCs) to convert speech signals into images, so that an LSTM and a CNN can be applied to recognize speech emotion. Finally, we apply a weighted decision fusion method that combines the facial expression and speech results to perform emotion recognition. Comprehensive experimental results demonstrate that bimodal emotion recognition achieves better performance than unimodal emotion recognition.
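The step that turns speech into an image can be illustrated with a pure-Python MFCC sketch. This is a minimal, illustrative implementation under assumed textbook parameters (8 kHz audio, 25 ms frames with a 10 ms hop, 0.97 pre-emphasis, Hamming window, 26 mel filters, 13 coefficients); the abstract does not state the authors' settings, and the function names are hypothetical. Stacking the per-frame coefficient vectors gives a 2-D matrix (frames by coefficients) that image-oriented models such as a CNN, or sequence models such as an LSTM, can consume.

```python
import math


def hz_to_mel(freq_hz):
    # Common mel-scale formula
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale; one row per filter."""
    n_bins = n_fft // 2 + 1
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (n_filters + 1) for i in range(n_filters + 2)]
    bins = [int((n_fft + 1) * mel_to_hz(m) / sample_rate) for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * n_bins
        for k in range(left, center):  # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):  # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def power_spectrum(frame, n_fft):
    """Naive DFT power spectrum (O(n_fft^2)); fine for a demo, use an FFT in practice."""
    x = (frame + [0.0] * n_fft)[:n_fft]  # zero-pad (or truncate) to n_fft samples
    spectrum = []
    for k in range(n_fft // 2 + 1):
        re = sum(v * math.cos(2 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        im = sum(v * math.sin(2 * math.pi * k * n / n_fft) for n, v in enumerate(x))
        spectrum.append((re * re + im * im) / n_fft)
    return spectrum


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [sum(v * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i, v in enumerate(values))
            for k in range(n_coeffs)]


def mfcc_image(signal, sample_rate, frame_len=200, hop=80, n_fft=256,
               n_filters=26, n_coeffs=13):
    """Return an (n_frames x n_coeffs) matrix usable as a one-channel image."""
    # Pre-emphasis boosts high frequencies (0.97 is an assumed, common default)
    x = [signal[0]] + [signal[i] - 0.97 * signal[i - 1] for i in range(1, len(signal))]
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1)) for n in range(frame_len)]
    bank = mel_filterbank(n_filters, n_fft, sample_rate)
    image = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(x[start:start + frame_len], window)]
        power = power_spectrum(frame, n_fft)
        # Log mel-band energies; the small offset avoids log(0) on silent frames
        log_energies = [math.log(sum(f * p for f, p in zip(row, power)) + 1e-10) for row in bank]
        image.append(dct_ii(log_energies, n_coeffs))
    return image


if __name__ == "__main__":
    rate = 8000
    tone = [math.sin(2 * math.pi * 440 * t / rate) for t in range(800)]  # 0.1 s test tone
    image = mfcc_image(tone, rate)
    print(len(image), "frames x", len(image[0]), "coefficients")  # → 8 frames x 13 coefficients
```

A library implementation with an FFT would normally replace the naive DFT here; the explicit loops are kept only so each stage (framing, windowing, mel filtering, log, DCT) is visible.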
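Weighted decision fusion combines the outputs of the two unimodal classifiers rather than their features. A minimal sketch, assuming each classifier emits a probability vector over the same emotion classes and that the fused score is a convex combination controlled by a single face weight w in [0, 1] (speech receives 1 − w). The abstract does not say how the "optimal" weights are found, so the validation grid search in `select_face_weight` is an assumption, and both function names are hypothetical.

```python
def weighted_decision_fusion(face_probs, speech_probs, face_weight=0.5):
    """Decision-level fusion: convex combination of two classifiers' class probabilities.

    Returns (predicted_class_index, fused_scores). The speech weight is 1 - face_weight.
    """
    if len(face_probs) != len(speech_probs):
        raise ValueError("both modalities must score the same emotion classes")
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")
    fused = [face_weight * f + (1.0 - face_weight) * s
             for f, s in zip(face_probs, speech_probs)]
    best = max(range(len(fused)), key=lambda i: fused[i])
    return best, fused


def select_face_weight(val_face, val_speech, val_labels, steps=10):
    """Grid-search the face weight that maximizes accuracy on a held-out validation set."""
    best_w, best_acc = 0.0, -1.0
    for i in range(steps + 1):
        w = i / steps
        correct = sum(weighted_decision_fusion(f, s, w)[0] == y
                      for f, s, y in zip(val_face, val_speech, val_labels))
        acc = correct / len(val_labels)
        if acc > best_acc:  # strict '>' keeps the smallest weight among ties
            best_w, best_acc = w, acc
    return best_w, best_acc


if __name__ == "__main__":
    # Hypothetical softmax outputs over three classes, e.g. (angry, happy, sad)
    face = [0.6, 0.3, 0.1]
    speech = [0.2, 0.7, 0.1]
    label, scores = weighted_decision_fusion(face, speech, face_weight=0.5)
    print(label)  # → 1 (fused scores are roughly 0.4, 0.5, 0.1)
```

Fusing at the decision level keeps the two networks independent, so each modality can be trained on its own data; only the weight has to be tuned on data where both modalities are available.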

Keywords:	Facial expression recognition; Speech emotion recognition; Bimodal fusion; Feature fusion; RNN
This article is indexed in ScienceDirect and other databases.
