

Human emotion recognition by optimally fusing facial expression and speech feature
Institution:1. School of Automation, China University of Geosciences, Wuhan 430074, China;2. Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Wuhan 430074, China;3. School of Engineering, Tokyo University of Technology, Tokyo 192-0982, Japan;4. Tokyo Institute of Technology, Yokohama 226-8502, Japan School of Automation, Beijing Institute of Technology, Beijing 100081, China;5. School of Automation, Beijing Institute of Technology, Beijing 100081, China;2. Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China;3. Advanced Scientific Computing Division, Euro-Mediterranean Centre on Climate Change (CMCC Foundation), Lecce, Italy
Abstract: Emotion recognition is an active research topic in modern intelligent systems. The technique is widely used in autonomous vehicles, remote medical services, and human–computer interaction (HCI). Traditional speech emotion recognition algorithms generalize poorly because they assume that training and testing data come from the same domain and therefore share the same data distribution. In practice, however, speech data is acquired with different devices and in different recording environments, so the data may differ significantly in language, emotion types, and labels. To address this problem, we propose a bimodal fusion algorithm for speech emotion recognition in which facial expression and speech information are optimally fused. We first combine a CNN and an RNN to perform facial emotion recognition. We then use Mel-frequency cepstral coefficients (MFCC) to convert the speech signal into images, so that an LSTM and a CNN can be applied to recognize speech emotion. Finally, we use a weighted decision fusion method to combine the facial expression and speech results into the final emotion prediction. Comprehensive experimental results demonstrate that bimodal feature-based emotion recognition achieves better performance than unimodal emotion recognition.
Keywords:Facial expression recognition  Speech emotion recognition  Bimodal fusion  Feature fusion  RNN
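The weighted decision fusion step named in the abstract can be illustrated with a minimal sketch. The abstract does not state how the weights are chosen or what the class set is, so the function name `weighted_decision_fusion`, the `face_weight` parameter, and the three-class example scores below are all hypothetical; this shows one common form of decision-level fusion (a convex combination of per-class probabilities), not necessarily the authors' exact method.

```python
def weighted_decision_fusion(face_probs, speech_probs, face_weight=0.5):
    """Fuse two per-class probability vectors with a convex weighted sum.

    face_probs / speech_probs: per-class scores from the two unimodal
    classifiers (hypothetical inputs; the paper's models are not shown here).
    face_weight: assumed weight in [0, 1] for the facial branch; the speech
    branch receives 1 - face_weight.
    Returns (index of the winning class, normalized fused scores).
    """
    if len(face_probs) != len(speech_probs):
        raise ValueError("both modalities must score the same classes")
    if not 0.0 <= face_weight <= 1.0:
        raise ValueError("face_weight must lie in [0, 1]")

    speech_weight = 1.0 - face_weight
    fused = [
        face_weight * f + speech_weight * s
        for f, s in zip(face_probs, speech_probs)
    ]

    # Renormalize so the fused scores sum to 1 even if the inputs did not.
    total = sum(fused)
    if total <= 0.0:
        raise ValueError("fused scores must have a positive sum")
    fused = [p / total for p in fused]

    # The predicted emotion is the class with the highest fused score.
    label = max(range(len(fused)), key=fused.__getitem__)
    return label, fused


if __name__ == "__main__":
    # Hypothetical scores over three emotion classes.
    face = [0.6, 0.3, 0.1]
    speech = [0.2, 0.7, 0.1]
    print(weighted_decision_fusion(face, speech, face_weight=0.5))
    print(weighted_decision_fusion(face, speech, face_weight=0.9))
```

With equal weights the two branches disagree and the speech branch's stronger vote wins; raising `face_weight` shifts the decision toward the facial branch. In practice the weight would typically be tuned on held-out data.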
This article is indexed in ScienceDirect and other databases.