首页 | 本学科首页   官方微博 | 高级检索  
     检索      


Effect of speech-intrinsic variations on human and automatic recognition of spoken phonemes
Authors:Meyer Bernd T  Brand Thomas  Kollmeier Birger
Institution:Medizinische Physik, Carl-von-Ossietzky Universita?t Oldenburg, D-26111 Oldenburg, Germany. bernd.meyer@uni-oldenburg.de
Abstract:The aim of this study is to quantify the gap between the recognition performance of human listeners and an automatic speech recognition (ASR) system with special focus on intrinsic variations of speech, such as speaking rate and effort, altered pitch, and the presence of dialect and accent. Second, it is investigated if the most common ASR features contain all information required to recognize speech in noisy environments by using resynthesized ASR features in listening experiments. For the phoneme recognition task, the ASR system achieved the human performance level only when the signal-to-noise ratio (SNR) was increased by 15 dB, which is an estimate for the human-machine gap in terms of the SNR. The major part of this gap is attributed to the feature extraction stage, since human listeners achieve comparable recognition scores when the SNR difference between unaltered and resynthesized utterances is 10 dB. Intrinsic variabilities result in strong increases of error rates, both in human speech recognition (HSR) and ASR (with a relative increase of up to 120%). An analysis of phoneme duration and recognition rates indicates that human listeners are better able to identify temporal cues than the machine at low SNRs, which suggests incorporating information about the temporal dynamics of speech into ASR systems.
Keywords:
本文献已被 PubMed 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号