Aligning Text and Phonemes for Speech Technology Applications Using an EM-Like Algorithm
Authors: R I Damper, Y Marchand, J-D S Marsters, A I Bazin
Institution:(1) Image, Speech and Intelligent Systems (ISIS) Research Group, School of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, UK;(2) Institute for Biodiagnostics (Atlantic), National Research Council Canada, Neuroimaging Research Laboratory, 1796 Summer Street, Suite 3900, Halifax, Nova Scotia, Canada, B3H 3A7;(3) Image, Speech and Intelligent Systems (ISIS) Research Group, School of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, UK
Abstract: A common requirement in speech technology is to align two different symbolic representations of the same linguistic ‘message’. For instance, we often need to align the letters of words listed in a dictionary with the corresponding phonemes specifying their pronunciation. As dictionaries become ever bigger, manual alignment becomes less and less tenable, yet automatic alignment is a hard problem for a language like English. In this paper, we describe the use of a form of the expectation-maximization (EM) algorithm to learn alignments of English text and phonemes, starting from a variety of initializations. We use the British English Example Pronunciation (BEEP) dictionary of almost 200,000 words in this work. The quality of alignment is difficult to determine quantitatively since no ‘gold standard’ correct alignment exists. We therefore evaluate the success of our algorithm indirectly, from the performance of a pronunciation by analogy system that uses the aligned dictionary data as a knowledge base for inferring pronunciations. We find excellent performance, the best so far reported in the literature. There is very little dependence on the start point for alignment, indicating that the EM search space is strongly convex. Since the aligned BEEP dictionary is a potentially valuable resource, it is made freely available for research use.
Keywords: text-to-speech synthesis, string alignment, dynamic programming, EM algorithm, pronunciation by analogy
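The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of an EM-like (hard-EM) letter-to-phoneme aligner built on dynamic programming, not the authors' implementation. It assumes that each letter maps to exactly one phoneme or to a null symbol, that no word has more phonemes than letters, and that association scores start from naive co-occurrence counts. The names `NULL`, `align`, and `em_align` are hypothetical.

```python
from collections import defaultdict

NULL = "-"  # hypothetical null-phoneme symbol (assumption, not from the paper)


def align(letters, phonemes, score):
    """Best monotonic alignment by dynamic programming.

    Each letter is paired with the next phoneme or with NULL.
    """
    n, m = len(letters), len(phonemes)
    if m > n:
        raise ValueError("sketch assumes no more phonemes than letters")
    neg_inf = float("-inf")
    # best[i][j]: best score aligning the first i letters with the first j phonemes
    best = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(1, n + 1):
        letter = letters[i - 1]
        for j in range(0, min(i, m) + 1):
            # Option 1: the letter is silent (paired with NULL).
            cand = best[i - 1][j] + score[(letter, NULL)]
            if cand > best[i][j]:
                best[i][j], back[i][j] = cand, (i - 1, j)
            # Option 2: the letter is paired with phoneme j.
            if j > 0:
                cand = best[i - 1][j - 1] + score[(letter, phonemes[j - 1])]
                if cand > best[i][j]:
                    best[i][j], back[i][j] = cand, (i - 1, j - 1)
    # Trace back from the final cell to recover one phoneme (or NULL) per letter.
    aligned, i, j = [], n, m
    while i > 0:
        pi, pj = back[i][j]
        aligned.append(phonemes[pj] if pj < j else NULL)
        i, j = pi, pj
    aligned.reverse()
    return aligned


def em_align(lexicon, iterations=5):
    """Alternate between aligning all words and re-counting associations."""
    # Naive initialization: every letter co-occurs with every phoneme of its word.
    score = defaultdict(float)
    for letters, phonemes in lexicon:
        for letter in letters:
            for phoneme in phonemes:
                score[(letter, phoneme)] += 1.0
    alignments = []
    for _ in range(iterations):
        # "E-like" step: best alignment of each word under the current scores.
        alignments = [align(l, p, score) for l, p in lexicon]
        # "M-like" step: re-estimate scores as counts over those alignments.
        new_score = defaultdict(float)
        for (letters, _), aligned in zip(lexicon, alignments):
            for letter, phoneme in zip(letters, aligned):
                new_score[(letter, phoneme)] += 1.0
        score = new_score
    return alignments


if __name__ == "__main__":
    toy = [("cat", ["k", "a", "t"]), ("make", ["m", "eI", "k"])]
    for (word, _), aligned in zip(toy, em_align(toy)):
        print(word, aligned)
```

A real system would likely need smoothing, normalized probabilities rather than raw counts, and handling of letters that correspond to more than one phoneme; these are left out here to keep the sketch short.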
This article is indexed in SpringerLink and other databases.