Mispronunciation detection and diagnosis with acoustic pronunciation model aided modeling


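Cite this article:	LIU Zongming, WANG Li, LI Junfeng, ZHANG Pengyuan. Mispronunciation detection and diagnosis with acoustic pronunciation model aided modeling[J]. Acta Acustica (声学学报), 2023, 48(1): 264-273.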

Authors:	LIU Zongming (柳宗铭), WANG Li (王丽), LI Junfeng (李军锋), ZHANG Pengyuan (张鹏远)

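Affiliation:	1 Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190;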

Funding:	Supported by the National Key Research and Development Program of China (2020YFC2004100)

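Abstract:	Expert-annotated data for the mispronunciation detection and diagnosis (MDD) task are scarce. Starting from the idea of adding a pronunciation model that uses limited data more efficiently to model pronunciation regularities, and thereby aids phoneme-recognition-based MDD, this paper proposes an acoustic pronunciation model that fuses acoustic and textual information and models the mispronunciation generation process in a theoretically more complete way. Exploiting the acoustic correlation between different parts of the mispronunciation generation process, the model shares its acoustic encoder network parameters with the phoneme recognition model and is optimized jointly with it through multi-task learning, which is how the aided modeling is achieved. In addition, an acoustic confidence masking-prediction training scheme is proposed to further strengthen the connection between the two tasks and improve the efficiency of aided modeling. Experiments show that the acoustic pronunciation model can effectively model mispronunciation regularities. With its aid in phoneme recognition modeling, the MDD system improves by 4.9%, 9.5%, and 14.0% in mispronunciation detection, diagnosis, and phoneme recognition, respectively. The acoustic confidence masking-prediction training method improves the efficiency of aided modeling, and the choice of masking parameters or joint optimization parameters also affects the effectiveness of aided modeling.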
Keywords:	mispronunciation; phoneme recognition; confidence; modeling efficiency; MDD; detection and diagnosis; model-aided
Received:	2022-05-06
Mispronunciation detection and diagnosis with acoustic pronunciation model aided modeling

Affiliation:	1 Key Laboratory of Speech Acoustics and Content Understanding, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190; 2 University of Chinese Academy of Sciences, Beijing 100049

Abstract:	For Mispronunciation Detection and Diagnosis (MDD) tasks, expert-annotated data are scarce. To efficiently model pronunciation regularities on limited data and then aid MDD systems, an acoustic pronunciation model that integrates both acoustic and textual information is proposed. It models the mispronunciation generation process in a more theoretically complete way. Based on the acoustic correlation of different parts of this process, the model achieves aided modeling by sharing the acoustic encoder network parameters with the phoneme recognition model and optimizing it jointly in a multi-task learning manner. Moreover, the acoustic confidence masking-prediction training approach is proposed to further strengthen the correlation between the two tasks and improve the efficiency of aided modeling. Experiments show that the acoustic pronunciation model can effectively model mispronunciation regularities. With its aid in phoneme recognition modeling, the MDD system showed 4.9%, 9.5%, and 14.0% improvement in mispronunciation detection, diagnosis, and phoneme recognition, respectively. The acoustic confidence masking-prediction training method improves the efficiency of aided modeling, and both the masking parameters and the multi-task learning parameters can affect the effectiveness of aided modeling.
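
The abstract names two training ingredients without giving formulas: joint multi-task optimization of the two models, and confidence-based masking of the text input. The following is a minimal Python sketch of how these two ideas could look. It is not the authors' implementation. The function names, the `<mask>` token, the confidence threshold, the choice to mask low-confidence positions, and the convex weighting of the two losses are all assumptions made for illustration only.

```python
import random

MASK = "<mask>"  # hypothetical mask token, not taken from the paper


def confidence_mask(phonemes, confidences, threshold=0.5, mask_prob=1.0, rng=None):
    """Mask phonemes whose acoustic confidence falls below `threshold`.

    Assumption: low-confidence positions are the ones masked; the abstract
    does not say which positions are chosen. Returns the masked sequence and,
    per position, the phoneme the model would have to predict (None where
    nothing was masked).
    """
    if len(phonemes) != len(confidences):
        raise ValueError("phonemes and confidences must have the same length")
    rng = rng or random.Random(0)
    masked, targets = [], []
    for phoneme, conf in zip(phonemes, confidences):
        if conf < threshold and rng.random() < mask_prob:
            masked.append(MASK)
            targets.append(phoneme)
        else:
            masked.append(phoneme)
            targets.append(None)
    return masked, targets


def joint_loss(recognition_loss, pronunciation_loss, weight=0.5):
    """Combine the two task losses with one interpolation weight.

    Assumption: a convex combination; the paper's actual joint optimization
    parameters are not given in the abstract.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return (1.0 - weight) * recognition_loss + weight * pronunciation_loss


# Usage: the second phoneme has low confidence, so it is masked.
masked_seq, predict_targets = confidence_mask(["k", "ae", "t"], [0.9, 0.2, 0.8])
total = joint_loss(recognition_loss=2.0, pronunciation_loss=4.0, weight=0.25)
```

In this sketch, `threshold` and `mask_prob` stand in for the "masking parameters" and `weight` for the "multi-task learning parameters" that the abstract says affect the effectiveness of aided modeling.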

