Pronunciation scoring model for Mandarin Phonemes based on feature comparison using a simulated annealing genetic algorithm
Cite as: WANG Jian, GUAN Tian, YE Datian. Pronunciation scoring model for Mandarin Phonemes based on feature comparison using a simulated annealing genetic algorithm[J]. Journal of Tsinghua University (Science and Technology), 2012(6): 880-884.
Authors: WANG Jian, GUAN Tian, YE Datian
Institutions: 1. Department of Biomedical Engineering, Tsinghua University, Beijing 100084, China; 2. Research Center of Biomedical Engineering, Graduate School at Shenzhen, Tsinghua University, Shenzhen 518055, China
Funding: National Natural Science Foundation of China (300800234); Shenzhen Basic Research Fund (JC200903180546A); Natural Science Foundation of Guangdong Province (10151805702000000)
Abstract: A pronunciation scoring model for Mandarin phonemes is proposed to help people with pronunciation difficulties and foreign-language learners correct Mandarin pronunciation errors. The model is based on feature comparison of Mel frequency cepstrum coefficients (MFCC) and a simulated annealing genetic algorithm (SAGA). It uses the dynamic time warping (DTW) algorithm to measure the similarity between phonemes and a SAGA-based scoring mechanism to score pronunciation automatically. The paper compares the effect on scoring of different optimization algorithms (SAGA and a local optimization algorithm) and of different DTW algorithms. The results show that the scoring accuracy of the SAGA model exceeds 94%, far better than that of the local optimization algorithm. Moreover, under the SAGA scoring model, the improved DTW algorithm with a parallelogram search path gives the best scoring results. The model based on MFCC and SAGA is therefore suitable for scoring Mandarin phonemes.

Keywords: feature comparison; Mel frequency cepstrum coefficient (MFCC); improved dynamic time warping (DTW) algorithm; simulated annealing genetic algorithm (SAGA); phoneme scoring
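
The abstract mentions a DTW variant whose search path is a parallelogram but gives no details. The following is a minimal sketch of one common way to obtain such a region, not the authors' implementation. It assumes the local steps (1,1), (1,2) and (2,1), which keep every warping path's slope between 1/2 and 2, so the reachable cells form a parallelogram between the first and last frames. The function names, the Euclidean frame distance, and the toy one-dimensional "features" are all assumptions made for illustration; in practice each frame would be an MFCC vector.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_parallelogram(ref, test):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumed step pattern: (1,1), (1,2), (2,1). Every path therefore has a
    slope between 1/2 and 2, which confines the search to a parallelogram.
    A (1,2) or (2,1) step skips one frame. Returns inf when the two lengths
    are too different for any such path to reach the last frames.
    """
    n, m = len(ref), len(test)
    if n == 0 or m == 0:
        raise ValueError("both sequences must contain at least one frame")
    inf = float("inf")
    cost = [[inf] * m for _ in range(n)]
    cost[0][0] = frame_distance(ref[0], test[0])
    steps = ((1, 1), (1, 2), (2, 1))
    for i in range(n):
        for j in range(m):
            if i == 0 and j == 0:
                continue
            # Cheapest predecessor among the allowed local steps.
            best = min(
                (cost[i - di][j - dj] for di, dj in steps if i >= di and j >= dj),
                default=inf,
            )
            if best < inf:
                cost[i][j] = best + frame_distance(ref[i], test[j])
    return cost[n - 1][m - 1]


# Toy usage: a reference "phoneme" and a time-stretched copy of it.
reference = [[0.0], [1.0], [2.0]]
stretched = [[0.0], [0.0], [1.0], [1.0], [2.0]]
distance = dtw_parallelogram(reference, stretched)
```

A smaller accumulated cost means the test utterance is closer to the reference. Turning that cost into a pronunciation score is the role of the SAGA scoring mechanism described in the abstract, which this sketch does not attempt to reproduce.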
This article is indexed in CNKI and other databases.