Similar literature
 19 similar articles found; search time 156 ms
1.
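A pronunciation quality evaluation method based on phoneme model perceptibility   Total citations: 1 (self-citations: 1, citations by others: 0)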
张茹  韩纪庆 《声学学报》2013,38(2):201-207
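To improve the accuracy of pronunciation quality assessment, a pronunciation quality evaluation method based on phoneme model perceptibility is proposed. The perceptibility of a phoneme model to variant pronunciations is defined as the difference between the expected log posterior probabilities of the acoustic features of samples drawn from different speech sample sets; on this basis, a candidate set of recognition models is generated for each phoneme. Experiments show that the proposed method reduces the size of the candidate phoneme model set in the speech recognition network by about 95%. On a non-native speech database, the correlation between the method's scores and human expert scores is 0.828, the detection rate for initial and final errors is 70.8%, and the detection rate for tone errors is 42.5%, all better than those of other methods.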

2.
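Objective speech quality evaluation can replace expensive human scoring, but computing current objective metrics usually requires a clean reference signal, which is hard to obtain in many practical acoustic systems. A non-intrusive speech quality evaluation algorithm is therefore proposed that combines auxiliary target learning with a convolutional recurrent network (CRN). To reduce complexity, the algorithm uses Bark-frequency cepstral coefficients (BFCCs), computed with filters that mimic human auditory characteristics, as the CRN input. It first builds a convolutional neural network (CNN) to extract frame-level features from the BFCCs. It then builds a bidirectional long short-term memory network to model long-term temporal dependencies and sequence characteristics in the frame-level features. Finally, a self-attention mechanism adaptively selects useful information from the frame-level features and integrates it into utterance-level features, which are mapped to an objective score. To improve the effectiveness of the quality assessment, the algorithm adopts a multi-task training strategy with voice activity detection (VAD) as the auxiliary learning target. Experiments on open-source databases show that the proposed algorithm correlates better with the mean opinion score (MOS) than other non-intrusive algorithms. Moreover, the algorithm has a small parameter count and generalizes well to the distorted speech database with subjective MOS released with ITU-T P.808, approaching the accuracy of the Perceptual Evaluation of Speech Quality (PESQ) metric.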

3.
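Research on a speaker-dependent, limited-topic continuous Chinese speech recognition system   Total citations: 3 (self-citations: 0, citations by others: 3)
This paper describes a speaker-dependent, limited-topic continuous Chinese speech recognition system based on vector quantization (VQ), hidden Markov models (HMMs), and a finite-state grammar. A zero-crossing amplitude difference function is introduced as one of the feature parameters for detecting the presence of speech. The speech data for individual words used in HMM training are obtained by automatically segmenting continuous-sentence speech data. During recognition, multiple grammar nodes that may transition to other models are considered at every frame. These measures significantly improve the recognition accuracy. Such a system can be used for speaker-specific, topic-specific information query tasks; once speaker-independent continuous speech recognition has been solved, it could be used in public topic-specific information query systems.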

4.
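Multifractal theory and its application to objective speech quality evaluation   Total citations: 5 (self-citations: 1, citations by others: 4)
The application of multifractal theory to speech signal processing is explored. A multifractal analysis method for speech signals, MFASS (Multifractal Approach of Speech Signal), is proposed, and on this basis a new output-based objective speech quality measure, OMBFD (Objective Measures Based on Fractal Dimension), is put forward. The method uses multifractal dimensions to describe the quality characteristics of a speech signal. Experimental results show that OMBFD can describe the degree of speech quality, and the correlation between its results and subjective scores exceeds 0.75.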

5.
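This paper proposes a coaxial, non-confocal paraboloid-hyperboloid grazing-incidence system to replace the currently popular coaxial confocal system. The non-confocal arrangement introduces a predetermined amount of spherical aberration that cancels part of the off-axis spherical aberration. Compared with the Wolter-I design (with a defocused image plane), image quality is markedly improved over the 0 to 15 arcmin field of view and is the same over the 15 to 20 arcmin field. The paper points out that the MTF evaluates the image quality of such systems more accurately than the line spread function or the root-mean-square (rms) blur circle. It also discusses aperture stop placement and a method for computing tolerances on the telescope aperture dimensions. Expressing the aperture error as a relative displacement between the focus and the end face allows its effect on image quality to be computed quantitatively. The computed results agree reasonably well with data published abroad. The tolerances on the mirror aperture difference and on roundness are more than an order of magnitude stricter than the aperture tolerance, and the aperture-difference tolerance for the paraboloid is 4 to 5 times stricter than that for the hyperboloid.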

6.
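Stress is an important intonation feature, and stress synthesis can improve the naturalness and expressiveness of speech. To address the local prominence of stress, this paper proposes a representation of acoustic feature prominence and analyzes the acoustic feature prominence of stressed syllables at different prosodic positions (initial, medial, and final in the prosodic word; initial, medial, and final in the prosodic phrase; and so on). It finds that stress at the end of a prosodic unit (the final syllable of a prosodic word, or the final prosodic word of a prosodic phrase) has a lower prominence of maximum fundamental frequency than stress elsewhere. A nonlinear algorithm for generating stress acoustic parameters based on acoustic feature prominence is proposed, which resolves the insufficient or excessive modification produced by traditional linear modification algorithms. Using this algorithm, a hidden Markov model based speech synthesis system that supports stress synthesis was built. Experiments show that the system can effectively synthesize stressed speech and improves the naturalness and expressiveness of the synthesized speech.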

7.
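Speech is a short-time stationary time-frequency signal, so most researchers extract emotional features after framing. However, features extracted after framing are local and cannot accurately reflect the dynamic characteristics of emotional speech, so a robust emotion recognition system often cannot be built from local features alone. To address this, utterance-level global features are first extracted from the unframed speech signal by multi-scale optimal wavelet packet decomposition; 384-dimensional utterance-level local features are then extracted after framing and reduced in dimension using the Fisher criterion; finally, a weak-scale fusion strategy is proposed to fuse the two kinds of utterance-level features, and an SVM performs the emotion classification. Experiments on the Berlin emotional speech database show that the method improves the final recognition rate by 4.2% to 13.8% over using utterance-level local features alone, and that the recognition rate fluctuates less, particularly with small samples.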
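The abstract does not say how the Fisher criterion is applied. One common reading, assumed here, is per-feature ranking by the ratio of between-class to within-class scatter, keeping the top-scoring dimensions. The sketch below illustrates that reading on toy data; the function names, the two-feature samples, and the emotion labels are all hypothetical.

```python
def fisher_scores(samples, labels):
    """Per-feature Fisher score: between-class scatter / within-class scatter."""
    n_features = len(samples[0])
    classes = sorted(set(labels))
    scores = []
    for j in range(n_features):
        column = [row[j] for row in samples]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [row[j] for row, y in zip(samples, labels) if y == c]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        scores.append(between / within if within > 0 else float("inf"))
    return scores


def select_top_features(samples, labels, k):
    """Indices of the k features with the highest Fisher score."""
    scores = fisher_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
samples = [[0.0, 5.0], [0.2, 1.0], [1.0, 4.0], [1.2, 2.0]]
labels = ["neutral", "neutral", "angry", "angry"]
scores = fisher_scores(samples, labels)
kept = select_top_features(samples, labels, k=1)
```

In this toy case only feature 0 is kept, because the class means of feature 1 coincide and its between-class scatter is zero.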

8.
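Stress is an important intonation feature, and stress synthesis can improve the naturalness and expressiveness of speech. To address the local prominence of stress, this paper proposes a representation of acoustic feature prominence and analyzes the acoustic feature prominence of stressed syllables at different prosodic positions (initial, medial, and final in the prosodic word; initial, medial, and final in the prosodic phrase; and so on). It finds that stress at the end of a prosodic unit (the final syllable of a prosodic word, or the final prosodic word of a prosodic phrase) has a lower prominence of maximum fundamental frequency than stress elsewhere. A nonlinear algorithm for generating stress acoustic parameters based on acoustic feature prominence is proposed, which resolves the insufficient or excessive modification produced by traditional linear modification algorithms. Using this algorithm, a hidden Markov model based speech synthesis system that supports stress synthesis was built. Experiments show that the system can effectively synthesize stressed speech and improves the naturalness and expressiveness of the synthesized speech.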

9.
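Application of long-term speech features to speaker recognition   Total citations: 1 (self-citations: 0, citations by others: 1)
In addition to introducing commonly used speaker recognition techniques, this paper mainly describes a speaker recognition method based on long-term time-frequency features. The input speech is first processed by VAD to obtain clean speech, from which basic time-frequency features are extracted. Within each speech unit, the trajectories of time-frequency features such as fundamental frequency, formants, and harmonics are fitted with Legendre polynomials to obtain the main fitting parameters; HLDA is then used for feature dimensionality reduction, and the mean supervector of a Gaussian mixture model represents the statistics of the time-frequency features of each utterance. On the NIST06 1side-1side speaker test set, an equal error rate of 18.7% was achieved; when fused with a conventional MFCC-based speaker recognition system, the equal error rate fell from 4.9% to 4.6%, a relative reduction of 6%.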

10.
薛帅强  陈波  陈菲 《应用声学》2016,24(4):253-256
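Building on the division of speech signals into silence, unvoiced, and voiced segments, and addressing the random distribution of segments with clear periodicity, an improved length-variable average magnitude difference function (LVAMDF) and a multi-factor pitch detection algorithm are proposed. The algorithm clusters the speech signal into segments with clear periodicity and segments without, and at the same time obtains the pitch period of the clearly periodic segments. For the few pitch periods affected by frequency doubling or halving, identification and correction methods are proposed, with a very high identification and correction rate. In processing a large amount of real speech, the algorithm accurately detects the pitch period endpoints of the clearly periodic segments with essentially no frequency-doubling or frequency-halving errors, and it is compared with the AMDF and ACF algorithms.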
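For orientation, the plain AMDF baseline that this abstract compares against can be sketched as follows. This is not the LVAMDF variant, whose details the abstract does not give; the function names, the synthetic 100 Hz test frame, and the lag search range are assumptions made for illustration.

```python
import math


def amdf(frame, max_lag):
    """Average magnitude difference for lags 1..max_lag (result[k] is lag k + 1)."""
    n = len(frame)
    values = []
    for lag in range(1, max_lag + 1):
        diffs = [abs(frame[i] - frame[i + lag]) for i in range(n - lag)]
        values.append(sum(diffs) / len(diffs))
    return values


def estimate_pitch_period(frame, min_lag, max_lag):
    """Lag (in samples) of the deepest AMDF valley within [min_lag, max_lag]."""
    values = amdf(frame, max_lag)
    return min(range(min_lag, max_lag + 1), key=lambda lag: values[lag - 1])


# Synthetic voiced frame (assumed): 100 Hz sine sampled at 8 kHz, period 80 samples.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(400)]
period = estimate_pitch_period(frame, min_lag=20, max_lag=150)
pitch_hz = sample_rate / period
```

The AMDF also dips at integer multiples of the true period, so a search range wide enough to include twice the period can pick the wrong valley. That is one source of the halving errors the abstract sets out to correct.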

11.
The auditory backward recognition masking (ABRM) and intensity discrimination (ID) thresholds of children with a specific language impairment and poor reading (SLI-poor readers), children with an SLI and average reading (SLI-average readers), children with a specific reading disability and average spoken language skills (SRD-average language), and children with normal spoken and written language (controls) were estimated with "child-friendly" psychophysical tasks. The pattern of ABRM and ID scores suggests that a subset of children with concomitant oral language and reading impairments has poor ABRM thresholds, and that a subgroup of children with an SLI or SRD has poorer ID thresholds than controls. The latter result warns against using rapid auditory processing tasks that do not actively control for auditory discrimination ability. Further, some unusually poor ABRM scores and ID scores question the validity of extreme scores produced by children on psychophysical tasks. Finally, the poor oral language scores of many of the children who had impaired reading highlight the need to test the oral language skills of SRD samples to ascertain how homogeneous and specifically disabled they really are.

12.
Lip contour tracking is an integral part of lip reading applications, and fast, accurate lip tracking is an important step in lip reading. This paper uses a novel active contour model for lip tracking and proposes a geometrical feature extraction approach for lip reading. The effects of individual features are compared, and a joint feature model is obtained by combining weighted decisions from a feature vector of differences in inner area, height, and width of the lip. An ergodic hidden Markov model (HMM) is used as the classifier. For each digit, the Markov model is tested with 3 states and 5 states. Videos of English digits from 0 to 9 were recorded for the recognition test. The CUAVE database is used for comparison along with an in-house database. When computing feature vectors, only significant frames are used to reduce the computational complexity. Results of experiments on digit utterances are given to show that the best-recognized digits can be used for important programming commands of computerized numerical control machines.

13.
In this paper, we introduce a novel objective prior distribution leveraging the connections between information, divergence and scoring rules. In particular, we do so from the starting point of convex functions representing information in density functions. This provides a natural route to proper local scoring rules using Bregman divergence. Specifically, we determine the prior obtained by setting the score function equal to a constant. Although in itself this provides motivation for an objective prior, the prior also minimizes a corresponding information criterion.
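As background for the Bregman divergence mentioned in this abstract, the sketch below evaluates D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q> for finite discrete distributions. It is an illustration of the divergence only, not of the paper's prior construction, which works with density functions. The function names and example vectors are assumptions.

```python
import math


def bregman_divergence(phi, grad_phi, p, q):
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q> for vectors p, q."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner


def neg_entropy(x):
    """Convex generator sum(x_i * log x_i); it yields the KL divergence."""
    return sum(xi * math.log(xi) for xi in x)


def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]


def squared_norm(x):
    """Convex generator sum(x_i^2); it yields the squared Euclidean distance."""
    return sum(xi * xi for xi in x)


def squared_norm_grad(x):
    return [2.0 * xi for xi in x]


# Two example distributions on three outcomes (hypothetical values).
p = [0.5, 0.3, 0.2]
q = [0.2, 0.5, 0.3]

d_entropy = bregman_divergence(neg_entropy, neg_entropy_grad, p, q)
d_square = bregman_divergence(squared_norm, squared_norm_grad, p, q)

# Reference values computed directly for comparison.
kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
sq_dist = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
```

With the negative-entropy generator the divergence matches the Kullback-Leibler divergence when both vectors sum to one, and with the squared norm it matches the squared Euclidean distance. The choice of convex function therefore determines which notion of information loss is measured.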

14.
Differential replication is a method to adapt existing machine learning solutions to the demands of highly regulated environments by reusing knowledge from one generation to the next. Copying is a technique that allows differential replication by projecting a given classifier onto a new hypothesis space, in circumstances where access to both the original solution and its training data is limited. The resulting model replicates the original decision behavior while displaying new features and characteristics. In this paper, we apply this approach to a use case in the context of credit scoring. We use a private residential mortgage default dataset. We show that differential replication through copying can be exploited to adapt a given solution to the changing demands of a constrained environment such as that of the financial market. In particular, we show how copying can be used to replicate the decision behavior not only of a model, but also of a full pipeline. As a result, we can ensure the decomposability of the attributes used to provide explanations for credit scoring models and reduce the time-to-market delivery of these solutions.

15.
This paper presents an automatic scoring method for p53 immunostained tissue images of oral cancer that consists of tissue image segmentation, splitting of clustered nuclei, feature extraction, and classification. The tissue images are segmented using an entropy thresholding technique in which the optimum threshold value for each color component is obtained by maximizing the global entropy of its gray-level co-occurrence matrix, and clustered cells are separated by selectively applying a marker-controlled watershed transform. Cell nuclei features are extracted by a maximal separation (MS) technique based on the blue component of the tissue image, and each cell is subsequently classified into one of four categories using multi-level thresholding. Finally, the IHC score of the tissue images is determined using the Allred method. A statistical analysis is performed between the immuno-scores of the manual and automatic methods, and the results are compared with scores obtained using other MS techniques. According to the performance evaluation, the IHC score based on the blue component has a high correlation coefficient (CC) of 0.95, a low mean difference (MD) of 0.15, and a 95% confidence interval very close to that of the manual scores. The automatic scoring method presented in this paper therefore has high potential to help pathologists in IHC scoring of tissue images.

16.
倪崇嘉  刘文举  徐波 《声学学报》2012,37(5):553-560
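Although automatic stress labeling has been widely studied for both Chinese and English, comparisons between the two are rarely reported. Based on the Chinese prosodically annotated corpus ASCCD and the English prosodically annotated Boston University Radio News Corpus, this paper compares automatic stress labeling in Chinese and English and examines how well different features generalize across corpora in different languages. From automatic stress labeling experiments with ensembles of classification and regression trees, feature analysis, and a mutual-information-based acoustic comparison, the following conclusions are drawn: under the same conditions, the accuracy of automatic stress labeling is lower for Chinese than for English; lexical and syntactic features are more important than acoustic features; different acoustic information sources play different roles, and duration-related features are important for both languages; and most features provide more mutual information in English than the corresponding features do in Chinese.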
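The mutual-information comparison in this abstract can be illustrated with a small sketch that estimates the mutual information, in bits, between a discretized feature and a binary stress label from empirical counts. The abstract does not describe how the acoustic features were discretized, so the binary "long duration" and "noise" features below are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information I(X; Y) in bits for two discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        p_xy = c / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        mi += p_xy * math.log2(p_xy / (p_x * p_y))
    return mi


# Hypothetical toy data: 1 = stressed syllable, 0 = unstressed.
stress = [1, 1, 0, 0]
long_duration = [1, 1, 0, 0]   # perfectly aligned with the stress label
noise = [1, 0, 1, 0]           # independent of the stress label

mi_duration = mutual_information(long_duration, stress)
mi_noise = mutual_information(noise, stress)
```

A feature that fully determines a balanced binary label carries one bit, while an independent feature carries none. Comparing such values feature by feature across two corpora is the kind of analysis the abstract reports.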

17.
This study examined production of word-final English /p/ and /b/ by subjects whose native language does not possess voiced stops in word-final position. Native Chinese adults resembled native English adults, native English children, and native Chinese children in producing /p/ with greater peak oral air pressure than /b/. However, unlike subjects in the other groups, the Chinese adults' /b/ was sometimes misidentified as /p/. This may have occurred, at least in part, because the Chinese adults produced a much smaller difference between /p/ and /b/ in labial closure duration and voicing than the other subjects. The English adults sustained voicing in /b/ significantly longer than subjects in the other three groups. To help determine the basis for this ability, the shape of oral air pressure waveforms was examined systematically. The percentage of "delayed" and "bimodal" waveforms, in which pressure stopped increasing, or decreased, prior to the release of labial constriction, was calculated for each group. Only the English adults showed more such waveforms for /b/ than /p/. Voicing continued 18 ms longer in /b/ tokens with delayed and bimodal waveforms than in tokens in which oral pressure increased continuously. The duration of closure voicing was correlated with the rate at which pressure increased in the English adults' /b/ waveforms. Previous aerodynamic modeling has shown that delayed and bimodal waveforms may result from an active enlargement of the supraglottal cavity. This, together with the pattern of between-group differences observed here, suggests that the English adults learned to enlarge the supraglottal cavity to sustain voicing in /b/. It appears that neither the children nor the Chinese adults had as yet acquired this skill.

18.
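Feature analysis of ground-object hyperspectra is the basis for using hyperspectral imagery in target recognition and land-cover classification. A spectral absorption peak enhancement algorithm based on the Top-Hat transform of mathematical morphology is proposed. The method enhances absorption peaks while preserving the waveform of the absorption bands. Reflectance spectra of 11 different minerals were selected from the United States Geological Survey (USGS) spectral library, and K-means clustering was applied to both the absorption-peak-enhanced curves and the original spectra. The results show that clustering of the enhanced curves is better than clustering of the original spectra in terms of both waveform and geological context; moreover, when the clusters of the enhanced curves are displayed using the minerals' spectra resampled to ASTER bands, the typical ASTER spectral characteristics of each mineral group can be summarized. This indicates that the enhanced curves strengthen the absorption features of mineral spectra and improve hyperspectral separability, while also providing a reference for remote sensing information extraction from multispectral data, making this a useful hyperspectral analysis method.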
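The abstract does not state which Top-Hat variant or structuring element is used. Because absorption features appear as dips in a reflectance curve, the sketch below assumes a black top-hat (morphological closing minus the signal) with a flat one-dimensional window. The function names, the window half-width, and the toy spectrum are all hypothetical.

```python
def dilate(signal, half_width):
    """Flat 1-D dilation: running maximum over a window truncated at the edges."""
    n = len(signal)
    return [max(signal[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def erode(signal, half_width):
    """Flat 1-D erosion: running minimum over a window truncated at the edges."""
    n = len(signal)
    return [min(signal[max(0, i - half_width):min(n, i + half_width + 1)])
            for i in range(n)]


def black_top_hat(signal, half_width):
    """Closing (dilation then erosion) minus the signal; highlights narrow dips."""
    closed = erode(dilate(signal, half_width), half_width)
    return [c - s for c, s in zip(closed, signal)]


# Toy reflectance curve (hypothetical) with one narrow absorption dip at index 3.
reflectance = [0.80, 0.80, 0.82, 0.60, 0.84, 0.85, 0.85]
enhanced = black_top_hat(reflectance, half_width=1)
dip_index = max(range(len(enhanced)), key=lambda i: enhanced[i])
```

The slowly rising background is flattened to zero while the dip is retained with its depth relative to the local continuum. Features wider than the window would be suppressed, so the half-width has to be chosen relative to the widths of the absorption bands of interest.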

19.
To simplify the L-[1-13C]phenylalanine breath test, which is used to assess liver function, the tracer is usually given orally, and CO2 production rate is estimated. In 12 healthy volunteers and 10 liver cirrhotics we compared the oral approach with i.v. tracer administration combined with measurement of individual CO2 production rate. The 13CO2/12CO2 enrichment was assessed by isotope-ratio mass spectrometry. After i.v. [1-13C]phenylalanine application exhaled 13C recovery per minute peaked within 10 minutes (controls: 0.17 +/- 0.06%; cirrhotics: 0.05 +/- 0.02%, p < 0.01). The oral approach yielded comparable separation between 30-60 minutes, with average peak values being 0.18 +/- 0.03% and 0.06 +/- 0.03% (p < 0.01), respectively. Variable gastrointestinal resorption kinetics after oral application probably causes this difference.
