Similar literature
1.
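Initial-final segmentation of whispered speech based on an auditory model   Cited by: 5 (self-citations: 0, citations by others: 5)
丁慧, 栗学丽, 徐柏龄 (Ding Hui, Li Xueli, Xu Boling). 《应用声学》 (Applied Acoustics), 2004, 23(2): 20-25, 44
This paper analyzes the characteristics of whispered speech and, drawing on the basic theory and experimental data of physiological acoustics and psychoacoustics, proposes a method that uses an auditory model to segment whispered speech into initials and finals. The auditory perception model suited to this task comprises four levels: the cochlea's mechanism for decomposing sound by frequency; the auditory system's nonlinear transformations in the time domain and in the frequency domain; and the lateral inhibition mechanism of the central nervous system. The model reflects how humans perceive low-energy speech in noisy environments, which makes it suitable for whispered speech recognition, and it gave satisfactory results in initial-final segmentation experiments on whispered speech.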

2.
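A method is proposed for analyzing speaker state factors in whispered speech using global spectral parameters. First, based on the results of whisper listening experiments, arousal-valence factors are introduced to measure speaker state on three levels. Second, spectral parameters are extracted from a sinusoidal model and a human auditory model of whispered speech and combined with other short-time spectral parameters; their trajectories are tracked and global statistics of each parameter are computed as features for classifying the whispering speaker's state. Experimental results show that the global spectral parameters of the sinusoidal and auditory models raise the accuracy of the speaker state factor classification system to 90%. The classification method and the state factor description scheme offer an effective approach to analyzing speaker state in whispered speech.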

3.
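Study of noise quality evaluation based on masking properties   Cited by: 1 (self-citations: 0, citations by others: 1)
Based on the spectral characteristics of broadband noise and the masking properties of human hearing, an equivalence principle for noise masking and a new method for computing an annoyance index are proposed. After time-frequency analysis combined with statistical regularities identifies the frequency components of the noise that cause annoyance, the annoyance index measures how annoying each component is, yielding a noise quality evaluation model. Experimental results show that the masking-based noise quality evaluation model identifies the annoying frequency components in noise quickly and accurately, and agrees well with subjective identification by human listeners.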

4.
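Auditory-perception robust principal component analysis for unsupervised speech denoising   Cited by: 2 (self-citations: 0, citations by others: 2)
闵刚, 邹霞, 韩伟, 张雄伟, 谭薇 (Min Gang, Zou Xia, Han Wei, Zhang Xiongwei, Tan Wei). 《声学学报》 (Acta Acustica), 2017, 42(2): 246-256
To address two shortcomings of existing sparse low-rank decomposition methods for speech denoising, namely that they make insufficient use of human auditory perception and that the resulting speech distortion is easily perceived, an auditory-perception robust principal component analysis method for speech denoising is proposed. Because the cochlear basilar membrane perceives frequency nonlinearly, the method uses the cochleagram as the basis for separating speech from noise. In addition, the Itakura-Saito distance, which matches human auditory perception, is chosen as the objective function for optimization; a non-negativity constraint is introduced into the sparse low-rank modeling so that the decomposed components have physically meaningful interpretations; and an iterative optimization algorithm with closed-form updates is derived within the alternating direction method of multipliers (ADMM) framework. The method is fully unsupervised and requires no pre-trained speech or noise model. Simulation experiments with several noise types and signal-to-noise ratios verify its effectiveness: noise suppression is more pronounced than with current comparable algorithms, and the intelligibility and overall quality of the denoised speech are improved or at least comparable.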
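
The Itakura-Saito distance named in this abstract has a standard textbook form, sketched below for two strictly positive power spectra. This is only an illustration of the measure itself; the function name `itakura_saito` is hypothetical, and nothing here reproduces the paper's cochleagram front end, its non-negative sparse low-rank model, or its ADMM updates.

```python
import math


def itakura_saito(p, q):
    """Itakura-Saito divergence D(p || q) between two positive power spectra.

    Standard definition: sum over bins of p/q - log(p/q) - 1.
    The function name is a hypothetical choice for this sketch.
    """
    if len(p) != len(q):
        raise ValueError("spectra must have the same length")
    total = 0.0
    for p_bin, q_bin in zip(p, q):
        if p_bin <= 0 or q_bin <= 0:
            raise ValueError("spectra must be strictly positive")
        ratio = p_bin / q_bin
        total += ratio - math.log(ratio) - 1.0
    return total
```

Two properties are worth noting: the measure is asymmetric (swapping the arguments changes the value), and it is scale-invariant, so multiplying both spectra by the same gain leaves it unchanged. The second property is one common reason it is preferred over squared error for low-energy spectral regions.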

5.
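Grounded in the physiology and pathology of phonation, independent component analysis is applied separately to speech recorded at the lips, to the glottal volume-velocity waveform obtained by linear-prediction inverse filtering, and to the voice signal conducted through the body surface at the front of the neck, in order to separate the harmonic and noise signals and estimate the harmonics-to-noise ratio. Experimental results are given for different vowels, phonation types, and pathological conditions, and are compared comprehensively with traditional methods of computing the harmonics-to-noise ratio. The results establish, for normal and pathological conditions respectively, which method is suited to judging the degree of hoarseness and laryngeal disease more objectively and accurately. The work is significant for basic research on phonation, speech acoustics, and the detection of laryngeal disease.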

6.
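李皓, 唐朝京 (Li Hao, Tang Chaojing). 《声学学报》 (Acta Acustica), 2012, 37(3): 339-345
To achieve robust initial-final segmentation that meets the needs of large-vocabulary continuous speech recognition systems, a method is proposed that builds a loss function and exploits the quasi-periodicity of voiced sounds together with initial duration. The autocorrelation function of the speech is computed first; a cost (loss) function is then built and dynamic programming is applied to the autocorrelation results to detect voiced segments; next, the search range for the initial is set according to the distribution of initial durations; finally, within that range, auditory event detection is applied around the onset of the voiced segment to separate the initial from the final. Experimental results show that dynamic programming detects voiced segments better than thresholding, and that segmenting initials and finals on the basis of the voiced segments improves segmentation accuracy, reduces the influence of noise and of Mandarin sound-change phenomena, and leaves segmentation performance largely insensitive to the manner of articulation of the initial.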
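
The first two steps of this abstract (autocorrelation, then a cost function minimized by dynamic programming) can be sketched roughly as follows. The abstract does not give the actual loss function, so everything specific here is an assumption: the periodicity score, the per-frame costs `p` and `1 - p`, the `switch_penalty` value, and the function names are all hypothetical.

```python
def autocorr_periodicity(frame, min_lag, max_lag):
    """Peak of the energy-normalized autocorrelation over candidate pitch lags."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0
    best = 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        best = max(best, r / energy)
    return best


def detect_voiced(scores, switch_penalty=0.5):
    """Label frames voiced (True) or unvoiced (False) by minimizing a total cost.

    Assumed cost: an unvoiced label costs p, a voiced label costs 1 - p,
    and every change of label between adjacent frames costs switch_penalty.
    """
    if not scores:
        return []
    cost = [[p, 1.0 - p] for p in scores]   # cost[t][state], state 1 = voiced
    best = [cost[0][:]]                      # cheapest path cost ending in each state
    back = []                                # back-pointers for path recovery
    for t in range(1, len(scores)):
        row, ptr = [], []
        for state in (0, 1):
            stay = best[-1][state]
            switch = best[-1][1 - state] + switch_penalty
            if stay <= switch:
                row.append(stay + cost[t][state])
                ptr.append(state)
            else:
                row.append(switch + cost[t][state])
                ptr.append(1 - state)
        best.append(row)
        back.append(ptr)
    state = 0 if best[-1][0] <= best[-1][1] else 1
    path = [state]
    for ptr in reversed(back):               # trace the cheapest path backwards
        state = ptr[state]
        path.append(state)
    path.reverse()
    return [s == 1 for s in path]
```

With per-frame scores `[0.1, 0.2, 0.9, 0.4, 0.9, 0.8, 0.1, 0.1]`, a fixed threshold of 0.5 would split the voiced run at the fourth frame, whereas the dynamic-programming labeling keeps frames three to six as one voiced segment. That is the kind of advantage over thresholding the abstract reports, though the paper's own formulation may differ.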

7.
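Application of perceptual linear prediction to underwater target classification   Cited by: 5 (self-citations: 0, citations by others: 5)
A method based on perceptual linear prediction (PLP) is proposed that mimics human auditory characteristics to extract robust features from underwater acoustic signals. Three concepts from auditory psychology, (1) critical-band spectral analysis, (2) equal-loudness curves, and (3) the intensity-loudness power law, are used to estimate the auditory spectrum, yielding a robust feature vector from a 12th-order all-pole model. Training and recognition experiments with this feature vector show that: (1) the human ear's sensitivity to the radiated noise of the six target classes differs across frequency bands; (2) the extracted auditory-perception features of underwater targets are robust; (3) the features have low dimensionality, are fast to compute, and improve recognition accuracy over earlier results.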

8.
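Based on the characteristics of the noise sources in an airborne infrared search and track (IRST) system, a detection probability model is established and the effect of the threshold-to-noise ratio on detection probability and false-alarm probability during detection is analyzed. Against a combat background, the effects of detection range, target speed, and combat environment on detection probability are analyzed. The threshold-to-noise ratio is set dynamically during detection; a concrete solution method is given, and the range of threshold-to-noise ratio values under different mission requirements is analyzed by simulation. Using the solved threshold-to-noise ratio values, the detection probability envelope for the variable threshold-to-noise ratio case is obtained by simulation. The results show that, compared with a constant threshold-to-noise ratio, the probability envelope obtained with a variable threshold-to-noise ratio adapts to air combat conditions that change in real time, is more generally applicable, and gives a markedly longer detection range.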

9.
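The effect of an enhancement algorithm for noisy speech is generally evaluated by the improvement in signal-to-noise ratio. This paper proposes a new way of computing the signal-to-noise ratio, the high-frequency signal-to-noise ratio (SNRH), in which the signal-to-noise ratio is computed from the signals after each has been high-pass filtered. Experiments show that the new method evaluates the quality and effective range of speech enhancement algorithms more accurately than the traditional method, and is a more reasonable measure for speech enhancement algorithms.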
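
A minimal sketch of the idea, computing the signal-to-noise ratio after high-pass filtering both the speech and the noise, might look like this. The abstract does not say which high-pass filter is used, so the first-order difference filter, its coefficient `coeff=0.95`, and all function names below are assumptions made for illustration.

```python
import math


def high_pass(signal, coeff=0.95):
    """First-order high-pass: y[n] = x[n] - coeff * x[n-1] (assumed filter)."""
    out = []
    prev = 0.0
    for x in signal:
        out.append(x - coeff * prev)
        prev = x
    return out


def snr_db(clean, noise):
    """Conventional SNR in dB from the energies of the clean signal and the noise."""
    signal_energy = sum(x * x for x in clean)
    noise_energy = sum(x * x for x in noise)
    if signal_energy == 0.0 or noise_energy == 0.0:
        raise ValueError("both signals must have non-zero energy")
    return 10.0 * math.log10(signal_energy / noise_energy)


def snr_high_db(clean, noise, coeff=0.95):
    """High-frequency SNR: the same ratio computed on high-pass filtered signals."""
    return snr_db(high_pass(clean, coeff), high_pass(noise, coeff))


# Usage: a rapidly alternating "signal" against a constant (low-frequency) "noise".
clean = [(-1.0) ** n for n in range(100)]
noise = [1.0] * 100
print(snr_db(clean, noise), snr_high_db(clean, noise))
```

In this toy case the two signals have equal energy, so the conventional figure is 0 dB, while the high-frequency figure is much larger because the filter removes most of the low-frequency noise energy. Whether that weighting tracks perceived enhancement quality is the paper's empirical claim, not something this sketch demonstrates.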

10.
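Auditory models have been applied to many aspects of speech signal processing with good results. Starting from an auditory model that is currently in wide use, this paper proposes an inverse transform for each part of the model. First, the neural firing rate is recovered by inverting the autocorrelogram and reconstructing the phase information through iteration; the negative half of the signal is then recovered by inverting the half-wave rectification; finally, the equations describing the inner hair cell and synapse models are inverted and inverse Gammatone filtering is applied, which together constitute an inversion method for the whole auditory model. As an application, the paper proposes a method for enhancing noisy speech based on the inverse auditory model transform. Experimental results show that the method denoises noisy speech well and, particularly at low signal-to-noise ratios, is more effective than commonly used methods. The proposed inverse auditory model transform can be applied to speech enhancement and related fields.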

11.
This study examines the neural representation of the vowel /epsilon/ in the auditory nerve of acoustically traumatized cats and asks whether spectral modifications of the vowel can restore a normal neural representation. Four variants of /epsilon/, which differed primarily in the frequency of the second formant (F2), were used as stimuli. Normally, the rate-place code provides a robust representation of F2 for these vowels, in the sense that rate changes encode changes in F2 frequency [Conley and Keilson, J. Acoust. Soc. Am. 98, 3223 (1995)]. This representation is lost after acoustic trauma [Miller et al., J. Acoust. Soc. Am. 105, 311 (1999)]. Here it is shown that an improved representation of the F2 frequency can be gained by a form of high-frequency emphasis that is determined by both the hearing-loss profile and the spectral envelope of the vowel. Essentially, the vowel was high-pass filtered so that the F2 and F3 peaks were amplified without amplifying frequencies in the trough between F1 and F2. This modification improved the quality of the rate and temporal tonotopic representations of the vowel and restored sensitivity to the F2 frequency. Although a completely normal representation was not restored, this method shows promise as an approach to hearing-aid signal processing.

12.
Vowel durations typically vary according to both intrinsic (segment-specific) and extrinsic (contextual) specifications. It can be argued that such variations are due to both predisposition and cognitive learning. The present report utilizes acoustic phonetic measurements from Swedish and American children aged 24 and 30 months to investigate the hypothesis that default behaviors may precede language-specific learning effects. The predicted pattern is the presence of final consonant voicing effects in both languages as a default, and subsequent learning of intrinsic effects most notably in the Swedish children. The data, from 443 monosyllabic tokens containing high-front vowels and final stop consonants, are analyzed in statistical frameworks at group and individual levels. The results confirm that Swedish children show an early tendency to vary vowel durations according to final consonant voicing, followed only six months later by a stage at which the intrinsic influence of vowel identity grows relatively more robust. Measures of vowel formant structure from selected 30-month-old children also revealed a tendency for children of this age to focus on particular acoustic contrasts. In conclusion, the results indicate that early acquisition of vowel specifications involves an interaction between language-specific features and articulatory predispositions associated with phonetic context.

13.
Responses of "high-spontaneous" single auditory-nerve fibers in anesthetized cat to nine different spoken stop and nasal consonant-vowel syllables presented in four different levels of speech-shaped noise are reported. The temporal information contained in the responses was analyzed using "composite" spectrograms and pseudo-3D spatial-frequency plots. Spectral characteristics of both consonant and vowel segments of the CV syllables were strongly encoded at S/N ratios of 30 and 20 dB. At S/N = 10 dB, formant information during the vowel segments was all that was reliably detectable in most cases. Even at S/N = 0 dB, most vowel formants were detectable, but only with relatively long analysis windows (40 ms). The increases (and decreases) in discharge rate during various phases of the responses were also determined. The rate responses to the "release" and to the voicing of the stop-consonant syllables were quite robust, being detectable at least half of the time, even at the highest noise level. Comparisons with psychoacoustic studies using similar stimuli are made.

14.
15.
A quantitative perceptual model of human vowel recognition based upon psychoacoustic and speech perception data is described. At an intermediate auditory stage of processing, the specific bark difference level of the model represents the pattern of peripheral auditory excitation as the distance in critical bands (barks) between neighboring formants and between the fundamental frequency (F0) and first formant (F1). At a higher, phonetic stage of processing, represented by the critical bark difference level of the model, the transformed vowels may be dichotomously classified based on whether the difference between formants in each dimension falls within or exceeds the critical distance of 3 bark for the spectral center of gravity effect [Chistovich et al., Hear. Res. 1, 185-195 (1979)]. Vowel transformations and classifications correspond well to several major phonetic dimensions and features by which vowels are perceived and traditionally classified. The F1-F0 dimension represents vowel height, and high vowels have F1-F0 differences within 3 bark. The F3-F2 dimension corresponds to vowel place of articulation, and front vowels have F3-F2 differences of less than 3 bark. As an inherent, speaker-independent normalization procedure, the model provides excellent vowel clustering while it greatly reduces between-speaker variability. It offers robust normalization through feature classification because gross binary categorization allows for considerable acoustic variability. There was generally less formant and bark difference variability for closely spaced formants than for widely spaced formants. These findings agree with independently observed perceptual results and support Stevens' quantal theory of vowel production and perceptual constraints on production predicted from the critical bark difference level of the model.
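
The two-stage classification described in this abstract can be sketched as: convert F0 and the formants to the bark scale, take the differences, and compare each against the 3-bark critical distance. The abstract does not state which Hz-to-bark conversion the model uses, so the Zwicker and Terhardt approximation below is an assumption, as are the function names and the illustrative formant values (rounded averages of the kind reported for adult male speakers).

```python
import math

CRITICAL_DISTANCE_BARK = 3.0  # critical distance quoted in the abstract


def hz_to_bark(f_hz):
    """Zwicker-Terhardt approximation of the bark scale (assumed conversion)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_differences(f0, f1, f2, f3):
    """Bark distances between F0 and F1 and between neighboring formants."""
    z0, z1, z2, z3 = (hz_to_bark(f) for f in (f0, f1, f2, f3))
    return {"F1-F0": z1 - z0, "F2-F1": z2 - z1, "F3-F2": z3 - z2}


def classify_vowel(f0, f1, f2, f3):
    """Binary features from the abstract: high if F1-F0 < 3 bark, front if F3-F2 < 3 bark."""
    diffs = bark_differences(f0, f1, f2, f3)
    return {
        "high": diffs["F1-F0"] < CRITICAL_DISTANCE_BARK,
        "front": diffs["F3-F2"] < CRITICAL_DISTANCE_BARK,
    }


# Illustrative values in Hz (hypothetical speaker): an /i/-like and an /a/-like vowel.
print(classify_vowel(120.0, 270.0, 2290.0, 3010.0))
print(classify_vowel(120.0, 730.0, 1090.0, 2440.0))
```

Because the decision depends only on which side of 3 bark each difference falls, moderate shifts in the raw formant frequencies leave the classification unchanged, which is the sense in which the abstract calls the normalization robust.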

16.
A geometrical method for computing overlap between vowel distributions, the spectral overlap assessment metric (SOAM), is applied to an investigation of spectral (F1, F2) and temporal (duration) relations in three different types of systems: one claimed to exhibit primary quality (American English), one primary quantity (Jamaican Creole), and one about which no claims have been made (Jamaican English). Shapes, orientations, and proximities of pairs of vowel distributions involved in phonological oppositions are modeled using best-fit ellipses (in F1 x F2 space) and ellipsoids (F1 x F2 x duration). Overlap fractions computed for each pair suggest that spectral and temporal features interact differently in the three varieties and oppositions. Under a two-dimensional analysis, two of three American English oppositions show no overlap; the third shows partial overlap. All Jamaican Creole oppositions exhibit complete overlap when F1 and F2 alone are modeled, but no or partial overlap with incorporation of a factor for duration. Jamaican English three-dimensional overlap fractions resemble two-dimensional results for American English. A multidimensional analysis tool such as SOAM appears to provide a more objective basis for simultaneously investigating spectral and temporal relations within vowel systems. Normalization methods and the SOAM method are described in an extended appendix.

17.
This study examined the temporal phasing of tongue and lip movements in vowel-consonant-vowel sequences where the consonant is a bilabial stop consonant /p, b/ and the vowels one of /i, a, u/; only asymmetrical vowel contexts were included in the analysis. Four subjects participated. Articulatory movements were recorded using a magnetometer system. The onset of the tongue movement from the first to the second vowel almost always occurred before the oral closure. Most of the tongue movement trajectory from the first to the second vowel took place during the oral closure for the stop. For all subjects, the onset of the tongue movement occurred earlier with respect to the onset of the lip closing movement as the tongue movement trajectory increased. The influence of consonant voicing and vowel context on interarticulator timing and tongue movement kinematics varied across subjects. Overall, the results are compatible with the hypothesis that there is a temporal window before the oral closure for the stop during which the tongue movement can start. A very early onset of the tongue movement relative to the stop closure together with an extensive movement before the closure would most likely produce an extra vowel sound before the closure.

18.
Several experiments have found that changing the intrinsic f0 of a vowel can have an effect on perceived vowel quality. It has been suggested that these shifts may occur because f0 is involved in the specification of vowel quality in the same way as the formant frequencies. Another possibility is that f0 affects vowel quality indirectly, by changing a listener's assumptions about characteristics of a speaker who is likely to have uttered the vowel. In the experiment outlined here, participants were asked to listen to vowels differing in terms of f0 and their formant frequencies and report vowel quality and the apparent speaker's gender and size on a trial-by-trial basis. The results presented here suggest that f0 affects vowel quality mainly indirectly via its effects on the apparent-speaker characteristics; however, f0 may also have some residual direct effects on vowel quality. Furthermore, the formant frequencies were also found to have significant indirect effects on vowel quality by way of their strong influence on the apparent speaker.

19.
Many studies have described and analyzed the singer's formant. A similar phenomenon produced by trained speakers led some authors to examine the speaker's ring. If we consider these phenomena as resonance effects associated with vocal tract adjustments and training, can we hypothesize that trained singers can carry over their singing formant ability into speech, also obtaining a speaker's ring? Can we find similar differences for energy distribution in continuous speech? Forty classically trained singers and forty untrained normal speakers performed an all-voiced reading task and produced a sample of a sustained spoken vowel /a/. The singers were also requested to perform a sustained sung vowel /a/ at a comfortable pitch. The reading was analyzed by the long-term average spectrum (LTAS) method. The sustained vowels were analyzed through power spectrum analysis. The data suggest that singers show more energy concentration in the singer's formant/speaker's ring region in both sung and spoken vowels. The singers' spoken vowel energy in the speaker's ring area was found to be significantly larger than that of the untrained speakers. The LTAS showed similar findings suggesting that those differences also occur in continuous speech. This finding supports the value of further research on the effect of singing training on the resonance of the speaking voice.

20.
An important speech cue is that of voice onset time (VOT), a cue for the perception of voicing and aspiration in word-initial stops. Preaspiration, an [h]-like sound between a vowel and the following stop, can be cued by voice offset time (VOffT), a cue which in most respects mirrors VOT. In Icelandic, VOffT is much more sensitive to the duration of the preceding vowel than is VOT to the duration of the following vowel. This has been explained by noting that preaspiration can only follow a phonemically short vowel. Lengthening of the vowel, either by changing its duration or by moving the spectrum towards that appropriate for a long vowel, will thus demand a longer VOffT to cue preaspiration. An experiment is reported showing that this greater effect that vowel quantity has on the perception of VOffT than on the perception of VOT cannot be explained by the effect of F1 frequency at vowel offset.
