Similar literature
1.
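This paper discusses the application to speech separation of iterative Wiener filtering that incorporates characteristics of human hearing. A codebook formed by vector quantization represents the speech features of the target talker, and the degree of match between the filter output and these features is computed to simulate the attention mechanism of the human ear in the "cocktail party effect". Experimental results show that the method works well.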
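The codebook-matching idea in this abstract can be illustrated with a minimal sketch. The abstract does not say how the match is scored, so the mean nearest-codeword squared distance used here is an assumption, and the function names are hypothetical. A lower distortion is read as a better match to the target talker.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def codebook_distortion(frames, codebook):
    """Mean distance from each feature frame to its nearest codeword.

    Assumption: 'frames' are feature vectors taken from the filter output and
    'codebook' holds vector-quantization codewords trained on the target talker.
    """
    if not frames or not codebook:
        raise ValueError("frames and codebook must be non-empty")
    total = 0.0
    for frame in frames:
        # Nearest-neighbour search over the codewords.
        total += min(squared_distance(frame, word) for word in codebook)
    return total / len(frames)
```

In an iterative scheme, a score like this could be compared across iterations or across candidate filters to decide which output best resembles the target talker.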

2.
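Based on some useful properties of the modulation transfer function in auditoria and on the results of Chinese articulation tests under different signal-to-noise ratios, this paper gives a preliminary discussion of the relationship between Chinese intelligibility and the speech transmission index.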

3.
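This paper discusses the structural regularities of Chinese words and their influence on articulation scores. Experiments show that articulation scores depend not only on the physical properties of the speech signal received by the listener (external information) but also, strongly, on the structural regularities of the language itself (internal information). A new statistical relationship is proposed that fits a large body of experimental data better.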

4.
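Experimental study of the modulation function of Mandarin Chinese   Cited by: 1 (self-citations: 0; citations by others: 1)
《应用声学》 (Applied Acoustics), 1989, 8(5): 11-16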
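The speech intelligibility of an auditorium or sound reinforcement system can be evaluated objectively by measuring the modulation transfer function of the transmission path and deriving the speech transmission index from it. This paper presents the measured modulation transfer function of Mandarin Chinese, compares the speech transmission index with the rapid speech transmission index in terms of their correlation, and plots the relationship between Mandarin syllable articulation and the speech transmission index.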
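The step from modulation transfer function to speech transmission index can be sketched as follows. This is a simplified illustration, not the procedure used in the paper. It applies the standard mapping from a modulation index m to an apparent signal-to-noise ratio, clips it to ±15 dB, and rescales to the range 0 to 1. The unweighted average is an assumption made for brevity; the full index weights octave bands.

```python
import math


def transmission_index(m):
    """Map one modulation index m to a transmission index in [0, 1]."""
    if m <= 0.0:
        return 0.0  # modulation fully lost
    if m >= 1.0:
        return 1.0  # modulation fully preserved
    # Apparent signal-to-noise ratio implied by the modulation reduction.
    snr_db = 10.0 * math.log10(m / (1.0 - m))
    # Clip to the +/-15 dB range, then rescale linearly to [0, 1].
    snr_db = max(-15.0, min(15.0, snr_db))
    return (snr_db + 15.0) / 30.0


def simplified_sti(mtf_values):
    """Unweighted mean of the transmission indices (simplifying assumption)."""
    if not mtf_values:
        raise ValueError("mtf_values must be non-empty")
    indices = [transmission_index(m) for m in mtf_values]
    return sum(indices) / len(indices)
```

A modulation index of 0.5 corresponds to an apparent signal-to-noise ratio of 0 dB and therefore to a transmission index of 0.5.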

5.
Evaluation methods for Chinese speech synthesis systems   Cited by: 1 (self-citations: 0; citations by others: 1)
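Since 1994, nationwide assessments of the performance of Chinese speech synthesis systems have been held regularly. In 1994, five different synthesis systems were assessed and diagnosed using speech intelligibility tests. The listeners were 16 university students (8 male, 8 female) with no prior experience of synthetic speech, and their responses were collected as open-response listening transcriptions. Speech naturalness was also measured with a ten-point subjective rating (MOS). To provide segment-level diagnostic information for each system, the perceptual confusion matrices of the consonants in the synthetic speech were analyzed. Deficiencies in the systems' handling of prosodic features were examined by comparing, for natural and synthetic speech, the statistical relationships among intelligibility scores at different linguistic levels. The results show that these methods yield stable and reasonable indicators of synthesis system performance. Methods for evaluating prosodic features need further development.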

6.
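张家■: Research professor at the Institute of Acoustics, Chinese Academy of Sciences; executive council member of the Acoustical Society of China; chair of its Speech, Hearing and Music Acoustics branch; associate editor-in-chief of 《应用声学》 (Applied Acoustics). Main research areas: speech science and speech technology. Contributions: designed the articulation test method for Mandarin Chinese and laid the theoretical foundation of Chinese intelligibility; derived the Chinese articulation index; established the statistical relationships among articulation test scores for different linguistic units and showed that Chinese syllable structure helps improve intelligibility; measured the long-term average speech spectrum in the far field and near field at different speaking rates and sound levels; demonstrated quantitatively the contribution of Chinese tones to intelligibility; revealed the intrinsic pitch regularities of Chinese vowels and studied interactions in speech production experimentally. Awards include a third-class National Natural Science Award, Chinese Academy of Sciences…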

7.
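Using an acoustic head simulator, this study examined how different positions of the speech source and noise source in the horizontal plane affect speech intelligibility measurement. Binaural STIPA measured with the head simulator was compared with conventional STIPA measured without it. Subjective evaluations of perceived Chinese speech intelligibility were carried out under identical conditions, using both recorded playback and on-site listening, and the correlation between the subjective and objective results was analyzed. The results show that source position significantly affects STIPA measured with the head simulator, as well as the perceived intelligibility of both head-simulator recordings and live on-site listening. STIPA measured without the head simulator is closer to the STIPA of the worse of the two ears measured with the head simulator. Perceived monaural intelligibility is higher for the ear on the same side as the speech source or on the side opposite the noise source. Perceived binaural intelligibility is lowest when the speech and noise sources coincide, and separating the sources significantly improves it. Perceived intelligibility for head-simulator recordings and live listening is uncorrelated with STIPA measured without the head simulator but highly correlated with STIPA measured with it: perceived monaural intelligibility correlates highly with the STIPA of that ear, and perceived binaural intelligibility correlates most strongly with the higher of the left-ear and right-ear STIPA values.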

8.
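Using an acoustic head simulator, this study examined how different positions of the speech source and noise source in the horizontal plane affect speech intelligibility measurement. Binaural STIPA measured with the head simulator was compared with conventional STIPA measured without it. Subjective evaluations of perceived Chinese speech intelligibility were carried out under identical conditions, using both recorded playback and on-site listening, and the correlation between the subjective and objective results was analyzed. The results show that source position significantly affects STIPA measured with the head simulator, as well as the perceived intelligibility of both head-simulator recordings and live on-site listening. STIPA measured without the head simulator is closer to the STIPA of the worse of the two ears measured with the head simulator. Perceived monaural intelligibility is higher for the ear on the same side as the speech source or on the side opposite the noise source. Perceived binaural intelligibility is lowest when the speech and noise sources coincide, and separating the sources significantly improves it. Perceived intelligibility for head-simulator recordings and live listening is uncorrelated with STIPA measured without the head simulator but highly correlated with STIPA measured with it: perceived monaural intelligibility correlates highly with the STIPA of that ear, and perceived binaural intelligibility correlates most strongly with the higher of the left-ear and right-ear STIPA values.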

9.
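Previous studies of speech articulation in room sound fields have considered only how articulation S varies with reverberation time T; the influence of different acoustic ratios R at the same T has received little attention. This paper quantitatively studies the effect of R on S for various values of T. It obtains a normalized experimental curve of articulation S against the product of reverberation time T and acoustic ratio R, expressed through the acoustic ratio-reverberation coefficient K_{rR} as K_{rR} = 1.0 - 0.22 lg(TR), or equivalently S% = 98.7 - 21.7 lg(TR). This result shows that speech articulation S in a reverberant field decreases linearly with the logarithm of the product TR. Surprisingly, the form of this experimentally obtained physical law resembles the Weber-Fechner law, although it is considerably more complex.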
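The two fitted expressions in this abstract can be evaluated directly. In the sketch below, lg is read as the base-10 logarithm. The function names are hypothetical, and treating T as seconds and R as a dimensionless ratio is an assumption; the abstract does not state units or the range over which the fit holds.

```python
import math


def reverberation_coefficient(t, r):
    """K_rR = 1.0 - 0.22 * lg(T * R), as quoted in the abstract."""
    product = t * r
    if product <= 0.0:
        raise ValueError("T * R must be positive")
    return 1.0 - 0.22 * math.log10(product)


def articulation_percent(t, r):
    """S% = 98.7 - 21.7 * lg(T * R), as quoted in the abstract."""
    product = t * r
    if product <= 0.0:
        raise ValueError("T * R must be positive")
    return 98.7 - 21.7 * math.log10(product)
```

The two forms agree up to rounding, since 98.7 × 0.22 ≈ 21.7: the articulation percentage is 98.7 times the coefficient K_rR.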

10.
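In a field environment with low background noise, a small closed-set Chinese (initial consonant) articulation test was used to compare the articulation levels of four gas masks. The results confirm that, compared with wearing no mask, wearing a mask severely lowers the articulation score, which deteriorates further as the communication distance increases. Taking an articulation score of 75% as the acceptable limit of communication performance, the effective communication distances without a mask and with each of the four masks are 63.6, 15.7, 18.6, 25.0, and 26.9 m, respectively. In addition, drawing on measurements of the sound transmission characteristics of the four masks, the paper analyzes the soundness of the articulation test method and its results.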

11.
When a target-speech/masker mixture is processed with the ideal binary mask (IBM) signal-separation technique, intelligibility of target speech is remarkably improved for both normal-hearing and hearing-impaired listeners. Intelligibility of speech can also be improved by filling in speech gaps with unmodulated broadband noise. This study investigated whether intelligibility of target speech in the IBM-treated target-speech/masker mixture can be further improved by adding a broadband-noise background. The results show that following the IBM manipulation, which remarkably released target speech from speech-spectrum noise, foreign-speech, or native-speech masking (experiment 1), adding a broadband-noise background with a signal-to-noise ratio of no less than 4 dB significantly improved intelligibility of target speech when the masker was either noise (experiment 2) or speech (experiment 3). The results suggest that, because adding the noise background makes the areas of silence in the time-frequency domain of the IBM-treated mixture shallower, abrupt transient changes in the mixture are smoothed and the perceived continuity of target-speech components is enhanced, leading to improved target-speech intelligibility. The findings are useful for advancing computational auditory scene analysis, hearing-aid/cochlear-implant designs, and understanding of speech perception under "cocktail-party" conditions.
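The ideal binary mask named in this abstract can be sketched in a few lines. The sketch assumes that the target and masker energies are known separately for each time-frequency unit, which is what makes the mask "ideal". The 0 dB local criterion and the function names are assumptions; the abstract does not give the criterion used in the study.

```python
import math


def ideal_binary_mask(target_energy, masker_energy, lc_db=0.0):
    """Return a 0/1 mask: 1 where the local SNR exceeds lc_db, else 0.

    Both inputs are lists of rows (time frames) of per-band energies.
    """
    mask = []
    for target_row, masker_row in zip(target_energy, masker_energy):
        row = []
        for t, m in zip(target_row, masker_row):
            if t <= 0.0:
                keep = False  # no target energy: discard the unit
            elif m <= 0.0:
                keep = True   # target present, masker absent
            else:
                keep = 10.0 * math.log10(t / m) > lc_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask


def apply_mask(mixture, mask):
    """Zero out the time-frequency units of the mixture that the mask rejects."""
    return [[x * k for x, k in zip(x_row, k_row)]
            for x_row, k_row in zip(mixture, mask)]
```

The rejected units become silent gaps in the time-frequency plane. These are the "areas of silence" that the study partly fills with a broadband-noise background.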

12.
When listening to natural speech, listeners are fairly adept at using cues such as pitch, vocal tract length, prosody, and level differences to extract a target speech signal from an interfering speech masker. However, little is known about the cues that listeners might use to segregate synthetic speech signals that retain the intelligibility characteristics of speech but lack many of the features that listeners normally use to segregate competing talkers. In this experiment, intelligibility was measured in a diotic listening task that required the segregation of two simultaneously presented synthetic sentences. Three types of synthetic signals were created: (1) sine-wave speech (SWS); (2) modulated noise-band speech (MNB); and (3) modulated sine-band speech (MSB). The listeners performed worse for all three types of synthetic signals than they did with natural speech signals, particularly at low signal-to-noise ratio (SNR) values. Of the three synthetic signals, the results indicate that SWS signals preserve more of the voice characteristics used for speech segregation than MNB and MSB signals. These findings have implications for cochlear implant users, who rely on signals very similar to MNB speech and thus are likely to have difficulty understanding speech in cocktail-party listening environments.

13.
Recent results have shown that listeners attending to the quieter of two speech signals in one ear (the target ear) are highly susceptible to interference from normal or time-reversed speech signals presented in the unattended ear. However, speech-shaped noise signals have little impact on the segregation of speech in the opposite ear. This suggests that there is a fundamental difference between the across-ear interference effects of speech and nonspeech signals. In this experiment, the intelligibility and contralateral-ear masking characteristics of three synthetic speech signals with parametrically adjustable speech-like properties were examined: (1) a modulated noise-band (MNB) speech signal composed of fixed-frequency bands of envelope-modulated noise; (2) a modulated sine-band (MSB) speech signal composed of fixed-frequency amplitude-modulated sinewaves; and (3) a "sinewave speech" signal composed of sine waves tracking the first four formants of speech. In all three cases, a systematic decrease in performance in the two-talker target-ear listening task was found as the number of bands in the contralateral speech-like masker increased. These results suggest that speech-like fluctuations in the spectral envelope of a signal play an important role in determining the amount of across-ear interference that a signal will produce in a dichotic cocktail-party listening task.

14.
The intelligibility of speech pronounced by non-native talkers is generally lower than that of speech pronounced by native talkers, especially under adverse conditions such as high levels of background noise. The effect of foreign accent on speech intelligibility was investigated quantitatively through a series of experiments involving the voices of 15 talkers who differed in language background, age of second-language (L2) acquisition, and experience with the target language (Dutch). Overall speech intelligibility of L2 talkers in noise is predicted with reasonable accuracy from accent ratings by native listeners, as well as from the L2 talkers' self-ratings of proficiency. For non-native speech, unlike native speech, the intelligibility of short messages (sentences) cannot be fully predicted by phoneme-based intelligibility tests. Although incorrect recognition of specific phonemes certainly occurs as a result of foreign accent, the effect of reduced phoneme recognition on the intelligibility of sentences may range from severe to virtually absent, depending on (for instance) the speech-to-noise ratio. Objective acoustic-phonetic analyses of accented speech were also carried out, but satisfactory overall predictions of speech intelligibility could not be obtained with relatively simple acoustic-phonetic measures.

15.
Speech intelligibility in classrooms can be influenced by background-noise level, speech sound pressure level (SSPL), reverberation time, and signal-to-noise ratio (SNR). The relationship between SSPL and subjective Chinese Mandarin speech intelligibility, and the effect of different SNRs on Chinese Mandarin speech intelligibility in simulated classrooms, were investigated through room acoustical simulation, auralisation, and subjective evaluation. Chinese speech intelligibility test signals recorded in an anechoic chamber were convolved with simulated binaural room impulse responses and then reproduced through headphones at different SSPLs and SNRs. The results show that Chinese Mandarin speech intelligibility scores increase with increasing SSPL and SNR within a certain range in simulated classrooms. Chinese Mandarin speech intelligibility scores show no significant difference among SNRs of no less than 15 dBA under the same reverberation time condition.

16.
Subjective speech intelligibility can be assessed using speech recorded in an anechoic chamber and then convolved with room impulse responses created by acoustic simulation. The speech intelligibility (SI) assessment based on auralization was validated in three rooms. The articulation scores obtained from the simulated sound field were compared with those from the measured sound field and from direct listening in the rooms. Results show that speech intelligibility prediction based on auralization with simulated binaural room impulse responses (BRIRs) agrees with reality and with results from measured BRIRs. When the technique is used with simulated and measured monaural room impulse responses (MRIRs), the predicted results underestimate reality. It has been shown that auralization with simulated BRIRs is capable of assessing the subjective speech intelligibility of listening positions in a room.
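The auralization step described here, convolving anechoic speech with a binaural room impulse response, can be illustrated with a direct-form convolution. This is a teaching sketch only: the function names are hypothetical, and a practical implementation would normally use FFT-based convolution for impulse responses of realistic length.

```python
def convolve(signal, impulse_response):
    """Full discrete convolution of two sample sequences (direct form)."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            # Each input sample excites a scaled, delayed copy of the response.
            out[i + j] += s * h
    return out


def auralize_binaural(dry_speech, brir_left, brir_right):
    """Return (left, right) ear signals for one source-listener position."""
    return convolve(dry_speech, brir_left), convolve(dry_speech, brir_right)
```

A monaural impulse response would give a single output channel instead of the left and right pair, which is the distinction between MRIRs and BRIRs in the abstract.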

17.
A wavelet representation of speech was used to display the instantaneous amplitude and phase within 14 octave frequency bands, representing the envelope and the carrier within each band. Adding stationary noise alters the wavelet pattern, which can be understood as a combination of three simultaneously occurring subeffects: two effects on the wavelet levels (one systematic and one stochastic) and one effect on the wavelet phases. Specific types of signal processing were applied to speech, which allowed each effect to be either included or excluded. The impact of each effect (and of combinations) on speech intelligibility was measured with CVC words. The systematic level effect (i.e., the increase of each speech wavelet intensity with the mean noise intensity) was found to have the most degrading effect on speech intelligibility, which is in accordance with measures such as the modulation transfer function and the speech transmission index. However, the introduction of stochastic level fluctuations and the disturbance of the carrier phase also contribute seriously to reduced intelligibility in noise. It is argued that these stochastic effects are responsible for the limited success of spectral subtraction as a means to improve speech intelligibility. The results can provide clues for effective noise suppression with respect to intelligibility.

18.
The role of transient speech components in speech intelligibility was investigated. Speech was decomposed into two components, quasi-steady-state (QSS) and transient, using a set of time-varying filters whose center frequencies and bandwidths were controlled to identify the strongest formant components in speech. The relative energy and intelligibility of the QSS and transient components were compared to those of the original speech. Most of the speech energy was in the QSS component, but this component had low intelligibility. The transient component had much lower energy but was almost as intelligible as the original speech, suggesting that the transient component included speech elements important to speech perception. A modified version of speech was produced by amplifying the transient component and recombining it with the original speech. The intelligibility of the modified speech in background noise was compared to that of the original speech, using a psychoacoustic procedure based on the modified rhyme protocol. Word recognition rates for the modified speech were significantly higher at low signal-to-noise ratios (SNRs), with minimal effect on intelligibility at higher SNRs. These results suggest that amplification of transient information may improve the intelligibility of speech in noise and that this improvement is more effective in severe noise conditions.

19.
The purpose of this study was to examine the effect of reduced vowel working space on dysarthric talkers' speech intelligibility using both acoustic and perceptual approaches. In experiment 1, the acoustic-perceptual relationship between vowel working space area and speech intelligibility was examined in Mandarin-speaking young adults with cerebral palsy. Subjects read aloud 18 bisyllabic words containing the vowels /i/, /a/, and /u/ at their normal speaking rate. Each talker's words were identified by three normal listeners. The percentages of correct vowel and word identification were calculated as vowel intelligibility and word intelligibility, respectively. Results revealed that talkers with cerebral palsy exhibited smaller vowel working space areas than ten age-matched controls. The vowel working space area was significantly correlated with vowel intelligibility (r=0.632, p<0.005) and with word intelligibility (r=0.684, p<0.005). Experiment 2 examined whether tokens of expanded vowel working spaces were perceived as better vowel exemplars and represented with greater perceptual spaces than tokens of reduced vowel working spaces. The results of the perceptual experiment support this prediction. The distorted vowels of talkers with cerebral palsy compose a smaller acoustic space that results in shrunken intervowel perceptual distances for listeners.
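A vowel working space area of the kind discussed here is commonly computed as the area of the triangle spanned by /i/, /a/, and /u/ in the F1-F2 plane. The abstract does not state the exact procedure, so the triangle (shoelace) formula below is an assumption, and the function name is hypothetical.

```python
def vowel_space_area(vowel_i, vowel_a, vowel_u):
    """Triangle area from three (F1, F2) points in Hz; the result is in Hz^2."""
    (x1, y1), (x2, y2), (x3, y3) = vowel_i, vowel_a, vowel_u
    # Shoelace formula for the area of a triangle given its vertices.
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0
```

A talker whose three corner vowels lie closer together in formant space yields a smaller area, which is the sense in which the working space is "reduced".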

20.
The auditory system takes advantage of early reflections (ERs) in a room by integrating them with the direct sound (DS) and thereby increasing the effective speech level. In the present paper the benefit from realistic ERs on speech intelligibility in diffuse speech-shaped noise was investigated for normal-hearing and hearing-impaired listeners. Monaural and binaural speech intelligibility tests were performed in a virtual auditory environment where the spectral characteristics of ERs from a simulated room could be preserved. The useful ER energy was derived from the speech intelligibility results and the efficiency of the ERs was determined as the ratio of the useful ER energy to the total ER energy. Even though ER energy contributed to speech intelligibility, DS energy was always more efficient, leading to better speech intelligibility for both groups of listeners. The efficiency loss for the ERs was mainly ascribed to their altered spectrum compared to the DS and to the filtering by the torso, head, and pinna. No binaural processing other than a binaural summation effect could be observed.
