
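Since 1994, national evaluations of the performance of Chinese speech synthesis systems have been held periodically. Using speech intelligibility testing, five different synthesis systems were evaluated and diagnosed in 1994. The listeners were 16 university students (8 male, 8 female) with no prior experience of synthetic speech, and their responses were open-set transcriptions. In addition, a ten-point subjective rating (MOS) was used to measure speech naturalness. To provide segment-level diagnostic information for each synthesis system, the consonant perceptual confusion matrices of the synthetic speech were analyzed. Deficiencies in each system's handling of prosodic features were examined by comparing, for natural and synthetic speech, the statistical relationships between intelligibility test scores at different linguistic levels. The results show that these methods yield stable and reasonable indices of synthesis-system performance. Methods for evaluating prosodic features still need further development.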
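The consonant confusion-matrix analysis described above can be sketched in a few lines of Python. This is an illustration only: it assumes a matrix whose rows are the presented consonants and whose columns are the listeners' responses, and the function names, consonant labels and counts are hypothetical, not data from the 1994 evaluation. The diagonal gives the articulation score for each consonant, and the largest off-diagonal cell in a row points to its dominant confusion.

```python
def consonant_scores(labels, matrix):
    """Per-consonant percent correct and most frequent wrong response.

    labels: consonant labels, same order for rows (presented) and columns (responded).
    matrix: matrix[i][j] = number of times labels[i] was heard as labels[j].
    """
    report = {}
    for i, row in enumerate(matrix):
        total = sum(row)
        # Largest off-diagonal cell = dominant confusion for this consonant
        errors = [(count, labels[j]) for j, count in enumerate(row) if j != i]
        top_count, top_label = max(errors) if errors else (0, None)
        report[labels[i]] = {
            "percent_correct": 100.0 * row[i] / total if total else 0.0,
            "main_confusion": top_label if top_count > 0 else None,
        }
    return report


def overall_score(matrix):
    """Overall articulation score: correct (diagonal) responses over all responses, in percent."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return 100.0 * correct / total


# Hypothetical 3x3 example with Mandarin initials b, p, m (20 presentations each)
labels = ["b", "p", "m"]
matrix = [
    [14, 5, 1],
    [3, 16, 1],
    [0, 1, 19],
]
print(overall_score(matrix))  # 49 of 60 responses correct
print(consonant_scores(labels, matrix)["b"])
```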

6.

Profiles of Fellows of the Acoustical Society of China

Acta Acustica (声学学报), 2001, (3)

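Zhang Jia■ (张家■): Research Professor at the Institute of Acoustics, Chinese Academy of Sciences; executive council member of the Acoustical Society of China and chair of its Speech, Hearing and Music Acoustics Branch; associate editor-in-chief of Applied Acoustics (应用声学). His work centers on speech science and speech technology. He designed the Mandarin Chinese articulation test method and established the theoretical basis of Chinese intelligibility; derived the Chinese articulation index; established the statistical relationships between articulation test scores for different speech units, and showed that Chinese syllable structure helps improve intelligibility. He measured far-field and near-field long-term average speech spectra at different speaking rates and sound levels, quantitatively demonstrated the contribution of Chinese tones to intelligibility, revealed the intrinsic pitch regularities of Chinese vowels, and experimentally studied interactions in speech production. He received the third prize of the National Natural Science Award, and the Chinese Academy of Sciences…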

7.

Effect of an acoustic dummy head on speech intelligibility measurement


Zhang Siyu, Zheng Xiaolin, Meng Zihou. Acta Acustica (声学学报), 2015, 40(6): 894-901

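Using an acoustic dummy head, the effect of different speech-source and noise-source positions in the horizontal plane on speech intelligibility measurement was investigated. Binaural STIPA measured with the dummy head was compared with conventional STIPA measured without it; subjective Chinese intelligibility tests were carried out under identical conditions using both recorded playback and live listening; and the correlation between subjective and objective intelligibility results was analyzed. The results show that source position significantly affects STIPA measured with the dummy head, as well as the perceived intelligibility of both dummy-head recordings and live listening by human subjects. STIPA measured without the dummy head is closer to the STIPA of the worse of the two ears measured with the dummy head. Monaural perceived intelligibility is higher for the ear on the same side as the speech source or on the opposite side from the noise source; binaural perceived intelligibility is lowest when the speech and noise sources coincide, and separating the sources significantly improves it. The perceived intelligibility of dummy-head recordings and of live listening is uncorrelated with STIPA measured without the dummy head but highly correlated with STIPA measured with it: monaural perceived intelligibility correlates highly with the STIPA of that ear, and binaural perceived intelligibility correlates best with the higher of the left- and right-ear STIPA values.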
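The finding that binaural perceived intelligibility correlates best with the higher of the two ears' STIPA values can be checked with a simple correlation sketch. The helper names and all numbers below are hypothetical; the sketch only shows how better-ear, worse-ear and mean-ear predictors might be compared by their Pearson correlation with subjective scores, and it does not compute STIPA itself.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ear_predictors(stipa_left, stipa_right):
    """Candidate single-number predictors built from left/right-ear STIPA values."""
    pairs = list(zip(stipa_left, stipa_right))
    return {
        "better_ear": [max(l, r) for l, r in pairs],
        "worse_ear": [min(l, r) for l, r in pairs],
        "mean": [(l + r) / 2 for l, r in pairs],
    }


# Hypothetical measurements for five source configurations
stipa_left = [0.62, 0.48, 0.55, 0.40, 0.70]
stipa_right = [0.50, 0.58, 0.45, 0.42, 0.52]
binaural_score = [88.0, 80.0, 79.0, 60.0, 93.0]  # percent correct

for name, values in ear_predictors(stipa_left, stipa_right).items():
    print(name, round(pearson_r(values, binaural_score), 3))
```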

8.

Effect of an acoustic dummy head on speech intelligibility measurement

Acta Acustica, English edition (声学学报：英文版), 2015, (6)


9.

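Experiments and theoretical calculation on the relationship among acoustic ratio, reverberation time and speech intelligibility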

Rao Yu'an. Acta Acustica (声学学报), 1981, (1)

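Previous studies of speech intelligibility in room sound fields considered only how intelligibility S varies with reverberation time T; the influence of different acoustic ratios R at the same T received little attention. This paper studies quantitatively how R affects S at a range of reverberation times. A normalized experimental curve of intelligibility S against the product TR of reverberation time and acoustic ratio was obtained; expressed through the acoustic-ratio/reverberation coefficient $K_{rR}$, it is $K_{rR} = 1.0 - 0.22\lg(TR)$, or equivalently $S\% = 98.7 - 21.7\lg(TR)$. This result shows that speech intelligibility S in a reverberant field decreases linearly with the logarithm of the product TR. Surprisingly, this experimentally obtained physical law has a form similar to the Weber-Fechner law, although it is considerably more complex.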
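As a worked example of the fitted relation $S\% = 98.7 - 21.7\lg(TR)$, the sketch below evaluates it and its normalized form. It assumes T is in seconds, and it is only meaningful inside the range of T and R covered by the original experiments, which the abstract does not state; the function names are hypothetical.

```python
import math


def articulation_percent(t_reverb_s, acoustic_ratio):
    """Empirical fit S% = 98.7 - 21.7 * lg(T * R) from the abstract.

    t_reverb_s: reverberation time T (seconds assumed here).
    acoustic_ratio: acoustic ratio R (dimensionless).
    """
    tr = t_reverb_s * acoustic_ratio
    if tr <= 0:
        raise ValueError("T * R must be positive")
    return 98.7 - 21.7 * math.log10(tr)


def k_rr(t_reverb_s, acoustic_ratio):
    """Normalized form K_rR = 1.0 - 0.22 * lg(T * R)."""
    tr = t_reverb_s * acoustic_ratio
    if tr <= 0:
        raise ValueError("T * R must be positive")
    return 1.0 - 0.22 * math.log10(tr)


# Compare the two forms for a few hypothetical (T, R) pairs
for t, r in [(0.5, 2.0), (1.0, 4.0), (2.0, 5.0)]:
    print(t, r, round(articulation_percent(t, r), 1), round(98.7 * k_rr(t, r), 1))
```

Because $98.7 \times 0.22 \approx 21.7$, the two forms agree up to rounding, $S\% \approx 98.7\,K_{rR}$. Each doubling of TR lowers the predicted score by $21.7\lg 2 \approx 6.5$ percentage points.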

10.

Experimental study of speech intelligibility with gas masks


Ding Songtao, Liu Huiren, Li Xiaoyin, Yuan Xiaohua, Zhu Xiansen. Applied Acoustics (应用声学), 2000, 19(6): 4-8

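In an outdoor environment with low background noise, the intelligibility of four types of gas mask was compared using a small closed-set Chinese (initial-consonant) intelligibility test. The results confirm that, compared with speaking without a mask, wearing a mask severely lowers intelligibility scores, which deteriorate further as the talker-listener distance increases. Taking a 75% intelligibility score as the acceptability limit for communication, the effective communication distances without a mask and with the four masks are 63.6, 15.7, 18.6, 25.0 and 26.9 m, respectively. In addition, using measurements of the sound transmission characteristics of the four masks, the paper analyzes the validity of the intelligibility test method and its results.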
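The 75% criterion turns a score-versus-distance curve into a single effective communication distance. The abstract does not say how the crossing point was located; the sketch below assumes plain linear interpolation between measured distances, and the function name and data are hypothetical.

```python
def distance_at_threshold(distances_m, scores_pct, threshold=75.0):
    """Linearly interpolate the distance at which the score falls to `threshold` percent.

    Assumes scores decrease with distance; returns None if the threshold is never crossed.
    """
    points = sorted(zip(distances_m, scores_pct))
    for (d0, s0), (d1, s1) in zip(points, points[1:]):
        if s0 >= threshold >= s1 and s0 != s1:
            return d0 + (s0 - threshold) * (d1 - d0) / (s0 - s1)
    return None


# Hypothetical measurements: distance in metres, intelligibility score in percent
distances = [5, 10, 20, 40]
scores = [95, 88, 70, 50]
print(distance_at_threshold(distances, scores))  # falls between 10 m and 20 m
```

Since speech level falls roughly with the logarithm of distance, interpolating against log distance is a reasonable alternative; the choice matters most when measurement points are far apart.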

11.

Improvement of intelligibility of ideal binary-masked noisy speech by adding background noise

Cao S, Li L, Wu X. The Journal of the Acoustical Society of America, 2011, 129(4): 2227-2236

When a target-speech/masker mixture is processed with the signal-separation technique ideal binary mask (IBM), intelligibility of target speech is remarkably improved in both normal-hearing and hearing-impaired listeners. Intelligibility of speech can also be improved by filling in speech gaps with unmodulated broadband noise. This study investigated whether intelligibility of target speech in the IBM-treated target-speech/masker mixture can be further improved by adding a broadband-noise background. The results show that, following the IBM manipulation, which remarkably released target speech from speech-spectrum noise, foreign-speech, or native-speech masking (experiment 1), adding a broadband-noise background at a signal-to-noise ratio of no less than 4 dB significantly improved intelligibility of target speech when the masker was either noise (experiment 2) or speech (experiment 3). The results suggest that, because the added noise background shallows the areas of silence in the time-frequency domain of the IBM-treated mixture, the abruptness of transient changes in the mixture is smoothed and the perceived continuity of target-speech components is enhanced, leading to improved target-speech intelligibility. The findings are useful for advancing computational auditory scene analysis, hearing-aid/cochlear-implant designs, and understanding of speech perception under "cocktail-party" conditions.
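A minimal NumPy sketch of the two operations the study combines: computing an ideal binary mask from separately available target and masker signals, and adding a broadband noise background at a chosen SNR. Assumptions: the time-frequency decomposition is a plain Hann-windowed STFT rather than the filterbank used in the study, the local criterion defaults to 0 dB, the background is white Gaussian noise, and resynthesis of the masked mixture (inverse STFT) is omitted. All function names are hypothetical.

```python
import numpy as np


def stft_mag(x, frame_len=512, hop=256):
    """Magnitude STFT using a Hann window; a trailing partial frame is dropped."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, frame_len // 2 + 1)


def ideal_binary_mask(target, masker, lc_db=0.0, frame_len=512, hop=256):
    """IBM: 1 in each time-frequency cell whose local target-to-masker ratio exceeds lc_db."""
    if len(target) != len(masker):
        raise ValueError("target and masker must have the same length")
    eps = 1e-12  # keeps log10 finite in silent cells
    t_mag = stft_mag(target, frame_len, hop)
    m_mag = stft_mag(masker, frame_len, hop)
    local_snr_db = 20.0 * np.log10((t_mag + eps) / (m_mag + eps))
    return (local_snr_db > lc_db).astype(np.uint8)


def add_noise_at_snr(signal, snr_db, rng=None):
    """Add white Gaussian noise scaled so that 10*log10(P_signal / P_noise) == snr_db."""
    signal = np.asarray(signal, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(len(signal))
    scale = np.sqrt(np.mean(signal ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return signal + scale * noise
```

With the study's lower bound in mind, a background would be added with something like `add_noise_at_snr(ibm_processed, 4.0)`, where `ibm_processed` stands for a hypothetical resynthesized IBM output.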

12.

Monaural speech segregation using synthetic speech signals

Brungart DS, Iyer N, Simpson BD. The Journal of the Acoustical Society of America, 2006, 119(4): 2327-2333

When listening to natural speech, listeners are fairly adept at using cues such as pitch, vocal tract length, prosody, and level differences to extract a target speech signal from an interfering speech masker. However, little is known about the cues that listeners might use to segregate synthetic speech signals that retain the intelligibility characteristics of speech but lack many of the features that listeners normally use to segregate competing talkers. In this experiment, intelligibility was measured in a diotic listening task that required the segregation of two simultaneously presented synthetic sentences. Three types of synthetic signals were created: (1) sine-wave speech (SWS); (2) modulated noise-band speech (MNB); and (3) modulated sine-band speech (MSB). The listeners performed worse for all three types of synthetic signals than they did with natural speech signals, particularly at low signal-to-noise ratio (SNR) values. Of the three synthetic signals, the results indicate that SWS signals preserve more of the voice characteristics used for speech segregation than MNB and MSB signals. These findings have implications for cochlear implant users, who rely on signals very similar to MNB speech and thus are likely to have difficulty understanding speech in cocktail-party listening environments.

13.

Across-ear interference from parametrically degraded synthetic speech signals in a dichotic cocktail-party listening task

Brungart DS, Simpson BD, Darwin CJ, Arbogast TL, Kidd G. The Journal of the Acoustical Society of America, 2005, 117(1): 292-304

Recent results have shown that listeners attending to the quieter of two speech signals in one ear (the target ear) are highly susceptible to interference from normal or time-reversed speech signals presented in the unattended ear. However, speech-shaped noise signals have little impact on the segregation of speech in the opposite ear. This suggests that there is a fundamental difference between the across-ear interference effects of speech and nonspeech signals. In this experiment, the intelligibility and contralateral-ear masking characteristics of three synthetic speech signals with parametrically adjustable speech-like properties were examined: (1) a modulated noise-band (MNB) speech signal composed of fixed-frequency bands of envelope-modulated noise; (2) a modulated sine-band (MSB) speech signal composed of fixed-frequency amplitude-modulated sinewaves; and (3) a "sinewave speech" signal composed of sine waves tracking the first four formants of speech. In all three cases, a systematic decrease in performance in the two-talker target-ear listening task was found as the number of bands in the contralateral speech-like masker increased. These results suggest that speech-like fluctuations in the spectral envelope of a signal play an important role in determining the amount of across-ear interference that a signal will produce in a dichotic cocktail-party listening task.
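The MNB and MSB signals described above are both channel vocoders that differ only in their carriers. The NumPy sketch below is a simplified, hypothetical implementation: bands are separated with brick-wall FFT filters, envelopes are taken with an FFT-based Hilbert transform without further low-pass smoothing, and sine carriers sit at the geometric band centre. Published implementations usually smooth the envelope and re-filter each modulated band; those steps are omitted here, and sinewave speech (formant tracking) is not covered.

```python
import numpy as np


def band_limit(x, fs, lo, hi):
    """Zero all FFT bins outside [lo, hi) Hz (brick-wall band-pass)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def envelope(x):
    """Hilbert envelope: magnitude of the FFT-based analytic signal."""
    n = len(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(np.fft.fft(x) * h))


def vocode(speech, fs, edges_hz, carrier="noise", rng=None):
    """Modulated noise-band (carrier='noise') or sine-band (carrier='sine') speech.

    edges_hz: band edges, e.g. [100, 800, 2000, 5000] gives three fixed bands.
    """
    speech = np.asarray(speech, dtype=float)
    rng = np.random.default_rng() if rng is None else rng
    t = np.arange(len(speech)) / fs
    out = np.zeros(len(speech))
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        env = envelope(band_limit(speech, fs, lo, hi))
        if carrier == "noise":
            c = band_limit(rng.standard_normal(len(speech)), fs, lo, hi)
        elif carrier == "sine":
            c = np.sin(2 * np.pi * np.sqrt(lo * hi) * t)  # geometric band centre
        else:
            raise ValueError("carrier must be 'noise' or 'sine'")
        c /= np.sqrt(np.mean(c ** 2)) + 1e-12  # unit-RMS carrier
        out += env * c
    return out
```

Increasing the number of entries in `edges_hz` is the analogue of adding bands to the masker, the parameter the experiment varied.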

14.

Quantifying the intelligibility of speech in noise for non-native talkers

van Wijngaarden SJ, Steeneken HJ, Houtgast T. The Journal of the Acoustical Society of America, 2002, 112(6): 3004-3013

The intelligibility of speech pronounced by non-native talkers is generally lower than that of speech pronounced by native talkers, especially under adverse conditions, such as high levels of background noise. The effect of foreign accent on speech intelligibility was investigated quantitatively through a series of experiments involving voices of 15 talkers, differing in language background, age of second-language (L2) acquisition and experience with the target language (Dutch). Overall speech intelligibility of L2 talkers in noise is predicted with a reasonable accuracy from accent ratings by native listeners, as well as from the self-ratings for proficiency of L2 talkers. For non-native speech, unlike native speech, the intelligibility of short messages (sentences) cannot be fully predicted by phoneme-based intelligibility tests. Although incorrect recognition of specific phonemes certainly occurs as a result of foreign accent, the effect of reduced phoneme recognition on the intelligibility of sentences may range from severe to virtually absent, depending on (for instance) the speech-to-noise ratio. Objective acoustic-phonetic analyses of accented speech were also carried out, but satisfactory overall predictions of speech intelligibility could not be obtained with relatively simple acoustic-phonetic measures.

15.

Chinese speech intelligibility at different speech sound pressure levels and signal-to-noise ratios in simulated classrooms

Peng Jianxin. Applied Acoustics, 2010, 71(4): 386-390

Speech intelligibility in classrooms can be influenced by background-noise level, speech sound pressure level (SSPL), reverberation time and signal-to-noise ratio (SNR). The relationship between SSPL and subjective Chinese Mandarin speech intelligibility, and the effect of different SNRs on Chinese Mandarin speech intelligibility in simulated classrooms, were investigated through room acoustical simulation, auralization and subjective evaluation. Chinese speech intelligibility test signals recorded in an anechoic chamber were convolved with simulated binaural room impulse responses and then reproduced over headphones at different SSPLs and SNRs. The results show that Chinese Mandarin speech intelligibility scores in the simulated classrooms increase with increasing SSPL and SNR within a certain range. For the same reverberation time, Chinese Mandarin speech intelligibility scores do not differ significantly at SNRs of 15 dBA or more.
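The auralization step, convolving anechoic test speech with a binaural room impulse response and then adding noise at a set SNR, can be sketched with NumPy as below. The function names are hypothetical; the SNR is computed from unweighted power, whereas the abstract quotes SNRs in dBA, and calibrating playback to a given SSPL needs a measured headphone sensitivity, so that step is left out.

```python
import numpy as np


def auralize(anechoic, brir_left, brir_right):
    """Convolve a mono anechoic recording with a two-channel BRIR; returns shape (2, N + L - 1)."""
    if len(brir_left) != len(brir_right):
        raise ValueError("left and right BRIRs must have the same length")
    left = np.convolve(anechoic, brir_left)
    right = np.convolve(anechoic, brir_right)
    return np.stack([left, right])


def mix_at_snr(speech_2ch, noise_2ch, snr_db):
    """Scale the noise so the (unweighted) speech-to-noise power ratio equals snr_db, then add it."""
    speech_2ch = np.asarray(speech_2ch, dtype=float)
    noise_2ch = np.asarray(noise_2ch, dtype=float)
    if speech_2ch.shape != noise_2ch.shape:
        raise ValueError("speech and noise must have the same shape")
    gain = np.sqrt(np.mean(speech_2ch ** 2) / (np.mean(noise_2ch ** 2) * 10 ** (snr_db / 10)))
    return speech_2ch + gain * noise_2ch
```

`np.convolve` is direct convolution; for BRIRs lasting a second or more, an FFT-based convolution would be much faster but gives the same result.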

16.

Feasibility of subjective speech intelligibility assessment based on auralization

Peng Jianxin. Applied Acoustics, 2005, 66(5): 591-601

Subjective speech intelligibility can be assessed by speech recorded in an anechoic chamber and then convolved with room impulse responses that can be created by acoustic simulation. The speech intelligibility (SI) assessment based on auralization was validated in three rooms. The articulation scores obtained from simulated sound field were compared with the ones from measured sound field and from direct listening in rooms. Results show that the speech intelligibility prediction based on auralization technique with simulated binaural room impulse responses (BRIRs) is in agreement with reality and results from measured BRIRs. When this technique is used with simulated and measured monaural room impulse responses (MRIRs), the predicted results underestimate the reality. It has been shown that auralization technique with simulated BRIRs is capable of assessing subjective speech intelligibility of listening positions in the room.

17.

A detailed study on the effects of noise on speech intelligibility

Dubbelboer F, Houtgast T. The Journal of the Acoustical Society of America, 2007, 122(5): 2865-2871

A wavelet representation of speech was used to display the instantaneous amplitude and phase within 14 octave frequency bands, representing the envelope and the carrier within each band. Adding stationary noise alters the wavelet pattern, which can be understood as a combination of three simultaneously occurring subeffects: two effects on the wavelet levels (one systematic and one stochastic) and one effect on the wavelet phases. Specific types of signal processing were applied to speech, which allowed each effect to be either included or excluded. The impact of each effect (and of combinations) on speech intelligibility was measured with CVC words. It appeared that the systematic level effect (i.e., the increase of each speech wavelet intensity with the mean noise intensity) has the most degrading effect on speech intelligibility, which is in accordance with measures such as the modulation transfer function and the speech transmission index. However, the introduction of stochastic level fluctuations and the disturbance of the carrier phase also contribute seriously to reduced intelligibility in noise. It is argued that these stochastic effects are responsible for the limited success of spectral subtraction as a means to improve speech intelligibility. Results can provide clues for effective noise suppression with respect to intelligibility.

18.

Speech signal modification to increase intelligibility in noisy environments

Yoo SD, Boston JR, El-Jaroudi A, Li CC, Durrant JD, Kovacyk K, Shaiman S. The Journal of the Acoustical Society of America, 2007, 122(2): 1138-1149

The role of transient speech components in speech intelligibility was investigated. Speech was decomposed into two components, quasi-steady-state (QSS) and transient, using a set of time-varying filters whose center frequencies and bandwidths were controlled to identify the strongest formant components in speech. The relative energy and intelligibility of the QSS and transient components were compared to original speech. Most of the speech energy was in the QSS component, but this component had low intelligibility. The transient component had much lower energy but was almost as intelligible as the original speech, suggesting that the transient component included speech elements important to speech perception. A modified version of speech was produced by amplifying the transient component and recombining it with the original speech. The intelligibility of the modified speech in background noise was compared to that of the original speech, using a psychoacoustic procedure based on the modified rhyme protocol. Word recognition rates for the modified speech were significantly higher at low signal-to-noise ratios (SNRs), with minimal effect on intelligibility at higher SNRs. These results suggest that amplification of transient information may improve the intelligibility of speech in noise and that this improvement is more effective in severe noise conditions.

19.

The effect of reduced vowel working space on speech intelligibility in Mandarin-speaking young adults with cerebral palsy

Liu HM, Tsao FM, Kuhl PK. The Journal of the Acoustical Society of America, 2005, 117(6): 3879-3889

The purpose of this study was to examine the effect of reduced vowel working space on dysarthric talkers' speech intelligibility using both acoustic and perceptual approaches. In experiment 1, the acoustic-perceptual relationship between vowel working space area and speech intelligibility was examined in Mandarin-speaking young adults with cerebral palsy. Subjects read aloud 18 bisyllabic words containing the vowels /i/, /a/, and /u/ at their normal speaking rate. Each talker's words were identified by three normal listeners. The percentages of correctly identified vowels and words were taken as vowel intelligibility and word intelligibility, respectively. Results revealed that talkers with cerebral palsy exhibited smaller vowel working space areas compared to ten age-matched controls. The vowel working space area was significantly correlated with vowel intelligibility (r=0.632, p<0.005) and with word intelligibility (r=0.684, p<0.005). Experiment 2 examined whether tokens of expanded vowel working spaces were perceived as better vowel exemplars and represented with greater perceptual spaces than tokens of reduced vowel working spaces. The results of the perceptual experiment support this prediction. The distorted vowels of talkers with cerebral palsy compose a smaller acoustic space that results in shrunken intervowel perceptual distances for listeners.
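The vowel working space here is the area of the /i/-/a/-/u/ triangle in the F1-F2 plane, which the shoelace formula gives directly. This sketch assumes formants in Hz (the abstract does not state the scale, and Bark or mel scales are also common); the function names and formant values are hypothetical.

```python
def polygon_area(points):
    """Shoelace formula: area of a simple polygon from its vertices in order."""
    n = len(points)
    if n < 3:
        raise ValueError("need at least three vertices")
    twice_area = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        twice_area += x0 * y1 - x1 * y0
    return abs(twice_area) / 2.0


def vowel_space_area(formants):
    """Area of the /i/-/a/-/u/ triangle in the F1-F2 plane (Hz^2).

    formants: dict mapping 'i', 'a', 'u' to (F1, F2) in Hz.
    """
    return polygon_area([formants[v] for v in ("i", "a", "u")])


# Hypothetical mean formants for one talker
talker = {"i": (300.0, 2300.0), "a": (850.0, 1200.0), "u": (350.0, 800.0)}
print(vowel_space_area(talker))  # 385000.0
```

Per-talker areas computed this way could then be correlated with vowel and word intelligibility scores, which is the kind of relationship the reported r values summarize.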

20.

The influence of spectral characteristics of early reflections on speech intelligibility

Arweiler I, Buchholz JM. The Journal of the Acoustical Society of America, 2011, 130(2): 996-1005

The auditory system takes advantage of early reflections (ERs) in a room by integrating them with the direct sound (DS) and thereby increasing the effective speech level. In the present paper the benefit from realistic ERs on speech intelligibility in diffuse speech-shaped noise was investigated for normal-hearing and hearing-impaired listeners. Monaural and binaural speech intelligibility tests were performed in a virtual auditory environment where the spectral characteristics of ERs from a simulated room could be preserved. The useful ER energy was derived from the speech intelligibility results and the efficiency of the ERs was determined as the ratio of the useful ER energy to the total ER energy. Even though ER energy contributed to speech intelligibility, DS energy was always more efficient, leading to better speech intelligibility for both groups of listeners. The efficiency loss for the ERs was mainly ascribed to their altered spectrum compared to the DS and to the filtering by the torso, head, and pinna. No binaural processing other than a binaural summation effect could be observed.
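The efficiency measure is a ratio of energies. The sketch below shows one way to obtain direct-sound and total early-reflection energy from a room impulse response; the window lengths (±2.5 ms around the direct-sound peak, early reflections up to 50 ms after it) and the function names are assumptions for illustration, and the useful ER energy itself must come from the listening tests, as in the study.

```python
import numpy as np


def direct_and_early_energy(rir, fs, ds_ms=2.5, er_end_ms=50.0):
    """Split an impulse response into direct-sound and early-reflection energy.

    Direct sound: +/- ds_ms around the largest peak.
    Early reflections: from the end of that window up to er_end_ms after the peak.
    Both window lengths are illustrative assumptions.
    """
    rir = np.asarray(rir, dtype=float)
    peak = int(np.argmax(np.abs(rir)))
    ds_half = int(round(ds_ms * 1e-3 * fs))
    er_end = peak + int(round(er_end_ms * 1e-3 * fs))
    ds = rir[max(0, peak - ds_half):peak + ds_half + 1]
    er = rir[peak + ds_half + 1:er_end + 1]
    return float(np.sum(ds ** 2)), float(np.sum(er ** 2))


def er_efficiency(useful_er_energy, total_er_energy):
    """Efficiency = useful ER energy / total ER energy."""
    if total_er_energy <= 0:
        raise ValueError("total ER energy must be positive")
    return useful_er_energy / total_er_energy
```

An efficiency below 1 means the reflections contributed less to intelligibility than an equal amount of direct-sound energy would have, which is the pattern the study reports.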