首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
单通道语音增强算法对汉语语音可懂度影响的研究   总被引:1,自引:0,他引:1  
杨琳  张建平  颜永红 《声学学报》2010,35(2):248-253
考察了当前常用的几种单通道语音增强算法对汉语语音可懂度的影响。受不同类型噪音干扰的语音经过5种单通道语音增强算法的处理后,播放给具有正常听力水平的被试进行听辩,考察增强后语音的可懂度。实验结果表明,语音增强算法并不能改进语音的可懂度水平;通过分析具体的错误原因,发现听辩错误主要来自于音素错误,与声调关系不大;而且,同英文的辨识结果相比,一些增强算法对于中、英文可懂度影响差异显著。   相似文献   

2.
Most noise-reduction algorithms used in hearing aids apply a gain to the noisy envelopes to reduce noise interference. The present study assesses the impact of two types of speech distortion introduced by noise-suppressive gain functions: amplification distortion occurring when the amplitude of the target signal is over-estimated, and attenuation distortion occurring when the target amplitude is under-estimated. Sentences corrupted by steady noise and competing talker were processed through a noise-reduction algorithm and synthesized to contain either amplification distortion, attenuation distortion or both. The attenuation distortion was found to have a minimal effect on speech intelligibility. In fact, substantial improvements (>80 percentage points) in intelligibility, relative to noise-corrupted speech, were obtained when the processed sentences contained only attenuation distortion. When the amplification distortion was limited to be smaller than 6 dB, performance was nearly unaffected in the steady-noise conditions, but was severely degraded in the competing-talker conditions. Overall, the present data suggest that one reason that existing algorithms do not improve speech intelligibility is because they allow amplification distortions in excess of 6 dB. These distortions are shown in this study to be always associated with masker-dominated envelopes and should thus be eliminated.  相似文献   

3.
This paper presents an amplitude-based single-channel direction finding system for automotive applications and compares its performance against two different phase-based single-channel direction finding algorithms in a complex reflecting environment (parking garage) at 2400 MHz. All three direction finding algorithms utilize a multi-element receiving antenna array placed at two locations on a vehicle. The received complex electric fields at each antenna element within the receiving antenna array are used as inputs to the three direction finding algorithms, resulting in a direction of arrival estimate. The results from this research provide insightful information on the performance of various direction finding algorithms as a function of complex reflecting environment, transmitter height, receiving antenna array location, number of receiving antenna elements and pass rate acceptance criteria.  相似文献   

4.
The evaluation of intelligibility of noise reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise including babble, car, street and train at two signal-to-noise ratio levels (0 and 5 dB), and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, sub-space, statistical model based and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm improved significantly the place feature score, significantly, which is critically important for speech recognition. The algorithms which were found in previous studies to perform the best in terms of overall quality, were not the same algorithms that performed the best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.  相似文献   

5.
I.IntroductionResearchesonChinesesynthesisdisclosethatonlywhenboththesegmentalandsupraseg-melltalfeaturesofthesyntheticspeecharesimilartothoseofthellaturalone,thesyntheticspeechwillsoundintelligibleandnatural[1].Amongekistingsynthetictechniques,theapproachbasedonacousticparametersca-nadustboththesegmentalandsuprasegmentalfeaturesofsyntheticunitsfiekiblyandcanbeconsideredasthemostreasonablesynthetictechniqueintheory.However,theparameterbasedsynthesizerisoverAfependentonthedevelopmentsofparamet…  相似文献   

6.
This paper presents a fringe pattern normalization and noise-reduction algorithm. Locally the background noise is suppressed, the modulation normalized and the noise smoothed. An expression to calculate the cosine-only term is formulated. It is related to the directional derivatives of the intensity fringes. Two-dimensional Fourier series are used to calculate the parameters needed for the algorithm. Experimental work is presented using diffraction and ESPI images. The programming is relatively simple and involves mainly local convolutions. The processing time using a 2 GHz computer to normalize an image of 256 × 256 pixels is approximately one second.  相似文献   

7.
蒋斌  匡正  吴鸣  杨军 《声学学报》2012,37(6):659-666
实验研究了帧长对汉语音段反转言语可懂度的影响。实验结果表明,帧长在64 ms以下,汉语音段反转言语具有较高的可懂度;帧长在64~203 ms之间,可懂度随帧长的增加逐渐降低;帧长在203 ms以上,可懂度为0。在帧长8 ms时,汉语的声调失真导致可懂度下降。原始语音信号和音段反转言语的调制谱的分析表明,调制谱失真大小和可懂度密切相关。因此,用原始语音信号和音段反转言语的窄带包络间的归一化相关值可以衡量调制谱失真大小,基于语音的语言传输指数法计算的客观值和实验结果显著相关(r=0.876,p<0.01)。研究表明,语言可懂度与窄带包络有关,音段反转言语的可懂度和保留原始语音信号的窄带包络密切相关。  相似文献   

8.
Using a "noise-vocoder" cochlear implant simulator [Shannon et al., Science 270, 303-304 (1995)], the effect of the speed of dynamic range compression on speech intelligibility was assessed, using normal-hearing subjects. The target speech had a level 5 dB above that of the competing speech. Initially, baseline performance was measured with no compression active, using between 4 and 16 processing channels. Then, performance was measured using a fast-acting compressor and a slow-acting compressor, each operating prior to the vocoder simulation. The fast system produced significant gain variation over syllabic timescales. The slow system produced significant gain variation only over the timescale of sentences. With no compression active, about six channels were necessary to achieve 50% correct identification of words in sentences. Sixteen channels produced near-maximum performance. Slow-acting compression produced no significant degradation relative to the baseline. However, fast-acting compression consistently reduced performance relative to that for the baseline, over a wide range of performance levels. It is suggested that fast-acting compression degrades performance for two reasons: (1) because it introduces correlated fluctuations in amplitude in different frequency bands, which tends to produce perceptual fusion of the target and background sounds and (2) because it reduces amplitude modulation depth and intensity contrasts.  相似文献   

9.
This paper examines the impact of room acoustic conditions on the speech intelligibility of four languages (English, Polish, Arabic and Mandarin). Listening test scores (diagnostic rhyme tests, phonemically balanced word tests and phonemically balanced sentence tests) of the four languages were compared under four room acoustic conditions defined by their speech transmission index (STI = 0.2, 0.4, 0.6 and 0.8). The results obtained indicated that there was a statistically significant difference between the word intelligibility scores of languages under all room acoustic conditions, apart from the STI = 0.8 condition. English was the most intelligible language under all conditions, and differences with other languages were larger when conditions were poor (maximum difference of 29% at STI = 0.2, 33% at STI = 0.4 and 14% at STI = 0.6). Results also showed that Arabic and Polish were particularly sensitive to background noise, and that Mandarin was significantly more intelligible than those languages at STI = 0.4. Consonant-to-vowel ratios and languages’ distinctive features and acoustical properties explained some of the scores obtained. Sentence intelligibility scores confirmed variations between languages, but these variations were statistically significant only at the STI = 0.4 condition (sentence tests being less sensitive to very good and very poor room acoustic conditions). Overall, the results indicate that large variations between the speech intelligibility of different languages can occur, especially for spaces that are expected to be challenging in terms of room acoustic conditions. Recommendations solely based on room acoustic parameters (e.g. STI) might then prove to be insufficient for designing a multilingual environment.  相似文献   

10.
Due to the limited number of cochlear implantees speaking Mandarin Chinese, it is extremely difficult to evaluate new speech coding algorithms designed for tonal languages. Access to an intelligibility index that could reliably predict the intelligibility of vocoded (and non-vocoded) Mandarin Chinese is a viable solution to address this challenge. The speech-transmission index (STI) and coherence-based intelligibility measures, among others, have been examined extensively for predicting the intelligibility of English speech but have not been evaluated for vocoded or wideband (non-vocoded) Mandarin speech despite the perceptual differences between the two languages. The results indicated that the coherence-based measures seem to be influenced by the characteristics of the spoken language. The highest correlation (r = 0.91-0.97) was obtained in Mandarin Chinese with a weighted coherence measure that included primarily information from high-intensity voiced segments (e.g., vowels) containing F0 information, known to be important for lexical tone recognition. In contrast, in English, highest correlation was obtained with a coherence measure that included information from weak consonants and vowel/consonant transitions. A band-importance function was proposed that captured information about the amplitude envelope contour. A higher modulation rate (100 Hz) was found necessary for the STI-based measures for maximum correlation (r = 0.94-0.96) with vocoded Mandarin and English recognition.  相似文献   

11.
初敏  吕士楠 《声学学报》1996,21(S1):639-647
以基音同步叠加技术为基础,以汉语单音节为合成单元,有一包含词调模式、重音模式和句调模式的韵律规则库的汉语文语转换系统,可合成出高清晰度和高自然度的汉语语音。研究表明,影响汉语合成语音的自然度的主要因素是音高和音强随时间的变化、各音节的音长分布以及音节间的协同发音,其中以音高和音长的影响最为显著。时域基音同步叠加技术提供了一种在时域改变语音波形的音高和音长的方法,从而使在用波形拼接法合成汉语时,进行词一级和句一级的韵律调节成为可能。对新闻广播语言的声学特征的分析,为建立汉语合成的韵律调节规则提供了理论依据。本文介绍新的汉语文语转换系统的结构及流程、对广播语言韵律特征的初步研究结果、汉语合成规则及合成系统语音质量的评测结果。  相似文献   

12.
We show that the freezing phenomenon,exhibited by a specific class of two-qubit state under local nondissipative decoherent evolutions,is a common feature of the relative entropy measure of quantum coherence and correlation.All those measurement outcomes,preserve a constant value in the considered noisy channels,but the condition,property and mechanism of the freezing phenomenon for quantum coherence are different from those of the quantum correlation.  相似文献   

13.
几种人眼波前像差重建算法的对比研究   总被引:1,自引:0,他引:1  
介绍了基于Hartmann-Shack传感器的Zernike模式法人眼波前像差重建模型和包括Gram-Schmidt正交变换?Householder变换和奇异值分解在内的3种算法的具体推导过程以及对比分析结论,并利用实验测量数据进行了仿真,结果表明3种算法精度相当,奇异值分解算法是较为理想的人眼波前像差重建算法。  相似文献   

14.
目的:探索随机振动和正弦振动因素下生成语音在听觉效果上的变化规律。方法:随机振动采用频率范围2-20Hz,加速度为0.3G、0.5G、0.7G(有效值,下同),正弦振动采用频率4、6、8、10、12Hz,加速度为0.3G、0.5G;在安静及信噪比分别为0dB和-6dB三种状态下对随机振动组、正弦振组及对照组3个组的语音材料进行清晰度测试。结果:和对照组相比,随机振动组,清晰度几科没有变化,正弦振动组,0.3G时4Hz、0.5G时6Hz和8Hz作用下语音清晰度有明显降低,检验结果非常显著。研究还发现,清晰度的降低随听音环境的信噪比的降低而变得严重;结论:正弦振动对发音人发音的影响,会使通话效果变差,并且在听音环境恶劣时尤为突出。  相似文献   

15.
This study investigated acoustic-phonetic correlates of intelligibility for adult and child talkers, and whether the relative intelligibility of different talkers was dependent on listener characteristics. In experiment 1, word intelligibility was measured for 45 talkers (18 women, 15 men, 6 boys, 6 girls) from a homogeneous accent group. The material consisted of 124 words familiar to 7-year-olds that adequately covered all frequent consonant confusions; stimuli were presented to 135 adult and child listeners in low-level background noise. Seven-to-eight-year-old listeners made significantly more errors than 12-year-olds or adults, but the relative intelligibility of individual talkers was highly consistent across groups. In experiment 2, listener ratings on a number of voice dimensions were obtained for the adults talkers identified in experiment 1 as having the highest and lowest intelligibility. Intelligibility was significantly correlated with subjective dimensions reflecting articulation, voice dynamics, and general quality. Finally, in experiment 3, measures of fundamental frequency, long-term average spectrum, word duration, consonant-vowel intensity ratio, and vowel space size were obtained for all talkers. Overall, word intelligibility was significantly correlated with the total energy in the 1- to 3-kHz region and word duration; these measures predicted 61% of the variability in intelligibility. The fact that the relative intelligibility of individual talkers was remarkably consistent across listener age groups suggests that the acoustic-phonetic characteristics of a talker's utterance are the primary factor in determining talker intelligibility. Although some acoustic-phonetic correlates of intelligibility were identified, variability in the profiles of the "best" talkers suggests that high intelligibility can be achieved through a combination of different acoustic-phonetic characteristics.  相似文献   

16.
CuFe2O4 nanocrystals were synthesized by the sol–gel method (SGM) and microwave method (MM) by using sucrose as a fuel. The structural, morphological, optical and magnetic properties of the products were determined and characterized in detail by X-ray diffraction (XRD), high resolution scanning electron microscopy (HR-SEM), photoluminescence (PL) spectroscopy and vibrating sample magnetometer (VSM). The XRD results confirmed the formation of cubic phase CuFe2O4. The formation of CuFe2O4 nano and microstructures were confirmed by HR-SEM. Photoluminescence emissions were determined by PL spectra, respectively. The relatively high saturation magnetization (78.22 emu/g) of CuFe2O4-MM shows that it is ferromagnetic and low saturation magnetization (35.98 emu/g) of CuFe2O4O-SGM confirms the super paramagnetic behavior.  相似文献   

17.
Speech produced in the presence of noise-Lombard speech-is more intelligible in noise than speech produced in quiet, but the origin of this advantage is poorly understood. Some of the benefit appears to arise from auditory factors such as energetic masking release, but a role for linguistic enhancements similar to those exhibited in clear speech is possible. The current study examined the effect of Lombard speech in noise and in quiet for Spanish learners of English. Non-native listeners showed a substantial benefit of Lombard speech in noise, although not quite as large as that displayed by native listeners tested on the same task in an earlier study [Lu and Cooke (2008), J. Acoust. Soc. Am. 124, 3261-3275]. The difference between the two groups is unlikely to be due to energetic masking. However, Lombard speech was less intelligible in quiet for non-native listeners than normal speech. The relatively small difference in Lombard benefit in noise for native and non-native listeners, along with the absence of Lombard benefit in quiet, suggests that any contribution of linguistic enhancements in the Lombard benefit for natives is small.  相似文献   

18.
19.
The word recognition ability of 4 normal-hearing and 13 cochlearly hearing-impaired listeners was evaluated. Filtered and unfiltered speech in quiet and in noise were presented monaurally through headphones. The noise varied over listening situations with regard to spectrum, level, and temporal envelope. Articulation index theory was applied to predict the results. Two calculation methods were used, both based on the ANSI S3.5-1969 20-band method [S3.5-1969 (American National Standards Institute, New York)]. Method I was almost identical to the ANSI method. Method II included a level- and hearing-loss-dependent calculation of masking of stationary and on-off gated noise signals and of self-masking of speech. Method II provided the best prediction capability, and it is concluded that speech intelligibility of cochlearly hearing-impaired listeners may also, to a first approximation, be predicted from articulation index theory.  相似文献   

20.
The speech signal may be divided into frequency bands, each containing temporal properties of the envelope and fine structure. For maximal speech understanding, listeners must allocate their perceptual resources to the most informative acoustic properties. Understanding this perceptual weighting is essential for the design of assistive listening devices that need to preserve these important speech cues. This study measured the perceptual weighting of young normal-hearing listeners for the envelope and fine structure in each of three frequency bands for sentence materials. Perceptual weights were obtained under two listening contexts: (1) when each acoustic property was presented individually and (2) when multiple acoustic properties were available concurrently. The processing method was designed to vary the availability of each acoustic property independently by adding noise at different levels. Perceptual weights were determined by correlating a listener's performance with the availability of each acoustic property on a trial-by-trial basis. Results demonstrated that weights were (1) equal when acoustic properties were presented individually and (2) biased toward envelope and mid-frequency information when multiple properties were available. Results suggest a complex interaction between the available acoustic properties and the listening context in determining how best to allocate perceptual resources when listening to speech in noise.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号