Similar Articles — 20 results found
1.
Effects of single-channel speech enhancement algorithms on the intelligibility of Mandarin speech
Yang Lin, Zhang Jianping, Yan Yonghong. 声学学报 (Acta Acustica), 2010, 35(2): 248-253
This study examined the effect of several widely used single-channel speech enhancement algorithms on the intelligibility of Mandarin speech. Speech corrupted by different types of noise was processed by five single-channel enhancement algorithms and then played to normal-hearing listeners for identification, to assess the intelligibility of the enhanced speech. The results show that the enhancement algorithms did not improve intelligibility. An analysis of the specific errors revealed that recognition errors arose mainly at the phoneme level and had little to do with lexical tone. Moreover, compared with identification results reported for English, some enhancement algorithms affected Mandarin and English intelligibility in significantly different ways.

2.
Quantifying the intelligibility of speech in noise for non-native listeners
When listening to languages learned at a later age, speech intelligibility is generally lower than when listening to one's native language. The main purpose of this study is to quantify speech intelligibility in noise for specific populations of non-native listeners, only broadly addressing the underlying perceptual and linguistic processing. An easy method is sought to extend these quantitative findings to other listener populations. Dutch subjects listening to German and English speech, with proficiency in these languages ranging from reasonable to excellent, were found to require a 1-7 dB better speech-to-noise ratio to obtain 50% sentence intelligibility than native listeners. Also, the psychometric function for sentence recognition in noise was found to be shallower for non-native than for native listeners (worst-case slope around the 50% point of 7.5%/dB, compared to 12.6%/dB for native listeners). Differences between native and non-native speech intelligibility are largely predicted by linguistic entropy estimates as derived from a letter guessing task. Less effective use of context effects (especially semantic redundancy) explains the reduced speech intelligibility for non-native listeners. While measuring speech intelligibility for many different populations of listeners (languages, linguistic experience) may be prohibitively time consuming, obtaining predictions of non-native intelligibility from linguistic entropy may help to extend the results of this study to other listener populations.
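The shallower non-native psychometric function can be made concrete with a small sketch. The logistic form and the example SNR/SRT values below are illustrative assumptions; the abstract reports only the slope figures (7.5 vs. 12.6 %/dB at the 50% point) and the 1-7 dB SRT shift, not a fitted model.

```python
import math

def psychometric(snr_db, srt_db, slope_pct_per_db):
    """Logistic psychometric function for sentence recognition in noise.
    `srt_db` is the 50% point (SRT); `slope_pct_per_db` is the slope at
    that point in %/dB, the convention used in the abstract."""
    # A logistic 100/(1 + exp(-k*x)) has midpoint slope 25*k %/dB,
    # so k = 4 * slope / 100 yields the requested slope at the SRT.
    k = 4.0 * slope_pct_per_db / 100.0
    return 100.0 / (1.0 + math.exp(-k * (snr_db - srt_db)))

# Illustrative comparison at one SNR: a native listener (SRT -5 dB,
# 12.6 %/dB) vs. a non-native listener with a 4 dB worse SRT and 7.5 %/dB.
native = psychometric(-5.0, srt_db=-5.0, slope_pct_per_db=12.6)
non_native = psychometric(-5.0, srt_db=-1.0, slope_pct_per_db=7.5)
```

At the same SNR the non-native curve sits well below 50%, which is the practical meaning of a rightward-shifted, shallower function.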

3.
Internal noise generated by hearing-aid circuits can be audible and objectionable to aid users, and may lead to the rejection of hearing aids. Two expansion algorithms were developed to suppress internal noise below a threshold level. The multiple-channel algorithm's expansion thresholds followed the 55-dB SPL long-term average speech spectrum, while the single-channel algorithm suppressed sounds below 45 dBA. With the recommended settings in static conditions, the single-channel algorithm provided lower noise levels, which were perceived as quieter by most normal-hearing participants. However, in dynamic conditions "pumping" noises were more noticeable with the single-channel algorithm. For impaired-hearing listeners fitted with the ADRO amplification strategy, both algorithms maintained speech understanding for words in sentences presented at 55 dB SPL in quiet (99.3% correct). Mean sentence reception thresholds in quiet were 39.4, 40.7, and 41.8 dB SPL without noise suppression, and with the single- and multiple-channel algorithms, respectively. The increase in the sentence reception threshold was statistically significant for the multiple-channel algorithm, but not the single-channel algorithm. Thus, both algorithms suppressed noise without affecting the intelligibility of speech presented at 55 dB SPL, with the single-channel algorithm providing marginally greater noise suppression in static conditions, and the multiple-channel algorithm avoiding pumping noises.
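The downward-expansion idea behind both algorithms can be sketched as a static gain rule. The 45 dBA threshold is from the abstract; the 2:1 expansion ratio is a hypothetical choice for illustration, since the abstract specifies thresholds but not ratios.

```python
def expansion_gain_db(input_level_db, threshold_db, ratio=2.0):
    """Downward expansion as used to suppress internal hearing-aid noise:
    inputs at or above the expansion threshold pass with 0 dB gain; below
    it, output level falls by `ratio` dB per 1 dB drop in input level.
    The 2:1 ratio is a hypothetical setting, not from the study."""
    if input_level_db >= threshold_db:
        return 0.0
    # output = threshold + ratio * (input - threshold), so
    # gain = output - input = (ratio - 1) * (input - threshold) < 0 below threshold
    return (ratio - 1.0) * (input_level_db - threshold_db)
```

With the single-channel setting (threshold 45 dBA), a 35 dBA circuit-noise floor would receive an extra 10 dB of attenuation, while speech at 55 dB SPL passes unchanged.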

4.
Most noise-reduction algorithms used in hearing aids apply a gain to the noisy envelopes to reduce noise interference. The present study assesses the impact of two types of speech distortion introduced by noise-suppressive gain functions: amplification distortion occurring when the amplitude of the target signal is over-estimated, and attenuation distortion occurring when the target amplitude is under-estimated. Sentences corrupted by steady noise and competing talker were processed through a noise-reduction algorithm and synthesized to contain either amplification distortion, attenuation distortion or both. The attenuation distortion was found to have a minimal effect on speech intelligibility. In fact, substantial improvements (>80 percentage points) in intelligibility, relative to noise-corrupted speech, were obtained when the processed sentences contained only attenuation distortion. When the amplification distortion was limited to be smaller than 6 dB, performance was nearly unaffected in the steady-noise conditions, but was severely degraded in the competing-talker conditions. Overall, the present data suggest that one reason that existing algorithms do not improve speech intelligibility is because they allow amplification distortions in excess of 6 dB. These distortions are shown in this study to be always associated with masker-dominated envelopes and should thus be eliminated.
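The abstract's taxonomy of distortions can be sketched as a per-bin classifier over spectral amplitudes. The +6 dB boundary follows the abstract; the function name and the toy amplitude values are illustrative.

```python
import math

def distortion_regions(clean_mag, enhanced_mag, limit_db=6.0):
    """Label each spectral amplitude of an enhanced signal relative to the
    clean target: attenuation distortion (target under-estimated),
    amplification distortion within the limit, or amplification distortion
    beyond the limit that the study identifies as harmful."""
    labels = []
    for c, e in zip(clean_mag, enhanced_mag):
        diff_db = 20.0 * math.log10(e / c)  # enhanced vs. clean, in dB
        if diff_db <= 0.0:
            labels.append("attenuation")
        elif diff_db <= limit_db:
            labels.append("amplification<=limit")
        else:
            labels.append("amplification>limit")
    return labels
```

The study's suggestion amounts to constraining a gain function so that no bin ever lands in the third category.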

5.
The evaluation of the intelligibility of noise-reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise (babble, car, street, and train) at two signal-to-noise ratios (0 and 5 dB), and then processed by eight speech enhancement methods spanning four classes of algorithms: spectral subtractive, subspace, statistical-model-based, and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm significantly improved the place feature score, which is critically important for speech recognition. The algorithms found in previous studies to perform best in terms of overall quality were not the same algorithms that performed best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that for noise-reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
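Of the four algorithm classes evaluated, spectral subtraction is the simplest to sketch: subtract an estimated noise magnitude spectrum from each windowed frame, floor the result, and resynthesize with the noisy phase. This is a minimal illustration of the class, not any specific algorithm from the study.

```python
import numpy as np

def spectral_subtraction(noisy, noise_mag, frame=256, hop=128, floor=0.01):
    """Minimal magnitude spectral subtraction with overlap-add resynthesis.
    `noise_mag` is an estimate of the noise magnitude spectrum (length
    frame//2 + 1); `floor` is a spectral floor that prevents negative
    magnitudes and limits musical-noise artifacts."""
    win = np.hanning(frame)
    out = np.zeros(len(noisy))
    for start in range(0, len(noisy) - frame + 1, hop):
        seg = noisy[start:start + frame] * win
        spec = np.fft.rfft(seg)
        mag = np.abs(spec)
        clean_mag = np.maximum(mag - noise_mag, floor * mag)  # subtract + floor
        # Resynthesize with the noisy phase and overlap-add the frames.
        out[start:start + frame] += np.fft.irfft(
            clean_mag * np.exp(1j * np.angle(spec)), n=frame)
    return out
```

With a zero noise estimate the scheme reduces to windowed analysis-resynthesis, which is a convenient sanity check.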

6.
The interlanguage speech intelligibility benefit
This study investigated how native language background influences the intelligibility of speech by non-native talkers for non-native listeners from either the same or a different native language background as the talker. Native talkers of Chinese (n = 2), Korean (n = 2), and English (n = 1) were recorded reading simple English sentences. Native listeners of English (n = 21), Chinese (n = 21), Korean (n = 10), and a mixed group from various native language backgrounds (n = 12) then performed a sentence recognition task with the recordings from the five talkers. Results showed that for native English listeners, the native English talker was most intelligible. However, for non-native listeners, speech from a relatively high proficiency non-native talker from the same native language background was as intelligible as speech from a native talker, giving rise to the "matched interlanguage speech intelligibility benefit." Furthermore, this interlanguage intelligibility benefit extended to the situation where the non-native talker and listeners came from different language backgrounds, giving rise to the "mismatched interlanguage speech intelligibility benefit." These findings shed light on the nature of the talker-listener interaction during speech communication.

7.
In this paper, two speech enhancement algorithms (SEAs) based on the spectral subtraction (SS) principle are evaluated for bilateral cochlear implant (BCI) users. Specifically, a dual-channel noise power spectral estimation algorithm using the power spectral densities (PSD) and cross power spectral density (CPSD) of the observed signals was studied. The enhanced speech signals were obtained using either the Dual-Channel Non-Linear Spectral Subtraction (DC-NLSS) or the Dual-Channel Multi-Band Spectral Subtraction (DC-MBSS) algorithm. For performance evaluation, objective speech assessment tests relying on the Perceptual Evaluation of Speech Quality (PESQ) score and the Itakura-Saito (IS) distortion measure were performed to fix the optimal number of frequency bands needed in the DC-MBSS algorithm. To evaluate speech intelligibility, subjective listening tests were conducted with 50 normal-hearing listeners using a specific BCI simulator and with three deafened BCI patients. Experimental results, obtained using the French Lafon database corrupted by additive babble noise at different signal-to-noise ratios (SNR), showed that the DC-MBSS algorithm improves speech understanding better than the DC-NLSS algorithm for single and multiple interfering noise sources.

8.
Speech intelligibility in classrooms directly affects the learning efficiency of students, especially students using a second language. The speech intelligibility value is determined by many factors, such as speech level, signal-to-noise ratio, and reverberation time in the room. This paper investigates the contributions of these factors with subjective tests, particularly speech level, which is required for designing the optimal gain of sound amplification systems in classrooms. The test material was generated by mixing the convolution of the English Coordinate Response Measure corpus with room impulse responses, plus background noise. The subjects were all Chinese students who use English as a second language. It is found that speech intelligibility first increases and then decreases as speech level rises, and that the optimal English speech level in classrooms for Chinese listeners is about 71 dBA when the signal-to-noise ratio and the reverberation time are held constant. Finally, a regression equation is proposed to predict speech intelligibility from speech level, signal-to-noise ratio, and reverberation time.
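A regression of the form the paper proposes can be sketched as follows. The abstract gives only the three predictors and the 71 dBA optimum, so the quadratic level term is centred on 71 dBA and every coefficient below is hypothetical; a quadratic term is the simplest way to reproduce the reported rise-then-fall with level.

```python
def predict_intelligibility(level_dba, snr_db, rt_s):
    """Illustrative intelligibility regression (%): quadratic in speech
    level (peak fixed at the reported 71 dBA optimum), increasing in SNR,
    decreasing in reverberation time. All coefficients are made up."""
    return 80.0 - 0.05 * (level_dba - 71.0) ** 2 + 2.0 * snr_db - 10.0 * rt_s
```

The sketch captures the qualitative findings: intelligibility peaks at 71 dBA, improves with SNR, and degrades with longer reverberation.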

9.
Previous research has shown that familiarity with a talker's voice can improve linguistic processing (herein, "Familiar Talker Advantage"), but this benefit is constrained by the context in which the talker's voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers' voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correct for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers' voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers.

10.
In a follow-up study to that of Bent and Bradlow (2003), carrier sentences containing familiar keywords were read aloud by five talkers (Korean high proficiency; Korean low proficiency; Saudi Arabian high proficiency; Saudi Arabian low proficiency; native English). The intelligibility of these keywords to 50 listeners in four first language groups (Korean, n = 10; Saudi Arabian, n = 10; native English, n = 10; other mixed first languages, n = 20) was measured in a word recognition test. In each case, the non-native listeners found the non-native low-proficiency talkers who did not share their first language the least intelligible, at statistically significant levels, while not finding the low-proficiency talker who shared their own first language similarly unintelligible. These findings indicate a mismatched interlanguage speech intelligibility detriment for low-proficiency non-native speakers and a potential intelligibility problem between mismatched first language low-proficiency speakers unfamiliar with each other's accents in English. There was no strong evidence of an intelligibility benefit for the high-proficiency non-native talkers among listeners from a different first language background, nor any indication that the native talkers were more intelligible than the high-proficiency non-native talkers to any of the listeners.

11.
Studies of speech perception in various types of background noise have shown that noise with linguistic content affects listeners differently than nonlinguistic noise [e.g., Simpson, S. A., and Cooke, M. (2005). "Consonant identification in N-talker babble is a nonmonotonic function of N," J. Acoust. Soc. Am. 118, 2775-2778; Sperry, J. L., Wiley, T. L., and Chial, M. R. (1997). "Word recognition performance in various background competitors," J. Am. Acad. Audiol. 8, 71-80] but few studies of multi-talker babble have employed background babble in languages other than the target speech language. To determine whether the adverse effect of background speech is due to the linguistic content or to the acoustic characteristics of the speech masker, this study assessed speech-in-noise recognition when the language of the background noise was either the same or different from the language of the target speech. Replicating previous findings, results showed poorer English sentence recognition by native English listeners in six-talker babble than in two-talker babble, regardless of the language of the babble. In addition, our results showed that in two-talker babble, native English listeners were more adversely affected by English babble than by Mandarin Chinese babble. These findings demonstrate informational masking on sentence-in-noise recognition in the form of "linguistic interference." Whether this interference is at the lexical, sublexical, and/or prosodic levels of linguistic structure and whether it is modulated by the phonetic similarity between the target and noise languages remains to be determined.

12.
Existing objective speech-intelligibility measures are suitable for several types of degradation; however, they are less appropriate when noisy speech has been processed by a time-frequency weighting. To this end, an extensive evaluation is presented of objective measures for intelligibility prediction of noisy speech processed by a technique called ideal time-frequency (TF) segregation. In total, 17 measures are evaluated, including four advanced speech-intelligibility measures (CSII, CSTI, NSEC, DAU), the advanced speech-quality measure PESQ, and several frame-based measures (e.g., SSNR). Several additional measures are also proposed. The study comprised 168 different TF weightings, including unprocessed noisy speech. Of all the measures, the proposed frame-based measure MCC gave the best results (ρ = 0.93). An additional experiment shows that the measures that performed well in this study also correlate highly with the intelligibility of single-channel noise-reduced speech.
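A generic frame-based intrusive measure in this spirit can be sketched as an average per-frame spectral correlation between clean and degraded signals. This is an illustrative stand-in, not the paper's exact MCC definition, which is not reproduced in the abstract.

```python
import numpy as np

def frame_correlation_measure(clean, degraded, frame=256, hop=128):
    """Frame-based intrusive intelligibility sketch: average the Pearson
    correlation between clean and degraded magnitude spectra over frames.
    Returns a score near 1 for undistorted speech, lower for distortion."""
    n = min(len(clean), len(degraded))
    scores = []
    for start in range(0, n - frame + 1, hop):
        c = np.abs(np.fft.rfft(clean[start:start + frame]))
        d = np.abs(np.fft.rfft(degraded[start:start + frame]))
        c = c - c.mean()
        d = d - d.mean()
        denom = np.linalg.norm(c) * np.linalg.norm(d)
        if denom > 0.0:
            scores.append(float(c @ d) / denom)
    return float(np.mean(scores)) if scores else 0.0
```

Per-frame scoring is what distinguishes this family from global measures: a TF weighting that preserves most frames but destroys a few is penalized proportionally.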

13.
The goal of cross-language voice conversion is to preserve the speech characteristics of one speaker when that speaker's speech is translated and used to synthesize speech in another language. In this paper, two preliminary studies are reported: a statistical analysis of spectrum differences between languages, and a first attempt at cross-language voice conversion. Speech uttered by a bilingual speaker is analyzed to examine spectrum differences between English and Japanese. The experimental results are: (1) the codebook size for mixed English and Japanese speech should be almost twice the codebook size of either language alone; (2) although many code vectors occur in both English and Japanese, some tend to predominate in one language or the other; (3) code vectors that predominantly occur in English are contained in the phonemes /r/, /ae/, /f/, /s/, and code vectors that predominantly occur in Japanese are contained in /i/, /u/, /N/; and (4) judging from listening tests, listeners cannot reliably distinguish between English speech decoded by a Japanese codebook and English speech decoded by an English codebook. A voice conversion algorithm based on codebook mapping was applied to cross-language voice conversion, and its performance was somewhat less effective than for voice conversion within the same language.

14.
Due to the limited number of cochlear implantees speaking Mandarin Chinese, it is extremely difficult to evaluate new speech coding algorithms designed for tonal languages. Access to an intelligibility index that could reliably predict the intelligibility of vocoded (and non-vocoded) Mandarin Chinese is a viable solution to address this challenge. The speech-transmission index (STI) and coherence-based intelligibility measures, among others, have been examined extensively for predicting the intelligibility of English speech but have not been evaluated for vocoded or wideband (non-vocoded) Mandarin speech despite the perceptual differences between the two languages. The results indicated that the coherence-based measures seem to be influenced by the characteristics of the spoken language. The highest correlation (r = 0.91-0.97) was obtained in Mandarin Chinese with a weighted coherence measure that included primarily information from high-intensity voiced segments (e.g., vowels) containing F0 information, known to be important for lexical tone recognition. In contrast, in English, highest correlation was obtained with a coherence measure that included information from weak consonants and vowel/consonant transitions. A band-importance function was proposed that captured information about the amplitude envelope contour. A higher modulation rate (100 Hz) was found necessary for the STI-based measures for maximum correlation (r = 0.94-0.96) with vocoded Mandarin and English recognition.

15.
This article reports on the performance of an adaptive subband noise cancellation scheme, which performs binaural preprocessing of speech signals for a hearing-aid application. The multi-microphone subband adaptive (MMSBA) signal processing scheme uses the least mean squares (LMS) algorithm in frequency-limited subbands. The use of subbands enables a diverse processing mechanism to be employed, splitting the two-channel wide-band signal into smaller frequency-limited subbands, which can be processed according to their individual signal characteristics. The frequency delimiting used a linear- or cochlear-spaced subband distribution. The effect of the processing scheme on speech intelligibility was assessed in a trial involving 15 hearing-impaired volunteers with moderate sensorineural hearing loss. The acoustic material consisted of speech and speech-shaped noise signals, generated using simulated and real-room acoustic environments, at signal-to-noise ratios (SNRs) in the range -6 to +3 dB. The results show that the MMSBA scheme delivered average speech intelligibility improvements of 11.5%, with a maximum of 37.25%, in noisy reverberant conditions. There was no significant reduction in mean speech intelligibility due to processing, in any of the test conditions.
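The LMS update at the core of the MMSBA scheme can be sketched in wide-band form; the subband version runs this same adaptation independently in each band, so each band's step size and convergence can match its own signal statistics. The toy two-channel setup below is illustrative, not the trial configuration.

```python
import numpy as np

def lms_canceller(primary, reference, taps=16, mu=0.01):
    """Plain LMS adaptive noise canceller: adapt an FIR filter so the
    filtered reference channel predicts the noise in the primary channel,
    then subtract the prediction. The error signal is the enhanced output."""
    w = np.zeros(taps)
    out = np.zeros(len(primary))
    for n in range(taps, len(primary)):
        x = reference[n - taps:n][::-1]   # most recent reference samples
        noise_est = w @ x
        e = primary[n] - noise_est        # error = enhanced output sample
        w += mu * e * x                   # LMS weight update
        out[n] = e
    return out

# Toy demo: the primary channel contains only a delayed, scaled copy of the
# reference noise, so a converged canceller should drive the output to zero.
rng = np.random.default_rng(0)
noise = rng.standard_normal(6000)
primary = 0.5 * np.concatenate([np.zeros(3), noise[:-3]])
residual = lms_canceller(primary, noise)
```

In a real binaural setup the primary channel also carries target speech, which the canceller leaves in the error signal because it is uncorrelated with the filtered reference.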

16.
Recovering a clean speech signal from a noisy one has long been a central problem in signal processing. In recent years, researchers have proposed single-channel speech enhancement algorithms based on dictionary learning and sparse representation. These algorithms exploit the sparsity of speech in the time-frequency domain: a dictionary is built by learning the structural features and regularities of training samples, and the noisy speech signal is then projected onto it to estimate the clean signal. To handle mismatch between training samples and test data, supervised non-negative matrix factorization methods have been combined with traditional statistical-model-based speech enhancement, updating the speech and noise dictionaries during the enhancement stage to estimate the clean speech signal. This paper first introduces the signal model for single-channel speech enhancement, then reviews four representative enhancement methods, and finally discusses likely future research directions.
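The supervised NMF scheme the survey describes can be sketched as follows, assuming a Euclidean multiplicative-update NMF and a Wiener-style mask built from the speech part of the reconstruction. Dictionary sizes and data are illustrative; real systems train the dictionaries on magnitude spectrograms of clean speech and noise.

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0):
    """Multiplicative-update NMF (Euclidean objective): V ≈ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + 1e-9
    H = rng.random((rank, V.shape[1])) + 1e-9
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

def supervised_nmf_enhance(noisy_mag, W_speech, W_noise, iters=200):
    """Supervised NMF enhancement sketch: keep pre-trained speech and noise
    dictionaries fixed, learn only the activations on the noisy magnitude
    spectrogram, then mask the noisy spectrogram with the speech share of
    the reconstruction (a Wiener-style soft mask)."""
    W = np.hstack([W_speech, W_noise])
    rng = np.random.default_rng(1)
    H = rng.random((W.shape[1], noisy_mag.shape[1])) + 1e-9
    for _ in range(iters):
        H *= (W.T @ noisy_mag) / (W.T @ W @ H + 1e-9)
    speech_part = W_speech @ H[:W_speech.shape[1]]
    return noisy_mag * speech_part / (W @ H + 1e-9)
```

Because the mask is the speech fraction of a non-negative sum, its values lie in [0, 1], so the estimate never exceeds the noisy magnitudes.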

17.
Previous work has shown that the intelligibility of speech in noise is degraded if the speaker and listener differ in accent, in particular when there is a disparity between native (L1) and nonnative (L2) accents. This study investigated how this talker-listener interaction is modulated by L2 experience and accent similarity. L1 Southern British English, L1 French listeners with varying L2 English experience, and French-English bilinguals were tested on the recognition of English sentences mixed in speech-shaped noise that was spoken with a range of accents (French, Korean, Northern Irish, and Southern British English). The results demonstrated clear interactions of accent and experience, with the least experienced French speakers being most accurate with French-accented English, but more experienced listeners being most accurate with L1 Southern British English accents. An acoustic similarity metric was applied to the speech productions of the talkers and the listeners, and significant correlations were obtained between accent similarity and sentence intelligibility for pairs of individuals. Overall, the results suggest that L2 experience affects talker-listener accent interactions, altering both the intelligibility of different accents and the selectivity of accent processing.

18.
Recent research results show that combined electric and acoustic stimulation (EAS) significantly improves speech recognition in noise, and it is generally established that access to the improved F0 representation of target speech, along with the glimpse cues, provide the EAS benefits. Under noisy listening conditions, noise signals degrade these important cues by introducing undesired temporal-frequency components and corrupting harmonics structure. In this study, the potential of combining noise reduction and harmonics regeneration techniques was investigated to further improve speech intelligibility in noise by providing improved beneficial cues for EAS. Three hypotheses were tested: (1) noise reduction methods can improve speech intelligibility in noise for EAS; (2) harmonics regeneration after noise reduction can further improve speech intelligibility in noise for EAS; and (3) harmonics sideband constraints in the frequency domain (or equivalently, amplitude modulation in the temporal domain), even deterministic ones, can provide additional benefits. Test results demonstrate that combining noise reduction and harmonics regeneration can significantly improve speech recognition in noise for EAS, and it is also beneficial to preserve the harmonics sidebands under adverse listening conditions. This finding warrants further work into the development of algorithms that regenerate harmonics and the related sidebands for EAS processing under noisy conditions.

19.
Lu Cheng, Tian Meng, Zhou Jian, Wang Huabin, Tao Liang. 声学学报 (Acta Acustica), 2017, 42(3): 377-384
To capture the inter-frame correlation of speech signals and represent speech features with fewer basis vectors, a convolutive non-negative matrix factorization method with an L1/2 sparsity constraint is proposed for single-channel speech enhancement. First, noise bases are obtained by learning on noise data. Then, using the noise bases as prior information, the speech basis components of the noisy speech are learned with the L1/2-constrained convolutive NMF. Finally, the clean speech signal is reconstructed from the learned speech bases and activation coefficients. Experimental results in different noise environments show that the proposed method outperforms both the convolutive NMF method with an L1 sparsity constraint and traditional statistical speech enhancement methods.

20.
Intonation perception of English speech was examined for English- and Chinese-native listeners. F0 contour was manipulated from falling to rising patterns for the final words of three sentences. Listener's task was to identify and discriminate the intonation of each sentence (question versus statement). English and Chinese listeners had significant differences in the identification functions such as the categorical boundary and the slope. In the discrimination functions, Chinese listeners showed greater peakedness than English peers. The cross-linguistic differences in intonation perception were similar to the previous findings in perception of lexical tones, likely due to listeners' language background differences.

