Similar Articles
20 similar articles found.
1.
Frequency response characteristics were selected for 14 hearing-impaired ears using six procedures. Three procedures were based on most-comfortable-level (MCL) measurements with speech bands of three bandwidths (1/3 octave, 1 octave, and 1 2/3 octaves). The other procedures were based on hearing thresholds, pure-tone MCLs, and pure-tone loudness discomfort levels (LDLs). The procedures were evaluated by speech discrimination testing with nonsense syllables in noise, and by paired-comparison judgments of the intelligibility and pleasantness of running speech. Speech discrimination testing showed significant differences between pairs of responses for only seven test ears. Nasals and glides were most affected by frequency response variations. Both intelligibility and pleasantness judgments showed significant differences for all test ears. Intelligibility in noise was less affected by frequency response differences than intelligibility in quiet or pleasantness in quiet or in noise. For some ears, the ranking of responses depended on whether intelligibility or pleasantness was being judged and on whether the speech was in quiet or in noise. Overall, the three speech-band MCL procedures were far superior to the others. The studies therefore strongly support the frequency response selection rationale of amplifying all frequency bands of speech to MCL, while also highlighting some of the complications involved in achieving this aim.

2.
3.
During the acoustical design of spaces such as auditoria or open-plan offices, it is important to know how well speech can be perceived in various parts of the room. Several objective methods have been developed to measure and predict speech intelligibility, and they have been used extensively in many kinds of spaces. In this study, two such methods were compared: the speech transmission index (STI) and the speech intelligibility index (SII). The room acoustics speech transmission index (RASTI), a simplified form of the STI, was also considered. All of these quantities are based on determining an apparent speech-to-noise ratio in selected frequency bands and summing the results with a specific weighting. Data were therefore needed on how the results of these methods may differ because of the calculation scheme and the measuring equipment, and on how accurately they predict intelligibility. Measurements were made in a laboratory with adjustable noise level and absorption, and in a real auditorium. The measurement equipment, especially the choice of loudspeaker, was found to greatly affect the accuracy of the results. The prediction accuracy of the RASTI was found to be acceptable provided the input values for the prediction are accurately known, even though the studied space was not ideally diffuse.
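The band-wise scheme these indices share (an apparent SNR per band, then a weighted sum) can be sketched as below. This is a simplified illustration under stated assumptions, not an implementation of IEC 60268-16 or ANSI S3.5: it assumes per-band modulation transfer values m are already measured, follows the STI convention of converting m to an apparent SNR of 10·log10(m/(1 − m)), clipping to ±15 dB and mapping linearly onto 0 to 1, and uses placeholder equal band weights rather than the standardized ones. The full STI also averages over several modulation frequencies per band, which is omitted here.

```python
import math

def apparent_snr_db(m: float) -> float:
    """Convert a modulation transfer value m (0 < m < 1) into an apparent SNR in dB."""
    m = min(max(m, 1e-6), 1.0 - 1e-6)  # avoid log of 0 and division by 0
    return 10.0 * math.log10(m / (1.0 - m))

def transmission_index(snr_db: float) -> float:
    """Clip the apparent SNR to [-15, +15] dB and map it linearly onto [0, 1]."""
    clipped = max(-15.0, min(15.0, snr_db))
    return (clipped + 15.0) / 30.0

def weighted_index(band_m_values, band_weights) -> float:
    """Weighted mean of per-band transmission indices (the weights here are placeholders)."""
    if len(band_m_values) != len(band_weights):
        raise ValueError("need exactly one weight per band")
    total = sum(band_weights)
    return sum(w * transmission_index(apparent_snr_db(m))
               for m, w in zip(band_m_values, band_weights)) / total

# Seven octave bands (125 Hz to 8 kHz) with hypothetical m values and equal weights.
example_m = [0.5, 0.6, 0.7, 0.7, 0.6, 0.5, 0.4]
print(round(weighted_index(example_m, [1.0] * 7), 3))
```

Swapping in standardized band weights (and, for the SII, band audibility instead of modulation transfer) changes the numbers but not the overall structure.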

4.
This paper reports an evaluation of ratings of the sound insulation of simulated walls in terms of the intelligibility of speech transmitted through the walls. Subjects listened to speech modified to simulate transmission through 20 different walls with a wide range of sound insulation ratings, in constant ambient noise. The subjects' mean speech intelligibility scores were compared with various physical measures to test how well those measures work as sound insulation ratings. The standard Sound Transmission Class (STC) and Weighted Sound Reduction Index (Rw) ratings were only moderately successful predictors of intelligibility scores, and eliminating the 8 dB rule from STC led to only very modest improvements. Various previously established speech intelligibility measures (e.g., the Articulation Index or Speech Intelligibility Index) and measures derived from them, such as the Articulation Class, were all relatively strongly related to speech intelligibility scores. In general, measures involving arithmetic averages or summations of decibel values over the frequency bands important for speech were most strongly related to intelligibility scores. The two most accurate predictors of the intelligibility of transmitted speech were an arithmetic average transmission loss over the frequencies from 200 Hz to 2.5 kHz and Rw with an added new spectrum weighting term covering frequencies from 400 Hz to 2.5 kHz.
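The best-performing simple predictor is an arithmetic (not energy) average of transmission loss in decibels. A minimal sketch, assuming one-third-octave bands with nominal centre frequencies from 200 Hz to 2.5 kHz (the abstract does not state the band resolution); the wall data in the usage lines are made up for illustration:

```python
# Nominal one-third-octave centre frequencies from 200 Hz to 2.5 kHz (assumed resolution).
BANDS_200_TO_2500_HZ = (200, 250, 315, 400, 500, 630, 800, 1000, 1250, 1600, 2000, 2500)

def average_transmission_loss(tl_db_by_band: dict, bands=BANDS_200_TO_2500_HZ) -> float:
    """Arithmetic mean of transmission-loss values (dB) over the given bands.

    The decibel values are averaged directly; they are not converted to energy first.
    Bands outside `bands` are ignored.
    """
    missing = [f for f in bands if f not in tl_db_by_band]
    if missing:
        raise KeyError(f"no transmission-loss value for bands: {missing}")
    return sum(tl_db_by_band[f] for f in bands) / len(bands)

# Hypothetical wall whose transmission loss rises by 2 dB per one-third-octave band.
wall = {f: 30.0 + 2.0 * i for i, f in enumerate(BANDS_200_TO_2500_HZ)}
print(average_transmission_loss(wall))
```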

5.
The intelligibility of syllables whose cepstral trajectories had been temporally filtered was measured. The speech signals were transformed into LPC cepstral coefficients, and the coefficient trajectories were passed through different filters. The filtered trajectories were recombined with the residuals, and the speech signal was reconstructed. The intelligibility of the reconstructed speech segments was then measured in two perceptual experiments with Japanese syllables. The effects of various low-pass, high-pass, and bandpass filters are reported, and the results are summarized using a theoretical approach based on the independence of the contributions of different modulation bands. Overall, the results suggest that speech intelligibility is not severely impaired as long as the filtered spectral components have rates of change between 1 and 16 Hz.
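The key idea is to treat each cepstral coefficient as a signal sampled at the frame rate and filter it over time. A sketch of the filtering step only, assuming a hypothetical frame rate of 100 frames/s and an ideal brick-wall bandpass applied in the DFT domain; the filters actually used in the study are not specified here, and LPC analysis and resynthesis are not shown. With lo_hz above 0 the filter also removes each trajectory's mean, i.e., the long-term spectral envelope.

```python
import cmath
import math

def bandpass_trajectory(traj, frame_rate_hz, lo_hz=1.0, hi_hz=16.0):
    """Keep only modulation frequencies in [lo_hz, hi_hz] of one coefficient trajectory.

    Uses an ideal (brick-wall) filter on a naive O(N^2) DFT to stay self-contained.
    """
    n = len(traj)
    spectrum = [sum(traj[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        # Bins k and n - k hold the same |frequency|; treat them symmetrically.
        freq_hz = min(k, n - k) * frame_rate_hz / n
        if not (lo_hz <= freq_hz <= hi_hz):
            spectrum[k] = 0j
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]

def filter_cepstra(frames, frame_rate_hz, lo_hz=1.0, hi_hz=16.0):
    """Filter every coefficient trajectory of a (frames x coefficients) cepstrogram."""
    n_coef = len(frames[0])
    cols = [bandpass_trajectory([f[c] for f in frames], frame_rate_hz, lo_hz, hi_hz)
            for c in range(n_coef)]
    return [[cols[c][t] for c in range(n_coef)] for t in range(len(frames))]
```

For long utterances an FFT library would replace the naive DFT; the brick-wall mask is chosen only because it makes the passband unambiguous.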

6.
Synthesis (carrier) signals in acoustic models embody assumptions about the perception of auditory electric stimulation. This study compared the intelligibility of consonants and vowels processed through a set of nine acoustic models that used Spectral Peak (SPEAK) and Advanced Combination Encoder (ACE)-like speech processing, with synthesis signals representative of those used previously in acoustic models as well as two new ones. Performance of the synthesis signals was determined in terms of correspondence with cochlear implant (CI) listener results for 12 attributes of phoneme perception (consonant and vowel recognition; F1, F2, and duration information transmission for vowels; voicing, manner, place of articulation, affrication, burst, nasality, and amplitude envelope information transmission for consonants) using four measures of performance. Modulated synthesis signals produced the best correspondence with CI consonant intelligibility, while sinusoids, narrow noise bands, and varying noise bands produced the best correspondence with CI vowel intelligibility. The signals that performed best overall (in terms of correspondence with both vowel and consonant attributes) were modulated and unmodulated noise bands of varying bandwidth, corresponding to an excitation width that varied linearly from 0.4 mm at the apical channel to 8 mm at the basal channel.
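The linearly varying excitation width of the best-performing noise bands can be made concrete with a small helper. A sketch, assuming channels are indexed from apical to basal and that the width varies linearly with channel index; the channel count is a hypothetical parameter, and converting millimetres to a noise bandwidth in hertz would additionally require a cochlear place-to-frequency map such as Greenwood's function (not shown).

```python
def excitation_widths(n_channels: int, apical_mm: float = 0.4, basal_mm: float = 8.0):
    """Per-channel excitation widths (mm), varying linearly from apical to basal."""
    if n_channels < 2:
        raise ValueError("need at least two channels to interpolate")
    step = (basal_mm - apical_mm) / (n_channels - 1)
    return [apical_mm + i * step for i in range(n_channels)]

# Hypothetical 20-channel map.
print([round(w, 2) for w in excitation_widths(20)])
```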

7.
A wavelet representation of speech was used to display the instantaneous amplitude and phase within 14 octave frequency bands, representing the envelope and the carrier within each band. Adding stationary noise alters the wavelet pattern, and this change can be understood as a combination of three simultaneous subeffects: two effects on the wavelet levels (one systematic and one stochastic) and one effect on the wavelet phases. Specific types of signal processing were applied to speech so that each effect could be either included or excluded. The impact of each effect (and of combinations of effects) on speech intelligibility was measured with CVC syllables. The systematic level effect (i.e., the increase of each speech wavelet intensity by the mean noise intensity) had the most degrading effect on speech intelligibility, in accordance with measures such as the modulation transfer function and the speech transmission index. However, the introduction of stochastic level fluctuations and the disturbance of the carrier phase also contribute seriously to reduced intelligibility in noise. It is argued that these stochastic effects are responsible for the limited success of spectral subtraction as a means of improving speech intelligibility. The results may provide clues for noise suppression that is effective with respect to intelligibility.
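The systematic level effect is the mechanism behind the modulation transfer function: stationary noise raises the mean of a band's intensity envelope without adding to its fluctuation, so the modulation depth shrinks. For an envelope I_s(1 + m cos ωt) plus a constant noise intensity I_n, the sum is (I_s + I_n)(1 + m' cos ωt) with m' = m · I_s / (I_s + I_n) = m / (1 + 10^(−SNR/10)). A small sketch of this relation; it models only the systematic effect, not the stochastic level and phase effects described above.

```python
def reduced_modulation_depth(m: float, snr_db: float) -> float:
    """Modulation depth remaining after adding stationary noise at the given band SNR (dB)."""
    noise_to_speech = 10.0 ** (-snr_db / 10.0)  # I_n / I_s
    return m / (1.0 + noise_to_speech)

def envelope_with_noise(speech_env, noise_intensity):
    """Add a constant noise intensity to an intensity envelope (systematic level effect only)."""
    return [i + noise_intensity for i in speech_env]

# A fully modulated envelope (m = 1) at 0 dB SNR keeps half its modulation depth.
print(reduced_modulation_depth(1.0, 0.0))
```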

8.
The normalized covariance measure (NCM) has previously been shown to reliably predict the intelligibility of noise-suppressed speech containing nonlinear distortions. This study analyzes a simplified NCM that requires only a small number of bands (not necessarily contiguous) and uses simple binary (1 or 0) weighting functions. The rationale for using a small number of bands is that the spectral information in contiguous or nearby bands is correlated and therefore redundant. The modified NCM was evaluated against speech intelligibility scores obtained from normal-hearing listeners in 72 noisy conditions involving noise-suppressed speech corrupted by four types of masker (car, babble, train, and street interference). A high correlation (r = 0.8) was obtained with the modified NCM even when only one band was used. Further analysis revealed a masker-specific pattern of correlations in the single-band case; bands with low correlation indicated envelopes that had been severely distorted by the noise-suppression algorithm and/or the masker. The correlation improved to r = 0.84 when only two disjoint bands (centered at 325 and 1874 Hz) were used, and further to r = 0.85 when three or four lower-frequency (<700 Hz) bands were selected.
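The simplified NCM can be sketched as follows, assuming the clean and processed band envelopes have already been extracted (one list of samples per band). Per band, the normalized covariance ρ is turned into an apparent SNR of 10·log10(ρ²/(1 − ρ²)), clipped to ±15 dB and scaled to 0 to 1; binary weights then reduce the final weighted average to a plain mean over the selected bands. Envelope extraction and band filtering are not shown, and the band indices used below are hypothetical.

```python
import math

def normalized_covariance(x, y):
    """Normalized covariance (Pearson correlation) between two band envelopes."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0.0 or vy == 0.0:
        return 0.0  # a flat envelope carries no modulation information
    return cov / math.sqrt(vx * vy)

def band_transmission_index(rho):
    """Map rho to an apparent SNR, clip it to [-15, 15] dB, and scale it to [0, 1]."""
    r2 = min(rho * rho, 1.0 - 1e-12)  # keep the ratio finite when rho = +/-1
    snr_db = 10.0 * math.log10(r2 / (1.0 - r2)) if r2 > 0.0 else -15.0
    snr_db = max(-15.0, min(15.0, snr_db))
    return (snr_db + 15.0) / 30.0

def ncm_binary(clean_envs, processed_envs, selected_bands):
    """NCM with binary weights: weight 1 for the selected bands, 0 for all others."""
    tis = [band_transmission_index(normalized_covariance(clean_envs[j], processed_envs[j]))
           for j in selected_bands]
    return sum(tis) / len(tis)
```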

9.
A number of objective methods are currently used to quantify speech intelligibility in built environments, including the speech transmission index (STI), the rapid speech transmission index (RASTI), the articulation index (AI), and the percentage articulation loss of consonants (%ALCons). Some software programs can quickly evaluate STI, RASTI, and %ALCons from a measured room impulse response. In this project, two impulse-response-based software packages (WinMLS and SIA-Smaart Acoustic Tools) were evaluated for their ability to determine intelligibility accurately. In four different spaces with background noise levels below NC 45, speech intelligibility was measured by three methods: (1) with WinMLS 2000; (2) with SIA-Smaart Acoustic Tools (v4.0.2); and (3) with listening tests using human subjects. WinMLS measurements of speech intelligibility based on STI, RASTI, and %ALCons corresponded well with performance in the listening tests. SIA-Smaart results were correlated with human responses but tended to underpredict intelligibility based on STI and RASTI and to overpredict intelligibility based on %ALCons.

10.
Sound attenuation in air due to climatic conditions is often assumed to be constant and/or negligible in the electroacoustic design of voice alarm (VA) systems. However, variations in air attenuation can be significant in large underground spaces, particularly at the mid to high frequencies that are most relevant to speech intelligibility. This investigation evaluates and quantifies the impact of variability in the most influential climatic parameters, air temperature and relative humidity, on the performance of VA systems in underground stations. Computer simulations were used to predict the effect of varying these parameters on key performance metrics. The results showed a significant increase in reverberation time parameters with both temperature and humidity at frequencies critical for speech intelligibility. Consequently, the values of speech-intelligibility-related metrics decreased with rising temperature and humidity. The study thus shows how ignoring temperature and humidity effects can lead to calculation errors in the design of VA systems. These errors could cause over-specification of the absorption required of surface materials and inaccurate prediction of acoustic and speech-intelligibility-related parameters.

11.
This paper reports the results of an investigation that used the modified rhyme test (MRT) to measure the segmental intelligibility of synthetic speech generated automatically by rule. Synthetic speech produced by ten text-to-speech systems was studied and compared with natural speech. A variation of the standard MRT was also used to study the effects of response set size on perceptual confusions. The segmental intelligibility scores formed a continuum. Several systems performed at levels close to or equal to those obtained with natural speech; others performed substantially worse. The overall performance of the best system, DECtalk (Paul), was equivalent to that of natural speech for consonants in syllable-initial position. The findings are discussed in terms of the use of a set of standardized procedures for measuring the intelligibility of synthetic speech under controlled laboratory conditions. Recent work on the perception of synthetic speech under more severe conditions, in which greater demands are made on the listener's processing resources, is also considered. The wide range of intelligibility scores obtained in the present study demonstrates important differences in perception and suggests that not all synthetic speech is perceptually equivalent for the listener.

12.
We proposed and evaluated a method for estimating the results of forced-selection speech intelligibility tests. The method takes into account the forced-selection format of the Diagnostic Rhyme Test (DRT), in which the listener must choose between two rhyming words. A distance measure is calculated between the test word and each of the two candidate words, and the distances are compared to select the more likely word. We compared two distance measures. The first was based on the Articulation Index Band Correlation (ABC): the correlation of time–frequency (T–F) patterns between the test word and the template speech of each word in the candidate pair. The word with the higher correlation was taken as the likely candidate. The T–F pattern was calculated in the Articulation Index (AI) bands, and the correlation was calculated between corresponding bands of the test and candidate word samples. To estimate intelligibility, we calculated the ratio of the number of bands in which the correct word showed the higher correlation to the total number of bands (named ABC-est). This ratio quantifies how well the test word matches the correct word in the pair. The second distance measure was based on the frequency-weighted segmental SNR (fwSNRseg). Segmental SNR (SNRseg) was calculated in AI bands and compared among the candidate word templates. We then calculated the frequency-weighted ratio of the number of bands in which the correct word showed the higher SNRseg to the total number of bands (named fwSNRseg-est), again to quantify how well the test word matches the selected candidate word in the pair. A logistic mapping function from each of these two ratios to intelligibility scores was estimated using speech mixed with known noise, and the mapping functions were then used to estimate the intelligibility of speech mixed with unknown noise.
This estimation was compared with a measure we had evaluated previously, the conventional fwSNRseg, which maps the measure directly to intelligibility. Both proposed measures were significantly more accurate than the conventional fwSNRseg. In most cases the two proposed measures, ABC-est and fwSNRseg-est, were comparably accurate. fwSNRseg-est achieved a correlation between subjective and estimated intelligibility as high as 0.97 and a root-mean-square error as low as 0.11 for one test set, but was less accurate for the other sets, whereas ABC-est showed more stable accuracy across all sets. Both measures nonetheless showed practical accuracy in all conditions tested. It should therefore be possible to “screen” intelligibility in many of the noise conditions to be tested and reduce the scale of the subjective tests needed.
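ABC-est reduces to a band-by-band vote followed by a logistic mapping. A minimal sketch, assuming each word's T–F pattern has already been computed as one time sequence of energies per AI band; the slope and offset of the logistic function are hypothetical placeholders for values that would be fitted on speech mixed with known noise. fwSNRseg-est would replace the correlation with segmental SNR and weight the vote by band importance (not shown).

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0.0 or vy == 0.0 else cov / math.sqrt(vx * vy)

def abc_est(test_bands, correct_bands, foil_bands):
    """Fraction of AI bands whose pattern correlates better with the correct word than the foil."""
    wins = sum(1 for t, c, f in zip(test_bands, correct_bands, foil_bands)
               if pearson(t, c) > pearson(t, f))
    return wins / len(test_bands)

def logistic_map(ratio, slope, offset):
    """Map a ratio in [0, 1] to an intelligibility estimate (slope/offset are fitted offline)."""
    return 1.0 / (1.0 + math.exp(-(slope * ratio + offset)))
```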

13.
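A nonlinear frequency-lowering method based on a sinusoidal model is proposed to improve high-frequency hearing for hearing-impaired patients. The amplitude, frequency, and phase obtained by decomposing speech with the sinusoidal model are the three main parameters processed by the algorithm. To avoid spectral distortion, the speech spectrum is divided into six octave-wide parts. The part closest to and below the patient's threshold (cutoff) frequency receives amplitude amplification only. The frequency bands above the patient's threshold frequency are compressed and shifted into the patient's audible range according to their contribution to speech intelligibility, and their phase information is replaced by the phase of the nearest corresponding low-frequency component. In this study, 10 subjects took part in speech intelligibility tests. The results showed that, after training, the patients' average intelligibility improved by at least 45%. Future work should increase the number of subjects and include a more detailed analysis of each patient's hearing loss, in order to design a more reasonable and more refined frequency-lowering hearing-aid algorithm.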

14.
Principal-component amplitude compression for the hearing impaired   (times cited: 1; self-citations: 0; citations by others: 1)
Principal-component amplitude compression, a means of matching speech to the reduced dynamic range in sensorineural hearing impairment, is a multiband approach aimed at preserving details of spectral shape while reducing overall level variation. The effect of compression has been studied for the first and second principal components (PC1 and PC2) of the short-term speech spectrum, which roughly represent overall level and spectral tilt, respectively. Compression of PC1 roughly equalizes consonant and vowel levels, while compression of PC2 provides time-varying high-frequency emphasis. The effect on the speech intelligibility of sensorineural hearing-impaired listeners of two principal-component compression implementations, compression of PC1 and compression of both PC1 and PC2, was compared with that of linear amplification (LA), independent compression of multiple bands (MBC), and wideband compression (WC). The results indicate that compression of overall level, as provided by compression of PC1 and by WC, improved intelligibility relative to LA over a 10- to 15-dB range of input levels. While MBC was beneficial in some cases, it did not provide higher intelligibility than WC. Compression of PC2 did not help but rather degraded performance relative to LA. Error analyses and band-level measurements indicate that the highest intelligibility is obtained when audibility is improved and the relative spectral shapes of different speech sounds are preserved.
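Why compressing overall level preserves spectral shape while independent multiband compression does not can be seen in a toy calculation. As a simplifying assumption, PC1 is treated here as the equal-weight direction across bands (the frame's mean band level), which the abstract describes only as "roughly representative of overall level"; a real system would project onto an eigenvector estimated from speech spectra. The reference level and compression ratio are hypothetical.

```python
def compress_overall_level(levels_db, ref_db=70.0, ratio=2.0):
    """Compress the mean band level toward ref_db while keeping band-to-band differences."""
    mean = sum(levels_db) / len(levels_db)
    compressed_mean = ref_db + (mean - ref_db) / ratio
    return [compressed_mean + (lv - mean) for lv in levels_db]

def compress_each_band(levels_db, ref_db=70.0, ratio=2.0):
    """Independent multiband compression: every band level is compressed on its own."""
    return [ref_db + (lv - ref_db) / ratio for lv in levels_db]

frame = [80.0, 90.0, 100.0]  # hypothetical band levels for one loud frame
print(compress_overall_level(frame))  # → [70.0, 80.0, 90.0]: 10 dB steps between bands kept
print(compress_each_band(frame))      # → [75.0, 80.0, 85.0]: steps shrink to 5 dB
```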

15.
To improve speech transmission performance in a dome space, the acoustical properties of a dome 20 m in diameter were examined. Acoustical properties measured at evenly distributed points on the floor of the dome were evaluated both objectively and subjectively, and the interrelationship between the objective and subjective measures was also examined. On the basis of the results, simplified acoustical remedies were then applied to the dome to improve speech intelligibility, and their effect was examined. The following findings were obtained. (1) Without treatment by absorptive materials, speech transmission performance in the dome varies greatly with the locations of the sound sources and observation points: RASTI values ranged from 0.17 to 0.59 and speech intelligibility test scores from 30% to 97%. (2) At certain observation points, speech transmission quality is very high because of a considerable sum of energy arriving within the first 0.06 s after the direct sound. (3) Of all the measured acoustical parameters, RASTI, EDT in the 1 kHz band, the early-to-late arriving sound energy ratio, and Ts (center time) corresponded well to the speech intelligibility test scores. (4) Rubber tiles, a 12 m length of cotton canvas, and glass wool board markedly improve speech intelligibility through increased sound absorption and diffusion.
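The early-to-late energy ratio that tracked intelligibility here can be computed directly from a room impulse response. A minimal sketch, assuming sample 0 of the response is the arrival of the direct sound and using the 0.06 s boundary mentioned above (clarity indices often use 50 ms or 80 ms instead; the study's exact boundary for this parameter is not stated):

```python
import math

def early_to_late_ratio_db(impulse_response, fs_hz, boundary_s=0.06):
    """Energy ratio (dB) of the impulse response before vs. after boundary_s.

    Assumes sample 0 is the arrival of the direct sound.
    """
    split = int(round(boundary_s * fs_hz))
    early = sum(h * h for h in impulse_response[:split])
    late = sum(h * h for h in impulse_response[split:])
    if late == 0.0:
        return math.inf
    if early == 0.0:
        return -math.inf
    return 10.0 * math.log10(early / late)
```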

16.
Recent results have shown that listeners attending to the quieter of two speech signals in one ear (the target ear) are highly susceptible to interference from normal or time-reversed speech signals presented to the unattended ear. However, speech-shaped noise signals have little impact on the segregation of speech in the opposite ear. This suggests that there is a fundamental difference between the across-ear interference effects of speech and nonspeech signals. In this experiment, the intelligibility and contralateral-ear masking characteristics of three synthetic speech signals with parametrically adjustable speech-like properties were examined: (1) a modulated noise-band (MNB) speech signal composed of fixed-frequency bands of envelope-modulated noise; (2) a modulated sine-band (MSB) speech signal composed of fixed-frequency amplitude-modulated sine waves; and (3) a "sine-wave speech" signal composed of sine waves tracking the first four formants of speech. In all three cases, performance in the two-talker target-ear listening task decreased systematically as the number of bands in the contralateral speech-like masker increased. These results suggest that speech-like fluctuations in the spectral envelope of a signal play an important role in determining the amount of across-ear interference that the signal produces in a dichotic cocktail-party listening task.
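In an MSB signal, each band's envelope is carried by a single fixed-frequency sinusoid, so the number of bands sets how much spectral-envelope detail survives. A minimal synthesis sketch, assuming the band envelopes have already been extracted and resampled to the audio rate and that the carriers sit at chosen band frequencies (both assumptions; the study's filter and carrier design is not given here):

```python
import math

def modulated_sine_bands(envelopes, carrier_freqs_hz, fs_hz):
    """Sum of fixed-frequency sinusoids, each amplitude-modulated by one band envelope."""
    if len(envelopes) != len(carrier_freqs_hz):
        raise ValueError("need one carrier frequency per envelope")
    n = len(envelopes[0])
    out = [0.0] * n
    for env, fc in zip(envelopes, carrier_freqs_hz):
        for t in range(n):
            out[t] += env[t] * math.sin(2.0 * math.pi * fc * t / fs_hz)
    return out
```

An MNB variant would replace each sinusoid with a band of noise at the same position, leaving the envelope multiplication unchanged.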

17.
Information contained in the filter skirts is known to provide cues important to speech intelligibility. However, the role of filter slope during temporal smoothing has received little attention. In experiment 1, the slope of the smoothing filter was found to have a large effect on the intelligibility of sentences represented by three amplitude-modulated sinusoids. In experiment 2, the use of temporal cues above 16 Hz was examined across various regions of the spectrum. When higher-rate cues were provided in individual spectral bands, intelligibility increased only when they were provided in the higher spectral region, suggesting a greater reliance on higher-rate cues in this region. However, intelligibility was greatest when these cues were distributed across the spectrum, indicating that their effective use is not restricted to this region.

18.
The purpose of this study was to quantify the effect of timing errors on the intelligibility of deaf children's speech. Deviant timing patterns were corrected in recorded speech samples of six deaf children using digital speech processing techniques. The speech waveform was modified to correct timing errors only, leaving all other aspects of the speech unchanged. The following six-stage approximation procedure was used: (1) original, unaltered utterances; (2) correction of pauses only; (3) correction of relative timing; (4) correction of absolute syllable duration; (5) correction of relative timing and pauses; and (6) correction of absolute syllable duration and pauses. Intelligibility was measured for the original and the computer-modified utterances. On average, the highest intelligibility score was obtained when only relative timing errors were corrected. Correcting this type of error improved the intelligibility of both stressed and unstressed words within a phrase. The improvements in word intelligibility that occurred when relative timing was corrected appeared to be closely related to the number of phonemic errors within a word. The second highest intelligibility score was obtained for the original, unaltered sentences; on average, the scores for the other four forms of timing modification were lower than those for the original sentences. Thus, on average, intelligibility improved only when a single type of error, relative timing, was corrected.

19.
For ideal speech communication in public spaces, it is important to determine the optimum speech level for various background noise levels. However, speech intelligibility scores, which are conventionally used in subjective listening tests of speech communication quality, are near perfect in most everyday situations. It is therefore proposed to determine optimum speech levels for speech communication in public spaces by using listening difficulty ratings. Two kinds of listening test were carried out in this work. The results of these tests and of our previous work [M. Morimoto, H. Sato, and M. Kobayashi, J. Acoust. Soc. Am. 116, 1607-1613 (2004)] are discussed together to suggest the relation between optimum speech level and background noise level. The results demonstrate that: (1) the optimum speech level is constant when the background noise level is lower than 40 dBA; (2) when the background noise level exceeds 40 dBA, the optimum speech level appears to be the level that maintains an SN ratio of about 15 dBA; and (3) listening difficulty increases as speech level increases under conditions where the SN ratio is high enough to keep intelligibility near perfect.
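Findings (1) and (2) together suggest a simple piecewise rule. A sketch, assuming the two regimes meet at 40 dBA so that the constant level in quieter conditions equals 40 + 15 = 55 dBA; the abstract does not give the constant value, so treat both numbers as illustrative parameters rather than results:

```python
def optimum_speech_level(noise_dba: float, knee_dba: float = 40.0, snr_db: float = 15.0) -> float:
    """Hypothetical piecewise rule: constant below the knee, fixed SN ratio above it.

    The constant level is assumed to equal knee_dba + snr_db (continuity at the knee).
    """
    return max(noise_dba, knee_dba) + snr_db

for noise in (30.0, 40.0, 60.0):
    print(noise, optimum_speech_level(noise))
```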

20.
Previously, almost all physical measures for estimating speech intelligibility in a room have been derived from temporal-monaural criteria only. This paper shows that speech intelligibility in a sound field with a single reflection depends not only on the temporal-monaural factor but also on the spatial-binaural factor of the sound field. Articulation tests were conducted for sound fields simulated with a single reflection arriving with delay time Δt₁ after the direct sound, while the horizontal incident angle ξ of the reflection was varied. The main findings are as follows: (1) speech intelligibility (SI) decreases with increasing delay time Δt₁; (2) SI increases as ξ approaches 90°, so the horizontal angle of the reflection has a significant effect on SI; and (3) analysis of variance of the articulation test scores clearly demonstrated that the effects of Δt₁ and ξ on SI are fully independent. Regarding result (2), if listeners obtain a spatial separation of signals at the two ears, their capability for speech perception is assumed to improve because further information is "added" to the temporal pattern recognition.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417