Similar literature
1.
Conversations must be shielded from people in an adjacent room if they include confidential information. Word intelligibility tests were performed in a total of 185 sound fields to examine the relationship between sound insulation performance and the degree of conversation leakage. The parameters of the test sound fields were background noise level in the adjacent room and the level difference between the two rooms. The background noise level was varied from 30 to 50 dB (A-weighted). The level difference was parametrically varied in terms of eight frequency characteristics and 10 absolute values. The results showed that word intelligibility scores were strongly correlated with the A-weighted speech-to-noise ratio and SNRuni32. Equal-intelligibility contours, which readily show the weighted level difference and A-weighted background noise level required to achieve a given word intelligibility score, were obtained from a multiple logistic regression analysis.

2.
The normalized covariance measure (NCM) has been shown previously to predict reliably the intelligibility of noise-suppressed speech containing non-linear distortions. This study analyzes a simplified NCM measure that requires only a small number of bands (not necessarily contiguous) and uses simple binary (1 or 0) weighting functions. The rationale behind the use of a small number of bands is to account for the fact that the spectral information contained in contiguous or nearby bands is correlated and redundant. The modified NCM measure was evaluated with speech intelligibility scores obtained by normal-hearing listeners in 72 noisy conditions involving noise-suppressed speech corrupted by four different types of maskers (car, babble, train, and street interferences). High correlation (r = 0.8) was obtained with the modified NCM measure even when only one band was used. Further analysis revealed a masker-specific pattern of correlations when only one band was used; bands with low correlation indicated envelopes that had been severely distorted by the noise-suppression algorithm and/or the masker. Correlation improved to r = 0.84 when only two disjoint bands (centered at 325 and 1874 Hz) were used. Even further improvements in correlation (r = 0.85) were obtained when three or four lower-frequency (<700 Hz) bands were selected.
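A minimal sketch of a band-limited NCM with binary weights may help make the idea concrete. It assumes the commonly described NCM form (per-band envelope correlation converted to an apparent SNR, clipped to ±15 dB, mapped to a transmission index between 0 and 1); the abstract does not spell out these steps, and the function names `band_ti` and `simplified_ncm` are hypothetical. Band envelopes are taken as given, so the filterbank and envelope extraction are outside the sketch.

```python
import math

def band_ti(clean_env, proc_env):
    """Transmission index (0..1) of one band from two envelope sequences.

    Assumed form: r^2 -> apparent SNR in dB -> clip to [-15, 15] -> scale to [0, 1].
    """
    n = len(clean_env)
    mx = sum(clean_env) / n
    my = sum(proc_env) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(clean_env, proc_env))
    sxx = sum((x - mx) * (x - mx) for x in clean_env)
    syy = sum((y - my) * (y - my) for y in proc_env)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat envelope carries no covariance information
    r2 = (sxy * sxy) / (sxx * syy)
    if r2 >= 1.0:
        snr_db = 15.0
    elif r2 <= 0.0:
        snr_db = -15.0
    else:
        snr_db = max(-15.0, min(15.0, 10.0 * math.log10(r2 / (1.0 - r2))))
    return (snr_db + 15.0) / 30.0

def simplified_ncm(clean_envs, proc_envs, weights):
    """Weighted mean of band transmission indices; weights are 1 (use band) or 0 (skip)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one band must be selected")
    return sum(w * band_ti(c, p)
               for w, c, p in zip(weights, clean_envs, proc_envs)) / total
```

Setting a single weight to 1 and the rest to 0 corresponds to the one-band case discussed in the abstract; selecting two non-adjacent bands corresponds to the disjoint-band case.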

3.
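JIANG Bin, KUANG Zheng, WU Ming, YANG Jun. 《声学学报》 (Acta Acustica), 2012, 37(6): 659-666
The effect of frame length on the intelligibility of segment-reversed (locally time-reversed) Mandarin Chinese speech was studied experimentally. The results show that for frame lengths below 64 ms, segment-reversed Mandarin speech remains highly intelligible; for frame lengths between 64 and 203 ms, intelligibility decreases gradually as frame length increases; for frame lengths above 203 ms, intelligibility is 0. At a frame length of 8 ms, distortion of the Mandarin tones causes intelligibility to drop. Analysis of the modulation spectra of the original speech and the segment-reversed speech shows that the amount of modulation-spectrum distortion is closely related to intelligibility. The normalized correlation between the narrowband envelopes of the original speech and the segment-reversed speech can therefore be used to measure modulation-spectrum distortion, and the objective values computed with the speech-based speech transmission index method correlate significantly with the experimental results (r = 0.876, p < 0.01). The study shows that speech intelligibility is related to the narrowband envelopes, and that the intelligibility of segment-reversed speech depends closely on how well the narrowband envelopes of the original speech are preserved.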
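The segment-reversal manipulation studied here can be sketched as follows: the signal is cut into consecutive frames of fixed length and each frame is reversed in time while the frame order is kept. This is only an illustration; the helper names and the 16 kHz sampling rate in the usage note are assumptions, and any windowing or cross-fading the authors may have applied at frame boundaries is not modeled.

```python
def frame_samples(sample_rate_hz, frame_ms):
    """Convert a frame length in milliseconds to a whole number of samples."""
    return max(1, round(sample_rate_hz * frame_ms / 1000))

def reverse_segments(samples, frame_len):
    """Reverse each consecutive frame of `frame_len` samples; frame order is unchanged."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    out = []
    for start in range(0, len(samples), frame_len):
        out.extend(reversed(samples[start:start + frame_len]))
    return out
```

For example, at an assumed sampling rate of 16 kHz, `frame_samples(16000, 64)` gives 1024 samples per frame, which would then be passed as `frame_len`.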

4.
This study investigated the relative contributions of consonants and vowels to the perceptual intelligibility of monosyllabic consonant-vowel-consonant (CVC) words. A noise replacement paradigm presented CVCs with only consonants or only vowels preserved. Results demonstrated no difference between overall word accuracy in these conditions; however, different error patterns were observed. A significant effect of lexical difficulty was demonstrated for both types of replacement, whereas the noise level used during replacement did not influence results. The contribution of consonant and vowel transitional information present at the consonant-vowel boundary was also explored. The proportion of speech presented, regardless of the segmental condition, overwhelmingly predicted performance. Comparisons were made with previous segment replacement results using sentences [Fogerty and Kewley-Port (2009). J. Acoust. Soc. Am. 126, 847-857]. Results demonstrated that consonants contribute to intelligibility equally in both isolated CVC words and sentences. However, vowel contributions were mediated by context, with greater contributions to intelligibility in sentence contexts. Therefore, it appears that vowels in sentences carry unique speech cues that greatly facilitate intelligibility and that are not informative and/or present in isolated word contexts. Consonants appear to provide speech cues that are equally available and informative during sentence and isolated word presentations.

5.
A corpus of 1000 consonant–vowel–consonant (CVC) logatoms, grouped in 10 phonetically balanced lists of 100 words each, was developed to satisfy the need for subjective assessment of speech intelligibility in American Spanish speaking environments. This corpus was tested and correlated with Speech Transmission Index (STI) measurements to compare its articulation intelligibility score with other lists’ scores. In the course of this work it was determined that, in two different acoustically poor rooms that have the same STI (with STI < 0.50), the intelligibility score is lower when the articulation test is performed in a quiet room with high reverberation time than when it is performed in a very noisy room with low reverberation time. The final correlation curve of the American Spanish CVC-structure corpus was around 10 percentage points higher than the CVCEQB curve obtained by Steeneken and Houtgast in 2002.

6.
Listening difficulty ratings, using words with high word familiarity, are proposed as a new subjective measure for the evaluation of speech transmission in public spaces to provide realistic and objective results. Two listening tests were performed to examine their validity, compared with intelligibility scores. The tests included a reverberant signal and noise as detrimental sounds. The subject was asked to repeat each word and simultaneously to rate the listening difficulty into one of four categories: (1) not difficult, (2) a little difficult, (3) fairly difficult, and (4) extremely difficult. After the tests, the four categories were reclassified into two: not difficult [response (1)] and some level of difficulty [the other three responses]. Listening difficulty is defined as the percentage of the total number of responses indicating some level of difficulty [i.e., not (1)]. The results of two listening tests demonstrated that listening difficulty ratings can evaluate speech transmission performance more accurately and sensitively than intelligibility scores for sound fields with higher speech transmission performance.
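The definition of listening difficulty given in the abstract reduces to a simple percentage, which the following sketch computes from a list of category responses (1 to 4). The function name and the input layout are assumptions made for illustration.

```python
def listening_difficulty(ratings):
    """Percentage of responses that indicate some difficulty, i.e. any category other than 1.

    `ratings` is a sequence of integers in 1..4, one per presented word.
    """
    if not ratings:
        raise ValueError("no ratings given")
    if any(r not in (1, 2, 3, 4) for r in ratings):
        raise ValueError("ratings must be in the range 1..4")
    difficult = sum(1 for r in ratings if r != 1)
    return 100.0 * difficult / len(ratings)
```

Because categories 2 to 4 are pooled, the measure does not distinguish "a little difficult" from "extremely difficult"; it only separates effortless listening from everything else.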

7.
The Signal-to-Noise Ratio devised by Lochner and Burger contributed an objective design index for predicting speech intelligibility. Their index provided a measure of useful and detrimental reflected speech energy according to the integration and masking characteristics of hearing, and enabled predictions to be made from impulse measurements in models. However, it was found necessary to extend the Signal-to-Noise Ratio theory to account for the effect of fluctuating ambient background noise on speech intelligibility. A modified Signal-to-Noise Ratio was derived from a best-fitting empirical correlation with speech intelligibility in a series of measurements in existing auditoria. In the modified Signal-to-Noise Ratio, ambient background noise is no longer considered in terms of its steady-state characteristics but more specifically in terms of its transient and spectral characteristics given by the concept of the L10 PNC level. The index has been applied as a design criterion in prediction and evaluation techniques.

8.
The reliability of algorithms for room acoustic simulations has often been confirmed on the basis of the verification of predicted room acoustical parameters. This paper presents a complementary perceptual validation procedure consisting of two experiments, dealing respectively with speech intelligibility and with sound source front–back localisation. The evaluated simulation algorithm, implemented in the software ODEON®, is a hybrid method that is based on an image source algorithm for the prediction of early sound reflections and on ray-tracing for the later part, using a stochastic scattering process with secondary sources. The binaural room impulse response (BRIR) is calculated from a simulated room impulse response where information about the arrival time, intensity and spatial direction of each sound reflection is collected and convolved with a measured Head Related Transfer Function (HRTF). The listening stimuli for the speech intelligibility and localisation tests are auralised convolutions of anechoic sound samples with measured and simulated BRIRs. Perception tests were performed with human subjects in two acoustical environments, i.e. an anechoic and a reverberant room, by presenting the stimuli to subjects in a natural way, and via headphones by using two non-individualized HRTFs (artificial head and hearing aids placed on the ears of the artificial head) of both a simulated and a real room. Very good correspondence is found between the results obtained with simulated and measured BRIRs, both for speech intelligibility in the presence of noise and for sound source localisation tests. In the anechoic room an increase in speech intelligibility is observed when noise and signal are presented from sources located at different angles. This improvement is not so evident in the reverberant room, with the sound sources at 1-m distance from the listener.
Interestingly, front–back localisation performance is better in the reverberant room than in the anechoic room. The correlation between people’s ability to localise sound sources on the one hand, and their ability to recognise binaurally received speech in reverberation on the other, is found to be weak.

9.
The purpose of this study was to examine the effect of reduced vowel working space on dysarthric talkers' speech intelligibility using both acoustic and perceptual approaches. In experiment 1, the acoustic-perceptual relationship between vowel working space area and speech intelligibility was examined in Mandarin-speaking young adults with cerebral palsy. Subjects read aloud 18 bisyllabic words containing the vowels /i/, /a/, and /u/ using their normal speaking rate. Each talker's words were identified by three normal listeners. The percentages of correct vowel and word identification were calculated as vowel intelligibility and word intelligibility, respectively. Results revealed that talkers with cerebral palsy exhibited smaller vowel working space areas compared to ten age-matched controls. The vowel working space area was significantly correlated with vowel intelligibility (r=0.632, p<0.005) and with word intelligibility (r=0.684, p<0.005). Experiment 2 examined whether tokens of expanded vowel working spaces were perceived as better vowel exemplars and represented with greater perceptual spaces than tokens of reduced vowel working spaces. The results of the perceptual experiment support this prediction. The distorted vowels of talkers with cerebral palsy compose a smaller acoustic space that results in shrunken intervowel perceptual distances for listeners.
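A vowel working space area for the three corner vowels /i/, /a/, and /u/ is often computed as the area of the triangle they span in the F1-F2 plane. The abstract does not state the exact procedure used, so the sketch below is an assumption: it applies the shoelace formula to three (F1, F2) points in Hz, and the function name is hypothetical.

```python
def vowel_space_area(i_formants, a_formants, u_formants):
    """Area (Hz^2) of the triangle spanned by /i/, /a/, /u/ in the F1-F2 plane.

    Each argument is an (F1, F2) pair in Hz. Uses the shoelace formula.
    """
    (x1, y1), (x2, y2), (x3, y3) = i_formants, a_formants, u_formants
    return abs(x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2)) / 2.0
```

A talker whose corner vowels are centralized (formant values pulled toward the middle of the plane) yields a smaller triangle, which is the sense in which the working space is "reduced".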

10.
Objective measures were investigated as predictors of the speech security of closed offices and rooms. A new signal-to-noise type measure is shown to be a superior indicator for security than existing measures such as the Articulation Index, the Speech Intelligibility Index, the ratio of the loudness of speech to that of noise, and the A-weighted level difference of speech and noise. This new measure is a weighted sum of clipped one-third-octave-band signal-to-noise ratios; various weightings and clipping levels are explored. Listening tests had 19 subjects rate the audibility and intelligibility of 500 English sentences, filtered to simulate transmission through various wall constructions, and presented along with background noise. The results of the tests indicate that the new measure is highly correlated with sentence intelligibility scores and also with three security thresholds: the threshold of intelligibility (below which speech is unintelligible), the threshold of cadence (below which the cadence of speech is inaudible), and the threshold of audibility (below which speech is inaudible). The ratio of the loudness of speech to that of noise, and simple A-weighted level differences are both shown to be well correlated with these latter two thresholds (cadence and audibility), but not well correlated with intelligibility.
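The "weighted sum of clipped one-third-octave-band signal-to-noise ratios" can be sketched as below. The abstract says various weightings and clipping levels were explored without fixing one, so the defaults here (uniform weights and a lower clip of -32 dB) are assumptions chosen only for illustration, as is the function name.

```python
def clipped_band_snr(speech_db, noise_db, weights=None, floor_db=-32.0):
    """Weighted mean of per-band speech-to-noise ratios, each clipped from below.

    speech_db, noise_db: band levels in dB for the same one-third-octave bands.
    weights: per-band weights (uniform if omitted); normalized by their sum.
    floor_db: band SNRs below this value are raised to it (assumed clip level).
    """
    if len(speech_db) != len(noise_db) or not speech_db:
        raise ValueError("band level lists must be non-empty and equal in length")
    if weights is None:
        weights = [1.0] * len(speech_db)
    snrs = [max(floor_db, s - n) for s, n in zip(speech_db, noise_db)]
    return sum(w * x for w, x in zip(weights, snrs)) / sum(weights)
```

Clipping keeps bands in which speech is far below the noise from dominating the sum: once speech is inaudible in a band, making it even quieter there should not change the rating.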

11.
For the purpose of improving speech transmission performance in a dome space, the acoustical properties in a dome having a diameter of 20 m were examined. The acoustical properties measured evenly on the floor of the dome were evaluated both objectively and subjectively, and the interrelationship of the objective and subjective measures was also examined. Then, on the basis of the results of the study, simplified acoustical remedies were applied to the dome to improve speech intelligibility, and the effect of the remedies was also examined. The following findings were obtained from this investigation. (1) The speech transmission performance in the dome space without treatment by absorptive materials varies greatly with the locations of sound sources and observation points: a range of 0.17-0.59 for RASTI value and a range of 30-97% for speech intelligibility test results. (2) There are peculiar observation points at which speech transmission quality is very high due to a considerable sum of the energy arriving in the first 0.06 s after the arrival of the direct sound. (3) Of all the measured acoustical parameters, RASTI, EDT in the 1 kHz band, early-to-late arriving sound energy ratio, and Ts corresponded well to the speech intelligibility test scores. (4) Rubber tiles, cotton canvas 12 m in length, and glass wool board are effective in improving speech intelligibility remarkably due to increased sound absorption and the diffusion effect.

12.
The literature on various parameters that appear in the articulation index-type calculations of speech intelligibility is reexamined. Based on the reported data, the best estimates of these parameters and the most appropriate procedures for their use are suggested. These include: (1) the analysis and specification of the importance of various frequency bands to speech intelligibility; (2) the procedures used for measuring threshold and the calculation of threshold-based parameters used for predicting intelligibility of low-level speech; and (3) the calculation and measurement of relevant speech parameters. All results are given so that the calculations can be performed in critical bands, 1/3 octaves, or octaves.

13.
This paper reports on an evaluation of ratings of the sound insulation of simulated walls in terms of the intelligibility of speech transmitted through the walls. Subjects listened to speech modified to simulate transmission through 20 different walls with a wide range of sound insulation ratings, with constant ambient noise. The subjects' mean speech intelligibility scores were compared with various physical measures to test the success of the measures as sound insulation ratings. The standard Sound Transmission Class (STC) and Weighted Sound Reduction Index ratings were only moderately successful predictors of intelligibility scores, and eliminating the 8 dB rule from STC led to very modest improvements. Various previously established speech intelligibility measures (e.g., Articulation Index or Speech Intelligibility Index) and measures derived from them, such as the Articulation Class, were all relatively strongly related to speech intelligibility scores. In general, measures that involved arithmetic averages or summations of decibel values over frequency bands important for speech were most strongly related to intelligibility scores. The two most accurate predictors of the intelligibility of transmitted speech were an arithmetic average transmission loss over the frequencies from 200 Hz to 2.5 kHz and the addition of a new spectrum weighting term to R(w) that included frequencies from 400 Hz to 2.5 kHz.
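The first of the two best predictors, an arithmetic average of transmission loss over the 200 Hz to 2.5 kHz bands, is straightforward to compute. The sketch below assumes transmission loss values keyed by band centre frequency in Hz; the function name and data layout are hypothetical, and the spectrum weighting term added to R(w) is not reproduced because the abstract does not give it.

```python
def average_transmission_loss(tl_by_band, f_lo=200.0, f_hi=2500.0):
    """Arithmetic mean (dB) of transmission loss over bands with f_lo <= centre <= f_hi.

    tl_by_band: mapping of band centre frequency (Hz) to transmission loss (dB).
    """
    values = [tl for f, tl in tl_by_band.items() if f_lo <= f <= f_hi]
    if not values:
        raise ValueError("no bands fall inside the requested frequency range")
    return sum(values) / len(values)
```

Note that this is a plain average of decibel values, not an energy average, which is consistent with the abstract's observation that arithmetic averages over speech-relevant bands tracked intelligibility best.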

14.
Frequency response characteristics were selected for 14 hearing-impaired ears, according to six procedures. Three procedures were based on MCL measurements with speech bands of three bandwidths (1/3 octave, 1 octave, and 1 2/3 octaves). The other procedures were based on hearing thresholds, pure-tone MCLs, and pure-tone LDLs. The procedures were evaluated by speech discrimination testing, using nonsense syllables in noise, and by paired comparison judgments of the intelligibility and pleasantness of running speech. Speech discrimination testing showed significant differences between pairs of responses for only seven test ears. Nasals and glides were most affected by frequency response variations. Both intelligibility and pleasantness judgments showed significant differences for all test ears. Intelligibility in noise was less affected by frequency response differences than was intelligibility in quiet or pleasantness in quiet or in noise. For some ears, the ranking of responses depended on whether intelligibility or pleasantness was being judged and on whether the speech was in quiet or in noise. Overall, the three speech band MCL procedures were far superior to the others. Thus the studies strongly support the frequency response selection rationale of amplifying all frequency bands of speech to MCL. They also highlight some of the complications involved in achieving this aim.

15.
16.
Although the speech transmission index (STI) is a well-accepted and standardized method for objective prediction of speech intelligibility in a wide range of environments and applications, it is essentially a monaural model. Advantages of binaural hearing in speech intelligibility are disregarded. In specific conditions, this leads to considerable mismatches between subjective intelligibility and the STI. A binaural version of the STI was developed based on interaural cross correlograms, which shows a considerably improved correspondence with subjective intelligibility in dichotic listening conditions. The new binaural STI is designed to be a relatively simple model, which adds only a few parameters to the original standardized STI and changes none of the existing model parameters. For monaural conditions, the outcome is identical to the standardized STI. The new model was validated on a set of 39 dichotic listening conditions, featuring anechoic, classroom, listening room, and strongly echoic environments. For these 39 conditions, speech intelligibility [consonant-vowel-consonant (CVC) word score] and binaural STI were measured. On the basis of these conditions, the relation between binaural STI and CVC word scores closely matches the STI reference curve (standardized relation between STI and CVC word score) for monaural listening. A better-ear STI appears to perform quite well in relation to the binaural STI model; the monaural STI performs poorly in these cases.

17.
Synthesis (carrier) signals in acoustic models embody assumptions about perception of auditory electric stimulation. This study compared speech intelligibility of consonants and vowels processed through a set of nine acoustic models that used Spectral Peak (SPEAK) and Advanced Combination Encoder (ACE)-like speech processing, using synthesis signals which were representative of signals used previously in acoustic models as well as two new ones. Performance of the synthesis signals was determined in terms of correspondence with cochlear implant (CI) listener results for 12 attributes of phoneme perception (consonant and vowel recognition; F1, F2, and duration information transmission for vowels; voicing, manner, place of articulation, affrication, burst, nasality, and amplitude envelope information transmission for consonants) using four measures of performance. Modulated synthesis signals produced the best correspondence with CI consonant intelligibility, while sinusoids, narrow noise bands, and varying noise bands produced the best correspondence with CI vowel intelligibility. The signals that performed best overall (in terms of correspondence with both vowel and consonant attributes) were modulated and unmodulated noise bands of varying bandwidth that corresponded to a linearly varying excitation width of 0.4 mm at the apical to 8 mm at the basal channels.

18.
The possibility of early fire detection via lidar (light detection and ranging) technology implemented through a low-cost rangefinder is investigated. The evaluation is based on the variation of signal-to-noise ratio (SNR) with distance calculated on the basis of a theoretical model and determined experimentally. The theoretical SNR is obtained by combining a hydrodynamic model of the smoke plume taking into consideration the effect of wind (which enables calculation of smoke–particle distribution) and a lidar model that enables backscattered radiation intensity, detected power and, eventually, SNR to be assessed using Mie theory. The calculated values of SNR agree reasonably well with the experimental results obtained using small-scale experimental fires and show that in favourable conditions detection ranges up to about 4 km are achievable.

19.
The purpose of this study was to quantify the effect of timing errors on the intelligibility of deaf children's speech. Deviant timing patterns were corrected in the recorded speech samples of six deaf children using digital speech processing techniques. The speech waveform was modified to correct timing errors only, leaving all other aspects of the speech unchanged. The following six-stage approximation procedure was used to correct the deviant timing patterns: (1) original, unaltered utterances, (2) correction of pauses only, (3) correction of relative timing, (4) correction of absolute syllable duration, (5) correction of relative timing and pauses, and (6) correction of absolute syllable duration and pauses. Measures of speech intelligibility were obtained for the original and the computer-modified utterances. On the average, the highest intelligibility score was obtained when relative timing errors only were corrected. The correction of this type of error improved the intelligibility of both stressed and unstressed words within a phrase. Improvements in word intelligibility, which occurred when relative timing was corrected, appeared to be closely related to the number of phonemic errors present within a word. The second highest intelligibility score was obtained for the original, unaltered sentences. On the average, the intelligibility scores obtained for the other four forms of timing modification were poorer than those obtained for the original sentences. Thus, the data show that intelligibility improved, on the average, when only one type of error, relative timing, was corrected.

20.
In the n-of-m strategy, the signal is processed through m bandpass filters from which only the n maximum envelope amplitudes are selected for stimulation. While this maximum selection criterion, adopted in the advanced combination encoder strategy, works well in quiet, it can be problematic in noise as it is sensitive to the spectral composition of the input signal and does not account for situations in which the masker completely dominates the target. A new selection criterion is proposed based on the signal-to-noise ratio (SNR) of individual channels. The new criterion selects target-dominated (SNR ≥ 0 dB) channels and discards masker-dominated (SNR < 0 dB) channels. Experiment 1 assessed cochlear implant users' performance with the proposed strategy assuming that the channel SNRs are known. Results indicated that the proposed strategy can restore speech intelligibility to the level attained in quiet independent of the type of masker (babble or continuous noise) and SNR level (0-10 dB) used. Results from experiment 2 showed that a 25% error rate can be tolerated in channel selection without compromising speech intelligibility. Overall, the findings from the present study suggest that the SNR criterion is an effective selection criterion for n-of-m strategies with the potential of restoring speech intelligibility.
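The two channel selection rules contrasted in this abstract can be sketched side by side. The code assumes that per-channel envelope amplitudes and per-channel SNRs (in dB) are already available for one analysis frame, as in experiment 1 where the channel SNRs are taken as known; the function names are hypothetical.

```python
def select_n_maxima(envelopes, n):
    """Maximum selection: indices of the n channels with the largest envelope amplitudes."""
    ranked = sorted(range(len(envelopes)), key=lambda ch: envelopes[ch], reverse=True)
    return sorted(ranked[:n])

def select_by_snr(channel_snr_db, threshold_db=0.0):
    """SNR criterion: keep target-dominated channels (SNR >= threshold), drop the rest."""
    return [ch for ch, snr in enumerate(channel_snr_db) if snr >= threshold_db]
```

Unlike maximum selection, the SNR rule does not return a fixed number of channels: in a frame where the masker dominates every channel it selects none, which is exactly the situation the maximum selection criterion cannot handle.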
