Similar Articles
1.
Intelligibility of average talkers in typical listening environments
Intelligibility of conversationally produced speech for normal hearing listeners was studied for three male and three female talkers. Four typical listening environments were used. These simulated a quiet living room, a classroom, and social events in two settings with different reverberation characteristics. For each talker, overall intelligibility and intelligibility for vowels, consonant voicing, consonant continuance, and consonant place were quantified using the speech pattern contrast (SPAC) test. Results indicated that significant intelligibility differences are observed among normal talkers even in listening environments that permit essentially full intelligibility for everyday conversations. On the whole, talkers maintained their relative intelligibility across the four environments, although there was one exception which suggested that some voices may be particularly susceptible to degradation due to reverberation. Consonant place was the most poorly perceived feature, followed by continuance, voicing, and vowel intelligibility. However, there were numerous significant interactions between talkers and speech features, indicating that a talker of average overall intelligibility may produce certain speech features with intelligibility that is considerably higher or lower than average. Neither long-term rms speech spectrum nor articulation rate was found to be an adequate single criterion for selecting a talker of average intelligibility. Ultimately, an average talker was chosen on the basis of four speech contrasts: initial consonant place, and final consonant place, voicing, and continuance.

2.
A number of objective evaluation methods are currently used to quantify the speech intelligibility in a built environment, including the speech transmission index (STI), rapid speech transmission index (RASTI), articulation index (AI), and the percent articulation loss of consonants (%ALCons). Certain software programs can quickly evaluate STI, RASTI, and %ALCons from a measured room impulse response. In this project, two impulse-response-based software packages (WinMLS and SIA-Smaart Acoustic Tools) were evaluated for their ability to determine intelligibility accurately. In four different spaces with background noise levels less than NC 45, speech intelligibility was measured via three methods: (1) with WinMLS 2000; (2) with SIA-Smaart Acoustic Tools (v4.0.2); and (3) from listening tests with humans. The study found that WinMLS measurements of speech intelligibility based on STI, RASTI, and %ALCons corresponded well with performance on the listening tests. SIA-Smaart results were correlated to human responses, but tended to under-predict intelligibility based on STI and RASTI, and over-predict intelligibility based on %ALCons.
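To make the STI derivation concrete, the sketch below shows a simplified, noise-free STI-style estimate from a room impulse response via the modulation transfer function (Schroeder's method): the squared impulse response gives an MTF per octave band and modulation frequency, which is converted to an apparent SNR, clipped to ±15 dB, and averaged. This is only an illustration of the principle; a standards-compliant STI (IEC 60268-16) additionally accounts for background noise, masking, and standardized band weightings, and the equal band weights and toy impulse response here are assumptions.

```python
# Simplified, noise-free STI-style estimate from a room impulse response,
# following Schroeder's MTF method. Illustrative only; a full IEC 60268-16
# STI adds noise terms, masking, and standardized weights.
import numpy as np
from scipy.signal import butter, sosfilt

OCTAVE_BANDS = [125, 250, 500, 1000, 2000, 4000, 8000]        # Hz, band centers
MOD_FREQS = [0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5, 3.15,
             4.0, 5.0, 6.3, 8.0, 10.0, 12.5]                   # Hz, modulation frequencies
BAND_WEIGHTS = np.full(len(OCTAVE_BANDS), 1.0 / len(OCTAVE_BANDS))  # assumed equal weights

def octave_filter(ir, fs, fc):
    """Band-pass the impulse response around octave-band center fc."""
    sos = butter(4, [fc / np.sqrt(2), fc * np.sqrt(2)], btype="band", fs=fs, output="sos")
    return sosfilt(sos, ir)

def mtf(ir_band, fs, fm):
    """Modulation transfer function at modulation frequency fm (no noise term)."""
    t = np.arange(len(ir_band)) / fs
    e = ir_band ** 2                                   # squared IR = energy decay
    return np.abs(np.sum(e * np.exp(-2j * np.pi * fm * t))) / np.sum(e)

def sti_estimate(ir, fs):
    tis = []
    for fc, w in zip(OCTAVE_BANDS, BAND_WEIGHTS):
        band = octave_filter(ir, fs, fc)
        m = np.array([mtf(band, fs, fm) for fm in MOD_FREQS])
        snr_app = 10 * np.log10(m / (1 - m))           # apparent SNR per modulation frequency
        snr_app = np.clip(snr_app, -15, 15)            # clip to +/-15 dB
        ti = (snr_app + 15) / 30                       # transmission index in [0, 1]
        tis.append(w * ti.mean())
    return float(np.sum(tis))

if __name__ == "__main__":
    fs = 32000
    t = np.arange(int(0.5 * fs)) / fs
    ir = np.random.randn(len(t)) * np.exp(-t / 0.1)    # toy exponentially decaying impulse response
    print(f"STI-style estimate: {sti_estimate(ir, fs):.2f}")
```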

3.
In a follow-up study to that of Bent and Bradlow (2003), carrier sentences containing familiar keywords were read aloud by five talkers (Korean high proficiency; Korean low proficiency; Saudi Arabian high proficiency; Saudi Arabian low proficiency; native English). The intelligibility of these keywords to 50 listeners in four first language groups (Korean, n = 10; Saudi Arabian, n = 10; native English, n = 10; other mixed first languages, n = 20) was measured in a word recognition test. In each case, the non-native listeners found the non-native low-proficiency talkers who did not share the same first language as the listeners the least intelligible, at statistically significant levels, while not finding the low-proficiency talker who shared their own first language similarly unintelligible. These findings indicate a mismatched interlanguage speech intelligibility detriment for low-proficiency non-native speakers and a potential intelligibility problem between mismatched first language low-proficiency speakers unfamiliar with each other's accents in English. There was no strong evidence to support either an intelligibility benefit for the high-proficiency non-native talkers to the listeners from a different first language background or to indicate that the native talkers were more intelligible than the high-proficiency non-native talkers to any of the listeners.

4.
The purpose of this study was to describe the formant trajectories produced by males with amyotrophic lateral sclerosis (ALS), a degenerative neuromuscular disease that is typically associated with dysarthria. Formant trajectories of 25 males with ALS and 15 neurologically normal geriatric males were compared for 12 words selected from the speech intelligibility task developed by Kent et al. [J. Speech. Hear. Disord. 54, 482-499 (1989)]. The results indicated that speakers with ALS (1) produced formant transitions having shallower slopes than transitions of normal speakers, (2) tended to produce exaggerations of formant trajectories at the onset of vocalic nuclei, and (3) had greater interspeaker variability of formant transition characteristics than normal speakers. Within the group of ALS speakers, those subjects who were less than 70% intelligible produced distinctly more aberrant trajectory characteristics than subjects who were more than 70% intelligible. ALS subjects who were less than 70% intelligible produced many trajectories that were essentially flat, or that had very shallow slopes. These results are discussed in terms of the speech production deficit in the dysarthria associated with ALS, and with respect to the potential influence of aberrant trajectories on speech intelligibility.
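The slope of a formant transition, the key measure in this entry, can be summarized by a straight-line fit over the vocalic nucleus once a formant track has been extracted (e.g., with Praat or an LPC-based tracker). The sketch below assumes a pre-computed F2 track sampled at a fixed frame rate and is not the study's own measurement protocol; the frame rate and interval markers are illustrative.

```python
# Slope of an F2 transition from a pre-extracted formant track, in Hz per second.
# Assumes the formant values were obtained elsewhere (e.g., Praat); the
# transition interval is marked by frame indices. Illustrative sketch only.
import numpy as np

def formant_slope(f2_hz, frame_rate_hz, start_frame, end_frame):
    """Least-squares slope of F2 over [start_frame, end_frame) in Hz/s."""
    seg = np.asarray(f2_hz[start_frame:end_frame], dtype=float)
    t = np.arange(len(seg)) / frame_rate_hz          # time axis in seconds
    slope, _intercept = np.polyfit(t, seg, 1)        # first-order (linear) fit
    return slope

# Example: a hypothetical /ai/-like F2 transition sampled every 10 ms.
f2_track = np.linspace(1200, 2100, 30) + np.random.randn(30) * 20
print(f"F2 slope: {formant_slope(f2_track, 100.0, 0, 30):.0f} Hz/s")
```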

5.
Three experiments were conducted to study the effect of segmental and suprasegmental corrections on the intelligibility and judged quality of deaf speech. By means of digital signal processing techniques, including LPC analysis, transformations of separate speech sounds, temporal structure, and intonation were carried out on 30 Dutch sentences spoken by ten deaf children. The transformed sentences were tested for intelligibility and acceptability by presenting them to inexperienced listeners. In experiment 1, LPC based reflection coefficients describing segmental characteristics of deaf speakers were replaced by those of hearing speakers. A complete segmental correction caused a dramatic increase in intelligibility from 24% to 72%, which, for a major part, was due to correction of vowels. Experiment 2 revealed that correction of temporal structure and intonation caused only a small improvement from 24% to about 34%. Combination of segmental and suprasegmental corrections yielded almost perfectly understandable sentences, due to a more than additive effect of the two corrections. Quality judgments, collected in experiment 3, were in close agreement with the intelligibility measures. The results show that, in order for these speakers to become more intelligible, improving their articulation is more important than improving their production of temporal structure and intonation.
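A rough frame-by-frame analogue of the segmental correction described above, in which the LPC spectral envelope of a "reference" recording is imposed on the residual of a "target" recording, is sketched below. The study itself swapped reflection coefficients between time-aligned utterances, so treat this purely as an illustration of LPC analysis-resynthesis rather than a reproduction of its procedure; the function name, frame settings, and file names are assumptions.

```python
# Frame-based LPC cross-synthesis: drive the reference speaker's LPC envelope
# with the target speaker's residual. A rough analogue of segmental correction,
# not the paper's exact reflection-coefficient replacement.
import numpy as np
import librosa
from scipy.signal import lfilter

def lpc_cross_synthesis(target, reference, order=16, frame_len=400, hop=200):
    """Both signals are mono float arrays at the same sampling rate, roughly time-aligned."""
    n = min(len(target), len(reference))
    out = np.zeros(n)
    win = np.hanning(frame_len)
    for start in range(0, n - frame_len, hop):
        t_frame = target[start:start + frame_len] * win
        r_frame = reference[start:start + frame_len] * win
        a_t = librosa.lpc(t_frame, order=order)        # target's spectral envelope
        a_r = librosa.lpc(r_frame, order=order)        # reference's spectral envelope
        residual = lfilter(a_t, [1.0], t_frame)        # inverse-filter the target frame
        synth = lfilter([1.0], a_r, residual)          # re-filter with the reference envelope
        out[start:start + frame_len] += synth          # overlap-add
    return out

# Usage (hypothetical file names, 16 kHz mono):
# target, sr = librosa.load("deaf_sentence.wav", sr=16000)
# reference, _ = librosa.load("hearing_sentence.wav", sr=16000)
# corrected = lpc_cross_synthesis(target, reference)
```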

6.
The intelligibility of speech pronounced by non-native talkers is generally lower than speech pronounced by native talkers, especially under adverse conditions, such as high levels of background noise. The effect of foreign accent on speech intelligibility was investigated quantitatively through a series of experiments involving voices of 15 talkers, differing in language background, age of second-language (L2) acquisition and experience with the target language (Dutch). Overall speech intelligibility of L2 talkers in noise is predicted with a reasonable accuracy from accent ratings by native listeners, as well as from the self-ratings for proficiency of L2 talkers. For non-native speech, unlike native speech, the intelligibility of short messages (sentences) cannot be fully predicted by phoneme-based intelligibility tests. Although incorrect recognition of specific phonemes certainly occurs as a result of foreign accent, the effect of reduced phoneme recognition on the intelligibility of sentences may range from severe to virtually absent, depending on (for instance) the speech-to-noise ratio. Objective acoustic-phonetic analyses of accented speech were also carried out, but satisfactory overall predictions of speech intelligibility could not be obtained with relatively simple acoustic-phonetic measures.

7.
Journal of Voice, 2020, 34(5): 806.e7-806.e18
There is a high prevalence of dysphonia among professional voice users and the impact of the disordered voice on the speaker is well documented. However, there is minimal research on the impact of the disordered voice on the listener. Considering that professional voice users include teachers and air-traffic controllers, among others, it is imperative to determine the impact of a disordered voice on the listener. To address this, the objectives of the current study included: (1) determine whether there are differences in speech intelligibility between individuals with healthy voices and those with dysphonia; (2) understand whether cognitive-perceptual strategies increase speech intelligibility for dysphonic speakers; and (3) determine the relationship between subjective voice quality ratings and speech intelligibility. Sentence stimuli were recorded from 12 speakers with dysphonia and four age- and gender-matched typical, healthy speakers and presented to 129 healthy listeners assigned to one of three strategy groups (ie, control, acknowledgement, and listener strategies). Four expert raters also completed a perceptual voice assessment using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) for each speaker. Results indicated that dysphonic voices were significantly less intelligible than healthy voices (P < 0.001) and the use of cognitive-perceptual strategies provided to the listener did not significantly improve speech intelligibility scores (P = 0.602). Using the subjective voice quality ratings, regression analysis found that breathiness was able to predict 41% of the variance associated with number of errors (P = 0.008). Overall results of the study suggest that speakers with dysphonia demonstrate reduced speech intelligibility and that providing the listener with specific strategies may not result in improved intelligibility.

8.
Speech intelligibility was investigated by varying the number of interfering talkers, level, and mean pitch differences between target and interfering speech, and the presence of tactile support. In a first experiment the speech-reception threshold (SRT) for sentences was measured for a male talker against a background of one to eight interfering male talkers or speech noise. Speech was presented diotically and vibro-tactile support was given by presenting the low-pass-filtered signal (0-200 Hz) to the index finger. The benefit in the SRT resulting from tactile support ranged from 0 to 2.4 dB and was largest for one or two interfering talkers. A second experiment focused on masking effects of one interfering talker. The interference was the target talker's own voice with an increased mean pitch by 2, 4, 8, or 12 semitones. Level differences between target and interfering speech ranged from -16 to +4 dB. Results from measurements of correctly perceived words in sentences show an intelligibility increase of up to 27% due to tactile support. Performance gradually improves with increasing pitch difference. Louder target speech generally helps perception, but results for level differences are considerably dependent on pitch differences. Differences in performance between noise and speech maskers and between speech maskers with various mean pitches are explained by the effect of informational masking.
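The tactile signal in this entry is simply the speech low-pass filtered to the 0-200 Hz region before delivery to the fingertip. A minimal version of that preprocessing step, using a zero-phase Butterworth low-pass filter, could look like the following; the filter order is an assumption, only the 200 Hz cutoff comes from the description above.

```python
# Low-pass filter speech to the 0-200 Hz region used for vibro-tactile support.
# Zero-phase Butterworth filtering; the filter order is an assumption.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def tactile_band(speech, fs, cutoff_hz=200.0, order=4):
    sos = butter(order, cutoff_hz, btype="low", fs=fs, output="sos")
    return sosfiltfilt(sos, speech)

# Example with a synthetic signal: the 120 Hz component survives, 1 kHz is removed.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 120 * t) + 0.5 * np.sin(2 * np.pi * 1000 * t)
y = tactile_band(x, fs)
print(f"RMS in: {np.sqrt(np.mean(x**2)):.2f}, RMS out: {np.sqrt(np.mean(y**2)):.2f}")
```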

9.
The evaluation of intelligibility of noise reduction algorithms is reported. IEEE sentences and consonants were corrupted by four types of noise including babble, car, street and train at two signal-to-noise ratio levels (0 and 5 dB), and then processed by eight speech enhancement methods encompassing four classes of algorithms: spectral subtractive, sub-space, statistical model based and Wiener-type algorithms. The enhanced speech was presented to normal-hearing listeners for identification. With the exception of a single noise condition, no algorithm produced significant improvements in speech intelligibility. Information transmission analysis of the consonant confusion matrices indicated that no algorithm significantly improved the place feature score, which is critically important for speech recognition. The algorithms that were found in previous studies to perform best in terms of overall quality were not the same algorithms that performed best in terms of speech intelligibility. The subspace algorithm, for instance, was previously found to perform the worst in terms of overall quality, but performed well in the present study in terms of preserving speech intelligibility. Overall, the analysis of consonant confusion matrices suggests that in order for noise reduction algorithms to improve speech intelligibility, they need to improve the place and manner feature scores.
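For readers unfamiliar with the first algorithm class mentioned, a bare-bones magnitude spectral subtraction is sketched below. It is a textbook form, not any specific algorithm evaluated in the study; the noise estimate from the leading frames, the over-subtraction factor, and the spectral floor are common default choices rather than parameters taken from the paper.

```python
# Basic magnitude spectral subtraction (textbook form). Noise is estimated
# from the leading frames, over-subtracted, and floored to limit musical noise.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, fs, noise_frames=10, alpha=2.0, floor=0.02):
    f, t, X = stft(noisy, fs=fs, nperseg=512)
    mag, phase = np.abs(X), np.angle(X)
    noise_mag = mag[:, :noise_frames].mean(axis=1, keepdims=True)  # noise magnitude estimate
    clean_mag = mag - alpha * noise_mag                            # over-subtraction
    clean_mag = np.maximum(clean_mag, floor * noise_mag)           # spectral floor
    _, enhanced = istft(clean_mag * np.exp(1j * phase), fs=fs, nperseg=512)
    return enhanced

# Usage (hypothetical signal array at 16 kHz):
# enhanced = spectral_subtraction(noisy_signal, 16000)
```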

10.
Speech tempo (articulation rate) varies both between and within speakers. The present study investigates several factors affecting tempo in a corpus of spoken Dutch, consisting of interviews with 160 high-school teachers. Speech tempo was observed for each phrase separately, and analyzed by means of multilevel modeling of the speaker's sex, age, country, and dialect region (between speakers) and length, sequential position of phrase, and autocorrelated tempo (within speakers). Results show that speech tempo in this corpus depends mainly on phrase length, due to anticipatory shortening, and on the speaker's country, with different speaking styles in The Netherlands (faster, less varied) and in Flanders (slower, more varied). Additional analyses showed that phrase length itself is shorter in The Netherlands than in Flanders, and decreases with speaker's age. Older speakers tend to vary their phrase length more (within speakers), perhaps due to their accumulated verbal proficiency.
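The multilevel analysis described here combines between-speaker predictors (sex, age, country) with within-speaker predictors (phrase length, position) and a random term per speaker. One hedged way to set up such a model in Python, assuming a tidy data frame with one row per phrase and column names invented purely for illustration (they are not the corpus's variable names), is:

```python
# Mixed-effects (multilevel) model of articulation rate with a random intercept
# per speaker. Column names (tempo, phrase_length, age, country, speaker) are
# assumed for illustration only.
import pandas as pd
import statsmodels.formula.api as smf

def fit_tempo_model(df: pd.DataFrame):
    """df has one row per phrase with columns: tempo, phrase_length, age, country, speaker."""
    model = smf.mixedlm(
        "tempo ~ phrase_length + age + C(country)",   # fixed effects
        data=df,
        groups=df["speaker"],                         # random intercept per speaker
    )
    return model.fit()

# Usage (hypothetical data frame):
# result = fit_tempo_model(phrases_df)
# print(result.summary())
```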

11.
To clarify the role of formant frequency in the perception of pitch in whispering, we conducted a preliminary experiment to determine (1) whether speakers change their pitch during whispering; (2) whether listeners can perceive differences in pitch; and (3) what the acoustical features are when speakers change their pitch. A listening test with whispered Japanese speech demonstrated that listeners can categorize the perceived pitch of the vowel /a/ as ordinary, high, or low. Acoustical analysis revealed that the perception of pitch corresponds to certain formant frequencies. Further data with synthesized whispered voice are necessary to confirm in detail the importance of the formant frequencies for the perceived pitch of whispered vowels.

12.
This article introduces a new model that predicts speech intelligibility based on statistical decision theory. This model, which we call the speech recognition sensitivity (SRS) model, aims to predict speech-recognition performance from the long-term average speech spectrum, the masking excitation in the listener's ear, the linguistic entropy of the speech material, and the number of response alternatives available to the listener. A major difference between the SRS model and other models with similar aims, such as the articulation index, is this model's ability to account for synergetic and redundant interactions among spectral bands of speech. In the SRS model, linguistic entropy affects intelligibility by modifying the listener's identification sensitivity to the speech. The effect of the number of response alternatives on the test score is a direct consequence of the model structure. The SRS model also appears to predict the differential effect of linguistic entropy on filter condition and the interaction between linguistic entropy, signal-to-noise ratio, and language proficiency.

13.
Subjective speech intelligibility can be assessed using speech recorded in an anechoic chamber and then convolved with room impulse responses created by acoustic simulation. This auralization-based speech intelligibility (SI) assessment was validated in three rooms. The articulation scores obtained from the simulated sound field were compared with those from the measured sound field and from direct listening in the rooms. Results show that speech intelligibility prediction based on the auralization technique with simulated binaural room impulse responses (BRIRs) agrees with direct listening and with results from measured BRIRs. When the technique is used with simulated or measured monaural room impulse responses (MRIRs), the predicted results underestimate actual intelligibility. It has been shown that the auralization technique with simulated BRIRs is capable of assessing subjective speech intelligibility at listening positions in a room.
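The auralization step itself amounts to convolving anechoic speech with a (simulated or measured) binaural room impulse response and presenting the result over headphones. A minimal sketch of that convolution, assuming the BRIR is available as a two-channel array at the same sample rate as the speech, is shown below; the normalization is just a safeguard against clipping.

```python
# Auralization step: convolve anechoic speech with a binaural room impulse
# response (BRIR) to obtain left/right ear signals. Arrays are assumed to be
# loaded elsewhere at a common sample rate.
import numpy as np
from scipy.signal import fftconvolve

def auralize(anechoic, brir_left, brir_right):
    left = fftconvolve(anechoic, brir_left)
    right = fftconvolve(anechoic, brir_right)
    binaural = np.stack([left, right], axis=-1)
    peak = np.max(np.abs(binaural))
    return binaural / peak if peak > 0 else binaural   # normalize to avoid clipping

# Usage (hypothetical arrays):
# binaural = auralize(speech, brir[:, 0], brir[:, 1])
```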

14.
The purpose of this study was to compare oral pressure (Po), nasal airflow (Vn), and velopharyngeal (VP) orifice area estimates from 12 tracheoesophageal (TE) and 12 laryngeal speakers as they produced /p/ and /m/ in syllable series. The findings were as follows: (1) TE speakers produced greater Po than the laryngeal speakers; (2) for /p/, TE speakers generated Vn and VP orifice area estimates comparable with, or less than, the laryngeal speakers; and (3) for /m/, TE speakers had Vn and VP orifice area estimates greater than the laryngeal speakers. The elevated Po could be the result of several factors such as high source driving pressures and vocal tract volume changes postlaryngectomy. Attempts at more precise articulation, and subsequently less coarticulation, by the TE speakers may explain the Vn and VP orifice area estimates for /p/ and /m/. TE speakers may be limiting the oral-nasal cavity coupling for /p/ (smaller VP gap, less Vn) in an attempt to produce a very precise oral /p/. For /m/, TE speakers may be attempting to overtly mark the consonant as a nasal (greater Vn, larger VP gap). Further studies are needed to confirm/refute the explanations postulated here regarding the VP aerodynamic differences that were identified.

15.
Effects of noise on speech production: acoustic and perceptual analyses
Acoustical analyses were carried out on a set of utterances produced by two male speakers talking in quiet and in 80, 90, and 100 dB SPL of masking noise. In addition to replicating previous studies demonstrating increases in amplitude, duration, and vocal pitch while talking in noise, these analyses also found reliable differences in the formant frequencies and short-term spectra of vowels. Perceptual experiments were also conducted to assess the intelligibility of utterances produced in quiet and in noise when they were presented at equal S/N ratios for identification. In each experiment, utterances originally produced in noise were found to be more intelligible than utterances produced in the quiet. The results of the acoustic analyses showed clear and consistent differences in the acoustic-phonetic characteristics of speech produced in quiet versus noisy environments. Moreover, these acoustic differences produced reliable effects on intelligibility. The findings are discussed in terms of: (1) the nature of the acoustic changes that take place when speakers produce speech under adverse conditions such as noise, psychological stress, or high cognitive load; (2) the role of training and feedback in controlling and modifying a talker's speech to improve the performance of current speech recognizers; and (3) the development of robust algorithms for recognition of speech in noise.
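Presenting utterances "at equal S/N ratios," as in the perceptual experiments above, comes down to an RMS-based scaling of the masking noise relative to each utterance. A minimal sketch of that mixing step, with synthetic signals standing in for the recordings, is:

```python
# Mix an utterance with masking noise at a prescribed signal-to-noise ratio,
# so utterances recorded in quiet and in noise can be compared at equal SNR.
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the mixture has the requested SNR in dB."""
    noise = noise[:len(speech)]                          # trim noise to utterance length
    rms_s = np.sqrt(np.mean(speech ** 2))
    rms_n = np.sqrt(np.mean(noise ** 2))
    target_rms_n = rms_s / (10 ** (snr_db / 20.0))       # desired noise RMS
    return speech + noise * (target_rms_n / rms_n)

# Example with synthetic signals at 0 dB SNR:
rng = np.random.default_rng(0)
speech = np.sin(2 * np.pi * 200 * np.arange(16000) / 16000)
mix = mix_at_snr(speech, rng.standard_normal(16000), snr_db=0.0)
```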

16.
Synthesis (carrier) signals in acoustic models embody assumptions about perception of auditory electric stimulation. This study compared speech intelligibility of consonants and vowels processed through a set of nine acoustic models that used Spectral Peak (SPEAK) and Advanced Combination Encoder (ACE)-like speech processing, using synthesis signals which were representative of signals used previously in acoustic models as well as two new ones. Performance of the synthesis signals was determined in terms of correspondence with cochlear implant (CI) listener results for 12 attributes of phoneme perception (consonant and vowel recognition; F1, F2, and duration information transmission for vowels; voicing, manner, place of articulation, affrication, burst, nasality, and amplitude envelope information transmission for consonants) using four measures of performance. Modulated synthesis signals produced the best correspondence with CI consonant intelligibility, while sinusoids, narrow noise bands, and varying noise bands produced the best correspondence with CI vowel intelligibility. The signals that performed best overall (in terms of correspondence with both vowel and consonant attributes) were modulated and unmodulated noise bands of varying bandwidth that corresponded to a linearly varying excitation width of 0.4 mm at the apical to 8 mm at the basal channels.
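Acoustic models of this kind are, in essence, channel vocoders: the speech is split into analysis bands, each band's envelope modulates a synthesis (carrier) signal such as a noise band or sinusoid, and the channels are summed. The compact noise-band vocoder below illustrates that structure only; the band edges, envelope cutoff, and channel count are generic choices and not the SPEAK/ACE-like processing or the specific carriers compared in the study.

```python
# Minimal noise-band channel vocoder: band-pass analysis, envelope extraction,
# noise-carrier modulation, and summation. Generic parameters for illustration;
# not the SPEAK/ACE-like processing or carriers used in the study.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_edges(n_channels, lo=200.0, hi=7000.0):
    return np.geomspace(lo, hi, n_channels + 1)          # log-spaced band edges

def noise_vocoder(speech, fs, n_channels=8, env_cutoff=160.0):
    speech = np.asarray(speech, dtype=float)
    rng = np.random.default_rng(0)
    edges = band_edges(n_channels)
    env_sos = butter(2, env_cutoff, btype="low", fs=fs, output="sos")
    out = np.zeros_like(speech)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, speech)             # analysis band
        env = sosfiltfilt(env_sos, np.abs(band))         # rectify + low-pass envelope
        carrier = sosfiltfilt(band_sos, rng.standard_normal(len(speech)))
        out += np.clip(env, 0, None) * carrier           # envelope-modulated noise carrier
    return out / np.max(np.abs(out))

# Usage (hypothetical speech array at 16 kHz):
# vocoded = noise_vocoder(speech, 16000)
```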

17.
The objectives of this prospective and exploratory study are to determine: (1) naïve listener preference for gender in tracheoesophageal (TE) speech when speech severity is controlled; (2) the accuracy of identifying TE speaker gender; (3) the effects of gender identification on judgments of speech acceptability (ACC) and naturalness (NAT); and (4) the acoustic basis of ACC and NAT judgments. Six male and six female adult TE speakers were matched for speech severity. Twenty naïve listeners made auditory-perceptual judgments of speech samples in three listening sessions. First, listeners performed preference judgments using a paired comparison paradigm. Second, listeners made judgments of speaker gender, speech ACC, and NAT using rating scales. Last, listeners made ACC and NAT judgments when speaker gender was provided coincidentally. Duration, frequency, and spectral measures were performed. No significant differences were found for preference of male or female speakers. All male speakers were accurately identified, but only two of six female speakers were accurately identified. Significant interactions were found between gender and listening condition (gender known) for NAT and ACC judgments. Males were judged more natural when gender was known; female speakers were judged less natural and less acceptable when gender was known. Regression analyses revealed that judgments of female speakers were best predicted with duration measures when gender was unknown, but with spectral measures when gender was known; judgments of males were best predicted with spectral measures. Naïve listeners have difficulty identifying the gender of female TE speakers. Listeners show no preference for speaker gender, but when gender is known, female speakers are least acceptable and natural. The nature of the perceptual task may affect the acoustic basis of listener judgments.

18.
In a field environment with low background noise, a small closed-set Mandarin (initial-consonant) intelligibility test was used to compare the intelligibility achieved with four gas masks. The results confirmed that, compared with not wearing a mask, wearing a mask severely reduced intelligibility scores, and intelligibility deteriorated further as the communication distance increased. Taking a 75% intelligibility score as the acceptable limit for communication performance, the effective communication distances without a mask and with the four masks were 63.6, 15.7, 18.6, 25.0, and 26.9 m, respectively. In addition, combining these results with measurements of the sound transmission characteristics of the four masks, the paper analyzes the validity of the intelligibility test method and its results.

19.
The interlanguage speech intelligibility benefit
This study investigated how native language background influences the intelligibility of speech by non-native talkers for non-native listeners from either the same or a different native language background as the talker. Native talkers of Chinese (n = 2), Korean (n = 2), and English (n = 1) were recorded reading simple English sentences. Native listeners of English (n = 21), Chinese (n = 21), Korean (n = 10), and a mixed group from various native language backgrounds (n = 12) then performed a sentence recognition task with the recordings from the five talkers. Results showed that for native English listeners, the native English talker was most intelligible. However, for non-native listeners, speech from a relatively high proficiency non-native talker from the same native language background was as intelligible as speech from a native talker, giving rise to the "matched interlanguage speech intelligibility benefit." Furthermore, this interlanguage intelligibility benefit extended to the situation where the non-native talker and listeners came from different language backgrounds, giving rise to the "mismatched interlanguage speech intelligibility benefit." These findings shed light on the nature of the talker-listener interaction during speech communication.

20.
Annoyance ratings in speech intelligibility tests at 45 dB(A) and 55 dB(A) traffic noise were investigated in a laboratory study. Subjects were chosen according to their hearing acuity to be representative of 70-year-old men and women, and of noise-induced hearing losses typical for a great number of industrial workers. These groups were compared with normal hearing subjects of the same sex and, when possible, the same age. The subjects rated their annoyance on an open 100 mm scale. Significant correlations were found between annoyance expressed in millimetres and speech intelligibility in percent when all subjects were taken as one sample. Speech intelligibility was also calculated from physical measurements of speech and noise by using the articulation index method. Observed and calculated speech intelligibility scores are compared and discussed. Also treated is the estimation of annoyance by traffic noise at moderate noise levels via speech intelligibility scores.
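The articulation-index calculation used in this entry is, at its core, a clipped and weighted average of band signal-to-noise ratios derived from physical measurements of speech and noise. A simplified version is sketched below; the uniform band weights are placeholders for the standardized importance functions, and the ±15 dB clipping range follows the common SII-style formulation rather than any particular edition of the standard.

```python
# Simplified articulation-index-style calculation: per-band SNR, clipped to
# +/-15 dB, mapped to [0, 1], and combined with band-importance weights.
# Uniform weights are placeholders for the standardized importance functions.
import numpy as np

def articulation_index(speech_band_levels_db, noise_band_levels_db, weights=None):
    speech = np.asarray(speech_band_levels_db, dtype=float)
    noise = np.asarray(noise_band_levels_db, dtype=float)
    if weights is None:
        weights = np.full(len(speech), 1.0 / len(speech))   # placeholder weights
    snr = np.clip(speech - noise, -15.0, 15.0)              # band SNR, clipped
    audibility = (snr + 15.0) / 30.0                        # map to [0, 1]
    return float(np.sum(weights * audibility))

# Example: speech a few dB above traffic noise in six hypothetical bands.
speech_db = np.array([62, 60, 58, 55, 52, 48])
noise_db = np.array([57, 56, 55, 52, 50, 47])
print(f"AI-style score: {articulation_index(speech_db, noise_db):.2f}")
```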
