Similar articles
 20 similar articles found (search time: 93 ms)
1.
This research was designed to investigate the effects of selected vocal disguises upon speaker identification by listening. The experiment consisted of 360 pair discriminations presented in a fixed-sequence mode. The listeners were asked to decide whether two sentences were uttered by the same or different speakers and to rate their degree of confidence in each decision. The speakers produced two sentence sets utilizing their normal speaking mode and five selected disguises. One member of each stimulus pair in the listening task was always an undisguised speech sample; the other member was either disguised or undisguised. Two listener groups were trained for the task: a naive group of 24 undergraduate students, and a sophisticated group of three doctoral students and three professors of Speech and Hearing Sciences. Both groups of listeners were able to discriminate speakers with a moderately high degree of accuracy (92% correct) when both members of the stimulus pair were undisguised. The inclusion of a disguised speech sample in the stimulus pair significantly interfered with listener performance (59%-81% correct depending upon the particular disguise). These results present a similar pattern to these authors' previous results utilizing spectrographic speaker-identification tasks (Reich et al., 1976).

2.
How are listeners able to identify whether the pitch of a brief isolated sample of an unknown voice is high or low in the overall pitch range of that speaker? Does the speaker's voice quality convey crucial information about pitch level? Results and statistical models of two experiments that provide answers to these questions are presented. First, listeners rated the pitch levels of vowels taken over the full pitch ranges of male and female speakers. The absolute f0 of the samples was by far the most important determinant of listeners' ratings, but with some effect of the sex of the speaker. Acoustic measures of voice quality had only a very small effect on these ratings. This result suggests that listeners have expectations about f0s for average speakers of each sex, and judge voice samples against such expectations. Second, listeners judged speaker sex for the same speech samples. Again, absolute f0 was the most important determinant of listeners' judgments, but now voice quality measures also played a role. Thus it seems that pitch level judgments depend on voice quality mostly indirectly, through its information about sex. Absolute f0 is the most important information for deciding both pitch level and speaker sex.

3.
This study evaluated the effects of time compression and expansion on sentence recognition by normal-hearing (NH) listeners and cochlear-implant (CI) recipients of the Nucleus-22 device. Sentence recognition was measured in five CI users using custom 4-channel continuous interleaved sampler (CIS) processors and five NH listeners using either 4-channel or 32-channel noise-band processors. For NH listeners, recognition was largely unaffected by time expansion, regardless of spectral resolution. However, recognition of time-compressed speech varied significantly with spectral resolution. When fine spectral resolution (32 channels) was available, speech recognition was unaffected even when the duration of sentences was shortened to 40% of their original length (equivalent to a mean duration of 40 ms/phoneme). However, a mean duration of 60 ms/phoneme was required to achieve the same level of recognition when only coarse spectral resolution (4 channels) was available. Recognition patterns were highly variable across CI listeners. The best CI listener performed as well as NH subjects listening to corresponding spectral conditions; however, three out of five CI listeners performed significantly poorer in recognizing time-compressed speech. Further investigation revealed that these three poorer-performing CI users also had more difficulty with simple temporal gap-detection tasks. The results indicate that limited spectral resolution reduces the ability to recognize time-compressed speech. Some CI listeners have more difficulty with time-compressed speech, as produced by rapid speakers, because of reduced spectral resolution and deficits in auditory temporal processing.

4.
OBJECTIVES/HYPOTHESIS: The purpose of this study was (1) to determine whether changes in intra- and interrater reliability occur for inexperienced listeners' judgments of overall severity, roughness, and breathiness in dysphonic and normal speakers after 2 hours of listener training; and (2) to determine the acoustic bases of inexperienced listeners' judgments before and after training. STUDY DESIGN: Prospective, single group, pre- and postdesign. METHODS: Thirty adult dysphonic and six normal speaker samples were selected from a database. Samples included 21 test stimuli and 15 training stimuli of both sustained vowels and connected speech. Sixteen inexperienced listeners judged all samples for overall severity, roughness, and breathiness using visual analog scales. Each listener provided pretraining ratings at baseline. Listeners were then trained using 15 anchor voice samples and 15 training stimuli. During training, listeners were provided with definitions of rating dimensions, accuracy feedback, and anchor samples. Listeners then judged test stimuli in a posttraining session. Speaker samples also were analyzed acoustically. RESULTS: Intrarater reliability was least variable for judgments of overall severity, but improved further with training. Listener judgments of roughness and breathiness in vowels were least reliable at baseline, but they significantly improved between listeners after training. Finally, measures of cepstral peak prominence significantly predicted all voice quality judgments except roughness in vowels, which was predicted by shimmer. The acoustic bases of group perceptual judgments did not seem to change with training. CONCLUSIONS: These findings have implications for developing training programs in perceptual evaluation and mapping relationships between acoustic and perceptual characteristics of voice disorders.

5.
Journal of Voice, 2020, 34(5): 806.e7-806.e18
There is a high prevalence of dysphonia among professional voice users and the impact of the disordered voice on the speaker is well documented. However, there is minimal research on the impact of the disordered voice on the listener. Considering that professional voice users include teachers and air-traffic controllers, among others, it is imperative to determine the impact of a disordered voice on the listener. To address this, the objectives of the current study included: (1) determine whether there are differences in speech intelligibility between individuals with healthy voices and those with dysphonia; (2) understand whether cognitive-perceptual strategies increase speech intelligibility for dysphonic speakers; and (3) determine the relationship between subjective voice quality ratings and speech intelligibility. Sentence stimuli were recorded from 12 speakers with dysphonia and four age- and gender-matched typical, healthy speakers and presented to 129 healthy listeners divided into one of three strategy groups (i.e., control, acknowledgement, and listener strategies). Four expert raters also completed a perceptual voice assessment using the Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) for each speaker. Results indicated that dysphonic voices were significantly less intelligible than healthy voices (P < 0.001) and the use of cognitive-perceptual strategies provided to the listener did not significantly improve speech intelligibility scores (P = 0.602). Using the subjective voice quality ratings, regression analysis found that breathiness was able to predict 41% of the variance associated with number of errors (P = 0.008). Overall results of the study suggest that speakers with dysphonia demonstrate reduced speech intelligibility and that providing the listener with specific strategies may not result in improved intelligibility.

6.
This study aimed to clarify the basic auditory and cognitive processes that affect listeners' performance on two spatial listening tasks: sound localization and speech recognition in spatially complex, multi-talker situations. Twenty-three elderly listeners with mild-to-moderate sensorineural hearing impairments were tested on the two spatial listening tasks, a measure of monaural spectral ripple discrimination, a measure of binaural temporal fine structure (TFS) sensitivity, and two (visual) cognitive measures indexing working memory and attention. All auditory test stimuli were spectrally shaped to restore (partial) audibility for each listener on each listening task. Eight younger normal-hearing listeners served as a control group. Data analyses revealed that the chosen auditory and cognitive measures could predict neither sound localization accuracy nor speech recognition when the target and maskers were separated along the front-back dimension. When the competing talkers were separated along the left-right dimension, however, speech recognition performance was significantly correlated with the attentional measure. Furthermore, supplementary analyses indicated additional effects of binaural TFS sensitivity and average low-frequency hearing thresholds. Altogether, these results are in support of the notion that both bottom-up and top-down deficits are responsible for the impaired functioning of elderly hearing-impaired listeners in cocktail party-like situations.

7.
Quantifying the intelligibility of speech in noise for non-native listeners
When listening to languages learned at a later age, speech intelligibility is generally lower than when listening to one's native language. The main purpose of this study is to quantify speech intelligibility in noise for specific populations of non-native listeners, only broadly addressing the underlying perceptual and linguistic processing. An easy method is sought to extend these quantitative findings to other listener populations. Dutch subjects listening to German and English speech, ranging from reasonable to excellent proficiency in these languages, were found to require a 1-7 dB better speech-to-noise ratio to obtain 50% sentence intelligibility than native listeners. Also, the psychometric function for sentence recognition in noise was found to be shallower for non-native than for native listeners (worst-case slope around the 50% point of 7.5%/dB, compared to 12.6%/dB for native listeners). Differences between native and non-native speech intelligibility are largely predicted by linguistic entropy estimates as derived from a letter guessing task. Less effective use of context effects (especially semantic redundancy) explains the reduced speech intelligibility for non-native listeners. While measuring speech intelligibility for many different populations of listeners (languages, linguistic experience) may be prohibitively time consuming, obtaining predictions of non-native intelligibility from linguistic entropy may help to extend the results of this study to other listener populations.
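The psychometric function described above (percent sentences correct versus speech-to-noise ratio, characterized by its slope at the 50% point) is commonly modeled as a logistic in SNR. The sketch below illustrates how the reported slopes translate into predicted scores; the function form, SRT values, and parameter names are illustrative assumptions, not data or code from the study.

```python
import math

def psychometric(snr_db, srt_db, slope_at_midpoint):
    """Logistic psychometric function for sentence recognition in noise.

    slope_at_midpoint is proportion correct per dB at the 50% point;
    for a logistic, the midpoint slope equals k/4, so k = 4 * slope.
    """
    k = 4.0 * slope_at_midpoint
    return 1.0 / (1.0 + math.exp(-k * (snr_db - srt_db)))

# Illustrative comparison using the slopes reported in the abstract
# (12.6 %/dB native, 7.5 %/dB non-native worst case); the SRT offsets
# are hypothetical within the reported 1-7 dB range.
native = psychometric(0.0, srt_db=-2.0, slope_at_midpoint=0.126)
nonnative = psychometric(0.0, srt_db=3.0, slope_at_midpoint=0.075)
```

A shallower slope means each extra dB of SNR buys less intelligibility, which is why a fixed SRT shift understates the non-native disadvantage away from the 50% point.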

8.
Speech-reception thresholds (SRT) were measured for 17 normal-hearing and 17 hearing-impaired listeners in conditions simulating free-field situations with between one and six interfering talkers. The stimuli, speech and noise with identical long-term average spectra, were recorded with a KEMAR manikin in an anechoic room and presented to the subjects through headphones. The noise was modulated using the envelope fluctuations of the speech. Several conditions were simulated with the speaker always in front of the listener and the maskers either also in front, or positioned in a symmetrical or asymmetrical configuration around the listener. Results show that the hearing impaired have significantly poorer performance than the normal hearing in all conditions. The mean SRT differences between the groups range from 4.2-10 dB. It appears that the modulations in the masker act as an important cue for the normal-hearing listeners, who experience up to 5-dB release from masking, while being hardly beneficial for the hearing-impaired listeners. The gain occurring when maskers are moved from the frontal position to positions around the listener varies from 1.5 to 8 dB for the normal hearing, and from 1 to 6.5 dB for the hearing impaired. It depends strongly on the number of maskers and their positions, but less on hearing impairment. The difference between the SRTs for binaural and best-ear listening (the "cocktail party effect") is approximately 3 dB in all conditions for both the normal-hearing and the hearing-impaired listeners.

9.
To identify a speaker's sex, listeners may rely on sex-based differences in average fundamental frequency (F0), but overlap in male and female F0 ranges undermines such judgments. To test accuracy of sex-identification throughout the F0 range, listeners were asked to judge sex based on audio recordings of /ɑ/ spoken on a number of overlapping steady F0s by 10 male and 10 female English speakers. In general, listeners performed above chance (71.6% correct). However, near range extrema, listeners followed an apparent bias toward hearing high F0s as female and low as male; confidence was high when accuracy was high and vice versa. At mid-range, listeners identified sex fairly accurately but were not very confident in their judgments. In a forced-choice task, vowels close in F0 (but beyond the difference limen) were presented in male-female or female-male pairs. Listeners weakly identified speaker sex (63.3% correct). Identification of the male voice was considerably above chance only when the male had the lower F0 of the pair. Reliance on stereotypes of speaking F0 may bias listeners to hear low F0s as male and high F0s as female, perhaps with a contribution from vocal-tract length information. No strong evidence for a contribution of voice quality was obtained.

10.
Previous work has shown that the intelligibility of speech in noise is degraded if the speaker and listener differ in accent, in particular when there is a disparity between native (L1) and nonnative (L2) accents. This study investigated how this talker-listener interaction is modulated by L2 experience and accent similarity. L1 Southern British English, L1 French listeners with varying L2 English experience, and French-English bilinguals were tested on the recognition of English sentences mixed in speech-shaped noise that was spoken with a range of accents (French, Korean, Northern Irish, and Southern British English). The results demonstrated clear interactions of accent and experience, with the least experienced French speakers being most accurate with French-accented English, but more experienced listeners being most accurate with L1 Southern British English accents. An acoustic similarity metric was applied to the speech productions of the talkers and the listeners, and significant correlations were obtained between accent similarity and sentence intelligibility for pairs of individuals. Overall, the results suggest that L2 experience affects talker-listener accent interactions, altering both the intelligibility of different accents and the selectivity of accent processing.

11.
In this study, the effect of articulation rate and speaking style on the perceived speech rate is investigated. The articulation rate is measured both in terms of the intended phones, i.e., phones present in the assumed canonical form, and as the number of actual, realized phones per second. The combination of these measures reflects the deletion of phones, which is related to speaking style. The effect of the two rate measures on the perceived speech rate is compared in two listening experiments on the basis of a set of intonation phrases with carefully balanced intended and realized phone rates, selected from a German database of spontaneous speech. Because the balance between input-oriented (effort) and output-oriented (communicative) constraints may be different at fast versus slow speech rates, the effect of articulation rate is compared both for fast and for slow phrases from the database. The effect of the listeners' own speaking habits is also investigated to evaluate if listeners' perception is based on a projection of their own behavior as a speaker. It is shown that listener judgments reflect both the intended and realized phone rates, and that their judgments are independent of the constraint balance and their own speaking habits.

12.
The purpose of this study was to determine the accuracy with which listeners could identify the gender of a speaker from a synthesized isolated vowel based on the natural production of that speaker when (1) the fundamental frequency was consistent with the speaker's gender, (2) the fundamental frequency was inconsistent with the speaker's gender, and (3) the speaker was transgendered. Ten male-to-female transgendered persons, 10 men and 10 women, served as subjects. Each speaker produced the vowels /i/, /u/, and //. These vowels were analyzed for fundamental frequency and the first three formant frequencies and bandwidths. Formant frequency and bandwidth information was used to synthesize two vowel tokens for each speaker, one at a fundamental frequency of 120 Hz and one at 240 Hz. Listeners were asked to listen to these tokens and determine whether the original speaker was male or female. Listeners were not aware of the use of transgendered speakers. Results showed that, in all cases, gender identifications were based on fundamental frequency, even when fundamental frequency and formant frequency information was contradictory.

13.
Spectral- and cepstral-based acoustic measures are preferable to time-based measures for accurately representing dysphonic voices during continuous speech. Although these measures show promising relationships to perceptual voice quality ratings, less is known regarding their ability to differentiate normal from dysphonic voice during continuous speech and the consistency of these measures across multiple utterances by the same speaker. The purpose of this study was to determine whether spectral moments of the long-term average spectrum (LTAS) (spectral mean, standard deviation, skewness, and kurtosis) and cepstral peak prominence measures were significantly different for speakers with and without voice disorders when assessed during continuous speech. The consistency of these measures within a speaker across utterances was also addressed. Continuous speech samples from 27 subjects without voice disorders and 27 subjects with mixed voice disorders were acoustically analyzed. In addition, voice samples were perceptually rated for overall severity. Acoustic analyses were performed on three continuous speech stimuli from a reading passage: two full sentences and one constituent phrase. Significant between-group differences were found for both cepstral measures and three LTAS measures (P < 0.001): spectral mean, skewness, and kurtosis. These five measures also showed moderate to strong correlations to overall voice severity. Furthermore, high degrees of within-speaker consistency (correlation coefficients ≥0.89) across utterances with varying length and phonemic content were evidenced for both subject groups.
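The spectral moments of an LTAS named above (spectral mean, standard deviation, skewness, and kurtosis) are standard distribution moments computed over frequency, treating the power spectrum as a weighting. The helper below is a minimal sketch of that computation, not the study's analysis code; the function name and excess-kurtosis convention are assumptions.

```python
import numpy as np

def spectral_moments(freqs_hz, ltas_power):
    """Spectral mean, SD, skewness, and (excess) kurtosis of a
    long-term average spectrum, treating the linear power spectrum
    as a probability distribution over frequency."""
    w = np.asarray(ltas_power, dtype=float)
    w = w / w.sum()                      # normalize to a distribution
    f = np.asarray(freqs_hz, dtype=float)
    mean = np.sum(w * f)                 # 1st moment: spectral mean (Hz)
    var = np.sum(w * (f - mean) ** 2)    # 2nd central moment
    sd = np.sqrt(var)
    skew = np.sum(w * (f - mean) ** 3) / sd ** 3   # 3rd standardized
    kurt = np.sum(w * (f - mean) ** 4) / sd ** 4 - 3.0  # excess kurtosis
    return mean, sd, skew, kurt
```

For a symmetric spectrum the skewness is zero; breathy or dysphonic voices typically shift spectral energy and therefore these moments.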

14.
The purpose of this study was to provide data on the ability of elderly listeners to estimate the age group of women (25-35, 45-55, 70-80) from phonated and whispered vowel productions. Further, comparisons were made between the performance of these elderly listeners and results for young listeners reported previously [S. E. Linville and H. Fisher, J. Acoust. Soc. Am. 78, 40-48 (1985)]. Tape recordings of whispered and normally phonated /ae/ vowels were played to 23 elderly women for relative age judgments. Results suggest that elderly women are not as accurate as young women in estimating age from sustained vowel productions, although the two listener groups tend to categorize individual speakers similarly. Further, it appears that listener age is a factor in acoustic cues used in making age judgments.

15.
The effect of ambient noise on vocal output and the preferred listening level of conversational speech was investigated under conditions typical of everyday speech communication. For a speaker-listener distance of 1 m, vocal output and the preferred listening level in quiet were both about 50 dB(A). Deviations from this value were observed when the noise level exceeded a level of about 40 dB(A). The regression lines for the data points above this level showed a 3 dB rise for a 10 dB rise in noise level. The experiments further suggest that both speaker and listener (when the latter is able to control the playback level of recorded speech) try to compensate for the noise interference by raising the level of speech in order to keep the (subjective) loudness of speech in noise equal to the loudness of speech in quiet.
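The regression result above amounts to a simple piecewise-linear rule: about 50 dB(A) in quiet, rising 3 dB for every 10 dB of noise above roughly 40 dB(A). The sketch below encodes that rule for illustration; the function name and default parameters are assumptions fitted to the numbers in the abstract, not the study's model.

```python
def vocal_output_dba(noise_dba, quiet_level=50.0, knee=40.0, slope=0.3):
    """Piecewise-linear sketch of the Lombard-type relation reported:
    ~50 dB(A) vocal output in quiet, +3 dB per 10 dB of ambient noise
    above a ~40 dB(A) knee point."""
    return quiet_level + max(0.0, noise_dba - knee) * slope

# Example: in 60 dB(A) noise, predicted vocal output is 56 dB(A).
```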

16.
The effect of talker and token variability on speech perception has engendered a great deal of research. However, most of this research has compared listener performance in multiple-talker (or variable) situations to performance in single-talker conditions. It remains unclear to what extent listeners are affected by the degree of variability within a talker, rather than simply the existence of variability (being in a multitalker environment). The present study has two goals: First, the degree of variability among speakers in their /s/ and /ʃ/ productions was measured. Even among a relatively small pool of talkers, there was a range of speech variability: some talkers had /s/ and /ʃ/ categories that were quite distinct from one another in terms of frication centroid and skewness, while other speakers had categories that actually overlapped one another. The second goal was to examine whether this degree of variability within a talker influenced perception. Listeners were presented with natural /s/ and /ʃ/ tokens for identification, under ideal listening conditions, and slower response times were found for speakers whose productions were more variable than for speakers with more internal consistency in their speech. This suggests that the degree of variability, not just the existence of it, may be the more critical factor in perception.

17.
Speaker variability and noise are two common sources of acoustic variability. The goal of this study was to examine whether these two sources of acoustic variability affected native and non-native perception of Mandarin fricatives to different degrees. Multispeaker Mandarin fricative stimuli were presented to 40 native and 52 non-native listeners in two presentation formats (blocked by speaker and mixed across speakers). The stimuli were also mixed with speech-shaped noise to create five levels of signal-to-noise ratios. The results showed that noise affected non-native identification disproportionately. By contrast, the effect of speaker variability was comparable between the native and non-native listeners. Confusion patterns were interpreted with reference to the results of acoustic analysis, suggesting native and non-native listeners used distinct acoustic cues for fricative identification. It was concluded that not all sources of acoustic variability are treated equally by native and non-native listeners. Whereas noise compromised non-native fricative perception disproportionately, speaker variability did not pose a special challenge to the non-native listeners.

18.
The differences in spectral shape resolution abilities among cochlear implant (CI) listeners, and between CI and normal-hearing (NH) listeners, when listening with the same number of channels (12), were investigated. In addition, the effect of the number of channels on spectral shape resolution was examined. The stimuli were rippled noise signals with various ripple frequency-spacings. An adaptive 4IFC procedure was used to determine the threshold for resolvable ripple spacing, which was the spacing at which an interchange in peak and valley positions could be discriminated. The results showed poorer spectral shape resolution in CI compared to NH listeners (average thresholds of approximately 3000 and 400 Hz, respectively), and wide variability among CI listeners (range of approximately 800 to 8000 Hz). There was a significant relationship between spectral shape resolution and vowel recognition. The spectral shape resolution thresholds of NH listeners increased as the number of channels increased from 1 to 16, while the CI listeners showed a performance plateau at 4-6 channels, which is consistent with previous results using speech recognition measures. These results indicate that this test may provide a measure of CI performance which is time efficient and non-linguistic, and therefore, if verified, may provide a useful contribution to the prediction of speech perception in adults and children who use CIs.

19.
The ability to discriminate changes in the length of vowels and tonal complexes (filled intervals) and in the duration of closure in stop consonants and gaps in tonal complexes (unfilled intervals) was studied in three normal-hearing and seven severely hearing-impaired listeners. The speech stimuli consisted of the vowels (i, I, u, U, a, A) and the consonants (p, t, k), and the tonal complexes consisted of digitally generated sinusoids at 0.5, 1, and 2 kHz. The signals were presented at conversational levels for each listener group, and a 3IFC adaptive procedure was used to estimate difference limens (DLs). The DLs for speech were similar to those for tonal complex stimuli in both the filled and unfilled conditions. Both normal-hearing and hearing-impaired listeners demonstrated greater acuity for changes in the duration of filled than unfilled intervals. Mean thresholds for filled intervals obtained from normal-hearing listeners were smaller than those obtained from hearing-impaired listeners. For unfilled intervals, however, the difference between listener groups was not significant. A few hearing-impaired listeners demonstrated temporal acuity comparable to that of normal-hearing listeners for several listening conditions. Implications of these results are discussed with regard to speech perception in normal-hearing and hearing-impaired individuals.

20.
Spectral weighting strategies using a correlational method [R. A. Lutfi, J. Acoust. Soc. Am. 97, 1333-1334 (1995); V. M. Richards and S. Zhu, J. Acoust. Soc. Am. 95, 423-424 (1994)] were measured in ten listeners with sensorineural-hearing loss on a sentence recognition task. Sentences and a spectrally matched noise were filtered into five separate adjacent spectral bands and presented to listeners at various signal-to-noise ratios (SNRs). Five point-biserial correlations were computed between the listeners' response (correct or incorrect) on the task and the SNR in each band. The stronger the correlation between performance and SNR, the greater that given band was weighted by the listener. Listeners were tested with and without hearing aids on. All listeners were experienced hearing aid users. Results indicated that the highest spectral band (approximately 2800-11 000 Hz) received the greatest weight in both listening conditions. However, the weight on the highest spectral band was less when listeners performed the task with their hearing aids on in comparison to when listening without hearing aids. No direct relationship was observed between the listeners' weights and the sensation level within a given band.
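The correlational method described above reduces to computing a point-biserial correlation between per-trial correctness (0/1) and the randomly varied SNR in each band. The sketch below shows the core computation under stated assumptions: the function name, the normalization of weights to sum to 1, and the synthetic data are illustrative, not the study's code or data.

```python
import numpy as np

def band_weights(correct, band_snrs):
    """Point-biserial correlation between trial correctness and the
    per-band SNR on each trial; one weight per spectral band.

    correct:   (n_trials,) array of 0/1 responses
    band_snrs: (n_trials, n_bands) array of per-band SNRs
    """
    c = np.asarray(correct, dtype=float)
    s = np.asarray(band_snrs, dtype=float)
    # Point-biserial r is just Pearson r with a dichotomous variable.
    r = np.array([np.corrcoef(c, s[:, b])[0, 1] for b in range(s.shape[1])])
    w = np.abs(r)
    return w / w.sum()   # normalize so the weights sum to 1

# Synthetic check: if correctness depends only on band 0's SNR,
# band 0 should receive the largest weight.
rng = np.random.default_rng(0)
snrs = rng.normal(0.0, 1.0, (500, 5))
responses = (snrs[:, 0] > 0).astype(float)
weights = band_weights(responses, snrs)
```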

