Similar Documents
20 similar documents found
1.
The goal of this study was to establish the ability of normal-hearing listeners to discriminate formant frequency in vowels in everyday speech. Vowel formant discrimination in syllables, phrases, and sentences was measured for high-fidelity (nearly natural) speech synthesized by STRAIGHT [Kawahara et al., Speech Commun. 27, 187-207 (1999)]. Thresholds were measured for changes in F1 and F2 for the vowels /ɪ, ɛ, æ, ʌ/ in /bVd/ syllables. Experimental factors manipulated included phonetic context (syllables, phrases, and sentences), sentence discrimination with the addition of an identification task, and word position. Results showed that neither longer phonetic context nor the addition of the identification task significantly affected thresholds, while thresholds for word-final position showed significantly better performance than for either initial or middle position in sentences. Results suggest that an average of 0.37 barks is required for normal-hearing listeners to discriminate vowel formants in modest-length sentences, elevated by 84% compared to isolated vowels. Vowel formant discrimination in several phonetic contexts was slightly elevated for STRAIGHT-synthesized speech compared to formant-synthesized speech stimuli reported in the study by Kewley-Port and Zheng [J. Acoust. Soc. Am. 106, 2945-2958 (1999)]. These elevated thresholds appeared related to greater spectral-temporal variability for high-fidelity speech produced by STRAIGHT than for formant-synthesized speech.
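To make the bark-scale threshold concrete, here is a minimal sketch of converting a formant step in Hz to barks. It uses Traunmüller's (1990) approximation of the bark scale; the abstract does not say which conversion the study used, so that choice is an assumption.

```python
# Hedged sketch: Traunmueller's (1990) bark-scale approximation, assumed
# here; the study's exact conversion formula is not given in the abstract.

def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate (barks) for a frequency in Hz."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def formant_delta_barks(f_ref_hz: float, f_test_hz: float) -> float:
    """Formant-frequency difference expressed on the bark scale."""
    return abs(hz_to_bark(f_test_hz) - hz_to_bark(f_ref_hz))

# Example: how far must F2 = 1800 Hz move to span the 0.37-bark threshold?
f = 1800.0
while formant_delta_barks(1800.0, f) < 0.37:
    f += 1.0
print(f"0.37 barks above 1800 Hz is reached near {f:.0f} Hz")
```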

2.
A frequency importance function for continuous discourse
Normal hearing subjects estimated the intelligibility of continuous discourse (CD) passages spoken by three talkers (two male and one female) under 135 conditions of filtering and signal-to-noise ratio. The relationship between the intelligibility of CD and the articulation index (the transfer function) was different from any found in ANSI S3.5-1969. Also, the lower frequencies were found to be relatively more important for the intelligibility of CD than for identification of nonsense syllables and other types of speech for which data are available except for synthetic sentences [Speaks, J. Speech Hear. Res. 10, 289-298 (1967)]. The frequency which divides the auditory spectrum into two equally important halves (the crossover frequency) was found to be about 0.5 oct lower for the CD used in this study than the crossover frequency for male talkers of nonsense syllables found in ANSI S3.5-1969 and about 0.7 oct lower than the one for combined male and female talkers of nonsense syllables reported by French and Steinberg [J. Acoust. Soc. Am. 19, 90-119 (1947)].
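The crossover frequency described here is simply the point where the cumulative frequency-importance function reaches one half. A minimal sketch of that computation follows; the band centers and importance weights are illustrative placeholders, not the study's measured values.

```python
import numpy as np

# Placeholder band-importance function (weights sum to 1); real weights
# depend on the speech material and are tabulated in ANSI S3.5.
band_centers_hz = np.array([250, 500, 1000, 2000, 4000, 8000])
importance = np.array([0.10, 0.20, 0.25, 0.25, 0.15, 0.05])

# Crossover frequency: where cumulative importance crosses 0.5,
# interpolated on a log-frequency axis.
cumulative = np.cumsum(importance)
log2_crossover = np.interp(0.5, cumulative, np.log2(band_centers_hz))
print(f"crossover ~ {2.0 ** log2_crossover:.0f} Hz")
```

A 0.5-oct downward shift of this point, as reported for continuous discourse, corresponds to dividing the crossover frequency by the square root of two.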

3.
Noise and distortion reduce speech intelligibility and quality in audio devices such as hearing aids. This study investigates the perception and prediction of sound quality by both normal-hearing and hearing-impaired subjects for conditions of noise and distortion related to those found in hearing aids. Stimuli were sentences subjected to three kinds of distortion (additive noise, peak clipping, and center clipping), with eight levels of degradation for each distortion type. The subjects performed paired comparisons for all possible pairs of 24 conditions. A one-dimensional coherence-based metric was used to analyze the quality judgments. This metric was an extension of a speech intelligibility metric presented in Kates and Arehart (2005) [J. Acoust. Soc. Am. 117, 2224-2237] and is based on dividing the speech signal into three amplitude regions, computing the coherence for each region, and then combining the three coherence values across frequency in a calculation based on the speech intelligibility index. The one-dimensional metric accurately predicted the quality judgments of normal-hearing listeners and listeners with mild-to-moderate hearing loss, although some systematic errors were present. A multidimensional analysis indicates that several dimensions are needed to describe the factors used by subjects to judge the effects of the three distortion types.
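The building block of such a metric is the magnitude-squared coherence between the clean input and the degraded output. A minimal sketch of that building block with scipy is below; the three-level amplitude segmentation and the SII-style band weighting of the full metric are deliberately omitted, and the signals are random stand-ins for speech.

```python
import numpy as np
from scipy.signal import coherence

fs = 16000
clean = np.random.randn(fs)              # stand-in for a clean sentence
degraded = np.clip(clean, -0.5, 0.5)     # peak clipping, one of the three
degraded += 0.1 * np.random.randn(fs)    # distortion types, plus noise

# Magnitude-squared coherence between clean and degraded signals.
f, msc = coherence(clean, degraded, fs=fs, nperseg=512)

# Crude single-number summary: unweighted mean across frequency. The
# actual metric computes coherence separately for low/mid/high amplitude
# regions and weights bands as in the speech intelligibility index.
print(f"mean MSC = {msc.mean():.2f}")
```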

4.
The Articulation Index and Speech Intelligibility Index predict intelligibility scores from measurements of speech and hearing parameters. One component in the prediction is the frequency-importance function, a weighting function that characterizes contributions of particular spectral regions of speech to speech intelligibility. The purpose of this study was to determine whether such importance functions could similarly characterize contributions of electrode channels in cochlear implant systems. Thirty-eight subjects with normal hearing listened to vowel-consonant-vowel tokens, either as recorded or as output from vocoders that simulated aspects of cochlear implant processing. Importance functions were measured using the method of Whitmal and DeRoy [J. Acoust. Soc. Am. 130, 4032-4043 (2011)], in which signal bandwidths were varied adaptively to produce specified token recognition scores in accordance with the transformed up-down rules of Levitt [J. Acoust. Soc. Am. 49, 467-477 (1971)]. Psychometric functions constructed from recognition scores were subsequently converted into importance functions. Comparisons of the resulting importance functions indicate that vocoder processing causes peak importance regions to shift downward in frequency. This shift is attributed to changes in strategy and capability for detecting voicing in speech, and is consistent with previously measured data.
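The transformed up-down rules of Levitt (1971) referenced here adaptively track a target percent-correct point. A minimal sketch of one such rule, the 2-down/1-up staircase (which converges on roughly 70.7% correct), with a simulated listener standing in for a real subject:

```python
import random

def two_down_one_up(p_correct_at, start, step, n_trials=200):
    """2-down/1-up staircase: tracks the ~70.7%-correct point.
    p_correct_at(level) gives the simulated listener's hit probability."""
    level, streak, reversals, last_dir = start, 0, [], 0
    for _ in range(n_trials):
        if random.random() < p_correct_at(level):
            streak += 1
            if streak == 2:                 # two correct in a row -> harder
                streak = 0
                if last_dir == +1:
                    reversals.append(level)  # direction change = reversal
                level, last_dir = level - step, -1
        else:                                # one miss -> easier
            streak = 0
            if last_dir == -1:
                reversals.append(level)
            level, last_dir = level + step, +1
    return sum(reversals[-6:]) / len(reversals[-6:])  # mean of last reversals

# Toy psychometric function: wider bandwidth -> higher recognition score.
psychometric = lambda bw: min(0.99, max(0.01, (bw - 1.0) / 4.0))
print(f"estimated 70.7%-correct bandwidth ~ {two_down_one_up(psychometric, 5.0, 0.25):.2f}")
```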

5.
These experiments examined how high presentation levels influence speech recognition for high- and low-frequency stimuli in noise. Normally hearing (NH) and hearing-impaired (HI) listeners were tested. In Experiment 1, high- and low-frequency bandwidths yielding 70%-correct word recognition in quiet were determined at levels associated with broadband speech at 75 dB SPL. In Experiment 2, broadband and band-limited sentences (based on passbands measured in Experiment 1) were presented at this level in speech-shaped noise filtered to the same frequency bandwidths as targets. Noise levels were adjusted to produce approximately 30%-correct word recognition. Frequency bandwidths and signal-to-noise ratios supporting criterion performance in Experiment 2 were tested at 75, 87.5, and 100 dB SPL in Experiment 3. Performance tended to decrease as levels increased. For NH listeners, this "rollover" effect was greater for high-frequency and broadband materials than for low-frequency stimuli. For HI listeners, the 75- to 87.5-dB increase improved signal audibility for high-frequency stimuli and rollover was not observed. However, the 87.5- to 100-dB increase produced qualitatively similar results for both groups: scores decreased most for high-frequency stimuli and least for low-frequency materials. Predictions of speech intelligibility by quantitative methods such as the Speech Intelligibility Index may be improved if rollover effects are modeled as frequency dependent.

6.
The relative importance of different parts of the auditory spectrum to recognition of the Diagnostic Rhyme Test (DRT) and its six speech feature subtests was determined. Three normal hearing subjects were tested twice in each of 70 experimental conditions. The analytical procedures of French and Steinberg [J. Acoust. Soc. Am. 19, 90-119 (1947)] were applied to the data to derive frequency importance functions for each of the DRT subtests and the test as a whole over the frequency range 178-8912 Hz. For the DRT as a whole, the low frequencies were found to be more important than is the case for nonsense syllables. Importance functions for the feature subtests also differed from those for nonsense syllables and from each other as well. These results suggest that test materials loaded with different proportions of particular phonemes have different frequency importance functions. Comparison of the results with those from other studies suggests that importance functions depend to a degree on the available response options as well.

7.
Listeners' ability to understand speech in adverse listening conditions is partially due to the redundant nature of speech. Natural redundancies are often lost or altered when speech is filtered, as is done in AI/SII experiments. It is important to study how listeners recognize speech when the speech signal is unfiltered and the entire broadband spectrum is present. A correlational method [R. A. Lutfi, J. Acoust. Soc. Am. 97, 1333-1334 (1995); V. M. Richards and S. Zhu, J. Acoust. Soc. Am. 95, 423-424 (1994)] has been used to determine how listeners use spectral cues to perceive nonsense syllables when the full speech spectrum is present [K. A. Doherty and C. W. Turner, J. Acoust. Soc. Am. 100, 3769-3773 (1996); C. W. Turner et al., J. Acoust. Soc. Am. 104, 1580-1585 (1998)]. The experiments in this study measured spectral-weighting strategies for more naturally occurring speech stimuli, specifically sentences, using a correlational method for normal-hearing listeners. Results indicate that listeners placed the greatest weight on spectral information within bands 2 and 5 (562-1113 and 2807-11,000 Hz), respectively. Spectral-weighting strategies for sentences were also compared to weighting strategies for nonsense syllables measured in a previous study (C. W. Turner et al., 1998). Spectral-weighting strategies for sentences were different from those reported for nonsense syllables.
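The core of the correlational method is to perturb the level of each band independently from trial to trial and correlate those perturbations with the listener's binary responses; bands that matter correlate with performance. A toy sketch with a simulated listener (the "true" weights below are assumptions for the demo, loosely echoing the dominance of band 2 in the study's results):

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, n_bands = 1000, 5

# Independent level perturbations (dB) added to each band on each trial.
band_jitter = rng.normal(0.0, 2.0, size=(n_trials, n_bands))

# Simulated listener: internal evidence is a weighted sum of band levels
# plus internal noise. These weights are illustrative assumptions.
true_weights = np.array([0.10, 0.50, 0.15, 0.10, 0.15])
evidence = band_jitter @ true_weights + rng.normal(0.0, 1.0, n_trials)
responses = (evidence > 0).astype(float)   # 1 = correct, 0 = incorrect

# Point-biserial correlation per band recovers the relative weights.
weights = np.array([np.corrcoef(band_jitter[:, b], responses)[0, 1]
                    for b in range(n_bands)])
print(np.round(weights / weights.sum(), 2))
```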

8.
Speech can remain intelligible for listeners with normal hearing when processed by narrow bandpass filters that transmit only a small fraction of the audible spectrum. Two experiments investigated the basis for the high intelligibility of narrowband speech. Experiment 1 confirmed reports that everyday English sentences can be recognized accurately (82%-98% words correct) when filtered at center frequencies of 1500, 2100, and 3000 Hz. However, narrowband low predictability (LP) sentences were less accurately recognized than high predictability (HP) sentences (20% lower scores), and excised narrowband words were even less intelligible than LP sentences (a further 23% drop). While experiment 1 revealed similar levels of performance for narrowband and broadband sentences at conversational speech levels, experiment 2 showed that speech reception thresholds were substantially (>30 dB) poorer for narrowband sentences. One explanation for this increased disparity between narrowband and broadband speech at threshold (compared to conversational speech levels) is that spectral components in the sloping transition bands of the filters provide important cues for the recognition of narrowband speech, but these components become inaudible as the signal level is reduced. Experiment 2 also showed that performance was degraded by the introduction of a speech masker (a single competing talker). The elevation in threshold was similar for narrowband and broadband speech (11 dB, on average), but because the narrowband sentences required considerably higher sound levels to reach their thresholds in quiet compared to broadband sentences, their target-to-masker ratios were very different (+23 dB for narrowband sentences and -12 dB for broadband sentences). As in experiment 1, performance was better for HP than LP sentences. The LP-HP difference was larger for narrowband than broadband sentences, suggesting that context provides greater benefits when speech is distorted by narrow bandpass filtering.
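For orientation, a sketch of the kind of narrow bandpass filtering described, here a 1/3-octave-wide band centered at 1500 Hz; the bandwidth and filter order are illustrative assumptions, since the abstract specifies only the center frequencies.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def narrowband(x, fs, fc, width_oct=1/3, order=8):
    """Bandpass x to a band width_oct octaves wide, centered at fc (Hz).
    Bandwidth and order are illustrative, not the study's filters."""
    lo = fc * 2.0 ** (-width_oct / 2)
    hi = fc * 2.0 ** (+width_oct / 2)
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)   # zero-phase filtering

fs = 16000
x = np.random.randn(fs)          # stand-in for a speech waveform
y = narrowband(x, fs, fc=1500)   # one of the study's center frequencies
```

Note that an IIR design like this has finite transition-band slopes; the study's point is precisely that such sloping skirts can carry usable speech cues at conversational levels.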

9.
Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.
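Both global-level properties are easy to estimate from a recording. A hedged sketch: relative long-term spectral energy in the 1-3 kHz band via a Welch spectrum, and a simple envelope modulation index (std/mean of the low-pass intensity envelope). The abstract does not give the paper's exact modulation-depth definition, so the second measure is one plausible proxy among several.

```python
import numpy as np
from scipy.signal import welch, butter, sosfiltfilt, hilbert

def band_energy_ratio(x, fs, lo=1000.0, hi=3000.0):
    """Fraction of long-term spectral energy between lo and hi Hz."""
    f, pxx = welch(x, fs=fs, nperseg=1024)
    band = (f >= lo) & (f <= hi)
    return pxx[band].sum() / pxx.sum()

def modulation_index(x, fs, env_cutoff=32.0):
    """Crude envelope modulation index: std/mean of the low-pass Hilbert
    envelope. One of several possible definitions, assumed here."""
    env = np.abs(hilbert(x))
    sos = butter(4, env_cutoff, btype="low", fs=fs, output="sos")
    env = sosfiltfilt(sos, env)
    return env.std() / env.mean()

fs = 16000
x = np.random.randn(4 * fs)   # stand-in for a clear/normal sentence
print(band_energy_ratio(x, fs), modulation_index(x, fs))
```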

10.
Listeners' auditory discrimination of vowel sounds depends in part on the order in which stimuli are presented. Such presentation order effects have been argued to be language independent, and to result from psychophysical (not speech- or language-specific) factors such as the decay of memory traces over time or increased weighting of later-occurring stimuli. In the present study, native Cantonese speakers' discrimination of a linguistic tone continuum is shown to exhibit order of presentation effects similar to those shown for vowels in previous studies. When presented with two successive syllables differing in fundamental frequency by approximately 4 Hz, listeners were significantly more sensitive to this difference when the first syllable was higher in frequency than the second. However, American English-speaking listeners with no experience listening to Cantonese showed no such contrast effect when tested in the same manner using the same stimuli. Neither English nor Cantonese listeners showed any order of presentation effects in the discrimination of a nonspeech continuum in which tokens had the same fundamental frequencies as the Cantonese speech tokens but had a qualitatively non-speech-like timbre. These results suggest that tone presentation order effects, unlike vowel effects, may be language specific, possibly resulting from the need to compensate for utterance-related pitch declination when evaluating fundamental frequency for tone identification.

11.
The ability to recognize spoken words interrupted by silence was investigated with young normal-hearing listeners and older listeners with and without hearing impairment. Target words from the revised SPIN test by Bilger et al. [J. Speech Hear. Res. 27(1), 32-48 (1984)] were presented in isolation and in the original sentence context using a range of interruption patterns in which portions of speech were replaced with silence. The number of auditory "glimpses" of speech and the glimpse proportion (total duration glimpsed/word duration) were varied using a subset of the SPIN target words that ranged in duration from 300 to 600 ms. The words were presented in isolation, in the context of low-predictability (LP) sentences, and in high-predictability (HP) sentences. The glimpse proportion was found to have a strong influence on word recognition, with relatively little influence of the number of glimpses, glimpse duration, or glimpse rate. Although older listeners tended to recognize fewer interrupted words, there was considerable overlap in recognition scores across listener groups in all conditions, and all groups were affected by interruption parameters and context in much the same way.
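The glimpse statistics defined here are simple arithmetic over the interruption pattern. A minimal sketch, given a boolean mask marking which samples of a word are audible:

```python
import numpy as np

def glimpse_stats(mask: np.ndarray):
    """mask: True where speech is audible (glimpsed), False where silent.
    Returns (number of glimpses, glimpse proportion)."""
    proportion = mask.mean()   # total duration glimpsed / word duration
    # Count rising edges (False -> True) to get discrete glimpses.
    edges = np.flatnonzero(np.diff(mask.astype(int)) == 1)
    n_glimpses = len(edges) + int(mask[0])
    return n_glimpses, proportion

fs = 16000
mask = np.zeros(int(0.4 * fs), dtype=bool)   # a 400-ms word
mask[0:1600] = mask[3200:4800] = True        # two 100-ms glimpses
print(glimpse_stats(mask))                   # -> (2, 0.5)
```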

12.
There is a need, both for speech theory and for many practical applications, to know the intelligibilities of individual passbands that span the speech spectrum when they are heard singly and in combination. While indirect procedures have been employed for estimating passband intelligibilities (e.g., the Speech Intelligibility Index), direct measurements have been blocked by the confounding contributions from transition band slopes that accompany filtering. A recent study has reported that slopes of several thousand dB/octave produced by high-order finite impulse response filtering were required to produce the effectively rectangular bands necessary to eliminate appreciable contributions from transition bands [Warren et al., J. Acoust. Soc. Am. 115, 1292-1295 (2004)]. Using such essentially vertical slopes, the present study employed sentences, and reports the intelligibilities of their six 1-octave contiguous passbands having center frequencies from 0.25 to 8 kHz when heard alone, and for each of their 15 possible pairings.
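Effectively rectangular bands of this kind come from very high-order linear-phase FIR designs, in contrast to the sloping skirts of ordinary filters. An illustrative sketch; the tap count is an arbitrary assumption chosen only to make the transition bands steep.

```python
import numpy as np
from scipy.signal import firwin, fftconvolve

fs = 32000
numtaps = 8001   # very high order (odd for a bandpass design)
                 # -> near-vertical transition bands

# One-octave passband centered at 1 kHz (707-1414 Hz), linear phase.
taps = firwin(numtaps, [707.0, 1414.0], pass_zero="bandpass", fs=fs)

x = np.random.randn(2 * fs)            # stand-in for a sentence
y = fftconvolve(x, taps, mode="same")  # effectively rectangular band
```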

13.
The "cocktail party problem" was studied using virtual stimuli whose spatial locations were generated using anechoic head-related impulse responses from the AUDIS database [Blauert et al., J. Acoust. Soc. Am. 103, 3082 (1998)]. Speech reception thresholds (SRTs) were measured for Harvard IEEE sentences presented from the front in the presence of one, two, or three interfering sources. Four types of interferer were used: (1) other sentences spoken by the same talker, (2) time-reversed sentences of the same talker, (3) speech-spectrum shaped noise, and (4) speech-spectrum shaped noise, modulated by the temporal envelope of the sentences. Each interferer was matched to the spectrum of the target talker. Interferers were placed in several spatial configurations, either coincident with or separated from the target. Binaural advantage was derived by subtracting SRTs from listening with the "better monaural ear" from those for binaural listening. For a single interferer, there was a binaural advantage of 2-4 dB for all interferer types. For two or three interferers, the advantage was 2-4 dB for noise and speech-modulated noise, and 6-7 dB for speech and time-reversed speech. These data suggest that the benefit of binaural hearing for speech intelligibility is especially pronounced when there are multiple voiced interferers at different locations from the target, regardless of spatial configuration; measurements with fewer or with other types of interferers can underestimate this benefit.  相似文献   

14.
Speech perception in the presence of another competing voice is one of the most challenging tasks for cochlear implant users. Several studies have shown that (1) the fundamental frequency (F0) is a useful cue for segregating competing speech sounds and (2) the F0 is better represented by the temporal fine structure than by the temporal envelope. However, current cochlear implant speech processing algorithms emphasize temporal envelope information and discard the temporal fine structure. In this study, speech recognition was measured as a function of the F0 separation of the target and competing sentence in normal-hearing and cochlear implant listeners. For the normal-hearing listeners, the combined sentences were processed through either a standard implant simulation or a new algorithm which additionally extracts a slowed-down version of the temporal fine structure (called Frequency-Amplitude-Modulation-Encoding). The results showed no benefit of increasing F0 separation for the cochlear implant or simulation groups. In contrast, the new algorithm resulted in gradual improvements with increasing F0 separation, similar to that found with unprocessed sentences. These results emphasize the importance of temporal fine structure for speech perception and demonstrate a potential remedy for difficulty in the perceptual segregation of competing speech sounds.
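The envelope/fine-structure distinction at issue is the standard Hilbert decomposition. A sketch of that decomposition for one analysis band follows; this is not the paper's Frequency-Amplitude-Modulation-Encoding algorithm itself, which additionally slows the fine structure's frequency modulation before re-encoding it.

```python
import numpy as np
from scipy.signal import hilbert, butter, sosfiltfilt

fs = 16000
x = np.random.randn(fs)   # stand-in for one bandpass-filtered channel

analytic = hilbert(x)
envelope = np.abs(analytic)                  # temporal envelope (kept by
                                             # conventional CI processing)
fine_structure = np.cos(np.angle(analytic))  # temporal fine structure
                                             # (normally discarded)

# Conventional envelope-only vocoding multiplies a smoothed envelope by
# a fixed carrier; resynthesizing with the fine structure instead keeps
# the F0 cue this abstract argues is important.
sos = butter(4, 400.0, btype="low", fs=fs, output="sos")
smoothed_env = sosfiltfilt(sos, envelope)
channel_out = smoothed_env * fine_structure
```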

15.
The speech signal contains many acoustic properties that may contribute differently to spoken word recognition. Previous studies have demonstrated that the importance of properties present during consonants or vowels is dependent upon the linguistic context (i.e., words versus sentences). The current study investigated three potentially informative acoustic properties that are present during consonants and vowels for monosyllabic words and sentences. Natural variations in fundamental frequency were either flattened or removed. The speech envelope and temporal fine structure were also investigated by limiting the availability of these cues via noisy signal extraction. Thus, this study investigated the contribution of these acoustic properties, present during either consonants or vowels, to overall word and sentence intelligibility. Results demonstrated that all processing conditions displayed better performance for vowel-only sentences, and this vowel advantage remained even when dynamic fundamental-frequency cues were removed. Word and sentence comparisons suggest that speech information transmitted by the envelope is at least partially responsible for the greater vowel contributions in sentences, but it is not predictive for isolated words.

16.
This experiment measured the capability of hearing-impaired individuals to discriminate differences in the cues to the distance of spoken sentences. The stimuli were generated synthetically, using a room-image procedure to calculate the direct sound and first 74 reflections for a source placed in a 7 x 9 m room, and then presenting each of those sounds individually through a circular array of 24 loudspeakers. Seventy-seven listeners participated, aged 22-83 years and with hearing levels from -5 to 59 dB HL. In conditions where a substantial change in overall level due to the inverse-square law was available as a cue, the elderly hearing-impaired listeners did not perform any different from control groups. In other conditions where that cue was unavailable (so leaving the direct-to-reverberant relationship as a cue), either because the reverberant field dominated the direct sound or because the overall level had been artificially equalized, hearing-impaired listeners performed worse than controls. There were significant correlations with listeners' self-reported distance capabilities as measured by the "Speech, Spatial, and Qualities of Hearing" questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)]. The results demonstrate that hearing-impaired listeners show deficits in the ability to use some of the cues which signal auditory distance.
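The two distance cues contrasted here are easy to state numerically: the inverse-square-law change in overall level, and the direct-to-reverberant energy ratio of the room response. A small sketch; the 2.5-ms split point separating direct sound from reflections is a common convention, assumed here rather than taken from the paper.

```python
import numpy as np

def level_change_db(d1_m: float, d2_m: float) -> float:
    """Overall level change when a source moves from d1 to d2
    (inverse-square law for the direct sound)."""
    return 20.0 * np.log10(d1_m / d2_m)

def direct_to_reverberant_db(h: np.ndarray, fs: float, split_ms=2.5):
    """Crude DRR from an impulse response: energy before vs. after a
    split point just following the direct sound."""
    n = int(split_ms * 1e-3 * fs)
    direct, reverb = np.sum(h[:n] ** 2), np.sum(h[n:] ** 2)
    return 10.0 * np.log10(direct / reverb)

print(level_change_db(1.0, 2.0))   # doubling the distance: -6.0 dB
```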

17.
Speech produced in the presence of noise (Lombard speech) is more intelligible in noise than speech produced in quiet, but the origin of this advantage is poorly understood. Some of the benefit appears to arise from auditory factors such as energetic masking release, but a role for linguistic enhancements similar to those exhibited in clear speech is possible. The current study examined the effect of Lombard speech in noise and in quiet for Spanish learners of English. Non-native listeners showed a substantial benefit of Lombard speech in noise, although not quite as large as that displayed by native listeners tested on the same task in an earlier study [Lu and Cooke (2008), J. Acoust. Soc. Am. 124, 3261-3275]. The difference between the two groups is unlikely to be due to energetic masking. However, Lombard speech was less intelligible in quiet for non-native listeners than normal speech. The relatively small difference in Lombard benefit in noise for native and non-native listeners, along with the absence of Lombard benefit in quiet, suggests that any contribution of linguistic enhancements in the Lombard benefit for natives is small.

18.
The speech understanding of persons with sloping high-frequency (HF) hearing impairment (HI) was compared to normal hearing (NH) controls and previous research on persons with "flat" losses [Hornsby and Ricketts (2003). J. Acoust. Soc. Am. 113, 1706-1717] to examine how hearing loss configuration affects the contribution of speech information in various frequency regions. Speech understanding was assessed at multiple low- and high-pass filter cutoff frequencies. Crossover frequencies, defined as the cutoff frequencies at which low- and high-pass filtering yielded equivalent performance, were significantly lower for the sloping HI, compared to NH, group suggesting that HF HI limits the utility of HF speech information. Speech intelligibility index calculations suggest this limited utility was not due simply to reduced audibility but also to the negative effects of high presentation levels and a poorer-than-normal use of speech information in the frequency region with the greatest hearing loss (the HF regions). This deficit was comparable, however, to that seen in low-frequency regions of persons with similar HF thresholds and "flat" hearing losses suggesting that sensorineural HI results in a "uniform," rather than frequency-specific, deficit in speech understanding, at least for persons with HF thresholds up to 60-80 dB HL.

19.
Performance of children aged 9 to 17 years on the SPIN test (Speech Perception in Noise) is described. The 11- and 13-year-olds performed significantly more poorly than 15- and 17-year-olds, and this difference occurred primarily for high-predictability sentences presented at a 0-dB signal-to-babble ratio. Performance of nine-year-olds was significantly poorer than performance of 11-year-olds. Possible reasons for these differences are discussed.

20.
Three investigations were conducted to determine the application of the articulation index (AI) to the prediction of speech performance of hearing-impaired subjects as well as of normal-hearing listeners. Speech performance was measured in quiet and in the presence of two interfering signals for items from the Speech Perception in Noise test in which target words are either highly predictable from contextual cues in the sentence or essentially contextually neutral. As expected, transfer functions relating the AI to speech performance were different depending on the type of contextual speech material. The AI transfer function for probability-high items rises steeply, much as for sentence materials, while the function for probability-low items rises more slowly, as for monosyllabic words. Different transfer functions were also found for tests conducted in quiet or white noise rather than in a babble background. A majority of the AI predictions for ten individuals with moderate sensorineural loss fell within +/- 2 standard deviations of normal listener performance for both quiet and babble conditions.
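The steep-versus-shallow contrast between the two transfer functions can be captured by a parametric form used in the classic articulation-index literature, s = (1 - 10^(-A/Q))^n, where a smaller Q gives the steeper, sentence-like curve. The parameter values below are illustrative assumptions, not the study's fitted values.

```python
import numpy as np

def transfer(ai, q, n=1.0):
    """Parametric AI transfer function: proportion correct as a function
    of articulation index. Parameter values here are illustrative."""
    return (1.0 - 10.0 ** (-ai / q)) ** n

ai = np.linspace(0.0, 1.0, 11)
steep = transfer(ai, q=0.25)    # rises steeply, like probability-high items
shallow = transfer(ai, q=0.60)  # rises slowly, like probability-low items
for a, s1, s2 in zip(ai, steep, shallow):
    print(f"AI={a:.1f}  PH~{s1:.2f}  PL~{s2:.2f}")
```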
