Similar articles
 20 similar articles found (search time: 31 ms)
1.
Recognition of speech stimuli consisting of monosyllabic words, sentences, and nonsense syllables was tested in normal subjects and in a subject with a low-frequency sensorineural hearing loss characterized by an absence of functioning sensory units in the apical region of the cochlea, as determined in a previous experiment [C. W. Turner, E. M. Burns, and D. A. Nelson, J. Acoust. Soc. Am. 73, 966-975 (1983)]. Performance of all subjects was close to 100% correct for all stimuli presented unfiltered at a moderate intensity level. When stimuli were low-pass filtered, performance of the hearing-impaired subject fell below that of the normals, but was still considerably above chance. A further diminution in the impaired subject's recognition of nonsense syllables resulted from the addition of a high-pass masking noise, indicating that his performance in the filtered quiet condition was attributable in large part to the contribution of sensory units in basal and midcochlear regions. Normals' performance was also somewhat decreased by the masker, suggesting that they also may have been extracting some low-frequency speech cues from responses of sensory units located in the base of the cochlea.

2.
The goal of this study was to establish the ability of normal-hearing listeners to discriminate formant frequency in vowels in everyday speech. Vowel formant discrimination in syllables, phrases, and sentences was measured for high-fidelity (nearly natural) speech synthesized by STRAIGHT [Kawahara et al., Speech Commun. 27, 187-207 (1999)]. Thresholds were measured for changes in F1 and F2 for the vowels /ɪ, ɛ, æ, ʌ/ in /bVd/ syllables. Experimental factors manipulated included phonetic context (syllables, phrases, and sentences), sentence discrimination with the addition of an identification task, and word position. Results showed that neither longer phonetic context nor the addition of the identification task significantly affected thresholds, while thresholds for word final position showed significantly better performance than for either initial or middle position in sentences. Results suggest that an average of 0.37 barks is required for normal-hearing listeners to discriminate vowel formants in modest length sentences, elevated by 84% compared to isolated vowels. Vowel formant discrimination in several phonetic contexts was slightly elevated for STRAIGHT-synthesized speech compared to formant-synthesized speech stimuli reported in the study by Kewley-Port and Zheng [J. Acoust. Soc. Am. 106, 2945-2958 (1999)]. These elevated thresholds appeared related to greater spectral-temporal variability for high-fidelity speech produced by STRAIGHT than for formant-synthesized speech.
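The 84% figure above implies a rough baseline for isolated vowels. A back-of-the-envelope sketch, assuming "elevated by 84%" means that the sentence threshold is 1.84 times the isolated-vowel threshold (the abstract does not state the isolated-vowel value, and the symbols below are introduced here for illustration only):

```latex
\Delta F_{\text{sentence}} = 1.84\,\Delta F_{\text{isolated}}
\quad\Longrightarrow\quad
\Delta F_{\text{isolated}} \approx \frac{0.37\ \text{barks}}{1.84} \approx 0.20\ \text{barks}
```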

3.
Two signal-processing algorithms, derived from those described by Stubbs and Summerfield [R.J. Stubbs and Q. Summerfield, J. Acoust. Soc. Am. 84, 1236-1249 (1988)], were used to separate the voiced speech of two talkers speaking simultaneously, at similar intensities, in a single channel. Both algorithms use fundamental frequency (F0) as the basis for segregation. One attenuates the interfering voice by filtering the cepstrum of the signal. The other is a hybrid algorithm that combines cepstral filtering with the technique of harmonic selection [T.W. Parsons, J. Acoust. Soc. Am. 60, 911-918 (1976)]. The algorithms were evaluated and compared in perceptual experiments involving listeners with normal hearing and listeners with cochlear hearing impairments. In experiment 1 the processing was used to separate voiced sentences spoken on a monotone. Both algorithms gave significant increases in intelligibility to both groups of listeners. The improvements were equivalent to an increase of 3-4 dB in the effective signal-to-noise ratio (SNR). In experiment 2 the processing was used to separate voiced sentences spoken with time-varying intonation. For normal-hearing listeners, cepstral filtering gave a significant increase in intelligibility, while the hybrid algorithm gave an increase that was on the margins of significance (p = 0.06). The improvements were equivalent to an increase of 2-3 dB in the effective SNR. For impaired listeners, no intelligibility improvements were demonstrated with intoned sentences. The decrease in performance for intoned material is attributed to limitations of the algorithms when F0 is nonstationary.

4.
While a large portion of the variance among listeners in speech recognition is associated with the audibility of components of the speech waveform, it is not possible to predict individual differences in the accuracy of speech processing strictly from the audiogram. This has suggested that some of the variance may be associated with individual differences in spectral or temporal resolving power, or acuity. Psychoacoustic measures of spectral-temporal acuity with nonspeech stimuli have been shown, however, to correlate only weakly (or not at all) with speech processing. In a replication and extension of an earlier study [Watson et al., J. Acoust. Soc. Am. Suppl. 1 71, S73 (1982)], 93 normal-hearing college students were tested on speech perception tasks (nonsense syllables, words, and sentences in a noise background) and on six spectral-temporal discrimination tasks using simple and complex nonspeech sounds. Factor analysis showed that the abilities that explain performance on the nonspeech tasks are quite distinct from those that account for performance on the speech tasks. Performance was significantly correlated among speech tasks and among nonspeech tasks. Either (a) auditory spectral-temporal acuity for nonspeech sounds is orthogonal to speech processing abilities, or (b) the appropriate tasks or types of nonspeech stimuli that challenge the abilities required for speech recognition have yet to be identified.

5.
Spectral weighting strategies using a correlational method [R. A. Lutfi, J. Acoust. Soc. Am. 97, 1333-1334 (1995); V. M. Richards and S. Zhu, J. Acoust. Soc. Am. 95, 423-424 (1994)] were measured in ten listeners with sensorineural hearing loss on a sentence recognition task. Sentences and a spectrally matched noise were filtered into five separate adjacent spectral bands and presented to listeners at various signal-to-noise ratios (SNRs). Five point-biserial correlations were computed between the listeners' response (correct or incorrect) on the task and the SNR in each band. The stronger the correlation between performance and SNR, the more heavily that band was weighted by the listener. Listeners were tested with and without hearing aids on. All listeners were experienced hearing aid users. Results indicated that the highest spectral band (approximately 2800-11,000 Hz) received the greatest weight in both listening conditions. However, the weight on the highest spectral band was less when listeners performed the task with their hearing aids on in comparison to when listening without hearing aids. No direct relationship was observed between the listeners' weights and the sensation level within a given band.
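The correlational method described above can be sketched in a few lines. This is a minimal illustration and not the authors' analysis code: the function names `point_biserial` and `band_weights` are hypothetical, and normalizing the absolute correlations so that they sum to 1 is an assumption made here for readability, not something the abstract states.

```python
import math


def point_biserial(correct, snr):
    """Pearson correlation between a 0/1 response vector and per-trial SNR.

    With one binary variable this is the point-biserial correlation.
    """
    n = len(correct)
    if n != len(snr) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 trials")
    mean_c = sum(correct) / n
    mean_s = sum(snr) / n
    cov = sum((c - mean_c) * (s - mean_s) for c, s in zip(correct, snr))
    var_c = sum((c - mean_c) ** 2 for c in correct)
    var_s = sum((s - mean_s) ** 2 for s in snr)
    if var_c == 0 or var_s == 0:
        return 0.0  # correlation is undefined; treat as "no weight" in this sketch
    return cov / math.sqrt(var_c * var_s)


def band_weights(correct, snr_by_band):
    """One correlation per band, rescaled so absolute values sum to 1 (assumption)."""
    r = [point_biserial(correct, band) for band in snr_by_band]
    total = sum(abs(x) for x in r)
    return [abs(x) / total for x in r] if total else r


# Toy data (invented): 4 trials, 2 bands. Band 0 SNR tracks correctness, band 1 does not.
responses = [1, 1, 0, 0]
snrs = [[6, 4, -4, -6], [1, -1, 1, -1]]
weights = band_weights(responses, snrs)
```

In the study itself there would be five bands and many more trials per listener; the toy data only show that a band whose SNR covaries with correctness ends up with the larger weight.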

6.
A frequency importance function for continuous discourse   Cited by: 1 (self-citations: 0, citations by others: 1)
Normal hearing subjects estimated the intelligibility of continuous discourse (CD) passages spoken by three talkers (two male and one female) under 135 conditions of filtering and signal-to-noise ratio. The relationship between the intelligibility of CD and the articulation index (the transfer function) was different from any found in ANSI S3.5-1969. Also, the lower frequencies were found to be relatively more important for the intelligibility of CD than for identification of nonsense syllables and other types of speech for which data are available except for synthetic sentences [Speaks, J. Speech Hear. Res. 10, 289-298 (1967)]. The frequency which divides the auditory spectrum into two equally important halves (the crossover frequency) was found to be about 0.5 oct lower for the CD used in this study than the crossover frequency for male talkers of nonsense syllables found in ANSI S3.5-1969 and about 0.7 oct lower than the one for combined male and female talkers of nonsense syllables reported by French and Steinberg [J. Acoust. Soc. Am. 19, 90-119 (1947)].
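To make the octave comparisons above concrete, the following sketch converts between frequency ratios and octaves. The helper names and the 2000 Hz reference value are hypothetical placeholders chosen for illustration; the abstract does not give the actual crossover frequencies.

```python
import math


def shift_by_octaves(freq_hz, octaves):
    """Return the frequency that lies `octaves` above (or below, if negative) freq_hz."""
    return freq_hz * 2.0 ** octaves


def octaves_between(f_from_hz, f_to_hz):
    """Signed distance in octaves from f_from_hz to f_to_hz."""
    return math.log2(f_to_hz / f_from_hz)


# Hypothetical reference crossover frequency, NOT a value taken from the source.
reference_hz = 2000.0
half_octave_lower = shift_by_octaves(reference_hz, -0.5)  # about 0.71 x reference
```

A shift of 0.5 oct downward corresponds to multiplying the reference frequency by 2^(-0.5), roughly 0.71, and a shift of 0.7 oct downward to multiplying by roughly 0.62.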

7.
Articulation index (AI) theory was used to evaluate stop-consonant recognition of normal-hearing listeners and listeners with high-frequency hearing loss. From results reported in a companion article [Dubno et al., J. Acoust. Soc. Am. 85, 347-354 (1989)], a transfer function relating the AI to stop-consonant recognition was established, and a frequency importance function was determined for the nine stop-consonant-vowel syllables used as test stimuli. The calculations included the rms and peak levels of the speech that had been measured in 1/3 octave bands; the internal noise was estimated from the thresholds for each subject. The AI model was then used to predict performance for the hearing-impaired listeners. A majority of the AI predictions for the hearing-impaired subjects fell within +/- 2 standard deviations of the normal-hearing listeners' results. However, as observed in previous data, the AI tended to overestimate performance of the hearing-impaired listeners. The accuracy of the predictions decreased with the magnitude of high-frequency hearing loss. Thus, with the exception of performance for listeners with severe high-frequency hearing loss, the results suggest that poorer speech recognition among hearing-impaired listeners results from reduced audibility within critical spectral regions of the speech stimuli.
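For orientation, articulation index calculations of this kind are commonly written as an importance-weighted sum of band audibilities. The expression below is a generic sketch and not necessarily the exact procedure of this study; the symbols $I_i$, $A_i$ and $n$ are introduced here for illustration.

```latex
\mathrm{AI} = \sum_{i=1}^{n} I_i \, A_i,
\qquad \sum_{i=1}^{n} I_i = 1,
\qquad 0 \le A_i \le 1
```

Here $I_i$ is the frequency importance of band $i$ and $A_i$ is the fraction of speech information audible in that band given the speech level, any masking noise, and the listener's thresholds. The transfer function mentioned in the abstract then maps the resulting AI value to a predicted recognition score.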

8.
At a cocktail party, listeners must attend selectively to a target speaker and segregate their speech from distracting speech sounds uttered by other speakers. To solve this task, listeners can draw on a variety of vocal, spatial, and temporal cues. Recently, Vestergaard et al. [J. Acoust. Soc. Am. 125, 1114-1124 (2009)] developed a concurrent-syllable task to control temporal glimpsing within segments of concurrent speech, and this allowed them to measure the interaction of glottal pulse rate and vocal tract length and reveal how the auditory system integrates information from independent acoustic modalities to enhance recognition. The current paper shows how the interaction of these acoustic cues evolves as the temporal overlap of syllables is varied. Temporal glimpses as short as 25 ms are observed to improve syllable recognition substantially when the target and distracter have similar vocal characteristics, but not when they are dissimilar. The effect of temporal glimpsing on recognition performance is strongly affected by the form of the syllable (consonant-vowel versus vowel-consonant), but it is independent of other phonetic features such as place and manner of articulation.

9.
The corruption of intonation contours has detrimental effects on sentence-based speech recognition in normal-hearing listeners [Binns and Culling (2007). J. Acoust. Soc. Am. 122, 1765-1776]. This paper examines whether this finding also applies to cochlear implant (CI) recipients. The subjects' F0-discrimination and speech perception in the presence of noise were measured, using sentences with regular and inverted F0-contours. The results revealed that speech recognition for regular contours was significantly better than for inverted contours. This difference was related to the subjects' F0-discrimination, providing further evidence that the perception of intonation patterns is important for CI-mediated speech recognition in noise.

10.
Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.

11.
Lauter [J. Acoust. Soc. Am. 71, 701-707 (1982)] reported that although the magnitude and direction of the absolute ear advantage for speech and nonspeech sound sets presented dichotically varies considerably among listeners, consistent patterns of a relative ear advantage (EArel) across sound sets are preserved from listener to listener. She further claimed that EArel appeared to be related to the duration of elements that composed a sequence. The existence of EArel is investigated for four sound sets: CV nonsense syllables and pitch patterns that were composed of 50-, 80-, or 120-ms tones. The paradigm was target monitoring, a Yes/No task in which listeners attended to only one ear and listened for the presence of a target signal. The results failed to confirm that listeners have a consistent relative ear advantage related to element duration for nonspeech sound sets.

12.
Two recent accounts of the acoustic cues which specify place of articulation in syllable-initial stop consonants claim that they are located in the initial portions of the CV waveform and are context-free. Stevens and Blumstein [J. Acoust. Soc. Am. 64, 1358-1368 (1978)] have described the perceptually relevant spectral properties of these cues as static, while Kewley-Port [J. Acoust. Soc. Am. 73, 322-335 (1983)] describes these cues as dynamic. Three perceptual experiments were conducted to test predictions derived from these accounts. Experiment 1 confirmed that acoustic cues for place of articulation are located in the initial 20-40 ms of natural stop-vowel syllables. Next, short synthetic CV's modeled after natural syllables were generated using either a digital, parallel-resonance synthesizer in experiment 2 or linear prediction synthesis in experiment 3. One set of synthetic stimuli preserved the static spectral properties proposed by Stevens and Blumstein. Another set of synthetic stimuli preserved the dynamic properties suggested by Kewley-Port. Listeners in both experiments identified place of articulation significantly better from stimuli which preserved dynamic acoustic properties than from those based on static onset spectra. Evidently, the dynamic structure of the initial stop-vowel articulatory gesture can be preserved in context-free acoustic cues which listeners use to identify place of articulation.

13.
This study complements earlier experiments on the perception of the [m]-[n] distinction in CV syllables [B. H. Repp, J. Acoust. Soc. Am. 79, 1987-1999 (1986); B. H. Repp, J. Acoust. Soc. Am. 82, 1525-1538 (1987)]. Six talkers produced VC syllables consisting of [m] or [n] preceded by [i, a, u]. In listening experiments, these syllables were truncated from the beginning and/or from the end, or waveform portions surrounding the point of closure were replaced with noise, so as to map out the distribution of the place of articulation information for consonant perception. These manipulations revealed that the vocalic formant transitions alone conveyed about as much place of articulation information as did the nasal murmur alone, and both signal portions were about as informative in VC as in CV syllables. Nevertheless, full VC syllables were less accurately identified than full CV syllables, especially in female speech. The reason for this was hypothesized to be the relative absence of a salient spectral change between the vowel and the murmur in VC syllables. This hypothesis was supported by the relative ineffectiveness of two additional manipulations meant to disrupt the perception of relational spectral information (channel separation or temporal separation of vowel and murmur) and by subjects' poor identification scores for brief excerpts including the point of maximal spectral change. While, in CV syllables, the abrupt spectral change from the murmur to the vowel provides important additional place of articulation information, for VC syllables it seems as if the formant transitions in the vowel and the murmur spectrum functioned as independent cues.

14.
The length of the vocal tract is correlated with speaker size and, so, speech sounds have information about the size of the speaker in a form that is interpretable by the listener. A wide range of different vocal tract lengths exist in the population and humans are able to distinguish speaker size from the speech. Smith et al. [J. Acoust. Soc. Am. 117, 305-318 (2005)] presented vowel sounds to listeners and showed that the ability to discriminate speaker size extends beyond the normal range of speaker sizes which suggests that information about the size and shape of the vocal tract is segregated automatically at an early stage in the processing. This paper reports an extension of the size discrimination research using a much larger set of speech sounds, namely, 180 consonant-vowel and vowel-consonant syllables. Despite the pronounced increase in stimulus variability, there was actually an improvement in discrimination performance over that supported by vowel sounds alone. Performance with vowel-consonant syllables was slightly better than with consonant-vowel syllables. These results support the hypothesis that information about the length of the vocal tract is segregated at an early stage in auditory processing.

15.
It has been suggested [e.g., Strange et al., J. Acoust. Soc. Am. 74, 695-705 (1983); Verbrugge and Rakerd, Language Speech 29, 39-57 (1986)] that the temporal margins of vowels in consonantal contexts, consisting mainly of the rapid CV and VC transitions of CVC's, contain dynamic cues to vowel identity that are not available in isolated vowels and that may be perceptually superior in some circumstances to cues which are inherent to the vowels proper. However, this study shows that vowel-inherent formant targets and cues to vowel-inherent spectral change (measured from nucleus to offglide sections of the vowel itself) persist in the margins of /bVb/ syllables, confirming a hypothesis of Nearey and Assmann [J. Acoust. Soc. Am. 80, 1297-1308 (1986)]. Experiments were conducted to test whether listeners might be using such vowel-inherent, rather than coarticulatory information to identify the vowels. In the first experiment, perceptual tests using "hybrid silent center" syllables (i.e., syllables which contain only brief initial and final portions of the original syllable, and in which speaker identity changes from the initial to the final portion) show that listeners' error rates and confusion matrices for vowels in /bVb/ syllables are very similar to those for isolated vowels. These results suggest that listeners are using essentially the same type of information in essentially the same way to identify both kinds of stimuli. Statistical pattern recognition models confirm the relative robustness of nucleus and vocalic offglide cues and can predict reasonably well listeners' error patterns in all experimental conditions, though performance for /bVb/ syllables is somewhat worse than for isolated vowels. 
The second experiment involves the use of simplified synthetic stimuli, lacking consonantal transitions, which are shown to provide information that is nearly equivalent phonetically to that of the natural silent center /bVb/ syllables (from which the target measurements were extracted). Although no conclusions are drawn about other contexts, for speakers of Western Canadian English coarticulatory cues appear to play at best a minor role in the perception of vowels in /bVb/ context, while vowel-inherent factors dominate listeners' perception.

16.
Because they consist, in large part, of random turbulent noise, fricatives present a challenge to attempts to specify the phonetic correlates of phonological features. Previous research has focused on temporal properties, acoustic power, and a variety of spectral properties of fricatives in a number of contexts [Jongman et al., J. Acoust. Soc. Am. 108, 1252-1263 (2000); Jesus and Shadle, J. Phonet. 30, 437-467 (2002); Crystal and House, J. Acoust. Soc. Am. 83, 1553-1573 (1988a)]. However, no systematic investigation of the effects of focus and prosodic context on fricative production has been carried out. Manipulation of explicit focus can serve to selectively exaggerate linguistically relevant properties of speech in much the same manner as stress [de Jong, J. Acoust. Soc. Am. 97, 491-504 (1995); de Jong, J. Phonet. 32, 493-516 (2004); de Jong and Zawaydeh, J. Phonet. 30, 53-75 (2002)]. This experimental technique was exploited to investigate acoustic power along with temporal and spectral characteristics of American English fricatives in two prosodic contexts, to probe whether native speakers selectively attend to subsegmental features, and to consider variability in fricative production across speakers. While focus in general increased noise power and duration, speakers did not selectively enhance spectral features of the target fricatives.

17.
In a recent study [S. Gordon-Salant, J. Acoust. Soc. Am. 80, 1599-1607 (1986)], young and elderly normal-hearing listeners demonstrated significant improvements in consonant-vowel (CV) recognition with acoustic modification of the speech signal incorporating increments in the consonant-vowel ratio (CVR). Acoustic modification of consonant duration failed to enhance performance. The present study investigated whether consonant recognition deficits of elderly hearing-impaired listeners would be reduced by these acoustic modifications, as well as by increases in speech level. Performance of elderly hearing-impaired listeners with gradually sloping and sharply sloping sensorineural hearing losses was compared to performance of elderly normal-threshold listeners (reported previously) for recognition of a variety of nonsense syllable stimuli. These stimuli included unmodified CVs, CVs with increases in CVR, CVs with increases in consonant duration, and CVs with increases in both CVR and consonant duration. Stimuli were presented at each of two speech levels with a background of noise. Results obtained from the hearing-impaired listeners agreed with those observed previously from normal-hearing listeners. Differences in performance between the three subject groups as a function of level were observed also.

18.
The relative importance of different parts of the auditory spectrum to recognition of the Diagnostic Rhyme Test (DRT) and its six speech feature subtests was determined. Three normal hearing subjects were tested twice in each of 70 experimental conditions. The analytical procedures of French and Steinberg [J. Acoust. Soc. Am. 19, 90-119 (1947)] were applied to the data to derive frequency importance functions for each of the DRT subtests and the test as a whole over the frequency range 178-8912 Hz. For the DRT as a whole, the low frequencies were found to be more important than is the case for nonsense syllables. Importance functions for the feature subtests also differed from those for nonsense syllables and from each other as well. These results suggest that test materials loaded with different proportions of particular phonemes have different frequency importance functions. Comparison of the results with those from other studies suggests that importance functions depend to a degree on the available response options as well.

19.
This study examined the effects of age and hearing loss on short-term adaptation to accented speech. Data from younger and older listeners in a prior investigation [Gordon-Salant et al. (2010). J. Acoust. Soc. Am. 128, 444-455] were re-analyzed to examine changes in recognition over four administrations of equivalent lists of English stimuli recorded by native speakers of Spanish and English. Results showed improvement in recognition scores over four list administrations for the accented stimuli but not for the native English stimuli. Group effects emerged but were not involved in any interactions, suggesting that short-term adaptation to accented speech is preserved with aging and with hearing loss.

20.