20 similar documents found (search time: 15 ms)
1.
A K Nábělek Z Czyzewski L A Krishnan 《The Journal of the Acoustical Society of America》1992,92(3):1228-1246
Vowel identification was tested in quiet, noise, and reverberation with 20 normal-hearing subjects and 20 hearing-impaired subjects. Stimuli were 15 English vowels spoken in a /b-t/ context by six male talkers. Each talker produced five tokens of each vowel. In quiet, all stimuli were identified by two judges as the intended targets. The stimuli were degraded by reverberation or speech-spectrum noise. Vowel identification scores depended upon talker, listening condition, and subject type. The relationship between identification errors and spectral details of the vowels is discussed.
2.
The purpose of this study was to determine the role of static, dynamic, and integrated cues for perception in three adult age groups, and to determine whether age has an effect on both consonant and vowel perception, as predicted by the "age-related deficit hypothesis." Eight adult subjects in each of the age ranges of young (ages 20-26), middle-aged (ages 52-59), and old (ages 70-76) listened to synthesized syllables composed of combinations of [b d g] and [i u a]. The synthesis parameters included manipulations of the following stimulus variables: formant transition (moving or straight), noise burst (present or absent), and voicing duration (10, 30, or 46 ms). Vowel perception was high across all conditions and there were no significant differences among age groups. Consonant identification showed a definite effect of age. Young and middle-aged adults were significantly better than older adults at identifying consonants from secondary cues only. Older adults relied on the integration of static and dynamic cues to a greater extent than younger and middle-aged listeners for identification of place of articulation of stop consonants. Duration facilitated correct stop-consonant identification in the young and middle-aged groups for the no-burst conditions, but not in the old group. These findings for the duration of stop-consonant transitions indicate reductions in processing speed with age. In general, the results did not support the age-related deficit hypothesis for adult identification of vowels and consonants from dynamic spectral cues.
3.
Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.
4.
Minimum spectral contrast for vowel identification by normal-hearing and hearing-impaired listeners
M R Leek M F Dorman Q Summerfield 《The Journal of the Acoustical Society of America》1987,81(1):148-154
To determine the minimum difference in amplitude between spectral peaks and troughs sufficient for vowel identification by normal-hearing and hearing-impaired listeners, four vowel-like complex sounds were created by summing the first 30 harmonics of a 100-Hz tone. The amplitudes of all harmonics were equal, except for two consecutive harmonics located at each of three "formant" locations. The amplitudes of these harmonics were equal and ranged from 1 to 8 dB more than the remaining components. Normal-hearing listeners achieved greater than 75% accuracy when peak-to-trough differences were 1-2 dB. Normal-hearing listeners who were tested in a noise background sufficient to raise their thresholds to the level of a flat, moderate hearing loss needed a 4-dB difference for identification. Listeners with a moderate, flat hearing loss required a 6- to 7-dB difference for identification. The results suggest, for normal-hearing listeners, that the peak-to-trough amplitude difference required for identification of this set of vowels is very near the threshold for detection of a change in the amplitude spectrum of a complex signal. Hearing-impaired listeners may have difficulty using closely spaced formants for vowel identification due to abnormal smoothing of the internal representation of the spectrum by broadened auditory filters.
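The stimulus construction described in this abstract can be sketched in a few lines of numpy. The sample rate, duration, and the exact harmonic pairs chosen as "formant" locations below are assumptions for illustration; the abstract specifies only the 100-Hz fundamental, the first 30 harmonics, and the two consecutive elevated harmonics per formant.

```python
import numpy as np

FS = 16000   # sample rate in Hz (assumed; not stated in the abstract)
F0 = 100     # fundamental of the harmonic complex, per the abstract

# Hypothetical harmonic pairs marking the three "formant" locations; the
# abstract says two consecutive harmonics at each location but not which ones.
FORMANT_PAIRS = [(5, 6), (15, 16), (25, 26)]

def vowel_like_complex(peak_db, dur=0.3):
    """Sum the first 30 harmonics of a 100-Hz tone, with the two harmonics
    at each 'formant' location raised peak_db dB above the flat background."""
    t = np.arange(int(dur * FS)) / FS
    peaked = {h for pair in FORMANT_PAIRS for h in pair}
    gain = 10.0 ** (peak_db / 20.0)   # dB -> linear amplitude ratio
    sig = np.zeros_like(t)
    for h in range(1, 31):
        amp = gain if h in peaked else 1.0
        sig += amp * np.sin(2 * np.pi * h * F0 * t)
    return sig
```

With peak_db near 1-2 dB this produces the barely-peaked spectra that normal-hearing listeners could still identify; 6-7 dB corresponds to what the hearing-impaired group required.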
5.
Static, dynamic, and relational properties in vowel perception
T M Nearey 《The Journal of the Acoustical Society of America》1989,85(5):2088-2113
The present work reviews theories and empirical findings, including results from two new experiments, that bear on the perception of English vowels, with an emphasis on the comparison of data analytic "machine recognition" approaches with results from speech perception experiments. Two major sources of variability (viz., speaker differences and consonantal context effects) are addressed from the classical perspective of overlap between vowel categories in F1 x F2 space. Various approaches to the reduction of this overlap are evaluated. Two types of speaker normalization are considered. "Intrinsic" methods based on relationships among the steady-state properties (F0, F1, F2, and F3) within individual vowel tokens are contrasted with "extrinsic" methods, involving the relationships among the formant frequencies of the entire vowel system of a single speaker. Evidence from a new experiment supports Ainsworth's (1975) conclusion [W. Ainsworth, Auditory Analysis and Perception of Speech (Academic, London, 1975)] that both types of information have a role to play in perception. The effects of consonantal context on formant overlap are also considered. A new experiment is presented that extends Lindblom and Studdert-Kennedy's finding [B. Lindblom and M. Studdert-Kennedy, J. Acoust. Soc. Am. 43, 840-843 (1967)] of perceptual effects of consonantal context on vowel perception to /dVd/ and /bVb/ contexts. Finally, the role of vowel-inherent dynamic properties, including duration and diphthongization, is briefly reviewed. All of the above factors are shown to have reliable influences on vowel perception, although the relative weight of such effects and the circumstances that alter these weights remain far from clear. It is suggested that the design of more complex perceptual experiments, together with the development of quantitative pattern recognition models of human vowel perception, will be necessary to resolve these issues.
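A toy sketch of the "extrinsic" normalization idea mentioned above: each formant is re-expressed relative to the log-mean over a speaker's whole vowel system, so that speakers who differ by a uniform vocal-tract scale factor map onto the same values. The formant figures below are invented for illustration, and this is only a minimal scheme in the spirit of the approaches reviewed, not the paper's own model.

```python
import math

def extrinsic_normalize(vowel_system):
    """vowel_system: dict vowel -> (F1, F2) in Hz for one speaker.
    Returns dict vowel -> formants as log deviations from the speaker's
    mean log formant, computed over the entire vowel system."""
    logs = [math.log(f) for pair in vowel_system.values() for f in pair]
    mean_log = sum(logs) / len(logs)
    return {v: tuple(math.log(f) - mean_log for f in pair)
            for v, pair in vowel_system.items()}

# Two hypothetical speakers whose formants differ by a uniform scale factor
# (e.g., a shorter vocal tract): normalization makes their systems coincide.
speaker_a = {"i": (280, 2250), "a": (710, 1100), "u": (310, 870)}
speaker_b = {v: (f1 * 1.2, f2 * 1.2) for v, (f1, f2) in speaker_a.items()}
```

Because the scale factor shifts every log formant and the log-mean by the same amount, the normalized systems of the two speakers are identical, which is exactly what makes extrinsic information useful for reducing between-speaker overlap in F1 x F2 space.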
6.
Gick B Jóhannsdóttir KM Gibraiel D Mühlbauer J 《The Journal of the Acoustical Society of America》2008,123(4):EL72-EL76
A single pool of untrained subjects was tested for interactions across two bimodal perception conditions: audio-tactile, in which subjects heard and felt speech, and visual-tactile, in which subjects saw and felt speech. Identifications of English obstruent consonants were compared in bimodal and no-tactile baseline conditions. Results indicate that tactile information enhances speech perception by about 10 percent, regardless of which other mode (auditory or visual) is active. However, within-subject analysis indicates that individual subjects who benefit more from tactile information in one cross-modal condition tend to benefit less from tactile information in the other.
7.
This study considered consequences of sensorineural hearing loss in ten listeners. The characterization of individual hearing loss was based on psychoacoustic data addressing audiometric pure-tone sensitivity, cochlear compression, frequency selectivity, temporal resolution, and intensity discrimination. In the experiments it was found that listeners with comparable audiograms can show very different results in the supra-threshold measures. In an attempt to account for the observed individual data, a model of auditory signal processing and perception [Jepsen et al., J. Acoust. Soc. Am. 124, 422-438 (2008)] was used as a framework. The parameters of the cochlear processing stage of the model were adjusted to account for behaviorally estimated individual basilar-membrane input-output functions and the audiogram, from which the amounts of inner hair-cell and outer hair-cell losses were estimated as a function of frequency. All other model parameters were left unchanged. The predictions showed a reasonably good agreement with the measured individual data in the frequency selectivity and forward masking conditions while the variation of intensity discrimination thresholds across listeners was underestimated by the model. The model and the associated parameters for individual hearing-impaired listeners might be useful for investigating effects of individual hearing impairment in more complex conditions, such as speech intelligibility in noise.
8.
To provide a perceptual framework for the objective evaluation of durational rules in speech synthesis, two experiments were conducted to investigate the differences between vowel (V) onsets and V-offsets in their functions of marking the perceived temporal structure of speech. The first experiment measured the detectability of temporal modifications given in four-mora (CVCVCVCV) Japanese words. In the V-onset condition, the inter-onset intervals of vowels were uniformly changed (either expanded or reduced) while their inter-offset intervals were preserved. In the V-offset condition, this was reversed. These manipulations did not change the duration of the entire word. Each of the modified words was paired with its unmodified counterpart, and the pair was given to listeners, who were asked to rate the difference between the paired words. The results show that there were no significant differences in the listeners' abilities to detect the temporal modification between the V-onset and V-offset conditions. In the second experiment, the listeners were asked to estimate the differences they perceived in speaking rates for the same stimulus set as that of the first experiment. Interestingly, the results show a clear difference in the listeners' performance between the V-onset and V-offset conditions. Specifically, changing the V-onset intervals changed the perceived speaking rates, which showed a linear relation (r = -0.9) despite the fact that the duration of the entire word remained unchanged. In contrast, modifying the V-offset intervals produced no clear relation with the perceived speaking rates. The second experiment also showed that the listeners performed well in speaking rate discrimination (3.5%-5% in the change ratio). These results are discussed in relation to the differences in the listeners' temporal processing range (local or global) between the two experiments.
9.
R A Fox 《The Journal of the Acoustical Society of America》1985,77(4):1552-1559
Selective adaptation and anchoring effects in speech perception have generated several different hypotheses regarding the nature of contextual contrast, including auditory/phonetic feature detector fatigue, response bias, and auditory contrast. In the present study three different seven-step [hɪd]-[hɛd] continua were constructed to represent a low F0 (long vocal tract source), a high F0 (long vocal tract source), and a high F0 (short vocal tract source), respectively. Subjects identified the tokens from each of the stimulus continua under two conditions: an equiprobable control and an anchoring condition which included an endpoint stimulus from one of the three continua occurring at least three times more often than any other single stimulus. Differential contrast effects were found depending on whether the anchor differed from the test stimuli in terms of F0, absolute formant frequencies, or both. Results were inconsistent with both the feature detector fatigue and response bias hypotheses. Rather, the obtained data suggest that vowel contrast occurs on the basis of normalized formant values, thus supporting a version of the auditory-contrast theory.
10.
S M Gordon-Salant F L Wightman 《The Journal of the Acoustical Society of America》1983,73(5):1756-1765
A triadic comparisons task and an identification task were used to evaluate normally hearing listeners' and hearing-impaired listeners' perceptions of synthetic CV stimuli in the presence of competition. The competing signals included multitalker babble, continuous speech spectrum noise, a CV masker, and a brief noise masker shaped to resemble the onset spectrum of the CV masker. All signals and maskers were presented monotically. Interference by competition was assessed by comparing Multidimensional Scaling solutions derived from each masking condition to that derived from the baseline (quiet) condition. Analysis of the effects of continuous maskers revealed that multitalker babble and continuous noise caused the same amount of change in performance, as compared to the baseline condition, for all listeners. CV masking changed performance significantly more than did brief noise masking, and the hearing-impaired listeners experienced more degradation in performance than the normal-hearing listeners. Finally, the velar CV maskers (/gɛ/ and /kɛ/) caused significantly greater masking effects than the bilabial CV maskers (/bɛ/ and /pɛ/), and were most resistant to masking by other competing stimuli. The results suggest that speech intelligibility difficulties in the presence of competing segments of speech are primarily attributable to phonetic interference rather than to spectral masking. Individual differences in hearing-impaired listeners' performances are also discussed.
11.
Frequency discrimination of complex signals, frequency selectivity, and speech perception in hearing-impaired subjects
J W Horst 《The Journal of the Acoustical Society of America》1987,82(3):874-885
Frequency discrimination of spectral envelopes of complex stimuli, frequency selectivity measured with psychophysical tuning curves, and speech perception were determined in hearing-impaired subjects each having a relatively flat, sensory-neural loss. Both the frequency discrimination and speech perception measures were obtained in quiet and noise. Most of these subjects showed abnormal susceptibility to ambient noise with regard to speech perception. Frequency discrimination in quiet and frequency selectivity did not correlate significantly. At low signal-to-noise ratios, frequency discrimination correlated significantly with frequency selectivity. Speech perception in noise correlated significantly with frequency selectivity and with frequency discrimination at low signal-to-noise ratios. The frequency discrimination data are discussed in terms of an excitation-pattern model. However, they neither support nor refute the model.
12.
P G Stelmachowicz W Jesteadt M P Gorga J Mott 《The Journal of the Acoustical Society of America》1985,77(2):620-627
Performance-intensity functions for monosyllabic words were obtained as a function of signal-to-noise ratio for broadband and low-pass filtered noise. Subjects were 11 normal-hearing listeners and 13 hearing-impaired listeners with flat, moderate sensorineural hearing losses and good speech-discrimination ability (at least 86%) in quiet. In the broadband-noise condition, only small differences in speech perception were noted between the two groups. In low-pass noise, however, large differences in performance were observed. These findings were correlated with various aspects of psychophysical tuning curves (PTCs) obtained from the same individuals. Results of a multivariate analysis suggest that performance in broadband noise is correlated with filter bandwidth (Q10), while performance in low-pass noise is correlated with changes on the low-frequency side of the PTC.
13.
Several experiments have found that changing the intrinsic f0 of a vowel can have an effect on perceived vowel quality. It has been suggested that these shifts may occur because f0 is involved in the specification of vowel quality in the same way as the formant frequencies. Another possibility is that f0 affects vowel quality indirectly, by changing a listener's assumptions about characteristics of a speaker who is likely to have uttered the vowel. In the experiment outlined here, participants were asked to listen to vowels differing in terms of f0 and their formant frequencies and report vowel quality and the apparent speaker's gender and size on a trial-by-trial basis. The results presented here suggest that f0 affects vowel quality mainly indirectly via its effects on the apparent-speaker characteristics; however, f0 may also have some residual direct effects on vowel quality. Furthermore, the formant frequencies were also found to have significant indirect effects on vowel quality by way of their strong influence on the apparent speaker.
14.
P G Stelmachowicz A L Pittman B M Hoover D E Lewis 《The Journal of the Acoustical Society of America》2001,110(4):2183-2190
Recent studies with adults have suggested that amplification at 4 kHz and above fails to improve speech recognition and may even degrade performance when high-frequency thresholds exceed 50-60 dB HL. This study examined the extent to which high frequencies can provide useful information for fricative perception for normal-hearing and hearing-impaired children and adults. Eighty subjects (20 per group) participated. Nonsense syllables containing the phonemes /s/, /f/, and /θ/, produced by a male, female, and child talker, were low-pass filtered at 2, 3, 4, 5, 6, and 9 kHz. Frequency shaping was provided for the hearing-impaired subjects only. Results revealed significant differences in recognition between the four groups of subjects. Specifically, both groups of children performed more poorly than their adult counterparts at similar bandwidths. Likewise, both hearing-impaired groups performed more poorly than their normal-hearing counterparts. In addition, significant talker effects for /s/ were observed. For the male talker, optimum performance was reached at a bandwidth of approximately 4-5 kHz, whereas optimum performance for the female and child talkers did not occur until a bandwidth of 9 kHz.
15.
Akeroyd MA Gatehouse S Blaschke J 《The Journal of the Acoustical Society of America》2007,121(2):1077-1089
This experiment measured the capability of hearing-impaired individuals to discriminate differences in the cues to the distance of spoken sentences. The stimuli were generated synthetically, using a room-image procedure to calculate the direct sound and first 74 reflections for a source placed in a 7 x 9 m room, and then presenting each of those sounds individually through a circular array of 24 loudspeakers. Seventy-seven listeners participated, aged 22-83 years and with hearing levels from -5 to 59 dB HL. In conditions where a substantial change in overall level due to the inverse-square law was available as a cue, the elderly hearing-impaired listeners did not perform any differently from the control groups. In other conditions where that cue was unavailable (so leaving the direct-to-reverberant relationship as a cue), either because the reverberant field dominated the direct sound or because the overall level had been artificially equalized, hearing-impaired listeners performed worse than controls. There were significant correlations with listeners' self-reported distance capabilities as measured by the "Speech, Spatial, and Qualities of Hearing" questionnaire [S. Gatehouse and W. Noble, Int. J. Audiol. 43, 85-99 (2004)]. The results demonstrate that hearing-impaired listeners show deficits in the ability to use some of the cues which signal auditory distance.
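The overall-level cue discussed in this abstract follows directly from the inverse-square law for the direct sound: halving the distance raises the level by about 6 dB. A minimal sketch (the distances used are illustrative, not values from the study):

```python
import math

def level_change_db(d_ref, d_new):
    """Change in direct-sound level (dB) when the source distance moves
    from d_ref to d_new metres, under the inverse-square law; positive
    values mean the source got louder (closer)."""
    return 20.0 * math.log10(d_ref / d_new)
```

For example, level_change_db(1.0, 2.0) is about -6 dB. When the experiment artificially equalized overall level, this cue vanished, leaving only the direct-to-reverberant relationship, which is the condition where the hearing-impaired listeners fell behind the controls.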
16.
Grant KW Tufts JB Greenberg S 《The Journal of the Acoustical Society of America》2007,121(2):1164-1176
In face-to-face speech communication, the listener extracts and integrates information from the acoustic and optic speech signals. Integration occurs within the auditory modality (i.e., across the acoustic frequency spectrum) and across sensory modalities (i.e., across the acoustic and optic signals). The difficulties experienced by some hearing-impaired listeners in understanding speech could be attributed to losses in the extraction of speech information, the integration of speech cues, or both. The present study evaluated the ability of normal-hearing and hearing-impaired listeners to integrate speech information within and across sensory modalities in order to determine the degree to which integration efficiency may be a factor in the performance of hearing-impaired listeners. Auditory-visual nonsense syllables consisting of eighteen medial consonants surrounded by the vowel [a] were processed into four nonoverlapping acoustic filter bands between 300 and 6000 Hz. A variety of one, two, three, and four filter-band combinations were presented for identification in auditory-only and auditory-visual conditions; a visual-only condition was also included. Integration efficiency was evaluated using a model of optimal integration. Results showed that normal-hearing and hearing-impaired listeners integrated information across the auditory and visual sensory modalities with a high degree of efficiency, independent of differences in auditory capabilities. However, across-frequency integration for auditory-only input was less efficient for hearing-impaired listeners. These individuals exhibited particular difficulty extracting information from the highest frequency band (4762-6000 Hz) when speech information was presented concurrently in the next lower-frequency band (1890-2381 Hz). Results suggest that integration of speech information within the auditory modality, but not across auditory and visual modalities, affects speech understanding in hearing-impaired listeners.
17.
18.
19.
This study investigated changes in vowel production and perception among university students from the north of England, as individuals adapt their accent from regional to educated norms. Subjects were tested in their production and perception at regular intervals over a period of 2 years: before beginning university, 3 months later, and at the end of their first and second years at university. At each testing session, subjects were recorded reading a set of experimental words and a short passage. Subjects also completed two perceptual tasks; they chose best exemplar locations for vowels embedded in either northern or southern English accented carrier sentences and identified words in noise spoken with either a northern or southern English accent. The results demonstrated that subjects at a late stage in their language development, early adulthood, changed their spoken accent after attending university. There were no reliable changes in perception over time, but there was evidence for a between-subjects link between production and perception; subjects chose similar vowels to the ones they produced, and subjects who had a more southern English accent were better at identifying southern English speech in noise.
20.
R S Cowan P J Blamey K L Galvin J Z Sarant J I Alcántara G M Clark 《The Journal of the Acoustical Society of America》1990,88(3):1374-1384
Fourteen prelinguistically profoundly hearing-impaired children were fitted with the multichannel electrotactile speech processor (Tickle Talker) developed by Cochlear Pty. Ltd. and the University of Melbourne. Each child participated in an ongoing training and evaluation program, which included measures of speech perception and production. Results of speech perception testing demonstrate clear benefits for children fitted with the device. Thresholds for detection of pure tones were lower for the Tickle Talker than for hearing aids across the frequency range 250-4000 Hz, with the greatest tactual advantage in the high-frequency consonant range (above 2000 Hz). Individual and mean speech detection thresholds for the Ling 5-sound test confirmed that speech sounds were detected by the electrotactile device at levels consistent with normal conversational speech. Results for three speech feature tests showed significant improvement when the Tickle Talker was used in combination with hearing aids (TA) as compared with hearing aids alone (A). Mean scores in the TA condition increased by 11% for vowel duration, 20% for vowel formant, and 25% for consonant manner as compared with hearing aids alone. Mean TA score on a closed-set word test (WIPI) was 48%, as compared with 32% for hearing aids alone. Similarly, mean WIPI score for the combination of Tickle Talker, lipreading, and hearing aids (TLA) increased by 6% as compared with combined lipreading and hearing aid (LA) scores. Mean scores on open-set sentences (BKB) showed a significant increase of 21% for the tactually aided condition (TLA) as compared with unaided (LA). These results indicate that, given sufficient training, children can utilize speech feature information provided through the Tickle Talker to improve discrimination of words and sentences. These results are consistent with improvement in speech discrimination previously reported for normally hearing and hearing-impaired adults using the device. Anecdotal evidence also indicates some improvements in speech production for children fitted with the Tickle Talker.