Similar Documents
20 similar documents retrieved (search time: 31 ms)
1.
The benefit of supplementing speechreading with frequency-selective sound-pressure information was studied by auditorily presenting this information to normal-hearing listeners. The sound-pressure levels in one or two frequency bands of the speech signal with center frequencies of 500, 1600, and 3160 Hz, respectively, and with 1- or 1/3-oct bandwidth were used to amplitude-modulate pure-tone carriers with frequencies equal to the center frequencies of the filter bands. Short sentences were presented to 18 normal-hearing listeners under the conditions of speechreading-only and speechreading combined with the sound-pressure information. The mean number of correctly perceived syllables increased from 22.8% for speechreading-only to 65.7% when sound-pressure information was supplied in a single 1-oct band at 500 Hz and to 86.7% with two 1-oct bands at 500 and 3160 Hz, respectively. The latter signal scored only 26.7% correct syllables without accompanying visual information.
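The processing described above can be sketched in a few lines: extract the level envelope of a 1-octave band around 500 Hz and use it to amplitude-modulate a pure-tone carrier at the band's center frequency. The sketch below is illustrative only; the filter orders, envelope cutoff, and function name are our assumptions, not the paper's exact processing.

```python
import numpy as np
from scipy.signal import butter, sosfilt, sosfiltfilt

def band_envelope_carrier(speech, fs, fc=500.0, bw_oct=1.0, env_cutoff=50.0):
    """Amplitude-modulate a pure-tone carrier at fc with the level
    envelope of a bw_oct-octave band of the input (a sketch of the
    band-level supplement described above)."""
    # Octave band edges around the center frequency
    lo, hi = fc * 2 ** (-bw_oct / 2), fc * 2 ** (bw_oct / 2)
    sos_band = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    band = sosfilt(sos_band, speech)
    # Envelope: rectify, then smooth with a low-pass filter
    sos_env = butter(2, env_cutoff, btype="lowpass", fs=fs, output="sos")
    env = np.clip(sosfiltfilt(sos_env, np.abs(band)), 0.0, None)
    t = np.arange(len(speech)) / fs
    return env * np.sin(2 * np.pi * fc * t)
```

A carrier at the band's own center frequency places the level cue in a spectral region the listener can attend to without interference from the other band.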

2.
The benefit of supplementing speechreading with information about the frequencies of the first and second formants from the voiced sections of the speech signal was studied by presenting short sentences to 18 normal-hearing listeners under the following three conditions: (a) speechreading combined with listening to the formant-frequency information, (b) speechreading only, and (c) formant-frequency information only. The formant frequencies were presented either as pure tones or as a complex speechlike signal, obtained by filtering a periodic pulse sequence of 250 Hz by a cascade of four second-order bandpass filters (with constant bandwidth); the center frequencies of two of these filters followed the frequencies of the first and second formants, whereas the frequencies of the others remained constant. The percentage of correctly identified syllables increased from 22.8% in the case of speechreading only to 82.0% in the case of speechreading while listening to the complex speechlike signal. Listening to the formant information only scored 33.2% correct. However, comparison with the best-scoring condition of our previous study [Breeuwer and Plomp, J. Acoust. Soc. Am. 76, 686-691 (1984)] indicates that information about the sound-pressure levels in two one-octave filter bands with center frequencies of 500 and 3160 Hz is a more effective supplement to speechreading than the formant-frequency information.

3.
The ability to combine speechreading (i.e., lipreading) with prosodic information extracted from the low-frequency regions of speech was evaluated with three normally hearing subjects. The subjects were tested in a connected discourse tracking procedure which measures the rate at which spoken text can be repeated back without any errors. Receptive conditions included speechreading alone (SA), speechreading plus amplitude envelope cues (AM), speechreading plus fundamental frequency cues (FM), and speechreading plus intensity-modulated fundamental frequency cues (AM + FM). In a second experiment, one subject was further tested in a speechreading plus voicing duration cue condition (DUR). Speechreading performance was best in the AM + FM condition (83.6 words per minute) and worst in the SA condition (41.1 words per minute). Tracking levels in the AM, FM, and DUR conditions were 73.7, 73.6, and 65.4 words per minute, respectively. The average tracking rate obtained when subjects were allowed to listen to the talker's normal (unfiltered) speech (NS condition) was 108.3 words per minute. These results demonstrate that speechreaders can use information related to the rhythm, stress, and intonation patterns of speech to improve their speechreading performance.

4.
The main goal of this study was to investigate the efficacy of four vibrotactile speechreading supplements. Three supplements provided single-channel encodings of fundamental frequency (F0). Two encodings involved scaling and shifting glottal pulses to pulse rate ranges suited to tactual sensing capabilities; the third transformed F0 to differential amplitude of two fixed-frequency sinewaves. The fourth supplement added to one of the F0 encodings a second vibrator indicating high-frequency speech energy. A second goal was to develop improved methods for experimental control. Therefore, a sentence corpus was recorded on videodisc using two talkers whose speech was captured by video, microphone, and electroglottograph. Other experimental control issues included use of visual-alone control subjects, a multiple-baseline, single-subject design replicated for each of 15 normal-hearing subjects, sentence and syllable pre- and post-tests balanced for difficulty, and a speechreading screening test for subject selection. Across 17 h of treatment and 5 h of visual-alone baseline testing, each subject performed open-set sentence identification. Covariance analyses showed that the single-channel supplements provided a small but significant benefit, whereas the two-channel supplement was not effective. All subjects improved in visual-alone speechreading and maintained individual differences across the experiment. Vibrotactile benefit did not depend on speechreading ability.

5.
Frequency resolution was evaluated for two normal-hearing and seven hearing-impaired subjects with moderate, flat sensorineural hearing loss by measuring percent correct detection of a 2000-Hz tone as the width of a notch in band-reject noise increased. The level of the tone was fixed for each subject at a criterion performance level in broadband noise. Discrimination of synthetic speech syllables that differed in spectral content in the 2000-Hz region was evaluated as a function of the notch width in the same band-reject noise. Recognition of natural speech consonant/vowel syllables in quiet was also tested; results were analyzed for percent correct performance and relative information transmitted for voicing and place features. In the hearing-impaired subjects, frequency resolution at 2000 Hz was significantly correlated with the discrimination of synthetic speech information in the 2000-Hz region and was not related to the recognition of natural speech nonsense syllables unless (a) the speech stimuli contained the vowel /i/ rather than /a/, and (b) the score reflected information transmitted for place of articulation rather than percent correct.

6.
An adaptive test has been developed to determine the minimum bandwidth of speech that a listener needs to reach 50% intelligibility. Measuring this speech-reception bandwidth threshold (SRBT), in addition to the more common speech-reception threshold (SRT) in noise, may be useful in investigating the factors underlying impaired suprathreshold speech perception. Speech was bandpass filtered (center frequency: 1 kHz) and complementary bandstop-filtered noise was added. To obtain reference values, the SRBT was measured in 12 normal-hearing listeners at four sound-pressure levels, in combination with three overall spectral tilts. Plotting SRBT as a function of sound-pressure level resulted in U-shaped curves. The narrowest SRBT (1.4 octaves) was obtained at an A-weighted sound-pressure level of 55 dB. The required bandwidth increases with increasing level, probably due to upward spread of masking. At a lower level (40 dBA) listeners also need a broader band, because parts of the speech signal will be below threshold. The SII (Speech Intelligibility Index) model reasonably predicts the data, although it seems to underestimate upward spread of masking.

7.
This research is concerned with the development and evaluation of a tactual display of consonant voicing to supplement the information available through lipreading for persons with profound hearing impairment. The voicing cue selected is based on the envelope onset asynchrony derived from two different filtered bands (a low-pass band and a high-pass band) of speech. The amplitude envelope of each of the two bands was used to modulate a different carrier frequency which in turn was delivered to one of the two fingers of a tactual stimulating device. Perceptual evaluations of speech reception through this tactual display included the pairwise discrimination of consonants contrasting voicing and identification of a set of 16 consonants under conditions of the tactual cue alone (T), lipreading alone (L), and the combined condition (L + T). The tactual display was highly effective for discriminating voicing at the segmental level and provided a substantial benefit to lipreading on the consonant-identification task. No such benefits of the tactual cue were observed, however, for lipreading of words in sentences due perhaps to difficulties in integrating the tactual and visual cues and to insufficient training on the more difficult task of connected-speech reception.
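The envelope onset asynchrony cue described above can be approximated by measuring when the low-pass and high-pass band envelopes first rise above a threshold: for voiceless consonants the high band typically begins before low-frequency voicing energy arrives. A minimal sketch, assuming an illustrative split frequency and onset threshold (not the study's actual parameters):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def envelope_onset_asynchrony(x, fs, split=1000.0, thresh=0.1):
    """Return high-band onset time minus low-band onset time (seconds).
    Negative values mean the high band leads, as in voiceless onsets.
    Split frequency and 10% onset threshold are illustrative choices."""
    def onset(band):
        # Smooth the rectified band to get its amplitude envelope
        sos_env = butter(2, 50.0, btype="lowpass", fs=fs, output="sos")
        env = sosfiltfilt(sos_env, np.abs(band))
        above = np.flatnonzero(env > thresh * env.max())
        return above[0] / fs
    sos_lo = butter(4, split, btype="lowpass", fs=fs, output="sos")
    sos_hi = butter(4, split, btype="highpass", fs=fs, output="sos")
    return onset(sosfiltfilt(sos_hi, x)) - onset(sosfiltfilt(sos_lo, x))
```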

8.
Fundamental frequency (F0) information extracted from low-pass-filtered speech and aurally presented as frequency-modulated sinusoids can greatly improve speechreading performance [Grant et al., J. Acoust. Soc. Am. 77, 671-677 (1985)]. To use this source of information, listeners must be able to detect the presence or absence of F0 (i.e., voicing), discriminate changes in frequency, and make judgments about the linguistic meaning of perceived variations in F0. In the present study, normally hearing and hearing-impaired subjects were required to locate the stressed peak of an intonation contour according to the extent of frequency transition at the primary peak. The results showed that listeners with profound hearing impairments required frequency transitions that were 1.5-6 times greater than those required by normally hearing subjects. These results were consistent with the subjects' identification performance for intonation and stress patterns in natural speech, and suggest that natural variations in F0 may be too small for some impaired listeners to perceive and follow accurately.
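Rendering an F0 contour as a frequency-modulated sinusoid, as in the supplement described above, amounts to integrating the contour into an instantaneous phase and gating out unvoiced stretches. A minimal sketch, assuming a per-sample contour with 0 marking unvoiced frames (the function name and conventions are ours):

```python
import numpy as np

def f0_to_fm_tone(f0_contour, fs):
    """Render an F0 contour (Hz per sample; 0 = unvoiced) as a
    frequency-modulated sinusoid, silenced during unvoiced stretches."""
    f0 = np.asarray(f0_contour, dtype=float)
    # Integrate instantaneous frequency to obtain phase
    phase = 2 * np.pi * np.cumsum(f0) / fs
    tone = np.sin(phase)
    tone[f0 <= 0.0] = 0.0  # gate out unvoiced frames (voicing cue)
    return tone
```

The abrupt gating here also carries the voicing on/off cue the abstract mentions; a practical display would ramp the amplitude to avoid audible clicks.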

9.
Young deaf children using a cochlear implant develop speech abilities on the basis of speech temporal-envelope signals distributed over a limited number of frequency bands. A Headturn Preference Procedure was used to measure looking times in 6-month-old, normal-hearing infants during presentation of repeating or alternating sequences composed of different tokens of /aba/ and /apa/ processed to retain envelope information below 64 Hz while degrading temporal fine structure cues. Infants attended longer to the alternating sequences, indicating that they perceive the voicing contrast on the basis of envelope cues alone in the absence of fine spectral and temporal structure information.

10.
We present the results of a large-scale study on speech perception, assessing the number and type of perceptual hypotheses which listeners entertain about possible phoneme sequences in their language. Dutch listeners were asked to identify gated fragments of all 1179 diphones of Dutch, providing a total of 488,520 phoneme categorizations. The results manifest orderly uptake of acoustic information in the signal. Differences across phonemes in the rate at which fully correct recognition was achieved arose as a result of whether or not potential confusions could occur with other phonemes of the language (long with short vowels, affricates with their initial components, etc.). These data can be used to improve models of how acoustic-phonetic information is mapped onto the mental lexicon during speech comprehension.

11.
Vowel and consonant confusion matrices were collected in the hearing alone (H), lipreading alone (L), and hearing plus lipreading (HL) conditions for 28 patients participating in the clinical trial of the multiple-channel cochlear implant. All patients were profound-to-totally deaf and "hearing" refers to the presentation of auditory information via the implant. The average scores were 49% for vowels and 37% for consonants in the H condition and the HL scores were significantly higher than the L scores. Information transmission and multidimensional scaling analyses showed that different speech features were conveyed at different levels in the H and L conditions. In the HL condition, the visual and auditory signals provided independent information sources for each feature. For vowels, the auditory signal was the major source of duration information, while the visual signal was the major source of first and second formant frequency information. The implant provided information about the amplitude envelope of the speech and the estimated frequency of the main spectral peak between 800 and 4000 Hz, which was useful for consonant recognition. A speech processor that coded the estimated frequency and amplitude of an additional peak between 300 and 1000 Hz was shown to increase the vowel and consonant recognition in the H condition by improving the transmission of first formant and voicing information.

12.
Traditional accounts of speech perception generally hold that listeners use isolable acoustic "cues" to label phonemes. For syllable-final stops, duration of the preceding vocalic portion and formant transitions at syllable's end have been considered the primary cues to voicing decisions. The current experiment tried to extend traditional accounts by asking two questions concerning voicing decisions by adults and children: (1) What weight is given to vocalic duration versus spectral structure, both at syllable's end and across the syllable? (2) Does the naturalness of stimuli affect labeling? Adults and children (4, 6, and 8 years old) labeled synthetic stimuli that varied in vocalic duration and spectral structure, either at syllable's end or earlier in the syllable. Results showed that all listeners weighted dynamic spectral structure, both at syllable's end and earlier in the syllable, more than vocalic duration, and listeners performed with these synthetic stimuli as listeners had performed previously with natural stimuli. The conclusion for accounts of human speech perception is that rather than simply gathering acoustic cues and summing them to derive strings of phonemic segments, listeners are able to attend to global spectral structure, and use it to help recover explicitly phonetic structure.

13.
In normal speech, coordinated activity of the intrinsic laryngeal muscles suspends the glottal sound during voiceless consonants, automatically realizing voicing control. In electrolaryngeal speech, however, the lack of voicing control is one cause of unclear voice, with voiceless consonants tending to be misheard as the corresponding voiced consonants. In the present work, we developed an intra-oral vibrator with an intra-oral pressure sensor that detected the utterance of voiceless phonemes during intra-oral electrolaryngeal speech, and demonstrated that intra-oral pressure-based voicing control could improve the intelligibility of the speech. Test voices were obtained from one electrolaryngeal speaker and one normal speaker. Using speech-analysis software, we first investigated how voice onset time (VOT) and the first-formant (F1) transition of the test consonant-vowel syllables contributed to voiceless/voiced contrasts, and developed an adequate voicing control strategy. We then compared the intelligibility of consonant-vowel syllables in intra-oral electrolaryngeal speech with and without online voicing control. An increase of intra-oral pressure, typically with a peak ranging from 10 to 50 gf/cm², reliably identified the utterance of voiceless consonants. The speech analysis and intelligibility tests then demonstrated that a short VOT caused consonants to be misidentified as voiced because of a clear F1 transition. Finally, taking these results together, the online voicing control, which suspended the prosthetic tone while the intra-oral pressure exceeded 2.5 gf/cm² and during the 35 ms that followed, proved effective in improving the voiceless/voiced contrast.

14.
The ability of five profoundly hearing-impaired subjects to "track" connected speech and to make judgments about the intonation and stress in spoken sentences was evaluated under a variety of auditory-visual conditions. These included speechreading alone, speechreading plus speech (low-pass filtered at 4 kHz), and speechreading plus a tone whose frequency, intensity, and temporal characteristics were matched to the speaker's fundamental frequency (F0). In addition, several frequency transfer functions were applied to the normal F0 range resulting in new ranges that were both transposed and expanded with respect to the original F0 range. Three of the five subjects were able to use several of the tonal representations of F0 nearly as well as speech to improve their speechreading rates and to make appropriate judgments concerning sentence intonation and stress. The remaining two subjects greatly improved their identification performance for intonation and stress patterns when expanded F0 signals were presented alone (i.e., without speechreading), but had difficulty integrating visual and auditory information at the connected discourse level, despite intensive training in the connected discourse tracking procedure lasting from 27.8 to 33.8 h.

15.
Cues to the voicing distinction for final /f,s,v,z/ were assessed for 24 impaired- and 11 normal-hearing listeners. In base-line tests the listeners identified the consonants in recorded /dʌC/ syllables. To assess the importance of various cues, tests were conducted of the syllables altered by deletion and/or temporal adjustment of segments containing acoustic patterns related to the voicing distinction for the fricatives. The results showed that decreasing the duration of /ʌ/ preceding /v/ or /z/, and lengthening the /ʌ/ preceding /f/ or /s/, considerably reduced the correctness of voicing perception for the hearing-impaired group, while showing no effect for the normal-hearing group. For the normals, voicing perception deteriorated for /f/ and /s/ when the frications were deleted from the syllables, and for /v/ and /z/ when the vowel offsets were removed from the syllables with duration-adjusted vowels and deleted frications. We conclude that some hearing-impaired listeners rely to a greater extent on vowel duration as a voicing cue than do normal-hearing listeners.

16.
The role of different modulation frequencies in the speech envelope was studied by means of the manipulation of vowel-consonant-vowel (VCV) syllables. The envelope of the signal was extracted from the speech and the fine-structure was replaced by speech-shaped noise. The temporal envelopes in every critical band of the speech signal were notch filtered in order to assess the relative importance of different modulation frequency regions between 0 and 20 Hz. For this purpose notch filters around three center frequencies (8, 12, and 16 Hz) with three different notch widths (4-, 8-, and 12-Hz wide) were used. These stimuli were used in a consonant-recognition task in which ten normal-hearing subjects participated, and their results were analyzed in terms of recognition scores. More qualitative information was obtained with a multidimensional scaling method (INDSCAL) and sequential information analysis (SINFA). Consonant recognition is very robust for the removal of certain modulation frequency areas. Only when a wide notch around 8 Hz is applied does the speech signal become heavily degraded. As expected, the voicing information is lost, while there are different effects on plosiveness and nasality. Even the smallest filtering has a substantial effect on the transfer of the plosiveness feature, while on the other hand, filtering out only the low-modulation frequencies has a substantial effect on the transfer of nasality cues.
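The envelope manipulation above can be sketched for a single band: rectify the signal to obtain a temporal envelope, band-stop filter the envelope around the notch center, and reimpose the filtered envelope on a noise carrier. The sketch below uses one band and white noise for brevity, whereas the study filtered the envelope in every critical band and used speech-shaped noise; filter orders and names are our assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def notch_envelope(signal, fs, notch_center=8.0, notch_width=4.0, seed=0):
    """Remove a band of modulation frequencies from a signal's temporal
    envelope and reimpose the filtered envelope on a noise carrier.
    Returns (processed signal, filtered envelope)."""
    env = np.abs(signal)  # crude envelope via rectification
    lo = max(notch_center - notch_width / 2, 0.5)
    hi = notch_center + notch_width / 2
    sos = butter(2, [lo, hi], btype="bandstop", fs=fs, output="sos")
    env_f = np.clip(sosfiltfilt(sos, env), 0.0, None)  # keep envelope nonnegative
    # Replace fine structure with noise (white here; speech-shaped in the study)
    noise = np.random.default_rng(seed).standard_normal(len(env))
    return env_f * noise, env_f
```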

17.
Recent studies with adults have suggested that amplification at 4 kHz and above fails to improve speech recognition and may even degrade performance when high-frequency thresholds exceed 50-60 dB HL. This study examined the extent to which high frequencies can provide useful information for fricative perception for normal-hearing and hearing-impaired children and adults. Eighty subjects (20 per group) participated. Nonsense syllables containing the phonemes /s/, /f/, and /θ/, produced by a male, female, and child talker, were low-pass filtered at 2, 3, 4, 5, 6, and 9 kHz. Frequency shaping was provided for the hearing-impaired subjects only. Results revealed significant differences in recognition between the four groups of subjects. Specifically, both groups of children performed more poorly than their adult counterparts at similar bandwidths. Likewise, both hearing-impaired groups performed more poorly than their normal-hearing counterparts. In addition, significant talker effects for /s/ were observed. For the male talker, optimum performance was reached at a bandwidth of approximately 4-5 kHz, whereas optimum performance for the female and child talkers did not occur until a bandwidth of 9 kHz.

18.
Gross variations of the speech amplitude envelope, such as the duration of different segments and the gaps between them, carry information about prosody and some segmental features of vowels and consonants. The amplitude envelope is one parameter encoded by the Tickle Talker, an electrotactile speech processor for the hearing impaired which stimulates the digital nerve bundles with a pulsatile electric current. Psychophysical experiments measuring the duration discrimination and identification, gap detection, and integration times for pulsatile electrical stimulation are described and compared with similar auditory measures for normal and impaired hearing and electrical stimulation via a cochlear implant. The tactile duration limen of 15% for a 300-ms standard was similar to auditory measures. Tactile gap detection thresholds of 9 to 20 ms were larger than for normal-hearing but shorter than for some hearing-impaired listeners and cochlear implant users. The electrotactile integration time of about 250 ms was shorter than previously measured tactile values but longer than auditory integration times. The results indicate that the gross amplitude envelope variations should be conveyed well by the Tickle Talker. Short bursts of low amplitude are the features most likely to be poorly perceived.

19.
Identification of multiple-electrode stimulus patterns was evaluated in nine adult subjects to assess the feasibility of providing additional speech information through the tactual display of an electrotactile speech processor. Absolute identification scores decreased from 97.8% for single electrodes to 61.9% for electrode pairs and 31.8% for electrode triplets. Although input information increased with paired- and triple-electrode stimuli, information transmission scores were not significantly increased for either electrode pairs (2.99 bits) or triplets (2.84 bits) as compared with single electrodes (2.84 bits). These results suggest that speech coding strategies using stimulus patterns of electrode pairs or triplets would provide little improvement beyond that found for the present single-electrode scheme. However, higher absolute identification scores (73.6%) and an increase in information transmission to 3.88 bits were recorded for test stimuli containing all combinations of paired and single electrodes. Based on this finding, two stimulus sets using a restricted number of combinations of paired and single electrodes were evaluated. The two stimulus sets simulated the spatial patterns of paired and single electrodes arising from use of alternative speech coding schemes to increase consonant voicing information. Results for the two stimulus sets showed higher electrode identification scores (79.7% and 90.4%) as compared with paired-electrode stimuli. Although electrode identification was not as accurate as for single electrodes, information transmission was increased to 3.31 bits for the VF2 stimulus set. Analysis of the responses also showed that scores for identification of simulated voicing information conveyed by the two stimulus sets were 99.4% and 90.4% correct. (ABSTRACT TRUNCATED AT 250 WORDS)

20.
Information transfer analysis [G. A. Miller and P. E. Nicely, J. Acoust. Soc. Am. 27, 338-352 (1955)] is a tool used to measure the extent to which speech features are transmitted to a listener, e.g., duration or formant frequencies for vowels; voicing, place and manner of articulation for consonants. An information transfer of 100% occurs when no confusions arise between phonemes belonging to different feature categories, e.g., between voiced and voiceless consonants. Conversely, an information transfer of 0% occurs when performance is purely random. As asserted by Miller and Nicely, the maximum-likelihood estimate for information transfer is biased to overestimate its true value when the number of stimulus presentations is small. This small-sample bias is examined here for three cases: a model of random performance with pseudorandom data, a data set drawn from Miller and Nicely, and reported data from three studies of speech perception by hearing impaired listeners. The amount of overestimation can be substantial, depending on the number of samples, the size of the confusion matrix analyzed, as well as the manner in which data are partitioned therein.
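The information transfer measure described above follows directly from a confusion matrix: it is the mutual information between stimulus and response, normalized by the stimulus entropy, so 1.0 means perfect transmission and 0.0 means chance performance. A minimal sketch (the function name is ours; it computes the maximum-likelihood estimate, which, as the abstract notes, overestimates the true value when sample counts are small):

```python
import numpy as np

def information_transfer(confusions):
    """Relative information transfer (Miller & Nicely, 1955) from a
    stimulus-by-response confusion matrix of raw counts."""
    n = np.asarray(confusions, dtype=float)
    p = n / n.sum()                       # joint stimulus/response probabilities
    p_s = p.sum(axis=1, keepdims=True)    # stimulus marginals (rows)
    p_r = p.sum(axis=0, keepdims=True)    # response marginals (columns)
    nz = p > 0                            # avoid log(0); zero cells contribute 0
    mutual_info = np.sum(p[nz] * np.log2(p[nz] / (p_s @ p_r)[nz]))
    stim_entropy = -np.sum(p_s[p_s > 0] * np.log2(p_s[p_s > 0]))
    return mutual_info / stim_entropy
```

A perfectly diagonal matrix gives 1.0; a uniform matrix gives 0.0. Feature-level transfer (e.g., voicing) is obtained by first collapsing the matrix over the feature's categories before applying the same formula.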


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号