Similar Documents
20 similar documents found.
1.
Reiterant speech, or nonsense syllable mimicry, has been proposed as a way to study prosody, particularly syllable and word durations, unconfounded by segmental influences. Researchers have shown that segmental influences on durations can be neutralized in reiterant speech. If it is to be a useful tool in the study of prosody, it must also be shown that reiterant speech preserves the suprasegmental duration and intonation differences relevant to perception. In the present study, syllable durations for nonreiterant and reiterant ambiguous sentences were measured to seek evidence of the duration differences that enable listeners to resolve surface-structure ambiguities in nonreiterant speech. These duration patterns were found in both nonreiterant and reiterant speech. A perceptual study tested listeners' perception of these ambiguous sentences as spoken by four "good" speakers--speakers who neutralized intrinsic duration differences and whose sentences were independently rated by skilled listeners as good imitations of normal speech. The listeners were able to choose the correct interpretation when the ambiguous sentences were in reiterant form as well as they did when the sentences were spoken normally. These results support the notion that reiterant speech is like nonreiterant speech in the aspects that are important in the study of prosody.

2.
The goal of this study was to establish the ability of normal-hearing listeners to discriminate formant frequency in vowels in everyday speech. Vowel formant discrimination in syllables, phrases, and sentences was measured for high-fidelity (nearly natural) speech synthesized by STRAIGHT [Kawahara et al., Speech Commun. 27, 187-207 (1999)]. Thresholds were measured for changes in F1 and F2 for the vowels /ɪ, ɛ, æ, ʌ/ in /bVd/ syllables. Experimental factors manipulated included phonetic context (syllables, phrases, and sentences), sentence discrimination with the addition of an identification task, and word position. Results showed that neither longer phonetic context nor the addition of the identification task significantly affected thresholds, while thresholds for word-final position showed significantly better performance than for either initial or middle position in sentences. Results suggest that an average of 0.37 barks is required for normal-hearing listeners to discriminate vowel formants in modest-length sentences, elevated by 84% compared to isolated vowels. Vowel formant discrimination in several phonetic contexts was slightly elevated for STRAIGHT-synthesized speech compared to the formant-synthesized speech stimuli reported by Kewley-Port and Zheng [J. Acoust. Soc. Am. 106, 2945-2958 (1999)]. These elevated thresholds appeared related to greater spectral-temporal variability for high-fidelity speech produced by STRAIGHT than for formant-synthesized speech.
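As a rough guide to what a 0.37-bark threshold means in Hz, here is a minimal Python sketch using the Traunmüller (1990) Hz-to-bark approximation; the choice of formula is an assumption, since the abstract does not state which conversion the study used.

```python
def hz_to_bark(f_hz: float) -> float:
    """Convert frequency in Hz to the bark scale (Traunmüller, 1990)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def bark_to_hz(z: float) -> float:
    """Inverse of the Traunmüller approximation."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

# Example: how large is a 0.37-bark step around a 500-Hz F1?
base = 500.0
f = bark_to_hz(hz_to_bark(base) + 0.37)
print(f"0.37 barks above 500 Hz is about {f - base:.0f} Hz")  # ~43 Hz
```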

3.
Speech can remain intelligible for listeners with normal hearing when processed by narrow bandpass filters that transmit only a small fraction of the audible spectrum. Two experiments investigated the basis for the high intelligibility of narrowband speech. Experiment 1 confirmed reports that everyday English sentences can be recognized accurately (82%-98% words correct) when filtered at center frequencies of 1500, 2100, and 3000 Hz. However, narrowband low-predictability (LP) sentences were less accurately recognized than high-predictability (HP) sentences (20% lower scores), and excised narrowband words were even less intelligible than LP sentences (a further 23% drop). While experiment 1 revealed similar levels of performance for narrowband and broadband sentences at conversational speech levels, experiment 2 showed that speech reception thresholds were substantially (>30 dB) poorer for narrowband sentences. One explanation for this increased disparity between narrowband and broadband speech at threshold (compared to conversational speech levels) is that spectral components in the sloping transition bands of the filters provide important cues for the recognition of narrowband speech, but these components become inaudible as the signal level is reduced. Experiment 2 also showed that performance was degraded by the introduction of a speech masker (a single competing talker). The elevation in threshold was similar for narrowband and broadband speech (11 dB, on average), but because the narrowband sentences required considerably higher sound levels to reach their thresholds in quiet compared to broadband sentences, their target-to-masker ratios were very different (+23 dB for narrowband sentences and -12 dB for broadband sentences). As in experiment 1, performance was better for HP than LP sentences. The LP-HP difference was larger for narrowband than broadband sentences, suggesting that context provides greater benefits when speech is distorted by narrow bandpass filtering.
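To make the processing concrete, here is a hedged Python sketch of narrow bandpass filtering at one of the reported center frequencies; the 1/3-octave bandwidth and Butterworth design are assumptions, as the abstract does not give the study's filter specification.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def narrowband(x, fs, fc, bw_octaves=1/3, order=4):
    """Band-pass filter a speech signal around center frequency fc (Hz).

    bw_octaves is an assumed 1/3-octave bandwidth; the original study's
    bandwidths and slopes may differ.
    """
    lo = fc * 2.0 ** (-bw_octaves / 2)
    hi = fc * 2.0 ** (bw_octaves / 2)
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

# e.g., the three center frequencies used in experiment 1:
# y = narrowband(speech, fs=16000, fc=1500)  # also 2100, 3000
```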

4.
Previous work has established that naturally produced clear speech is more intelligible than conversational speech for adult hearing-impaired listeners and normal-hearing listeners under degraded listening conditions. The major goal of the present study was to investigate the extent to which naturally produced clear speech is an effective intelligibility enhancement strategy for non-native listeners. Thirty-two non-native and 32 native listeners were presented with naturally produced English sentences. Factors that varied were speaking style (conversational versus clear), signal-to-noise ratio (-4 versus -8 dB) and talker (one male versus one female). Results showed that while native listeners derived a substantial benefit from naturally produced clear speech (an improvement of about 16 rau units on a keyword-correct count), non-native listeners exhibited only a small clear speech effect (an improvement of only 5 rau units). This relatively small clear speech effect for non-native listeners is interpreted as a consequence of the fact that clear speech is essentially native-listener oriented, and therefore is only beneficial to listeners with extensive experience with the sound structure of the target language.
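The "rau units" quoted above are rationalized arcsine units, which linearize percent-correct scores so that differences are comparable across the scale. A minimal sketch of Studebaker's (1985) transform; the keyword counts in the example are illustrative, not the study's data.

```python
import math

def rau(correct: int, n: int) -> float:
    """Rationalized arcsine units (Studebaker, 1985).

    Maps a score of `correct` out of `n` items onto a roughly interval
    scale running from about -23 (0% correct) to about 123 (100%).
    """
    theta = (math.asin(math.sqrt(correct / (n + 1)))
             + math.asin(math.sqrt((correct + 1) / (n + 1))))
    return (146.0 / math.pi) * theta - 23.0

# e.g., a ~16-rau clear-speech benefit on a hypothetical 100-keyword test:
print(rau(75, 100) - rau(59, 100))  # ~15.9
```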

5.
The present study examined the effects of short-term perceptual training on normal-hearing listeners' ability to adapt to spectrally altered speech patterns. Using noise-band vocoder processing, acoustic information was spectrally distorted by shifting speech information from one frequency region to another. Six subjects were tested with spectrally shifted sentences after five days of practice with upwardly shifted training sentences. Training with upwardly shifted sentences significantly improved recognition of upwardly shifted speech; recognition of downwardly shifted speech was nearly unchanged. Three subjects were later trained with downwardly shifted speech. Results showed that the mean improvement was comparable to that observed with the upwardly shifted training. In this retrain and retest condition, performance was largely unchanged for upwardly shifted sentence recognition, suggesting that these listeners had retained some of the improved speech perception resulting from the previous training. The results suggest that listeners are able to partially adapt to a spectral shift in acoustic speech patterns over the short-term, given sufficient training. However, the improvement was localized to where the spectral shift was trained, as no change in performance was observed for spectrally altered speech outside of the trained regions.
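A minimal Python sketch of noise-band vocoder processing with an upward spectral shift of the kind described above; the band edges, envelope cutoff, and shift ratio are illustrative assumptions, not the study's parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass(x, fs, lo, hi, order=4):
    sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def envelope(x, fs, cutoff=160.0):
    """Half-wave rectification followed by low-pass smoothing."""
    sos = butter(4, cutoff, btype="lowpass", fs=fs, output="sos")
    return np.maximum(sosfiltfilt(sos, np.maximum(x, 0.0)), 0.0)

def shifted_noise_vocoder(x, fs, analysis_edges, shift_ratio=1.3):
    """Noise-band vocoder with an upward spectral shift.

    analysis_edges: band edges in Hz for the analysis filters.
    shift_ratio:    carrier bands are moved up by this factor (an
                    assumed parameterization for illustration).
    """
    rng = np.random.default_rng(0)
    noise = rng.standard_normal(len(x))
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(analysis_edges[:-1], analysis_edges[1:]):
        env = envelope(bandpass(x, fs, lo, hi), fs)       # band envelope
        carrier = bandpass(noise, fs, lo * shift_ratio, hi * shift_ratio)
        out += env * carrier                              # shifted carrier
    return out / (np.max(np.abs(out)) + 1e-12)

# e.g., four analysis bands between 300 and 3400 Hz, shifted upward:
# y = shifted_noise_vocoder(speech, 16000, [300, 700, 1400, 2300, 3400])
```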

6.
Studies with adults have demonstrated that acoustic cues cohere in speech perception such that two stimuli cannot be discriminated if separate cues bias responses equally, but oppositely, in each. This study examined whether this kind of coherence exists for children's perception of speech signals, a test that first required that a contrast be found for which adults and children show similar cue weightings. Accordingly, experiment 1 demonstrated that adults, 7-, and 5-year-olds weight F2-onset frequency and gap duration similarly in "spa" versus "sa" decisions. In experiment 2, listeners of these same ages made "same" or "not-the-same" judgments for pairs of stimuli in an AX paradigm when only one cue differed, when the two cues were set within a stimulus to bias the phonetic percept towards the same category (relative to the other stimulus in the pair), and when the two cues were set within a stimulus to bias the phonetic percept towards different categories. Unexpectedly, adults' results contradicted earlier studies: They were able to discriminate stimuli when the two cues conflicted in how they biased phonetic percepts. Results for 7-year-olds replicated those of adults, but were not as strong. Only the results of 5-year-olds revealed the kind of perceptual coherence reported by earlier studies for adults. Thus, it is concluded that perceptual coherence for speech signals is present from an early age, and in fact listeners learn to overcome it under certain conditions.

7.
Rapid adaptation to foreign-accented English
This study explored the perceptual benefits of brief exposure to non-native speech. Native English listeners were exposed to English sentences produced by non-native speakers. Perceptual processing speed was tracked by measuring reaction times to visual probe words following each sentence. Three experiments using Spanish- and Chinese-accented speech indicate that processing speed is initially slower for accented speech than for native speech but that this deficit diminishes within one minute of exposure. Control conditions rule out explanations for the adaptation effect based on practice with the task and general strategies for dealing with difficult speech. Further results suggest that adaptation can occur within as few as two to four sentence-length utterances. The findings emphasize the flexibility of human speech processing and require models of spoken word recognition that can rapidly accommodate significant acoustic-phonetic deviations from native language speech patterns.

8.
Two signal-processing algorithms, derived from those described by Stubbs and Summerfield [R.J. Stubbs and Q. Summerfield, J. Acoust. Soc. Am. 84, 1236-1249 (1988)], were used to separate the voiced speech of two talkers speaking simultaneously, at similar intensities, in a single channel. Both algorithms use fundamental frequency (F0) as the basis for segregation. One attenuates the interfering voice by filtering the cepstrum of the signal. The other is a hybrid algorithm that combines cepstral filtering with the technique of harmonic selection [T.W. Parsons, J. Acoust. Soc. Am. 60, 911-918 (1976)]. The algorithms were evaluated and compared in perceptual experiments involving listeners with normal hearing and listeners with cochlear hearing impairments. In experiment 1 the processing was used to separate voiced sentences spoken on a monotone. Both algorithms gave significant increases in intelligibility to both groups of listeners. The improvements were equivalent to an increase of 3-4 dB in the effective signal-to-noise ratio (SNR). In experiment 2 the processing was used to separate voiced sentences spoken with time-varying intonation. For normal-hearing listeners, cepstral filtering gave a significant increase in intelligibility, while the hybrid algorithm gave an increase that was on the margins of significance (p = 0.06). The improvements were equivalent to an increase of 2-3 dB in the effective SNR. For impaired listeners, no intelligibility improvements were demonstrated with intoned sentences. The decrease in performance for intoned material is attributed to limitations of the algorithms when F0 is nonstationary.
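A loose frame-based Python sketch of F0-guided cepstral liftering, the general technique named above. This is not a reconstruction of Stubbs and Summerfield's algorithm; the frame length and notch width are assumptions, and the interferer's F0 is taken as known and constant (as in the monotone condition of experiment 1).

```python
import numpy as np
from scipy.signal import stft, istft

def cepstral_suppress(x, fs, interferer_f0, nperseg=1024, width=8):
    """Attenuate one voiced talker by notching its rahmonics in the
    cepstrum of each frame's log-magnitude spectrum, then resynthesizing
    with the original phase."""
    _, _, X = stft(x, fs=fs, nperseg=nperseg)
    mag, phase = np.abs(X) + 1e-12, np.angle(X)
    ceps = np.fft.irfft(np.log(mag), axis=0)        # quefrency along axis 0
    n = ceps.shape[0]
    period = int(round(fs / interferer_f0))         # F0 period in samples
    for k in range(1, n // (2 * period) + 1):
        q = k * period                              # k-th rahmonic
        ceps[max(q - width, 1):q + width, :] = 0.0
        ceps[n - q - width:n - q + width, :] = 0.0  # mirrored half
    new_logmag = np.fft.rfft(ceps, axis=0).real
    Y = np.exp(new_logmag) * np.exp(1j * phase)
    _, y = istft(Y, fs=fs, nperseg=nperseg)
    return y
```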

9.
When listening to natural speech, listeners are fairly adept at using cues such as pitch, vocal tract length, prosody, and level differences to extract a target speech signal from an interfering speech masker. However, little is known about the cues that listeners might use to segregate synthetic speech signals that retain the intelligibility characteristics of speech but lack many of the features that listeners normally use to segregate competing talkers. In this experiment, intelligibility was measured in a diotic listening task that required the segregation of two simultaneously presented synthetic sentences. Three types of synthetic signals were created: (1) sine-wave speech (SWS); (2) modulated noise-band speech (MNB); and (3) modulated sine-band speech (MSB). The listeners performed worse for all three types of synthetic signals than they did with natural speech signals, particularly at low signal-to-noise ratio (SNR) values. Of the three synthetic signals, the results indicate that SWS signals preserve more of the voice characteristics used for speech segregation than MNB and MSB signals. These findings have implications for cochlear implant users, who rely on signals very similar to MNB speech and thus are likely to have difficulty understanding speech in cocktail-party listening environments.
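A minimal Python sketch of sine-wave speech (SWS) synthesis from precomputed formant tracks; obtaining the tracks themselves (normally done with a formant tracker) is assumed to have happened beforehand, and the example tracks below are illustrative.

```python
import numpy as np

def sinewave_speech(formant_tracks, amp_tracks, fs):
    """Synthesize SWS: each formant is replaced by a single frequency-
    and amplitude-modulated sinusoid; all voice-source detail is gone.

    formant_tracks: list of per-sample frequency arrays (Hz).
    amp_tracks:     matching per-sample amplitude arrays.
    """
    out = np.zeros_like(formant_tracks[0], dtype=float)
    for freq, amp in zip(formant_tracks, amp_tracks):
        phase = 2.0 * np.pi * np.cumsum(freq) / fs  # integrate frequency
        out += amp * np.sin(phase)
    return out / (np.max(np.abs(out)) + 1e-12)

# e.g., a 1-s three-"formant" complex with gliding F1 and F2:
n = 16000
f1 = np.linspace(500, 700, n)
f2 = np.linspace(1500, 1200, n)
f3 = np.full(n, 2500.0)
y = sinewave_speech([f1, f2, f3], [np.ones(n)] * 3, fs=16000)
```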

10.
There is size information in natural sounds. For example, as humans grow in height, their vocal tracts increase in length, producing a predictable decrease in the formant frequencies of speech sounds. Recent studies have shown that listeners can make fine discriminations about which of two speakers has the longer vocal tract, supporting the view that the auditory system discriminates changes on the acoustic-scale dimension. Listeners can also recognize vowels scaled well beyond the range of vocal tracts normally experienced, indicating that perception is robust to changes in acoustic scale. This paper reports two perceptual experiments designed to extend research on acoustic scale and size perception to the domain of musical sounds: The first study shows that listeners can discriminate the scale of musical instrument sounds reliably, although not quite as well as for voices. The second experiment shows that listeners can recognize the family of an instrument sound which has been modified in pitch and scale beyond the range of normal experience. We conclude that processing of acoustic scale in music perception is very similar to processing of acoustic scale in speech perception.

11.
This research was designed to investigate the effects of selected vocal disguises upon speaker identification by listening. The experiment consisted of 360 pair discriminations presented in a fixed-sequence mode. The listeners were asked to decide whether two sentences were uttered by the same or different speakers and to rate their degree of confidence in each decision. The speakers produced two sentence sets utilizing their normal speaking mode and five selected disguises. One member of each stimulus pair in the listening task was always an undisguised speech sample; the other member was either disguised or undisguised. Two listener groups were trained for the task: a naive group of 24 undergraduate students, and a sophisticated group of three doctoral students and three professors of Speech and Hearing Sciences. Both groups of listeners were able to discriminate speakers with a moderately high degree of accuracy (92% correct) when both members of the stimulus pair were undisguised. The inclusion of a disguised speech sample in the stimulus pair significantly interfered with listener performance (59%-81% correct, depending upon the particular disguise). These results present a pattern similar to the authors' previous results utilizing spectrographic speaker-identification tasks (Reich et al., 1976).

12.
Intonation perception of English speech was examined for native English and native Chinese listeners. The F0 contour of the final words of three sentences was manipulated from falling to rising patterns. The listeners' task was to identify and discriminate the intonation of each sentence (question versus statement). English and Chinese listeners differed significantly in the identification functions, both in the categorical boundary and in the slope. In the discrimination functions, Chinese listeners showed greater peakedness than their English peers. These cross-linguistic differences in intonation perception resemble previous findings on the perception of lexical tones, and are likely due to the listeners' differing language backgrounds.
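Identification boundaries and slopes of the kind compared above are typically estimated by fitting a logistic function to the proportion of "question" responses along the continuum. A minimal Python sketch with hypothetical response data (not the study's).

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(x, boundary, slope):
    """Proportion of 'question' responses along the F0-contour continuum."""
    return 1.0 / (1.0 + np.exp(-slope * (x - boundary)))

# Hypothetical data: nine steps from falling to rising final-word F0,
# and the proportion of "question" responses at each step.
steps = np.arange(1, 10, dtype=float)
p_question = np.array([0.02, 0.05, 0.10, 0.30, 0.55, 0.80, 0.92, 0.97, 0.99])

(boundary, slope), _ = curve_fit(logistic, steps, p_question, p0=[5.0, 1.0])
print(f"categorical boundary at step {boundary:.2f}, slope {slope:.2f}")
```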

13.
The development of an accurate and efficient sonar-target classification system depends upon the identification of a set of signal features which may be used to discriminate important classes of signals. Feature selection can be facilitated through the identification of perceptual features used by human listeners in discriminating relevant sonar echoes. This study was conducted to establish a more reliable means of identifying perceptual features in terms of physical signal parameters as an initial step toward the development of an automatic sonar-target classification system. The results of an experiment involving eight subjects and six sonar echoes are presented. A model of the perceptual structure of these echoes was derived from subject similarity judgments using a multidimensional scaling (MDS) technique. It was found that three perceptual features accounted for the similarity judgments made by the human listeners. Echoes modified along candidate physical dimensions were employed to aid in the identification of perceptual dimensions in terms of physical signal parameters. The three perceptual features could be associated with signal parameters involving the amplitude envelope of the echoes.
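A minimal Python sketch of recovering a three-dimensional perceptual space from pairwise dissimilarities with metric MDS, in the spirit of the analysis above; the matrix below is random placeholder data, not the listeners' judgments.

```python
import numpy as np
from sklearn.manifold import MDS

# Placeholder 6x6 dissimilarity matrix standing in for averaged pairwise
# judgments of the six sonar echoes (symmetric, zero diagonal).
rng = np.random.default_rng(1)
d = rng.uniform(0.2, 1.0, size=(6, 6))
dissim = (d + d.T) / 2.0
np.fill_diagonal(dissim, 0.0)

# Recover a 3-dimensional perceptual space, as in the study.
mds = MDS(n_components=3, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)  # one 3-D point per echo
print(coords.shape, f"stress = {mds.stress_:.3f}")
```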

14.
This study examined the relationship of speech breathing to other elements of speech production. It was hypothesized that initiating speech from different lung volumes would have an effect on different elements of the acoustic output. It was postulated that effects may be brought about by mechanical interaction as well as a dispersion of effort to mechanically unlinked elements of speech production, such as articulatory behavior. To this end, selected acoustic variables were studied in eight young healthy women who initiated speech from low, typical, and high lung volume levels. The acoustic variables studied were selected because they have been shown to be sensitive indicators of speech production performance. It was found that with increasing lung volume initiation levels, average sound pressure level, average fundamental frequency, and declination rate of fundamental frequency increased. It was also observed that vowel space was significantly smaller during low lung volume initiation levels relative to typical lung volume initiation levels. Vowel space reduction is discussed relative to "gaining down."
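Vowel space is commonly quantified as the area of the polygon spanned by the corner vowels in the F1-F2 plane. A minimal Python sketch using the shoelace formula; the formant values in the example are illustrative, not the study's measurements.

```python
def vowel_space_area(corner_vowels):
    """Area of the vowel polygon via the shoelace formula.

    corner_vowels: list of (F1, F2) pairs in Hz, ordered around the
    perimeter, e.g. /i/, /ae/, /a/, /u/.
    """
    n = len(corner_vowels)
    s = 0.0
    for i in range(n):
        x1, y1 = corner_vowels[i]
        x2, y2 = corner_vowels[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Illustrative adult female corner vowels (F1, F2) in Hz:
area = vowel_space_area([(310, 2800), (860, 2050), (850, 1220), (370, 950)])
print(f"{area:.0f} Hz^2")
```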

15.
The vertical position of the larynx seems to be relevant to voice function. As a high vertical larynx position is often seen in hyperfunctional and strained voices, lowering a habitually elevated larynx is sometimes a specific goal in clinical voice therapy, and different larynx-lowering exercises are used to achieve this goal. Earlier investigations have shown that pitch, and to some extent also vocal loudness, are relevant to vertical larynx position. In the present investigation, we examine whether lung volume affects vertical larynx position. Using a multi-channel electroglottograph, larynx position was measured in 29 healthy, vocally untrained subjects, who phonated at different lung volumes, pitches, and degrees of vocal loudness. The main results were that high lung volume was clearly associated with a lower larynx position as compared to low lung volume. In addition, vertical larynx position was strongly correlated with pitch. Both of these dependencies were shown to be stronger in males than in females. Our results suggest that lung volume is a factor that is highly relevant to larynx height in untrained subjects.

16.
The goal of this study was to measure the ability of normal-hearing listeners to discriminate formant frequency for vowels in isolation and in sentences at three signal levels. Results showed significant elevation in formant thresholds as formant frequency increased and as linguistic context lengthened. Thresholds also showed a rollover effect with signal level, especially for F2: thresholds at 85 dB SPL were lower than thresholds at 70 or 100 dB SPL in both isolated vowels and sentences. This rollover effect could be due to reduced frequency selectivity and forward/backward masking in sentences at high signal levels for normal-hearing listeners.

17.
Speech intelligibility and localization in a multi-source environment.
Natural environments typically contain sound sources other than the source of interest that may interfere with the ability of listeners to extract information about the primary source. Studies of speech intelligibility and localization by normal-hearing listeners in the presence of competing speech are reported in this work. One, two, or three competing sentences [IEEE Trans. Audio Electroacoust. 17(3), 225-246 (1969)] were presented from various locations in the horizontal plane in several spatial configurations relative to a target sentence. Target and competing sentences were spoken by the same male talker and at the same level. All experiments were conducted both in an actual sound field and in a virtual sound field. In the virtual sound field, both binaural and monaural conditions were tested. In the speech intelligibility experiment, there were significant improvements in performance when the target and competing sentences were spatially separated. Performance was similar in the actual sound-field and virtual sound-field binaural listening conditions for speech intelligibility. Although most of these improvements are evident monaurally when using the better ear, binaural listening was necessary for large improvements in some situations. In the localization experiment, target source identification was measured in a seven-alternative absolute identification paradigm with the same competing sentence configurations as for the speech study. Performance in the localization experiment was significantly better in the actual sound-field than in the virtual sound-field binaural listening conditions. Under binaural conditions, localization performance was very good, even in the presence of three competing sentences. Under monaural conditions, performance was much worse. For the localization experiment, there was no significant effect of the number or configuration of the competing sentences tested. For these experiments, the performance in the speech intelligibility experiment was not limited by localization ability.
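A minimal Python sketch of constructing such target-plus-competing-sentence mixtures at a given RMS-based target-to-masker ratio; here the study used equal levels, i.e., 0 dB per competing talker, and the variable names are illustrative.

```python
import numpy as np

def add_maskers(target, maskers, tmr_db=0.0):
    """Mix a target sentence with competing sentences, each scaled to the
    requested target-to-masker ratio (RMS-based)."""
    rms = lambda x: np.sqrt(np.mean(np.square(x)))
    mix = target.astype(float)
    for m in maskers:
        mix = mix + m * rms(target) / (rms(m) * 10 ** (tmr_db / 20.0))
    return mix

# e.g., one, two, or three same-level competing sentences:
# mix = add_maskers(target, [comp1, comp2, comp3], tmr_db=0.0)
```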

18.
Previous research has shown that speech recognition differences between native and proficient non-native listeners emerge under suboptimal conditions. Current evidence has suggested that the key deficit that underlies this disproportionate effect of unfavorable listening conditions for non-native listeners is their less effective use of compensatory information at higher levels of processing to recover from information loss at the phoneme identification level. The present study investigated whether this non-native disadvantage could be overcome if enhancements at various levels of processing were presented in combination. Native and non-native listeners were presented with English sentences in which the final word varied in predictability and which were produced in either plain or clear speech. Results showed that, relative to the low-predictability-plain-speech baseline condition, non-native listener final word recognition improved only when both semantic and acoustic enhancements were available (high-predictability-clear-speech). In contrast, the native listeners benefited from each source of enhancement separately and in combination. These results suggest that native and non-native listeners apply similar strategies for speech-in-noise perception: The crucial difference is in the signal clarity required for contextual information to be effective, rather than in an inability of non-native listeners to take advantage of this contextual information per se.

19.
With as few as 10-20 sentences of exposure, listeners are able to adapt to speech that is highly distorted compared to that which is encountered in everyday conversation. The current study examines the extent to which adaptation to time-compressed speech can be impeded by disrupting the continuity of the exposure sentences, and whether this differs between young and older adult listeners when they are equated for starting accuracy. In separate sessions conducted one week apart, the degree of adaptation was assessed in four exposure conditions, all of which involved exposure to the same number of time-compressed sentences. A continuous exposure condition involved presentation of the time-compressed sentences without interruption. Two alternation conditions alternated time-compressed speech and uncompressed speech by single sentences or groups of four sentences. A fourth condition presented sentences that were separated by a period of silence but no uncompressed speech. For all conditions, neither young nor older adults' overall level of learning was influenced by disruptions to the exposure sentences. In addition, participants' performance showed reliable improvement across the first and subsequent sessions. These results support robust learning mechanisms in speech perception that remain functional throughout the lifespan.
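Time-compressed stimuli of this kind are typically generated by tempo modification that preserves pitch. A hedged sketch using librosa; the compression factor and filename are illustrative, and the abstract does not state the study's compression rate.

```python
import librosa

# Load a sentence and compress it to half its original duration.
# time_stretch changes tempo, not pitch; rate > 1 shortens the signal.
y, sr = librosa.load("sentence.wav", sr=None)  # hypothetical file
y_compressed = librosa.effects.time_stretch(y, rate=2.0)
```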

20.
Speech perception in the presence of another competing voice is one of the most challenging tasks for cochlear implant users. Several studies have shown that (1) the fundamental frequency (F0) is a useful cue for segregating competing speech sounds and (2) the F0 is better represented by the temporal fine structure than by the temporal envelope. However, current cochlear implant speech processing algorithms emphasize temporal envelope information and discard the temporal fine structure. In this study, speech recognition was measured as a function of the F0 separation of the target and competing sentence in normal-hearing and cochlear implant listeners. For the normal-hearing listeners, the combined sentences were processed through either a standard implant simulation or a new algorithm which additionally extracts a slowed-down version of the temporal fine structure (called Frequency-Amplitude-Modulation-Encoding). The results showed no benefit of increasing F0 separation for the cochlear implant or simulation groups. In contrast, the new algorithm resulted in gradual improvements with increasing F0 separation, similar to that found with unprocessed sentences. These results emphasize the importance of temporal fine structure for speech perception and demonstrate a potential remedy for difficulty in the perceptual segregation of competing speech sounds.
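A loose Python sketch, for a single analysis band, of extracting the envelope plus a rate-limited ("slowed-down") frequency-modulation track in the spirit of Frequency-Amplitude-Modulation-Encoding. This is not the published FAME algorithm; the band edges and FM cutoff are assumptions.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def fame_band(x, fs, band, fm_cutoff=400.0):
    """Encode one band as envelope + rate-limited FM.

    band:      (lo, hi) edges of the analysis band in Hz.
    fm_cutoff: assumed rate limit on the instantaneous-frequency track.
    """
    lo, hi = band
    sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
    xb = sosfiltfilt(sos, x)
    analytic = hilbert(xb)
    env = np.abs(analytic)                          # temporal envelope
    # Instantaneous frequency from the unwrapped phase derivative.
    inst_f = np.diff(np.unwrap(np.angle(analytic))) * fs / (2 * np.pi)
    inst_f = np.append(inst_f, inst_f[-1])
    # Slow the FM: low-pass the deviation around the band center.
    fc = (lo + hi) / 2.0
    sos_lp = butter(2, fm_cutoff, btype="lowpass", fs=fs, output="sos")
    slow_f = fc + sosfiltfilt(sos_lp, inst_f - fc)
    phase = 2.0 * np.pi * np.cumsum(slow_f) / fs
    return env * np.cos(phase)

# e.g., one band of a multi-band FAME-style simulation:
# y1 = fame_band(mixture, fs=16000, band=(300, 700))
```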
