Similar literature
 Found 20 similar articles (search time: 15 ms)
1.
Systems designed to recognize continuous speech must be able to adapt to many types of acoustic variation, including variations in stress. A speaker-dependent recognition study was conducted on a group of stressed and destressed syllables. These syllables, some containing the short vowel /I/ and others the long vowel /ae/, were excised from continuous speech and transformed into arrays of cepstral coefficients at two levels of precision. From these data, four types of template dictionaries varying in size and stress composition were formed by a time-warping procedure. Recognition performance data were gathered from listeners and from a computer recognition algorithm that also employed warping. First, it was found that for a significant portion of the database, stressed and destressed versions of the same syllable are sufficiently different from one another to justify the use of separate dictionary templates. Second, destressed syllables exhibit roughly the same acoustic variance as their stressed counterparts. Third, long vowels tend to be involved in proportionally fewer cross-vowel errors but tend to diminish the warping algorithm's ability to discriminate consonantal information. Finally, the pattern of consonant errors that listeners make as a function of vowel length shows significant differences from that produced by the computer.
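
The abstract does not say which time-warping procedure was used. As an illustration only, the following is a minimal sketch of classic dynamic time warping (DTW) template matching, assuming each syllable is a sequence of equal-length feature vectors (for example, cepstral coefficients per frame). The function names `dtw_distance` and `recognize`, the Euclidean frame distance, and the unconstrained step pattern are assumptions, not details from the study.

```python
import math


def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors.

    Assumption: Euclidean distance between frames and the basic
    (insert, delete, match) step pattern with no slope constraint.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = frame_dist + min(
                cost[i - 1][j],      # a advances, b repeats a frame
                cost[i][j - 1],      # b advances, a repeats a frame
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


def recognize(token, templates):
    """Return the label of the dictionary template closest to `token`.

    `templates` is a hypothetical dict mapping a label to a sequence
    of feature vectors.
    """
    return min(templates, key=lambda label: dtw_distance(token, templates[label]))
```

Under this sketch, keeping separate stressed and destressed templates for the same syllable simply means storing two entries in `templates` and letting `recognize` pick whichever warps to the token at lower cost.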

2.
Acoustic measurements were conducted to determine the degree to which vowel duration, closure duration, and their ratio distinguish voicing of word-final stop consonants across variations in sentential and phonetic environments. Subjects read CVC test words containing three different vowels and ending in stops of three different places of articulation. The test words were produced either in nonphrase-final or phrase-final position and in several local phonetic environments within each of these sentence positions. Our measurements revealed that vowel duration most consistently distinguished voicing categories for the test words. Closure duration failed to consistently distinguish voicing categories across the contextual variables manipulated, as did the ratio of closure and vowel duration. Our results suggest that vowel duration is the most reliable correlate of voicing for word-final stops in connected speech.

3.
Vowel durations typically vary according to both intrinsic (segment-specific) and extrinsic (contextual) specifications. It can be argued that such variations are due to both predisposition and cognitive learning. The present report utilizes acoustic phonetic measurements from Swedish and American children aged 24 and 30 months to investigate the hypothesis that default behaviors may precede language-specific learning effects. The predicted pattern is the presence of final consonant voicing effects in both languages as a default, and subsequent learning of intrinsic effects most notably in the Swedish children. The data, from 443 monosyllabic tokens containing high-front vowels and final stop consonants, are analyzed in statistical frameworks at group and individual levels. The results confirm that Swedish children show an early tendency to vary vowel durations according to final consonant voicing, followed only six months later by a stage at which the intrinsic influence of vowel identity grows relatively more robust. Measures of vowel formant structure from selected 30-month-old children also revealed a tendency for children of this age to focus on particular acoustic contrasts. In conclusion, the results indicate that early acquisition of vowel specifications involves an interaction between language-specific features and articulatory predispositions associated with phonetic context.

4.
The powerful techniques of covariance structure modeling (CSM) have long been used to study complex behavioral phenomena in the social and behavioral sciences. This study employed these same techniques to examine simultaneous effects on vowel duration in American English. Additionally, this study investigated whether a single population model of vowel duration fits observed data better than a dual population model where separate parameters are generated for syllables that carry large information loads and for syllables that specify linguistic relationships. For the single population model, intrinsic duration, phrase final position, lexical stress, post-vocalic consonant voicing, and position in word all were significant predictors of vowel duration. However, the dual population model, in which separate model parameters were generated for (1) monosyllabic content words and lexically stressed syllables and (2) monosyllabic function words and lexically unstressed syllables, fit the data better than the single population model. Intrinsic duration and phrase final position affected duration similarly for both populations. On the other hand, the effects of post-vocalic consonant voicing and position in word, while significant predictors of vowel duration in content words and stressed syllables, were not significant predictors of vowel duration in function words or unstressed syllables. These results are not unexpected, based on previous research, and suggest that covariance structure analysis can be used as a complementary technique in linguistic and phonetic research.

5.
Three groups of nine 5-11-month-old infants provided evidence of discrimination of speechlike stimuli differing only in vowel duration. Ease of discrimination was directly related to the magnitude of the ratio of the longer to shorter vowel. Group one infants discriminated three vowel duration contrasts (with ratios of 0.33, 0.67, and 1.0) embedded in a synthetic [mad] syllable; group two discriminated these same duration contrasts within the bisyllable [samad], and group three in the trisyllable [masamad]. In all cases, the contrasting durations were carried by the last vowel of the synthetic word. These same three infant groups failed to provide evidence of discrimination of a final position released stop consonant contrast ([mat] versus [mad]) cued by voice excitation during closure of the [d] and not the [t]. These results suggest that vowel duration may be a primary cue for infants' perception of the voicing of final position stop consonants.

6.
7.
8.
As part of an investigation of the temporal implementation rules of English, measurements were made of voice-onset time for initial English stops and the duration of the following voiced vowel in monosyllabic words for New York City speakers. It was found that the VOT of a word-initial consonant was longer before a voiceless final cluster than before a single nasal, and longer before tense vowels than lax vowels. The vowels were also longer in environments where VOT was longer, but VOT did not maintain a constant ratio with the vowel duration, even for a single place of articulation. VOT was changed by a smaller proportion than the following voiced vowel in both cases. VOT changes associated with the vowel were consistent across place of articulation of the stop. In the final experiment, when vowel tensity and final consonant effects were combined, it was found that the proportion of vowel duration change that carried over to the preceding VOT is different for the two phonetic changes. These results imply that temporal implementation rules simultaneously influence several acoustic intervals including both VOT and the "inherent" interval corresponding to a segment, either by independent control of the relevant articulatory variables or by some unknown common mechanism.

9.
Some effects of talker variability on spoken word recognition   (Total citations: 2; self-citations: 0; citations by others: 2)
The perceptual consequences of trial-to-trial changes in the voice of the talker on spoken word recognition were examined. The results from a series of experiments using perceptual identification and naming tasks demonstrated that perceptual performance decreases when the voice of the talker changes from trial to trial compared to performance when the voice on each trial remains the same. In addition, the effects of talker variability on word recognition appeared to be more robust and less dependent on task than the effects of word frequency and lexical structure. Possible hypotheses regarding the nature of the processes giving rise to these effects are discussed, with particular attention to the idea that the processing of information about the talker's voice is intimately related to early perceptual processes that extract acoustic-phonetic information from the speech signal.

10.
The harmonic sieve has been proposed as a mechanism for excluding extraneous frequency components from the estimate of the pitch of a complex sound. The experiments reported here examine whether a harmonic sieve could also determine whether a particular harmonic contributes to the phonetic quality of a vowel. Mistuning a harmonic in the first formant region of vowels from an /I/-/e/ continuum gave shifts in the phoneme boundary that could be explained by (i) phase effects for small amounts of mistuning and (ii) a harmonic sieve-like grouping mechanism for larger amounts of mistuning. Grouping criteria similar to those suggested for pitch may operate for the determination of first formant frequency in voiced speech.
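
The idea of a harmonic sieve can be illustrated with a short sketch: a component is accepted if it falls close enough to an integer multiple of the fundamental, and rejected otherwise. The abstract gives no tolerance value, so the 3% default below, like the function names `passes_sieve` and `sieve`, is purely an illustrative assumption.

```python
def passes_sieve(freq_hz, f0_hz, tolerance=0.03):
    """True if freq_hz lies within `tolerance` (as a proportion) of the
    nearest harmonic of f0_hz.

    The 3% default is a hypothetical value chosen for illustration.
    """
    # Nearest harmonic number, never below the fundamental itself.
    harmonic_number = max(1, round(freq_hz / f0_hz))
    nominal_hz = harmonic_number * f0_hz
    return abs(freq_hz - nominal_hz) / nominal_hz <= tolerance


def sieve(components_hz, f0_hz, tolerance=0.03):
    """Keep only the components that pass through the sieve."""
    return [f for f in components_hz if passes_sieve(f, f0_hz, tolerance)]
```

With a 125 Hz fundamental, a fourth harmonic mistuned from 500 Hz to 510 Hz (2%) would still pass under this tolerance, whereas one mistuned to 540 Hz (8%) would be excluded, which mirrors the distinction between small and larger amounts of mistuning.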

11.
12.
The study of acoustical characteristics is important for speech and speaker recognition in Chinese whispered speech. In this paper, the characteristics of whispered speech are introduced and the acoustical characteristics of Chinese whispered speech are discussed. Whispered speech has no fundamental frequency, so other characteristics, such as duration and formant frequency, are extracted and analyzed. Experiments with six simple Chinese whispered vowels show that duration and formant frequency can be used as the main acoustical characteristics in Chinese whispered speech recognition.

13.
Covariation among vowel height effects on vowel intrinsic fundamental frequency (IF0), voice onset time (VOT), and voiceless interval duration (VID) is analyzed to assess the plausibility of a common physiological mechanism underlying variation in these measures. Phrases spoken by 20 young adults, containing words composed of initial voiceless stops or /s/ and high or low vowels, were produced in habitual and voluntarily increased F0 conditions. High vowels were associated with increased IF0 and longer VIDs. VOT and VID exhibited significant covariation with IF0 only for males at habitual F0. The lack of covariation for females and at increased F0 is discussed.

14.
Hearing talkers produce shorter vowel and word durations in multisyllabic contexts than in monosyllabic contexts. This investigation determined whether a similar effect occurs for deaf talkers, a population often characterized as lacking coarticulation in their speech. Four prelingually deafened adults and two hearing controls produced three sets of word sequences. Each set included a kernel word and six derived forms (e.g., "speed," "speedy," "speeding," etc.). The derived forms were created by adding unstressed and stressed syllables to the kernel form. A spectrographic analysis indicated that the deaf subjects did not always decrease word and vowel durations for the derivatives. Unlike hearing speakers, they often did not reduce vowel segments more than consonant segments. Three explanations are forwarded for the shortening effects. One relates to the implementation of temporal rules, the second concerns the organization imposed upon the articulators to produce speech, and the third suggests a language-independent vocal tract characteristic. The role of auditory information in developing the shortening effects is also considered.

15.
16.
17.
The purpose of this investigation was to study the effects of consonant environment on vowel duration for normally hearing males, hearing-impaired males with intelligible speech, and hearing-impaired males with semi-intelligible speech. The results indicated that the normally hearing and intelligible hearing-impaired speakers exhibited similar trends with respect to consonant influence on vowel duration; i.e., vowels were longer in a voiced environment than in a voiceless one, and longer in a fricative environment than in a plosive one. The semi-intelligible hearing-impaired speakers, however, failed to demonstrate a consonant effect on vowel duration, and produced the vowels with significantly longer durations when compared with the other two groups of speakers. These data provide information regarding temporal conditions which may contribute to the decreased intelligibility of hearing-impaired persons.

18.
In same-different discrimination tasks employing isolated vowel sounds, subjects often give significantly more "different" responses to one order of two stimuli than to the other order. Cowan and Morse [J. Acoust. Soc. Am. 79, 500-507 (1986)] proposed a neutralization hypothesis to account for such effects: The first vowel in a pair is assumed to change its quality in memory in the direction of the neutral vowel, schwa. Three experiments were conducted using a variety of vowels and some initial support for the hypothesis was obtained, using a large stimulus set, but conflicting evidence with smaller stimulus sets. Rather than becoming more similar to schwa, the first vowel in a pair seems to drift toward the interior of the stimulus range employed in a given test. Several possible explanations are discussed for this tendency and its relation to presentation order effects obtained in other psychophysical paradigms is noted.

19.
This study explored how across-talker differences influence non-native vowel perception. American English (AE) and Korean listeners were presented with recordings of 10 AE vowels in /bVd/ context. The stimuli were mixed with noise and presented for identification in a 10-alternative forced-choice task. The two listener groups heard recordings of the vowels produced by 10 talkers at three signal-to-noise ratios. Overall the AE listeners identified the vowels 22% more accurately than the Korean listeners. There was a wide range of identification accuracy scores across talkers for both AE and Korean listeners. At each signal-to-noise ratio, the across-talker intelligibility scores were highly correlated for AE and Korean listeners. Acoustic analysis was conducted for 2 vowel pairs that exhibited variable accuracy across talkers for Korean listeners but high identification accuracy for AE listeners. Results demonstrated that Korean listeners' error patterns for these four vowels were strongly influenced by variability in vowel production that was within the normal range for AE talkers. These results suggest that non-native listeners are strongly influenced by across-talker variability perhaps because of the difficulty they have forming native-like vowel categories.
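
The abstract does not describe how the stimuli were mixed with noise or which signal-to-noise ratios were used. One common approach, sketched here under the assumption that SNR is defined from RMS levels over the whole stimulus, is to scale the noise before adding it to the speech. The helper names `rms`, `noise_gain_for_snr`, and `mix_at_snr` are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in samples) / len(samples))


def noise_gain_for_snr(signal, noise, snr_db):
    """Gain to apply to `noise` so that rms(signal) / rms(gain * noise)
    corresponds to `snr_db` decibels (20 * log10 of the RMS ratio)."""
    return rms(signal) / (rms(noise) * 10 ** (snr_db / 20))


def mix_at_snr(signal, noise, snr_db):
    """Add scaled noise to the signal, sample by sample.

    Assumes both sequences have the same length and sampling rate.
    """
    gain = noise_gain_for_snr(signal, noise, snr_db)
    return [s + gain * n for s, n in zip(signal, noise)]
```

Fixing the speech level and scaling only the noise keeps the talker's original level differences intact, which matters when the comparison of interest is across talkers.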

20.
An experiment investigated the effects of amplitude ratio (-35 to 35 dB in 10-dB steps) and fundamental frequency difference (0%, 3%, 6%, and 12%) on the identification of pairs of concurrent synthetic vowels. Vowels as weak as -25 dB relative to their competitor were easier to identify in the presence of a fundamental frequency difference (ΔF0). Vowels as weak as -35 dB were not. Identification was generally the same at ΔF0 = 3%, 6%, and 12% for all amplitude ratios: unfavorable amplitude ratios could not be compensated by larger ΔF0 values. Data for each vowel pair and each amplitude ratio, at ΔF0 = 0%, were compared to the spectral envelope of the stimulus at the same ratio, in order to determine which spectral cues determined identification. This information was then used to interpret the pattern of improvement with ΔF0 for each vowel pair, to better understand mechanisms of F0-guided segregation. Identification of a vowel was possible in the presence of strong cues belonging to its competitor, as long as cues to its own formants F1 and F2 were prominent. ΔF0 enhanced the prominence of a target vowel's cues, even when the spectrum of the target was up to 10 dB below that of its competitor at all frequencies. The results are incompatible with models of segregation based on harmonic enhancement, beats, or channel selection.
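
To make the stimulus parameters concrete, the sketch below converts a level difference in dB to a linear amplitude ratio and applies a percentage F0 difference to a base F0. It assumes the amplitude convention (20·log10 of the amplitude ratio); the 100 Hz base F0 in the usage note is a made-up value, since the abstract does not state the F0 values used.

```python
def db_to_amplitude_ratio(level_db):
    """Linear amplitude ratio for a level difference in dB,
    assuming the 20 * log10(amplitude ratio) convention."""
    return 10 ** (level_db / 20)


def shifted_f0(base_f0_hz, delta_percent):
    """F0 of the second vowel when it differs from base_f0_hz
    by delta_percent percent."""
    return base_f0_hz * (1 + delta_percent / 100)
```

Under these assumptions, a vowel at -25 dB has roughly 5.6% of its competitor's amplitude, and with a hypothetical 100 Hz base F0, ΔF0 values of 3%, 6%, and 12% correspond to about 103, 106, and 112 Hz.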
