Similar Articles
20 similar articles found
1.
The goal of this study was to determine whether acoustic properties could be derived for English labial and alveolar nasal consonants that remain stable across vowel contexts, speakers, and syllable positions. In experiment I, critical band analyses were conducted of five tokens each of [m] and [n] followed by the vowels [i e a o u] spoken by three speakers. Comparison of the nature of the changes in the spectral patterns from the murmur to the release showed that, for labials, there was a greater change in energy in the region of Bark 5-7 relative to that of Bark 11-14, whereas, for alveolars, there was a greater change in energy from the murmur to the release in the region of Bark 11-14 relative to that of Bark 5-7. Quantitative analyses of each token indicated that over 89% of the utterances could be appropriately classified for place of articulation by comparing the proportion of energy change in these spectral regions. In experiment II, the spectral patterns of labial and alveolar nasals produced in the context of [s] + nasal ([m n]) + vowel ([i e a o u]) by two speakers were explored. The same analysis procedures were used as in experiment I. Eighty-four percent of the utterances were appropriately classified, although labial consonants were less consistently classified than in experiment I. The properties associated with nasal place of articulation found in this study are discussed in relation to those associated with place of articulation in stop consonants and are considered from the viewpoint of a more general theory of acoustic invariance.
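The quantitative classification in experiment I amounts to comparing the murmur-to-release energy change in two Bark regions. The function below is a hypothetical sketch of that decision rule based only on the abstract; the input representation (per-Bark-band energies in dB) and the simple greater-than comparison are assumptions, not the authors' exact procedure.

```python
def classify_nasal_place(murmur_db, release_db):
    """Classify nasal place of articulation from the murmur-to-release
    energy change in two Bark regions (sketch; representation assumed).

    murmur_db, release_db: lists of per-Bark-band energies in dB,
    indexed by Bark band number.
    """
    d_low = sum(release_db[5:8]) - sum(murmur_db[5:8])       # Bark 5-7
    d_high = sum(release_db[11:15]) - sum(murmur_db[11:15])  # Bark 11-14
    # Labials show the larger change in Bark 5-7; alveolars in Bark 11-14.
    return "labial" if d_low > d_high else "alveolar"
```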

2.
Two recent accounts of the acoustic cues which specify place of articulation in syllable-initial stop consonants claim that they are located in the initial portions of the CV waveform and are context-free. Stevens and Blumstein [J. Acoust. Soc. Am. 64, 1358-1368 (1978)] have described the perceptually relevant spectral properties of these cues as static, while Kewley-Port [J. Acoust. Soc. Am. 73, 322-335 (1983)] describes these cues as dynamic. Three perceptual experiments were conducted to test predictions derived from these accounts. Experiment 1 confirmed that acoustic cues for place of articulation are located in the initial 20-40 ms of natural stop-vowel syllables. Next, short synthetic CV's modeled after natural syllables were generated using either a digital, parallel-resonance synthesizer in experiment 2 or linear prediction synthesis in experiment 3. One set of synthetic stimuli preserved the static spectral properties proposed by Stevens and Blumstein. Another set of synthetic stimuli preserved the dynamic properties suggested by Kewley-Port. Listeners in both experiments identified place of articulation significantly better from stimuli which preserved dynamic acoustic properties than from those based on static onset spectra. Evidently, the dynamic structure of the initial stop-vowel articulatory gesture can be preserved in context-free acoustic cues which listeners use to identify place of articulation.

3.
A study was undertaken to explore the effects of fixing the mandible with a bite block on the formant frequencies of the vowels [i a u] produced by two groups of children, aged 4-5 and 7-8 years. Vowels produced in both normal and bite-block conditions were submitted to LPC analysis with windows placed over the first glottal pulse and at the vowel midpoint. For both groups of children, no differences were found in the frequencies of either the first or second formant between the normal and bite-block conditions. Results are discussed in relation to theories of the acquisition of speech motor control.

4.
Much recent research on acoustic cues for consonants' places of articulation has focused upon the nature of the rapid spectral changes that take place between signal portions corresponding to consonantal closure and adjacent vowels. The study reported here builds on the foundation laid by earlier studies that have explored techniques for representing spectral change and for classifying place of articulation of nasal consonants using features extracted from rapid spectral changes that take place over murmur-to-vowel transitions. A new procedure is reported that avoids the use of predetermined absolute frequency bands in deriving parameters of spectral change in nasals. In experiments using the speech of 20 female and 20 male talkers, in a variety of physical and perceptual spectral scalings, application of the new procedure results in 77% correct classification of place of articulation of syllable-initial nasals and 51% correct classification of place of articulation of syllable-final nasals (for which there is a three-way contrast). Tested on the same data, a technique using predetermined absolute frequency bands produced 72% correct classification of syllable-initial nasals.

5.
6.
This paper reports two series of experiments that examined the phonetic correlates of lexical stress in Vietnamese compounds in comparison to their phrasal constructions. In the first series of experiments, acoustic and perceptual characteristics of Vietnamese compound words and their phrasal counterparts were investigated on five likely acoustic correlates of stress or prominence (f0 range and contour, duration, intensity and spectral slope, vowel reduction), elicited under two distinct speaking conditions: a "normal speaking" condition and a "maximum contrast" condition which encouraged speakers to employ prosodic strategies for disambiguation. The results suggested that Vietnamese lacks phonetic resources for distinguishing compounds from phrases lexically and that native speakers may employ a phrase-level prosodic disambiguation strategy (juncture marking), when required to do so. However, in a second series of experiments, minimal pairs of bisyllabic coordinative compounds with reversible syllable positions were examined for acoustic evidence of asymmetrical prominence relations. Clear evidence of asymmetric prominences in coordinative compounds was found, supporting independent results obtained from an analysis of reduplicative compounds and tone sandhi in Vietnamese [Nguyễn and Ingram, 2006]. A reconciliation of these apparently conflicting findings on word stress in Vietnamese is presented and discussed.

7.
Speech perception requires the integration of information from multiple phonetic and phonological dimensions. A sizable literature exists on the relationships between multiple phonetic dimensions and single phonological dimensions (e.g., spectral and temporal cues to stop consonant voicing). A much smaller body of work addresses relationships between phonological dimensions, and much of this has focused on sequences of phones. However, strong assumptions about the relevant set of acoustic cues and/or the (in)dependence between dimensions limit previous findings in important ways. Recent methodological developments in the general recognition theory framework enable tests of a number of these assumptions and provide a more complete model of distinct perceptual and decisional processes in speech sound identification. A hierarchical Bayesian Gaussian general recognition theory model was fit to data from two experiments investigating identification of English labial stop and fricative consonants in onset (syllable initial) and coda (syllable final) position. The results underscore the importance of distinguishing between conceptually distinct processing levels and indicate that, for individual subjects and at the group level, integration of phonological information is partially independent with respect to perception and that patterns of independence and interaction vary with syllable position.

8.
In obstruent consonants, a major constriction in the upper vocal tract yields an increase in intraoral pressure (P(io)). Phonation requires that subglottal pressure (P(sub)) exceed P(io) by a threshold value, so phonation ceases once the transglottal pressure falls to this threshold. This work investigates how P(io) levels at phonation offset and onset vary before and after different German voiceless obstruents (stop, fricative, affricates, clusters), and with following high vs low vowels. Articulatory contacts, measured using electropalatography, were recorded simultaneously with P(io) to clarify how supraglottal constrictions affect P(io). Effects of consonant type on phonation thresholds could be explained mainly in terms of the magnitude and timing of vocal-fold abduction. Phonation offset occurred at lower values of P(io) before fricative-initial sequences than stop-initial sequences, and onset occurred at higher levels of P(io) following the unaspirated stops of clusters compared to fricatives, affricates, and aspirated stops. The vowel effects were somewhat surprising: High vowels had an inhibitory effect at voicing offset (phonation ceasing at lower values of P(io)) in short-duration consonant sequences, but a facilitating effect on phonation onset that was consistent across consonantal contexts. The vowel influences appear to reflect a combination of vocal-fold characteristics and vocal-tract impedance.
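The aerodynamic condition in the first sentences of this abstract can be stated compactly: voicing persists only while the transglottal pressure P(sub) - P(io) exceeds a threshold. The sketch below illustrates that condition; the threshold value (in cm H2O) is a notional placeholder, not a figure from the study.

```python
def phonating(p_sub, p_io, p_th=2.0):
    """True while transglottal pressure (P_sub - P_io) exceeds the
    phonation threshold P_th. Pressures in cm H2O; p_th is notional.

    During an obstruent closure P_io rises toward P_sub, so the
    transglottal pressure falls and phonation eventually ceases.
    """
    return (p_sub - p_io) > p_th
```

For example, with P(sub) = 8, voicing continues at P(io) = 3 but has ceased by P(io) = 7 under this notional threshold.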

9.
According to recent theoretical accounts of place of articulation perception, global, invariant properties of the stop CV syllable onset spectrum serve as primary, innate cues to place of articulation, whereas contextually variable formant transitions constitute secondary, learned cues. By this view, one might expect that young infants would find the discrimination of place of articulation contrasts signaled by formant transition differences more difficult than those cued by gross spectral differences. Using an operant head-turning paradigm, we found that 6-month-old infants were able to discriminate two-formant stimuli contrasting in place of articulation as well as they did five-formant + burst stimuli. Apparently, neither the global properties of the onset spectrum nor simply the additional acoustic information contained in the five-formant + burst stimuli afford the infant any advantage in the discrimination task. Rather, formant transition information provides a sufficient basis for discriminating place of articulation differences.

10.
This study reassessed the role of the nasal murmur and formant transitions as perceptual cues for place of articulation in nasal consonants across a number of vowel environments. Five types of computer-edited stimuli were generated from natural utterances consisting of [m n] followed by [i e a o u]: (1) full murmurs; (2) transitions plus vowel segments; (3) the last six pulses of the murmur; (4) the six pulses starting from the beginning of the formant transitions; and (5) the six pulses surrounding the nasal release (three pulses before and three pulses after). Results showed that the murmur provided as much information for the perception of place of articulation as did the transitions. Moreover, the highest performance scores for place of articulation were obtained in the six-pulse condition containing both murmur and transition information. The data support the view that it is the combination of nasal murmur plus formant transitions which forms an integrated property for the perception of place of articulation.

11.
We have examined the effects of the relative amplitude of the release burst on perception of the place of articulation of utterance-initial voiceless and voiced stop consonants. The amplitude of the burst, which occurs within the first 10-15 ms following consonant release, was systematically varied in 5-dB steps from -10 to +10 dB relative to a "normal" burst amplitude for two labial-to-alveolar synthetic speech continua--one comprising voiceless stops and the other, voiced stops. The distribution of spectral energy in the bursts for the labial and alveolar stops at the ends of the continuum was consistent with the spectrum shapes observed in natural utterances, and intermediate shapes were used for intermediate stimuli on the continuum. The results of identification tests with these stimuli showed that the relative amplitude of the burst significantly affected the perception of the place of articulation of both voiceless and voiced stops, but the effect was greater for the former than the latter. The results are consistent with a view that two basic properties contribute to the labial-alveolar distinction in English. One of these is determined by the time course of the change in amplitude in the high-frequency range (above 2500 Hz) in the few tens of ms following consonantal release, and the other is determined by the frequencies of spectral peaks associated with the second and third formants in relation to the first formant.
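Varying relative burst amplitude amounts to applying a linear gain to just the burst samples of each stimulus. The sketch below shows that manipulation; the list-of-samples representation and the helper name are illustrative assumptions, while the 5-dB step values come from the abstract.

```python
def scale_burst(samples, burst_len, gain_db):
    """Scale the first `burst_len` samples (the release burst) by
    `gain_db` decibels, leaving the rest of the waveform unchanged."""
    g = 10.0 ** (gain_db / 20.0)  # dB -> linear amplitude factor
    return [s * g for s in samples[:burst_len]] + list(samples[burst_len:])

# The continua described above would use gain_db in 5-dB steps:
# -10, -5, 0, +5, +10 dB relative to the "normal" burst amplitude.
```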

12.
Seven listener groups, varying in terms of the nasal consonant inventory of their native language, orthographically labeled and rated a set of naturally produced non-native nasal consonants varying in place of articulation. The seven listener groups included speakers of Malayalam, Marathi, Punjabi, Tamil, Oriya, Bengali, and American English. The stimulus set included bilabial, dental, alveolar, and retroflex nasals from Malayalam, Marathi, and Oriya. The stimulus set and nasal consonant inventories of the seven listener groups were described by both phonemic and allophonic representations. The study was designed to determine the extent to which phonemic and allophonic representations of perceptual categories can be used to predict a listener group's identification of non-native sounds. The results of the experiment showed that allophonic representations were more successful in predicting the native category that listeners used to label a non-native sound in a majority of trials. However, both representations frequently failed to accurately predict the goodness of fit between a non-native sound and a perceptual category. The results demonstrate that the labeling and rating of non-native stimuli were conditioned by a degree of language-specific phonetic detail that corresponds to perceptually relevant cues to native language contrasts.

13.
This study examined the ability of six-month-old infants to recognize the perceptual similarity of syllables sharing a phonetic segment when variations were introduced in phonetic environment and talker. Infants in a "phonetic" group were visually reinforced for head turns when a change occurred from a background category of labial nasals to a comparison category of alveolar nasals. The infants were initially trained on a [ma]-[na] contrast produced by a male talker. Novel tokens differing in vowel environment and talker were introduced over several stages of increasing complexity. In the most complex stage infants were required to make a head turn when a change occurred from [ma,mi,mu] to [na,ni,nu], with the tokens in each category produced by both male and female talkers. A "nonphonetic" control group was tested using the same pool of stimuli as the phonetic condition. The only difference was that the stimuli in the background and comparison categories were chosen in such a way that the sounds could not be organized by acoustic or phonetic characteristics. Infants in the phonetic group transferred training to novel tokens produced by different talkers and in different vowel contexts. However, infants in the nonphonetic control group had difficulty learning the phonetically unrelated tokens that were introduced as the experiment progressed. These findings suggest that infants recognize the similarity of nasal consonants sharing place of articulation independent of variation in talker and vowel context.

14.
15.
Signal design in cf-bats is hypothesized to be commensurate with the evaluation of time-variant echo parameters, imposed by changes in the sound channel occurring as the bat flies by a target. Two such parameters, the proportional changes in Doppler frequency and sound pressure amplitude, are surveyed, employing a simplified acoustic model in order to assess their fitness for target localization given a translational movement within a plane. This is accomplished by considering the properties of the scalar fields given by the value of these putative sensory variables as a function of position in a plane. The considered criteria are: existence and extent of ambiguity areas (i.e., multiple solutions for target position), magnitude of the variables (relevant with regard to perceptual thresholds), as well as magnitude and orthogonality of the gradients (relevant to localization accuracy). It is concluded that these properties render the considered variables compatible with gross judgements of target position. This may be sufficient for behavioral contexts like obstacle avoidance, where adoption of suitable safety margins could compensate for the variance and bias associated with estimates of target location.

16.
This study investigated the integration of place- and temporal-pitch cues in pitch contour identification (PCI), in which cochlear implant (CI) users were asked to judge the overall pitch-change direction of stimuli. Falling and rising pitch contours were created either by continuously steering current between adjacent electrodes (place pitch), by continuously changing amplitude modulation (AM) frequency (temporal pitch), or both. The percentage of rising responses was recorded as a function of current steering or AM frequency change, with single or combined pitch cues. A significant correlation was found between subjects' sensitivity to current steering and AM frequency change. The integration of place- and temporal-pitch cues was most effective when the two cues were similarly discriminable in isolation. Adding the other (place or temporal) pitch cues shifted the temporal- or place-pitch psychometric functions horizontally without changing the slopes. PCI was significantly better with consistent place- and temporal-pitch cues than with inconsistent cues. PCI with single cues and integration of pitch cues were similar on different electrodes. The results suggest that CI users effectively integrate place- and temporal-pitch cues in relative pitch perception tasks. Current steering and AM frequency change should be coordinated to better transmit dynamic pitch information to CI users.

17.
It has been posited that the role of prosody in lexical segmentation is elevated when the speech signal is degraded or unreliable. Using predictions from Cutler and Norris' [J. Exp. Psychol. Hum. Percept. Perform. 14, 113-121 (1988)] metrical segmentation strategy hypothesis as a framework, this investigation examined how individual suprasegmental and segmental cues to syllabic stress contribute differentially to the recognition of strong and weak syllables for the purpose of lexical segmentation. Syllabic contrastivity was reduced in resynthesized phrases by systematically (i) flattening the fundamental frequency (F0) contours, (ii) equalizing vowel durations, (iii) weakening strong vowels, (iv) combining the two suprasegmental cues, i.e., F0 and duration, and (v) combining the manipulation of all cues. Results indicated that, despite similar decrements in overall intelligibility, F0 flattening and the weakening of strong vowels had a greater impact on lexical segmentation than did equalizing vowel duration. Both combined-cue conditions resulted in greater decrements in intelligibility, but with no additional negative impact on lexical segmentation. The results support the notion of F0 variation and vowel quality as primary conduits for stress-based segmentation and suggest that the effectiveness of stress-based segmentation with degraded speech must be investigated relative to the suprasegmental and segmental impoverishments occasioned by each particular degradation.
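The F0-flattening manipulation (i) replaces a phrase's pitch contour with a constant value before resynthesis. A minimal sketch of that step is below; the frame-by-frame F0 track representation (0 marking unvoiced frames) and the use of the mean voiced F0 as the flat target are assumptions for illustration.

```python
def flatten_f0(f0_track):
    """Flatten an F0 contour: set every voiced frame (f0 > 0) to the
    mean voiced F0, removing pitch movement as a cue to stress while
    keeping the voiced/unvoiced pattern. Unvoiced frames stay at 0."""
    voiced = [f for f in f0_track if f > 0]
    mean_f0 = sum(voiced) / len(voiced)
    return [mean_f0 if f > 0 else 0.0 for f in f0_track]
```

In a resynthesis pipeline (e.g., PSOLA-style) the flattened track would then drive the pitch modification of the original recording.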

18.
The present study systematically manipulated three acoustic cues--fundamental frequency (f0), amplitude envelope, and duration--to investigate their contributions to tonal contrasts in Mandarin. Simplified stimuli with all possible combinations of these three cues were presented for identification to eight normal-hearing listeners, all native speakers of Mandarin from Taiwan. The f0 information was conveyed either by an f0-controlled sawtooth carrier or a modulated noise so as to compare the performance achievable by a clear indication of voice f0 and what is possible with purely temporal coding of f0. Tone recognition performance with explicit f0 was much better than that with any combination of other acoustic cues (consistently greater than 90% correct compared to 33%-65%; chance is 25%). In the absence of explicit f0, the temporal coding of f0 and amplitude envelope both contributed somewhat to tone recognition, while duration had only a marginal effect. Performance based on these secondary cues varied greatly across listeners. These results explain the relatively poor perception of tone in cochlear implant users, given that cochlear implants currently provide only weak cues to f0, so that users must rely upon the purely temporal (and secondary) features for the perception of tone.
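An f0-controlled sawtooth carrier of the kind described can be generated by simple phase accumulation, with the instantaneous frequency tracking the tone's f0 contour. The sketch below is a generic illustration, not the authors' synthesis code; the one-f0-value-per-sample input format is an assumption.

```python
def sawtooth_from_f0(f0_contour, fs):
    """Generate a sawtooth carrier whose instantaneous frequency follows
    f0_contour (one f0 value in Hz per output sample, sample rate fs)."""
    phase, out = 0.0, []
    for f0 in f0_contour:
        phase = (phase + f0 / fs) % 1.0  # accumulate fractional cycles
        out.append(2.0 * phase - 1.0)    # map phase [0,1) to [-1,1)
    return out
```

A rising Mandarin tone 2 contour, for instance, would be a contour that sweeps upward over the syllable's duration.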

19.
The effect of the filter bank on fundamental frequency (F0) discrimination was examined in four Nucleus CI24 cochlear implant subjects for synthetic stylized vowel-like stimuli. The four tested filter banks differed in cutoff frequencies, amount of overlap between filters, and shape of the filters. To assess the effects of temporal pitch cues on F0 discrimination, temporal fluctuations were removed above 10 Hz in one condition and above 200 Hz in another. Results indicate that F0 discrimination based upon place pitch cues is possible, but just-noticeable differences exceed 1 octave or more depending on the filter bank used. Increasing the frequency resolution in the F0 range improves the F0 discrimination based upon place pitch cues. The results of F0 discrimination based upon place pitch agree with a model that compares the centroids of the electrical excitation pattern. The addition of temporal fluctuations up to 200 Hz significantly improves F0 discrimination. Just-noticeable differences using both place and temporal pitch cues range from 6% to 60%. Filter banks that do not resolve the higher harmonics provided the best temporal pitch cues, because temporal pitch cues are clearest when the fluctuation on all channels is at F0 and preferably in phase.
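The place-pitch model mentioned above compares centroids of the electrical excitation pattern, which reduces to a weighted mean over electrode positions. The sketch below illustrates that computation; the per-electrode-magnitude representation is an assumption, and the comparison rule is a simplification of whatever decision noise the actual model includes.

```python
def excitation_centroid(pattern):
    """Centroid (magnitude-weighted mean electrode index) of an
    excitation pattern given as per-electrode magnitudes. Under the
    model described above, a shift of this centroid toward more basal
    electrodes predicts a higher place pitch."""
    total = sum(pattern)
    return sum(i * e for i, e in enumerate(pattern)) / total
```

Two F0s would then be predicted discriminable via place pitch only when their patterns' centroids differ by more than some internal resolution limit.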

20.
Frequency modulation detection limens (FMDLs) were measured for five hearing-impaired (HI) subjects for carrier frequencies f(c) = 1000, 4000, and 6000 Hz, using modulation frequencies f(m) = 2 and 10 Hz and levels of 20 dB sensation level and 90 dB SPL. FMDLs were smaller for f(m) = 10 than for f(m) = 2 Hz for the two higher f(c), but not for f(c) = 1000 Hz. FMDLs were also determined with additional random amplitude modulation (AM), to disrupt excitation-pattern cues. The disruptive effect was larger for f(m) = 10 than for f(m) = 2 Hz. The smallest disruption occurred for f(m) = 2 Hz and f(c) = 1000 Hz. AM detection thresholds for normal-hearing and HI subjects were measured for the same f(c) and f(m) values. Performance was better for the HI subjects for both f(m). AM detection was much better for f(m) = 10 than for f(m) = 2 Hz. Additional tests showed that most HI subjects could discriminate temporal fine structure (TFS) at 800 Hz. The results are consistent with the idea that, for f(m) = 2 Hz and f(c) = 1000 Hz, frequency modulation (FM) detection was partly based on the use of TFS information. For higher carrier frequencies and for all carrier frequencies with f(m) = 10 Hz, FM detection was probably based on place cues.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (京ICP备09084417号)