期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A modeling investigation of articulatory variability and acoustic stability during American English /r/ production

Nieto-Castanon A Guenther FH Perkell JS Curtin HD 《The Journal of the Acoustical Society of America》2005,117(5):3196-3212

This paper investigates the functional relationship between articulatory variability and stability of acoustic cues during American English /r/ production. The analysis of articulatory movement data on seven subjects shows that the extent of intrasubject articulatory variability along any given articulatory direction is strongly and inversely related to a measure of acoustic stability (the extent of acoustic variation that displacing the articulators in this direction would produce). The presence and direction of this relationship is consistent with a speech motor control mechanism that uses a third formant frequency (F3) target; i.e., the final articulatory variability is lower for those articulatory directions most relevant to determining the F3 value. In contrast, no consistent relationship across speakers and phonetic contexts was found between hypothesized vocal-tract target variables and articulatory variability. Furthermore, simulations of two speakers' productions using the DIVA model of speech production, in conjunction with a novel speaker-specific vocal-tract model derived from magnetic resonance imaging data, mimic the observed range of articulatory gestures for each subject, while exhibiting the same articulatory/acoustic relations as those observed experimentally. Overall these results provide evidence for a common control scheme that utilizes an acoustic, rather than articulatory, target specification for American English /r/. 相似文献

2.

Articulatory and acoustic adaptation to palatal perturbation

Thibeault M Ménard L Baum SR Richard G McFarland DH 《The Journal of the Acoustical Society of America》2011,129(4):2112-2120

Previous work has established that speakers have difficulty making rapid compensatory adjustments in consonant production (especially in fricatives) for structural perturbations of the vocal tract induced by artificial palates with thicker-than-normal alveolar regions. The present study used electromagnetic articulography and simultaneous acoustic recordings to estimate tongue configurations during production of [s s? t k] in the presence of a thin and a thick palate, before and after a practice period. Ten native speakers of English participated in the study. In keeping with previous acoustic studies, fricatives were more affected by the palate than were the stops. The thick palate lowered the center of gravity and the jaw was lower and the tongue moved further backwards and downwards. Center of gravity measures revealed complete adaptation after training, and with practice, subjects' decreased interlabial distance. The fact that adaptation effects were found for [k], which are produced with an articulatory gesture not directly impeded by the palatal perturbation, suggests a more global sensorimotor recalibration that extends beyond the specific articulatory target. 相似文献

3.

Vowel posture normalization.

M Hashi J R Westbury K Honda 《The Journal of the Acoustical Society of America》1998,104(4):2426-2437

相似文献

4.

Articulatory-acoustic kinematics: the production of American English /s/

Iskarous K Shadle CH Proctor MI 《The Journal of the Acoustical Society of America》2011,129(2):944-954

Due to its aerodynamic, articulatory, and acoustic complexities, the fricative /s/ is known to require high precision in its control, and to be highly resistant to coarticulation. This study documents in detail how jaw, tongue front, tongue back, lips, and the first spectral moment covary during the production of /s/, to establish how coarticulation affects this segment. Data were obtained from 24 speakers in the Wisconsin x-ray microbeam database producing /s/ in prevocalic and pre-obstruent sequences. Analysis of the data showed that certain aspects of jaw and tongue motion had specific kinematic trajectories, regardless of context, and the first spectral moment trajectory corresponded to these in some aspects. In particular contexts, variability due to jaw motion is compensated for by tongue-tip motion and bracing against the palate, to maintain an invariant articulatory-aerodynamic goal, constriction degree. The change in the first spectral moment, which rises to a peak at the midpoint of the fricative, primarily reflects the motion of the jaw. Implications of the results for theories of speech motor control and acoustic-articulatory relations are discussed. 相似文献

5.

Acoustic analyses and perceptual data on anticipatory labial coarticulation in adults and children

J A Sereno S R Baum G C Marean P Lieberman 《The Journal of the Acoustical Society of America》1987,81(2):512-519

The present study investigated anticipatory labial coarticulation in the speech of adults and children. CV syllables, composed of [s], [t], and [d] before [i] and [u], were produced by four adult speakers and eight child speakers aged 3-7 years. Each stimulus was computer edited to include only the aperiodic portion of fricative-vowel and stop-vowel syllables. LPC spectra were then computed for each excised segment. Analyses of the effect of the following vowel on the spectral peak associated with the second formant frequency and on the characteristic spectral prominence for each consonant were performed. Perceptual data were obtained by presenting the aperiodic consonantal segments to subjects who were instructed to identify the following vowel as [i] or [u]. Both the acoustic and the perceptual data show strong coarticulatory effects for the adults and comparable, although less consistent, coarticulation in the speech stimuli of the children. The results are discussed in terms of the articulatory and perceptual aspects of coarticulation in language learning. 相似文献

6.

Vocal Tract Resonance Analysis of Aging Voice Using Long-Term Average Spectra

Sue Ellen Linville Jennifer Rens 《Journal of voice》2001,15(3):323-330

This study is the first to use long-term average spectra (LTAS) to investigate resonance characteristics of dynamic speech in young adulthood and old age. A total of 80 speakers participated, divided equally by age group and gender. All elderly speakers were healthy, active members of the community. Measurement of the first three spectral peaks in LTAS from the first paragraph of the Rainbow Passage revealed significant lowering of peak 1 from young adulthood to old age in both men and women. Peaks 2 and 3 also lowered significantly across the adult lifespan in women and showed a tendency to lower in men. These acoustic findings are consistent with anatomic data suggesting that aging results in lengthening of the supraglottic vocal tract. Findings that women demonstrate more substantial lowering of spectral peaks with aging than men suggest that women may undergo more pronounced age-related lengthening of the supraglottic vocal tract. Alternatively, it is possible that elderly men systematically alter tongue position during vowel articulation while elderly women are less inclined to do so. Taken in conjunction with previous research, these findings suggest a "mixed model" of vocal tract resonance changes with aging in which an interaction exists between gender, the resonance effects of laryngeal lowering, and vowel articulatory patterns. 相似文献

7.

The emergence of mature gestural patterns in the production of voiceless and voiced word-final stops

Nittrouer S Estee S Lowenstein JH Smith J 《The Journal of the Acoustical Society of America》2005,117(1):351-364

The organization of gestures was examined in children's and adults' samples of consonant-vowel-stop words differing in stop voicing. Children (5 and 7 years old) and adults produced words from five voiceless/voiced pairs, five times each in isolation and in sentences. Acoustic measurements were made of vocalic duration, and of the first and second formants at syllable center and voicing offset. The predicted acoustic correlates of syllable-final voicing were observed across speakers: vocalic segments were shorter and first formants were higher in words with voiceless, rather than voiced, final stops. In addition, the second formant was found to differ depending on the voicing of the final stop for all speakers. It was concluded that by 5 years of age children produce words ending in stops with the same overall gestural organization as adults. However, some age-related differences were observed for jaw gestures, and variability for all measures was greater for children than for adults. These results suggest that children are still refining their organization of articulatory gestures past the age of 7 years. Finally, context effects (isolation or sentence) showed that the acoustic correlates of syllable-final voicing are attenuated when words are produced in sentences, rather than in isolation. 相似文献

8.

Acoustic and Perceptual Appraisal of Vocal Gestures in the Female Classical Voice

Dianna T. Kenny Helen F. Mitchell 《Journal of voice》2006,20(1):55-70

Long-term average spectra (LTAS) have identified features in the sounds of singers and have compared different vocal qualities based on energy changes that occur during different vocal tasks. In this study, we compared the perceptual ratings of vocal quality of expert pedagogues with acoustic measures performed on LTAS. Fifteen expert judges rated 24 samples with six repeats of six advanced singing students under two conditions: "optimal" (O), which represented the application of the maximal open throat technique; and "suboptimal" (SO), which represented the application of the reduced open throat technique. LTAS were performed on each singing sample, and two conventional assessments of peak energy height [singing power ratio (SPR)] and peak area [energy ratio (ER)] were calculated on each LTAS. Perceptual scores, SPR, and ER were rank ordered. We then compared perceptual rankings with rankings of acoustic measures (SPR and ER) to assess whether these acoustic measurements matched the perceptual judgments of vocal quality. Although we found the expected significant relationship between SPR and ER, there was no relationship between perceptual ratings of vocal samples or singers based on SPR or ER. These findings suggest that because LTAS measures are not consistent with perceptual ratings of vocal quality, such measurements cannot define a voice of quality. Future research with LTAS to assess vocal quality should consider alternative measures that are more sensitive to subtle differences in vocal parameters. 相似文献

9.

Articulatory dynamics of loud and normal speech 总被引：2，自引：0，他引：2

R Schulman 《The Journal of the Acoustical Society of America》1989,85(1):295-312

A comparison was made between normal and loud productions of bilabial stops and stressed vowels. Simultaneous recordings of lip and jaw movement and the accompanying audio signal were made for four native speakers of Swedish. The stimuli consisted of 12 Swedish vowels appearing in an /i'b_b/ frame and were produced with both normal and increased vocal effort. The displacement, velocity, and relative timing associated with the individual articulators as well as their coarticulatory interactions were studied together with changes in acoustic segmental duration. It is shown that the production of loud as compared with normal speech is characterized by amplification of normal movement patterns that are predictable for the above articulatory parameters. In addition, it was observed that the acoustic durations of bilabial stops were shortened, whereas stressed vowels were lengthened during loud speech production. Two interpretations of the data are offered, viewing loud articulatory behavior as a response to production demands and perceptual constraints, respectively. 相似文献

10.

Perceptuomotor bias in the imitation of steady-state vowels

Vallabha GK Tuller B 《The Journal of the Acoustical Society of America》2004,116(2):1184-1197

Previous studies suggest that speakers are systematically inaccurate, or biased, when imitating self-produced vowels. The direction of these biases in formant space and their variation may offer clues about the organization of the vowel perceptual space. To examine these patterns, three male speakers were asked to imitate 45 self-produced vowels that were systematically distributed in F1/F2 space. All three speakers showed imitation bias, and the bias magnitudes were significantly larger than those predicted by a model of articulatory noise. Each speaker showed a different pattern of bias directions, but the pattern was unrelated to the locations of prototypical vowels produced by that speaker. However, there were substantial quantitative regularities: (1) The distribution of imitation variability and bias magnitudes were similar for all speakers, (2) the imitation variability was independent of the bias magnitudes, and (3) the imitation variability (a production measure) was commensurate with the formant discrimination limen (a perceptual measure). These results indicate that there is additive Gaussian noise in the imitation process that independently affects each formant and that there are speaker-dependent and potentially nonlinguistic biases in vowel perception and production. 相似文献

11.

Effects of prosodic boundary on /aC/ sequences: acoustic results

Tabain M 《The Journal of the Acoustical Society of America》2003,113(1):516-531

This study presents various acoustic measures used to examine the sequence /a # C/, where "#" represents different prosodic boundaries in French. The 6 consonants studied are /b d g f s S/ (3 stops and 3 fricatives). The prosodic units investigated are the utterance, the intonational phrase, the accentual phrase, and the word. It is found that vowel target values, formant transitions into the stop consonant, and the rate of change in spectral tilt into the fricative, are affected by the strength of the prosodic boundary. F1 becomes higher for /a/ the stronger the prosodic boundary, with the exception of one speaker's utterance data, which show the effects of articulatory declension at the utterance level. Various effects of the stop consonant context are observed, the most notable being a tendency for the vowel /a/ to be displaced in the direction of the F2 consonant "locus" for /d/ (the F2 consonant values for which remain relatively stable across prosodic boundaries) and for /g/ (the F2 consonant values for which are displaced in the direction of the velar locus in weaker prosodic boundaries, together with those of the vowel). Velocity of formant transition may be affected by prosodic boundary (with greater velocity at weaker boundaries), though results are not consistent across speakers. There is also a tendency for the rate of change in spectral tilt moving from the vowel to the fricative to be affected by the presence of a prosodic boundary, with a greater rate of change at the weaker prosodic boundaries. It is suggested that spectral cues, in addition to duration, amplitude, and F0 cues, may alert listeners to the presence of a prosodic boundary. 相似文献

12.

Managing the distinctiveness of phonemic nasal vowels: articulatory evidence from Hindi

Shosted R Carignan C Rong P 《The Journal of the Acoustical Society of America》2012,131(1):455-465

There is increasing evidence that fine articulatory adjustments are made by speakers to reinforce and sometimes counteract the acoustic consequences of nasality. However, it is difficult to attribute the acoustic changes in nasal vowel spectra to either oral cavity configuration or to velopharyngeal opening (VPO). This paper takes the position that it is possible to disambiguate the effects of VPO and oropharyngeal configuration on the acoustic output of the vocal tract by studying the position and movement of the tongue and lips during the production of oral and nasal vowels. This paper uses simultaneously collected articulatory, acoustic, and nasal airflow data during the production of all oral and phonemically nasal vowels in Hindi (four speakers) to understand the consequences of the movements of oral articulators on the spectra of nasal vowels. For Hindi nasal vowels, the tongue body is generally lowered for back vowels, fronted for low vowels, and raised for front vowels (with respect to their oral congeners). These movements are generally supported by accompanying changes in the vowel spectra. In Hindi, the lowering of back nasal vowels may have originally served to enhance the acoustic salience of nasality, but has since engendered a nasal vowel chain shift. 相似文献

13.

Application of MRI to the analysis of speech production

T Baer J C Gore S Boyce P W Nye 《Magnetic resonance imaging》1987,5(1):1-7

Computer models of the process of speech articulation require a detailed knowledge of the vocal tract configurations employed in speech and the application of acoustic theory to calculate the sound waveform. Almost all currently available data on vocal tract dimensions come from x-ray films and are severely limited in quantity and coherence due to restrictions on radiation dosage and intersubject differences. We are using MRI techniques to obtain the pharyngeal dimensions of speakers producing sustained vowels. The fact that MRI does not employ ionizing radiation provides speech research with the opportunity to obtain comprehensive bodies of much-needed data on the articulatory characteristics of single subjects. 相似文献

14.

Relation between voice-onset time and vowel duration.

R F Port R Rotunno 《The Journal of the Acoustical Society of America》1979,66(3):654-662

As part of an investigation of the temporal implementation rules of English, measurements were made of voice-onset time for initial English stops and the duration of the following voiced vowel in monosyllabic words for New York City speakers. It was found that the VOT of a word-initial consonant was longer before a voiceless final cluster than before a single nasal, and longer before tense vowels than lax vowels. The vowels were also longer in environments where VOT was longer, but VOT did not maintain a constant ratio with the vowel duration, even for a single place of articulation. VOT was changed by a smaller proportion than the following voiced vowel in both cases. VOT changes associated with the vowel were consistent across place of articulation of the stop. In the final experiment, when vowel tensity and final consonant effects were combined, it was found that the proportion of vowel duration change that carried over to the preceding VOT is different for the two phonetic changes. These results imply that temporal implementation rules simultaneously influence several acoustic intervals including both VOT and the "inherent" interval corresponding to a segment, either by independent control of the relevant articulatory variables or by some unknown common mechanism. 相似文献

15.

Perception of the [m]-[n] distinction in consonant-vowel (CV) and vowel-consonant (VC) syllables produced by child and adult talkers

Ohde RN Haley KL Barnes CW 《The Journal of the Acoustical Society of America》2006,119(3):1697-1711

The contribution of the nasal murmur and vocalic formant transition to the perception of the [m]-[n] distinction by adult listeners was investigated for speakers of different ages in both consonant-vowel (CV) and vowel-consonant (VC) syllables. Three children in each of the speaker groups 3, 5, and 7 years old, and three adult females and three adult males produced CV and VC syllables consisting of either [m] or [n] and followed or preceded by [i ae u a], respectively. Two productions of each syllable were edited into seven murmur and transitions segments. Across speaker groups, a segment including the last 25 ms of the murmur and the first 25 ms of the vowel yielded higher perceptual identification of place of articulation than any other segment edited from the CV syllable. In contrast, the corresponding vowel+murmur segment in the VC syllable position improved nasal identification relative to other segment types for only the adult talkers. Overall, the CV syllable was perceptually more distinctive than the VC syllable, but this distinctiveness interacted with speaker group and stimulus duration. As predicted by previous studies and the current results of perceptual testing, acoustic analyses of adult syllable productions showed systematic differences between labial and alveolar places of articulation, but these differences were only marginally observed in the youngest children's speech. Also predicted by the current perceptual results, these acoustic properties differentiating place of articulation of nasal consonants were reliably different for CV syllables compared to VC syllables. A series of comparisons of perceptual data across speaker groups, segment types, and syllable shape provided strong support, in adult speakers, for the "discontinuity hypothesis" [K. N. Stevens, in Phonetic Linguistics: Essays in Honor of Peter Ladefoged, edited by V. A. Fromkin (Academic, London, 1985), pp. 243-255], according to which spectral discontinuities at acoustic boundaries provide critical cues to the perception of place of articulation. In child speakers, the perceptual support for the "discontinuity hypothesis" was weaker and the results indicative of developmental changes in speech production. 相似文献

16.

Time dependence of vocal tract modes during production of vowels and vowel sequences

Story BH 《The Journal of the Acoustical Society of America》2007,121(6):3770-3789

Vocal tract shaping patterns based on articulatory fleshpoint data from four speakers in the University of Wisconsin x-ray microbeam (XRMB) database [J. Westbury, UW-Madison, (1994)] were determined with a principal component analysis (PCA). Midsagittal cross-distance functions representative of approximately the front 6 cm of the oral cavity for each of 11 vowels and vowel-vowel (VV) sequences were obtained from the pellet positions and the hard palate profile for the four speakers. A PCA was independently performed on each speaker's set of cross-distance functions representing static vowels only, and again with time-dependent cross-distance functions representing vowels and VV sequences. In all cases, results indicated that the first two orthogonal components (referred to as modes) accounted for more than 97% of the variance in each speaker's set of cross-distance functions. In addition, the shape of each mode was shown to be similar across the speakers suggesting that the modes represent common patterns of vocal tract deformation. Plots of the resulting time-dependent coefficient records showed that the four speakers activated each mode similarly during production of the vowel sequences. Finally, a procedure was described for using the time-dependent mode coefficients obtained from the XRMB data as input for an area function model of the vocal tract. 相似文献

17.

Variability in production of the vowels /i/ and /a/

J S Perkell W L Nelson 《The Journal of the Acoustical Society of America》1985,77(5):1889-1895

A hypothesis on the nature of articulatory targets for the vowels /i/ and /a/ is proposed, based on acoustic considerations and vowel articulations. The conjecture is that positioning of points on the tongue surface in a repetition experiment should be most accurate in the direction perpendicular to the vocal-tract midline, at the acoustically critical point of maximal constriction for each vowel. The hypothesis was tested by: examining x-ray microbeam data for three speakers, conducting a partial acoustical analysis, and performing a modeling study. Distributions were plotted of the midsagittal locations of three tongue points at the time of maximal excursion toward the vowel target for numbers of examples of the vowels, embedded in a variety of phonetic contexts. More variation was found along a direction parallel to the vocal tract midline than perpendicular to the midline, supporting the hypothesis. Statistics on formant values for one subject have been calculated, and pairwise regressions of displacement and formant data have been run. An articulatory synthesizer [Rubin et al., J. Acoust. Soc. Am. 70, 321-328 (1981)] has been manipulated through displacements similar to the subject's articulatory variation. Although articulatory synthesis showed systematic relationships between articulatory relationships and formant frequencies, there were no significant correlations between the subject's measured articulatory displacements and his formant data. These additional results raise questions about the methodology and point to the need for additional work for an adequate test of the hypothesis. 相似文献

18.

A magnetic resonance imaging-based articulatory and acoustic study of "retroflex" and "bunched" American English /r/

Zhou X Espy-Wilson CY Boyce S Tiede M Holland C Choe A 《The Journal of the Acoustical Society of America》2008,123(6):4466-4481

Speakers of rhotic dialects of North American English show a range of different tongue configurations for /r/. These variants produce acoustic profiles that are indistinguishable for the first three formants [Delattre, P., and Freeman, D. C., (1968). "A dialect study of American English r's by x-ray motion picture," Linguistics 44, 28-69; Westbury, J. R. et al. (1998), "Differences among speakers in lingual articulation for American English /r/," Speech Commun. 26, 203-206]. It is puzzling why this should be so, given the very different vocal tract configurations involved. In this paper, two subjects whose productions of "retroflex" /r/ and "bunched" /r/ show similar patterns of F1-F3 but very different spacing between F4 and F5 are contrasted. Using finite element analysis and area functions based on magnetic resonance images of the vocal tract for sustained productions, the results of computer vocal tract models are compared to actual speech recordings. In particular, formant-cavity affiliations are explored using formant sensitivity functions and vocal tract simple-tube models. The difference in F4/F5 patterns between the subjects is confirmed for several additional subjects with retroflex and bunched vocal tract configurations. The results suggest that the F4/F5 differences between the variants can be largely explained by differences in whether the long cavity behind the palatal constriction acts as a half- or a quarter-wavelength resonator. 相似文献

19.

Glottal characteristics of male speakers: acoustic correlates and comparison with female data.

H M Hanson E S Chuang 《The Journal of the Acoustical Society of America》1999,106(2):1064-1077

Acoustic measurements believed to reflect glottal characteristics were made on recordings collected from 21 male speakers. The waveforms and spectra of three nonhigh vowels (/ae, lambda, epsilon/) were analyzed to obtain acoustic parameters related to first-formant bandwidth, open quotient, spectral tilt, and aspiration noise. Comparisons were made with previous results obtained for 22 female speakers [H. M. Hanson, J. Acoust. Soc. Am. 101, 466-481 (1997)]. While there is considerable overlap across gender, the male data show lower average values and less interspeaker variation for all measures. In particular, the amplitude of the first harmonic relative to that of the third formant is 9.6 dB lower for the male speakers than for the female speakers, suggesting that spectral tilt is an especially significant parameter for differentiating male and female speech. These findings are consistent with fiberscopic studies which have shown that males tend to have a more complete glottal closure, leading to less energy loss at the glottis and less spectral tilt. Observations of the speech waveforms and spectra suggest the presence of a second glottal excitation within a glottal period for some of the male speakers. Possible causes and acoustic consequences of these second excitations are discussed. 相似文献

20.

Articulatory tradeoffs reduce acoustic variability during American English /r/ production.

F H Guenther C Y Espy-Wilson S E Boyce M L Matthies M Zandipour J S Perkell 《The Journal of the Acoustical Society of America》1999,105(5):2854-2865

The American English phoneme /r/ has long been associated with large amounts of articulatory variability during production. This paper investigates the hypothesis that the articulatory variations used by a speaker to produce /r/ in different contexts exhibit systematic tradeoffs, or articulatory trading relations, that act to maintain a relatively stable acoustic signal despite the large variations in vocal tract shape. Acoustic and articulatory recordings were collected from seven speakers producing /r/ in five phonetic contexts. For every speaker, the different articulator configurations used to produce /r/ in the different phonetic contexts showed systematic tradeoffs, as evidenced by significant correlations between the positions of transducers mounted on the tongue. Analysis of acoustic and articulatory variabilities revealed that these tradeoffs act to reduce acoustic variability, thus allowing relatively large contextual variations in vocal tract shape for /r/ without seriously degrading the primary acoustic cue. Furthermore, some subjects appeared to use completely different articulatory gestures to produce /r/ in different phonetic contexts. When viewed in light of current models of speech movement control, these results appear to favor models that utilize an acoustic or auditory target for each phoneme over models that utilize a vocal tract shape target for each phoneme. 相似文献