
1.

Articulatory tradeoffs reduce acoustic variability during American English /r/ production.

F H Guenther C Y Espy-Wilson S E Boyce M L Matthies M Zandipour J S Perkell 《The Journal of the Acoustical Society of America》1999,105(5):2854-2865

The American English phoneme /r/ has long been associated with large amounts of articulatory variability during production. This paper investigates the hypothesis that the articulatory variations used by a speaker to produce /r/ in different contexts exhibit systematic tradeoffs, or articulatory trading relations, that act to maintain a relatively stable acoustic signal despite the large variations in vocal tract shape. Acoustic and articulatory recordings were collected from seven speakers producing /r/ in five phonetic contexts. For every speaker, the different articulator configurations used to produce /r/ in the different phonetic contexts showed systematic tradeoffs, as evidenced by significant correlations between the positions of transducers mounted on the tongue. Analysis of acoustic and articulatory variabilities revealed that these tradeoffs act to reduce acoustic variability, thus allowing relatively large contextual variations in vocal tract shape for /r/ without seriously degrading the primary acoustic cue. Furthermore, some subjects appeared to use completely different articulatory gestures to produce /r/ in different phonetic contexts. When viewed in light of current models of speech movement control, these results appear to favor models that utilize an acoustic or auditory target for each phoneme over models that utilize a vocal tract shape target for each phoneme.
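The abstract above cites correlations between transducer positions as evidence of trading relations. A minimal sketch of such a correlation follows; the position values and variable names are hypothetical illustrations, not data or analysis code from the paper, and a plain Pearson coefficient is assumed because the abstract does not name the statistic used.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical vertical positions (mm) of two tongue transducers,
# one value per phonetic context (five contexts, as in the study design).
tongue_tip_y = [10.0, 12.0, 14.0, 16.0, 18.0]
tongue_back_y = [9.0, 8.0, 7.0, 6.0, 5.0]

r = pearson_r(tongue_tip_y, tongue_back_y)
print(f"r = {r:.2f}")
```

A strong correlation across contexts only suggests a systematic tradeoff; whether it stabilizes the acoustic output depends on how each position affects the formants, which this sketch does not model.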

2.

Generation of the vocal tract spectrum from the underlying articulatory mechanism

Kaburagi T Kim J 《The Journal of the Acoustical Society of America》2007,121(1):456-468

A method for synthesizing vocal-tract spectra from phoneme sequences by mimicking the speech production process of humans is presented. The model consists of four main processes and is particularly characterized by an adaptive formation of articulatory movements. First, our model determines the time when each phoneme is articulated. Next, it generates articulatory constraints that must be met for the production of each phoneme, and then it generates trajectories of the articulatory movements that satisfy the constraints. Finally, the time sequence of spectra is estimated from the produced articulatory trajectories. The articulatory constraint of each phoneme does not change with the phonemic context, but the contextual variability of speech is reproduced because of the dynamic articulatory model. The accuracy of the synthesis model was evaluated using data collected by the simultaneous measurement of speech and articulatory movements. The accuracy of the phonemic timing estimates was measured, and the synthesized results were compared with the measured results. Experimental results showed that the model captured the contextual variability of both the articulatory movements and speech acoustics.

3.

A model of acoustic interspeaker variability based on the concept of formant-cavity affiliation

Apostol L Perrier P Bailly G 《The Journal of the Acoustical Society of America》2004,115(1):337-351


4.

Variability in production of the vowels /i/ and /a/

J S Perkell W L Nelson 《The Journal of the Acoustical Society of America》1985,77(5):1889-1895

A hypothesis on the nature of articulatory targets for the vowels /i/ and /a/ is proposed, based on acoustic considerations and vowel articulations. The conjecture is that positioning of points on the tongue surface in a repetition experiment should be most accurate in the direction perpendicular to the vocal-tract midline, at the acoustically critical point of maximal constriction for each vowel. The hypothesis was tested by: examining x-ray microbeam data for three speakers, conducting a partial acoustical analysis, and performing a modeling study. Distributions were plotted of the midsagittal locations of three tongue points at the time of maximal excursion toward the vowel target for numbers of examples of the vowels, embedded in a variety of phonetic contexts. More variation was found along a direction parallel to the vocal tract midline than perpendicular to the midline, supporting the hypothesis. Statistics on formant values for one subject have been calculated, and pairwise regressions of displacement and formant data have been run. An articulatory synthesizer [Rubin et al., J. Acoust. Soc. Am. 70, 321-328 (1981)] has been manipulated through displacements similar to the subject's articulatory variation. Although articulatory synthesis showed systematic relationships between articulatory displacements and formant frequencies, there were no significant correlations between the subject's measured articulatory displacements and his formant data. These additional results raise questions about the methodology and point to the need for additional work for an adequate test of the hypothesis.

5.

A study of high front vowels with articulatory data and acoustic simulations

Jackson MT McGowan RS 《The Journal of the Acoustical Society of America》2012,131(4):3017-3035

The purpose of this study is to test a methodology for describing the articulation of vowels. High front vowels are a test case because some theories suggest that high front vowels have little cross-linguistic variation. Acoustic studies appear to show counterexamples to these predictions, but purely acoustic studies are difficult to interpret because of the many-to-one relation between articulation and acoustics. In this study, vocal tract dimensions, including constriction degree and position, are measured from cinéradiographic and x-ray data on high front vowels from three different languages (North American English, French, and Mandarin Chinese). Statistical comparisons find several significant articulatory differences between North American English /i/ and Mandarin Chinese and French /i/. In particular, differences in constriction degree were found, but not constriction position. Articulatory synthesis is used to model the acoustic consequences of some of the significant articulatory differences, finding that the articulatory differences may have the acoustic consequences of making the latter languages' /i/ perceptually sharper by shifting the frequencies of F2 and F3 upwards. In addition, the vowel /y/ has specific articulations that differ from those for /i/, including a wider tongue constriction, and substantially different acoustic sensitivity functions for F2 and F3.

6.

Effects of deafness on acoustic characteristics of American English tense/lax vowels in maternal speech to infants

MV Kondaurova TR Bergeson LC Dilley 《The Journal of the Acoustical Society of America》2012,132(2):1039-1049

Recent studies have demonstrated that mothers exaggerate phonetic properties of infant-directed (ID) speech. However, these studies focused on a single acoustic dimension (frequency), whereas speech sounds are composed of multiple acoustic cues. Moreover, little is known about how mothers adjust phonetic properties of speech to children with hearing loss. This study examined mothers' production of frequency and duration cues to the American English tense/lax vowel contrast in speech to profoundly deaf (N = 14) and normal-hearing (N = 14) infants, and to an adult experimenter. First and second formant frequencies and vowel duration of tense (/i/, /u/) and lax (/ɪ/, /ʊ/) vowels were measured. Results demonstrated that for both infant groups mothers hyperarticulated the acoustic vowel space and increased vowel duration in ID speech relative to adult-directed speech. Mean F2 values were decreased for the /u/ vowel and increased for the /ɪ/ vowel, and vowel duration was longer for the /i/, /u/, and /ɪ/ vowels in ID speech. However, neither acoustic cue differed in speech to hearing-impaired or normal-hearing infants. These results suggest that both formant frequencies and vowel duration that differentiate American English tense/lax vowel contrasts are modified in ID speech regardless of the hearing status of the addressee.

7.

Processing speech signal using auditory-like filterbank provides least uncertainty about articulatory gestures

Ghosh PK Goldstein LM Narayanan SS 《The Journal of the Acoustical Society of America》2011,129(6):4014-4022

Understanding how the human speech production system is related to the human auditory system has been a perennial subject of inquiry. To investigate the production-perception link, in this paper, a computational analysis has been performed using the articulatory movement data obtained during speech production with concurrently recorded acoustic speech signals from multiple subjects in three different languages: English, Cantonese, and Georgian. The form of articulatory gestures during speech production varies across languages, and this variation is considered to be reflected in the articulatory position and kinematics. The auditory processing of the acoustic speech signal is modeled by a parametric representation of the cochlear filterbank which allows for realizing various candidate filterbank structures by changing the parameter value. Using mathematical communication theory, it is found that the uncertainty about the articulatory gestures in each language is maximally reduced when the acoustic speech signal is represented using the output of a filterbank similar to the empirically established cochlear filterbank in the human auditory system. Possible interpretations of this finding are discussed.

8.

Vocal tract normalization for midsagittal articulatory recovery with analysis-by-synthesis.

R S McGowan S Cushing 《The Journal of the Acoustical Society of America》1999,106(2):1090-1105

A method is presented that accounts for differences in the acoustics of vowel production caused by human talkers' vocal-tract anatomies and postural settings. Such a method is needed by an analysis-by-synthesis procedure designed to recover midsagittal articulatory movement from speech acoustics because the procedure employs an articulatory model as an internal model. The normalization procedure involves the adjustment of parameters of the articulatory model that are not of interest for the midsagittal movement recovery procedure. These parameters are adjusted so that acoustic signals produced by the human and the articulatory model match as closely as possible over an initial set of pairs of corresponding human and model midsagittal shapes. Further, these initial midsagittal shape correspondences need to be generalized so that all midsagittal shapes of the human can be obtained from midsagittal shapes of the model. Once these procedures are complete, the midsagittal articulatory movement recovery algorithm can be used to derive model articulatory trajectories that, subsequently, can be transformed into human articulatory trajectories. In this paper the proposed normalization procedure is outlined and the results of experiments with data from two talkers contained in the X-ray Microbeam Speech Production Database are presented. It was found to be possible to characterize these vocal tracts during vowel production with the proposed procedure and to generalize the initial midsagittal correspondences over a set of vowels to other vowels. The procedure was also found to aid in midsagittal articulatory movement recovery from speech acoustics in a vowel-to-vowel production for the two subjects.

9.

Temporal characteristics of nasalization in children and adult speakers of American English and Korean during production of three vowel contexts

Ha S Kuehn D 《The Journal of the Acoustical Society of America》2006,120(3):1622-1630

The purpose of this study was to identify and compare the temporal characteristics of nasalization in relation to (1) languages, (2) vowel contexts, and (3) age groups. Two distinct acoustic energies from the mouth and nose were recorded during speech production (/pamap, pimip, pumup/) using two microphones to obtain the absolute and proportional measurements on the acoustic temporal characteristics of nasalization. Twenty-eight normal adults (14 American English and 14 Korean speakers) and 28 normal children (14 American English and 14 Korean speakers) participated in this study. In both languages, adults showed shorter duration of nasalization than children within all three vowel contexts. The high vowel context revealed longer duration of nasalization than the low vowel context in both languages. There was no significant difference of temporal characteristics of nasalization between American English and Korean. Nasalization showed different timing characteristics between children and adults across vowel contexts. The results are discussed in association with developmental coarticulation and the relationship between acoustic consequences of articulatory events and vowel height.

10.

Acoustic modeling of American English /r/

Espy-Wilson CY Boyce SE Jackson M Narayanan S Alwan A 《The Journal of the Acoustical Society of America》2000,108(1):343-356

Recent advances in physiological data collection methods have made it possible to test the accuracy of predictions against speaker-specific vocal tracts and acoustic patterns. Vocal tract dimensions for /r/ derived via magnetic-resonance imaging (MRI) for two speakers of American English [Alwan, Narayanan, and Haker, J. Acoust. Soc. Am. 101, 1078-1089 (1997)] were used to construct models of the acoustics of /r/. Because previous models have not sufficiently accounted for the very low F3 characteristic of /r/, the aim was to match formant frequencies predicted by the models to the full range of formant frequency values produced by the speakers in recordings of real words containing /r/. In one set of experiments, area functions derived from MRI data were used to argue that the Perturbation Theory of tube acoustics cannot adequately account for /r/, primarily because predicted locations did not match speakers' actual constriction locations. Different models of the acoustics of /r/ were tested using the Maeda computer simulation program [Maeda, Speech Commun. 1, 199-299 (1982)]; the supralingual vocal-tract dimensions reported in Alwan et al. were found to be adequate at predicting only the highest of attested F3 values. By using (1) a recently developed adaptation of the Maeda model that incorporates the sublingual space as a side branch from the front cavity, and by including (2) the sublingual space as an increment to the dimensions of the front cavity, the mid-to-low values of the speakers' F3 range were matched. Finally, a simple tube model with dimensions derived from MRI data was developed to account for cavity affiliations. This confirmed F3 as a front cavity resonance, and variations in F1, F2, and F4 as arising from mid- and back-cavity geometries. Possible trading relations for F3 lowering based on different acoustic mechanisms for extending the front cavity are also proposed.

11.

A magnetic resonance imaging-based articulatory and acoustic study of "retroflex" and "bunched" American English /r/

Zhou X Espy-Wilson CY Boyce S Tiede M Holland C Choe A 《The Journal of the Acoustical Society of America》2008,123(6):4466-4481

Speakers of rhotic dialects of North American English show a range of different tongue configurations for /r/. These variants produce acoustic profiles that are indistinguishable for the first three formants [Delattre, P., and Freeman, D. C., (1968). "A dialect study of American English r's by x-ray motion picture," Linguistics 44, 28-69; Westbury, J. R. et al. (1998), "Differences among speakers in lingual articulation for American English /r/," Speech Commun. 26, 203-206]. It is puzzling why this should be so, given the very different vocal tract configurations involved. In this paper, two subjects whose productions of "retroflex" /r/ and "bunched" /r/ show similar patterns of F1-F3 but very different spacing between F4 and F5 are contrasted. Using finite element analysis and area functions based on magnetic resonance images of the vocal tract for sustained productions, the results of computer vocal tract models are compared to actual speech recordings. In particular, formant-cavity affiliations are explored using formant sensitivity functions and vocal tract simple-tube models. The difference in F4/F5 patterns between the subjects is confirmed for several additional subjects with retroflex and bunched vocal tract configurations. The results suggest that the F4/F5 differences between the variants can be largely explained by differences in whether the long cavity behind the palatal constriction acts as a half- or a quarter-wavelength resonator.
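To illustrate the closing point of the abstract above, the following sketch compares the resonances of an idealized uniform tube acting as a quarter-wavelength resonator (closed at one end, open at the other) and as a half-wavelength resonator (same boundary condition at both ends). These are the textbook lossless-tube formulas, not the paper's finite element model; the speed of sound and the 10 cm cavity length are assumed values chosen only for illustration.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed speed of sound in warm, moist air


def quarter_wave_resonances(length_cm, n_modes=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    f_k = (2k - 1) * c / (4 * L), for k = 1, 2, ...
    """
    return [(2 * k - 1) * c / (4.0 * length_cm) for k in range(1, n_modes + 1)]


def half_wave_resonances(length_cm, n_modes=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube with matching ends (both closed or both open).

    f_k = k * c / (2 * L), for k = 1, 2, ...
    """
    return [k * c / (2.0 * length_cm) for k in range(1, n_modes + 1)]


back_cavity_cm = 10.0  # hypothetical cavity length, not taken from the paper

print(quarter_wave_resonances(back_cavity_cm))  # [875.0, 2625.0, 4375.0]
print(half_wave_resonances(back_cavity_cm))     # [1750.0, 3500.0, 5250.0]
```

For the same cavity length the two boundary conditions place the higher resonances at different frequencies and spacings, which is the kind of difference the abstract invokes to explain the F4/F5 patterns.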

12.

Effects of stress and final-consonant voicing on vowel production: articulatory and acoustic analyses

W V Summers 《The Journal of the Acoustical Society of America》1987,82(3):847-863

Durations of the vocalic portions of speech are influenced by a large number of linguistic and nonlinguistic factors (e.g., stress and speaking rate). However, each factor affecting vowel duration may influence articulation in a unique manner. The present study examined the effects of stress and final-consonant voicing on the detailed structure of articulatory and acoustic patterns in consonant-vowel-consonant (CVC) utterances. Jaw movement trajectories and F1 trajectories were examined for a corpus of utterances differing in stress and final-consonant voicing. Jaw lowering and raising gestures were more rapid, longer in duration, and spatially more extensive for stressed versus unstressed utterances. At the acoustic level, stressed utterances showed more rapid initial F1 transitions and more extreme F1 steady-state frequencies than unstressed utterances. In contrast to the results obtained in the analysis of stress, decreases in vowel duration due to devoicing did not result in a reduction in the velocity or spatial extent of the articulatory gestures. Similarly, at the acoustic level, the reductions in formant transition slopes and steady-state frequencies demonstrated by the shorter, unstressed utterances did not occur for the shorter, voiceless utterances. The results demonstrate that stress-related and voicing-related changes in vowel duration are accomplished by separate and distinct changes in speech production with observable consequences at both the articulatory and acoustic levels.

13.

Evolving theories of vowel perception

W Strange 《The Journal of the Acoustical Society of America》1989,85(5):2081-2087

Research on the perception of vowels in the last several years has given rise to new conceptions of vowels as articulatory, acoustic, and perceptual events. Starting from a "simple" target model in which vowels were characterized articulatorily as static vocal tract shapes and acoustically as points in a first and second formant (F1/F2) vowel space, this paper briefly traces the evolution of vowel theory in the 1970s and 1980s in two directions. (1) Elaborated target models represent vowels as target zones in perceptual spaces whose dimensions are specified as formant ratios. These models have been developed primarily to account for perceivers' solution of the "speaker normalization" problem. (2) Dynamic specification models emphasize the importance of formant trajectory patterns in specifying vowel identity. These models deal primarily with the problem of "target undershoot" associated with the coarticulation of vowels with consonants in natural speech and with the issue of "vowel-inherent spectral change" or diphthongization of English vowels. Perceptual studies are summarized that motivate these theoretical developments.

14.

Effects of fundamental frequency and vocal-tract length changes on attention to one of two simultaneous talkers

Darwin CJ Brungart DS Simpson BD 《The Journal of the Acoustical Society of America》2003,114(5):2913-2922

Three experiments used the Coordinated Response Measure task to examine the roles that differences in F0 and differences in vocal-tract length have on the ability to attend to one of two simultaneous speech signals. The first experiment asked how increases in the natural F0 difference between two sentences (originally spoken by the same talker) affected listeners' ability to attend to one of the sentences. The second experiment used differences in vocal-tract length, and the third used both F0 and vocal-tract length differences. Differences in F0 greater than 2 semitones produced systematic improvements in performance. Differences in vocal-tract length produced systematic improvements in performance when the ratio of lengths was 1.08 or greater, particularly when the shorter vocal tract belonged to the target talker. Neither of these manipulations produced improvements in performance as great as those produced by a different-sex talker. Systematic changes in both F0 and vocal-tract length that simulated an incremental shift in gender produced substantially larger improvements in performance than did differences in F0 or vocal-tract length alone. In general, shifting one of two utterances spoken by a female voice towards a male voice produces a greater improvement in performance than shifting male towards female. The increase in performance varied with the intonation patterns of individual talkers, being smallest for those talkers who showed most variability in their intonation patterns between different utterances.
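The two thresholds quoted in the abstract above (2 semitones of F0 difference, a vocal-tract length ratio of 1.08) can be put into concrete terms with a short sketch. The semitone conversion is the standard equal-tempered definition. The formant scaling assumes an idealized uniform tube in which formant frequencies are inversely proportional to vocal-tract length; that is an illustrative assumption and not necessarily how the stimuli in the study were produced. All function names and example values are hypothetical.

```python
import math


def semitone_difference(f0_a_hz, f0_b_hz):
    """Signed distance from f0_a to f0_b in semitones (12 per octave)."""
    return 12.0 * math.log2(f0_b_hz / f0_a_hz)


def f0_ratio(semitones):
    """Frequency ratio corresponding to a given number of semitones."""
    return 2.0 ** (semitones / 12.0)


def scale_formants(formants_hz, length_ratio):
    """Formants after lengthening the tract by length_ratio (uniform-tube assumption)."""
    return [f / length_ratio for f in formants_hz]


# A 2-semitone difference corresponds to roughly a 12% change in F0.
print(f"{f0_ratio(2.0):.4f}")

# One octave is 12 semitones.
print(semitone_difference(100.0, 200.0))  # 12.0

# Hypothetical formants of a talker, shifted by a length ratio of 1.08.
print([round(f) for f in scale_formants([500.0, 1500.0, 2500.0], 1.08)])
```

Under these assumptions a length ratio of 1.08 lowers every formant by about 7%, a smaller proportional change than the 12% F0 change that corresponds to 2 semitones.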

15.

A vocal-tract model of American English /l/

Zhang Z Espy-Wilson CY 《The Journal of the Acoustical Society of America》2004,115(3):1274-1280

The production of the lateral sounds involves airflow paths around the tongue produced by the laterally inward movement of the tongue toward the midsagittal plane. If contact is made with the palate, a closure is formed in the flow path along the midsagittal line. The effects of the lateral channels on the sound spectrum are not clear. In this study, a vocal-tract model with parallel lateral channels and a supralingual cavity was developed. Analysis shows that the lateral channels with dimensions derived from magnetic resonance images of an American English /l/ are able to produce a pole-zero pair in the frequency range of 2-5 kHz. This pole-zero pair, together with an additional pole-zero pair due to the supralingual cavity, results in a low-amplitude and relatively flat spectral shape in the F3-F5 frequency region of the /l/ sound spectrum.

16.

Effects of short-term auditory deprivation on speech production in adult cochlear implant users.

M A Svirsky H Lane J S Perkell J Wozniak 《The Journal of the Acoustical Society of America》1992,92(3):1284-1300

Speech production parameters of three postlingually deafened adults who use cochlear implants were measured: after 24 h of auditory deprivation (which was achieved by turning the subject's speech processor off); after turning the speech processor back on; and after turning the speech processor off again. The measured parameters included vowel acoustics [F1, F2, F0, sound-pressure level (SPL), duration and H1-H2, the amplitude difference between the first two spectral harmonics, a correlate of breathiness] while reading word lists, and average airflow during the reading of passages. Changes in speech processor state (on-to-off or vice versa) were accompanied by numerous changes in speech production parameters. Many changes were in the direction of normalcy, and most were consistent with long-term speech production changes in the same subjects following activation of the processors of their cochlear implants [Perkell et al., J. Acoust. Soc. Am. 91, 2961-2978 (1992)]. Changes in mean airflow were always accompanied by H1-H2 (breathiness) changes in the same direction, probably due to underlying changes in laryngeal posture. Some parameters (different combinations of SPL, F0, H1-H2 and formants for different subjects) showed very rapid changes when turning the speech processor on or off. Parameter changes were faster and more pronounced, however, when the speech processor was turned on than when it was turned off. The picture that emerges from the present study is consistent with a dual role for auditory feedback in speech production: long-term calibration of articulatory parameters as well as feedback mechanisms with relatively short time constants.

17.

An EMA study of VCV coarticulatory direction

Recasens D 《The Journal of the Acoustical Society of America》2002,111(6):2828-2841

This study addresses three issues that are relevant to coarticulation theory in speech production: whether the degree of articulatory constraint model (DAC model) accounts for patterns of the directionality of tongue dorsum coarticulatory influences; the extent to which those patterns in tongue dorsum coarticulatory direction are similar to those for the tongue tip; and whether speech motor control and phonemic planning use a fixed or a context-dependent temporal window. Tongue dorsum and tongue tip movement data on vowel-to-vowel coarticulation are reported for Catalan VCV sequences with vowels /i/, /a/, and /u/, and consonants /p/, /n/, dark /l/, /s/, /S/, alveolopalatal /n/ and /k/. Electromidsagittal articulometry recordings were carried out for three speakers using the Carstens articulograph. Trajectory data are presented for the vertical dimension for the tongue dorsum, and for the horizontal dimension for tongue dorsum and tip. In agreement with predictions of the DAC model, results show that directionality patterns of tongue dorsum coarticulation can be accounted for to a large extent based on the articulatory requirements on consonantal production. While dorsals exhibit analogous trends in coarticulatory direction for all articulators and articulatory dimensions, this is mostly so for the tongue dorsum and tip along the horizontal dimension in the case of lingual fricatives and apicolaminal consonants. This finding results from different articulatory strategies: while dorsal consonants are implemented through homogeneous tongue body activation, the tongue tip and tongue dorsum act more independently for more anterior consonantal productions. Discontinuous coarticulatory effects reported in the present investigation suggest that phonemic planning is adaptative rather than context independent.

18.

Influences of pellet markers on speech production behavior: acoustical and perceptual measures.

G Weismer K Bunton 《The Journal of the Acoustical Society of America》1999,105(5):2882-2894

Peri- and intraoral devices are often used to obtain measurements concerning articulator motions and placements. Surprisingly, there are few formal evaluations of the potential influence of these devices on speech production behavior. In particular, the potential effects of lingual pellets or coils used in x-ray or electromagnetic studies of tongue motion have never been evaluated formally, even though a large x-ray database exists and electromagnetic systems are commercially available. The x-ray microbeam database [Westbury, J. "X-ray Microbeam Speech Production Database User's Handbook, version 1" (1994)] includes several utterances produced with pellets-off and -on, which allowed us to evaluate effects of pellets for the utterance, She had your dark suit in greasy wash water all year, using acoustic and perceptual measures. Overall, there were no acoustic or perceptual measures that showed consistent effects of pellets across speakers, but certain effects were consistent either within a given speaker or in direction across a subgroup of the speakers. The results are discussed in terms of the general goodness of the assumption that point parameterization of lingual motion does not interfere with normal articulatory behaviors. A brief screening procedure is suggested to protect articulatory kinematic experiments from those individuals who may show consistent effects of having devices placed on perioral structures.

19.

The relation of lung volume initiation to selected acoustic properties of speech

Watson PJ Ciccia AH Weismer G 《The Journal of the Acoustical Society of America》2003,113(5):2812-2819

This study examined the relationship of speech breathing to other elements of speech production. It was hypothesized that initiating speech from different lung volumes would have an effect on different elements of the acoustic output. It was postulated that effects may be brought about by mechanical interaction as well as a dispersion of effort to mechanically unlinked elements of speech production, such as articulatory behavior. To this end, selected acoustic variables were studied in eight young healthy women who initiated speech from low, typical, and high lung volume levels. The acoustic variables studied were selected because they have been shown to be sensitive indicators of speech production performance. It was found that with increasing lung volume initiation levels, average sound pressure level, average fundamental frequency, and declination rate of fundamental frequency increased. It was also observed that vowel space was significantly smaller during low lung volume initiation levels relative to typical lung volume initiation levels. Vowel space reduction is discussed relative to "gaining down."

20.

Effects of prosodic boundary on /aC/ sequences: acoustic results

Tabain M 《The Journal of the Acoustical Society of America》2003,113(1):516-531

This study presents various acoustic measures used to examine the sequence /a # C/, where "#" represents different prosodic boundaries in French. The 6 consonants studied are /b d g f s S/ (3 stops and 3 fricatives). The prosodic units investigated are the utterance, the intonational phrase, the accentual phrase, and the word. It is found that vowel target values, formant transitions into the stop consonant, and the rate of change in spectral tilt into the fricative, are affected by the strength of the prosodic boundary. F1 becomes higher for /a/ the stronger the prosodic boundary, with the exception of one speaker's utterance data, which show the effects of articulatory declension at the utterance level. Various effects of the stop consonant context are observed, the most notable being a tendency for the vowel /a/ to be displaced in the direction of the F2 consonant "locus" for /d/ (the F2 consonant values for which remain relatively stable across prosodic boundaries) and for /g/ (the F2 consonant values for which are displaced in the direction of the velar locus in weaker prosodic boundaries, together with those of the vowel). Velocity of formant transition may be affected by prosodic boundary (with greater velocity at weaker boundaries), though results are not consistent across speakers. There is also a tendency for the rate of change in spectral tilt moving from the vowel to the fricative to be affected by the presence of a prosodic boundary, with a greater rate of change at the weaker prosodic boundaries. It is suggested that spectral cues, in addition to duration, amplitude, and F0 cues, may alert listeners to the presence of a prosodic boundary.