1.

Incorporation of phonetic constraints in acoustic-to-articulatory inversion

Potard B, Laprie Y, Ouni S 《The Journal of the Acoustical Society of America》2008,123(4):2310-2323

This study investigates the use of constraints upon articulatory parameters in the context of acoustic-to-articulatory inversion. These speaker-independent constraints, referred to as phonetic constraints, were derived from standard phonetic knowledge for French vowels and express authorized domains for one or several articulatory parameters. They were tested in an existing inversion framework that utilizes Maeda's articulatory model and a hypercubic articulatory-acoustic table. Phonetic constraints give rise to a phonetic score rendering the phonetic consistency of vocal tract shapes recovered by inversion. Inversion has been applied to vowels articulated by a speaker whose corresponding x-ray images are also available. Constraints were evaluated by measuring the distance between vocal tract shapes recovered through inversion and real vocal tract shapes obtained from x-ray images, by investigating the spreading of inverse solutions in terms of place of articulation and constriction degree, and finally by studying the articulatory variability. Results show that these constraints capture interdependencies and synergies between speech articulators and favor vocal tract shapes close to those realized by the human speaker. In addition, this study shows how acoustic-to-articulatory inversion can be used to explore acoustical and compensatory articulatory properties of an articulatory model.

2.

Voice and Lifestyle Behaviors of Speech-Language Pathology Students: Impact of History Gathering Method on Self-Reported Data

Jeff Searl, Troy Dargin 《Journal of voice》2021,35(1):158.e9-158.e20

Objectives: This study described voice use and lifestyle information from student speech-language pathologists (SLP) and assessed the impact of history gathering method on the acquired data. Methods: One hundred sixty-two SLP students completed a detailed history form and estimated voice and lifestyle parameters at study intake and subsequently tracked the same parameters daily for three consecutive weeks. Nonparametric statistical comparisons were applied to assess differences in estimates at intake versus the 3-week log. Results: Voice problems diagnosed by a physician or SLP were reported by 11% of the students. A similar percentage reported frequent loud talking and heavy occupational voice demands beyond clinical training use. Furthermore, high stress was reported by 49%, frequent anxiety by 53%, and depression by 17%. Comparing data from study intake relative to the 3-week log, SLP students statistically significantly overestimated speaking time, and underestimated singing, secondhand smoke exposure time, and hours of sleep. Additionally, they overestimated water intake and daily stress, and underestimated caffeine and alcohol intake, at the study onset versus the log. The experience of vocal fatigue was common within the 3-week log, but how a student identified at study intake on this parameter (experiencing it frequently or not) did not differentiate how many days of vocal fatigue were reported in 3 weeks. Conclusions: SLP students engage in some voice use and lifestyle behaviors that place them at risk for voice problems. The method of soliciting information about the voice and lifestyle of SLP students impacted the information obtained. Optimal methods of gathering accurate and reliable clinical history and voice use data are needed.

3.

Articulatory tradeoffs reduce acoustic variability during American English /r/ production.

F H Guenther, C Y Espy-Wilson, S E Boyce, M L Matthies, M Zandipour, J S Perkell 《The Journal of the Acoustical Society of America》1999,105(5):2854-2865

The American English phoneme /r/ has long been associated with large amounts of articulatory variability during production. This paper investigates the hypothesis that the articulatory variations used by a speaker to produce /r/ in different contexts exhibit systematic tradeoffs, or articulatory trading relations, that act to maintain a relatively stable acoustic signal despite the large variations in vocal tract shape. Acoustic and articulatory recordings were collected from seven speakers producing /r/ in five phonetic contexts. For every speaker, the different articulator configurations used to produce /r/ in the different phonetic contexts showed systematic tradeoffs, as evidenced by significant correlations between the positions of transducers mounted on the tongue. Analysis of acoustic and articulatory variabilities revealed that these tradeoffs act to reduce acoustic variability, thus allowing relatively large contextual variations in vocal tract shape for /r/ without seriously degrading the primary acoustic cue. Furthermore, some subjects appeared to use completely different articulatory gestures to produce /r/ in different phonetic contexts. When viewed in light of current models of speech movement control, these results appear to favor models that utilize an acoustic or auditory target for each phoneme over models that utilize a vocal tract shape target for each phoneme.
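
The trading-relations argument in this abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two position lists are invented numbers, and the acoustic cue is modeled as the plain sum of two transducer positions. Under that toy linear model, positions that are negatively correlated across contexts leave the cue almost constant, while the same positions paired without a tradeoff do not.

```python
from math import sqrt
from statistics import mean, pvariance


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(sxx * syy)


def acoustic_cue(front, back):
    """Hypothetical linear articulatory-to-acoustic map (an assumption)."""
    return [f + b for f, b in zip(front, back)]


# Invented positions (arbitrary units) of two tongue transducers in five contexts.
front = [1.0, 2.0, 3.0, 4.0, 5.0]
back_tradeoff = [5.1, 3.9, 3.0, 2.1, 0.9]   # moves opposite to `front`
back_unrelated = [3.0, 0.9, 5.1, 3.9, 2.1]  # same values, tradeoff broken

r_tradeoff = pearson(front, back_tradeoff)
var_tradeoff = pvariance(acoustic_cue(front, back_tradeoff))
var_unrelated = pvariance(acoustic_cue(front, back_unrelated))
print(r_tradeoff, var_tradeoff, var_unrelated)
```

Both pairings use the same set of individual positions, so the articulatory spread of each transducer is identical; only the covariance between them changes the spread of the modeled cue.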

4.

On the perception of similarity among talkers

Remez RE, Fellowes JM, Nagel DS 《The Journal of the Acoustical Society of America》2007,122(6):3688-3696

A listener who recognizes a talker notices characteristic attributes of the talker's speech despite the novelty of each utterance. Accounts of talker perception have often presumed that consistent aspects of an individual's speech, termed indexical properties, are ascribable to a talker's unique anatomy or consistent vocal posture distinct from acoustic correlates of phonetic contrasts. Accordingly, the perception of a talker is acknowledged to occur independently of the perception of a linguistic message. Alternatively, some studies suggest that attention to attributes of a talker includes indexical linguistic attributes conveyed in the articulation of consonants and vowels. This investigation sought direct evidence of attention to phonetic attributes of speech in perceiving talkers. Natural samples and sine-wave replicas derived from them were used in three experiments assessing the perceptual properties of natural and sine-wave sentences; of temporally veridical and reversed natural and sine-wave sentences; and of an acoustic correlate of vocal tract scale to judgments of sine-wave talker similarity. The results revealed that the subjective similarity of individual talkers is preserved in the absence of natural vocal quality, and that local phonetic segmental attributes as well as global characteristics of speech can be exploited when listeners notice characteristics of talkers.

5.

Large scale data acquisition of simultaneous MRI and speech

《Applied Acoustics》2014

We describe an arrangement for simultaneous recording of speech and vocal tract geometry in patients undergoing surgery involving this area. Experimental design is considered from an articulatory phonetic point of view. The speech signals are recorded with an acoustic-electrical arrangement. The vocal tract is simultaneously imaged with MRI. A MATLAB-based system controls the timing of speech recording and MR image acquisition. The speech signals are cleaned of acoustic MRI noise by an adaptive signal processing algorithm. Finally, a vowel data set from pilot experiments is qualitatively compared both with validation data from the anechoic chamber and with Helmholtz resonances of the vocal tract volume, obtained using FEM.

6.

The relationship of vocal tract shape to three voice qualities

Story BH, Titze IR, Hoffman EA 《The Journal of the Acoustical Society of America》2001,109(4):1651-1667

Three-dimensional vocal tract shapes and consequent area functions representing the vowels [i, æ, a, u] have been obtained from one male and one female speaker using magnetic resonance imaging (MRI). The two speakers were trained vocal performers and both were adept at manipulation of vocal tract shape to alter voice quality. Each vowel was performed three times, each with one of the three voice qualities: normal, yawny, and twangy. The purpose of the study was to determine some ways in which the vocal tract shape can be manipulated to alter voice quality while retaining a desired phonetic quality. To summarize any overall tract shaping tendencies, mean area functions were subsequently computed across the four vowels produced within each specific voice quality. Relative to normal speech, both the vowel area functions and mean area functions showed, in general, that the oral cavity is widened and tract length increased for the yawny productions. The twangy vowels were characterized by shortened tract length, widened lip opening, and a slightly constricted oral cavity. The resulting acoustic characteristics of these articulatory alterations consisted of the first two formants (F1 and F2) being close together for all yawny vowels and far apart for all the twangy vowels.

7.

Effect of Fasting on Voice in Women

Abdul-Latif Hamdan, Abla Sibai, Charbel Rameh 《Journal of voice》2007,21(4):495-501

OBJECTIVE/HYPOTHESIS: To study the effect of fasting on voice in women: abstinence from food and water intake between 14 and 18 hours. STUDY DESIGN: A prospective study on female subjects. MATERIAL AND METHOD: A total of 28 female subjects were included in this study. Their age ranged between 21 and 45 years. Subjects with vocal symptoms or vocal fold lesions were excluded. The subjects were tested when they were not fasting and while fasting after the first week of intermittent fasting during Ramadan. Each subject was first asked about her vocal symptoms and the ease of phonation or phonatory effort. Then each underwent acoustic analysis and laryngeal video-endostroboscopy. RESULTS: Vocal fatigue was the most commonly reported complaint (53.6%) followed by deepening of the voice (21.4%) and harshness (10.2%). Self-reported phonatory effort was significantly affected by fasting (P value < 0.001). Out of the 28 subjects, 23 had an increase in their phonatory effort. Vocal acoustic parameters did not change markedly except for the maximum phonation time, which decreased significantly. Laryngeal video-endostroboscopy did not reveal any significant changes during fasting. All stroboscopic parameters were the same except for a decrease in the amplitude of the mucosal waves in one subject and the presence of a posterior chink in three subjects. CONCLUSION: Fasting affects voice. There is an increase in the phonatory effort, and vocal fatigue is the most common symptom.

8.

Rhesus macaques spontaneously perceive formants in conspecific vocalizations

Fitch WT, Fritz JB 《The Journal of the Acoustical Society of America》2006,120(4):2132-2141

We provide a direct demonstration that nonhuman primates spontaneously perceive changes in formant frequencies in their own species-typical vocalizations, without training or reinforcement. Formants are vocal tract resonances leading to distinctive spectral prominences in the vocal signal, and provide the acoustic determinant of many key phonetic distinctions in human languages. We developed algorithms for manipulating formants in rhesus macaque calls. Using the resulting computer-manipulated calls in a habituation/dishabituation paradigm, with blind video scoring, we show that rhesus macaques spontaneously respond to a change in formant frequencies within the normal macaque vocal range. Lack of dishabituation to a "synthetic replica" signal demonstrates that dishabituation was not due to an artificial quality of synthetic calls, but to the formant shift itself. These results indicate that formant perception, a significant component of human voice and speech perception, is a perceptual ability shared with other primates.

9.

Auditory contrast and speaker quality variation in vowel perception

R A Fox 《The Journal of the Acoustical Society of America》1985,77(4):1552-1559

Selective adaptation and anchoring effects in speech perception have generated several different hypotheses regarding the nature of contextual contrast, including auditory/phonetic feature detector fatigue, response bias, and auditory contrast. In the present study three different seven-step [hɪd]-[hɛd] continua were constructed to represent a low F0 (long vocal tract source), a high F0 (long vocal tract source), and a high F0 (short vocal tract source), respectively. Subjects identified the tokens from each of the stimulus continua under two conditions: an equiprobable control and an anchoring condition which included an endpoint stimulus from one of the three continua occurring at least three times more often than any other single stimulus. Differential contrast effects were found depending on whether the anchor differed from the test stimuli in terms of F0, absolute formant frequencies, or both. Results were inconsistent with both the feature detector fatigue and response bias hypotheses. Rather, the obtained data suggest that vowel contrast occurs on the basis of normalized formant values, thus supporting a version of the auditory-contrast theory.

10.

Estimation of vocal dysperiodicities in disordered connected speech by means of distant-sample bidirectional linear predictive analysis

Bettens F, Grenez F, Schoentgen J 《The Journal of the Acoustical Society of America》2005,117(1):328-337

The article presents an analysis of vocal dysperiodicities in connected speech produced by dysphonic speakers. The processing is based on a comparison of the present speech fragment with future and past fragments. The size of the dysperiodicity estimate is zero for periodic speech signals. A feeble increase of the vocal dysperiodicity is guaranteed to produce a feeble increase of the estimate. No spurious noise boosting occurs owing to cycle insertion and omission errors, or phonetic segment boundary artifacts. Additional objectives of the study were to investigate whether deviations from periodicity are larger or more commonplace in connected speech than in sustained vowels, and whether sentences that comprise frequent voice onsets and offsets are noisier than sentences that comprise few. The corpora contain sustained vowels as well as grammatically and phonetically matched sentences. An acoustic marker that correlates with the perceived degree of hoarseness summarizes the size of the dysperiodicities. The marker values for sustained vowels have been highly correlated with those for connected speech, and the marker values for sentences that comprise few voiced/unvoiced transients have been highly correlated with the marker values for sentences that comprise many.
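
The comparison of a present fragment with past and future fragments can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract gives no predictor details, so the sketch assumes a single least-squares gain per candidate lag, a hypothetical lag range given in samples, and a final estimate taken as the smallest residual energy over both directions. The function names and test signals are invented for the example.

```python
import math


def residual_energy(signal, start, length, lag):
    """Energy left after predicting a frame from the frame `lag` samples away.

    One least-squares gain scales the distant frame. A negative lag looks
    into the past, a positive lag into the future.
    """
    frame = signal[start:start + length]
    distant = signal[start + lag:start + lag + length]
    cross = sum(a * b for a, b in zip(frame, distant))
    power = sum(b * b for b in distant)
    gain = cross / power if power > 0 else 0.0
    return sum((a - gain * b) ** 2 for a, b in zip(frame, distant))


def dysperiodicity(signal, start, length, min_lag, max_lag):
    """Smallest residual energy over all admissible past and future lags."""
    best = float("inf")
    for lag in range(min_lag, max_lag + 1):
        for signed in (-lag, lag):
            first = start + signed
            if first < 0 or first + length > len(signal):
                continue  # distant frame would fall outside the recording
            best = min(best, residual_energy(signal, start, length, signed))
    return best


# Invented test signals: a sinusoid with a 50-sample cycle, and the same
# sinusoid with a small additive disturbance that repeats every 13 samples.
periodic = [math.sin(2 * math.pi * n / 50) for n in range(400)]
perturbed = [s + ((2 * n) % 13 - 6) / 60 for n, s in enumerate(periodic)]

clean_estimate = dysperiodicity(periodic, 150, 100, 40, 60)
noisy_estimate = dysperiodicity(perturbed, 150, 100, 40, 60)
print(clean_estimate, noisy_estimate)
```

For the strictly periodic input, the lag equal to one cycle reproduces the frame and the estimate is zero up to rounding, which matches the property stated in the abstract. No lag in the searched range matches both components of the perturbed signal, so its estimate stays clearly above zero.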