Similar Articles
20 similar articles retrieved
1.
It has been suggested that pauses between words could act as indices of processes such as selection, retrieval or planning that are required before an utterance is articulated. For normal meaningful phrase utterances, there is hardly any information regarding the relationship between articulation and pause duration and their subsequent relation to the final phrase duration. Such associations could provide insights into the mechanisms underlying the planning and execution of a vocal utterance. To execute a fluent vocal utterance, children might adopt different strategies over the course of development. We investigate this hypothesis by examining the roles of articulation time and pause duration in meaningful phrase utterances in 46 children between the ages of 4 and 8 years, learning English as a second language. Our results indicate a significant reduction in phrase, word and interword pause duration with increasing age. A comparison of pause, word and phrase duration for individual subjects belonging to different age groups indicates a changing relationship between pause and word duration for the production of fluent speech. For the youngest children, a strong correlation between pause and word duration indicates local planning at the word level for speech production and thus a greater dependence of pauses on the immediate word utterance. In contrast, for the oldest children we find a significant drop in the correlation between word and pause duration, indicating the emergence of articulation and pause planning as two independent processes directed at producing a fluent utterance. Strong correlations between other temporal parameters indicate a more holistic approach being adopted by the older children for language production.
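As a minimal sketch of the kind of analysis this abstract describes, the snippet below computes a per-child Pearson correlation between each word's duration and the pause that follows it, then summarizes the correlations by age group. All file and column names are hypothetical; they are not taken from the study.

```python
# Sketch: per-child correlation between word duration and the following
# inter-word pause, compared across age groups. Data layout is hypothetical.
import numpy as np
import pandas as pd

def word_pause_correlation(df: pd.DataFrame) -> float:
    """Pearson correlation between each word's duration and the pause after it."""
    words = df["word_duration_ms"].to_numpy()
    pauses = df["following_pause_ms"].to_numpy()
    return float(np.corrcoef(words, pauses)[0, 1])

# One row per word token: child id, age group, word duration, following pause.
tokens = pd.read_csv("phrase_tokens.csv")          # hypothetical file
per_child = (tokens.groupby(["age_group", "child_id"])
                   .apply(word_pause_correlation)
                   .rename("r_word_pause")
                   .reset_index())

# The abstract's claim predicts high r for the youngest group
# and a significant drop for the oldest group.
print(per_child.groupby("age_group")["r_word_pause"].describe())
```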

2.
徐冬冬 《应用声学》2021,40(2):194-199
Transformer networks with a self-attention mechanism are gradually receiving broad attention in speech recognition research. This work focuses on combining positional-information embedding with speech features, investigating positional encoding methods better suited to Mandarin speech recognition models. The experimental results show that replacing sinusoidal positional encoding with a convolutionally encoded input representation better fuses the contextual relations of the speech features with relative position information, yielding better recognition performance. The trained speech recognition system is based on the Tr…
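The abstract is truncated, so the sketch below only illustrates the general idea of a convolutional front-end used in place of added sinusoidal positional encodings for a Transformer speech encoder; it is not the authors' exact architecture, and all dimensions are arbitrary choices.

```python
# Sketch (PyTorch): a 2-D convolutional front-end as an alternative to adding
# sinusoidal positional encodings to filterbank features before a Transformer
# encoder. Generic illustration only, not the paper's model.
import torch
import torch.nn as nn

class ConvInputEncoding(nn.Module):
    """Encodes local context / relative position of fbank frames with strided convs."""
    def __init__(self, n_mels: int = 80, d_model: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, d_model, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(d_model, d_model, kernel_size=3, stride=2, padding=1), nn.ReLU(),
        )
        self.proj = nn.Linear(d_model * ((n_mels + 3) // 4), d_model)

    def forward(self, fbank: torch.Tensor) -> torch.Tensor:
        # fbank: (batch, time, n_mels) -> (batch, time // 4, d_model)
        x = self.conv(fbank.unsqueeze(1))            # (B, C, T', F')
        b, c, t, f = x.shape
        return self.proj(x.permute(0, 2, 1, 3).reshape(b, t, c * f))

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=256, nhead=4, batch_first=True), num_layers=6)
feats = torch.randn(2, 400, 80)                      # ~4 s of 10-ms fbank frames
out = encoder(ConvInputEncoding()(feats))            # no sinusoidal PE added
```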

3.
4.
Speech intelligibility is known to be relatively unaffected by certain deformations of the acoustic spectrum. These include translations, stretching or contracting dilations, and shearing of the spectrum (represented along the logarithmic frequency axis). It is argued here that such robustness reflects a synergy between vocal production and auditory perception. Thus, on the one hand, it is shown that these spectral distortions are produced by common and unavoidable variations among different speakers pertaining to the length, cross-sectional profile, and losses of their vocal tracts. On the other hand, it is argued that these spectral changes leave the auditory cortical representation of the spectrum largely unchanged except for translations along one of its representational axes. These assertions are supported by analyses of production and perception models. On the production side, a simplified sinusoidal model of the vocal tract is developed which analytically relates a few "articulatory" parameters, such as the extent and location of the vocal tract constriction, to the spectral peaks of the acoustic spectra synthesized from it. The model is evaluated by comparing the identification of synthesized sustained vowels to labeled natural vowels extracted from the TIMIT corpus. On the perception side a "multiscale" model of sound processing is utilized to elucidate the effects of the deformations on the representation of the acoustic spectrum in the primary auditory cortex. Finally, the implications of these results for the perception of generally identifiable classes of sound sources beyond the specific case of speech and the vocal tract are discussed.
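A one-line calculation makes the "translation along the logarithmic frequency axis" claim concrete for the simplest case, a uniform lossless tube closed at the glottis and open at the lips (a standard textbook idealization, not the paper's full sinusoidal model):

```latex
% Uniform lossless tube of length L (speed of sound c):
F_n \;=\; \frac{(2n-1)\,c}{4L}, \qquad n = 1, 2, 3, \dots
% Scaling the vocal-tract length by a factor a scales every formant by 1/a:
F_n' \;=\; \frac{(2n-1)\,c}{4\,aL} \;=\; \frac{F_n}{a}
\quad\Longrightarrow\quad
\log F_n' \;=\; \log F_n - \log a,
% i.e., a pure translation of the whole spectrum along the log-frequency axis.
```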

5.
6.
7.
It appears that temperature instabilities are a major obstacle hindering the use of semiconductor strain gauge pressure transducers in speech research, especially when absolute pressure data are mandatory. In this paper a simple and reliable method for an in vivo calibration of this kind of transducer is described. The most important error source, the drift of the zero pressure level due to temperature changes, is discussed, and an estimation of the measurement accuracy which can be obtained is given. Moreover, some registrations of subglottal, supraglottal, and transglottal pressure are presented. It is shown that the pressure recordings allow us to obtain estimates of the volume flow in the trachea and pharynx. Analysis of those waveforms appears to lead to new insights into the physical processes underlying voice production. Specifically, an independent glottal contribution to the skewing of the glottal flow pulses is identified.
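The abstract does not spell out the calibration procedure; as a hedged illustration of one way to handle a slowly drifting zero-pressure level, the sketch below re-zeroes the trace by interpolating a baseline through instants where the true pressure is known to be atmospheric (e.g., open glottis and mouth, no airflow). The choice of zero-pressure instants and of linear interpolation are assumptions, not the paper's method.

```python
# Sketch: remove slow zero-level (temperature) drift from a pressure trace by
# interpolating a baseline through instants of known zero (atmospheric) pressure.
import numpy as np

def correct_zero_drift(t, p, zero_times):
    """t, p: time (s) and raw pressure; zero_times: instants of known zero pressure."""
    zero_levels = np.interp(zero_times, t, p)         # transducer output at true-zero instants
    baseline = np.interp(t, zero_times, zero_levels)  # drifting zero level over the whole trace
    return p - baseline

fs = 1000.0
t = np.arange(0, 10, 1 / fs)
p = np.sin(2 * np.pi * 3 * t) + 0.3 * t               # toy signal plus 0.3 units/s drift
p_corrected = correct_zero_drift(t, p, zero_times=np.array([0.0, 5.0, 10.0]))
```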

8.
Modeling the peripheral speech motor system can advance the understanding of speech motor control and audiovisual speech perception. A 3-D physical model of the human face is presented. The model represents the soft tissue biomechanics with a multilayer deformable mesh. The mesh is controlled by a set of modeled facial muscles which uses a standard Hill-type representation of muscle dynamics. In a test of the model, recorded intramuscular electromyography (EMG) was used to activate the modeled muscles and the kinematics of the mesh was compared with 3-D kinematics recorded with OPTOTRAK. Overall, there was a good match between the recorded data and the model's movements. Animations of the model are provided as MPEG movies.
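As background for the "standard Hill-type representation" mentioned above, a minimal sketch of the common multiplicative form F = a(t) · F_max · f_L(l) · f_V(v) is given below. The Gaussian force-length curve and the force-velocity curve shapes are generic textbook choices, not the parameters of the face model in the abstract.

```python
# Sketch: a generic Hill-type muscle force, F = a(t) * Fmax * f_L(l) * f_V(v).
import numpy as np

def hill_force(activation, length, velocity, f_max=10.0, l_opt=1.0, v_max=5.0):
    """v < 0 means shortening; length is normalized to the optimal length l_opt."""
    length, velocity = np.asarray(length, float), np.asarray(velocity, float)
    f_length = np.exp(-((length - l_opt) / (0.45 * l_opt)) ** 2)      # force-length curve
    f_shorten = np.clip(1.0 + velocity / v_max, 0.0, None)            # force falls with speed
    f_lengthen = 1.3 - 0.3 / (1.0 + velocity / (0.25 * v_max))        # modest rise when stretched
    f_velocity = np.where(velocity <= 0.0, f_shorten, f_lengthen)
    return activation * f_max * f_length * f_velocity

# EMG-derived activation in [0, 1] drives the modeled muscle.
print(hill_force(activation=0.6, length=1.05, velocity=-0.5))
```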

9.
Computer models of the process of speech articulation require a detailed knowledge of the vocal tract configurations employed in speech and the application of acoustic theory to calculate the sound waveform. Almost all currently available data on vocal tract dimensions come from x-ray films and are severely limited in quantity and coherence due to restrictions on radiation dosage and intersubject differences. We are using MRI techniques to obtain the pharyngeal dimensions of speakers producing sustained vowels. The fact that MRI does not employ ionizing radiation provides speech research with the opportunity to obtain comprehensive bodies of much-needed data on the articulatory characteristics of single subjects.

10.
1. Introduction. Using articulatory synthesis to produce formant-targeted voices can not only retain the advantages of a formant synthesizer, viz., that the pitch and formant parameters can be artificially specified or modified flexibly and easily, but can also achieve the high quality of an articulatory synthesizer. It is a worthwhile topic in speech technology. Speech synthesis has been studied in depth, and much attention has shifted to utilizing prosody in text-to-speech systems. However, the ultimat…

11.
Effects of noise on speech production: acoustic and perceptual analyses
Acoustical analyses were carried out on a set of utterances produced by two male speakers talking in quiet and in 80, 90, and 100 dB SPL of masking noise. In addition to replicating previous studies demonstrating increases in amplitude, duration, and vocal pitch while talking in noise, these analyses also found reliable differences in the formant frequencies and short-term spectra of vowels. Perceptual experiments were also conducted to assess the intelligibility of utterances produced in quiet and in noise when they were presented at equal S/N ratios for identification. In each experiment, utterances originally produced in noise were found to be more intelligible than utterances produced in the quiet. The results of the acoustic analyses showed clear and consistent differences in the acoustic-phonetic characteristics of speech produced in quiet versus noisy environments. Moreover, these acoustic differences produced reliable effects on intelligibility. The findings are discussed in terms of: (1) the nature of the acoustic changes that take place when speakers produce speech under adverse conditions such as noise, psychological stress, or high cognitive load; (2) the role of training and feedback in controlling and modifying a talker's speech to improve the performance of current speech recognizers; and (3) the development of robust algorithms for recognition of speech in noise.
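A small sketch of the kind of acoustic measurements discussed above, computing duration, RMS level, and mean F0 for recordings from the quiet and noise conditions. The file names are hypothetical, and F0 tracking here uses librosa's pYIN implementation rather than whatever analysis software the original study used.

```python
# Sketch: compare duration, RMS level (dB), and mean F0 for utterances
# recorded in quiet vs. in masking noise. File names are placeholders.
import numpy as np
import librosa

def describe_utterance(path: str) -> dict:
    y, sr = librosa.load(path, sr=None)
    f0, voiced, _ = librosa.pyin(y, fmin=60, fmax=400, sr=sr)
    return {
        "duration_s": len(y) / sr,
        "rms_db": 20 * np.log10(np.sqrt(np.mean(y ** 2)) + 1e-12),
        "mean_f0_hz": float(np.nanmean(f0[voiced])) if np.any(voiced) else float("nan"),
    }

for condition in ("quiet", "noise_90dB"):
    print(condition, describe_utterance(f"talker1_{condition}.wav"))
```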

12.
Direct measures of subglottal pressure obtained through a tracheal puncture were used to calculate laryngeal airway resistance. Six subjects completed tasks including syllable trains and more natural speech samples produced at three loudness levels. Direct calculations of natural speech resistance values were compared with indirect estimates obtained during syllable train production. The degree of correspondence between direct and indirect calculations varied by subject. Overall, the smallest relative errors among calculations occurred for syllable trains, with higher relative errors for the monologue and sentence. For loudness conditions, the smallest and largest relative errors occurred for soft and loud productions, respectively. The clinical utility of indirect estimation is questioned and suggestions for improving its validity are provided.
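For reference, laryngeal airway resistance is the ratio of translaryngeal pressure to translaryngeal airflow, conventionally in cm H2O per liter per second. The sketch below contrasts a direct calculation (measured subglottal pressure) with an indirect syllable-train style estimate (peak intraoral pressure during /p/ as a surrogate for subglottal pressure); all numeric values are invented for illustration.

```python
# Sketch: laryngeal airway resistance R = translaryngeal pressure / airflow
# (cm H2O per L/s). Values are invented, not the study's data.
def airway_resistance(pressure_cmH2O: float, flow_L_per_s: float) -> float:
    return pressure_cmH2O / flow_L_per_s

direct = airway_resistance(pressure_cmH2O=7.8, flow_L_per_s=0.18)    # tracheal puncture
indirect = airway_resistance(pressure_cmH2O=8.4, flow_L_per_s=0.18)  # intraoral /p/ peak
relative_error = abs(indirect - direct) / direct
print(f"direct={direct:.1f}, indirect={indirect:.1f}, rel. error={relative_error:.1%}")
```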

13.
This paper examines whether correlations between speech perception and speech production exist, and, if so, whether they might provide a way of evaluating different acoustic metrics. The cues listeners use for many phonemic distinctions are not known, often because many different acoustic cues are highly correlated with one another, making it difficult to distinguish among them. Perception-production correlations may provide a new means of doing so. In the present paper, correlations were examined between acoustic measures taken on listeners' perceptual prototypes for a given speech category and on their average production of members of that category. Significant correlations were found for VOT among stop consonants, and for spectral peaks (but not centroids or skewness) for voiceless fricatives. These results suggest that correlations between speech perception and production may provide a methodology for evaluating different proposed acoustic metrics.
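The three fricative measures compared above (spectral peak, centroid, and skewness) can all be computed from a magnitude spectrum treated as a distribution over frequency, and the perception-production relation is then a correlation across listeners. A minimal sketch, with placeholder data rather than the study's measurements:

```python
# Sketch: spectral peak, centroid, and skewness of a fricative spectrum, plus the
# perception-production correlation across listeners. Data arrays are placeholders.
import numpy as np
from scipy.stats import pearsonr

def spectral_measures(signal: np.ndarray, sr: float) -> dict:
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    p = spec / spec.sum()                                   # spectrum as a distribution
    centroid = np.sum(freqs * p)
    spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * p))
    skewness = np.sum(((freqs - centroid) ** 3) * p) / spread ** 3
    return {"peak_hz": freqs[np.argmax(spec)], "centroid_hz": centroid, "skewness": skewness}

# One value per listener: measure of the perceptual prototype vs. mean of the productions.
prototype_peaks = np.array([4405.0, 4630.0, 4120.0, 4980.0])    # placeholder values
production_peaks = np.array([4350.0, 4710.0, 4050.0, 5020.0])
r, p_value = pearsonr(prototype_peaks, production_peaks)
print(f"perception-production correlation for spectral peak: r={r:.2f}, p={p_value:.3f}")
```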

14.
15.
Japanese 5- to 13-yr-olds who used cochlear implants (CIs) and a comparison group of normally hearing (NH) Japanese children were tested on their perception and production of speech prosody. For the perception task, they were required to judge whether semantically neutral utterances that were normalized for amplitude were spoken in a happy, sad, or angry manner. The performance of NH children was error-free. By contrast, child CI users performed well below ceiling but above chance levels on happy- and sad-sounding utterances but not on angry-sounding utterances. For the production task, children were required to imitate stereotyped Japanese utterances expressing disappointment and surprise as well as culturally typical representations of crow and cat sounds. NH 5- and 6-year-olds produced significantly poorer imitations than older hearing children, but age was unrelated to the imitation quality of child CI users. Overall, child CI users' imitations were significantly poorer than those of NH children, but they did not differ significantly from the imitations of the youngest NH group. Moreover, there was a robust correlation between the performance of child CI users on the perception and production tasks; this implies that difficulties with prosodic perception underlie their difficulties with prosodic imitation.

16.
A model is presented which predicts the movements of flesh points on the tongue, lips, and jaw during speech production, from time-aligned phonetic strings. Starting from a database of x-ray articulator trajectories, means and variances of articulator positions and curvatures at the midpoints of phonemes are extracted from the data set. During prediction, the amount of articulatory effort required in a particular phonetic context is estimated from the relative local curvature of the articulator trajectory concerned. Correlations between position and curvature are used to directly predict variations from mean articulator positions due to coarticulatory effects. Use of the explicit coarticulation model yields a significant increase in articulatory modeling accuracy with respect to x-ray traces, as compared with the use of mean articulator positions alone.
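A minimal sketch of the coarticulation idea described above: the deviation of an articulator's position from its phoneme mean is predicted linearly from the local curvature of the trajectory, with the slope taken from the position-curvature correlation in training data. The single-articulator, single-dimension simplification and all variable names are assumptions, not the paper's implementation.

```python
# Sketch: predict a flesh-point position at a phoneme midpoint as the phoneme mean
# plus a correction proportional to local trajectory curvature, using the
# position-curvature correlation estimated from training data.
import numpy as np

def fit_coarticulation(positions: np.ndarray, curvatures: np.ndarray):
    """Return (mean_pos, mean_curv, slope) for one phoneme and one articulator dimension."""
    r = np.corrcoef(positions, curvatures)[0, 1]
    slope = r * positions.std() / curvatures.std()
    return positions.mean(), curvatures.mean(), slope

def predict_position(mean_pos, mean_curv, slope, local_curvature):
    # Low curvature (low articulatory effort) pulls the prediction away from the mean.
    return mean_pos + slope * (local_curvature - mean_curv)

rng = np.random.default_rng(0)
curv = rng.uniform(0.1, 2.0, 200)                    # toy training data
pos = 1.5 + 0.8 * curv + rng.normal(0, 0.1, 200)
print(predict_position(*fit_coarticulation(pos, curv), local_curvature=0.3))
```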

17.
This study used glossometry to examine the position of the tongue and the velocity of its movements in vowels spoken normally and at a self-selected fast rate. The subject in experiment 1 showed lingual undershoot for stressed vowels in "a big again" and "a bob again." The tongue was lower for /I/ and higher for /a/ at the fast rate than at the normal rate. The stressed vowels exerted an effect on unstressed vowels: the tongue was lower in the schwas that preceded and followed /a/ than /I/. Only one of the three subjects in experiment 2 showed no lingual undershoot for fast-rate /I/. The tongue was higher at the fast rate than at the normal rate in the schwas flanking /I/, so that the displacement was less at the fast rate than at the normal rate. Another talker increased the peak velocity of tongue movements at the fast rate and showed no undershoot for /a/. Multiple regression analyses showed that the timing of movements for successive phonetic segments accounted well for undershoot in only one of the three subjects. The results suggest that in order to model the effects of speaking rate on the tongue movements used in forming stressed vowels, it will be necessary to take into account: (1) how much vowels are shortened at a fast rate; (2) how much the peak velocity of tongue movements is increased, if at all; and (3) the position of the tongue before and after the stressed vowels. All three factors are likely to be influenced by how clearly the talker wishes to speak.

18.
Magnetic resonance imaging (MRI) has served as a valuable tool for studying static postures in speech production. Now, recent improvements in temporal resolution are making it possible to examine the dynamics of vocal-tract shaping during fluent speech using MRI. The present study uses spiral k-space acquisitions with a low flip-angle gradient echo pulse sequence on a conventional GE Signa 1.5-T CV/i scanner. This strategy allows for acquisition rates of 8-9 images per second and reconstruction rates of 20-24 images per second, making veridical movies of speech production now possible. Segmental durations, positions, and interarticulator timing can all be quantitatively evaluated. Data show clear real-time movements of the lips, tongue, and velum. Sample movies and data analysis strategies are presented.

19.
Previous studies have demonstrated that perturbations in voice pitch or loudness feedback lead to compensatory changes in voice F(0) or amplitude during production of sustained vowels. Responses to pitch-shifted auditory feedback have also been observed during English and Mandarin speech. The present study investigated whether Mandarin speakers would respond to amplitude-shifted feedback during meaningful speech production. Native speakers of Mandarin produced two-syllable utterances with focus on the first syllable, the second syllable, or none of the syllables, as prompted by corresponding questions. Their acoustic speech signal was fed back to them with loudness shifted by +/-3 dB for 200 ms durations. The responses to the feedback perturbations had mean latencies of approximately 142 ms and magnitudes of approximately 0.86 dB. Response magnitudes were greater and latencies were longer when emphasis was placed on the first syllable than when there was no emphasis. Since amplitude is not known for being highly effective in encoding linguistic contrasts, the fact that subjects reacted to amplitude perturbation just as fast as they reacted to F(0) perturbations in previous studies provides clear evidence that a highly automatic feedback mechanism is active in controlling both F(0) and amplitude of speech production.
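To make the perturbation concrete: a ±3 dB shift corresponds to multiplying the feedback signal by 10^(±3/20), roughly 1.41 or 0.71, over a 200-ms window. A toy sketch follows; the sample rate, onset time, and test tone are arbitrary choices, not the study's stimulus delivery system.

```python
# Sketch: apply a +3 dB or -3 dB loudness shift to a 200-ms stretch of a
# feedback signal. Sample rate, onset time, and the test tone are arbitrary.
import numpy as np

def amplitude_perturbation(signal, sr, onset_s, dur_s=0.2, shift_db=3.0):
    out = signal.copy()
    i0, i1 = int(onset_s * sr), int((onset_s + dur_s) * sr)
    out[i0:i1] *= 10 ** (shift_db / 20.0)       # +3 dB -> x1.41, -3 dB -> x0.71
    return out

sr = 16000
t = np.arange(0, 1.0, 1 / sr)
feedback = 0.1 * np.sin(2 * np.pi * 220 * t)     # stand-in for the speech signal
perturbed = amplitude_perturbation(feedback, sr, onset_s=0.4, shift_db=-3.0)
```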

20.
This paper examines how breathing differs in the upright and supine body positions. Passive and active forces and associated chest wall motions are described for resting tidal breathing and speech breathing performed in the two positions. Clinical implications are offered regarding evaluation and treatment of breathing behavior in clients with speech and voice disorders.
