Similar articles (20 results)
1.
《Journal of voice》2019,33(6):851-859
Purpose: The pitch-shift reflex (PSR) is the adaptation of the fundamental frequency during phonation and speech and reflects auditory feedback control. Speakers without voice and speech disorders mostly compensate for a pitch change in the auditory feedback by shifting their fundamental frequency in the opposite direction. Dysphonic patients often display problems with the auditory perception and control of their voice during therapy. Our study focuses on the auditory and kinesthetic control mechanisms of patients with muscle tension dysphonia (MTD) and of speakers without voice and speech problems. The main purpose of the study is to compare how these control mechanisms function during phonation and speech in patients with MTD and in normal speakers. Method: Sixty-one healthy subjects (17 male, 44 female) and 22 patients with MTD (7 male, 15 female) participated in two paradigms: sustained phonation (vowel /a/) and speech ([‘mama]). In both paradigms the fundamental frequency of the auditory feedback was increased synthetically. For the analysis of the PSR, the electroencephalogram, electroglottography, the voice signal, and high-speed endoscopy data were recorded simultaneously. The PSR in the electroencephalogram was detected via the N100 and the mismatch negativity. Statistical tests were applied to detect the PSR in the physiological response within the electroglottography, voice, and high-speed endoscopy signals. The results were compared between the two groups. Results: No differences were found between the controls and the patients with MTD regarding the latency and magnitude of the perception of the pitch shift in either paradigm, but the groups differed in the magnitude of the behavioral response. In both groups, differences were also found between the “no pitch” and “pitch” conditions of the two paradigms regarding vocal fold dynamics and voice quality. Patients with MTD showed more vibrational irregularities during the PSR than the controls, especially regarding the symmetry of vocal fold dynamics. Conclusion: Patients with MTD seem to have a disturbed interaction between auditory and kinesthetic feedback, inducing the execution of an overriding behavioral response.

2.
To clarify the role of formant frequency in the perception of pitch in whispering, we conducted a preliminary experiment to determine (1) whether speakers change their pitch during whispering; (2) whether listeners can perceive differences in pitch; and (3) what the acoustical features are when speakers change their pitch. The listening test of whispered Japanese speech demonstrates that listeners can classify the perceived pitch of the vowel /a/ as ordinary, high, or low. Acoustical analysis revealed that the perception of pitch corresponds to some formant frequencies. Further data with synthesized whispered voice are necessary to confirm in detail the importance of the formant frequencies for the perceived pitch of whispered vowels.

3.
The performance of the human pitch control system was characterized by measurement of the speed of pitch shift and pitch shift response speed (inverse of reaction time) at various initial pitch and loudness levels. Data from three nonsinger adult male subjects and one professional singer suggest a strong inverse correlation (r greater than 0.78) between initial pitch and rate of pitch rise. This study showed no significant relation between initial loudness and rate of pitch rise. Also, vocal response speed showed no significant relation with either initial pitch or loudness. However, it is suggested that pitch shift response speed might be related to the second formant frequency of the target vowel. A composite index of pitch control performance capacity was defined as the product of response speed and vocal fold contractile velocity. From experimental data, the composite index was able to reflect a distinct 74% superior performance by the professional singer (relative to the average maximum performance capacity of nonsingers). It is suggested that the product-based composite index of performance capacity can serve as a sensitive means for vocal proficiency determination.  相似文献   
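The composite index described in this abstract can be sketched in a few lines. This is a minimal illustration, assuming only what the abstract states: response speed is the inverse of reaction time, and the index is the product of response speed and contractile velocity. The function names, units, and numeric values are hypothetical and are not taken from the study.

```python
def composite_index(reaction_time_s: float, contractile_velocity: float) -> float:
    """Product of response speed (1 / reaction time) and contractile velocity."""
    if reaction_time_s <= 0:
        raise ValueError("reaction time must be positive")
    response_speed = 1.0 / reaction_time_s
    return response_speed * contractile_velocity


def relative_advantage_percent(index: float, reference_index: float) -> float:
    """How much larger `index` is than `reference_index`, in percent."""
    return 100.0 * (index - reference_index) / reference_index


if __name__ == "__main__":
    # Hypothetical values chosen only to show the arithmetic.
    nonsinger = composite_index(reaction_time_s=0.25, contractile_velocity=2.0)
    singer = composite_index(reaction_time_s=0.20, contractile_velocity=2.5)
    print(f"nonsinger index: {nonsinger:.2f}")
    print(f"singer index:    {singer:.2f}")
    print(f"advantage:       {relative_advantage_percent(singer, nonsinger):.1f}%")
```

Because the index is a product, a speaker who is both quicker to respond and faster in pitch change is separated from the reference more strongly than by either measure alone.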

4.
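Chinese utterances usually exhibit pitch declination, but research on the specific pitch behavior of prosodic words within an utterance remains limited. The dialogue material used in this study was selected from the 973 telephone speech corpus and comprises 69 dialogues involving 79 speakers; the read-speech material consists of news broadcasts by two radio presenters, 221 utterances in length. The high pitch points, low pitch points, and pitch ranges of the prosodic words within utterances were analyzed. The results show that in most utterances of both dialogue and read speech the pitch runs from high to low, although the downward trend is less evident in the first half of longer utterances in spoken dialogue. Compared with read speech, the pitch range of prosodic words in spoken dialogue is usually smaller. In dialogue, the final prosodic word of an utterance has a relatively large pitch range, whereas in read speech the pitch ranges of utterance-internal prosodic words mostly do not differ. These results will help in modeling the pitch register and pitch range of utterance-internal prosodic words for speech synthesis.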

5.
Previous studies have demonstrated that motor control of segmental features of speech relies to some extent on sensory feedback. Control of voice fundamental frequency (F0) has been shown to be modulated by perturbations in voice pitch feedback during various phonatory tasks and in Mandarin speech. The present study was designed to determine if voice F0 is modulated in a task-dependent manner during production of suprasegmental features of English speech. English speakers received pitch-modulated voice feedback (+/-50, 100, and 200 cents, 200 ms duration) during a sustained vowel task and a speech task. Response magnitudes during speech (mean 31.5 cents) were larger than during the vowels (mean 21.6 cents), response magnitudes increased as a function of stimulus magnitude during speech but not vowels, and responses to downward pitch-shift stimuli were larger than those to upward stimuli. Response latencies were shorter in speech (mean 122 ms) compared to vowels (mean 154 ms). These findings support previous research suggesting the audio-vocal system is involved in the control of suprasegmental features of English speech by correcting for errors between voice pitch feedback and the desired F0.
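The stimulus and response magnitudes in this abstract are given in cents, a logarithmic unit in which 1200 cents equal one octave. The sketch below shows the standard conversion between a frequency ratio and cents; the 200 Hz baseline is a hypothetical value used only for illustration and does not come from the study.

```python
import math


def cents_between(f_hz: float, ref_hz: float) -> float:
    """Interval from ref_hz to f_hz in cents (1200 cents per octave)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def shift_by_cents(f_hz: float, cents: float) -> float:
    """Frequency obtained by shifting f_hz up (positive) or down (negative) by `cents`."""
    return f_hz * 2.0 ** (cents / 1200.0)


if __name__ == "__main__":
    baseline_hz = 200.0  # hypothetical speaker F0, not from the abstract
    for magnitude in (50, 100, 200):
        up = shift_by_cents(baseline_hz, magnitude)
        down = shift_by_cents(baseline_hz, -magnitude)
        print(f"+/-{magnitude} cents: {down:.2f} Hz to {up:.2f} Hz")
```

Because the scale is logarithmic, a shift of a given size in cents corresponds to a larger change in hertz for a higher baseline F0.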

6.
This study was designed to test the hypothesis that the kinematic manipulations used by speakers in different speaking conditions are influenced by kinematic performance limits. A range of kinematic parameter values was elicited by having seven subjects produce cyclical CV movements of lips, tongue blade and tongue dorsum (/ba/, /da/, /ga/), at rates ranging from 1 to 6 Hz. The resulting measures were used to establish speaker- and articulator-specific kinematic performance spaces, defined by movement duration, displacement and peak speed. These data were compared with speech movement data produced by the subjects in several different speaking conditions in the companion study (Perkell et al., 2002). The amount of overlap of the speech data and cyclical data varied across speakers, from almost no overlap to complete overlap. Generally, for a given movement duration, speech movements were larger than cyclical movements, indicating that the speech movements were faster and were produced with greater effort, according to the performance space analysis. It was hypothesized that the cyclical movements of the tongue and lips were slower than the speech movements because they were more constrained by (coupled to) the relatively massive mandible. To test this hypothesis, a comparison was made of cyclical movements in maxillary versus mandibular frames of reference. The results indicate that the cyclical movements were not strongly constrained by mandible movements. The overall results generally indicate that the cyclical task did not succeed in defining the upper limits of kinematic performance spaces within which the speech data were confined. Thus, the hypothesis that performance limits influence speech kinematics could not be tested effectively. 
The differences between the speech and cyclical movements may be due to other factors, such as differences in speakers' "skill" with the two types of movement, or the size of the movements--the speech movements were larger, probably because of a well-defined target for the primary, stressed vowel.  相似文献   

7.
This study explores the hypothesis that clear speech is produced with greater "articulatory effort" than normal speech. Kinematic and acoustic data were gathered from seven subjects as they pronounced multiple repetitions of utterances in different speaking conditions, including normal, fast, clear, and slow. Data were analyzed within a framework based on a dynamical model of single-axis frictionless movements, in which peak movement speed is used as a relative measure of articulatory effort (Nelson, 1983). There were differences in peak movement speed, distance and duration among the conditions and among the speakers. Three speakers produced the "clear" condition utterances with movements that had larger distances and durations than those for "normal" utterances. Analyses of the data within a peak speed, distance, duration "performance space" indicated increased effort (reflected in greater peak speed) in the clear condition for the three speakers, in support of the hypothesis. The remaining four speakers used other combinations of parameters to produce the clear condition. The validity of the simple dynamical model for analyzing these complex movements was considered by examining several additional parameters. Some movement characteristics differed from those required for the model-based analysis, presumably because the articulators are complicated structurally and interact with one another mechanically. More refined tests of control strategies for different speaking styles will depend on future analyses of more complicated movements with more realistic models.  相似文献   

8.
In intonation research, prominence-lending pitch movements have either been described on a linear or on a logarithmic frequency scale. An experiment has been carried out to check whether pitch movements in speech intonation are perceived on one of these two scales or on a psychoacoustic scale representing the frequency selectivity of the auditory system. This last scale is intermediary between the other two scales. Subjects matched the excursion size of prominence-lending pitch movements in utterances resynthesized in different pitch registers. Their task was to adjust the excursion size in a comparison stimulus in such a way that it lent equal prominence to the corresponding syllable in a fixed test stimulus. The comparison stimulus and the test stimulus had pitches running parallel on either the logarithmic frequency scale, the psychoacoustic scale, or the linear frequency scale. In one-half of the experimental sessions, the test stimulus was presented in the low register, while the comparison stimulus was presented in the high register, and, conversely, for the other half of the sessions. The result is that, in all cases, stimuli are matched in such a way that the average excursion sizes in different registers are equal on the psychoacoustic scale.  相似文献   

9.
How are listeners able to identify whether the pitch of a brief isolated sample of an unknown voice is high or low in the overall pitch range of that speaker? Does the speaker's voice quality convey crucial information about pitch level? Results and statistical models of two experiments that provide answers to these questions are presented. First, listeners rated the pitch levels of vowels taken over the full pitch ranges of male and female speakers. The absolute f0 of the samples was by far the most important determinant of listeners' ratings, but with some effect of the sex of the speaker. Acoustic measures of voice quality had only a very small effect on these ratings. This result suggests that listeners have expectations about f0s for average speakers of each sex, and judge voice samples against such expectations. Second, listeners judged speaker sex for the same speech samples. Again, absolute f0 was the most important determinant of listeners' judgments, but now voice quality measures also played a role. Thus it seems that pitch level judgments depend on voice quality mostly indirectly, through its information about sex. Absolute f0 is the most important information for deciding both pitch level and speaker sex.  相似文献   

10.
The purpose of the present study was to compare the speech performance of four types of alaryngeal phonation, namely electrolaryngeal (EL), pneumatic artificial laryngeal (PA), tracheoesophageal (TE), and standard esophageal (SE) speech, by adult Cantonese-speaking laryngectomees. Subjective ratings of (1) voice quality, (2) articulation proficiency, (3) quietness of speech, (4) pitch variability, and (5) overall speech intelligibility were given by eight naive individuals who had no prior experience with any form of alaryngeal speech. Results indicated that SE and TE speech was perceived to be more hoarse than PA and EL speech. EL speech was associated with significantly less pitch variability, and PA speakers produced speech with the least amount of perceived noise. However, articulation proficiency and overall speech intelligibility were found to be comparable in all four types of alaryngeal speakers.

11.
Previous studies have demonstrated that perturbations in voice pitch or loudness feedback lead to compensatory changes in voice F(0) or amplitude during production of sustained vowels. Responses to pitch-shifted auditory feedback have also been observed during English and Mandarin speech. The present study investigated whether Mandarin speakers would respond to amplitude-shifted feedback during meaningful speech production. Native speakers of Mandarin produced two-syllable utterances with focus on the first syllable, the second syllable, or none of the syllables, as prompted by corresponding questions. Their acoustic speech signal was fed back to them with loudness shifted by +/-3 dB for 200 ms durations. The responses to the feedback perturbations had mean latencies of approximately 142 ms and magnitudes of approximately 0.86 dB. Response magnitudes were greater and latencies were longer when emphasis was placed on the first syllable than when there was no emphasis. Since amplitude is not known for being highly effective in encoding linguistic contrasts, the fact that subjects reacted to amplitude perturbation just as fast as they reacted to F(0) perturbations in previous studies provides clear evidence that a highly automatic feedback mechanism is active in controlling both F(0) and amplitude of speech production.  相似文献   
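The loudness perturbations and responses in this abstract are given in decibels. The sketch below converts between decibels and linear amplitude ratios, assuming the values refer to signal amplitude (so that the 20·log10 convention applies); that assumption is not stated in the abstract.

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Linear amplitude ratio for a level change in dB (20*log10 convention)."""
    return 10.0 ** (db / 20.0)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Level change in dB for a linear amplitude ratio."""
    return 20.0 * math.log10(ratio)


if __name__ == "__main__":
    # Perturbation sizes and mean response magnitude quoted in the abstract.
    for db in (3.0, -3.0, 0.86):
        print(f"{db:+.2f} dB -> amplitude x {db_to_amplitude_ratio(db):.3f}")
```

Under this convention a perturbation of about 3 dB scales the amplitude by roughly a factor of the square root of two in either direction.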

12.
Trained singers and nonsingers vocally shadowed sequences of rapidly changing tones. Tone changes within the sequences were unpredictable in terms of direction and extent of frequency change. Subjects' responses to the shadowing task could be evaluated for accuracy of frequency matching, and for time and speed of voice frequency change. In addition, subjects' transitions between tones could be classified as hit, overshoot, undershoot, or oscillate. The two groups were equally accurate in matching the pitches of tones comprising the sequences. Similarly, pitch lowering was faster than pitch raising for both groups of subjects, while speeds for both lowering and raising increased with increases in size of the interval between tones. However, singers required less time than nonsingers in effecting transitions, apparently because they achieved faster peak speeds and took more direct paths between tones. Implications of the data for physiological and mechanical aspects of voice frequency control are discussed.  相似文献   

13.
The primary purpose of this study was to investigate the aerodynamic characteristics of laryngectomees under two conditions: breathing quietly and speaking with an electrolarynx. Twenty male adult subjects, 8 normal speakers and 12 laryngectomees, participated in the experiment. Airflow, pressure, and speech data were obtained simultaneously. The acceptability of electrolarynx speech under different conditions was also evaluated by 20 listeners (14 men, 6 women). Results indicated a higher peak expiratory airflow and pressure among the laryngectomees than among the normal speakers during breathing. Three different breathing patterns appeared among the laryngectomees when speaking with the electrolarynx: holding breath, exhaling, and breathing. Four long-time electrolarynx users held their breath while speaking. Seven of the 12 laryngectomees kept exhaling, whereas only 1 could breathe during speech production. In addition, (1) the acceptability of electrolarynx speech was highest when speaking while holding the breath; (2) no significant difference in acceptability was found between the patterns of exhaling and breathing smoothly; and (3) acceptability decreased when breathing quickly during phonation with the electrolarynx. The results also suggest that laryngectomees who can breathe while speaking may be better suited to the new electrolarynx, which controls pitch by expiration pressure.

14.
I. Introduction. Eady [1] examined how the fundamental frequency F0 patterns in a tone language, Chinese, are different from those in a stress language, English. He argued that the F0 patterns in a tone language are systematically different from those in a stress language, and his finding contradicts the claim of Bolinger that "human speakers everywhere do essentially the same thing with fundamental pitch". Eady used the cepstral method to do F0 extraction, so he cannot get an insight into the microstructure of the laryngeal vibrations precisely, although an average rate of change in F0 for every…

15.
Pitch detection is an important part of speech recognition and speech processing. In this paper, a pitch detection algorithm based on second generation wavelet transform was developed. The proposed algorithm reduces the computational load of those algorithms that were based on classical wavelet transform. The proposed pitch detection algorithm was tested for both real speech and synthetic speech signal. Some experiments were carried out under noisy environment condition to evaluate the accuracy and robustness of the proposed algorithm. Results showed that the proposed algorithm was robust to noise and provided accurate estimates of the pitch period for both low-pitched and high-pitched speakers. Moreover, different wavelet filters that were obtained using second generation wavelet transform were considered to see the effects of them on the proposed algorithm. It was noticed that Haar filter showed good performance as compared to the other wavelet filters.  相似文献   
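The "second generation wavelet transform" mentioned in this abstract is usually built with the lifting scheme (split, predict, update). The sketch below shows one level of a Haar transform in lifting form, in an unnormalized variant where the approximation is the pairwise mean. It illustrates the general technique only; it is not the paper's pitch detection algorithm, and the function names are hypothetical.

```python
def haar_lifting_forward(signal):
    """One level of an unnormalized Haar transform via lifting.

    Split the samples into even and odd positions, predict each odd sample
    from its even neighbour (detail = odd - even), then update the even
    samples so the approximation equals the pairwise mean.
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    even = signal[0::2]
    odd = signal[1::2]
    detail = [o - e for e, o in zip(even, odd)]           # predict step
    approx = [e + d / 2.0 for e, d in zip(even, detail)]  # update step
    return approx, detail


def haar_lifting_inverse(approx, detail):
    """Undo the lifting steps in reverse order and interleave the samples."""
    even = [s - d / 2.0 for s, d in zip(approx, detail)]
    odd = [d + e for e, d in zip(even, detail)]
    signal = []
    for e, o in zip(even, odd):
        signal.extend((e, o))
    return signal


if __name__ == "__main__":
    x = [4.0, 6.0, 10.0, 2.0]
    approx, detail = haar_lifting_forward(x)
    print("approx:", approx)
    print("detail:", detail)
    print("reconstructed:", haar_lifting_inverse(approx, detail))
```

Each lifting step works in place with a few additions per sample pair, which is the usual reason lifting-based transforms are described as cheaper than convolution-based classical wavelet filters.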

16.
The voice conversion (VC) technique has recently emerged as a new branch of speech synthesis dealing with speaker identity. In this work, a linear prediction (LP) analysis is carried out on speech signals to obtain acoustical parameters related to speaker identity: the speech fundamental frequency (pitch), voicing decision, signal energy, and vocal tract parameters. Once these parameters are established for two different speakers, designated as source and target speakers, statistical mapping functions can then be applied to modify the established parameters. The mapping functions are derived from these parameters in such a way that the source parameters resemble those of the target. Finally, the modified parameters are used to produce the new speech signal. To illustrate the feasibility of the proposed approach, simple-to-use voice conversion software has been developed. This VC technique has shown satisfactory results, with the synthesized speech signal virtually matching that of the target speaker.

17.
The role of auditory feedback in speech motor control was explored in three related experiments. Experiment 1 investigated auditory sensorimotor adaptation: the process by which speakers alter their speech production to compensate for perturbations of auditory feedback. When the first formant frequency (F1) was shifted in the feedback heard by subjects as they produced vowels in consonant-vowel-consonant (CVC) words, the subjects' vowels demonstrated compensatory formant shifts that were maintained when auditory feedback was subsequently masked by noise, which is evidence of adaptation. Experiment 2 investigated auditory discrimination of synthetic vowel stimuli differing in F1 frequency, using the same subjects. Those with more acute F1 discrimination had compensated more to F1 perturbation. Experiment 3 consisted of simulations with the Directions Into Velocities of Articulators (DIVA) model of speech motor planning, which showed that the model can account for key aspects of compensation. In the model, movement goals for vowels are regions in auditory space; perturbation of auditory feedback invokes auditory feedback control mechanisms that correct for the perturbation, which in turn causes updating of feedforward commands to incorporate these corrections. The relation between speaker acuity and amount of compensation to auditory perturbation is mediated by the size of speakers' auditory goal regions, with more acute speakers having smaller goal regions.

18.
The purpose of this research was to obtain information on the mean fundamental frequency (F0) levels of Portuguese speakers and to investigate (1) whether differences exist between two groups with different voice qualities (normal and dysphonic), genders, ages, and speech tasks; and (2) which variables contribute to the variance. To this end, 109 subjects (52 dysphonics and 57 controls) participated in the study. Speech material included three sustained vowels ([u], [i], and [a]), a standard written passage, and a conversation produced at a comfortable conversational pitch and loudness level. Electroglottographic (EGG) data were obtained. Although the results show that, overall, the dysphonics had lower F0 than did controls for all speaking conditions, and that there were differences according to age, these contrasts were not statistically significant. Gender and speech task effects were statistically significant. Additionally, a large share of the variance in mean F0 (between 59% and 73%) is explained by the prediction model.

19.
Many studies have described and analyzed the singer's formant. A similar phenomenon produced by trained speakers led some authors to examine the speaker's ring. If we consider these phenomena as resonance effects associated with vocal tract adjustments and training, can we hypothesize that trained singers can carry over their singing formant ability into speech, also obtaining a speaker's ring? Can we find similar differences for energy distribution in continuous speech? Forty classically trained singers and forty untrained normal speakers performed an all-voiced reading task and produced a sample of a sustained spoken vowel /a/. The singers were also requested to perform a sustained sung vowel /a/ at a comfortable pitch. The reading was analyzed by the long-term average spectrum (LTAS) method. The sustained vowels were analyzed through power spectrum analysis. The data suggest that singers show more energy concentration in the singer's formant/speaker's ring region in both sung and spoken vowels. The singers' spoken vowel energy in the speaker's ring area was found to be significantly larger than that of the untrained speakers. The LTAS showed similar findings suggesting that those differences also occur in continuous speech. This finding supports the value of further research on the effect of singing training on the resonance of the speaking voice.  相似文献   

20.
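An analysis of read-aloud Mandarin Chinese news discourse shows that when a clause is placed within a discourse segment, the pitch and duration that mark its stress change. Compared with isolated clauses, the pitch change in segment-embedded clauses is concentrated on the clause nucleus and takes the form of an overall lowering of the high pitch points, with the degree of lowering differing across types of stress. Within a segment, clause stresses that are not the segment stress are noticeably weakened, showing lowered pitch and shortened syllable duration. In segments composed of several clauses, speakers can use variation in the strength of the clause stresses to adjust the prosody of the segment, and thereby achieve overall control of discourse prosody and fluent expression of meaning. Research on segment stress and clause stress brings experimental phonetics into the teaching of broadcast speech and also benefits prosody control in synthesized Chinese speech.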
