Similar articles
 Found 20 similar articles (search time: 15 ms)
1.
In order to investigate control of voice fundamental frequency (F0) in speaking and singing, 24 adults had to utter the nonsense word ['ta:tatas] repeatedly, while in selected trials their auditory feedback was frequency-shifted by 100 cents downwards. In the speaking condition the target speech rate and prosodic pattern were indicated by a rhythmic sequence made of white noise. In the singing condition the sequence consisted of piano notes, and subjects were instructed to match the pitch of the notes. In both conditions a response in voice F0 begins with a latency of about 150 ms. As predicted, response magnitude is greater in the singing condition (66 cents) than in the speaking condition (47 cents). Furthermore the singing condition seems to prolong the after-effect which is a continuation of the response in trials after the frequency shift. In the singing condition, response magnitude and the ability to match the target F0 correlate significantly. Results support the view that in speaking voice F0 is monitored mainly supra-segmentally and controlled less tightly than in singing.
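The 100-cent shift and the response magnitudes above are given on a logarithmic frequency scale in which 1200 cents make one octave. The following Python sketch shows the conversion; the 200 Hz starting frequency and the function names are illustrative assumptions, not values or tools from the study.

```python
import math


def shift_by_cents(f0_hz: float, cents: float) -> float:
    """Return f0_hz shifted by `cents` (100 cents = 1 equal-tempered semitone)."""
    return f0_hz * 2.0 ** (cents / 1200.0)


def cents_between(f_ref_hz: float, f_hz: float) -> float:
    """Return the interval from f_ref_hz to f_hz in cents."""
    return 1200.0 * math.log2(f_hz / f_ref_hz)


# Hypothetical voice at 200 Hz whose feedback is shifted 100 cents downwards.
shifted_hz = shift_by_cents(200.0, -100.0)
```

On this scale a downward shift of 100 cents lowers a 200 Hz voice by a little more than 11 Hz, so the reported responses of 47 and 66 cents correspond to only a few hertz at typical speaking pitches.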

2.
Differences in speaking frequency and intensity across tonal dialects have not been widely investigated. The purposes of this study were (1) to compare the speaking frequency and speaking intensity ranges of Mandarin and Min and (2) to compare the speaking frequency and intensity ranges of Mandarin and Min to those of American English. The subjects were 80 normal Taiwanese adults divided into two dialect groups, Mandarin and Min. The speaking F0, the highest speaking F0, the lowest speaking F0, the maximum range of speaking F0, and the intensity counterparts were obtained from reading in their native dialects. Statistical analysis revealed that Min speakers had a significantly greater maximum range of speaking intensity and a smaller lowest speaking intensity than Mandarin speakers, which indicated tonal effects by speakers of the Min dialect. Moreover, Mandarin and Min speakers had a greater maximum range of speaking F0 and maximum range of speaking intensity than American English speakers. The data may provide an assessment tool for Mandarin speakers and Min speakers.

3.
Nineteen trained soprano singers aged 18–30 years vocalized tasks designed to assess average speaking fundamental frequency (SFF) during spontaneous speaking and reading. Vocal range and perceptual characteristics while singing with low intensity and high frequency were also assessed, and subjects completed a survey of vocal habits/symptoms. Recorded signals were digitized prior to being analyzed for SFF using the Kay Computerized Speech Lab program. Subjects were assigned to a normal voice or impaired voice group based on ratings of perceptual tasks and survey results. Data analysis showed group differences in mean SFF, no differences in vocal range, higher mean SFF values for reading than speaking, and 58% ability to perceive speaking in low pitch. The role of speaking in too low pitch as causal for vocal symptoms and need for voice classification differentiation in vocal performance studies are discussed.

4.
A novel method based on a statistical model for fundamental-frequency (F0) synthesis in Mandarin text-to-speech is proposed. Specifically, a statistical model is employed to determine the relationship between F0 contour patterns of syllables and linguistic features representing the context. Parameters of the model were empirically estimated from a large training set of sentential utterances. Phonologic rules are then automatically deduced through the training process and implicitly memorized in the model. In the synthesis process, contextual features are extracted from a given input text, and the best estimates of the F0 contour patterns of syllables are then found by a Viterbi algorithm using the well-trained model. This method can be regarded as employing a stochastic grammar to reduce the number of candidate F0 contour patterns at each decision point of synthesis. Although linguistic features on various levels of input text can be incorporated into the model, only some relevant contextual features extracted from neighboring syllables were used in this study. Performance of this method was examined by simulation using a database composed of nine repetitions of 112 declarative sentential utterances of the same text, all spoken by a single speaker. By closely examining the well-trained model, some evidence was found to show that the declination effect as well as several sandhi rules are implicitly contained in the model. Experimental results show that 77.56% of synthesized F0 contours coincide with the VQ-quantized counterpart of the original natural speech. Naturalness of the synthesized speech was confirmed by an informal listening test.
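The Viterbi step mentioned above can be pictured with a generic dynamic-programming search over a small set of candidate pattern indices. This is a minimal sketch, not the authors' model: the function name, the log-score tables, and the two-pattern toy example are all hypothetical.

```python
def viterbi(obs_scores, trans_scores, init_scores):
    """Return the highest-scoring sequence of pattern indices.

    obs_scores[t][k]   : log score of pattern k at syllable t (hypothetical)
    trans_scores[j][k] : log score of pattern k following pattern j
    init_scores[k]     : log score of starting with pattern k
    """
    n_states = len(init_scores)
    best = [init_scores[k] + obs_scores[0][k] for k in range(n_states)]
    back = []  # back[t-1][k] = best predecessor of pattern k at syllable t
    for t in range(1, len(obs_scores)):
        new_best, pointers = [], []
        for k in range(n_states):
            j = max(range(n_states), key=lambda j: best[j] + trans_scores[j][k])
            pointers.append(j)
            new_best.append(best[j] + trans_scores[j][k] + obs_scores[t][k])
        best = new_best
        back.append(pointers)
    # Trace the best path backwards from the best final pattern.
    state = max(range(n_states), key=lambda k: best[k])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy example: three syllables, two candidate patterns.
observation = [[0.0, -5.0], [-5.0, 0.0], [0.0, -5.0]]
uniform_transition = [[-1.0, -1.0], [-1.0, -1.0]]
path = viterbi(observation, uniform_transition, [0.0, 0.0])
```

With uniform transition scores the search simply follows the per-syllable scores; making pattern changes expensive in `trans_scores` lets context override them, which is the sense in which the search prunes candidates at each decision point.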

5.
A model for fundamental frequency (F0, or commonly pitch) employing a functional principal component (FPC) analysis framework is presented. The model is applied to Mandarin Chinese; this Sino-Tibetan language is rich in pitch-related information as the relative pitch curve is specified for most syllables in the lexicon. The approach yields a quantification of the influence carried by each identified component in relation to original tonal content, without formulating any assumptions on the shape of the tonal components. The original five-speaker corpus is preprocessed using a locally weighted least squares smoother to produce F0 curves. These smoothed curves are then utilized as input for the computation of FPC scores and their corresponding eigenfunctions. These scores are analyzed in a series of penalized mixed effect models, through which meaningful categorical prototypes are built. The prototypes appear to confirm known tonal characteristics of the language, as well as suggest the presence of a sinusoid tonal component that was previously undocumented.

6.
Three experiments were conducted to examine the effects of trial-to-trial variations in speaking style, fundamental frequency, and speaking rate on identification of spoken words. In addition, the experiments investigated whether any effects of stimulus variability would be modulated by phonetic confusability (i.e., lexical difficulty). In Experiment 1, trial-to-trial variations in speaking style reduced the overall identification performance compared with conditions containing no speaking-style variability. In addition, the effects of variability were greater for phonetically confusable words than for phonetically distinct words. In Experiment 2, variations in fundamental frequency were found to have no significant effects on spoken word identification and did not interact with lexical difficulty. In Experiment 3, two different methods for varying speaking rate were found to have equivalent negative effects on spoken word recognition and similar interactions with lexical difficulty. Overall, the findings are consistent with a phonetic-relevance hypothesis, in which accommodating sources of acoustic-phonetic variability that affect phonetically relevant properties of speech signals can impair spoken word identification. In contrast, variability in parameters of the speech signal that do not affect phonetically relevant properties is not expected to affect overall identification performance. Implications of these findings for the nature and development of lexical representations are discussed.

7.
This paper presents a systematic comparison of various measures of f0 range in female speakers of English and German. F0 range was analyzed along two dimensions, level (i.e., overall f0 height) and span (extent of f0 modulation within a given speech sample). These were examined using two types of measures, one based on "long-term distributional" (LTD) methods, and the other based on specific landmarks in speech that are linguistic in nature ("linguistic" measures). The various methods were used to identify whether and on what basis or bases speakers of these two languages differ in f0 range. Findings yielded significant cross-language differences in both dimensions of f0 range, but effect sizes were found to be larger for span than for level, and for linguistic than for LTD measures. The linguistic measures also uncovered some differences between the two languages in how f0 range varies through an intonation contour. This helps shed light on the relation between intonational structure and f0 range.
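The two dimensions of f0 range can be illustrated with a simple long-term distributional summary: level as the median f0 and span as the distance between the lowest and highest f0 values in semitones. These particular definitions and the f0 values below are assumptions made for illustration; the measures actually compared in the paper may differ.

```python
import math
import statistics


def f0_level_hz(f0_values):
    """Level: overall f0 height, summarized here as the median (an assumed choice)."""
    return statistics.median(f0_values)


def f0_span_semitones(f0_values):
    """Span: extent of f0 modulation, here max-to-min distance in semitones."""
    return 12.0 * math.log2(max(f0_values) / min(f0_values))


# Hypothetical f0 samples (Hz) from one speech sample.
sample_hz = [150.0, 200.0, 300.0]
level = f0_level_hz(sample_hz)
span = f0_span_semitones(sample_hz)
```

Expressing span in semitones rather than hertz keeps it comparable across speakers with different levels, since doubling f0 always counts as 12 semitones.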

8.
The need for standardization of procedures in approaches to voice measurement has been recently emphasized. The purpose of this study was to determine the extent to which the acoustic perturbation measurements from three different analysis systems agree when standardized recording and analysis procedures are used. High-quality acoustic voice recordings from 20 patients were analyzed. The results showed that, although fundamental frequency measurements were in strong agreement among the three systems tested, frequency and amplitude perturbation measurements were not in agreement. The underlying approaches to perturbation measurement appeared to be sufficiently different to produce different results. An argument is made for a standardized set of acoustic signals representing normal, dysphonic, and synthesized voices with known characteristics to facilitate testing of new acoustic analysis systems and confirm measurement accuracy and sensitivity.
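Frequency and amplitude perturbation are often summarized as cycle-to-cycle variation relative to the mean. The sketch below uses one common definition (mean absolute difference between consecutive cycles divided by the mean value); it is an assumption that any of the three systems in the study computes perturbation this way, and the cycle values are hypothetical.

```python
def relative_perturbation(values):
    """Mean absolute difference between consecutive cycles, divided by the mean value.

    Applied to glottal periods this gives a local jitter ratio;
    applied to cycle peak amplitudes it gives a local shimmer ratio.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return mean_diff / mean_value


# Hypothetical cycle-by-cycle measurements.
periods_s = [0.004, 0.006]        # two glottal periods in seconds
amplitudes = [1.0, 1.0, 1.0]      # perfectly steady peak amplitudes
jitter = relative_perturbation(periods_s)
shimmer = relative_perturbation(amplitudes)
```

Because the result depends entirely on how each system marks cycle boundaries and peaks, two systems can agree on mean F0 yet disagree on these ratios, which is consistent with the pattern the abstract reports.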

9.
The purpose of this paper was to evaluate the reliability of, and agreement among, six speech analysis systems in the determination of fundamental frequency. Five male and five female speakers provided oral reading and sustained vowel samples for analysis. Each sample was analyzed five times by each system. The results indicated high reliability for all of the systems for both sexes and both utterance types. Agreement among the systems was high for the male sustained vowels and the female oral reading samples. In contrast, poor agreement occurred among the systems for the male oral reading samples and the female sustained vowels. The findings indicate that the output of these automatic methods tends to be consistent over repeated trials within the systems in their extraction of fundamental frequency; however, agreement among these systems varies.

10.
Measured in this study was the ability of eight hearing and five deaf subjects to identify the stress pattern in a short sentence from the variation in voice fundamental frequency (F0), when presented aurally (for hearing subjects) and when transformed into vibrotactile pulse frequency. Various transformations from F0 to pulse frequency were tested in an attempt to determine an optimum transformation, the amount of F0 information that could be transmitted, and what the limitations in the tactile channel might be. The results indicated that a one- or two-octave reduction of F0 vibrotactile frequency (transmitting every second or third glottal pulse) might result in a significant ability to discriminate the intonation patterns associated with moderate-to-strong patterns of sentence stress in English. However, accurate reception of the details of the intonation pattern may require a slower than normal pronunciation because of an apparent temporal indeterminacy of about 200 ms in the perception of variations in vibrotactile frequency. A performance deficit noted for the two prelingually, profoundly deaf subjects with marginally discriminable encodings offers some support for our previous hypothesis that there is a natural association between auditory pitch and perceived vibrotactile frequency.

11.
Thresholds (F0DLs) were measured for discrimination of the fundamental frequency (F0) of a group of harmonics (group B) embedded in harmonics with a fixed F0. Miyazono and Moore [(2009). Acoust. Sci. & Tech. 30, 383-386] found a large training effect for tones with high harmonics in group B, when the harmonics were added in cosine phase. It is shown here that this effect was due to use of a cue related to pitch pulse asynchrony (PPA). When PPA cues were disrupted by introducing a temporal offset between the envelope peaks of the harmonics in group B and the remaining harmonics, F0DLs increased markedly. Perceptual learning was examined using a training stimulus with cosine-phase harmonics, F0 = 50 Hz, and high harmonics in group B, under conditions where PPA was not useful. Learning occurred, and it transferred to other cosine-phase tones, but not to random-phase tones. A similar experiment with F0 = 100 Hz showed a learning effect which transferred to a cosine-phase tone with mainly high unresolved harmonics, but not to cosine-phase tones with low harmonics, and not to random-phase tones. The learning found here appears to be specific to tones for which F0 discrimination is based on distinct peaks in the temporal envelope.
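The contrast between cosine-phase and random-phase tones can be made concrete by synthesizing one period of each and comparing how peaky the waveforms are. This is a rough sketch under stated assumptions: the harmonic numbers (1 to 10), the 16 kHz sample rate, the random seed, and the use of crest factor as a peakiness measure are illustrative choices, not the stimuli or analysis of the study.

```python
import math
import random


def harmonic_complex(f0_hz, harmonics, phases, sample_rate_hz=16000):
    """One period of a sum of equal-amplitude cosines at the given harmonic numbers."""
    n_samples = round(sample_rate_hz / f0_hz)
    return [
        sum(math.cos(2.0 * math.pi * h * f0_hz * n / sample_rate_hz + p)
            for h, p in zip(harmonics, phases))
        for n in range(n_samples)
    ]


def crest_factor(signal):
    """Peak magnitude divided by RMS; larger values mean a peakier waveform."""
    peak = max(abs(x) for x in signal)
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return peak / rms


harmonics = list(range(1, 11))            # hypothetical choice of components
cosine_phase = harmonic_complex(100.0, harmonics, [0.0] * len(harmonics))

rng = random.Random(0)                    # fixed seed so the example is repeatable
random_phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in harmonics]
random_phase = harmonic_complex(100.0, harmonics, random_phases)
```

In cosine phase all components peak together once per period, so the waveform has one sharp pulse per cycle; with random phases the same components add up to a flatter waveform with the same RMS, which removes the distinct temporal peaks.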

12.
The aim of this paper is to answer the question whether "perception-action" dissociation, which is well documented in vision, may also be found in auditory information processing. Trained singers were asked to produce vowel sounds into a microphone. The sound that each singer produced was fed back to their ears via headphones. Two seconds after the sound production had begun, the auditory feedback was shifted in pitch by a certain degree (9, 19, 50, or 99 cents in either direction). In every set of sounds, instances without any pitch shifts also appeared. After each trial, participants reported whether they were aware of a pitch change or not. It was found that even though the participants were unaware of subtle pitch changes, the fundamental frequency of their vowel production was found to shift slightly in the opposite direction to the pitch shift. These results show that auditory information is processed by two separate systems: one for perception and one for action. They also show that the function of the auditory control system differs from the visual control system. The latter is used to control bodily movements while the function of the former is a nonconscious, instant control of vocalization.

13.
A series of experiments was carried out to investigate how fundamental frequency declination is perceived by speakers of English. Using linear predictor coded speech, nonsense sentences were constructed in which fundamental frequency on the last stressed syllable had been systematically varied. Listeners were asked to judge which stressed syllable was higher in pitch. Their judgments were found to reflect normalization for expected declination; in general, when two stressed syllables sounded equal in pitch, the second was actually lower. The pattern of normalization reflected certain major features of production patterns: A greater correction for declination was made for wide pitch range stimuli than for narrow pitch range stimuli. The slope of expected declination was less for longer stimuli than for shorter ones. Lastly, amplitude was found to have a significant effect on judgments, suggesting that the amplitude downdrift which normally accompanies fundamental frequency declination may have an important role in the perception of phrasing.

14.
Intrinsic fundamental frequency of vowels in sentence context   Cited by: 1 (self-citations: 0; citations by others: 1)
High vowels have a higher intrinsic fundamental frequency (F0) than low vowels. This phenomenon has been verified in several languages. However, most studies of intrinsic F0 of vowels have used words either in isolation or bearing the main phrasal stress in a carrier sentence. As a first step towards an understanding of how the intrinsic F0 of vowels interacts with intonation in running speech, this study examined F0 of the vowels [i,a,u] in four sentence positions. The four speakers used for this study showed a statistically significant main effect of intrinsic F0 (high vowels had higher F0). Three of the four speakers also showed an interaction between intrinsic F0 and sentence position such that no significant F0 difference was observed in the unaccented, sentence-final position. The interaction was shown not to be due to vowel neutralization or correlated with changes in the glottal waveform shape, as evidenced by measures of the first formant frequency and spectral slope. Comparison with studies of tone languages and speech of the deaf suggests that both the lack of accent and the lower F0 caused the reduction in the intrinsic F0 difference.

15.
The dependency of the timbre of musical sounds on their fundamental frequency (F0) was examined in three experiments. In experiment I subjects compared the timbres of stimuli produced by a set of 12 musical instruments with equal F0, duration, and loudness. There were three sessions, each at a different F0. In experiment II the same stimuli were rearranged in pairs, each with the same difference in F0, and subjects had to ignore the constant difference in pitch. In experiment III, instruments were paired both with and without an F0 difference within the same session, and subjects had to ignore the variable differences in pitch. Experiment I yielded dissimilarity matrices that were similar at different F0's, suggesting that instruments kept their relative positions within timbre space. Experiment II found that subjects were able to ignore the salient pitch difference while rating timbre dissimilarity. Dissimilarity matrices were symmetrical, suggesting further that the absolute displacement of the set of instruments within timbre space was small. Experiment III extended this result to the case where the pitch difference varied from trial to trial. Multidimensional scaling (MDS) of dissimilarity scores produced solutions (timbre spaces) that varied little across conditions and experiments. MDS solutions were used to test the validity of signal-based predictors of timbre, and in particular their stability as a function of F0. Taken together, the results suggest that timbre differences are perceived independently from differences of pitch, at least for F0 differences smaller than an octave. Timbre differences can be measured between stimuli with different F0's.

16.
Linguistic modality effects on fundamental frequency in speech   Cited by: 2 (self-citations: 0; citations by others: 2)
This paper examines the effects on fundamental frequency (F0) patterns of modality operators, such as sentential adverbs, modals, negatives, and quantifiers. These words form inherently contrastive classes which have varying tendencies to produce emphasis deviations in F0 contours. Three speakers read a set of 186 sentences and three paragraphs to provide data for F0 analysis. The important words in each sentence were marked intonationally with rises or sharp falls in F0, compared to gradually falling F0 in unemphasized words. These emphasis deviations were measured in terms of F0 variations from the norm; they were larger toward the beginning of sentences, in longer sentences, on syllables surrounded by unemphasized syllables, and in contrastive contexts. Other results showed that embedded clauses tended to have lower F0, and negative contractions were emphasized on their first syllables. Individual speakers differed in overall F0 levels, while using roughly similar emphasis strategies. F0 levels changed in paragraphs, with emphasis going to contextually new information.

17.
Hydrogenic (two-body) systems are the only atomic systems for which uncertainties in calculations of the energy levels approach the current state of the art in frequency measurement. This article discusses progress in the theory and measurement of transition frequencies in hydrogenic systems. These studies have relevance to the determination of fundamental constants and the testing of physical theories, especially quantum electrodynamics. A set of high accuracy calculable frequency standards could also be realized by using hydrogenic systems.

18.
Georgia Dacakis. Journal of Voice, 2000, 14(4): 549-556
Fundamental frequency for 10 male-to-female transsexuals at long-term follow-up (FUP0) was compared to fundamental frequency at initial consultation (IF0) and at discharge from treatment (DF0). Fundamental frequency (F0) values for the three occasions were significantly different [F(2,18) = 24.79, P < .0001]. Group mean fundamental frequencies were 125.5 Hz at initial consultation, 168.1 Hz at discharge, and 146.5 Hz at follow-up. There was a moderate but nonsignificant correlation [r(8) = 0.474, P > .05 ns] between the number of intervention sessions and mean F0 achieved by subjects at discharge (DF0). There was a significant correlation between the number of treatment sessions and the maintenance of F0 increases [r(8) = 0.745, P < .05], although the size of the correlation was more modest (r = 0.476) when the data from one subject who had received 90 treatment sessions were removed.

19.
This paper formalizes and tests two key assumptions of the concept of suprasegmental timing: segmental independence and suprasegmental mediation. Segmental independence holds that the duration of a suprasegmental unit such as a syllable or foot is only minimally dependent on its segments. Suprasegmental mediation states that the duration of a segment is determined by the duration of its suprasegmental unit and its identity, but not directly by the specific prosodic context responsible for suprasegmental unit duration. Both assumptions are made by various versions of the isochrony hypothesis [I. Lehiste, J. Phonetics 5, 253-263 (1977)], and by the syllable timing hypothesis [W. Campbell, Speech Commun. 9, 57-62 (1990)]. The validity of these assumptions was studied using the syllable as suprasegmental unit in American English and Mandarin Chinese. To avoid unnatural timing patterns that might be induced when reading carrier phrase material, meaningful, nonrepetitive sentences were used with a wide range of lengths. Segmental independence was tested by measuring how the average duration of a syllable in a fixed prosodic context depends on its segmental composition. A strong association was found; in many cases the increase in average syllabic duration when one segment was substituted for another (e.g., bin versus pin) was the same as the difference in average duration between the two segments (i.e., [b] versus [p]). Thus, the [i] and [n] were not compressed to make room for the longer [p], which is inconsistent with segmental independence. Syllabic mediation was tested by measuring which locations in a syllable are most strongly affected by various contextual factors, including phrasal position, within-word position, tone, and lexical stress. Systematic differences were found between these factors in terms of the intrasyllabic locus of maximal effect. These and earlier results obtained by van Son and van Santen [R. J. J. H. van Son and J. P. H. van Santen, "Modeling the interaction between factors affecting consonant duration," Proceedings Eurospeech-97, 1997, pp. 319-322] showing a three-way interaction between consonantal identity (coronals vs labials), within-word position of the syllable, and stress of surrounding vowels, imply that segmental duration cannot be predicted by compressing or elongating segments to fit into a predetermined syllabic time interval. In conclusion, while there is little doubt that suprasegmental units play important predictive and explanatory roles as phonological units, the concept of suprasegmental timing is less promising.

20.
Better place-coding of the fundamental frequency in cochlear implants   Cited by: 1 (self-citations: 0; citations by others: 1)
In current cochlear implant systems, the fundamental frequency F0 of a complex sound is encoded by temporal fluctuations in the envelope of the electrical signals presented on the electrodes. In normal hearing, the lower harmonics of a complex sound are resolved, in contrast with a cochlear implant system. In the present study, it is investigated whether "place-coding" of the first harmonic improves the ability of an implantee to discriminate complex sounds with different fundamental frequencies. Therefore, a new filter bank was constructed, for which the first harmonic is always resolved in two adjacent filters, and the balance between both filter outputs is directly related to the frequency of the first harmonic. The new filter bank was compared with a filter bank that is typically used in clinical processors, both with and without the presence of temporal cues in the stimuli. Four users of the LAURA cochlear implant participated in a pitch discrimination task to determine detection thresholds for F0 differences. The results show that these thresholds decrease noticeably for the new filter bank, if no temporal cues are present in the stimuli. If temporal cues are included, the differences between the results for both filter banks become smaller, but a clear advantage is still observed for the new filter bank. This demonstrates the feasibility of using place-coding for the fundamental frequency.
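The idea that the balance between two adjacent filter outputs tracks the first harmonic can be sketched with a pair of overlapping triangular filters whose gains cross-fade linearly between their center frequencies. The filter shape, the center frequencies, and the function names are assumptions made for illustration; the abstract does not specify how the actual filter bank was designed.

```python
def adjacent_filter_gains(freq_hz, low_center_hz, high_center_hz):
    """Gains of two overlapping triangular filters for a tone between their centers.

    Assumed design: the low filter's gain falls linearly from 1 to 0 while the
    high filter's gain rises from 0 to 1 across the same frequency interval.
    """
    if not low_center_hz <= freq_hz <= high_center_hz:
        raise ValueError("frequency must lie between the two filter centers")
    position = (freq_hz - low_center_hz) / (high_center_hz - low_center_hz)
    return 1.0 - position, position


def frequency_from_balance(gain_low, gain_high, low_center_hz, high_center_hz):
    """Recover the tone frequency from the balance between the two filter outputs."""
    balance = gain_high / (gain_low + gain_high)
    return low_center_hz + balance * (high_center_hz - low_center_hz)


# Hypothetical first harmonic at 150 Hz between filters centered at 100 and 200 Hz.
gain_low, gain_high = adjacent_filter_gains(150.0, 100.0, 200.0)
recovered_hz = frequency_from_balance(gain_low, gain_high, 100.0, 200.0)
```

Under this assumed design, a small change in the first harmonic shifts stimulation gradually from one electrode toward its neighbor, so the frequency is carried by place rather than by envelope fluctuations.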
