Similar literature (20 results)
1.
This study was designed to develop a database for the electroglottographic measurement of fundamental frequency (Fo) in normal subjects in running speech, for reference in the diagnosis and follow-up of dysphonic patients. A prospective pilot study included 20 healthy male volunteers without laryngeal disorder. Electroglottographic recordings of speaking Fo during connected speech (French) were obtained from two texts with different prosodic content. Fo histograms were sensitive to the variation of speaking Fo between the two texts. Graphic representations of the range and distribution of the speaker's Fo were designed as normalized Fo histograms with plot lines at the 5th and 95th percentiles. Variability of the Fo histograms was less than 5% once more than 15 subjects had been recorded. This pilot study designed a graphic display of standardized electroglottographic Fo measurements during the physiological condition of connected speech. As the degree of Fo variability depends on the phonetic contents of the text and on the language spoken, a separate histogram for normal subjects needs to be developed in each country or at least for each voice laboratory, with a standard, previously chosen text.
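
The normalized Fo histogram with 5th and 95th percentile lines described in this abstract can be illustrated roughly as follows. This is only a minimal sketch and not the authors' procedure: the 10 Hz bin width, the convention that unvoiced frames are coded as 0, the linear-interpolation percentile, and the function names are all assumptions made for illustration.

```python
import math


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("no voiced Fo samples")
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return sorted_values[lo] * (1.0 - frac) + sorted_values[hi] * frac


def normalized_f0_histogram(f0_hz, bin_width_hz=10.0):
    """Return (bins, p5, p95).

    bins maps each bin's lower edge (Hz) to the fraction of voiced samples
    falling in that bin, so the fractions sum to 1 (assumed normalization).
    """
    # Assumption: unvoiced frames are coded as 0 and are discarded.
    voiced = sorted(f for f in f0_hz if f > 0)
    p5 = percentile(voiced, 5)
    p95 = percentile(voiced, 95)
    counts = {}
    for f in voiced:
        edge = math.floor(f / bin_width_hz) * bin_width_hz
        counts[edge] = counts.get(edge, 0) + 1
    total = len(voiced)
    bins = {edge: n / total for edge, n in sorted(counts.items())}
    return bins, p5, p95
```

Normalizing by the number of voiced samples lets recordings of different lengths be compared on the same vertical scale, which is presumably the point of a reference histogram.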

2.
The present study attempted to investigate the acoustic characteristics of Mandarin laryngeal and esophageal speech. Eight normal laryngeal and seven esophageal speakers participated in the acoustic experiments. Results from acoustic analyses of the syllables /ma/ and /ba/ indicated that F0, intensity, and signal-to-noise ratio of laryngeal speech were significantly higher than those of esophageal speech. However, opposite results were found for vowel duration, jitter, and shimmer. Mean F0, intensity, and words per minute in reading were greater, but the number of pauses was smaller, in laryngeal speech than in esophageal speech. Similar patterns of F0 contours and vowel duration as a function of tone were found between laryngeal and esophageal speakers. Long-time spectra analysis indicated that higher first and second formant frequencies were associated with esophageal speech than with normal laryngeal speech.

3.
The purpose of this research was to obtain information on the mean fundamental frequency (Fo) levels of Portuguese speakers and to investigate (1) whether differences exist between two groups with different voice qualities (normal and dysphonia), genders, ages, and speech tasks; and (2) which variables contribute to the variance. To this end, 109 subjects (52 dysphonics and 57 controls) participated in the study. Speech material included three sustained vowels ([u], [i], and [a]), a standard written passage, and a conversation produced at a comfortable conversational pitch and loudness level. Electroglottographic (EGG) data were obtained. Although the results show that, overall, the dysphonics had lower Fo than did controls for all speaking conditions, and that there were differences according to age, these contrasts were not statistically significant. Gender and speech task effects were statistically significant. Additionally, the mean Fo variance is explained by a high prediction model (between 59% and 73%).

4.
This study examined speech breathing patterns during reading by women with bilateral vocal fold nodules judged as mildly dysphonic and by women without vocal nodules. Although it might be predictable that the speech breathing patterns of individuals with laryngeal dysfunction will differ from those without laryngeal dysfunction, there is a lack of empirical data to support such assumptions. The results of the current study indicated that glottal airflow was greater during reading for the women with vocal nodules and that a larger volume of air was expended both per syllable and per breath group during reading. The rate of speech did not significantly differ between the two groups of women. There was no significant difference for the average duration of the breath groups and no significant difference for the number of syllables spoken per breath group. Additionally, both groups of women demonstrated a similar pattern of inspiratory pause location during the reading. The results suggest that speech breathing patterns associated with dysphonia be examined independently to distinguish specifically the nature of the interaction between the laryngeal dysfunction and the speech breathing pattern. Certainly, more information on how the severity of a voice disorder influences speech breathing is necessary.

5.
Previous studies have demonstrated that motor control of segmental features of speech relies to some extent on sensory feedback. Control of voice fundamental frequency (F0) has been shown to be modulated by perturbations in voice pitch feedback during various phonatory tasks and in Mandarin speech. The present study was designed to determine if voice F0 is modulated in a task-dependent manner during production of suprasegmental features of English speech. English speakers received pitch-modulated voice feedback (+/-50, 100, and 200 cents, 200 ms duration) during a sustained vowel task and a speech task. Response magnitudes during speech (mean 31.5 cents) were larger than during the vowels (mean 21.6 cents), response magnitudes increased as a function of stimulus magnitude during speech but not vowels, and responses to downward pitch-shift stimuli were larger than those to upward stimuli. Response latencies were shorter in speech (mean 122 ms) compared to vowels (mean 154 ms). These findings support previous research suggesting the audio-vocal system is involved in the control of suprasegmental features of English speech by correcting for errors between voice pitch feedback and the desired F0.
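
The stimulus and response magnitudes in this abstract are given in cents, a logarithmic frequency unit in which 1200 cents equal one octave (100 cents equal one equal-tempered semitone). The short sketch below shows the standard conversion in both directions; the function names are hypothetical and the abstract does not describe how the authors computed these values.

```python
import math


def cents_between(f_hz, ref_hz):
    """Interval from ref_hz up to f_hz in cents (negative if f_hz is lower)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def shift_by_cents(f_hz, cents):
    """Frequency obtained by shifting f_hz by the given number of cents."""
    return f_hz * 2.0 ** (cents / 1200.0)
```

For instance, `shift_by_cents(f0, -200)` gives the frequency of a feedback signal shifted 200 cents downward from a produced F0, and `cents_between(response_f0, baseline_f0)` expresses a response relative to a baseline.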

6.
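This paper investigates a strategy for modeling the control mechanism that generates the F0 contour of speech signals. Based on several assumptions about the dynamic behavior of the vocal folds, it proposes a quantifiable physical model that simplifies the complex laryngeal control of F0, and from this model derives a control-mechanism model function that generates local F0 rise-fall patterns. Driving commands defined by the model parameters produce two basic types of rise-fall pattern; on a logarithmic scale, the algebraic sum of the patterns produced by mutually independent driving commands approximates the local features of a given F0 contour. Analysis-by-synthesis of Mandarin Chinese F0 contours shows that the model function not only fits the local variation of a given F0 contour with high accuracy, but that its main parameters also correlate well with the temporal structure of the F0 patterns. The proposed model function helps in summarizing prosodic rules and lays a solid foundation for rule-based synthesis of F0 contours.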

7.
Vocal fundamental frequency (Fo) characteristics were sampled for a group of seven young children. The children were followed longitudinally for a 12-month period, spanning preword, single-word, and multiword vocalizations. The Fo characteristics were analyzed with reference to chronological age, vocalization length, and lexicon size. Measures of average Fo and Fo variability changed little during the 12-month period for each child. A rising-falling intonation contour was the most prevalent Fo contour among the children. In general, the influence of vocalization length and language acquisition on measures of Fo was negligible. It is suggested that relative uniformity in vocal Fo exists in early vocalizations across preword and meaningful speech periods.

8.
Covariation in the size of laryngeal and vocal tract structures leads to a moderate correlation between fundamental frequency (F0) and formant frequencies (FFs) in natural speech. A method-of-adjustment procedure was used to test whether listeners prefer combinations of F0 and FFs that reflect this covariation. Vowel sequences spoken by two men and two women were processed by the STRAIGHT vocoder to construct three sets of frequency-shifted continua. The distributions of "best choice" responses in all three experiments confirm that listeners prefer coordinated patterns of F0 and FFs similar to those of natural speech.

9.
This study presents an approach to visualizing intensity regulation in speech. The method expresses a voice sample in a two-dimensional space using amplitude-domain values extracted from the glottal flow estimated by inverse filtering. The two-dimensional presentation is obtained by expressing a time-domain measure of the glottal pulse, the amplitude quotient (AQ), as a function of the negative peak amplitude of the flow derivative (d(peak)). The regulation of vocal intensity was analyzed with the proposed method from voices varying from extremely soft to very loud with an SPL range of approximately 55 dB. When vocal intensity was increased, the speech samples first showed a rapidly decreasing trend as expressed on the proposed AQ-d(peak) graph. When intensity was further raised, the location of the samples converged toward a horizontal line, the asymptote of a hypothetical hyperbola. This behavior of the AQ-d(peak) graph indicates that the intensity regulation strategy changes from laryngeal to respiratory mechanisms, and the method chosen makes it possible to quantify how control mechanisms underlying the regulation of vocal intensity change gradually between the two means. The proposed presentation constitutes an easy-to-implement method to visualize the function of voice production in intensity regulation because the only information needed is the glottal flow waveform estimated by inverse filtering the acoustic speech pressure signal.
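
One point on an AQ-d(peak) graph could be computed along the following lines. The abstract does not give the formula, so this sketch assumes the commonly cited definition of the amplitude quotient as the peak-to-peak flow amplitude divided by the magnitude of the negative peak of the flow derivative. It also assumes that the input is a single glottal flow pulse already estimated by inverse filtering, and it approximates the derivative with a first difference. The function name is hypothetical.

```python
def amplitude_quotient(flow, sample_rate_hz):
    """Return (aq_seconds, d_peak) for one estimated glottal flow pulse.

    flow: flow samples for a single glottal cycle (arbitrary flow units).
    d_peak: magnitude of the negative peak of the flow derivative
            (flow units per second), from a first-difference approximation.
    aq_seconds: assumed definition, peak-to-peak flow amplitude / d_peak.
    """
    if len(flow) < 2:
        raise ValueError("need at least two flow samples")
    derivative = [(b - a) * sample_rate_hz for a, b in zip(flow, flow[1:])]
    d_peak = -min(derivative)
    if d_peak <= 0:
        raise ValueError("flow never decreases; no closing phase found")
    f_ac = max(flow) - min(flow)
    return f_ac / d_peak, d_peak
```

Plotting the returned `aq_seconds` against `d_peak` for samples produced at different loudness levels would give the kind of two-dimensional display the abstract describes.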

10.
This study investigates cross-speaker differences in the factors that predict voicing thresholds during abduction-adduction gestures in six normal women. Measures of baseline airflow, pulse amplitude, subglottal pressure, and fundamental frequency were made at voicing offset and onset during intervocalic /h/, produced in varying vowel environments and at different loudness levels, and subjected to relational analyses to determine which factors were most strongly related to the timing of voicing cessation or initiation. The data indicate that (a) all speakers showed differences between voicing offsets and onsets, but the degree of this effect varied across speakers; (b) loudness and vowel environment have speaker-specific effects on the likelihood of devoicing during /h/; and (c) baseline flow measures significantly predicted times of voicing offset and onset in all participants, but other variables contributing to voice timing differed across speakers. Overall, the results suggest that individual speakers have unique methods of achieving phonatory goals during running speech. These data contribute to the literature on individual differences in laryngeal function, and serve as a means of evaluating how well laryngeal models can reproduce the range of voicing behavior used by speakers during running speech tasks.

11.
The goals of the present study were to measure acoustic temporal modulation transfer functions (TMTFs) in cochlear implant listeners and examine the relationship between modulation detection and speech recognition abilities. The effects of automatic gain control, presentation level and number of channels on modulation detection thresholds (MDTs) were examined using the listeners' clinical sound processor. The general form of the TMTF was low-pass, consistent with previous studies. The operation of automatic gain control had no effect on MDTs when the stimuli were presented at 65 dBA. MDTs were not dependent on the presentation levels (ranging from 50 to 75 dBA) nor on the number of channels. Significant correlations were found between MDTs and speech recognition scores. The rates of decay of the TMTFs were predictive of speech recognition abilities. Spectral-ripple discrimination was evaluated to examine the relationship between temporal and spectral envelope sensitivities. No correlations were found between the two measures, and 56% of the variance in speech recognition was predicted jointly by the two tasks. The present study suggests that temporal modulation detection measured with the sound processor can serve as a useful measure of the ability of clinical sound processing strategies to deliver clinically pertinent temporal information.

12.
Strained, strangled, and tremulous vocal qualities that are typically seen in adductor spasmodic dysphonia (ADSD), voice tremor (Tremor), and the spastic dysarthria of amyotrophic lateral sclerosis (ALS) may sound similar and be difficult to differentiate. The purpose of this study was to determine if these vocal qualities of neurologic origin could be differentiated on the basis of acoustic and motor speech parameters. Three groups of subjects (ADSD, ALS, and Tremor) were analyzed by the Motor Speech Profile System (Kay Elemetrics, Lincoln Park, NJ) for fundamental frequency (Fo), standard deviation of Fo, diadochokinetic rate (ddk), standard deviation of ddk, mean intensity and standard deviation of ddk, frequency and amplitude variability in connected speech, and speaking rate in connected speech. Profiles of the three groups are presented with the significant features that differentiated one from the other.

13.
In measuring the effect of subglottal pressure changes on fundamental frequency (Fo) of phonation, the effects of changing laryngeal muscle activity must be eliminated. Several investigators have used a strategy in which pulsatile increases of subglottal pressure are induced by pushing on the chest or abdomen of a phonating subject. Fundamental frequency is then correlated with subglottal pressure changes during an interval before laryngeal response is assumed to occur. The present study was undertaken to repeat such an experiment while monitoring electromyographic (EMG) activity of some laryngeal muscles, to discover empirically the latency of the laryngeal response. The results showed a consistent response to each push, with a latency of about 30 ms. Despite this response, analyses of fundamental frequency versus subglottal pressure changes during the interval of constant EMG activity were in general agreement with previously published values. With respect to the nature of the electromyographic response itself, its timing was found to be within the range of latencies appropriate for peripheral feedback, and was also similar to that for an acoustically or tactually elicited startle reflex.

14.
Numerous clinical findings indicate that viscosity of laryngeal mucosa is a crucial factor in glottal performance. Experience using experimental test benches has shown the importance of humidifying the air stream used to induce vibration in excised larynges. Nevertheless, there is a lack of knowledge, particularly regarding the physicochemical properties of laryngeal mucus. The purpose of this study was to research vocal fold vibration in excised larynges using artificial mucus of precisely known viscosity. Eight freshly harvested porcine larynges were examined. Parameters measured were Fo and vocal fold contact time. Measurements were performed under three conditions: basal (no fluid application on vocal cord surface), after application of a fluid of 60 cP viscosity (Visc60), and after application of a fluid of 100 cP viscosity (Visc100). Electroglottographic measurements were performed at two different times for each condition: 1 s after airflow onset (T1) and 6 s after airflow onset (T2). Statistical analysis consisted of comparing data obtained under each condition at T1 and T2. The results showed a significant decrease in Fo after application of Visc60 and Visc100 fluids and a decrease in Fo at T2. Closure time was significantly higher under Visc60 conditions and under Visc100 conditions than under basal conditions. Application of artificial mucus to the mucosa of the vocal folds lowered vibratory frequency and prolonged the contact phase. Our interpretation of these data is that the presence of mucus on the surface of the vocal folds generated superficial tension and caused adhesion, which is a source of nonlinearity in vocal vibration.

15.
Due to the limited number of cochlear implantees speaking Mandarin Chinese, it is extremely difficult to evaluate new speech coding algorithms designed for tonal languages. Access to an intelligibility index that could reliably predict the intelligibility of vocoded (and non-vocoded) Mandarin Chinese is a viable solution to address this challenge. The speech-transmission index (STI) and coherence-based intelligibility measures, among others, have been examined extensively for predicting the intelligibility of English speech but have not been evaluated for vocoded or wideband (non-vocoded) Mandarin speech despite the perceptual differences between the two languages. The results indicated that the coherence-based measures seem to be influenced by the characteristics of the spoken language. The highest correlation (r = 0.91-0.97) was obtained in Mandarin Chinese with a weighted coherence measure that included primarily information from high-intensity voiced segments (e.g., vowels) containing F0 information, known to be important for lexical tone recognition. In contrast, in English, highest correlation was obtained with a coherence measure that included information from weak consonants and vowel/consonant transitions. A band-importance function was proposed that captured information about the amplitude envelope contour. A higher modulation rate (100 Hz) was found necessary for the STI-based measures for maximum correlation (r = 0.94-0.96) with vocoded Mandarin and English recognition.

16.
The abduction quotient, a measure of effective glottal width, was obtained for electroglottographic recordings from a professional operatic baritone singer. The subject produced repeated tokens of the voice qualities breathy, normal, and pressed (or constricted) in both a speech and a singing manner. In the singing manner, the subject produced the three vocal qualities at three pitch levels and three loudness levels. The abduction quotient decreased from breathy to pressed voice, suggesting that the measure corresponds to effective glottal width. The measure was found to be consistently low during all conditions of singing, suggesting that the subject produced all singing tokens with relatively strong laryngeal adduction at the vocal process level. Although the results of this study support the validity and usefulness of the abduction quotient, further verification is needed.

17.
In normal speech, coordinated activities of intrinsic laryngeal muscles suspend the glottal sound during utterance of voiceless consonants, automatically realizing a voicing control. In electrolaryngeal speech, however, the lack of voicing control is one of the causes of unclear voice, with voiceless consonants tending to be misheard as the corresponding voiced consonants. In the present work, we developed an intra-oral vibrator with an intra-oral pressure sensor that detected utterance of voiceless phonemes during intra-oral electrolaryngeal speech, and demonstrated that an intra-oral pressure-based voicing control could improve the intelligibility of the speech. The test voices were obtained from one electrolaryngeal speaker and one normal speaker. We first used speech analysis software to investigate how voice onset time (VOT) and first formant (F1) transition of the test consonant-vowel syllables contributed to voiceless/voiced contrasts, and developed an adequate voicing control strategy. We then compared the intelligibility of consonant-vowel syllables between intra-oral electrolaryngeal speech with and without online voicing control. The increase of intra-oral pressure, typically with a peak ranging from 10 to 50 gf/cm², could reliably identify utterance of voiceless consonants. The speech analysis and intelligibility test then demonstrated that a short VOT caused the misidentification of the voiced consonants due to a clear F1 transition. Finally, taking these results together, the online voicing control, which suspended the prosthetic tone while the intra-oral pressure exceeded 2.5 gf/cm² and during the 35 milliseconds that followed, proved effective in improving the voiceless/voiced contrast.
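
The control rule stated at the end of this abstract (suspend the tone while the intra-oral pressure exceeds 2.5 gf/cm2 and for the 35 ms that follow) can be sketched as a simple threshold gate with a hold time. This is an illustrative offline version only: the uniformly sampled pressure input, the function name, and the sample-by-sample loop are assumptions, and the abstract does not describe how the authors' device implements the rule.

```python
def voicing_gate(pressure_gf_cm2, sample_rate_hz, threshold=2.5, hold_ms=35.0):
    """Return one boolean per pressure sample: True where the tone sounds.

    The tone is suspended while pressure exceeds the threshold and for
    hold_ms after the pressure has dropped back to or below it.
    """
    hold_samples = round(hold_ms * sample_rate_hz / 1000.0)
    remaining = 0  # samples of hold time still to run
    tone_on = []
    for p in pressure_gf_cm2:
        if p > threshold:
            remaining = hold_samples  # restart the hold on every high sample
            tone_on.append(False)
        elif remaining > 0:
            remaining -= 1
            tone_on.append(False)
        else:
            tone_on.append(True)
    return tone_on
```

Restarting the hold counter on every above-threshold sample means that the 35 ms is counted from the last moment the pressure was high, which matches the wording "during the 35 milliseconds that followed".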

18.
Traditional measures of dysphonia vary in their reliability and in their correlations with perceptions of grade. Measurements of cepstral peak prominence (CPP) have been shown to correlate well with perceptions of breathiness. Because it is a measure of periodicity, CPP should also predict roughness. The ability of CPP and other acoustic measures to predict overall dysphonia and the subcategories of breathiness and roughness in pathological voice samples is explored. Preoperative and postoperative speech samples from 19 patients with unilateral recurrent laryngeal nerve paralysis who underwent operative intervention were analyzed by trained listeners and by measures of smoothed CPP (CPPS), noise-to-harmonic ratio (NHR), amplitude perturbation quotient (APQ), relative average perturbation (RAP), and smoothed pitch perturbation quotient (sPPQ). The data were analyzed with bivariate Pearson correlation statistics. Grade of dysphonia and breathiness ratings correlated better with measurements of CPPS than with the other measures. CPPS from samples of connected speech (CPPS-s) best predicted overall dysphonia. None of the measures were useful in predicting roughness.

19.
Vowel prolongation is often used to evaluate disordered voice production. In light of previous findings showing that co-articulation has significant influence on laryngeal function measures, the practice of using prolonged vowels to represent a speech sample is questioned. To test whether disordered and normal voice during vowel production is generalizable to connected speech, three speaking tasks were investigated: sustained vowel prolongation, syllable repetition and reading. Statistical differences were found between these tasks for certain amplitude and time based laryngeal function measures for adult women with disordered and normal voice. However, for the specific measures which were statistically different, the actual numerical and perceptual differences may be quite small. From a clinical assessment standpoint, the choice of the speech task may not make an apparent difference in the objective evaluation of disordered voice.

20.
Existing objective speech-intelligibility measures are suitable for several types of degradation; however, it turns out that they are less appropriate in cases where noisy speech is processed by a time-frequency weighting. To this end, an extensive evaluation is presented of objective measures for intelligibility prediction of noisy speech processed with a technique called ideal time-frequency (TF) segregation. In total 17 measures are evaluated, including four advanced speech-intelligibility measures (CSII, CSTI, NSEC, DAU), the advanced speech-quality measure (PESQ), and several frame-based measures (e.g., SSNR). Furthermore, several additional measures are proposed. The study comprised a total number of 168 different TF-weightings, including unprocessed noisy speech. Out of all measures, the proposed frame-based measure MCC gave the best results (ρ = 0.93). An additional experiment shows that the well-performing measures in this study also show high correlation with the intelligibility of single-channel noise-reduced speech.
