Similar literature (20 results)
1.
The categorization of voice into quality type (i.e., normal, breathy, hoarse, rough) is often a traditional part of the voice diagnostic. The goal of this study was to assess the contributions of various time and spectral-based acoustic measures to the categorization of voice type for a diverse sample of voices collected from both functionally dysphonic (breathy, hoarse, and rough) (n=83) and normal women (n=51). Before acoustic analyses, 12 judges rated all voice samples for voice quality type. Discriminant analysis, using the modal rating of voice type as the dependent variable, produced a 5-variable model (comprising time and spectral-based measures) that correctly classified voice type with 79.9% accuracy (74.6% classification accuracy on cross-validation). Voice type classification was achieved based on two significant discriminant functions, interpreted as reflecting measures related to "Phonatory Instability" and "F0 Characteristics." A cepstrum-based measure (CPP/EXP ratio) consistently emerged as a significant factor in predicting voice type; however, variables such as shimmer (RMS dB) and a measure of low- vs. high-frequency spectral energy (the Discrete Fourier Transformation ratio) also added substantially to the accurate profiling and prediction of voice type. The results are interpreted and discussed with respect to the key acoustic characteristics that contributed to the identification of specific voice types, and the value of identifying a subset of time and spectral-based acoustic measures that appear sensitive to a perceptually diverse set of dysphonic voices.

2.
The purpose of the present study was to compare the speech performance of four types of alaryngeal phonation, namely electrolaryngeal (EL), pneumatic artificial laryngeal (PA), tracheoesophageal (TE), and standard esophageal (SE) speech, by adult Cantonese-speaking laryngectomees. Subjective ratings of (1) voice quality, (2) articulation proficiency, (3) quietness of speech, (4) pitch variability, and (5) overall speech intelligibility were given by eight naive individuals who had no prior experience with any form of alaryngeal speech. Results indicated that SE and TE speech was perceived to be more hoarse than PA and EL speech. EL speech was associated with significantly less pitch variability, and PA speakers produced speech with the least amount of perceived noise. However, articulation proficiency and overall speech intelligibility were found to be comparable in all four types of alaryngeal speakers.

3.
The aim of this study was to investigate the acoustic and electroglottographic characteristics of patients with mutational dysphonia before and after voice therapy. The clinical records of 15 patients with mutational dysphonia were reviewed, and their voice recordings were analyzed with the help of the Lx Speech Studio program (Laryngograph Ltd, London, UK). After voice therapy combined with the manual compression method, the subjects' voices lowered in pitch and improved in quality. In addition, we classified the mutational dysphonia into four categories according to the presence of diplophonia and closed quotients. The most common type among the categories was characterized by a bimodal distribution of fundamental frequency (diplophonia), accompanied by a low closed quotient (falsetto voice) at high frequencies. However, the results also showed that mutational dysphonia cannot be generalized as always having a falsetto voice, as shown in other types. The effect of therapy was different for each type, and those cases with both diplophonia and a non-trained falsetto voice could be treated more readily. Consequently, the diplophonia and closed quotient, which were easily analyzed using the Lx Speech Studio program, are important factors in the classification of mutational dysphonia. Identification of these characteristics may affect treatment choices, facilitate monitoring of the efficacy of therapy, and aid in estimating prognosis.

4.
To quantify several acoustic features of the voice in patients with essential tremor (ET), 28 patients and 28 age- and sex-matched controls were studied. ET severity was assessed with the rating scale for tremor of Fahn, Tolosa, and Marín. The Computerized Speech Lab 4300 program (Kay Elemetrics) was used. Two-second samples of a sustained /a/ and a sentence were captured with a microphone and laryngograph equipment. Measures included fundamental frequency (F0), frequency perturbation (jitter, Koike algorithm), intensity perturbation (shimmer, Horii algorithm), and harmonic-to-noise ratio (H/N, Yumoto algorithm) of the vowel /a/, and the frequency and intensity variability of the sentence, phonational range, and dynamic range at the natural frequency, maximum phonational time, and s/z ratio. All subjects underwent indirect laryngoscopy and/or laryngeal fibroscopy. When compared with controls, ET patients showed higher jitter and lower H/N ratio (the latter only with the laryngographic signal) for the vowel /a/, lower frequency variability in the microphone signal, lower intensity variability in the laryngographic signal of the sentence, and significantly lower dynamic range at natural frequency of phonation. ET patients reported higher frequency of the presence of high voice intensity, tremor, and struggle. Several acoustic parameters were influenced by the severity of the disease, including shimmer, jitter, H/N ratio, frequency variability of the sentence, and s/z ratio, although neither the acoustic analysis values nor the phonetometric measurements were affected by the presence of voice tremor or by a successful pharmacological treatment of ET.

5.
The purpose of the current study was to assess the anatomic and functional correlates of voice quality in tracheoesophageal speech, with dynamic imaging studies of the neoglottis. Videofluoroscopy (providing a lateral view), digital high-speed endoscopy (providing a "bird's-eye" view), and their relationships with perceptual evaluations of voice quality were investigated. Several significant relationships were found. Imaging with videofluoroscopy revealed that the following anatomic and functional parameters (established during phonation) are related to voice quality: presence of a neoglottic bar, regurgitation of barium, tonicity of the neoglottis, and minimal neoglottic distance. Furthermore, the index of the increase of the maximal subneoglottic distance from rest to phonation also showed a significant relationship with voice quality. Imaging with digital high-speed endoscopy revealed features relevant to voice quality, including amount of saliva, visibility of the origin of the neoglottis, shape of the neoglottis, and regularity of the vibration. Knowledge of the anatomic and functional correlates of tracheoesophageal voice quality provides prerequisite information for future (phono-) surgical and/or clinical improvements to the voice quality of postlaryngectomy (prosthetic) voice production.

6.
This work proposes a method to reconstruct an acoustic speech signal solely from a stream of mel-frequency cepstral coefficients (MFCCs) as may be encountered in a distributed speech recognition (DSR) system. Previous methods for speech reconstruction have required, in addition to the MFCC vectors, fundamental frequency and voicing components. In this work the voicing classification and fundamental frequency are predicted from the MFCC vectors themselves using two maximum a posteriori (MAP) methods. The first method enables fundamental frequency prediction by modeling the joint density of MFCCs and fundamental frequency using a single Gaussian mixture model (GMM). The second scheme uses a set of hidden Markov models (HMMs) to link together a set of state-dependent GMMs, which enables a more localized modeling of the joint density of MFCCs and fundamental frequency. Experimental results on speaker-independent male and female speech show that accurate voicing classification and fundamental frequency prediction are attained when compared to hand-corrected reference fundamental frequency measurements. The use of the predicted fundamental frequency and voicing for speech reconstruction is shown to give very similar speech quality to that obtained using the reference fundamental frequency and voicing.

7.
This study investigated the relation of symptoms of vocal fatigue to acoustic variables reflecting type of voice production and the effects of vocal loading. Seventy-nine female primary school teachers volunteered as subjects. Before and after a working day, (1) a 1-minute text reading sample was recorded at habitual loudness and loudly (as in a large classroom), (2) a prolonged phonation on [a:] was recorded at habitual speaking pitch and loudness, and (3) a questionnaire about voice quality, ease or difficulty of phonation, and tiredness of throat was completed. The samples were analyzed for average fundamental frequency (F0), sound pressure level (SPL), and phonation type reflecting alpha ratio (SPL [1-5 kHz] minus SPL [50 Hz-1 kHz]). The vowel samples were additionally analyzed for perturbation (jitter and shimmer). After a working day, F0, SPL, and alpha ratio were higher, jitter and shimmer values were lower, and more tiredness of throat was reported. The average levels of the acoustic parameters did not correlate with the symptoms. Increase in jitter and mean F0 in loud reading correlated with tiredness of throat. The results seem to suggest that, at least among experienced vocal professionals, voice production type had little relevance from the point of view of vocal fatigue reported. Differences in the acoustic parameters after a vocally loading working day mainly seem to reflect increased muscle activity as a consequence of vocal loading.
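The alpha ratio in the abstract above is a difference of two band levels, SPL in 1-5 kHz minus SPL in 50 Hz-1 kHz. A minimal sketch of that arithmetic follows, assuming a power spectrum is already available as parallel lists of bin frequencies and bin powers. The function names `band_level_db` and `alpha_ratio_db` are hypothetical and are not taken from the study. Because the result is a difference of levels, the dB reference cancels, so uncalibrated power values suffice for this illustration.

```python
import math


def band_level_db(freqs_hz, powers, lo_hz, hi_hz):
    """Level in dB of the summed power of all bins with lo_hz <= f < hi_hz."""
    total = sum(p for f, p in zip(freqs_hz, powers) if lo_hz <= f < hi_hz)
    if total <= 0:
        raise ValueError("no power found in the requested band")
    return 10.0 * math.log10(total)


def alpha_ratio_db(freqs_hz, powers):
    """Alpha ratio as described in the abstract: level(1-5 kHz) - level(50 Hz-1 kHz)."""
    high = band_level_db(freqs_hz, powers, 1000.0, 5000.0)
    low = band_level_db(freqs_hz, powers, 50.0, 1000.0)
    return high - low


# Toy spectrum (assumed values): most energy lies below 1 kHz.
freqs = [100.0, 500.0, 2000.0, 4000.0]
powers = [1.0, 1.0, 0.1, 0.1]
print(round(alpha_ratio_db(freqs, powers), 2))
```

A higher (less negative) value means relatively more energy above 1 kHz, which is the direction of change the abstract reports after a working day.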

8.
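Hu Han (胡涵), Gu Wentao (顾文涛). Acta Acustica (《声学学报》), 2022, 47(2): 276-286
Individual attachment style can be defined along two dimensions, attachment avoidance and attachment anxiety, and divided into four attachment types according to whether the value on each dimension is high or low. To explore the influence of attachment style on the phonetic features of intimate speech, we recruited 12 young heterosexual couples and measured each person's attachment avoidance and anxiety scores with the Experiences in Close Relationships scale. Using a semi-open dating script, participants were induced to produce target sentences in an intimate tone, and then read the same target sentences on their own as neutral speech. Based on 9 prosodic and voice parameters...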

9.
Accuracy of acoustic voice analysis is influenced by the quality of recording. Lately, articles have suggested that soundcards perform equivalently to specialized professional-grade data acquisition (DA) systems. The purpose of this study was to investigate the influence of DA environment (DA system and microphone) on acoustic voice quality measurement (VQM) while balancing for gender, age, intersubject and intrasubject variability, and analysis software. More specifically, the relative performance of different hardware environments and the relationship between their technical characteristics and VQM performance was investigated. The discretization error and the effective dynamic range of the different DA environments were measured. We used 3 software systems to record and measure separately 2000 acoustic samples of sustained phonation for fundamental frequency, jitter, and shimmer. Analyses of variance (ANOVA) were performed with these parameters as the dependent variables. The results of the study suggested that professional-grade DA hardware is strongly recommended to provide accurate and valid voice assessment. The fundamental frequency measurement differences across DA environments were highly correlated to the discretization error (r=1.00), whereas jitter and shimmer were highly correlated to the effective dynamic range of the DA environments (r=-0.68 and r=-0.86, respectively).

10.
The purpose was to determine the clinical value of a multiparametric objective voice evaluation protocol including acoustic and aerodynamic parameters measured mainly on a sustained /a/. This was done by comparison with perceptual analysis of continuous speech by a jury composed of 6 experienced listeners. Voice samples (continuous speech) from 63 male patients with dysphonia and 21 control subjects with normal voices were recorded and assessed by a jury of listeners. The jury was instructed to classify voice samples according to the G (overall dysphonia) component of the GRBAS score on a 4-point scale ranging from 0 for normal to 3 for severe dysphonia. Objective parameters were recorded on an EVA® workstation. As usual with this type of system, parameters were measured mainly on a sustained /a/. Measured parameters included fundamental frequency (F0), intensity, jitter, shimmer, signal-to-noise ratio, Lyapunov coefficient (LC), oral airflow (OAF), maximum phonatory time (MPT), and vocal range (range). Estimated subglottic pressure (ESGP) was determined on a series of /pa/. Discriminant analysis was performed to detect correlation between jury classification and combinations of parameters. Results showed that a nonlinear combination of only six parameters (range, LC, ESGP, MPT, signal-to-noise ratio, and F0) allowed 86% concordance with jury classification. Discussion deals with the relative importance of the different objective parameters for discriminant analysis. Special emphasis is placed on two measurements rarely made in routine clinical workup, i.e., estimated subglottic pressure and Lyapunov coefficient.

11.
12.
The purpose of this study was (1) to determine the relationship between acoustic measures and auditory-perceptual dimensions of overall voice severity and pleasantness and (2) to evaluate the ability of acoustic and auditory-perceptual measures to discriminate normal from dysphonic voices. Thirty adult dysphonic speakers and six age-matched normal control speakers were asked to provide oral reading samples of the Rainbow Passage. Acoustic analysis of the speech samples was used to identify abnormal phonatory events associated with dysphonia. The acoustic program calculated long-term average spectral measures, glottal noise measures, and those measures based on linear prediction (LP) modeling. Twelve adult listeners judged overall voice severity and pleasantness from the connected speech samples using direct magnitude estimation (DME) procedures. The acoustic measures accounted for 48% of overall voice severity and 40% of voice pleasantness for dysphonic speakers. The classification performance of the acoustic measures and auditory-perceptual measures was quantified using logistic regression analysis. When acoustic measures or auditory-perceptual measures were considered in isolation, classification was generally accurate and similar across measures. Classification accuracy improved to 100% when acoustic and auditory-perceptual measures were combined. These data provide further support for use of both auditory-perceptual evaluation and acoustic analyses for classifying and evaluating dysphonia.

13.
The aim of this comparative, controlled, cross-sectional study is to evaluate the voice quality in patients with multiple sclerosis (MS) by subjective and objective methods. Female patients with MS (n=27) and age- and sex-matched healthy controls (n=27) were included in this study. Vocal functions were evaluated by a multidimensional set composed of videolaryngostroboscopic examination, acoustic analysis, and subjective measurements (GRBAS and "Voice Handicap Index"). Jitter percent, shimmer percent, and soft phonation index (SPI) values were higher in MS patients compared to controls (Jitt, P=0.001; Shim, P=0.033; SPI P<0.0001). Maximum phonation time was significantly shorter for MS patients compared to controls (P<0.0001). Stroboscopic examination revealed that 16 out of 27 MS patients have a "posterior chink" as glottic closure pattern with higher SPI values (40%). Noise to harmonic ratio (NHR) and mean fundamental frequency (F0) values were similar for MS and control groups (NHR, P=0.737; F0, P=0.976). In this study, most of the MS patients had dysphonia due to weakness of voice. MS tends to worsen acoustic parameters including fundamental frequency, SPI, and jitter values. These results are consistent with the more asthenic voice quality observed in MS group.

14.
SUMMARY: In recent years, the multiparametric approach for evaluating perceptual rating of voice quality has been advocated. This study evaluates the accuracy of predicting perceived overall severity of voice quality with a minimal set of aerodynamic, voice range profile (phonetogram), and acoustic perturbation measures. One hundred and twelve dysphonic persons (93 women and 19 men) with laryngeal pathologies and 41 normal controls (35 women and six men) with normal voices participated in this study. Perceptual severity judgement was carried out by four listeners rating the G (overall grade) parameter of the GRBAS scale. The minimal set of instrumental measures was selected based on the ability of the measure to discriminate between dysphonic and normal voices, and to attain at least a moderate correlation with perceived overall severity. Results indicated that perceived overall severity was best described by maximum phonation time of sustained /a/, peak intraoral pressure of the consonant-vowel /pi/ strings production, voice range profile area, and acoustic jitter. Direct-entry discriminant function analysis revealed that these four voice measures in combination correctly predicted 67.3% of perceived overall severity levels.

15.
A pitch-synchronous analysis of hoarseness in running speech
A method of pitch-synchronous acoustic analysis of hoarseness requiring a voice sample of only four fundamental periods is presented. This method calculates a noise-to-signal (N/S) ratio, which indicates the depth of valleys between harmonic peaks in the power spectrum. The spectrum is calculated pitch synchronously from a Fourier transform of the signal, windowed through a continuously variable Hanning window spanning exactly four fundamental periods. A two-stage procedure is used to determine the exact duration of the four fundamental periods. An initial estimate is obtained using autocorrelation in the time domain. A more precise estimate is obtained in the frequency domain by minimizing the errors between the preliminary calculated power spectrum and the predicted spectrum spread of a windowed harmonic signal. Analysis of synthesized voices showed that the N/S ratio is sensitive to additive noise, jitter, and shimmer, and is insensitive to slow (8 Hz) modulation in fundamental frequency and amplitude. An analysis of pre- and postoperative voices of six patients with benign laryngeal disease showed that the N/S ratio for vowel /u/ in running speech consistently improved after surgery for all subjects, in agreement with their successful therapeutic results.
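The first stage described above (an initial period estimate from time-domain autocorrelation, followed by a Hanning window spanning four periods) can be sketched roughly as follows. This is an illustration only, not the authors' implementation: it works in whole samples, whereas the abstract describes a continuously variable window; the frequency-domain refinement and the N/S ratio itself are not reproduced; and the names `estimate_period`, `hann_window`, and `four_period_frame` are hypothetical.

```python
import math


def estimate_period(signal, min_lag, max_lag):
    """Return the lag (in samples) with the highest normalized autocorrelation."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        n = len(signal) - lag
        num = sum(signal[i] * signal[i + lag] for i in range(n))
        energy_a = sum(signal[i] ** 2 for i in range(n))
        energy_b = sum(signal[i + lag] ** 2 for i in range(n))
        den = math.sqrt(energy_a * energy_b)
        score = num / den if den > 0 else 0.0
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def hann_window(length):
    """Periodic Hann (Hanning) window of the given length in samples."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def four_period_frame(signal, start, period):
    """Cut four periods starting at `start` and apply the Hann window."""
    length = 4 * period
    window = hann_window(length)
    return [signal[start + n] * window[n] for n in range(length)]


# Synthetic test tone (assumed): period of 50 samples plus a second harmonic.
tone = [math.sin(2 * math.pi * n / 50) + 0.5 * math.sin(2 * math.pi * n / 25)
        for n in range(400)]
period = estimate_period(tone, 20, 80)
frame = four_period_frame(tone, 0, period)
print(period, len(frame))
```

The lag search range (20 to 80 samples here) is an assumption; in practice it would be derived from the sampling rate and the expected F0 range of the speaker.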

16.
This study was designed to examine the relationship between the Voice Handicap Index (VHI) and acoustic measures of voice samples common in clinical practice. Fifty participants, 38 women and 12 men, ranging in age from 19 to 80 years, with a mean age of 49 years, served as participants. Of these 50 participants, 17 participants could be included in the acoustic analysis of voice based on measures of error calculated with the TF32 software. All participants completed the VHI and provided voice samples including three trials of the sustained vowel /A/ at a comfortable loudness level as well as a connected speech sample consisting of the Zoo Passage. Acoustic measures were made with TF32 and Cool Edit software and included fundamental frequency, jitter %, shimmer %, signal-to-noise ratio, mean root-mean-square intensity, fundamental frequency standard deviation, aphonic periods, and breath groups. Results indicate that these measures were not predictive of overall VHI score, and no cohesive or predictable pattern was identified when comparing individual measures with overall VHI or with each subscale item. Likely contributions to this lack of correlation and subsequent clinical implications are discussed, as well as the direction for further research.

17.
This study searched for perceptual, acoustic, and physiological correlates of support in singing. Seven trained professional singers (four women and three men) sang repetitions of the syllable [pa:] at varying pitch and sound levels (1) habitually (with support) and (2) simulating singing without support. Estimate of subglottic pressure was obtained from oral pressure during [p]. Vocal fold vibration was registered with dual-channel electroglottography. Acoustic analyses were made on the recorded samples. All samples were also evaluated by the singers and other listeners, who were trained singers, singing students, and voice specialists without singing education (a total of 63 listeners). They rated both the overall voice quality and the amount of support. According to the results, it seemed impossible to observe any auditory differences between supported singing and good singing voice quality. The acoustic and physiological correlates of good voice quality in absolute values seem to be gender and task dependent, whereas the relative optimum seems to be reached at intermediate parameter values.

18.
Voice quality variations include a set of voicing sound source modifications ranging from laryngealized to normal to breathy phonation. Analysis of reiterant imitations of two sentences by ten female and six male talkers has shown that the potential acoustic cues to this type of voice quality variation include: (1) increases to the relative amplitude of the fundamental frequency component as open quotient increases; (2) increases to the amount of aspiration noise that replaces higher frequency harmonics as the arytenoids become more separated; (3) increases to lower formant bandwidths; and (4) introduction of extra pole zeros in the vocal-tract transfer function associated with tracheal coupling. Perceptual validation of the relative importance of these cues for signaling a breathy voice quality has been accomplished using a new voicing source model for synthesis of more natural male and female voices. The new formant synthesizer, KLSYN88, is fully documented here. Results of the perception study indicate that, contrary to previous research which emphasizes the importance of increased amplitude of the fundamental component, aspiration noise is perceptually most important. Without its presence, increases to the fundamental component may induce the sensation of nasality in a high-pitched voice. Further results of the acoustic analysis include the observations that: (1) over the course of a sentence, the acoustic manifestations of breathiness vary considerably, tending to increase for unstressed syllables, in utterance-final syllables, and at the margins of voiceless consonants; (2) on average, females are more breathy than males, but there are very large differences between subjects within each gender; (3) many utterances appear to end in a "breathy-laryngealized" type of vibration; and (4) diplophonic irregularities in the timing of glottal periods occur frequently, especially at the end of an utterance. Diplophonia and other deviations from perfect periodicity may be important aspects of naturalness in synthesis.

19.
Harmonics-to-Noise Ratio: An Index of Vocal Aging
Distinguishing between vocal changes that occur with normal aging and those that are associated with disease is an important goal of research in voice. Several acoustic measures have been used in an attempt to illuminate the integrity of the vocal mechanism, including harmonics-to-noise ratio (HNR), jitter, and fundamental frequency (F0). HNR is a measure that quantifies the amount of additive noise in the voice signal; jitter reflects the periodicity of vocal fold vibration. In this study, measures of HNR, jitter and F0 were used to compare vocal function in three groups of normally speaking women: young adults, middle-aged adults, and elderly adults. Significant differences in HNR emerged between the elderly women and the other two groups. F0 differences were also apparent between the elderly group and the two younger groups; there were no significant differences in jitter between the three groups. HNR was found to be a more sensitive index of vocal function than jitter. The significant lowering of HNR evident in the elderly speakers may be attributable in part to medications taken by the majority of these elderly subjects.
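Since the abstract above describes jitter as a measure of the periodicity of vocal fold vibration, a short sketch may help make the quantity concrete. It uses one common definition ("local" jitter: the mean absolute difference between consecutive glottal periods, divided by the mean period); the abstract does not say which jitter variant the study used, so this is an assumption, and the function name `local_jitter_percent` is hypothetical.

```python
def local_jitter_percent(periods):
    """Local jitter in percent.

    Mean absolute difference between consecutive periods, divided by the
    mean period. `periods` is a sequence of glottal period durations in any
    consistent unit (e.g. milliseconds).
    """
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods) / len(periods)
    return 100.0 * mean_diff / mean_period


# Assumed example values: periods alternating between 5.0 and 5.1 ms.
print(round(local_jitter_percent([5.0, 5.1, 5.0, 5.1]), 2))
```

A perfectly periodic sequence gives 0%, and larger cycle-to-cycle variation gives a larger value. Extracting the period sequence from a recording is a separate step that is not shown here.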

20.
Down syndrome (DS) is the most frequent chromosomal disorder. Commonly, individuals with DS have difficulties with speech and show an unusual quality in the voice. Their phenotypic characteristics include general hypotonia and maxillary hypoplasia with relative macroglossia, and these contribute to particular acoustic alterations. Subjective perceptual and acoustic assessments of the voice (Praat-4.1 software) were performed in 66 children with DS, 36 boys and 30 girls, aged 3 to 8 years. These data were compared with those of an age-matched group of children from the general population. Perceptual evaluations showed significant differences in the group of children with DS. The voice of children with DS presented a lower fundamental frequency (F0) with elevated dispersion. The conjunction of frequencies for formants (F1 and F2) revealed a decreased distinction between the vowels, reflecting the loss of articulatory processing. The DS vocalic anatomical functional ratio represents the main distinctive parameter between the two groups studied, and it may be useful in conducting assessments.
