Similar literature
1.
The purpose of this study was to determine the validity of voice pleasantness and overall voice severity ratings of dysphonic and normal speakers using direct magnitude estimation (DME) and equal-appearing interval (EAI) auditory-perceptual scaling procedures. Twelve naive listeners perceptually evaluated voice pleasantness and severity from connected speech samples produced by 24 adult dysphonic speakers and 6 normal adult speakers. A statistical comparison of the two auditory-perceptual scales yielded a linear relationship representative of a metathetic continuum for voice pleasantness. A statistical relationship that is consistent with a prothetic continuum was revealed for ratings of voice severity. These data provide support for the use of either DME or EAI scales when making auditory-perceptual judgments of pleasantness, but only DME scales when judging overall voice severity for dysphonic speakers. These results suggest further psychophysical study of perceptual dimensions of voice and speech must be undertaken in order to avoid the inappropriate and invalid use of EAI scales used in the auditory-perceptual evaluation of the normal and dysphonic voice.

2.
Spectral- and cepstral-based acoustic measures are preferable to time-based measures for accurately representing dysphonic voices during continuous speech. Although these measures show promising relationships to perceptual voice quality ratings, less is known regarding their ability to differentiate normal from dysphonic voice during continuous speech and the consistency of these measures across multiple utterances by the same speaker. The purpose of this study was to determine whether spectral moments of the long-term average spectrum (LTAS) (spectral mean, standard deviation, skewness, and kurtosis) and cepstral peak prominence measures were significantly different for speakers with and without voice disorders when assessed during continuous speech. The consistency of these measures within a speaker across utterances was also addressed. Continuous speech samples from 27 subjects without voice disorders and 27 subjects with mixed voice disorders were acoustically analyzed. In addition, voice samples were perceptually rated for overall severity. Acoustic analyses were performed on three continuous speech stimuli from a reading passage: two full sentences and one constituent phrase. Significant between-group differences were found for both cepstral measures and three LTAS measures (P < 0.001): spectral mean, skewness, and kurtosis. These five measures also showed moderate to strong correlations to overall voice severity. Furthermore, high degrees of within-speaker consistency (correlation coefficients ≥0.89) across utterances with varying length and phonemic content were evidenced for both subject groups.

3.
OBJECTIVES/HYPOTHESIS: The purpose of this study was (1) to determine whether changes in intra- and interrater reliability occur for inexperienced listeners' judgments of overall severity, roughness, and breathiness in dysphonic and normal speakers after 2 hours of listener training; and (2) to determine the acoustic bases of inexperienced listeners' judgments before and after training. STUDY DESIGN: Prospective, single group, pre- and postdesign. METHODS: Thirty adult dysphonic and six normal speaker samples were selected from a database. Samples included 21 test stimuli and 15 training stimuli of both sustained vowels and connected speech. Sixteen inexperienced listeners judged all samples for overall severity, roughness, and breathiness using visual analog scales. Each listener provided pretraining ratings at baseline. Listeners were then trained using 15 anchor voice samples and 15 training stimuli. During training, listeners were provided with definitions of rating dimensions, accuracy feedback, and anchor samples. Listeners then judged test stimuli in a posttraining session. Speaker samples also were analyzed acoustically. RESULTS: Intrarater reliability was least variable for judgments of overall severity, but improved further with training. Listener judgments of roughness and breathiness in vowels were least reliable at baseline, but they significantly improved between listeners after training. Finally, measures of cepstral peak prominence significantly predicted all voice quality judgments except roughness in vowels, which was predicted by shimmer. The acoustic bases of group perceptual judgments did not seem to change with training. CONCLUSIONS: These findings have implications for developing training programs in perceptual evaluation and mapping relationships between acoustic and perceptual characteristics of voice disorders.

4.
SUMMARY: In recent years, the multiparametric approach for evaluating perceptual rating of voice quality has been advocated. This study evaluates the accuracy of predicting perceived overall severity of voice quality with a minimal set of aerodynamic, voice range profile (phonetogram), and acoustic perturbation measures. One hundred and twelve dysphonic persons (93 women and 19 men) with laryngeal pathologies and 41 normal controls (35 women and six men) with normal voices participated in this study. Perceptual severity judgement was carried out by four listeners rating the G (overall grade) parameter of the GRBAS scale. The minimal set of instrumental measures was selected based on the ability of the measure to discriminate between dysphonic and normal voices, and to attain at least a moderate correlation with perceived overall severity. Results indicated that perceived overall severity was best described by maximum phonation time of sustained /a/, peak intraoral pressure during production of consonant-vowel /pi/ strings, voice range profile area, and acoustic jitter. Direct-entry discriminant function analysis revealed that these four voice measures in combination correctly predicted 67.3% of perceived overall severity levels.

5.
The objectives of this prospective and exploratory study are to determine: (1) naïve listener preference for gender in tracheoesophageal (TE) speech when speech severity is controlled; (2) the accuracy of identifying TE speaker gender; (3) the effects of gender identification on judgments of speech acceptability (ACC) and naturalness (NAT); and (4) the acoustic basis of ACC and NAT judgments. Six male and six female adult TE speakers were matched for speech severity. Twenty naïve listeners made auditory-perceptual judgments of speech samples in three listening sessions. First, listeners performed preference judgments using a paired comparison paradigm. Second, listeners made judgments of speaker gender, speech ACC, and NAT using rating scales. Last, listeners made ACC and NAT judgments when speaker gender was provided coincidentally. Duration, frequency, and spectral measures were performed. No significant differences were found for preference of male or female speakers. All male speakers were accurately identified, but only two of six female speakers were accurately identified. Significant interactions were found between gender and listening condition (gender known) for NAT and ACC judgments. Males were judged more natural when gender was known; female speakers were judged less natural and less acceptable when gender was known. Regression analyses revealed that judgments of female speakers were best predicted with duration measures when gender was unknown, but with spectral measures when gender was known; judgments of males were best predicted with spectral measures. Naïve listeners have difficulty identifying the gender of female TE speakers. Listeners show no preference for speaker gender, but when gender is known, female speakers are least acceptable and natural. The nature of the perceptual task may affect the acoustic basis of listener judgments.

6.
Journal of Voice, 2020, 34(5): 806.e7-806.e18
There is a high prevalence of dysphonia among professional voice users and the impact of the disordered voice on the speaker is well documented. However, there is minimal research on the impact of the disordered voice on the listener. Considering that professional voice users include teachers and air-traffic controllers, among others, it is imperative to determine the impact of a disordered voice on the listener. To address this, the objectives of the current study were to: (1) determine whether there are differences in speech intelligibility between individuals with healthy voices and those with dysphonia; (2) understand whether cognitive-perceptual strategies increase speech intelligibility for dysphonic speakers; and (3) determine the relationship between subjective voice quality ratings and speech intelligibility. Sentence stimuli were recorded from 12 speakers with dysphonia and four age- and gender-matched typical, healthy speakers and presented to 129 healthy listeners divided into one of three strategy groups (ie, control, acknowledgement, and listener strategies). Four expert raters also completed a perceptual voice assessment using the Consensus Auditory-Perceptual Evaluation of Voice for each speaker. Results indicated that dysphonic voices were significantly less intelligible than healthy voices (P < 0.001) and the use of cognitive-perceptual strategies provided to the listener did not significantly improve speech intelligibility scores (P = 0.602). Using the subjective voice quality ratings, regression analysis found that breathiness was able to predict 41% of the variance associated with number of errors (P = 0.008). Overall results of the study suggest that speakers with dysphonia demonstrate reduced speech intelligibility and that providing the listener with specific strategies may not result in improved intelligibility.

7.
The purpose of this study was (1) to determine the psychophysical character of auditory-perceptual ratings of voice pleasantness (VP) and voice acceptability (VA) for tracheoesophageal (TE) speakers using direct magnitude estimation (DME) and equal-appearing interval (EAI) scaling procedures and (2) to determine the relationship between listeners' ratings of VP and VA. Ten adult listeners judged overall VP and VA from connected speech samples produced by 20 adult male TE speakers. Although results yielded a prothetic continuum for VP and a metathetic continuum for VA, the amount of variance accounted for by a curvilinear model of VP was minimally more than that accounted for by a linear model. Results also revealed a significant relationship between VP and VA (r = 0.939). Findings from this study do not suggest any greater validity associated with VP and VA ratings obtained by the DME than the EAI method. Because of the significant relationship between these ratings and the ease of applying EAI scales, it is recommended that VA be used as a current clinical outcome measure. These data illustrate the need to identify attributes that best describe TE speech, that are measured appropriately, and that are clinically useful.

8.
Speech of patients with abductor spasmodic dysphonia (ABSD) was analyzed using acoustic analyses to determine: (1) which acoustic measures differed from controls and were independent factors representing patients' voice control difficulties, and (2) whether acoustic measures related to blinded perceptual counts of the symptom frequency in the same patients. Patients' voice onset times for voiceless consonants in speech were significantly longer than those of the controls (p = 0.015). A principal components analysis identified three factors that accounted for 95% of the variance: the first factor included sentence and word duration, frequency shifts, and aperiodic instances; the second was phonatory breaks; and the third was voice onset time. Significant relationships with perceptual counts of symptoms were found for the measures of acoustic disruptions in sentences and sentence duration. Finally, a multiple regression demonstrated that the acoustic measures related well with the perceptual counts (r² = 0.84) with word duration most highly related and none of the other measures contributing once the effect of word duration was partialed out. The results indicate that some of the voice motor control deficits, namely aperiodicity, phonatory breaks, and frequency shifts, which occur in patients with ABSD, are similar to those previously found in adductor spasmodic dysphonia. Results also indicate that acoustic measures of intermittent disruptions in speech, voice onset time, and speech duration are closely related to the perception of symptom frequency in the disorder.

9.
Traditional measures of dysphonia vary in their reliability and in their correlations with perceptions of grade. Measurements of cepstral peak prominence (CPP) have been shown to correlate well with perceptions of breathiness. Because it is a measure of periodicity, CPP should also predict roughness. The ability of CPP and other acoustic measures to predict overall dysphonia and the subcategories of breathiness and roughness in pathological voice samples is explored. Preoperative and postoperative speech samples from 19 patients with unilateral recurrent laryngeal nerve paralysis who underwent operative intervention were analyzed by trained listeners and by measures of smoothed CPP (CPPS), noise-to-harmonic ratio (NHR), amplitude perturbation quotient (APQ), relative average perturbation (RAP), and smoothed pitch perturbation quotient (sPPQ). The data were analyzed with bivariate Pearson correlation statistics. Grade of dysphonia and breathiness ratings correlated better with measurements of CPPS than with the other measures. CPPS from samples of connected speech (CPPS-s) best predicted overall dysphonia. None of the measures were useful in predicting roughness.

10.
SUMMARY: Because of the aperiodicity of many tracheoesophageal voices, acoustic analysis of the tracheoesophageal voice is less straightforward than that of the normal voice. This study presents the development and testing of an acoustic signal typing system based on visual inspection of a narrow-band spectrogram that can be used by researchers for classification of voice quality in tracheoesophageal speech. In addition to this classification system, a selection of acoustic measures [median fundamental frequency, standard deviation of fundamental frequency, jitter, percentage of voiced (%Voiced), harmonics-to-noise ratio (HNR), glottal-to-noise excitation (GNE) ratio, and band energy difference (BED)] was computed to provide more insight into the acoustic components of tracheoesophageal voice quality. For clinical relevance, relationships between the acoustic signal types and an overall judgment of the voice were investigated as well. Results showed that the four acoustic signal types form a good basis for performing more acoustic analyses and give a good impression of the overall quality of the voice.

11.
The categorization of voice into quality type (ie, normal, breathy, hoarse, rough) is often a traditional part of the voice diagnostic. The goal of this study was to assess the contributions of various time and spectral-based acoustic measures to the categorization of voice type for a diverse sample of voices collected from both functionally dysphonic (breathy, hoarse, and rough) (n=83) and normal women (n=51). Before acoustic analyses, 12 judges rated all voice samples for voice quality type. Discriminant analysis, using the modal rating of voice type as the dependent variable, produced a 5-variable model (comprising time and spectral-based measures) that correctly classified voice type with 79.9% accuracy (74.6% classification accuracy on cross-validation). Voice type classification was achieved based on two significant discriminant functions, interpreted as reflecting measures related to "Phonatory Instability" and "F0 Characteristics." A cepstrum-based measure (CPP/EXP ratio) consistently emerged as a significant factor in predicting voice type; however, variables such as shimmer (RMS dB) and a measure of low- vs. high-frequency spectral energy (the Discrete Fourier Transformation ratio) also added substantially to the accurate profiling and prediction of voice type. The results are interpreted and discussed with respect to the key acoustic characteristics that contributed to the identification of specific voice types, and the value of identifying a subset of time and spectral-based acoustic measures that appear sensitive to a perceptually diverse set of dysphonic voices.

12.
Speech range profile (SRP) is a graphical display of frequency-intensity interactions occurring during functional speech activity. Few studies have suggested the potential clinical applications of SRP. However, these studies are limited to qualitative case comparisons and vocally healthy participants. The present study aimed to examine the effects of voice disorders on speaking and maximum voice ranges in a group of vocally untrained women. It also aimed to examine whether voice limit measures derived from SRP were as sensitive as those derived from voice range profile (VRP) in distinguishing dysphonic from healthy voices. Ninety dysphonic women with laryngeal pathologies and 35 women with normal voices, who served as controls, participated in this study. Each subject recorded a VRP for her physiological vocal limits. In addition, each subject read aloud the "North Wind and the Sun" passage to record SRP. All the recordings were captured and analyzed by Soundswell's computerized real-time phonetogram Phog 1.0 (Hitech Development AB, Täby, Sweden). The SRPs and the VRPs were compared between the two groups of subjects. Univariate analysis results demonstrated that individual SRP measures were less sensitive than the corresponding VRP measures in discriminating dysphonic from normal voices. However, stepwise logistic regression analyses revealed that the combination of only two SRP measures was almost as effective as a combination of three VRP measures in predicting the presence of dysphonia (overall prediction accuracy: 93.6% for SRP vs 96.0% for VRP). These results suggest that in a busy clinic where quick voice screening results are desirable, SRP can be an acceptable alternative procedure to VRP.

13.
The aim was to determine whether a correlation exists between the Grade, Roughness, Breathiness, Asthenia, Strain (GRBAS) scale (a subjective measure of voice) and the Multi-Dimensional Voice Program (MDVP) scale (an objective measure of voice). A retrospective review of 37 voice patients (12 male/25 female) was conducted. Each voice was perceptually evaluated using the GRBAS scale by an experienced speech pathologist and acoustically analyzed using the MDVP scale. Statistical analysis using a multivariate regression model identified a significant correlation between the noise-related parameters of MDVP and the components of the GRBAS scale. Grade correlated with voice turbulence index (VTI), noise harmonic ratio (NHR), and soft phonation index (SPI). Roughness correlated with NHR only. Breathiness correlated with SPI only. Asthenia also correlated with SPI only. Of the 19 acoustic variables measured by the MDVP system, only three noise parameters significantly correlated with the GRBAS perceptual voice analysis. Perhaps "noise" is the perceived acoustical quality of the dysphonic voice. A voice quantifying measure such as a "voice index score" could be proposed using the GRBAS scoring and the three clinically relevant MDVP values following further studies.

14.
Although considerable progress has been made in the development of acoustic and physiological measures of operatic singing voice, there is still no widely accepted objective tool for the evaluation of its multidimensional features. Auditory-perceptual evaluation, therefore, remains an important evaluation method for singing pedagogues, voice scientists, and clinicians who work with opera singers. Few investigators, however, have attempted to develop standard auditory-perceptual tools for evaluation of the operatic voice. This study aimed to pilot test a new auditory-perceptual rating instrument for operatic singing voice. Nine expert teachers of operatic singing used the instrument to rate the singing voices of 21 professional opera chorus artists from a national opera company. The findings showed that the instrument has good face validity, that it can be legitimately treated as a psychometrically sound scale, and that raters can use the scale consistently, both between and within judges. This new instrument, therefore, has the potential to allow opera singers, their teachers, voice care clinicians, and researchers to evaluate the important auditory-perceptual features of operatic voice quality.

15.
Functional (nonorganic) dysphonia is often characterized by vocal instability. The purpose of the prospective study was to examine whether there is a difference in vocal instability of functional dysphonic voices compared with healthy ones, that is, whether electroglottographic perturbation values differ (1) between healthy and dysphonic voices and (2) between two subgroups of the dysphonic voices (hypertonic and hypotonic dysphonic voices). Twenty-three patients with hypertonic functional dysphonia, 9 with hypotonic functional dysphonia, and 31 healthy nonsmokers were each examined electroglottographically before (Ex 1), immediately after (Ex 2), and 1 hour after (Ex 3) voice loading. Perturbations of frequency, amplitude, quasi-open-quotient, and contact-index were calculated from the EGG signal. At all three times of examination, hypertonic dysphonic voices showed higher perturbations than healthy voices, and they had higher perturbations than hypotonic dysphonic voices before and 1 hour after voice loading. Hypotonic dysphonic voices showed higher perturbations than healthy voices only 1 hour after voice loading. Voice loading induced different reactions in dysphonic voices: Some voices showed increased perturbations, and others exhibited normal or even decreased perturbation immediately after voice loading. Examination of electroglottographic-derived perturbations immediately after voice loading seems not to be useful. Differentiation of hypertonic and hypotonic dysphonic voices was possible with an estimated sensitivity of 88.9% and a specificity of 87.0% by using the sum of the amplitude-perturbation and the quasi-open-quotient-perturbation measured before voice loading.

16.
Spectral amplitude measures are sensitive to varying degrees of vocal fold adduction in normal speakers. This study examined the applicability of harmonic amplitude differences to adductor spasmodic dysphonia (ADSD) in comparison with normal controls. Amplitudes of the first and second harmonics (H1, H2) and of harmonics affiliated with the first, second, and third formants (A1, A2, A3) were obtained from spectra of vowels and /i/ excerpted from connected speech. Results indicated that these measures could be made reliably in ADSD. With the exception of H1*-H2*, harmonic amplitude differences (H1*-A1, H1*-A2, and H1*-A3*) exhibited significant negative linear relationships (P < 0.05) with clinical judgments of overall severity. The four harmonic amplitude differences significantly differentiated between pre-BT and post-BT productions (P < 0.05). After treatment, measurements from detected significant differences between ADSD and normal controls (P < 0.05), but measurements from /i/ did not. LTAS analysis of ADSD patients' speech samples proved a good fit with harmonic amplitude difference measures. Harmonic amplitude differences also significantly correlated with perceptual judgments of breathiness and roughness (P < 0.05). These findings demonstrate high clinical applicability for harmonic amplitude differences for characterizing phonation in the speech of persons with ADSD, as well as normal speakers, and they suggest promise for future application to other voice pathologies.

17.
Manual circumlaryngeal therapy (manual laryngeal musculoskeletal tension reduction) was used to treat 25 consecutive functional dysphonia patients. Pre- and post-treatment audio recordings of connected speech and sustained vowel samples were submitted to auditory-perceptual and acoustical analysis to assess the immediate and long-term effects of a single treatment session. To complement audio recordings, subjects were interviewed in follow-up regarding the stability of treatment effects. Pre- and post-treatment comparisons demonstrated significant voice improvements. No significant differences were observed between post-treatment measures, suggesting that vocal gains were maintained. Interviews revealed 68% of subjects reported occasional partial recurrences, typically less than 4 days in duration, which resolved spontaneously. These results replicate and extend previous research suggesting the utility of manual circumlaryngeal therapy for functional voice disorders.

18.
Laryngeal aerodynamic and acoustic characteristics of African American voice production were examined from vowel samples produced by ten adult female and ten adult male speakers. The data were compared with those for a control group consisting of ten adult female and ten adult male White speakers, matched for age, height, and weight. All measures were analyzed using Cspeech 4.0. Aerodynamic measurements, extracted from a glottal airflow waveform, included maximum flow declination rate, alternating glottal airflow, minimum glottal airflow, and airflow open quotient. Acoustic measures included fundamental frequency and sound pressure level. No significant mean differences between the African American and White speakers were found, except for maximum flow declination rate. The White speakers produced significantly higher declination rates than the African American speakers. The factor of sex for the African American speakers was statistically significant for the measures of maximum flow declination rate, alternating glottal airflow, open quotient, and fundamental frequency, consistent with the functioning of the White speakers. The results suggest that during vowel production, where the vocal tract is in a fairly static position, acoustic and aerodynamic characteristics for African American and White speakers are comparable.

19.
The MPEG-1 Layer 3 audio compression scheme, commonly known as mp3, has had a great impact in recent years, as it achieves high compression rates while preserving high sound quality. Music and speech samples compressed at high bitrates are perceptually indistinguishable from the original samples, but very little was known about how compression acoustically affects the voice signal. A previous work with normal voices showed a high fidelity at high-bitrate compressions both in voice parameters and the amplitude-frequency spectrum. In the present work, dysphonic voices were tested through two studies. In the first study, spectrograms, long-term average spectra (LTAS), and fast Fourier transform (FFT) spectra of compressed and original samples of running speech were compared. In the second study, intensities, formant frequencies, formant bandwidths, and a multidimensional set of voice parameters were tested in a set of sustained phonations. Results showed that compression at high bitrates (96 and 128 kbps) preserved the relevant acoustic properties of the pathological voices. With compression at lower bitrates, fidelity decreases, introducing some important alterations. Results from both works (Gonzalez and Cervera, and the present paper) open up the possibility of using MPEG compression at high bitrates to store or transmit high-quality speech recordings without altering their acoustic properties.

20.
The need for standardization of procedures in approaches to voice measurement has been recently emphasized. The purpose of this study was to determine the extent to which the acoustic perturbation measurements from three different analysis systems agree when standardized recording and analysis procedures are used. High-quality acoustic voice recordings from 20 patients were analyzed. The results showed that, although fundamental frequency measurements were in strong agreement among the three systems tested, frequency and amplitude perturbation measurements were not in agreement. The underlying approaches to perturbation measurement appeared to be sufficiently different to produce different results. An argument is made for a standardized set of acoustic signals representing normal, dysphonic, and synthesized voices with known characteristics to facilitate testing of new acoustic analysis systems and confirm measurement accuracy and sensitivity.
