Similar literature
20 similar articles found (search time: 155 ms)
1.
The purpose of this study was to determine the validity of voice pleasantness and overall voice severity ratings of dysphonic and normal speakers using direct magnitude estimation (DME) and equal-appearing interval (EAI) auditory-perceptual scaling procedures. Twelve naive listeners perceptually evaluated voice pleasantness and severity from connected speech samples produced by 24 adult dysphonic speakers and 6 normal adult speakers. A statistical comparison of the two auditory-perceptual scales yielded a linear relationship representative of a metathetic continuum for voice pleasantness. A statistical relationship that is consistent with a prothetic continuum was revealed for ratings of voice severity. These data provide support for the use of either DME or EAI scales when making auditory-perceptual judgments of pleasantness, but only DME scales when judging overall voice severity for dysphonic speakers. These results suggest that further psychophysical study of perceptual dimensions of voice and speech must be undertaken in order to avoid the inappropriate and invalid use of EAI scales in the auditory-perceptual evaluation of the normal and dysphonic voice.

2.
The purpose of this study was (1) to determine the relationship between acoustic measures and auditory-perceptual dimensions of overall voice severity and pleasantness and (2) to evaluate the ability of acoustic and auditory-perceptual measures to discriminate normal from dysphonic voices. Thirty adult dysphonic speakers and six age-matched normal control speakers were asked to provide oral reading samples of the Rainbow Passage. Acoustic analysis of the speech samples was used to identify abnormal phonatory events associated with dysphonia. The acoustic program calculated long-term average spectral measures, glottal noise measures, and measures based on linear prediction (LP) modeling. Twelve adult listeners judged overall voice severity and pleasantness from the connected speech samples using direct magnitude estimation (DME) procedures. The acoustic measures accounted for 48% of the variance in overall voice severity and 40% of the variance in voice pleasantness for dysphonic speakers. The classification performance of the acoustic measures and auditory-perceptual measures was quantified using logistic regression analysis. When acoustic measures or auditory-perceptual measures were considered in isolation, classification was generally accurate and similar across measures. Classification accuracy improved to 100% when acoustic and auditory-perceptual measures were combined. These data provide further support for use of both auditory-perceptual evaluation and acoustic analyses for classifying and evaluating dysphonia.

3.
The purpose of the present study was to compare the speech performance of four types of alaryngeal phonation, namely electrolaryngeal (EL), pneumatic artificial laryngeal (PA), tracheoesophageal (TE), and standard esophageal (SE) speech, by adult Cantonese-speaking laryngectomees. Subjective ratings of (1) voice quality, (2) articulation proficiency, (3) quietness of speech, (4) pitch variability, and (5) overall speech intelligibility were given by eight naive individuals who had no prior experience with any form of alaryngeal speech. Results indicated that SE and TE speech was perceived to be more hoarse than PA and EL speech. EL speech was associated with significantly less pitch variability, and PA speakers produced speech with the least amount of perceived noise. However, articulation proficiency and overall speech intelligibility were found to be comparable in all four types of alaryngeal speakers.

4.
Journal of Voice, 2020, 34(5): 806.e7-806.e18
There is a high prevalence of dysphonia among professional voice users and the impact of the disordered voice on the speaker is well documented. However, there is minimal research on the impact of the disordered voice on the listener. Considering that professional voice users include teachers and air-traffic controllers, among others, it is imperative to determine the impact of a disordered voice on the listener. To address this, the objectives of the current study were to: (1) determine whether there are differences in speech intelligibility between individuals with healthy voices and those with dysphonia; (2) understand whether cognitive-perceptual strategies increase speech intelligibility for dysphonic speakers; and (3) determine the relationship between subjective voice quality ratings and speech intelligibility. Sentence stimuli were recorded from 12 speakers with dysphonia and four age- and gender-matched typical, healthy speakers and presented to 129 healthy listeners divided into one of three strategy groups (i.e., control, acknowledgement, and listener strategies). Four expert raters also completed a perceptual voice assessment using the Consensus Auditory-Perceptual Evaluation of Voice for each speaker. Results indicated that dysphonic voices were significantly less intelligible than healthy voices (P < 0.001) and the use of cognitive-perceptual strategies provided to the listener did not significantly improve speech intelligibility scores (P = 0.602). Using the subjective voice quality ratings, regression analysis found that breathiness was able to predict 41% of the variance associated with number of errors (P = 0.008). Overall results of the study suggest that speakers with dysphonia demonstrate reduced speech intelligibility and that providing the listener with specific strategies may not result in improved intelligibility.

5.
OBJECTIVES/HYPOTHESIS: The purpose of this study was (1) to determine whether changes in intra- and interrater reliability occur for inexperienced listeners' judgments of overall severity, roughness, and breathiness in dysphonic and normal speakers after 2 hours of listener training; and (2) to determine the acoustic bases of inexperienced listeners' judgments before and after training. STUDY DESIGN: Prospective, single group, pre- and postdesign. METHODS: Thirty adult dysphonic and six normal speaker samples were selected from a database. Samples included 21 test stimuli and 15 training stimuli of both sustained vowels and connected speech. Sixteen inexperienced listeners judged all samples for overall severity, roughness, and breathiness using visual analog scales. Each listener provided pretraining ratings at baseline. Listeners were then trained using 15 anchor voice samples and 15 training stimuli. During training, listeners were provided with definitions of rating dimensions, accuracy feedback, and anchor samples. Listeners then judged test stimuli in a posttraining session. Speaker samples also were analyzed acoustically. RESULTS: Intrarater reliability was least variable for judgments of overall severity, but improved further with training. Listener judgments of roughness and breathiness in vowels were least reliable at baseline, but they significantly improved between listeners after training. Finally, measures of cepstral peak prominence significantly predicted all voice quality judgments except roughness in vowels, which was predicted by shimmer. The acoustic bases of group perceptual judgments did not seem to change with training. CONCLUSIONS: These findings have implications for developing training programs in perceptual evaluation and mapping relationships between acoustic and perceptual characteristics of voice disorders.

6.
The objectives of this prospective and exploratory study are to determine: (1) naïve listener preference for gender in tracheoesophageal (TE) speech when speech severity is controlled; (2) the accuracy of identifying TE speaker gender; (3) the effects of gender identification on judgments of speech acceptability (ACC) and naturalness (NAT); and (4) the acoustic basis of ACC and NAT judgments. Six male and six female adult TE speakers were matched for speech severity. Twenty naïve listeners made auditory-perceptual judgments of speech samples in three listening sessions. First, listeners performed preference judgments using a paired comparison paradigm. Second, listeners made judgments of speaker gender, speech ACC, and NAT using rating scales. Last, listeners made ACC and NAT judgments when speaker gender was provided coincidentally. Duration, frequency, and spectral measures were performed. No significant differences were found for preference of male or female speakers. All male speakers were accurately identified, but only two of six female speakers were accurately identified. Significant interactions were found between gender and listening condition (gender known) for NAT and ACC judgments. Males were judged more natural when gender was known; female speakers were judged less natural and less acceptable when gender was known. Regression analyses revealed that judgments of female speakers were best predicted with duration measures when gender was unknown, but with spectral measures when gender was known; judgments of males were best predicted with spectral measures. Naïve listeners have difficulty identifying the gender of female TE speakers. Listeners show no preference for speaker gender, but when gender is known, female speakers are least acceptable and natural. The nature of the perceptual task may affect the acoustic basis of listener judgments.

7.
The purpose of this study was to compare oral pressure (Po), nasal airflow (Vn), and velopharyngeal (VP) orifice area estimates from 12 tracheoesophageal (TE) and 12 laryngeal speakers as they produced /p/ and /m/ in syllable series. The findings were as follows: (1) TE speakers produced greater Po than the laryngeal speakers; (2) for /p/, TE speakers generated Vn and VP orifice area estimates comparable with, or less than, those of the laryngeal speakers; and (3) for /m/, TE speakers had Vn and VP orifice area estimates greater than those of the laryngeal speakers. The elevated Po could be the result of several factors such as high source driving pressures and vocal tract volume changes postlaryngectomy. Attempts at more precise articulation, and subsequently less coarticulation, by the TE speakers may explain the Vn and VP orifice area estimates for /p/ and /m/. TE speakers may be limiting the oral-nasal cavity coupling for /p/ (smaller VP gap, less Vn) in an attempt to produce a very precise oral /p/. For /m/, TE speakers may be attempting to overtly mark the consonant as a nasal (greater Vn, larger VP gap). Further studies are needed to confirm or refute the explanations postulated here regarding the VP aerodynamic differences that were identified.

8.
9.
This paper describes two experiments aimed at exploring the relationship between objective properties of speech and perceived fluency in read and spontaneous speech. The aim is to determine whether such quantitative measures can be used to develop objective fluency tests. Fragments of read speech (Experiment 1) of 60 non-native speakers of Dutch and of spontaneous speech (Experiment 2) of another group of 57 non-native speakers of Dutch were scored for fluency by human raters and were analyzed by means of a continuous speech recognizer to calculate a number of objective measures of speech quality known to be related to perceived fluency. The results show that the objective measures investigated in this study can be employed to predict fluency ratings, but the predictive power of such measures is stronger for read speech than for spontaneous speech. Moreover, the adequacy of the variables to be employed appears to be dependent on the specific type of speech material investigated and the specific task performed by the speaker.

10.
To determine whether expert fluency ratings of read speech can be predicted on the basis of automatically calculated temporal measures of speech quality, an experiment was conducted with read speech of 20 native and 60 non-native speakers of Dutch. The speech material was scored for fluency by nine experts and was then analyzed by means of an automatic speech recognizer in terms of quantitative measures such as speech rate, articulation rate, number and length of pauses, number of dysfluencies, mean length of runs, and phonation/time ratio. The results show that expert ratings of fluency in read speech are reliable (Cronbach's alpha varies between 0.90 and 0.96) and that these ratings can be predicted on the basis of quantitative measures: for six automatic measures the magnitude of the correlations with the fluency scores varies between 0.81 and 0.93. Rate of speech appears to be the best predictor: correlations vary between 0.90 and 0.93. Two other important determinants of reading fluency are the rate at which speakers articulate the sounds and the number of pauses they make. Apparently, rate of speech is such a good predictor of perceived fluency because it incorporates these two aspects.
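The temporal measures named in this abstract can be illustrated with a minimal sketch. The definitions below are commonly used ones and are an assumption here, since the abstract does not spell them out; the counting unit (syllables or phones), the `Run` type, and the function name `temporal_measures` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """A stretch of speech between two pauses (hypothetical helper type)."""
    units: int       # syllables or phones in the run (assumed counting unit)
    duration: float  # seconds of speech, pauses excluded


def temporal_measures(runs, pauses):
    """Compute temporal fluency measures from runs and pause durations.

    `pauses` is a list of pause durations in seconds. The definitions are
    assumed, not taken from the abstract.
    """
    speaking_time = sum(r.duration for r in runs)
    pause_time = sum(pauses)
    total_time = speaking_time + pause_time
    units = sum(r.units for r in runs)
    return {
        # units per second of total time, pauses included
        "speech_rate": units / total_time,
        # units per second of speaking time, pauses excluded
        "articulation_rate": units / speaking_time,
        # share of the total time that is actual speech
        "phonation_time_ratio": speaking_time / total_time,
        "num_pauses": len(pauses),
        "mean_pause_length": pause_time / len(pauses) if pauses else 0.0,
        # average number of units per uninterrupted run
        "mean_length_of_runs": units / len(runs),
    }
```

Under these assumed definitions, speech rate equals articulation rate multiplied by the phonation/time ratio, which is one way to read the abstract's remark that rate of speech incorporates both how fast sounds are articulated and how much pausing occurs.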

11.
Spectral- and cepstral-based acoustic measures are preferable to time-based measures for accurately representing dysphonic voices during continuous speech. Although these measures show promising relationships to perceptual voice quality ratings, less is known regarding their ability to differentiate normal from dysphonic voice during continuous speech and the consistency of these measures across multiple utterances by the same speaker. The purpose of this study was to determine whether spectral moments of the long-term average spectrum (LTAS) (spectral mean, standard deviation, skewness, and kurtosis) and cepstral peak prominence measures were significantly different for speakers with and without voice disorders when assessed during continuous speech. The consistency of these measures within a speaker across utterances was also addressed. Continuous speech samples from 27 subjects without voice disorders and 27 subjects with mixed voice disorders were acoustically analyzed. In addition, voice samples were perceptually rated for overall severity. Acoustic analyses were performed on three continuous speech stimuli from a reading passage: two full sentences and one constituent phrase. Significant between-group differences were found for both cepstral measures and three LTAS measures (P < 0.001): spectral mean, skewness, and kurtosis. These five measures also showed moderate to strong correlations to overall voice severity. Furthermore, high degrees of within-speaker consistency (correlation coefficients ≥0.89) across utterances with varying length and phonemic content were evidenced for both subject groups.
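The four spectral moments mentioned here can be sketched by treating the LTAS as a probability distribution over frequency. This is one common formulation and an assumption, not necessarily the procedure used in the study; the function name `spectral_moments` is hypothetical, and the kurtosis returned is the raw fourth standardized moment (some tools subtract 3).

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (mean, sd, skewness, kurtosis) of a spectrum.

    `freqs_hz` are bin centre frequencies and `power` the non-negative
    spectral power in each bin; power is normalised to sum to 1 so that it
    acts as a weight on each frequency.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weights = [p / total for p in power]
    mean = sum(f * w for f, w in zip(freqs_hz, weights))
    variance = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, weights))
    sd = math.sqrt(variance)
    if sd == 0:
        raise ValueError("all energy is in a single bin")
    # Third and fourth central moments, standardised by the spread
    skewness = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, weights)) / sd ** 3
    kurtosis = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, weights)) / sd ** 4
    return mean, sd, skewness, kurtosis
```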

12.
Previous studies have demonstrated that motor control of segmental features of speech relies to some extent on sensory feedback. Control of voice fundamental frequency (F0) has been shown to be modulated by perturbations in voice pitch feedback during various phonatory tasks and in Mandarin speech. The present study was designed to determine if voice F0 is modulated in a task-dependent manner during production of suprasegmental features of English speech. English speakers received pitch-modulated voice feedback (±50, 100, and 200 cents, 200 ms duration) during a sustained vowel task and a speech task. Response magnitudes during speech (mean 31.5 cents) were larger than during the vowels (mean 21.6 cents), response magnitudes increased as a function of stimulus magnitude during speech but not vowels, and responses to downward pitch-shift stimuli were larger than those to upward stimuli. Response latencies were shorter in speech (mean 122 ms) compared to vowels (mean 154 ms). These findings support previous research suggesting the audio-vocal system is involved in the control of suprasegmental features of English speech by correcting for errors between voice pitch feedback and the desired F0.
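The pitch shifts and responses in this abstract are expressed in cents, a logarithmic unit in which 100 cents is one equal-tempered semitone and 1200 cents is one octave. A minimal sketch of the standard conversion follows; the function names are hypothetical and the code is not taken from the study.

```python
import math


def cents_between(f_hz: float, ref_hz: float) -> float:
    """Interval from ref_hz up to f_hz in cents (negative if f_hz is lower)."""
    return 1200.0 * math.log2(f_hz / ref_hz)


def shift_by_cents(f_hz: float, cents: float) -> float:
    """Frequency obtained by shifting f_hz by the given number of cents."""
    return f_hz * 2.0 ** (cents / 1200.0)
```

Because the unit is logarithmic, a shift of a given size in cents corresponds to the same frequency ratio regardless of the speaker's baseline F0, which makes responses comparable across voices.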

13.
How are listeners able to identify whether the pitch of a brief isolated sample of an unknown voice is high or low in the overall pitch range of that speaker? Does the speaker's voice quality convey crucial information about pitch level? Results and statistical models of two experiments that provide answers to these questions are presented. First, listeners rated the pitch levels of vowels taken over the full pitch ranges of male and female speakers. The absolute f0 of the samples was by far the most important determinant of listeners' ratings, but with some effect of the sex of the speaker. Acoustic measures of voice quality had only a very small effect on these ratings. This result suggests that listeners have expectations about f0s for average speakers of each sex, and judge voice samples against such expectations. Second, listeners judged speaker sex for the same speech samples. Again, absolute f0 was the most important determinant of listeners' judgments, but now voice quality measures also played a role. Thus it seems that pitch level judgments depend on voice quality mostly indirectly, through its information about sex. Absolute f0 is the most important information for deciding both pitch level and speaker sex.

14.
BACKGROUND: After total laryngectomy, the interruption of the upper digestive tube and the section of the cricopharyngeal segment alter the high-pressure zone of the pharyngoesophageal transition, which will not only start to have a digestive function, but also be stimulated to take on the production of voice and speech. The pressure observed in the cricopharyngeal segment seems to act as a critical factor for the development of esophageal sound production, and manometry is the procedure capable of quantifying the pressure observed in this region. OBJECTIVE: The objective of the current study was to assess the upper esophageal sphincter pressure in laryngectomized patients who are either successful or unsuccessful esophageal speakers, both at rest and during esophageal phonation, using manometry. METHODS: Twenty laryngectomized persons aged 32 to 83 years (mean, 44.2 years) were submitted to evaluation by a speech pathologist and divided into two groups, i.e., successful esophageal speakers (N=12) and unsuccessful esophageal speakers (N=8), according to a scale validated by Wepman et al (1953). The upper esophageal sphincter (UES) pressure was assessed by manometry both at rest and during the following voice emissions in Portuguese: the vowel "a," the monosyllable "pa," and the sentence "papai papou pipoca." The amplitude, the duration of the pressure wave, and the area under the curve were measured. RESULTS: At rest, the mean UES pressure was 11.83 mm Hg for successful esophageal speakers and 9.92 mm Hg for unsuccessful esophageal speakers, with no significant difference between groups; the mean for the two groups as a whole was 11.06 mm Hg. During the voice and speech sequence tests, no significant difference was observed when the emissions in Portuguese of "a," "pa," and the sentence were analyzed separately. CONCLUSION: As the pressure observed at rest did not differ between the successful esophageal speakers and the unsuccessful esophageal speakers, and the amplitude, the duration of the pressure wave, and the area under the amplitude x duration curve were also equal for both groups, we conclude that the cricopharyngeal segment pressure is not a preponderant factor for the acquisition of esophageal voice and speech.

15.
This study focuses on speaking voice quality in male teachers (n = 35) and male actors (n = 36), who represent untrained and trained voice users, because we wanted to investigate normal and supranormal voices. In this study, both substantial and methodologic aspects were considered. It includes a method for perceptual voice evaluation, and a basic issue was rater reliability. A listening group of 10 listeners (7 experienced speech-language therapists and 3 speech-language therapist students) evaluated the voices on 15 vocal characteristics using VA scales. Two sets of voice signals were investigated: text reading (2 loudness levels) and sustained vowel (3 levels). The results indicated a high interrater reliability for most perceptual characteristics. Both types of voice signals were evaluated reliably, although reliability was somewhat higher for connected speech than for vowels, especially at the normal level. Experienced listeners tended to be more consistent in their ratings than did the student raters. Some vocal characteristics achieved acceptable reliability even with a smaller panel of listeners. The perceptual characteristics grouped in 4 factors reflected perceptual dimensions.

16.
Voice analysis was performed on 21 “standard” laryngectomized, male patients with a Provox® voice prosthesis, along with an age- and sex-matched control group of 20 normal speakers, using acoustical analyses (MDVP and CSL, Kay Elemetrics Corp.), maximum phonation time measurements, and perceptual evaluations. Comparison between MDVP and CSL revealed that the latter was not useful for the analysis of laryngectomized prosthetic voices. In contrast, MDVP seems suitable for this purpose, and contains a large number of parameters that significantly differentiate between patient and control speakers, as did the perceptual ratings and the maximum phonation time. Fundamental frequency appeared to be comparable for patients and control speakers. A significant influence of stoma occlusion and age was found for some voice parameters. Factor analyses showed correlations between the different MDVP parameters and correlations between the MDVP parameters and the perceptual ratings.

17.
This study investigates cross-speaker differences in the factors that predict voicing thresholds during abduction-adduction gestures in six normal women. Measures of baseline airflow, pulse amplitude, subglottal pressure, and fundamental frequency were made at voicing offset and onset during intervocalic /h/, produced in varying vowel environments and at different loudness levels, and subjected to relational analyses to determine which factors were most strongly related to the timing of voicing cessation or initiation. The data indicate that (a) all speakers showed differences between voicing offsets and onsets, but the degree of this effect varied across speakers; (b) loudness and vowel environment have speaker-specific effects on the likelihood of devoicing during /h/; and (c) baseline flow measures significantly predicted times of voicing offset and onset in all participants, but other variables contributing to voice timing differed across speakers. Overall, the results suggest that individual speakers have unique methods of achieving phonatory goals during running speech. These data contribute to the literature on individual differences in laryngeal function, and serve as a means of evaluating how well laryngeal models can reproduce the range of voicing behavior used by speakers during running speech tasks.

18.
19.
To determine if the speaking fundamental frequency (F0) profiles of English and Mandarin differ, a variety of voice samples from male and female speakers were compared. The two languages' F0 profiles were sometimes found to differ, but these differences depended on the particular speech samples being compared. Most notably, the physiological F0 ranges of the speakers, determined from tone sweeps, hardly differed between the two languages, indicating that the English and Mandarin speakers' voices are comparable. Their use of F0 in single-word utterances was, however, quite different, with the Mandarin speakers having higher maximums and means, and larger ranges, even when only the Mandarin high falling tone was compared with English. In contrast, for a prose passage, the two languages were more similar, differing only in the mean F0, Mandarin again being higher. The study thus contributes to the growing literature showing that languages can differ in their F0 profile, but highlights the fact that the choice of speech materials to compare can be critical.

20.
The purpose of this study was to investigate if there is an effect of task on determination of habitual loudness. Four tasks commonly used to elicit habitual loudness were compared (automatic speech, elicited speech, spontaneous speech, and reading aloud). Participants were adult female speakers (N=30) with normal voice. A one-way analysis of variance (ANOVA) revealed a statistically significant (p < 0.05) effect of task, with post-hoc analyses indicating that there was a statistically significant difference in habitual loudness elicited via automatic versus spontaneous speech (p < 0.05), and automatic speech versus reading aloud (p < 0.001). The issue of how habitual loudness is defined is considered. Implications of the use of one task for determination of habitual loudness are discussed, as is the possibility of a task effect on determination of other clinically useful vocal parameters.
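The one-way ANOVA referred to here compares variability between task means with variability within tasks. The sketch below computes the F statistic for a plain between-groups design; that design is an assumption, since the abstract does not say whether a repeated-measures variant was used for the same 30 speakers, and the function name `one_way_anova_f` is hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a between-groups one-way ANOVA.

    `groups` is a list of lists, one list of measurements per condition
    (for example, one list of loudness values per elicitation task).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom indicates a task effect; the post-hoc pairwise comparisons reported in the abstract would then locate which tasks differ.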
