Similar literature
 Found 20 similar documents; search time: 31 ms
1.
The objectives of this prospective and exploratory study are to determine: (1) naïve listener preference for gender in tracheoesophageal (TE) speech when speech severity is controlled; (2) the accuracy of identifying TE speaker gender; (3) the effects of gender identification on judgments of speech acceptability (ACC) and naturalness (NAT); and (4) the acoustic basis of ACC and NAT judgments. Six male and six female adult TE speakers were matched for speech severity. Twenty naïve listeners made auditory-perceptual judgments of speech samples in three listening sessions. First, listeners performed preference judgments using a paired comparison paradigm. Second, listeners made judgments of speaker gender, speech ACC, and NAT using rating scales. Last, listeners made ACC and NAT judgments when speaker gender was provided coincidentally. Duration, frequency, and spectral measures were performed. No significant differences were found for preference of male or female speakers. All male speakers were accurately identified, but only two of six female speakers were accurately identified. Significant interactions were found between gender and listening condition (gender known) for NAT and ACC judgments. Males were judged more natural when gender was known; female speakers were judged less natural and less acceptable when gender was known. Regression analyses revealed that judgments of female speakers were best predicted with duration measures when gender was unknown, but with spectral measures when gender was known; judgments of males were best predicted with spectral measures. Naïve listeners have difficulty identifying the gender of female TE speakers. Listeners show no preference for speaker gender, but when gender is known, female speakers are least acceptable and natural. The nature of the perceptual task may affect the acoustic basis of listener judgments.

2.
Voice analysis was performed on 21 “standard” laryngectomized, male patients with a Provox® voice prosthesis, along with an age- and sex-matched control group of 20 normal speakers, using acoustical analyses (MDVP and CSL, Kay Elemetrics Corp.), maximum phonation time measurements, and perceptual evaluations. Comparison between MDVP and CSL revealed that the latter was not useful for the analysis of laryngectomized prosthetic voices. In contrast, MDVP seems suitable for this purpose, and contains a large number of parameters that significantly differentiate between patient and control speakers, as did the perceptual ratings and the maximum phonation time. Fundamental frequency appeared to be comparable for patients and control speakers. A significant influence of stoma occlusion and age was found for some voice parameters. Factor analyses showed correlations between the different MDVP parameters and correlations between the MDVP parameters and the perceptual ratings.

3.
OBJECTIVES/HYPOTHESIS: The purpose of this study was (1) to determine whether changes in intra- and interrater reliability occur for inexperienced listeners' judgments of overall severity, roughness, and breathiness in dysphonic and normal speakers after 2 hours of listener training; and (2) to determine the acoustic bases of inexperienced listeners' judgments before and after training. STUDY DESIGN: Prospective, single group, pre- and postdesign. METHODS: Thirty adult dysphonic and six normal speaker samples were selected from a database. Samples included 21 test stimuli and 15 training stimuli of both sustained vowels and connected speech. Sixteen inexperienced listeners judged all samples for overall severity, roughness, and breathiness using visual analog scales. Each listener provided pretraining ratings at baseline. Listeners were then trained using 15 anchor voice samples and 15 training stimuli. During training, listeners were provided with definitions of rating dimensions, accuracy feedback, and anchor samples. Listeners then judged test stimuli in a posttraining session. Speaker samples also were analyzed acoustically. RESULTS: Intrarater reliability was least variable for judgments of overall severity, but improved further with training. Listener judgments of roughness and breathiness in vowels were least reliable at baseline, but they significantly improved between listeners after training. Finally, measures of cepstral peak prominence significantly predicted all voice quality judgments except roughness in vowels, which was predicted by shimmer. The acoustic bases of group perceptual judgments did not seem to change with training. CONCLUSIONS: These findings have implications for developing training programs in perceptual evaluation and mapping relationships between acoustic and perceptual characteristics of voice disorders.

4.
Spectral- and cepstral-based acoustic measures are preferable to time-based measures for accurately representing dysphonic voices during continuous speech. Although these measures show promising relationships to perceptual voice quality ratings, less is known regarding their ability to differentiate normal from dysphonic voice during continuous speech and the consistency of these measures across multiple utterances by the same speaker. The purpose of this study was to determine whether spectral moments of the long-term average spectrum (LTAS) (spectral mean, standard deviation, skewness, and kurtosis) and cepstral peak prominence measures were significantly different for speakers with and without voice disorders when assessed during continuous speech. The consistency of these measures within a speaker across utterances was also addressed. Continuous speech samples from 27 subjects without voice disorders and 27 subjects with mixed voice disorders were acoustically analyzed. In addition, voice samples were perceptually rated for overall severity. Acoustic analyses were performed on three continuous speech stimuli from a reading passage: two full sentences and one constituent phrase. Significant between-group differences were found for both cepstral measures and three LTAS measures (P < 0.001): spectral mean, skewness, and kurtosis. These five measures also showed moderate to strong correlations to overall voice severity. Furthermore, high degrees of within-speaker consistency (correlation coefficients ≥0.89) across utterances with varying length and phonemic content were evidenced for both subject groups.
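
The four spectral moments named in this abstract can be illustrated with a minimal sketch that treats a spectrum as a probability distribution over frequency. The abstract does not say how the moments were computed, so the weighting by linear power, the use of plain (not excess) kurtosis, and the function name `spectral_moments` are all assumptions made for illustration.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (mean, sd, skewness, kurtosis) of a spectrum.

    Assumption: each frequency bin is weighted by its share of the
    total linear power. Kurtosis is the plain fourth standardized
    moment (3.0 for a Gaussian shape), not excess kurtosis.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weights = [p / total for p in power]

    # First moment: the power-weighted mean frequency.
    mean = sum(f * w for f, w in zip(freqs_hz, weights))
    # Second central moment: spread around the mean.
    var = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, weights))
    if var == 0:
        raise ValueError("all energy is in one bin; skewness and kurtosis are undefined")
    sd = math.sqrt(var)

    # Third and fourth standardized moments.
    skew = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, weights)) / sd ** 3
    kurt = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, weights)) / sd ** 4
    return mean, sd, skew, kurt
```

For a real LTAS the inputs would be the bin center frequencies and the time-averaged power in each bin; a symmetric spectrum gives a skewness of zero.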

5.
The purpose of this study was to determine the validity of voice pleasantness and overall voice severity ratings of dysphonic and normal speakers using direct magnitude estimation (DME) and equal-appearing interval (EAI) auditory-perceptual scaling procedures. Twelve naive listeners perceptually evaluated voice pleasantness and severity from connected speech samples produced by 24 adult dysphonic speakers and 6 normal adult speakers. A statistical comparison of the two auditory-perceptual scales yielded a linear relationship representative of a metathetic continuum for voice pleasantness. A statistical relationship that is consistent with a prothetic continuum was revealed for ratings of voice severity. These data provide support for the use of either DME or EAI scales when making auditory-perceptual judgments of pleasantness, but only DME scales when judging overall voice severity for dysphonic speakers. These results suggest further psychophysical study of perceptual dimensions of voice and speech must be undertaken in order to avoid the inappropriate and invalid use of EAI scales used in the auditory-perceptual evaluation of the normal and dysphonic voice.

6.
Journal of Voice, 2020, 34(5): 806.e7-806.e18
There is a high prevalence of dysphonia among professional voice users and the impact of the disordered voice on the speaker is well documented. However, there is minimal research on the impact of the disordered voice on the listener. Considering that professional voice users include teachers and air-traffic controllers, among others, it is imperative to determine the impact of a disordered voice on the listener. To address this, the objectives of the current study included: (1) determine whether there are differences in speech intelligibility between individuals with healthy voices and those with dysphonia; (2) understand whether cognitive-perceptual strategies increase speech intelligibility for dysphonic speakers; and (3) determine the relationship between subjective voice quality ratings and speech intelligibility. Sentence stimuli were recorded from 12 speakers with dysphonia and four age- and gender-matched typical, healthy speakers and presented to 129 healthy listeners divided into one of three strategy groups (ie, control, acknowledgement, and listener strategies). Four expert raters also completed a perceptual voice assessment using the Consensus Assessment Perceptual Evaluation of Voice for each speaker. Results indicated that dysphonic voices were significantly less intelligible than healthy voices (P < 0.001) and the use of cognitive-perceptual strategies provided to the listener did not significantly improve speech intelligibility scores (P = 0.602). Using the subjective voice quality ratings, regression analysis found that breathiness was able to predict 41% of the variance associated with number of errors (P = 0.008). Overall results of the study suggest that speakers with dysphonia demonstrate reduced speech intelligibility and that providing the listener with specific strategies may not result in improved intelligibility.

7.
How are listeners able to identify whether the pitch of a brief isolated sample of an unknown voice is high or low in the overall pitch range of that speaker? Does the speaker's voice quality convey crucial information about pitch level? Results and statistical models of two experiments that provide answers to these questions are presented. First, listeners rated the pitch levels of vowels taken over the full pitch ranges of male and female speakers. The absolute f0 of the samples was by far the most important determinant of listeners' ratings, but with some effect of the sex of the speaker. Acoustic measures of voice quality had only a very small effect on these ratings. This result suggests that listeners have expectations about f0s for average speakers of each sex, and judge voice samples against such expectations. Second, listeners judged speaker sex for the same speech samples. Again, absolute f0 was the most important determinant of listeners' judgments, but now voice quality measures also played a role. Thus it seems that pitch level judgments depend on voice quality mostly indirectly, through its information about sex. Absolute f0 is the most important information for deciding both pitch level and speaker sex.

8.
Longitudinal studies on vocal aging are scarce, and information on the impact of age-related voice changes on daily life is lacking. This longitudinal study reports on age-related voice changes and the impact on daily life over a time period of 5 years on 11 healthy male speakers, age ranging from 50 to 81 years. All males completed a questionnaire on vocal performance in daily life, and perceptual and acoustical analyses of vocal quality and analyses of maximum performance tasks of vocal function (voice range profile) were performed. Results showed a significant deterioration of the acoustic voice signal as well as increased ratings on vocal roughness judged by experts after the time period of 5 years. An increase of self-reported voice instability and the tendency to avoid social parties supported these findings. Smoking males had a lower speaking fundamental frequency compared with nonsmoking males, and this seemed reversible for males who stop smoking. This study suggests a normal gradual vocal aging process with clear consequences in daily life, which should be taken into consideration in clinical practice as well as in studies concerning communication in social life.

9.
The attainment of a feminine-sounding voice is a highly desirable goal among male-to-female transgender (MFT) persons, but this goal may be difficult for many to accomplish. The characteristics associated with a feminine vocal quality include increases in fundamental frequency and in vocal breathiness. In this study, we used inverse-filtering of the airflow signal to indirectly assess vocal fold function in 13 MFT persons. Each participant was asked to sustain the vowel /a/ first in her biological male voice and then again in her female voice. In addition, these vowel productions were compared with vowels produced by age-matched biologic women and men. The results of the study revealed a significant increase in maximum flow declination rate during female voice production. Perceptual ratings of a feminine voice were associated with a fundamental frequency (F0) of 180 Hz or greater, although F0 did not differ significantly between male and female voice production. These results are discussed relative to the mechanisms that obtained a feminine-sounding voice.

10.
To reduce degradation in speech recognition due to varied characteristics of different speakers, a method of perceptual frequency warping based on subglottal resonances for speaker normalization is proposed. The warping factor is extracted from the second subglottal resonance using acoustic coupling between subglottis and vocal tract. The second subglottal resonance is independent of the speech content, which reflects the speaker characteristics more than the third formant. The perceptual minimum variation distortionless response (PMVDR) coefficient is normalized, which is more robust and has better anti-noise capability than MFCC. The normalized coefficients are used in the speech-mode training and speech recognition. Experiments show that the word error rate, as compared with MFCC and the spectrum warping by the third formant, decreases by 4% and 3%, respectively, in clean speech recognition, and by 9% and 5%, respectively, in a noisy environment. The results indicate that the proposed method can improve the word recognition accuracy in a speaker-independent recognition system.

11.
Effects of Family Therapy on Children's Voices   Total citations: 1, self-citations: 0, citations by others: 1
The families of nine children with deviant voice qualities were selected for family treatment according to the SYGESTI model. Recordings of the children's speech were made before and after therapy. Perceptual evaluation of their voice quality showed significant improvement in various perceptual parameters after the therapy. Acoustical analysis confirmed changes of voice quality and mean fundamental frequency in speech. The therapy also was found to improve relations between family members, conflict management and other aspects of communication. The results suggest that these children's deviant voices were related to family conditions.

12.
13.
The contribution of the nasal murmur and vocalic formant transition to the perception of the [m]-[n] distinction by adult listeners was investigated for speakers of different ages in both consonant-vowel (CV) and vowel-consonant (VC) syllables. Three children in each of the speaker groups 3, 5, and 7 years old, and three adult females and three adult males produced CV and VC syllables consisting of either [m] or [n] and followed or preceded by [i ae u a], respectively. Two productions of each syllable were edited into seven murmur and transitions segments. Across speaker groups, a segment including the last 25 ms of the murmur and the first 25 ms of the vowel yielded higher perceptual identification of place of articulation than any other segment edited from the CV syllable. In contrast, the corresponding vowel+murmur segment in the VC syllable position improved nasal identification relative to other segment types for only the adult talkers. Overall, the CV syllable was perceptually more distinctive than the VC syllable, but this distinctiveness interacted with speaker group and stimulus duration. As predicted by previous studies and the current results of perceptual testing, acoustic analyses of adult syllable productions showed systematic differences between labial and alveolar places of articulation, but these differences were only marginally observed in the youngest children's speech. Also predicted by the current perceptual results, these acoustic properties differentiating place of articulation of nasal consonants were reliably different for CV syllables compared to VC syllables. A series of comparisons of perceptual data across speaker groups, segment types, and syllable shape provided strong support, in adult speakers, for the "discontinuity hypothesis" [K. N. Stevens, in Phonetic Linguistics: Essays in Honor of Peter Ladefoged, edited by V. A. Fromkin (Academic, London, 1985), pp. 243-255], according to which spectral discontinuities at acoustic boundaries provide critical cues to the perception of place of articulation. In child speakers, the perceptual support for the "discontinuity hypothesis" was weaker and the results indicative of developmental changes in speech production.

14.
The present study assessed the effect of sex on voice fundamental frequency (F0) responses to pitch feedback perturbations during sustained vocalization. Sixty-four native-Mandarin speakers heard their voice pitch feedback shifted at ±50, ±100, or ±200 cents for 200 ms, five times during each vocalization. The results showed that, as compared to female speakers, male speakers produced significantly larger but slower vocal responses to the pitch-shifted stimuli. These findings reveal a modulation of vocal response as a function of sex, and suggest that there may be a differential processing of vocal pitch feedback perturbations between men and women.
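
The perturbation sizes in this abstract are given in cents, where 100 cents is one equal-tempered semitone and 1200 cents is one octave. The standard conversion between cents and frequency ratio can be sketched as below; the function names and the 200 Hz example voice are illustrative assumptions and are not taken from the study.

```python
import math


def shift_by_cents(f0_hz, cents):
    """Return the frequency obtained by shifting f0_hz by the given number of cents."""
    # 1200 cents is one octave, i.e. a doubling of frequency.
    return f0_hz * 2.0 ** (cents / 1200.0)


def cents_between(f_ref_hz, f_hz):
    """Return the interval from f_ref_hz to f_hz in cents (negative if f_hz is lower)."""
    return 1200.0 * math.log2(f_hz / f_ref_hz)


if __name__ == "__main__":
    # Hypothetical 200 Hz voice shifted by the perturbation sizes in the abstract.
    for cents in (-200, -100, -50, 50, 100, 200):
        print(cents, round(shift_by_cents(200.0, cents), 2))
```

Because the scale is logarithmic, a shift of a given size in cents corresponds to a larger change in Hz for a higher-pitched voice than for a lower-pitched one.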

15.
To determine if the speaking fundamental frequency (F0) profiles of English and Mandarin differ, a variety of voice samples from male and female speakers were compared. The two languages' F0 profiles were sometimes found to differ, but these differences depended on the particular speech samples being compared. Most notably, the physiological F0 ranges of the speakers, determined from tone sweeps, hardly differed between the two languages, indicating that the English and Mandarin speakers' voices are comparable. Their use of F0 in single-word utterances was, however, quite different, with the Mandarin speakers having higher maximums and means, and larger ranges, even when only the Mandarin high falling tone was compared with English. In contrast, for a prose passage, the two languages were more similar, differing only in the mean F0, Mandarin again being higher. The study thus contributes to the growing literature showing that languages can differ in their F0 profile, but highlights the fact that the choice of speech materials to compare can be critical.

16.
This study explored whether acoustic and perceptual features could distinguish comfortable from maximally projected acting voice. Thirteen professional male actors performed a passage from William Shakespeare's Julius Caesar twice. The first delivery used their comfortably projected voices, whereas the second used maximal projection. Acoustic measures, expert ratings, and self-ratings of projection and voice quality were investigated. Long-term average spectra (LTAS) and sound pressure level (SPL) analyses were conducted. Perceptual variables included projection, breathiness, roughness, and strain. When comparing the intensity difference between the higher (2-4 kHz) and lower (0-2 kHz) regions of the spectrum in voice samples from the maximal projected condition, LTAS analyses demonstrated increased acoustic energy in the higher part of the spectrum. This LTAS pattern was not as evident in the comfortable projected condition. These findings offered some preliminary support for the existence of an actor's formant (prominent peak in the upper part of the spectrum) during maximal projection.
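
The comparison of the 2-4 kHz and 0-2 kHz regions described in this abstract can be sketched as a difference of band levels in decibels. The abstract does not state how the band intensities were obtained, so summing linear power per band, the half-open band edges, and the function names below are assumptions made for illustration.

```python
import math


def band_level_db(freqs_hz, power, lo_hz, hi_hz):
    """Return the level in dB of the summed linear power in [lo_hz, hi_hz)."""
    energy = sum(p for f, p in zip(freqs_hz, power) if lo_hz <= f < hi_hz)
    if energy <= 0:
        raise ValueError("no energy in the requested band")
    return 10.0 * math.log10(energy)


def high_low_difference_db(freqs_hz, power):
    """Return the 2-4 kHz band level minus the 0-2 kHz band level, in dB.

    Assumption: the bands are 0-2 kHz and 2-4 kHz as named in the
    abstract, with 2 kHz itself counted in the upper band.
    """
    high = band_level_db(freqs_hz, power, 2000.0, 4000.0)
    low = band_level_db(freqs_hz, power, 0.0, 2000.0)
    return high - low
```

Under these assumptions, a value closer to zero (less negative) means relatively more energy in the upper band, which is the pattern the abstract reports for maximal projection.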

17.
This study focuses on speaking voice quality in male teachers (n = 35) and male actors (n = 36), who represent untrained and trained voice users, because we wanted to investigate normal and supranormal voices. In this study, both substantial and methodologic aspects were considered. It includes a method for perceptual voice evaluation, and a basic issue was rater reliability. A listening group of 10 listeners, 7 experienced speech-language therapists, and 3 speech-language therapist students evaluated the voices by 15 vocal characteristics using VA scales. Two sets of voice signals were investigated: text reading (2 loudness levels) and sustained vowel (3 levels). The results indicated a high interrater reliability for most perceptual characteristics. Connected speech was evaluated more reliably, especially at the normal level, but both types of voice signals were evaluated reliably, although the reliability for connected speech was somewhat higher than for vowels. Experienced listeners tended to be more consistent in their ratings than did the student raters. Some vocal characteristics achieved acceptable reliability even with a smaller panel of listeners. The perceptual characteristics grouped in 4 factors reflected perceptual dimensions.

18.
Previous studies suggest that speakers are systematically inaccurate, or biased, when imitating self-produced vowels. The direction of these biases in formant space and their variation may offer clues about the organization of the vowel perceptual space. To examine these patterns, three male speakers were asked to imitate 45 self-produced vowels that were systematically distributed in F1/F2 space. All three speakers showed imitation bias, and the bias magnitudes were significantly larger than those predicted by a model of articulatory noise. Each speaker showed a different pattern of bias directions, but the pattern was unrelated to the locations of prototypical vowels produced by that speaker. However, there were substantial quantitative regularities: (1) The distribution of imitation variability and bias magnitudes were similar for all speakers, (2) the imitation variability was independent of the bias magnitudes, and (3) the imitation variability (a production measure) was commensurate with the formant discrimination limen (a perceptual measure). These results indicate that there is additive Gaussian noise in the imitation process that independently affects each formant and that there are speaker-dependent and potentially nonlinguistic biases in vowel perception and production.

19.
The purpose of this paper was to evaluate the reliability of, and agreement among, six speech analysis systems in the determination of fundamental frequency. Five male and five female speakers provided oral reading and sustained vowel samples for analysis. Each sample was analyzed five times by each system. The results indicated high reliability for all of the systems for both sexes and both utterance types. Agreement among the systems was high for the male sustained vowels and the female oral reading samples. In contrast, poor agreement occurred among the systems for the male oral reading samples and the female sustained vowels. The findings indicate that the output of these automatic methods tends to be consistent over repeated trials within the systems in their extraction of fundamental frequency; however, agreement among these systems varies.

20.
Until recently, speech analysis techniques have been built around the all-pole linear predictive model. This study examines the effectiveness of using the perceptual linear predictive method for analyzing nasal consonants. Six speakers (three men and three women) produced 300 CV syllables with initial nasal consonants /m/ and /n/. A threshold-based boundary detection algorithm was developed to extract nasal segments from the CV contexts. Poles of a fifth-order perceptual linear predictive model were calculated and the frequency of the second pole was used to characterize the place of articulation of nasal consonants. Results indicated that the frequency for the second transformed pole was significantly lower for /m/ than for /n/ and was independent of factors such as a vowel context and gender of the speaker. A nasal identification rate of 86% was obtained based on the frequency of the second pole. The use of the perceptual linear predictive method may thus overcome some difficulties associated with analyzing nasal consonants.
