Similar articles
20 similar articles found.
1.
OBJECTIVES/HYPOTHESIS: The purpose of this study was (1) to determine whether changes in intra- and interrater reliability occur for inexperienced listeners' judgments of overall severity, roughness, and breathiness in dysphonic and normal speakers after 2 hours of listener training; and (2) to determine the acoustic bases of inexperienced listeners' judgments before and after training. STUDY DESIGN: Prospective, single group, pre- and postdesign. METHODS: Thirty adult dysphonic and six normal speaker samples were selected from a database. Samples included 21 test stimuli and 15 training stimuli of both sustained vowels and connected speech. Sixteen inexperienced listeners judged all samples for overall severity, roughness, and breathiness using visual analog scales. Each listener provided pretraining ratings at baseline. Listeners were then trained using 15 anchor voice samples and 15 training stimuli. During training, listeners were provided with definitions of rating dimensions, accuracy feedback, and anchor samples. Listeners then judged test stimuli in a posttraining session. Speaker samples also were analyzed acoustically. RESULTS: Intrarater reliability was least variable for judgments of overall severity, but improved further with training. Listener judgments of roughness and breathiness in vowels were least reliable at baseline, but they significantly improved between listeners after training. Finally, measures of cepstral peak prominence significantly predicted all voice quality judgments except roughness in vowels, which was predicted by shimmer. The acoustic bases of group perceptual judgments did not seem to change with training. CONCLUSIONS: These findings have implications for developing training programs in perceptual evaluation and mapping relationships between acoustic and perceptual characteristics of voice disorders.

2.
To identify a speaker's sex, listeners may rely on sex-based differences in average fundamental frequency (F0), but overlap in male and female F0 ranges undermines such judgments. To test accuracy of sex identification throughout the F0 range, listeners were asked to judge sex based on audio recordings of /ɑ/ spoken on a number of overlapping steady F0s by 10 male and 10 female English speakers. In general, listeners performed above chance (71.6% correct). However, near range extrema, listeners followed an apparent bias toward hearing high F0s as female and low as male; confidence was high when accuracy was high and vice versa. At mid-range, listeners identified sex fairly accurately but were not very confident in their judgments. In a forced-choice task, vowels close in F0 (but beyond the difference limen) were presented in male-female or female-male pairs. Listeners weakly identified speaker sex (63.3% correct). Identification of the male voice was considerably above chance only when the male had the lower F0 of the pair. Reliance on stereotypes of speaking F0 may bias listeners to hear low F0s as male and high F0s as female, perhaps with a contribution from vocal-tract length information. No strong evidence for a contribution of voice quality was obtained.

3.
This study investigated the perceptual and acoustical characteristics of vocal presentation in both the masculine and the feminine modes by the same group of male subjects. Listeners (N = 88) evaluated 22 voice samples by using 18 semantic differential scales and 57 adjectives. The 22 voice samples were provided by 11 biologically male speakers, who described themselves as heterosexual crossdressers. Each speaker read a standard passage under controlled conditions. In one reading, they demonstrated their typical masculine voice and in the other they spoke in their feminine voice. Acoustical analyses included mean fundamental frequency, frequency range, overall passage duration, and duration of a sample of stressed vowels. Results indicated that listeners heard significant differences between masculine and feminine presentations across the 11 speakers and the 18 semantic differential scales. Masculine-feminine and high-low pitch were the most salient scales in the perceptual judgments. Acoustical analyses indicated wide variation according to speaker and condition. Clinical applications are provided.

4.
The objectives of this prospective and exploratory study are to determine: (1) naïve listener preference for gender in tracheoesophageal (TE) speech when speech severity is controlled; (2) the accuracy of identifying TE speaker gender; (3) the effects of gender identification on judgments of speech acceptability (ACC) and naturalness (NAT); and (4) the acoustic basis of ACC and NAT judgments. Six male and six female adult TE speakers were matched for speech severity. Twenty naïve listeners made auditory-perceptual judgments of speech samples in three listening sessions. First, listeners performed preference judgments using a paired comparison paradigm. Second, listeners made judgments of speaker gender, speech ACC, and NAT using rating scales. Last, listeners made ACC and NAT judgments when speaker gender was provided coincidentally. Duration, frequency, and spectral measures were performed. No significant differences were found for preference of male or female speakers. All male speakers were accurately identified, but only two of six female speakers were accurately identified. Significant interactions were found between gender and listening condition (gender known) for NAT and ACC judgments. Males were judged more natural when gender was known; female speakers were judged less natural and less acceptable when gender was known. Regression analyses revealed that judgments of female speakers were best predicted with duration measures when gender was unknown, but with spectral measures when gender was known; judgments of males were best predicted with spectral measures. Naïve listeners have difficulty identifying the gender of female TE speakers. Listeners show no preference for speaker gender, but when gender is known, female speakers are least acceptable and natural. The nature of the perceptual task may affect the acoustic basis of listener judgments.

5.
Key features of the voice, namely fundamental frequency (F0) and formant frequencies (Fn), can vary extensively among individuals. Some of this variation might cue fitness-related, biosocial dimensions of speakers. Three experiments tested the independent, joint, and relative effects of F0 and Fn on listeners' assessments of the body size, masculinity (or femininity), and attractiveness of male and female speakers. Experiment 1 replicated previous findings concerning the joint and independent effects of F0 and Fn on these assessments. Experiment 2 established frequency discrimination thresholds (or just-noticeable differences, JNDs) for both vocal features to use in subsequent tests of their relative salience. JNDs for F0 and Fn were consistent in the range of 5%-6% for each sex. Experiment 3 put the two voice features in conflict by equally discriminable amounts and found that listeners consistently tracked Fn over F0 in rating all three dimensions. Several non-exclusive possibilities for this outcome are considered, including that voice Fn provides more reliable cues to one or more dimensions and that listeners' assessments of the different dimensions are partially interdependent. Results highlight the value of first establishing JNDs for discrimination of specific features of natural voices in future work examining their effects on voice-based social judgments.

6.
A recent study [Smith and Patterson, J. Acoust. Soc. Am. 118, 3177-3186 (2005)] demonstrated that both the glottal-pulse rate (GPR) and the vocal-tract length (VTL) of vowel sounds have a large effect on the perceived sex and age (or size) of a speaker. The vowels for all of the "different" speakers in that study were synthesized from recordings of the sustained vowels of one adult male speaker. This paper presents a follow-up study in which a range of vowels were synthesized from recordings of four different speakers (an adult man, an adult woman, a young boy, and a young girl) to determine whether the sex and age of the original speaker would have an effect upon listeners' judgments of whether a vowel was spoken by a man, woman, boy, or girl, after they were equated for GPR and VTL. The sustained vowels of the four speakers were scaled to produce the same combinations of GPR and VTL, which covered the entire range normally encountered in everyday life. The results show that listeners readily distinguish children from adults based on their sustained vowels but that they struggle to distinguish the sex of the speaker.

7.
Two sounds with the same pitch may differ from each other in the saliency of their pitch sensation. This perceptual attribute is called "pitch strength." The study of voice pitch strength may be important in quantifying normal and pathological voice qualities. The present study investigated how pitch strength varies across normal and dysphonic voices. A set of voices (vowel /a/) selected from the Kay Elemetrics Disordered Voice Database served as the stimuli. These stimuli demonstrated a wide range of voice quality. Ten listeners judged the pitch strength of these stimuli in an anchored magnitude estimation task. On a given trial, listeners heard three different stimuli. The first stimulus represented very low pitch strength (wide-band noise), the second stimulus consisted of the target voice, and the third stimulus represented very high pitch strength (pure tone). Listeners estimated pitch strength of the target voice by positioning a continuous slider labeled with values between 0 and 1, reflecting the two anchor stimuli. Results revealed that listeners can judge pitch strength reliably in dysphonic voices. Moderate to high correlations with perceptual judgments of voice quality suggest that pitch strength may contribute to voice quality judgments.

8.
The purpose of this study was to determine the validity of voice pleasantness and overall voice severity ratings of dysphonic and normal speakers using direct magnitude estimation (DME) and equal-appearing interval (EAI) auditory-perceptual scaling procedures. Twelve naive listeners perceptually evaluated voice pleasantness and severity from connected speech samples produced by 24 adult dysphonic speakers and 6 normal adult speakers. A statistical comparison of the two auditory-perceptual scales yielded a linear relationship representative of a metathetic continuum for voice pleasantness. A statistical relationship that is consistent with a prothetic continuum was revealed for ratings of voice severity. These data provide support for the use of either DME or EAI scales when making auditory-perceptual judgments of pleasantness, but only DME scales when judging overall voice severity for dysphonic speakers. These results suggest further psychophysical study of perceptual dimensions of voice and speech must be undertaken in order to avoid the inappropriate and invalid use of EAI scales used in the auditory-perceptual evaluation of the normal and dysphonic voice.

9.
Journal of Voice, 2020, 34(5): 806.e7-806.e18
There is a high prevalence of dysphonia among professional voice users and the impact of the disordered voice on the speaker is well documented. However, there is minimal research on the impact of the disordered voice on the listener. Considering that professional voice users include teachers and air-traffic controllers, among others, it is imperative to determine the impact of a disordered voice on the listener. To address this, the objectives of the current study were to: (1) determine whether there are differences in speech intelligibility between individuals with healthy voices and those with dysphonia; (2) understand whether cognitive-perceptual strategies increase speech intelligibility for dysphonic speakers; and (3) determine the relationship between subjective voice quality ratings and speech intelligibility. Sentence stimuli were recorded from 12 speakers with dysphonia and four age- and gender-matched typical, healthy speakers and presented to 129 healthy listeners divided into one of three strategy groups (i.e., control, acknowledgement, and listener strategies). Four expert raters also completed a perceptual voice assessment using the Consensus Auditory-Perceptual Evaluation of Voice for each speaker. Results indicated that dysphonic voices were significantly less intelligible than healthy voices (P < 0.001) and that the use of cognitive-perceptual strategies provided to the listener did not significantly improve speech intelligibility scores (P = 0.602). Using the subjective voice quality ratings, regression analysis found that breathiness was able to predict 41% of the variance associated with number of errors (P = 0.008). Overall results of the study suggest that speakers with dysphonia demonstrate reduced speech intelligibility and that providing the listener with specific strategies may not result in improved intelligibility.

10.
The present study explores the use of extrinsic context in perceptual normalization for the purpose of identifying lexical tones in Cantonese. In each of four experiments, listeners were presented with a target word embedded in a semantically neutral sentential context. The target word was produced with a mid level tone and it was never modified throughout the study, but on any given trial the fundamental frequency of part or all of the context sentence was raised or lowered to varying degrees. The effect of perceptual normalization of tone was quantified as the proportion of non-mid level responses given in F0-shifted contexts. Results showed that listeners' tonal judgments (i) were proportional to the degree of frequency shift, (ii) were not affected by non-pitch-related differences in talker, (iii) were affected by the frequency of both the preceding and following context, and (iv) were affected more strongly by following context than by preceding context. These findings suggest that perceptual normalization of lexical tone may involve a "moving window" or "running average" type of mechanism that selectively weights more recent pitch information over older information, but does not depend on the perception of a single voice.

11.
The purpose of this study was (1) to determine the psychophysical character of auditory-perceptual ratings of voice pleasantness (VP) and voice acceptability (VA) for tracheoesophageal (TE) speakers using direct magnitude estimation (DME) and equal-appearing interval (EAI) scaling procedures and (2) to determine the relationship between listeners' ratings of VP and VA. Ten adult listeners judged overall VP and VA from connected speech samples produced by 20 adult male TE speakers. Although results yielded a prothetic continuum for VP and a metathetic continuum for VA, the amount of variance accounted for by a curvilinear model of VP was minimally more than that accounted for by a linear model. Results also revealed a significant relationship between VP and VA (r = 0.939). Findings from this study do not suggest any greater validity associated with VP and VA ratings obtained by the DME than the EAI method. Given the significant relationship between these ratings and the ease of applying EAI scales, it is recommended that VA be used as a current clinical outcome measure. These data illustrate the need to identify attributes that best describe TE speech, are measured appropriately, and are clinically useful.

12.
Traditional interval or ordinal rating scale protocols appear to be poorly suited to measuring vocal quality. To investigate why this might be so, listeners were asked to classify pathological voices as having or not having different voice qualities. It was reasoned that this simple task would allow listeners to focus on the kind of quality a voice had, rather than how much of a quality it possessed, and thus might provide evidence for the validity of traditional vocal qualities. In experiment 1, listeners judged whether natural pathological voice samples were or were not primarily breathy and rough. Listener agreement in both tasks was above chance, but listeners agreed poorly that individual voices belonged in particular perceptual classes. To determine whether these results reflect listeners' difficulty agreeing about single perceptual attributes of complex stimuli, listeners in experiment 2 classified natural pathological voices and synthetic stimuli (varying in f0 only) as low pitched or not low pitched. If disagreements derive from difficulties dividing an auditory continuum consistently, then patterns of agreement should be similar for both kinds of stimuli. In fact, listener agreement was significantly better for the synthetic stimuli than for the natural voices. Difficulty isolating single perceptual dimensions of complex stimuli thus appears to be one reason why traditional unidimensional rating protocols are unsuited to measuring pathologic voice quality. Listeners did agree that a few aphonic voices were breathy, and that a few voices with prominent vocal fry and/or interharmonics were rough. These few cases of agreement may have occurred because the acoustic characteristics of the voices in question corresponded to the limiting case of the quality being judged. Values of f0 that generated listener agreement in experiment 2 were more extreme for natural than for synthetic stimuli, consistent with this interpretation.

13.
Can native listeners rapidly adapt to suprasegmental mispronunciations in foreign-accented speech? To address this question, an exposure-test paradigm was used to test whether Dutch listeners can improve their understanding of non-canonical lexical stress in Hungarian-accented Dutch. During exposure, one group of listeners heard a Dutch story with only initially stressed words, whereas another group also heard 28 words with canonical second-syllable stress (e.g., EEKhorn, "squirrel," was replaced by koNIJN, "rabbit"; capitals indicate stress). The 28 words, however, were non-canonically marked by the Hungarian speaker with high pitch and amplitude on the initial syllable, both of which are stress cues in Dutch. After exposure, listeners' eye movements were tracked to Dutch target-competitor pairs with segmental overlap but different stress patterns, while they listened to new words from the same Hungarian speaker (e.g., HERsens, herSTEL, "brain," "recovery"). Listeners who had previously heard non-canonically produced words distinguished target-competitor pairs better than listeners who had only been exposed to Hungarian accent with canonical forms of lexical stress. Even a short exposure thus allows listeners to tune in to speaker-specific realizations of words' suprasegmental make-up, and use this information for word recognition.

14.
The present study assessed the effect of sex on voice fundamental frequency (F0) responses to pitch feedback perturbations during sustained vocalization. Sixty-four native-Mandarin speakers heard their voice pitch feedback shifted at ±50, ±100, or ±200 cents for 200 ms, five times during each vocalization. The results showed that, as compared to female speakers, male speakers produced significantly larger but slower vocal responses to the pitch-shifted stimuli. These findings reveal a modulation of vocal response as a function of sex, and suggest that there may be a differential processing of vocal pitch feedback perturbations between men and women.
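
As a reader's aid for the shift sizes quoted in the abstract above, the following minimal sketch converts a shift in cents to a frequency in Hz using the standard definition of the cent (1200 cents per octave). The function name and the 200 Hz baseline are illustrative assumptions; they do not come from the study.

```python
def shift_by_cents(f0_hz: float, cents: float) -> float:
    """Return the frequency obtained by shifting f0_hz by the given number of cents.

    Uses the standard definition: 1200 cents equal one octave (a factor of 2).
    """
    return f0_hz * 2.0 ** (cents / 1200.0)


if __name__ == "__main__":
    baseline_hz = 200.0  # hypothetical speaking F0, chosen only for illustration
    for cents in (50, 100, 200):
        up = shift_by_cents(baseline_hz, cents)
        down = shift_by_cents(baseline_hz, -cents)
        print(f"±{cents} cents around {baseline_hz:.0f} Hz: {down:.2f} Hz to {up:.2f} Hz")
```

Because the cent is a ratio measure, the same shift in cents corresponds to a larger change in Hz for a higher baseline F0.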

15.
The primary goal of this study was to characterize a performer's singing and speaking voice. One woman was not admitted to a premier choral group, but her sister, who was comparable in physical characteristics and background, was admitted and provided a valuable control subject. The perceptual judgment of a vocal coach who conducted the group's auditions was decisive in discriminating these 2 singers. The singer not admitted to the group described a history of voice pathology, lacked a functional head register, and spoke with a voice characterized by hoarseness. Multiple listener judgments and acoustic and aerodynamic evaluations of both singers provided a more systematic basis for determining: 1) the phonatory basis for this judgment; 2) whether similar judgments would be made by groups of vocal coaches and speech-language pathologists; and 3) whether the type of tasks (e.g., sung vs. spoken) would influence these judgments. Statistically significant differences were observed between the ratings of vocal health provided by two different groups of listeners. Significant interactions were also observed as a function of the types of voice samples heard by these listeners. Instrumental analyses provided evidence that, in comparison to her sister, the rejected singer had a compromised vocal range, glottal insufficiencies as assessed aerodynamically and electroglottographically, and impaired acoustic quality, especially in her speaking voice.

16.
This study searched for perceptual, acoustic, and physiological correlates of support in singing. Seven trained professional singers (four women and three men) sang repetitions of the syllable [pa:] at varying pitch and sound levels (1) habitually (with support) and (2) simulating singing without support. An estimate of subglottic pressure was obtained from oral pressure during [p]. Vocal fold vibration was registered with dual-channel electroglottography. Acoustic analyses were made on the recorded samples. All samples were also evaluated by the singers and other listeners, who were trained singers, singing students, and voice specialists without singing education (a total of 63 listeners). They rated both the overall voice quality and the amount of support. According to the results, it seemed impossible to observe any auditory differences between supported singing and good singing voice quality. The acoustic and physiological correlates of good voice quality in absolute values seem to be gender and task dependent, whereas the relative optimum seems to be reached at intermediate parameter values.

17.
Little is known about the perceptual importance of changes in the shape of the source spectrum, although many measures have been proposed and correlations with different vocal qualities (breathiness, roughness, nasality, strain, etc.) have frequently been reported. This study investigated just-noticeable differences in the relative amplitudes of the first two harmonics (H1-H2) for speakers of Mandarin and English. Listeners heard pairs of vowels that differed only in the amplitude of the first harmonic and judged whether or not the voice tokens were identical in voice quality. Across voices and listeners, just-noticeable differences averaged 3.18 dB. This value is small relative to the range of values across voices, indicating that H1-H2 is a perceptually valid acoustic measure of vocal quality. For both groups of listeners, differences in the amplitude of the first harmonic were easier to detect when the source spectral slope was steeply falling so that F0 dominated the spectrum. Mandarin speakers were significantly more sensitive (by about 1 dB) to differences in first harmonic amplitudes than were English speakers. Two explanations for these results are possible: Mandarin speakers may have learned to hear changes in harmonic amplitudes due to changes in voice quality that are correlated with the tones of Mandarin; or Mandarin speakers' experience with tonal contrasts may increase their sensitivity to small differences in the amplitude of F0 (which is also the first harmonic).
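
To give a sense of scale for the H1-H2 values in the abstract above, this short sketch applies the standard decibel definition for amplitude ratios (20·log10). The function names and example amplitudes are illustrative assumptions; the study's own measurement procedure is not described in the abstract.

```python
import math


def h1_minus_h2_db(h1_amplitude: float, h2_amplitude: float) -> float:
    """H1-H2 in dB for two linear harmonic amplitudes (standard 20*log10 ratio)."""
    return 20.0 * math.log10(h1_amplitude / h2_amplitude)


def db_to_amplitude_ratio(db: float) -> float:
    """Convert a level difference in dB back to a linear amplitude ratio."""
    return 10.0 ** (db / 20.0)


if __name__ == "__main__":
    # Hypothetical amplitudes, chosen only to illustrate the arithmetic.
    print(f"H1-H2 = {h1_minus_h2_db(2.0, 1.0):.2f} dB")
    # The reported average just-noticeable difference, as a linear ratio.
    print(f"3.18 dB = amplitude ratio of {db_to_amplitude_ratio(3.18):.3f}")
```

Under this definition, a 3.18 dB difference corresponds to the first harmonic's amplitude changing by a factor of roughly 1.44.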

18.
Traditionally, timbre has been defined as that perceptual attribute that differentiates two sounds when pitch and loudness are equal and thus is a measure of dissimilarity. By such a definition, each voice possesses a set of timbres, and the identity of any voice or voice category across different pitch-loudness-vowel combinations must be due to an abstraction of the pattern of timbre transformation. Using stimuli produced across the singing range by singers from different voice categories, this study sought to examine how timbre and pitch interact in the perception of dissimilarity. This study also investigated whether listener experience affects the perception of timbre as a function of pitch. The resulting multidimensional scaling (MDS) representations showed that for all stimuli and listeners, dimension 1 correlated with pitch, whereas dimension 2 correlated with spectral centroid and separated vocal stimuli into the categories mezzo-soprano and soprano. Dimension 3 appeared highly idiosyncratic depending on the nature of the stimuli and on the experience of the listener. Inexperienced listeners appeared to rely more heavily on pitch in making dissimilarity judgments than did experienced listeners. The resulting MDS representations of dissimilarity across pitch provide a glimpse of the timbre transformation of voice categories across pitch.
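
The spectral centroid mentioned in the abstract above is commonly defined as the magnitude-weighted mean frequency of a spectrum. The sketch below computes it under that common definition; the function name and the toy harmonic spectra are illustrative assumptions, and the abstract does not state how the study itself computed the measure.

```python
from typing import Sequence


def spectral_centroid(freqs_hz: Sequence[float], magnitudes: Sequence[float]) -> float:
    """Magnitude-weighted mean frequency: sum(f * m) / sum(m)."""
    if len(freqs_hz) != len(magnitudes):
        raise ValueError("freqs_hz and magnitudes must have the same length")
    total = sum(magnitudes)
    if total == 0:
        raise ValueError("spectrum has zero total magnitude")
    return sum(f * m for f, m in zip(freqs_hz, magnitudes)) / total


if __name__ == "__main__":
    harmonics_hz = [100.0, 200.0, 300.0]  # hypothetical harmonic frequencies
    flat = [1.0, 1.0, 1.0]     # equal energy in every harmonic
    falling = [3.0, 2.0, 1.0]  # energy concentrated in the lower harmonics
    print(spectral_centroid(harmonics_hz, flat))
    print(spectral_centroid(harmonics_hz, falling))
```

A spectrum with more energy in its upper harmonics yields a higher centroid, which is why the measure is often used as a correlate of perceived brightness.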

19.
Two experiments investigated whether listeners change their vowel categorization decisions to adjust to different accents of British English. Listeners from different regions of England gave goodness ratings on synthesized vowels embedded in natural carrier sentences that were spoken with either a northern or southern English accent. A computer minimization algorithm adjusted F1, F2, F3, and duration on successive trials according to listeners' goodness ratings, until the best exemplar of each vowel was found. The results demonstrated that most listeners adjusted their vowel categorization decisions based on the accent of the carrier sentence. The patterns of perceptual normalization were affected by individual differences in language background (e.g., whether the individuals grew up in the north or south of England), and were linked to the changes in production that speakers typically make due to sociolinguistic factors when living in multidialectal environments.

20.
Traditionally, timbre has been defined as that perceptual attribute that differentiates two sounds when pitch and loudness are equal, and thus is a measure of dissimilarity. By such a definition, each voice possesses a set of timbres, and the ability to identify any voice or voice category across different pitch-loudness-vowel combinations must be due to an ability to "link" these timbres by abstracting the "timbre transformation," the manner in which timbre subtly changes across pitch and loudness for a specific voice or voice category. Using stimuli produced across the singing range by singers from different voice categories, this study sought to examine how timbre and pitch interact in the perception of dissimilarity in male singing voices. This study also investigated whether or not listener experience affects the perception of timbre as a function of pitch. The resulting multidimensional scaling (MDS) representations showed that for all stimuli and listeners, dimension 1 correlated with pitch, while dimension 2 correlated with spectral centroid and separated vocal stimuli into the categories baritone and tenor. Dimension 3 appeared highly idiosyncratic depending on the nature of the stimuli and on the experience of the listener. Inexperienced listeners appeared to rely more heavily on pitch in making dissimilarity judgments than did experienced listeners. The resulting MDS representations of dissimilarity across pitch provide a glimpse of the timbre transformation of voice categories across pitch.
