Similar literature
 20 similar articles found (search time: 21 ms)
1.
Voice quality variations include a set of voicing sound source modifications ranging from laryngealized to normal to breathy phonation. Analysis of reiterant imitations of two sentences by ten female and six male talkers has shown that the potential acoustic cues to this type of voice quality variation include: (1) increases to the relative amplitude of the fundamental frequency component as open quotient increases; (2) increases to the amount of aspiration noise that replaces higher frequency harmonics as the arytenoids become more separated; (3) increases to lower formant bandwidths; and (4) introduction of extra pole zeros in the vocal-tract transfer function associated with tracheal coupling. Perceptual validation of the relative importance of these cues for signaling a breathy voice quality has been accomplished using a new voicing source model for synthesis of more natural male and female voices. The new formant synthesizer, KLSYN88, is fully documented here. Results of the perception study indicate that, contrary to previous research which emphasizes the importance of increased amplitude of the fundamental component, aspiration noise is perceptually most important. Without its presence, increases to the fundamental component may induce the sensation of nasality in a high-pitched voice. Further results of the acoustic analysis include the observations that: (1) over the course of a sentence, the acoustic manifestations of breathiness vary considerably--tending to increase for unstressed syllables, in utterance-final syllables, and at the margins of voiceless consonants; (2) on average, females are more breathy than males, but there are very large differences between subjects within each gender; (3) many utterances appear to end in a "breathy-laryngealized" type of vibration; and (4) diplophonic irregularities in the timing of glottal periods occur frequently, especially at the end of an utterance. 
Diplophonia and other deviations from perfect periodicity may be important aspects of naturalness in synthesis.

2.
To determine the minimum difference in amplitude between spectral peaks and troughs sufficient for vowel identification by normal-hearing and hearing-impaired listeners, four vowel-like complex sounds were created by summing the first 30 harmonics of a 100-Hz tone. The amplitudes of all harmonics were equal, except for two consecutive harmonics located at each of three "formant" locations. The amplitudes of these harmonics were equal and ranged from 1 to 8 dB above those of the remaining components. Normal-hearing listeners achieved greater than 75% accuracy when peak-to-trough differences were 1-2 dB. Normal-hearing listeners who were tested in a noise background sufficient to raise their thresholds to the level of a flat, moderate hearing loss needed a 4-dB difference for identification. Listeners with a moderate, flat hearing loss required a 6- to 7-dB difference for identification. The results suggest, for normal-hearing listeners, that the peak-to-trough amplitude difference required for identification of this set of vowels is very near the threshold for detection of a change in the amplitude spectrum of a complex signal. Hearing-impaired listeners may have difficulty using closely spaced formants for vowel identification due to abnormal smoothing of the internal representation of the spectrum by broadened auditory filters.
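
The stimulus construction described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the abstract does not give the formant positions, component phases, sampling rate, or duration, so the harmonic numbers 5, 15, and 25, the sine phase, the 16-kHz rate, and the 50-ms duration below are all hypothetical.

```python
import math

def harmonic_amplitudes(peak_db, formant_harmonics, n_harmonics=30):
    """Linear amplitudes for harmonics 1..n_harmonics.

    Every harmonic has amplitude 1.0 except each pair (h, h + 1) that
    starts at a harmonic listed in formant_harmonics, which is raised
    by peak_db decibels.
    """
    gain = 10 ** (peak_db / 20)  # dB to linear amplitude ratio
    amps = [1.0] * n_harmonics
    for h in formant_harmonics:
        amps[h - 1] = gain  # harmonic h
        amps[h] = gain      # harmonic h + 1
    return amps

def synthesize(amps, f0=100.0, fs=16000, duration=0.05):
    """Sum sine-phase harmonics of f0 (phase choice is an assumption)."""
    n = int(fs * duration)
    return [
        sum(a * math.sin(2 * math.pi * f0 * (k + 1) * i / fs)
            for k, a in enumerate(amps))
        for i in range(n)
    ]

# Hypothetical "formant" positions: pairs starting at harmonics 5, 15, 25.
amps = harmonic_amplitudes(peak_db=6.0, formant_harmonics=[5, 15, 25])
signal = synthesize(amps)
```

Varying `peak_db` from 1 to 8 reproduces the range of peak-to-trough differences mentioned in the abstract; everything else would need to be taken from the full paper.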

3.
The attainment of a feminine-sounding voice is a highly desirable goal among male-to-female transgender (MFT) persons, but this goal may be difficult for many to accomplish. The characteristics associated with a feminine vocal quality include increases in fundamental frequency and in vocal breathiness. In this study, we used inverse-filtering of the airflow signal to indirectly assess vocal fold function in 13 MFT persons. Each participant was asked to sustain the vowel /a/ first in her biological male voice and then again in her female voice. In addition, these vowel productions were compared with vowels produced by age-matched biological women and men. The results of the study revealed a significant increase in maximum flow declination rate during female voice production. Perceptual ratings of a feminine voice were associated with a fundamental frequency (F0) of 180 Hz or greater, although F0 did not differ significantly between male and female voice production. These results are discussed relative to the mechanisms used to achieve a feminine-sounding voice.

4.
The perception of breathiness in vowels is signaled by multiple acoustic cues, including changes in aspiration noise (AH) and the open quotient (OQ) [Klatt and Klatt, J. Acoust. Soc. Am. 87(2), 820-857 (1990)]. A loudness model can be used to determine the extent to which AH masks the harmonic components in voice. The resulting "partial loudness" (PL) and loudness of AH ["noise loudness" (NL)] have been shown to be good predictors of perceived breathiness [Shrivastav and Sapienza, J. Acoust. Soc. Am. 114(1), 2217-2224 (2003)]. The levels of AH and OQ were systematically manipulated for ten synthetic vowels. Perceptual judgments of breathiness were obtained and regression functions to predict breathiness from the ratio of NL to PL (η) were derived. Results show that breathiness can be modeled as a power function of η. The power parameter of this function appears to be affected by the fundamental frequency of the vowel. A second experiment was conducted to determine if the resulting power function could estimate breathiness in a different set of voices. The breathiness of these stimuli, both natural and synthetic, was determined in a listening test. The model estimates of breathiness were highly correlated with perceptual data but the absolute predicted values showed some discrepancies.
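
A power function of η has the general form breathiness = a · η^b, which becomes a straight line in log-log coordinates. The sketch below fits such a function by ordinary least squares on the logarithms. It is an assumed fitting procedure, not necessarily the regression used in the study, and the data are synthetic values generated for illustration, not results from the paper.

```python
import math

def fit_power_function(eta, breathiness):
    """Least-squares fit of breathiness = a * eta**b in log-log coordinates."""
    xs = [math.log(e) for e in eta]
    ys = [math.log(r) for r in breathiness]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope = power parameter
    a = math.exp(mean_y - b * mean_x)   # intercept = scale factor
    return a, b

# Made-up illustration data generated from a = 2.0, b = 0.5 (not study data).
eta = [0.1, 0.2, 0.5, 1.0, 2.0]
ratings = [2.0 * e ** 0.5 for e in eta]
a, b = fit_power_function(eta, ratings)
```

Fitting separately for vowels with different fundamental frequencies would be one way to examine the reported dependence of the power parameter on F0.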

5.
Experiments on disordered voice quality with multidimensional scaling (MDS) have resulted in solutions with low R-square and have failed to show consistent dimensions across different listeners. These findings have been suggested to indicate large individual differences in the perception of voice quality. However, these inconsistencies may originate from several factors, including random stimulus selection, instructions that encourage listeners to respond to global difference in pairs of voices, and noisy perceptual data. This experiment used MDS techniques to study individual differences in perception of breathiness. The voices in the experiment were selected to have a relatively wide variation in breathiness but only minimal variation in roughness, strain, and fundamental frequency. Additionally, listeners were instructed specifically to rate similarities in breathiness rather than judging global differences in voices, and several judgments from each listener were averaged to minimize noise in the data. It was hypothesized that these modifications would result in an MDS solution that accounted for greater variance in perceptual data than previously shown. Results show that averaging multiple responses from each listener increased the R-square from 45% to approximately 75%. The poor R-square and large individual differences in voice quality perception observed in past research may have partly resulted from the experimental procedures in previous studies. These findings suggest that individual differences in the perception of voice quality are not as large as previously thought, and a model of voice quality perception for an "average" listener may be a good representation for the general population.

6.
Despite much research, the relationship between vocal acoustic signals and perceived voice quality is not well understood. The present study used an auditory model proposed by Moore et al. to study how changes in the acoustic spectrum may relate to changes in perceptual ratings of breathiness. Perceptual ratings of breathiness were obtained using a multidimensional scaling (MDS) design. The stimulus distances on the dominant MDS dimension were correlated with several commonly used acoustic measures for voice quality. These distances were also compared with measures obtained from the output of the auditory model. Results show that the partial loudness of the harmonic energy obtained with the aspiration noise acting as a masker was the most important predictor of perceptual ratings of breathiness. Results also demonstrate that measures obtained from the auditory spectrum were better predictors of perceptual ratings of breathiness than were commonly used acoustic spectral measures.

7.
The validity of the parameter Br index, which was designed as an indicator of the turbulent noise in breathy voice, was tested. The parameter was determined by the ratio between energy of the second derivative of the high-pass filtered wave and that of the nonderived high-pass filtered wave. The principle of this method is the utilization of the difference in the frequency range between the turbulent noise and other components present. The parameter was found to correlate with the perception of breathiness. Clinical applications of this index suggest the possibility of using it further as a detection tool for diseases that generate turbulent noises.
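
The ratio described in this abstract can be approximated as follows. This is a rough sketch, not the published algorithm: the abstract does not specify the high-pass filter or how the derivative is computed, so the one-pole filter (with a hypothetical coefficient of 0.95) and the discrete second difference used here are assumptions.

```python
import math
import random

def high_pass(x, alpha=0.95):
    """One-pole high-pass filter: y[n] = alpha * (y[n-1] + x[n] - x[n-1])."""
    y = [0.0] * len(x)
    for n in range(1, len(x)):
        y[n] = alpha * (y[n - 1] + x[n] - x[n - 1])
    return y

def second_difference(x):
    """Discrete stand-in for the second derivative."""
    return [x[n + 1] - 2 * x[n] + x[n - 1] for n in range(1, len(x) - 1)]

def energy(x):
    return sum(v * v for v in x)

def br_index(x):
    """Energy of the twice-differenced high-passed wave over the energy
    of the high-passed wave itself."""
    hp = high_pass(x)
    return energy(second_difference(hp)) / energy(hp)

# Illustration: a 200-Hz tone with and without added broadband noise.
random.seed(0)
fs = 16000
clean = [math.sin(2 * math.pi * 200 * i / fs) for i in range(1600)]
noisy = [s + random.gauss(0.0, 0.3) for s in clean]
```

Because differentiation emphasizes high frequencies, `br_index(noisy)` comes out larger than `br_index(clean)`, which is the frequency-range contrast between turbulent noise and the other components that the abstract refers to.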

8.
The ability of listeners to detect level differences between two sinusoidal stimuli in a two-interval forced-choice procedure was measured as a function of duration and level in three conditions: (1) the pedestal was fixed in level and the stimuli in the two intervals had the same frequency of either 1 or 2 kHz (fixed-level condition); (2) the pedestal was roved in level over a 20-dB range from trial to trial, but the stimuli still had the same frequency of either 1 or 2 kHz (roving-level condition); and (3) the pedestal was roved in level over a 20-dB range and the two stimuli differed in frequency, such that one was around 1 kHz while the other was around 2 kHz (across-frequency condition). In the fixed-level conditions, difference limens decreased (improved) with both increasing duration and level, as found in previous studies. In the roving-level conditions, difference limens increased and the dependence on duration and level decreased. Difference limens in the across-frequency conditions were generally highest and showed very little dependence on either stimulus duration or level. The results may be understood in terms of different internal noise components with additive variances: In the fixed-level conditions, sensation noise, which is dependent on stimulus attributes such as duration and level, is dominant. In more difficult conditions, where trace-memory and/or across-channel comparisons are required, a more central, stimulus-independent noise dominates.

9.
The goal of this study was to measure detection thresholds for 12 isolated American English vowels naturally spoken by three male and three female talkers for young normal-hearing listeners in the presence of a long-term speech-shaped (LTSS) noise, which was presented at 70 dB sound pressure level. The vowel duration was equalized to 170 ms and the spectrum of the LTSS noise was identical to the long-term average spectrum of 12-talker babble. Given the same duration, detection thresholds for vowels differed by 19 dB across the 72 vowels. Thresholds for vowel detection showed a roughly U-shaped pattern as a function of the vowel category across talkers, with generally the lowest thresholds for the /i/ and /æ/ vowels and the highest thresholds for the /u/ vowel. Both vowel category and talker had a significant effect on vowel detectability. Detection thresholds predicted from three excitation pattern metrics by using a simulation model were well matched with thresholds obtained from human listeners, suggesting that listeners could use a constant metric in the excitation pattern of the vowel to detect the signal in noise independent of the vowel category and talker. Application of the simulation model to predict thresholds of vowel detection in noise was also discussed.

10.
OBJECTIVES/HYPOTHESIS: The purpose of this study was (1) to determine whether changes in intra- and interrater reliability occur for inexperienced listeners' judgments of overall severity, roughness, and breathiness in dysphonic and normal speakers after 2 hours of listener training; and (2) to determine the acoustic bases of inexperienced listeners' judgments before and after training. STUDY DESIGN: Prospective, single-group, pre-post design. METHODS: Thirty adult dysphonic and six normal speaker samples were selected from a database. Samples included 21 test stimuli and 15 training stimuli of both sustained vowels and connected speech. Sixteen inexperienced listeners judged all samples for overall severity, roughness, and breathiness using visual analog scales. Each listener provided pretraining ratings at baseline. Listeners were then trained using 15 anchor voice samples and 15 training stimuli. During training, listeners were provided with definitions of rating dimensions, accuracy feedback, and anchor samples. Listeners then judged test stimuli in a posttraining session. Speaker samples also were analyzed acoustically. RESULTS: Intrarater reliability was least variable for judgments of overall severity, but improved further with training. Listener judgments of roughness and breathiness in vowels were least reliable at baseline, but they significantly improved between listeners after training. Finally, measures of cepstral peak prominence significantly predicted all voice quality judgments except roughness in vowels, which was predicted by shimmer. The acoustic bases of group perceptual judgments did not seem to change with training. CONCLUSIONS: These findings have implications for developing training programs in perceptual evaluation and mapping relationships between acoustic and perceptual characteristics of voice disorders.

11.
The goal of this study was to determine whether there are acoustical differences between male and female voices and, if so, where exactly these differences lie. Extended speech samples were used. The recorded readings of a text by 31 women and by 24 men were analyzed by means of the Long-Term Average Spectrum (LTAS), extracting the amplitude values (in decibels) at intervals of 160 Hz over a range of 8 kHz. The results showed a significant difference between genders, as well as an interaction of gender and frequency level. The female voice showed greater levels of aspiration noise, located in the spectral regions corresponding to the third formant, which causes the female voice to have a more “breathy” quality than the male voice. The lower spectral tilt in the women's voices is another consequence of this presence of greater aspiration noise.
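
A long-term spectrum sampled every 160 Hz up to 8 kHz can be obtained by averaging short-frame power spectra. The sketch below is one way to do this under assumed settings that the abstract does not state: a 16-kHz sampling rate, 100-sample non-overlapping frames (which gives exactly 160 Hz between DFT bins), and a Hann window. It uses a direct DFT from the standard library rather than an FFT, and it expects a signal at least one frame long.

```python
import cmath
import math

def ltas_db(signal, frame_len=100):
    """Average power spectrum over non-overlapping frames, in dB.

    With fs = 16000 Hz and frame_len = 100, bins are 160 Hz apart and
    bins 0..50 cover 0 to 8000 Hz.
    """
    n_bins = frame_len // 2 + 1
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    # Precomputed DFT kernel: twiddle[k][n] = exp(-2j*pi*k*n/frame_len).
    twiddle = [[cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)] for k in range(n_bins)]
    n_frames = len(signal) // frame_len
    power = [0.0] * n_bins
    for f in range(n_frames):
        chunk = signal[f * frame_len:(f + 1) * frame_len]
        frame = [w * s for w, s in zip(window, chunk)]
        for k in range(n_bins):
            acc = sum(t * v for t, v in zip(twiddle[k], frame))
            power[k] += abs(acc) ** 2
    # The small floor avoids log10(0) for bins with no energy.
    return [10 * math.log10(p / n_frames + 1e-12) for p in power]

# Illustration: a 1600-Hz tone should peak in bin 10 (10 * 160 Hz).
fs = 16000
tone = [math.sin(2 * math.pi * 1600 * i / fs) for i in range(3200)]
spectrum = ltas_db(tone)
peak_hz = 160 * spectrum.index(max(spectrum))
```

Applied to recordings of each speaker, the resulting 51 values per voice would be the kind of per-frequency amplitudes that could be compared across genders.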

12.
This study focuses on speaking voice quality in male teachers (n = 35) and male actors (n = 36), who represent untrained and trained voice users, because we wanted to investigate normal and supranormal voices. In this study, both substantive and methodological aspects were considered. It includes a method for perceptual voice evaluation, and a basic issue was rater reliability. A listening group of 10 listeners, 7 experienced speech-language therapists, and 3 speech-language therapist students evaluated the voices by 15 vocal characteristics using visual analog (VA) scales. Two sets of voice signals were investigated: text reading (2 loudness levels) and sustained vowel (3 levels). The results indicated a high interrater reliability for most perceptual characteristics. Connected speech was evaluated more reliably, especially at the normal level, but both types of voice signals were evaluated reliably, although the reliability for connected speech was somewhat higher than for vowels. Experienced listeners tended to be more consistent in their ratings than did the student raters. Some vocal characteristics achieved acceptable reliability even with a smaller panel of listeners. The perceptual characteristics grouped in 4 factors reflected perceptual dimensions.

13.
The speech-reception threshold (SRT) for sentences presented in a fluctuating interfering background sound of 80 dBA SPL is measured for 20 normal-hearing listeners and 20 listeners with sensorineural hearing impairment. The interfering sounds range from steady-state noise, via modulated noise, to a single competing voice. Two voices are used, one male and one female, and the spectrum of the masker is shaped according to these voices. For both voices, the SRT is measured both in noise spectrally shaped according to the target voice and in noise shaped according to the other voice. The results show that, for normal-hearing listeners, the SRT for sentences in modulated noise is 4-6 dB lower than for steady-state noise; for sentences masked by a competing voice, this difference is 6-8 dB. For listeners with moderate sensorineural hearing loss, elevated thresholds are obtained without an appreciable effect of masker fluctuations. The implications of these results for estimating a hearing handicap in everyday conditions are discussed. By using the articulation index (AI), it is shown that hearing-impaired individuals perform more poorly than suggested by the loss of audibility for some parts of the speech signal. Finally, three mechanisms are discussed that contribute to the absence of unmasking by masker fluctuations in hearing-impaired listeners. The low sensation level at which the impaired listeners receive the masker seems a major determinant. The second and third factors are reduced temporal resolution and a reduction in comodulation masking release, respectively.

14.
Two sounds with the same pitch may differ from each other in the salience of their pitch sensation. This perceptual attribute is called "pitch strength." The study of voice pitch strength may be important in quantifying normal and pathological voice qualities. The present study investigated how pitch strength varies across normal and dysphonic voices. A set of voices (vowel /a/) selected from the Kay Elemetrics Disordered Voice Database served as the stimuli. These stimuli demonstrated a wide range of voice quality. Ten listeners judged the pitch strength of these stimuli in an anchored magnitude estimation task. On a given trial, listeners heard three different stimuli. The first stimulus represented very low pitch strength (wide-band noise), the second stimulus consisted of the target voice and the third stimulus represented very high pitch strength (pure tone). Listeners estimated pitch strength of the target voice by positioning a continuous slider labeled with values between 0 and 1, reflecting the two anchor stimuli. Results revealed that listeners can judge pitch strength reliably in dysphonic voices. Moderate to high correlations with perceptual judgments of voice quality suggest that pitch strength may contribute to voice quality judgments.

15.
Psychometric functions for level discrimination
To determine the form of psychometric functions for 2I, 2AFC level discrimination (commonly called intensity discrimination), ten increment levels were presented in random order within blocks of 100 trials. Stimuli were chosen to encompass a wide range of conditions and difference limens: eight 10-ms tones had frequencies of 0.25, 1, 8, or 14 kHz and levels of 30, 60, or 90 dB SPL; two 500-ms stimuli also were tested: a 1-kHz tone at 90 dB SPL and broadband noise at 63 dB SPL. For each condition, at least 20 blocks were presented in mixed order. Results for five normal listeners show that the sensitivity, d', is nearly proportional to ΔL (= 20 log[(p + Δp)/p], where p is sound pressure) over the entire range of difference limens. When d' is plotted against Weber fractions for sound pressure, Δp/p, or intensity, ΔI/I, exponents of the best-fitting power functions decrease with increasing difference limens and are less than unity for large difference limens. The approximately proportional relation between d' and ΔL agrees with modern multichannel models of level discrimination and with psychometric functions derived for single auditory-nerve fibers. The results also support the notion that the difference limen, expressed as ΔL_DL and plotted on a logarithmic scale, is an appropriate representation of performance in level-discrimination experiments.
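
The three ways of expressing an increment that this abstract compares (level difference in dB, pressure Weber fraction, intensity Weber fraction) are related by simple formulas. The helper functions below are an illustrative sketch with hypothetical names; they assume the logarithm in the abstract's formula is base 10 and that intensity is proportional to pressure squared.

```python
import math

def delta_l_from_pressure(weber_p):
    """Level difference in dB from the pressure Weber fraction (delta p / p)."""
    return 20 * math.log10(1 + weber_p)

def intensity_weber_from_pressure(weber_p):
    """Intensity Weber fraction (delta I / I), taking I proportional to p**2."""
    return (1 + weber_p) ** 2 - 1

def delta_l_from_intensity(weber_i):
    """Level difference in dB from the intensity Weber fraction."""
    return 10 * math.log10(1 + weber_i)

# Doubling the pressure (delta p / p = 1) is a level increment of about 6 dB.
doubling_db = delta_l_from_pressure(1.0)
```

For small increments the level difference is nearly proportional to the pressure Weber fraction, but the two diverge as increments grow, which is why the choice of measure matters when difference limens are large.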

16.
To identify a speaker's sex, listeners may rely on sex-based differences in average fundamental frequency (F0), but overlap in male and female F0 ranges undermines such judgments. To test accuracy of sex-identification throughout the F0 range, listeners were asked to judge sex based on audio recordings of /ɑ/ spoken on a number of overlapping steady F0s by 10 male and 10 female English speakers. In general, listeners performed above chance (71.6% correct). However, near range extrema, listeners followed an apparent bias toward hearing high F0s as female and low as male; confidence was high when accuracy was high and vice versa. At mid-range, listeners identified sex fairly accurately but were not very confident in their judgments. In a forced-choice task, vowels close in F0 (but beyond the difference limen) were presented in male-female or female-male pairs. Listeners weakly identified speaker sex (63.3% correct). Identification of the male voice was considerably above chance only when the male had the lower F0 of the pair. Reliance on stereotypes of speaking F0 may bias listeners to hear low F0s as male and high F0s as female, perhaps with a contribution from vocal-tract length information. No strong evidence for a contribution of voice quality was obtained.

17.
It is commonly assumed that one can always assign a direction (upward or downward) to a percept of pitch change. The present study shows that this is true for some, but not all, listeners. Frequency difference limens (FDLs, in cents) for pure tones roved in frequency were measured in two conditions. In one condition, the task was to detect frequency changes; in the other condition, the task was to identify the direction of frequency changes. For three listeners, the identification FDL was about 1.5 times smaller than the detection FDL, as predicted (counterintuitively) by signal detection theory under the assumption that performance in the two conditions was limited by one and the same internal noise. For three other listeners, however, the identification FDL was much larger than the detection FDL. The latter listeners had relatively high detection FDLs. They had no difficulty in identifying the direction of just-detectable changes in intensity, or in the frequency of amplitude modulation. Their difficulty in perceiving the direction of small frequency/pitch changes showed up not only when the task required absolute judgments of direction, but also when the directions of two successive frequency changes had to be judged as identical or different.

18.
Recent experimental studies showed that isotropic vocal fold models were often blown wide apart and thus not able to maintain adductory position, resulting in voice production with noticeable breathy quality. This study showed that the capability of the vocal fold to resist deformation against airflow and maintain adductory position can be improved by increasing the body-layer stiffness or the anterior-posterior tension of the vocal folds, which presumably can be achieved through the contraction of the thyroarytenoid (TA) and cricothyroid (CT) muscles, respectively. Experiments in both physical models and excised larynges showed that, when these restraining mechanisms were activated, the vocal folds were better able to maintain effective adduction, resulting in voice production with much clearer quality and reduced breathiness. In humans, one or more restraining mechanisms may be activated at different levels to accommodate the varying degree of restraining required under different voice conditions.

19.
Studies have raised concerns about the reliability of traditional clinical perceptual voice evaluation. References, training, and the analysis-by-synthesis method were proposed to improve this reliability. However, no study has directly compared the effectiveness of these methods. This study compared two training programs that are based on an anchors method and a paired comparison method. The aim of the programs was to improve the ability of naive listeners to detect subtle differences in breathiness. This study found that trained listeners showed significant improvement after training. Equivocal results were found as to which of these training methods was more effective. However, it is suggested that listeners should be trained before they use the analysis-by-synthesis method. The findings of this study provided more information on developing the theoretical framework proposed by Kreiman et al., in particular the "nature" of the internal representations of voice quality. An exemplar-based approach for processing breathiness is proposed.

20.
A positive reinforcement conditioning procedure was used to train chinchillas to respond to intensity differences between successively occurring tone bursts. Intensity difference limens were measured at 0.5, 1, 4, and 8 kHz at five intensities ranging from 10- to 55-dB sensation level. The intensity difference limen decreased from approximately 8 dB near threshold to approximately 3.5 dB at the highest level. The intensity difference limens for the chinchilla were considerably larger than those for humans as well as several other mammals; however, the results were similar to those obtained for the parakeet. The present results from intensity discrimination appeared to be related to previous data for the discrimination of amplitude modulated noise.
