Similar articles
1.
Perception of breathy voice quality appears to be cued by changes in the vowel spectrum. These changes are related to alterations in the intensity of aspiration noise and spectral slope of the harmonic energy [Shrivastav and Sapienza, J. Acoust. Soc. Am., 114 (4), 2217-2224 (2003)]. Ten young-adult listeners with normal hearing were tested using an adaptive listening task to determine the smallest change in signal-to-noise ratio that resulted in a change in breathiness. Six vowel continua, three female and three male, were generated using a Klatt synthesizer and served as stimuli. Results showed that listeners needed as much as a 20-dB increase in aspiration noise to perceive a change in breathiness against a relatively normal voice. In contrast, listeners needed approximately an 11-dB increase in aspiration noise to discriminate breathiness against a severely breathy voice. The difference limens for breathiness were observed to vary across the six talkers. Voices having aspiration noise that was predominantly in the high frequencies had smaller difference limens. No significant differences between male and female voices were observed.

2.
A phonetogram is a graph showing the sound pressure level (SPL) of softest and loudest phonation over the entire fundamental frequency range of a voice. A physiological interpretation of a phonetogram is facilitated if the SPL is measured with a flat frequency curve and if the vowel /a/ is used. It was found that in soft phonation, the SPL is mainly dependent on the amplitude of the fundamental, while in loud phonation, the SPL is mainly determined by overtones. The short-term SPL variation, i.e., the level variation within a tone, was about 5 dB in soft phonation and close to 2 dB in loud phonation. For two normal voices the long-term SPL variation, calculated as the mean standard deviation of SPL for day-to-day variation, was found to be between 2.4 and 3.4 dB in soft and loud phonation. Speakers who raise their loudness of phonation also tend to raise their mean voice fundamental frequency. Measures obtained from speaking at various voice levels were combined so that typical pathways could be introduced into the phonetogram. The average slope of these pathways was 0.3–0.5 st/dB for healthy subjects. Averaged phonetograms for male singers and male nonsingers did not differ significantly, but averaged phonetograms for female singers and female nonsingers did, in that the upper contour was higher for the female singers. Averaged phonetograms for female patients with non-organic dysphonia showed significantly lower SPL values in loudest phonation as compared to healthy female subjects, while no corresponding difference was seen for males in this regard. With respect to the SPL values for softest phonation, male dysphonic patients showed significantly higher SPL values than healthy male subjects, while no corresponding difference was seen in female subjects. The subglottal pressure mirrored these phonetogram differences between healthy and pathological voices. 
The averaged phonetograms of female patients after voice therapy showed an increased similarity with those of normal voices. For the male patients the averaged phonetogram did not change significantly after therapy.
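The st/dB slope quoted in the abstract above can be illustrated with a short sketch. It assumes paired measurements of SPL (dB) and mean fundamental frequency (Hz) at several voice levels, converts F0 to semitones, and fits an ordinary least-squares line. The function names, the 100 Hz reference, and the use of least squares are illustrative assumptions; the abstract does not say how the slope was computed.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=100.0):
    """Convert a frequency in Hz to semitones relative to ref_hz (assumed reference)."""
    return 12.0 * math.log2(f0_hz / ref_hz)


def slope_st_per_db(spl_db, f0_hz):
    """Least-squares slope of F0 (semitones) against SPL (dB).

    spl_db and f0_hz are equal-length sequences of paired measurements.
    """
    if len(spl_db) != len(f0_hz) or len(spl_db) < 2:
        raise ValueError("need at least two paired measurements")
    semitones = [hz_to_semitones(f) for f in f0_hz]
    n = len(spl_db)
    mean_x = sum(spl_db) / n
    mean_y = sum(semitones) / n
    sxx = sum((x - mean_x) ** 2 for x in spl_db)
    if sxx == 0:
        raise ValueError("SPL values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(spl_db, semitones))
    return sxy / sxx


# Hypothetical data: F0 rises by exactly 0.4 semitones per dB.
levels = [60.0, 70.0, 80.0]
f0 = [100.0 * 2 ** (0.4 * (x - 60.0) / 12.0) for x in levels]
slope = slope_st_per_db(levels, f0)
```

The slope does not depend on the reference frequency, since changing it shifts every semitone value by the same constant.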

3.
Voice quality variations include a set of voicing sound source modifications ranging from laryngealized to normal to breathy phonation. Analysis of reiterant imitations of two sentences by ten female and six male talkers has shown that the potential acoustic cues to this type of voice quality variation include: (1) increases to the relative amplitude of the fundamental frequency component as open quotient increases; (2) increases to the amount of aspiration noise that replaces higher frequency harmonics as the arytenoids become more separated; (3) increases to lower formant bandwidths; and (4) introduction of extra pole zeros in the vocal-tract transfer function associated with tracheal coupling. Perceptual validation of the relative importance of these cues for signaling a breathy voice quality has been accomplished using a new voicing source model for synthesis of more natural male and female voices. The new formant synthesizer, KLSYN88, is fully documented here. Results of the perception study indicate that, contrary to previous research which emphasizes the importance of increased amplitude of the fundamental component, aspiration noise is perceptually most important. Without its presence, increases to the fundamental component may induce the sensation of nasality in a high-pitched voice. Further results of the acoustic analysis include the observations that: (1) over the course of a sentence, the acoustic manifestations of breathiness vary considerably--tending to increase for unstressed syllables, in utterance-final syllables, and at the margins of voiceless consonants; (2) on average, females are more breathy than males, but there are very large differences between subjects within each gender; (3) many utterances appear to end in a "breathy-laryngealized" type of vibration; and (4) diplophonic irregularities in the timing of glottal periods occur frequently, especially at the end of an utterance. 
Diplophonia and other deviations from perfect periodicity may be important aspects of naturalness in synthesis.

4.
The purpose of this study was to examine the measurement algorithms used in the Time-Frequency Analysis Software Program for 32-bit Windows (TF32) for measuring fundamental frequency (F0), its dependent measures, and signal-to-noise ratio (SNR). The stability, accuracy, and linearity of its algorithm under systematic changes in aspiration noise and/or spectral slope (to mimic the perceptual characteristics of breathiness, roughness, and hoarseness) were evaluated using its analysis output for five female and five male synthesized voices. TF32 was used to calculate F0, Jitter%, Shimmer%, and SNR for each of the synthesized signals. The findings indicate that although TF32 produced stable results for male synthesized samples, the results were not accurate when measuring F0, Jitter%, and Shimmer% with the addition of noise and variations in open quotient, independently and in combination. In contrast, TF32 was neither stable nor accurate in making the same measurements for female synthesized samples. However, TF32 was stable and accurate in measuring SNR for male and most female voices. These results point to an inappropriate F0 extraction algorithm in TF32 and stress the need for further research to remediate the algorithm or to identify a superior one.
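For orientation, the perturbation measures named in the abstract above can be sketched as follows. This uses the common "local" definition: the mean absolute difference between consecutive cycles, divided by the mean cycle value, in percent. That definition is an assumption; the abstract does not state which formulas TF32 uses, and the function names here are hypothetical. The sketch also assumes the glottal periods and peak amplitudes have already been extracted, which is the F0-dependent step the study found unreliable.

```python
def relative_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference relative to the mean value, in percent."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    if mean_value == 0:
        raise ValueError("mean value must be non-zero")
    return 100.0 * mean_diff / mean_value


def jitter_percent(periods_s):
    """Local jitter: perturbation of consecutive glottal period durations."""
    return relative_perturbation_percent(periods_s)


def shimmer_percent(peak_amplitudes):
    """Local shimmer: perturbation of consecutive cycle peak amplitudes."""
    return relative_perturbation_percent(peak_amplitudes)


# Hypothetical cycle data: a perfectly periodic voice, then an alternating one.
steady = jitter_percent([0.005, 0.005, 0.005, 0.005])
alternating = jitter_percent([4.0, 6.0, 4.0, 6.0])
```

Because both measures are computed from cycle boundaries, any error in F0 extraction carries over directly into the jitter and shimmer values.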

5.
Two experiments compared the effect of supplying visual speech information (e.g., lipreading cues) on the ability to hear one female talker's voice in the presence of steady-state noise or a masking complex consisting of two other female voices. In the first experiment intelligibility of sentences was measured in the presence of the two types of maskers with and without perceived spatial separation of target and masker. The second study tested detection of sentences in the same experimental conditions. Results showed that visual cues provided more benefit for both recognition and detection of speech when the masker consisted of other voices (versus steady-state noise). Moreover, visual cues provided greater benefit when the target speech and masker were spatially coincident versus when they appeared to arise from different spatial locations. The data obtained here are consistent with the hypothesis that lipreading cues help to segregate a target voice from competing voices, in addition to the established benefit of supplementing masked phonetic information.

6.
The speech-reception threshold (SRT) for sentences presented in a fluctuating interfering background sound of 80 dBA SPL is measured for 20 normal-hearing listeners and 20 listeners with sensorineural hearing impairment. The interfering sounds range from steady-state noise, via modulated noise, to a single competing voice. Two voices are used, one male and one female, and the spectrum of the masker is shaped according to these voices. For both voices, the SRT is measured in noise spectrally shaped according to the target voice as well as in noise shaped according to the other voice. The results show that, for normal-hearing listeners, the SRT for sentences in modulated noise is 4-6 dB lower than for steady-state noise; for sentences masked by a competing voice, this difference is 6-8 dB. For listeners with moderate sensorineural hearing loss, elevated thresholds are obtained without an appreciable effect of masker fluctuations. The implications of these results for estimating a hearing handicap in everyday conditions are discussed. By using the articulation index (AI), it is shown that hearing-impaired individuals perform more poorly than suggested by the loss of audibility for some parts of the speech signal. Finally, three mechanisms are discussed that contribute to the absence of unmasking by masker fluctuations in hearing-impaired listeners. The low sensation level at which the impaired listeners receive the masker seems a major determinant. The second and third factors are reduced temporal resolution and a reduction in comodulation masking release, respectively.

7.
It can be difficult for the voice clinician to observe or measure how a patient uses his voice in a noisy environment. We consider here a novel method for obtaining this information in the laboratory. Worksite noise and filtered white noise were reproduced over high-fidelity loudspeakers. In this noise, 11 subjects read an instructional text of 1.5 to 2 minutes duration, as if addressing a group of people. Using channel estimation techniques, the site noise was suppressed from the recording, and the voice signal alone was recovered. The attainable noise rejection is limited only by the precision of the experimental setup, which includes the need for the subject to remain still so as not to perturb the estimated acoustic channel. This feasibility study, with 7 female and 4 male subjects, showed that small displacements of the speaker's body, even breathing, impose a practical limit on the attainable noise rejection. The noise rejection was typically 30 dB and maximally 40 dB down over the entire voice spectrum. Recordings thus processed were clean enough to permit voice analysis with the long-term average spectrum and the computerized phonetogram. The effects of site noise on voice sound pressure level, fundamental frequency, long-term average spectrum centroid, phonetogram area, and phonation time were much as expected, but with some interesting differences between females and males.

8.
The purpose of this study was to examine the influence of noise on voice profile statistics from female samples. Six young adult females served as subjects. Five had normal voices; one had a pathological voice with accompanying bilateral vocal nodules. Each female subject was required to match a generated 235 Hz tone (+/- 2 Hz) while maintaining a constant output level of 70 dB SPL (+/- 5 dB). Data collected from a previous study involving a normal male subject were included for comparative purposes. Noise was generated from a personal computer fan which had a strong center frequency component at 235 Hz. Six different A-weighted signal-to-noise [S/N(A)] conditions were created, ranging in 5 dB increments from 25 to 0 dB. Results revealed that fundamental frequency was reasonably resistant to the effects of noise and to the effects of the noisy (pathological) voice signal. Jitter and shimmer estimates generally increased as noise floors elevated. The greatest amount of measurement error was found for the pathological female voice when captured in the presence of environmental noise. Findings are discussed relative to clinical issues surrounding measurement error.

9.
Although the amount of inharmonic energy (noise) present in a human voice is an important determinant of vocal quality, little is known about the perceptual interaction between harmonic and inharmonic aspects of the voice source. This paper reports three experiments investigating this issue. Results indicate that perception of the harmonic slope and of noise levels are both influenced by complex interactions between the spectral shape and relative levels of harmonic and noise energy in the voice source. Just-noticeable differences (JNDs) for the noise-to-harmonics ratio (NHR) varied significantly with the NHR and harmonic spectral slope, but NHR had no effect on JNDs for NHR when harmonic slopes were steepest, and harmonic slope had no effect when NHRs were highest. Perception of changes in the harmonic source slope depended on NHR and on the harmonic source slope: JNDs increased when spectra rolled off steeply, with this effect in turn depending on NHR. Finally, all effects were modulated by the shape of the noise spectrum. It thus appears that, beyond masking, understanding perception of individual parameters requires knowledge of the acoustic context in which they function, consistent with the view that voices are integral patterns that resist decomposition.

10.
Spectral analysis of vowels during connected speech can be performed using the spectral intensity distribution within critical bands corresponding to a natural scale on the basilar membrane. Normalization of the spectra provides the opportunity to make objective comparisons independent of the recording level. An increasing envelope peak between 3,150 and 3,700 Hz has been confirmed statistically for a combination of seven vowels in three groups of male speakers with hoarse, normal, and professional voices. Each vowel is also analyzed individually. The local energy maximum is called “the speaker's formant” and can be found in the region of the fourth formant. The steepness of the spectral slope (i.e. the rate of decline) becomes less pronounced when the sonority or the intensity of the voice increases. The speaker's formant is connected with the sonorous quality of the voice. It increases gradually and is approximately 10 dB higher in professional male voices than in normal male voices at neutral loudness (60 dB at 0.3 m). The peak intensity becomes stronger (30 dB above normal voices) when the overall speaking loudness is increased to 80 dB. Shouting increases the spectral energy of the adjacent critical bands but not the speaker's formant itself.

