Similar literature
1.
Perception of breathy voice quality appears to be cued by changes in the vowel spectrum. These changes are related to alterations in the intensity of aspiration noise and spectral slope of the harmonic energy [Shrivastav and Sapienza, J. Acoust. Soc. Am., 114 (4), 2217-2224 (2003)]. Ten young-adult listeners with normal hearing were tested using an adaptive listening task to determine the smallest change in signal-to-noise ratio that resulted in a change in breathiness. Six vowel continua, three female and three male, were generated using a Klatt synthesizer and served as stimuli. Results showed that listeners needed as much as a 20-dB increase in aspiration noise to perceive a change in breathiness against a relatively normal voice. In contrast, listeners needed approximately an 11-dB increase in aspiration noise to discriminate breathiness against a severely breathy voice. The difference limens for breathiness were observed to vary across the six talkers. Voices having aspiration noise that was predominantly in the high frequencies had smaller difference limens. No significant differences between male and female voices were observed.

2.
A phonetogram is a graph showing the sound pressure level (SPL) of softest and loudest phonation over the entire fundamental frequency range of a voice. A physiological interpretation of a phonetogram is facilitated if the SPL is measured with a flat frequency curve and if the vowel /a/ is used. It was found that in soft phonation, the SPL is mainly dependent on the amplitude of the fundamental, while in loud phonation, the SPL is mainly determined by overtones. The short-term SPL variation, i.e., the level variation within a tone, was about 5 dB in soft phonation and close to 2 dB in loud phonation. For two normal voices the long-term SPL variation, calculated as the mean standard deviation of SPL for day-to-day variation, was found to be between 2.4 and 3.4 dB in soft and loud phonation. Speakers who raise their loudness of phonation also tend to raise their mean voice fundamental frequency. Measures obtained from speaking at various voice levels were combined so that typical pathways could be introduced into the phonetogram. The average slope of these pathways was 0.3–0.5 st/dB for healthy subjects. Averaged phonetograms for male singers and male nonsingers did not differ significantly, but averaged phonetograms for female singers and female nonsingers did, in that the upper contour was higher for the female singers. Averaged phonetograms for female patients with non-organic dysphonia showed significantly lower SPL values in loudest phonation as compared to healthy female subjects, while no corresponding difference was seen for males in this regard. With respect to the SPL values for softest phonation, male dysphonic patients showed significantly higher SPL values than healthy male subjects, while no corresponding difference was seen in female subjects. The subglottal pressure mirrored these phonetogram differences between healthy and pathological voices. 
The averaged phonetograms of female patients after voice therapy showed an increased similarity with those of normal voices. For the male patients the averaged phonetogram did not change significantly after therapy.
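The pathway slope quoted in this abstract (0.3–0.5 st/dB) relates fundamental frequency in semitones to SPL in dB. The following is a minimal sketch of how such a slope could be estimated by least squares; the function names, the 100 Hz reference, and the sample values are illustrative assumptions, not the authors' procedure.

```python
import math

def hz_to_semitones(f_hz, ref_hz=100.0):
    """Convert a frequency to semitones relative to ref_hz (reference is an assumption)."""
    return 12.0 * math.log2(f_hz / ref_hz)

def pathway_slope(spl_db, f0_hz):
    """Least-squares slope of F0 (semitones) against SPL (dB), in st/dB."""
    st = [hz_to_semitones(f) for f in f0_hz]
    n = len(spl_db)
    mean_x = sum(spl_db) / n
    mean_y = sum(st) / n
    sxx = sum((x - mean_x) ** 2 for x in spl_db)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(spl_db, st))
    return sxy / sxx

# Hypothetical speaker whose F0 rises by exactly 0.4 semitones per dB.
spl = [60.0, 65.0, 70.0, 75.0, 80.0]
f0 = [100.0 * 2 ** (0.4 * (s - 60.0) / 12.0) for s in spl]
slope = pathway_slope(spl, f0)
```

The choice of reference frequency only shifts the semitone values by a constant, so it does not affect the slope.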

3.
Voice quality variations include a set of voicing sound source modifications ranging from laryngealized to normal to breathy phonation. Analysis of reiterant imitations of two sentences by ten female and six male talkers has shown that the potential acoustic cues to this type of voice quality variation include: (1) increases to the relative amplitude of the fundamental frequency component as open quotient increases; (2) increases to the amount of aspiration noise that replaces higher frequency harmonics as the arytenoids become more separated; (3) increases to lower formant bandwidths; and (4) introduction of extra pole zeros in the vocal-tract transfer function associated with tracheal coupling. Perceptual validation of the relative importance of these cues for signaling a breathy voice quality has been accomplished using a new voicing source model for synthesis of more natural male and female voices. The new formant synthesizer, KLSYN88, is fully documented here. Results of the perception study indicate that, contrary to previous research which emphasizes the importance of increased amplitude of the fundamental component, aspiration noise is perceptually most important. Without its presence, increases to the fundamental component may induce the sensation of nasality in a high-pitched voice. Further results of the acoustic analysis include the observations that: (1) over the course of a sentence, the acoustic manifestations of breathiness vary considerably--tending to increase for unstressed syllables, in utterance-final syllables, and at the margins of voiceless consonants; (2) on average, females are more breathy than males, but there are very large differences between subjects within each gender; (3) many utterances appear to end in a "breathy-laryngealized" type of vibration; and (4) diplophonic irregularities in the timing of glottal periods occur frequently, especially at the end of an utterance. 
Diplophonia and other deviations from perfect periodicity may be important aspects of naturalness in synthesis.

4.
The purpose of this study was to examine the measurement capabilities of the algorithms used in the Time-Frequency Analysis Software Program for 32-bit Windows (TF32) for measuring fundamental frequency (F0), its dependent measures, and signal-to-noise ratio (SNR). The stability, accuracy, and linearity of its algorithm in response to systematic changes in aspiration noise and/or spectral slope (to mimic the perceptual characteristics of breathiness, roughness, and hoarseness) were evaluated using its analysis output for five female and five male synthesized voices. TF32 was used to calculate F0, Jitter%, Shimmer%, and SNR for each of the synthesized signals. The findings indicate that although TF32 produced stable results for male synthesized samples, the results were not accurate when measuring F0, Jitter%, and Shimmer% with the addition of noise and variations in open quotient, independently and in combination. In contrast, TF32 was neither stable nor accurate in making the same measurements for female synthesized samples. However, TF32 was stable and accurate in measuring SNR for male and most female voices. These results point to an inappropriate F0 extraction algorithm in TF32 and stress the need for further research to remediate the algorithm or to identify a superior one.
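The abstract does not state how TF32 defines Jitter% and Shimmer%. As a rough illustration only, the sketch below uses one common "local" definition: the mean absolute difference between consecutive cycles, divided by the mean value, expressed as a percentage. The function name and the sample data are assumptions.

```python
def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean value.

    Applied to glottal period lengths this gives a local jitter estimate;
    applied to per-cycle peak amplitudes it gives a local shimmer estimate.
    This is one common definition, not necessarily the one TF32 uses.
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_abs_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_abs_diff / mean_value

# Hypothetical period lengths in milliseconds and peak amplitudes (arbitrary units).
periods_ms = [5.00, 5.02, 4.99, 5.01, 5.00]
amplitudes = [0.80, 0.82, 0.79, 0.81, 0.80]
jitter_pct = local_perturbation_percent(periods_ms)
shimmer_pct = local_perturbation_percent(amplitudes)
```

Both measures depend on correct cycle boundaries, which is why an unreliable F0 extraction step propagates directly into the jitter and shimmer estimates.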

5.
Two experiments compared the effect of supplying visual speech information (e.g., lipreading cues) on the ability to hear one female talker's voice in the presence of steady-state noise or a masking complex consisting of two other female voices. In the first experiment intelligibility of sentences was measured in the presence of the two types of maskers with and without perceived spatial separation of target and masker. The second study tested detection of sentences in the same experimental conditions. Results showed that visual cues provided more benefit for both recognition and detection of speech when the masker consisted of other voices (versus steady-state noise). Moreover, visual cues provided greater benefit when the target speech and masker were spatially coincident versus when they appeared to arise from different spatial locations. The data obtained here are consistent with the hypothesis that lipreading cues help to segregate a target voice from competing voices, in addition to the established benefit of supplementing masked phonetic information.

6.
The speech-reception threshold (SRT) for sentences presented in a fluctuating interfering background sound of 80 dBA SPL is measured for 20 normal-hearing listeners and 20 listeners with sensorineural hearing impairment. The interfering sounds range from steady-state noise, via modulated noise, to a single competing voice. Two voices are used, one male and one female, and the spectrum of the masker is shaped according to these voices. For both voices, the SRT is measured in noise spectrally shaped according to the target voice as well as in noise shaped according to the other voice. The results show that, for normal-hearing listeners, the SRT for sentences in modulated noise is 4-6 dB lower than for steady-state noise; for sentences masked by a competing voice, this difference is 6-8 dB. For listeners with moderate sensorineural hearing loss, elevated thresholds are obtained without an appreciable effect of masker fluctuations. The implications of these results for estimating a hearing handicap in everyday conditions are discussed. By using the articulation index (AI), it is shown that hearing-impaired individuals perform more poorly than suggested by the loss of audibility for some parts of the speech signal. Finally, three mechanisms are discussed that contribute to the absence of unmasking by masker fluctuations in hearing-impaired listeners. The low sensation level at which the impaired listeners receive the masker seems a major determinant. The second and third factors are reduced temporal resolution and a reduction in comodulation masking release, respectively.

7.
It can be difficult for the voice clinician to observe or measure how a patient uses his voice in a noisy environment. We consider here a novel method for obtaining this information in the laboratory. Worksite noise and filtered white noise were reproduced over high-fidelity loudspeakers. In this noise, 11 subjects read an instructional text of 1.5 to 2 minutes' duration, as if addressing a group of people. Using channel estimation techniques, the site noise was suppressed from the recording, and the voice signal alone was recovered. The attainable noise rejection is limited only by the precision of the experimental setup, which includes the need for the subject to remain still so as not to perturb the estimated acoustic channel. This feasibility study, with 7 female and 4 male subjects, showed that small displacements of the speaker's body, even breathing, impose a practical limit on the attainable noise rejection. The noise rejection was typically 30 dB and maximally 40 dB down over the entire voice spectrum. Recordings thus processed were clean enough to permit voice analysis with the long-term average spectrum and the computerized phonetogram. The effects of site noise on voice sound pressure level, fundamental frequency, long-term average spectrum centroid, phonetogram area, and phonation time were much as expected, but with some interesting differences between females and males.

8.
The purpose of this study was to examine the influence of noise on voice profile statistics from female samples. Six young adult females served as subjects. Five had normal voices; one had a pathological voice with accompanying bilateral vocal nodules. Each female subject was required to match a generated 235 Hz tone (+/- 2 Hz) while maintaining a constant output level of 70 dB SPL (+/- 5 dB). Data collected from a previous study involving a normal male subject were included for comparative purposes. Noise was generated from a personal computer fan which had a strong center frequency component at 235 Hz. Six different A-weighted signal-to-noise [S/N(A)] conditions were created, ranging in 5 dB increments from 25 to 0 dB. Results revealed that fundamental frequency was reasonably resistant to the effects of noise and to the effects of the noisy (pathological) voice signal. Jitter and shimmer estimates generally increased as noise floors elevated. The greatest amount of measurement error was found for the pathological female voice when captured in the presence of environmental noise. Findings are discussed relative to clinical issues surrounding measurement error.

9.
Although the amount of inharmonic energy (noise) present in a human voice is an important determinant of vocal quality, little is known about the perceptual interaction between harmonic and inharmonic aspects of the voice source. This paper reports three experiments investigating this issue. Results indicate that perception of the harmonic slope and of noise levels are both influenced by complex interactions between the spectral shape and relative levels of harmonic and noise energy in the voice source. Just-noticeable differences (JNDs) for the noise-to-harmonics ratio (NHR) varied significantly with the NHR and harmonic spectral slope, but NHR had no effect on JNDs for NHR when harmonic slopes were steepest, and harmonic slope had no effect when NHRs were highest. Perception of changes in the harmonic source slope depended on NHR and on the harmonic source slope: JNDs increased when spectra rolled off steeply, with this effect in turn depending on NHR. Finally, all effects were modulated by the shape of the noise spectrum. It thus appears that, beyond masking, understanding perception of individual parameters requires knowledge of the acoustic context in which they function, consistent with the view that voices are integral patterns that resist decomposition.

10.
Subglottal pressure is one of the main voice control factors, controlling vocal loudness. In this investigation the effects of subglottal pressure variation on the voice source in untrained female and male voices phonating at a low, a middle, and a high fundamental frequency are analyzed. The subjects produced a series of /pæ/ syllables at varied degrees of vocal loudness, attempting to keep pitch constant. Subglottal pressure was estimated from the oral pressure during the /p/ occlusion. Ten subglottal pressure values, approximately equidistantly spaced within the pressure range used, were identified, and the voice source of the vowels following these pressure values was analyzed by inverse filtering the airflow signal as captured by a Rothenberg mask. The maximum flow declination rate (MFDR) was found to increase linearly with subglottal pressure, but a given subglottal pressure produced lower values for female than for male voices. The closed quotient increased quickly with subglottal pressure at low pressures and slowly at high pressures, such that the relationship can be approximated by a power function. For a given subglottal pressure value, female voices reached lower values of closed quotient than male voices.

11.
Spectral analysis of vowels during connected speech can be performed using the spectral intensity distribution within critical bands corresponding to a natural scale on the basilar membrane. Normalization of the spectra provides the opportunity to make objective comparisons independent of the recording level. An increasing envelope peak between 3,150 and 3,700 Hz has been confirmed statistically for a combination of seven vowels in three groups of male speakers with hoarse, normal, and professional voices. Each vowel is also analyzed individually. The local energy maximum is called “the speaker's formant” and can be found in the region of the fourth formant. The steepness of the spectral slope (i.e., the rate of decline) becomes less pronounced when the sonority or the intensity of the voice increases. The speaker's formant is connected with the sonorous quality of the voice. It increases gradually and is approximately 10 dB higher in professional male voices than in normal male voices at neutral loudness (60 dB at 0.3 m). The peak intensity becomes stronger (30 dB above normal voices) when the overall speaking loudness is increased to 80 dB. Shouting increases the spectral energy of the adjacent critical bands but not the speaker's formant itself.

12.
The third formant (F3) of /a/ recorded from 209 healthy children (104 male and 105 female; ages 3 to 12 years) and 40 adults (20 men and 20 women) was studied by spectral analysis. Contrary to the traditional concept, the results of this study showed that there is a significant difference in the F3 of /a/ between male and female children. This difference was found to begin to develop at the age of 3 years and became substantial by the age of 6 years. In this study, the value of F3 obtained from female children at the age of 6 years was unexpectedly higher than that from male children of the same age, which indicates that there is a difference in timbre between young children of the two sexes.

13.
Functional (nonorganic) dysphonia is often characterized by vocal instability. The purpose of this prospective study was to examine whether vocal instability differs between functionally dysphonic voices and healthy ones, that is, whether electroglottographic perturbation values differ (1) between healthy and dysphonic voices and (2) between two subgroups of dysphonic voices (hypertonic and hypotonic dysphonic voices). Twenty-three patients with hypertonic functional dysphonia, 9 with hypotonic functional dysphonia, and 31 healthy nonsmokers were each examined electroglottographically before (Ex 1), immediately after (Ex 2), and 1 hour after (Ex 3) voice loading. Perturbations of frequency, amplitude, quasi-open-quotient, and contact-index were calculated from the EGG signal. At all three times of examination, hypertonic dysphonic voices showed higher perturbations than healthy voices, and they had higher perturbations than hypotonic dysphonic voices before and 1 hour after voice loading. Hypotonic dysphonic voices showed higher perturbations than healthy voices only 1 hour after voice loading. Voice loading induced different reactions in dysphonic voices: some voices showed increased perturbations, and others exhibited normal or even decreased perturbation immediately after voice loading. Examination of electroglottographically derived perturbations immediately after voice loading therefore does not seem to be useful. Differentiation of hypertonic and hypotonic dysphonic voices was possible with an estimated sensitivity of 88.9% and a specificity of 87.0% by using the sum of the amplitude-perturbation and the quasi-open-quotient-perturbation measured before voice loading.
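As a worked illustration of the sensitivity and specificity figures in this abstract, the sketch below computes both from a confusion table. The counts are an assumption: the abstract does not report them, but 8 of 9 and 20 of 23 are consistent with the stated group sizes and percentages if the hypotonic group is treated as the positive class.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-table counts."""
    sensitivity = tp / (tp + fn)  # share of positives correctly identified
    specificity = tn / (tn + fp)  # share of negatives correctly identified
    return sensitivity, specificity

# Assumed counts (not reported in the abstract):
# 9 hypotonic patients, 8 classified correctly; 23 hypertonic patients, 20 classified correctly.
sens, spec = sensitivity_specificity(tp=8, fn=1, tn=20, fp=3)
print(round(100 * sens, 1), round(100 * spec, 1))  # prints: 88.9 87.0
```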

14.
Speech recognition in noisy environments improves when the speech signal is spatially separated from the interfering sound. This effect, known as spatial release from masking (SRM), was recently shown in young children. The present study compared SRM in children of ages 5-7 with adults for interferers introducing energetic, informational, and/or linguistic components. Three types of interferers were used: speech, reversed speech, and modulated white noise. Two female voices with different long-term spectra were also used. Speech reception thresholds (SRTs) were compared for: Quiet (target 0 degrees front, no interferer), Front (target and interferer both 0 degrees front), and Right (interferer 90 degrees right, target 0 degrees front). Children had higher SRTs and greater masking than adults. When spatial cues were not available, adults, but not children, were able to use differences in interferer type to separate the target from the interferer. Both children and adults showed SRM. Children, unlike adults, demonstrated large amounts of SRM for a time-reversed speech interferer. In conclusion, masking and SRM vary with the type of interfering sound, and this variation interacts with age; SRM may not depend on the spectral peculiarities of a particular type of voice when the target speech and interfering speech are different sex talkers.

15.
The efficiency of automatic recognition of male and female voices based on solving the inverse problem for glottis area dynamics and for the waveform of the glottal airflow volume velocity pulse is studied. The inverse problem is regularized through the use of analytical models of the voice excitation pulse and of the dynamics of the glottis area, as well as a model of one-dimensional glottal airflow. Parameters of these models and spectral parameters of the volume velocity pulse are considered. The following parameters are found to be most promising: the instant of maximum glottis area, the maximum derivative of the area, the slope of the spectrum of the glottal airflow volume velocity pulse, the amplitude ratios of harmonics of this spectrum, and the pitch. On the plane of the first two principal components in the space of these parameters, an almost twofold decrease in the classification error relative to that for the pitch alone is attained. The male voice recognition probability is found to be 94.7%, and the female voice recognition probability is 95.9%.

16.
Listeners are more likely to hear a synthetic fricative ambiguous between /s/ and /ʃ/ as /ʃ/ if it is appended to a woman's voice than a man's voice [Strand and Johnson, in Natural Language Processing and Speech Technology: Results of the 3rd KONVENS Conference (Mouton de Gruyter, Berlin, 1996), pp. 14-26]. This study expanded on this finding by replicating the result with a much larger group of male and female talkers than had been examined previously, by examining whether phonetic context mediates the influence of talker sex on fricative identification, and by examining whether talkers' perceived sexual orientation influences fricative identification. Stimuli were created by pairing a synthetic nine-step /s/-/ʃ/ continuum with tokens of /æk/ and /ɪp/ taken from productions of shack and ship by 44 talkers whose perceived sexual orientation had been reported previously [Munson et al., J. Phonetics (in press)]. Listeners participated in a series of two-alternative sack-shack and sip-ship identification experiments. Listeners identified more /ʃ/ tokens for women's voices than for men's voices for both continua. Lesbian/bisexual-sounding women elicited more sack and sip responses than heterosexual-sounding women. No consistent influence of perceived sexual orientation on fricative identification was noted for men's voices. Results suggest that listeners are sensitive to the association between fricatives' center frequencies and perceived sexual orientation in women's voices, but not in men's voices.

17.
The current study concerns speaking voice quality in two groups of professional voice users, teachers (n = 35) and actors (n = 36), representing untrained and trained voices, respectively. The voice quality of text reading at two intensity levels was acoustically analyzed. The central concept was the speaker's formant (SPF), related to the perceptual characteristics "better normal voice quality" (BNQ) and "worse normal voice quality" (WNQ). The purpose of the current study was to get closer to the origin of the phenomenon of the SPF, and to discover the differences in spectral and formant characteristics between the two professional groups and the two voice quality groups. The acoustic analyses were long-term average spectrum (LTAS) and spectrographic measurements of formant frequencies. At very high intensities, the spectral slope was rather quadrangular without a clear SPF peak. The trained voices had a higher energy level in the SPF region compared with the untrained, significantly so in loud phonation. The SPF seemed to be related to both sufficiently strong overtones and a glottal setting, allowing for a lowering of F4 and a closeness of F3 and F4. However, the existence of SPF also in LTAS of the WNQ voices implies that more research is warranted concerning the formation of SPF, and concerning the acoustic correlates of the BNQ voices.

18.
Experiments on disordered voice quality with multidimensional scaling (MDS) have resulted in solutions with low R-square and have failed to show consistent dimensions across different listeners. These findings have been suggested to indicate large individual differences in the perception of voice quality. However, these inconsistencies may originate from several factors, including random stimulus selection, instructions that encourage listeners to respond to global difference in pairs of voices, and noisy perceptual data. This experiment used MDS techniques to study individual differences in perception of breathiness. The voices in the experiment were selected to have a relatively wide variation in breathiness but only minimal variation in roughness, strain, and fundamental frequency. Additionally, listeners were instructed specifically to rate similarities in breathiness rather than judging global differences in voices, and several judgments from each listener were averaged to minimize noise in the data. It was hypothesized that these modifications would result in an MDS solution that accounted for greater variance in perceptual data than previously shown. Results show that averaging multiple responses from each listener increased the R-square from 45% to approximately 75%. The poor R-square and large individual differences in voice quality perception observed in past research may have partly resulted from the experimental procedures in previous studies. These findings suggest that individual differences in the perception of voice quality are not as large as previously thought, and a model of voice quality perception for an "average" listener may be a good representation for the general population.

19.
Two experiments investigated the effect of reverberation on listeners' ability to perceptually segregate two competing voices. Culling et al. [Speech Commun. 14, 71-96 (1994)] found that for competing synthetic vowels, masked identification thresholds were increased by reverberation only when combined with modulation of fundamental frequency (F0). The present investigation extended this finding to running speech. Speech reception thresholds (SRTs) were measured for a male voice against a single interfering female voice within a virtual room with controlled reverberation. The two voices were either (1) co-located in virtual space at 0 degrees azimuth or (2) separately located at +/-60 degrees azimuth. In experiment 1, target and interfering voices were either normally intonated or resynthesized with a fixed F0. In anechoic conditions, SRTs were lower for normally intonated and for spatially separated sources, while, in reverberant conditions, the SRTs were all the same. In experiment 2, additional conditions employed inverted F0 contours. Inverted F0 contours yielded higher SRTs in all conditions, regardless of reverberation. The results suggest that reverberation can seriously impair listeners' ability to exploit differences in F0 and spatial location between competing voices. The levels of reverberation employed had no effect on speech intelligibility in quiet.

20.
The correlation structures in Bach's 15 sinfonias were analyzed. Each sinfonia is characterized by the superposition of three voices. Each voice is a sequence of pitches. Each voice was transformed into a time series, in which the sampling time was given by the smallest pitch duration in that voice. The scaling properties of the three voices of each sinfonia were quantified by estimating the scaling exponent using the power spectral density (PSD) and detrended fluctuation analysis (DFA). The results show that the voice time series are persistent. The DFA was applied not only to each single voice time series, but also to pairs of voices (2-DFA) and to the triple of voices (3-DFA). It was found that the first voice of each sinfonia modulates the scaling behavior of the whole sinfonia.
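For readers unfamiliar with DFA, the following is a minimal sketch of first-order, single-series DFA in plain Python. It is not the authors' implementation and does not cover their 2-DFA or 3-DFA variants; the function names, window sizes, and the synthetic test signal are assumptions. A scaling exponent above 0.5 indicates a persistent series, and uncorrelated noise gives a value near 0.5.

```python
import math
import random

def linear_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def dfa_exponent(series, window_sizes):
    """Estimate the DFA scaling exponent of a non-constant series.

    Each window size must be at least 3 and no longer than the series.
    """
    mean = sum(series) / len(series)
    # Step 1: integrate the mean-removed series to obtain the profile.
    profile, total = [], 0.0
    for value in series:
        total += value - mean
        profile.append(total)
    log_n, log_f = [], []
    for n in window_sizes:
        n_windows = len(profile) // n
        t = list(range(n))
        squared_residuals = 0.0
        # Step 2: remove a linear trend from each non-overlapping window.
        for w in range(n_windows):
            segment = profile[w * n:(w + 1) * n]
            slope, intercept = linear_fit(t, segment)
            squared_residuals += sum(
                (y - (slope * x + intercept)) ** 2 for x, y in zip(t, segment)
            )
        # Step 3: root-mean-square fluctuation at this window size.
        fluctuation = math.sqrt(squared_residuals / (n_windows * n))
        log_n.append(math.log(n))
        log_f.append(math.log(fluctuation))
    # Step 4: the exponent is the slope of log F(n) against log n.
    alpha, _ = linear_fit(log_n, log_f)
    return alpha

# Synthetic check on uncorrelated Gaussian noise (seeded for repeatability).
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
alpha_noise = dfa_exponent(noise, [16, 32, 64, 128, 256])
```

For a pitch sequence, `series` would hold one pitch value per sampling step, with the step set to the shortest note duration in that voice as the abstract describes.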
