Similar literature
1.
In this paper, we investigated the acoustic characteristics of sustained and running vowels from normal subjects and patients with laryngeal pathologies. Perturbation methods (including jitter and shimmer), signal-to-noise ratio (SNR), and nonlinear dynamic methods (such as correlation dimension and second-order entropy) were used to analyze sustained and running vowels. We found that the sustained vowels and running voices from normal subjects and patients with laryngeal pathologies had low-dimensional dynamic characteristics. For sustained vowels, the analyses of jitter, shimmer, correlation dimension, and second-order entropy revealed significant differences between normal and pathological voices. For running voices, jitter and shimmer did not statistically discriminate between normal and pathological voices, but a significant difference was found for SNR, correlation dimension, and second-order entropy. The results suggest that nonlinear dynamic analysis and traditional SNR analysis may be valuable for the analysis of sustained and running vowels; perturbation analysis may be applicable for the analysis of sustained vowels but should be applied with caution for running voice analysis.
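The abstract does not say how jitter and shimmer were computed. As a rough illustration only, the sketch below uses one common "local" definition (mean absolute cycle-to-cycle difference divided by the mean value, in percent), which may differ from the authors' method. The function name and the sample values are hypothetical.

```python
def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


# Hypothetical per-cycle measurements, for illustration only.
periods_ms = [8.00, 8.04, 7.98, 8.02, 7.99]        # glottal cycle lengths
peak_amplitudes = [0.50, 0.52, 0.49, 0.51, 0.50]   # per-cycle peak amplitudes

jitter_percent = local_perturbation_percent(periods_ms)        # period perturbation
shimmer_percent = local_perturbation_percent(peak_amplitudes)  # amplitude perturbation
```

Both measures rely on cycle boundaries being extracted reliably, which is harder in running speech than in sustained vowels; this fits the abstract's caution about perturbation analysis of running voice.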

2.
Spectral analysis of vowels during connected speech can be performed using the spectral intensity distribution within critical bands corresponding to a natural scale on the basilar membrane. Normalization of the spectra provides the opportunity to make objective comparisons independent of the recording level. An increasing envelope peak between 3,150 and 3,700 Hz has been confirmed statistically for a combination of seven vowels in three groups of male speakers with hoarse, normal, and professional voices. Each vowel is also analyzed individually. The local energy maximum is called “the speaker's formant” and can be found in the region of the fourth formant. The steepness of the spectral slope (i.e. the rate of decline) becomes less pronounced when the sonority or the intensity of the voice increases. The speaker's formant is connected with the sonorous quality of the voice. It increases gradually and is approximately 10 dB higher in professional male voices than in normal male voices at neutral loudness (60 dB at 0.3 m). The peak intensity becomes stronger (30 dB above normal voices) when the overall speaking loudness is increased to 80 dB. Shouting increases the spectral energy of the adjacent critical bands but not the speaker's formant itself.

3.
Although most recent multitalker research has emphasized the importance of binaural cues, monaural cues can play an equally important role in the perception of multiple simultaneous speech signals. In this experiment, the intelligibility of a target phrase masked by a single competing masker phrase was measured as a function of signal-to-noise ratio (SNR) with same-talker, same-sex, and different-sex target and masker voices. The results indicate that informational masking, rather than energetic masking, dominated performance in this experiment. The amount of masking was highly dependent on the similarity of the target and masker voices: performance was best when different-sex talkers were used and worst when the same talker was used for target and masker. Performance did not, however, improve monotonically with increasing SNR. Intelligibility generally plateaued at SNRs below 0 dB and, in some cases, intensity differences between the target and masking voices produced substantial improvements in performance with decreasing SNR. The results indicate that informational and energetic masking play substantially different roles in the perception of competing speech messages.
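Several abstracts in this list present a target mixed with a masker at a prescribed SNR. The sketch below shows one generic way to do that, by scaling the masker so that the ratio of mean powers matches the requested value. It is an assumption for illustration, not the procedure of any particular study, and `mix_at_snr` is a hypothetical name.

```python
import math


def mean_power(samples):
    """Average squared amplitude of a sequence of samples."""
    return sum(x * x for x in samples) / len(samples)


def mix_at_snr(target, masker, snr_db):
    """Scale the masker so that target power / masker power equals snr_db, then add."""
    # SNR_dB = 10 * log10(P_target / (gain**2 * P_masker)), solved for gain.
    gain = math.sqrt(mean_power(target) / (mean_power(masker) * 10.0 ** (snr_db / 10.0)))
    mixture = [t + gain * m for t, m in zip(target, masker)]
    return mixture, gain


# Toy signals, for illustration only.
target = [1.0, -1.0, 1.0, -1.0]
masker = [2.0, 2.0, -2.0, -2.0]
mixture, gain = mix_at_snr(target, masker, snr_db=0.0)
```

Because only the masker is scaled, the target level stays fixed across SNR conditions. Studies differ on which signal is held constant, so this is a design choice rather than a rule.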

4.
Electrical field interaction caused by current spread in a cochlear implant was modeled in an explicit way in an acoustic model (the SPREAD model) presented to six listeners with normal hearing. The typical processing of cochlear implants was modeled more closely than in traditional acoustic models by careful selection of parameters related to current spread or parameters that could amplify the electrical field interactions caused by current spread. These parameters were the insertion depth, electrode spacing, electrical dynamic range, and dynamic range compression function. The hypothesis was that current spread could account for the asymptote in performance in speech intelligibility experiments observed at around seven stimulation channels in a number of cochlear implant studies. Speech intelligibility for sentences, vowels, and consonants at three noise levels (SNR of +15 dB, +10 dB, and +5 dB) was measured as a function of the number of spectral channels (4, 7, and 16). The SPREAD model appears to explain the asymptote in speech intelligibility at seven channels for all noise levels for all speech material used in this study. It is shown that the compressive amplitude mapping used in cochlear implants can have a detrimental effect on the number of effective channels.

5.
Previous studies have demonstrated that perturbations in voice pitch or loudness feedback lead to compensatory changes in voice F(0) or amplitude during production of sustained vowels. Responses to pitch-shifted auditory feedback have also been observed during English and Mandarin speech. The present study investigated whether Mandarin speakers would respond to amplitude-shifted feedback during meaningful speech production. Native speakers of Mandarin produced two-syllable utterances with focus on the first syllable, the second syllable, or none of the syllables, as prompted by corresponding questions. Their acoustic speech signal was fed back to them with loudness shifted by +/-3 dB for 200 ms durations. The responses to the feedback perturbations had mean latencies of approximately 142 ms and magnitudes of approximately 0.86 dB. Response magnitudes were greater and latencies were longer when emphasis was placed on the first syllable than when there was no emphasis. Since amplitude is not known for being highly effective in encoding linguistic contrasts, the fact that subjects reacted to amplitude perturbation just as fast as they reacted to F(0) perturbations in previous studies provides clear evidence that a highly automatic feedback mechanism is active in controlling both F(0) and amplitude of speech production.

6.
In the n-of-m strategy, the signal is processed through m bandpass filters from which only the n maximum envelope amplitudes are selected for stimulation. While this maximum selection criterion, adopted in the advanced combination encoder strategy, works well in quiet, it can be problematic in noise as it is sensitive to the spectral composition of the input signal and does not account for situations in which the masker completely dominates the target. A new selection criterion is proposed based on the signal-to-noise ratio (SNR) of individual channels. The new criterion selects target-dominated (SNR ≥ 0 dB) channels and discards masker-dominated (SNR < 0 dB) channels. Experiment 1 assessed cochlear implant users' performance with the proposed strategy assuming that the channel SNRs are known. Results indicated that the proposed strategy can restore speech intelligibility to the level attained in quiet independent of the type of masker (babble or continuous noise) and SNR level (0-10 dB) used. Results from experiment 2 showed that a 25% error rate can be tolerated in channel selection without compromising speech intelligibility. Overall, the findings from the present study suggest that the SNR criterion is an effective selection criterion for n-of-m strategies with the potential of restoring speech intelligibility.
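The two selection rules described in this abstract can be contrasted with a minimal sketch. It assumes, as experiment 1 does, that per-channel SNRs are known. The function names and the four-channel example values are hypothetical, and a real processor would apply the rule frame by frame.

```python
def select_n_of_m(envelopes, n):
    """Maximum-selection rule: indices of the n largest envelope amplitudes."""
    order = sorted(range(len(envelopes)), key=lambda i: envelopes[i], reverse=True)
    return sorted(order[:n])


def select_by_snr(channel_snr_db, threshold_db=0.0):
    """SNR rule: keep target-dominated channels (SNR >= threshold), drop the rest."""
    return [i for i, snr in enumerate(channel_snr_db) if snr >= threshold_db]


# One hypothetical analysis frame with m = 4 channels.
envelopes = [0.9, 0.2, 0.8, 0.1]          # mixture envelope per channel
channel_snr_db = [-5.0, 3.0, 6.0, -2.0]   # assumed known, as in experiment 1

picked_max = select_n_of_m(envelopes, n=2)   # [0, 2]
picked_snr = select_by_snr(channel_snr_db)   # [1, 2]
```

In this toy frame, channel 0 has the largest envelope only because the masker dominates it, so the maximum rule stimulates it while the SNR rule discards it. Note also that the SNR rule selects a variable number of channels per frame.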

7.
Speech recognition in noise improves with combined acoustic and electric stimulation compared to electric stimulation alone [Kong et al., J. Acoust. Soc. Am. 117, 1351-1361 (2005)]. Here the contribution of fundamental frequency (F0) and low-frequency phonetic cues to speech recognition in combined hearing was investigated. Normal-hearing listeners heard vocoded speech in one ear and low-pass (LP) filtered speech in the other. Three listening conditions (vocode-alone, LP-alone, combined) were investigated. Target speech (average F0=120 Hz) was mixed with a time-reversed masker (average F0=172 Hz) at three signal-to-noise ratios (SNRs). LP speech aided performance at all SNRs. Low-frequency phonetic cues were then removed by replacing the LP speech with a LP equal-amplitude harmonic complex, frequency and amplitude modulated by the F0 and temporal envelope of voiced segments of the target. The combined hearing advantage disappeared at 10 and 15 dB SNR, but persisted at 5 dB SNR. A similar finding occurred when, additionally, F0 contour cues were removed. These results are consistent with a role for low-frequency phonetic cues, but not with a combination of F0 information between the two ears. The enhanced performance at 5 dB SNR with F0 contour cues absent suggests that voicing or glimpsing cues may be responsible for the combined hearing benefit.

8.
This study focuses on speaking voice quality in male teachers (n = 35) and male actors (n = 36), who represent untrained and trained voice users, because we wanted to investigate normal and supranormal voices. In this study, both substantive and methodological aspects were considered. It includes a method for perceptual voice evaluation, and a basic issue was rater reliability. A listening group of 10 listeners, 7 experienced speech-language therapists and 3 speech-language therapist students, evaluated the voices by 15 vocal characteristics using VA scales. Two sets of voice signals were investigated: text reading (2 loudness levels) and sustained vowel (3 levels). The results indicated a high interrater reliability for most perceptual characteristics. Both types of voice signals were evaluated reliably, although reliability was somewhat higher for connected speech than for vowels, especially at the normal level. Experienced listeners tended to be more consistent in their ratings than did the student raters. Some vocal characteristics achieved acceptable reliability even with a smaller panel of listeners. The perceptual characteristics grouped in 4 factors reflected perceptual dimensions.

9.
Two experiments investigated the effect of reverberation on listeners' ability to perceptually segregate two competing voices. Culling et al. [Speech Commun. 14, 71-96 (1994)] found that for competing synthetic vowels, masked identification thresholds were increased by reverberation only when combined with modulation of fundamental frequency (F0). The present investigation extended this finding to running speech. Speech reception thresholds (SRTs) were measured for a male voice against a single interfering female voice within a virtual room with controlled reverberation. The two voices were either (1) co-located in virtual space at 0 degrees azimuth or (2) separately located at +/-60 degrees azimuth. In experiment 1, target and interfering voices were either normally intonated or resynthesized with a fixed F0. In anechoic conditions, SRTs were lower for normally intonated and for spatially separated sources, while, in reverberant conditions, the SRTs were all the same. In experiment 2, additional conditions employed inverted F0 contours. Inverted F0 contours yielded higher SRTs in all conditions, regardless of reverberation. The results suggest that reverberation can seriously impair listeners' ability to exploit differences in F0 and spatial location between competing voices. The levels of reverberation employed had no effect on speech intelligibility in quiet.

10.
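A speech endpoint detection algorithm based on the perceptual spectrogram structure boundary (PSSB) parameter is proposed for preprocessing speech signals in low signal-to-noise ratio (SNR) environments. After the noisy speech is enhanced on the basis of auditory perception characteristics, the time-frequency spectrogram of the enhanced speech is further enhanced in two dimensions, exploiting the difference between the continuous distribution of speech and the random distribution of residual noise, so that the spectrogram structure of the continuously distributed clean speech stands out further. The PSSB parameter is derived from two-dimensional boundary detection on the spectrogram structure of the enhanced speech and is then used for endpoint detection. Experimental results show that, in white noise at SNRs from -10 dB to 10 dB, the PSSB-based algorithm detects speech endpoints more effectively than other endpoint detection algorithms. Even at the extremely low SNR of -10 dB, the proposed method still achieves an accuracy of 75.2%. The PSSB-based endpoint detection algorithm is therefore better suited to speech endpoint detection in low-SNR white-noise environments.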

11.
The MPEG-1 Layer 3 compression scheme for audio signals, commonly known as mp3, has had a great impact in recent years as it has reached high compression rates while conserving a high sound quality. Music and speech samples compressed at high bitrates are perceptually indistinguishable from the original samples, but very little was known about how compression acoustically affects the voice signal. A previous work with normal voices showed a high fidelity at high-bitrate compressions both in voice parameters and the amplitude-frequency spectrum. In the present work, dysphonic voices were tested through two studies. In the first study, spectrograms, long-term average spectra (LTAS), and fast Fourier transform (FFT) spectra of compressed and original samples of running speech were compared. In the second study, intensities, formant frequencies, formant bandwidths, and a multidimensional set of voice parameters were tested in a set of sustained phonations. Results showed that compression at high bitrates (96 and 128 kbps) preserved the relevant acoustic properties of the pathological voices. With compression at lower bitrates, fidelity decreases, introducing some important alterations. Results from both works, Gonzalez and Cervera and this paper, open up the possibility of using MPEG compression at high bitrates to store or transmit high-quality speech recordings without altering their acoustic properties.

12.
The overall slope of the long-term-average spectrum (LTAS) decreases if vocal loudness increases. Therefore, changes of vocal loudness also affect the alpha measure, defined as the ratio of spectrum intensity above and below 1000 Hz. The effect on alpha of loudness variation was analyzed in 15 male and 16 female voices reading a text at different degrees of vocal loudness. The mean range of equivalent sound level (L(eq)) amounted to about 28 dB and the mean range of alpha to 19.0 and 11.7 dB for the female and male subjects, respectively. The L(eq) vs. alpha relationship could be approximated with a quadratic function, or by a linear equation if softest phonation was excluded. Using such equations, alpha was computed for all values of L(eq) observed for each subject and compared with observed values. The maximum absolute error was 2.4 dB and the mean absolute errors were between 0.1 and 0.6 dB. When softest phonation was disregarded and linear equations were used, the maximum error was less than 2 dB and the mean absolute errors were between 0.2 and 0.7 dB. The strong correlation between L(eq) and alpha indicates that for a voice L(eq) can be used for predicting alpha.
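The alpha measure defined above can be sketched as a level difference in dB between the LTAS power above and below 1000 Hz. The abstract does not specify analysis bandwidths or the handling of a bin exactly at 1000 Hz; here that bin is counted as "below", which is an assumption. The function name and the spectrum values are hypothetical.

```python
import math


def alpha_db(freqs_hz, powers, split_hz=1000.0):
    """Ratio of summed spectral power above vs. below split_hz, expressed in dB."""
    above = sum(p for f, p in zip(freqs_hz, powers) if f > split_hz)
    below = sum(p for f, p in zip(freqs_hz, powers) if f <= split_hz)
    return 10.0 * math.log10(above / below)


# Hypothetical LTAS bins (linear power units), for illustration only.
freqs_hz = [250.0, 500.0, 1000.0, 2000.0, 4000.0]
powers = [40.0, 50.0, 10.0, 0.8, 0.2]

alpha = alpha_db(freqs_hz, powers)   # 10*log10(1.0 / 100.0), about -20 dB
```

A louder voice with a flatter spectral slope puts relatively more power above 1000 Hz, so alpha rises (becomes less negative), in line with the relationship to L(eq) reported in the abstract.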

13.
The Perception Spectrogram Structure Boundary (PSSB) parameter is proposed for speech endpoint detection as a preprocessing step for speech or speaker recognition. First, speech enhancement based on hearing perception is carried out. Then a two-dimensional enhancement is performed on the sound spectrogram according to the difference between the deterministic distribution characteristic of speech and the random distribution characteristic of noise. Finally, the endpoint decision is made using the PSSB parameter. Experimental results show that, in low-SNR environments from -10 dB to 10 dB, the proposed algorithm may achieve higher accuracy than existing endpoint detection algorithms. A detection accuracy of 75.2% can be reached even at the extremely low SNR of -10 dB. The algorithm is therefore suitable for speech endpoint detection in low-SNR environments.

14.
I. Introduction. Kalman filtering is just a method to estimate statistically the state of the observed system from the corrupted signals, and this kind of estimation is a recurrence estimation based on linear, nonbias and minimum variance. Moreover, Kalman filtering is applicable to non-stationary signals and time-variant dynamic system. Therefore, Kalman filtering is very applicable to enhancing the speech signals that are corrupted by noise. This paper reports the concrete method of enhancement of noisy speech and its experiment results. Experiments indicate: After the s…
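The recursive character of the estimate can be illustrated with a scalar Kalman filter. This is a generic textbook sketch under an assumed random-walk state model with known noise variances `q` and `r`. It is not the paper's speech enhancement method, whose details are truncated above, and all names are hypothetical.

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x[k] = x[k-1] + w, z[k] = x[k] + v.

    q is the process noise variance, r the measurement noise variance,
    x0 and p0 the initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        p = p + q              # predict: uncertainty grows by the process noise
        k = p / (p + r)        # Kalman gain: weight given to the new measurement
        x = x + k * (z - x)    # update the estimate with the innovation
        p = (1.0 - k) * p      # posterior variance after the update
        estimates.append(x)
    return estimates


# A constant level of 5.0 observed 50 times, with no process noise assumed.
estimates = kalman_1d([5.0] * 50, q=0.0, r=1.0)
```

Each step needs only the previous estimate and its variance, which is what makes the estimator recursive. With `q = 0`, `r = 1`, and `p0 = 1`, the estimate after n measurements equals the average of the prior and the n observations, here 5n/(n+1).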

15.
The effects of vowels on voice perturbation measures
This study examines voice perturbation parameters of the sustained [a] in English and of the eight vowels in Turkish to discover whether any difference exists between these languages, and whether a correlation exists between voice perturbation parameters and articulatory and acoustic properties of the Turkish vowels. Eight Turkish vowels uttered by 26 healthy nonsmoker volunteer males who are native Turkish speakers were compared with a voice database that includes samples of normal and disordered voices belonging to American English speakers. Fundamental frequencies, the first and second formants, and perturbation parameters, such as jitter percent, pitch perturbation quotient, shimmer percent, and amplitude perturbation quotient of the sustained vowels, were measured. Also, the first and second formants of the sustained [a] in English were measured, and other parameters were obtained from the database. When the voice perturbation parameters in Turkish and English were compared, statistically significant differences were not found. However, when Turkish vowels were compared with each other, statistically significant differences were found among perturbation values. Categorical comparisons of the Turkish vowels such as high-low, rounded-unrounded, and front-back revealed significant differences in perturbation values. In correlation analysis, a weak linear inverse relation between jitter percent and the first formant (r=-0.260, p<0.05) was found.

16.
Spectrum factors relevant to phonetogram measurement
Phonetograms showing the sound-pressure level (SPL) in loudest and softest possible phonation are frequently used in some voice clinics as an aid for describing the status of voice function. Spectrum analysis of the vowel /a/ produced by ten females and ten males with healthy, untrained voices revealed that the fundamental was mostly the strongest spectrum partial in soft phonation while the loudest partial in loud phonation was generally an overtone. Also, the first-formant frequency was generally lower in soft than in loud phonation. Measuring SPL in dB(A) rather than in dB lowered the phonetogram contour for soft phonation, an effect increasing with decreasing fundamental frequency. SPL measurements on a group of 22 females with healthy voices showed that the vowel /a/ gave higher SPL values than other vowels in loud phonation. The effect of using dB rather than dB(A) was great but similar for all vowels in soft phonation while, in loud phonation, the effect was small, particularly for /a/. In dB, the effect of using different vowels amounts to about +/- 5 dB. Interpretation of a phonetogram in terms of voice physiology is facilitated if SPL is given in dB and if a vowel with a high first-formant frequency is used.

17.
Acceleration target detection based on LFM radar
In radar systems, the echo signal from an accelerating target can be treated approximately as a linear frequency modulation (LFM) signal. At high signal-to-noise ratio (SNR), the discrete polynomial-phase transform (DPT) algorithm can be used to detect the echo signal, as it has low computational complexity and good real-time performance. At low SNR, however, the DPT algorithm has a large mean square error in the estimated frequency modulation rate and a low detection probability. To detect LFM signals at low SNR, this paper proposes the segment discrete polynomial-phase transform (SDPT), which first divides the echo pulses into several segments of equal duration in the time domain, then applies DFT-based coherent accumulation to the segments, and finally processes the resulting signal with the DPT within each segment. With a large number of segments, the SDPT can improve the output SNR. In addition, for a given SNR and for target signals with a large sampling interval, large acceleration, and few segments, this paper proposes an algorithm that combines an improved DPT (IDPT) with the fractional Fourier transform (FRFT) to detect the LFM signal. The output SNR of this algorithm depends on the length of the time delay. In the simulation, when the length of the time delay is 0.2 N, the output SNR is 2.5 dB higher than that obtained by directly using the DPT. Finally, the detection performance and complexity of the proposed algorithm are analyzed, and simulated and measured data verify its effectiveness.
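The plain second-order DPT that the abstract uses as its baseline can be sketched as follows. For a noise-free LFM signal with phase a0 + a1*n + a2*n^2, multiplying each sample by the conjugate of a sample delayed by tau leaves a single tone at 2*a2*tau rad/sample, so a DFT peak search yields a2. This is not the proposed SDPT or IDPT/FRFT method. The delay of N/2, the brute-force DFT, and all names and parameter values are assumptions for illustration.

```python
import cmath
import math


def dpt2_chirp_rate(signal, tau):
    """Estimate a2 of the phase a0 + a1*n + a2*n**2 with the second-order DPT."""
    # The delay-conjugate product cancels a0 and turns the chirp into one tone.
    prod = [signal[n] * signal[n - tau].conjugate() for n in range(tau, len(signal))]
    m = len(prod)
    best_k, best_mag = 0, -1.0
    for k in range(m):  # brute-force DFT peak search, fine for short signals
        acc = sum(prod[i] * cmath.exp(-2j * math.pi * k * i / m) for i in range(m))
        if abs(acc) > best_mag:
            best_k, best_mag = k, abs(acc)
    omega = 2.0 * math.pi * best_k / m   # peak frequency in rad/sample
    if omega > math.pi:                  # map to (-pi, pi] to allow negative rates
        omega -= 2.0 * math.pi
    return omega / (2.0 * tau)


# Noise-free test chirp; a2 is chosen so the tone falls exactly on a DFT bin.
N, tau = 128, 64
a0, a1, a2 = 0.7, 0.3, math.pi / 1024
chirp = [cmath.exp(1j * (a0 + a1 * n + a2 * n * n)) for n in range(N)]

a2_est = dpt2_chirp_rate(chirp, tau)
```

The estimate is quantized to the DFT bin spacing, 2*pi/(m*2*tau) in units of a2, so practical implementations usually zero-pad or interpolate around the peak. The product of two noisy samples also squares the noise contribution, which is one way to see why the plain DPT degrades at low SNR, as the abstract notes.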

18.
Many studies have described and analyzed the singer's formant. A similar phenomenon produced by trained speakers led some authors to examine the speaker's ring. If we consider these phenomena as resonance effects associated with vocal tract adjustments and training, can we hypothesize that trained singers can carry over their singing formant ability into speech, also obtaining a speaker's ring? Can we find similar differences for energy distribution in continuous speech? Forty classically trained singers and forty untrained normal speakers performed an all-voiced reading task and produced a sample of a sustained spoken vowel /a/. The singers were also requested to perform a sustained sung vowel /a/ at a comfortable pitch. The reading was analyzed by the long-term average spectrum (LTAS) method. The sustained vowels were analyzed through power spectrum analysis. The data suggest that singers show more energy concentration in the singer's formant/speaker's ring region in both sung and spoken vowels. The singers' spoken vowel energy in the speaker's ring area was found to be significantly larger than that of the untrained speakers. The LTAS showed similar findings suggesting that those differences also occur in continuous speech. This finding supports the value of further research on the effect of singing training on the resonance of the speaking voice.

19.
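In monaural communication, ambient noise affects speech intelligibility. With the speech level on the signal side fixed at 70 dB, speech intelligibility was studied for three types of noise, at different noise levels on the interfering side and different SNRs on the signal side. The experimental results show that when the signal-side SNR exceeds a certain threshold, noise on the interfering side has no significant effect on intelligibility, and that this threshold depends on the noise type. At low signal-side SNRs, however, interfering-side noise of a suitable level can improve signal-side speech intelligibility; the optimal interfering noise level is 78-82 dB, and excessive interfering-side noise levels reduce intelligibility. A preliminary mechanism based on psychoacoustics and physiology was identified: in speech recognition in noise, contraction of the middle-ear muscles of the contralateral ear suppresses noise perception and thereby improves signal-side speech intelligibility.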

20.
Previous non-invasive brain research has reported auditory cortical sensitivity to periodicity as reflected by larger and more anterior responses to periodic than to aperiodic vowels. The current study investigated whether there is a lower fundamental frequency (F0) limit for this effect. Auditory evoked fields (AEFs) elicited by natural-sounding 400 ms periodic and aperiodic vowel stimuli were measured with magnetoencephalography. Vowel F0 ranged from normal male speech (113 Hz) to exceptionally low values (9 Hz). Both the auditory N1m and sustained fields were larger in amplitude for periodic than for aperiodic vowels. The AEF sources for periodic vowels were also anterior to those for the aperiodic vowels. Importantly, the AEF amplitudes and locations were unaffected by the F0 decrement of the periodic vowels. However, the N1m latency increased monotonically as F0 was decreased down to 19 Hz, below which this trend broke down. Also, a cascade of transient N1m-like responses was observed in the lowest F0 condition. Thus, the auditory system seems capable of extracting the periodicity even from very low F0 vowels. The behavior of the N1m latency and the emergence of a response cascade at very low F0 values may reflect the lower limit of pitch perception.
