Similar articles
1.
A new method, "simultaneous inverse filtering and model matching" (SIM), is proposed that allows one to calculate voice source measures without any user interaction. It is based on the discrete all-pole modeling (DAP) technique for inverse filtering (IF), which is modified to include a model of the glottal flow as an integral part [LF model, Fant et al., STL-QPSR (Stockholm) 4/1985, 1-13 (1986)]. As the correct LF parameters are initially unknown, they are estimated in an iterative procedure using multi-dimensional optimization techniques that are initialized according to the results of an exhaustive search. The error criteria applied reflect how well the IF is performed after the spectral contribution of the glottal flow has been removed. The resulting optimal LF parameter constellation serves as the basis to calculate 11 voice source measures. The performance was evaluated using synthesized signals and recordings of natural utterances. For the synthesized signals, the accuracy in reproducing the original parameters was high (correlations exceeding 0.88) for measures where the starting point of the glottal cycle did not enter explicitly. Errors were smaller than with conventional estimation methods, in which the measures are estimated from the IF signal. The analysis of natural utterances indicates that problems still exist with regard to robustness, but that under advantageous conditions the open quotient, the speed quotient, the closing quotient, the parabolic spectral parameter, and the negative peak amplitude of the glottal flow derivative can indeed be determined automatically by the SIM method.

2.
Journal of Voice, 2019, 33(6): 945.e19-945.e25
Three electroglottographic parameters (fundamental frequency, contact quotient, and speed quotient) were analyzed for two singers of the Young Girl role in Kunqu Opera. Each singer performed under three conditions: singing, stage speech, and reading lyrics. The phonation types adopted in the different conditions were explored based on the electroglottographic parameters. Fundamental frequency, contact quotient, and speed quotient showed different distributions among conditions. Five phonation types were used in singing and stage speech: (1) breathy voice, (2) modal voice with a low degree of posterior glottal adduction, (3) modal voice, (4) falsetto, and (5) falsetto with a high degree of posterior glottal adduction. The phonation strategies differed in part between the singers. Different phonation type collocations were employed in singing and stage speech. The relationship between phonation types and pitch was complex. The phonation types actually used were different from and more complex than those in traditional Kunqu Opera singing theory.

3.
Normalized amplitude quotient (NAQ) is presented as a method to parametrize the glottal closing phase using two amplitude-domain measurements from waveforms estimated by inverse filtering. In this technique, the ratio between the amplitude of the ac flow and the negative peak amplitude of the flow derivative is first computed using the concept of equivalent rectangular pulse, a hypothetical signal located at the instant of the main excitation of the vocal tract. This ratio is then normalized with respect to the length of the fundamental period. Comparison between NAQ and its counterpart among the conventional time-domain parameters, the closing quotient, shows that the proposed parameter is more robust against distortions, such as measurement noise, that make the extraction of conventional time-based parameters of the glottal flow problematic. Experiments with breathy, normal, and pressed vowels indicate that NAQ is also able to separate the type of phonation effectively.
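The two-step computation described in this abstract (an amplitude ratio, then normalization by the period) can be illustrated with a minimal sketch. The function name `naq`, the use of a first difference as the flow derivative, and the triangular test pulse are all assumptions made for illustration; they are not taken from the paper, and a real analysis would first obtain the flow by inverse filtering.

```python
import numpy as np

def naq(flow, fs, f0):
    """Normalized amplitude quotient of a single glottal flow cycle.

    flow : samples of one cycle of the estimated glottal flow (hypothetical input)
    fs   : sampling rate in Hz
    f0   : fundamental frequency in Hz
    """
    flow = np.asarray(flow, dtype=float)
    f_ac = flow.max() - flow.min()      # ac (peak-to-peak) flow amplitude
    d_flow = np.diff(flow) * fs         # first-difference estimate of the flow derivative
    d_peak = -d_flow.min()              # magnitude of the negative peak of the derivative
    period = 1.0 / f0                   # length of the fundamental period in seconds
    return f_ac / (d_peak * period)     # amplitude quotient divided by the period

# Illustrative triangular pulse: 40 samples opening, 20 closing, 40 closed.
fs, f0 = 10_000, 100
pulse = np.concatenate([
    np.linspace(0.0, 1.0, 41)[:-1],     # opening phase
    np.linspace(1.0, 0.0, 21)[:-1],     # closing phase
    np.zeros(40),                       # closed phase
])
naq_value = naq(pulse, fs, f0)
```

For this idealized pulse the closing phase is linear, so the value coincides with the closing quotient (20 of 100 samples); on real flow estimates the two measures generally differ.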

4.
5.
Electroglottography is a common method for providing noninvasive measurements of glottal activity. The derivative of the electroglottographic signal, however, has not attracted much attention, although it yields reliable indicators of glottal closing instants. The purpose of this paper is to provide a guide to the usefulness of this signal. The main features that are to be found in this signal are presented on the basis of an extensive analysis of a database of items sung by 18 trained singers. Glottal opening and closing instants are related to peaks in the signal; the latter can be used to measure glottal parameters such as fundamental frequency and open quotient. In some cases, peaks are doubled or imprecise, which points to special (but by no means uncommon) glottal configurations. A correlation-based algorithm for the automatic measurement of fundamental frequency and open quotient using the derivative of electroglottographic signals is proposed. It is compared to three other electroglottographic-based methods with regard to the measurement of open quotient in inverse-filtered derived glottal flow. It is shown that agreement with the glottal-flow measurements is much better than most threshold-based measurements in the case of sustained sounds.
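The relation between peaks of the derivative signal and the open quotient can be sketched with plain peak picking. This is not the correlation-based algorithm proposed in the paper; it is a simplified stand-in. It assumes a polarity in which closing instants appear as positive peaks and opening instants as negative peaks, and the names `_peaks`, `open_quotient_from_degg`, and `rel_threshold` are hypothetical.

```python
import numpy as np

def _peaks(x, rel_threshold=0.5):
    """Indices of local maxima of x that exceed rel_threshold * max(x)."""
    mid = x[1:-1]
    mask = (mid > x[:-2]) & (mid >= x[2:]) & (mid > rel_threshold * x.max())
    return np.where(mask)[0] + 1

def open_quotient_from_degg(degg, rel_threshold=0.5):
    """Per-cycle open quotient from a differentiated EGG signal.

    Assumed polarity: positive peaks mark closing, negative peaks mark opening.
    """
    degg = np.asarray(degg, dtype=float)
    closings = _peaks(degg, rel_threshold)
    openings = _peaks(-degg, rel_threshold)
    quotients = []
    for c0, c1 in zip(closings[:-1], closings[1:]):
        inside = openings[(openings > c0) & (openings < c1)]
        if inside.size != 1:
            continue                     # doubled or missing opening peak: skip this cycle
        # open phase runs from the opening instant to the next closing instant
        quotients.append((c1 - inside[0]) / (c1 - c0))
    return np.array(quotients)

# Synthetic impulse-like signal: closing every 100 samples, opening 40 samples later.
degg_demo = np.zeros(320)
degg_demo[[10, 110, 210, 310]] = 1.0
degg_demo[[50, 150, 250]] = -0.5
oq_demo = open_quotient_from_degg(degg_demo)
```

Cycles with doubled or missing opening peaks are skipped here rather than resolved, which is the situation the abstract flags as pointing to special glottal configurations.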

6.
Peter Murphy. Journal of Voice, 2008, 22(2): 125-137
SUMMARY: An investigation of the effect of glottal source aperiodicities (jitter, shimmer, and aspiration noise) on the estimation of fundamental frequency (f0) perturbation and amplitude perturbation of synthesized glottal source and voiced speech waveforms is considered. First, four cycle-event f0 estimators are examined: (1) waveform matching of the low-pass filtered waveform, (2) positive peaks (PPs) from the speech waveform, (3) PPs from the low-pass filtered waveform, and (4) positive zero crossings from the low-pass filtered waveform. The analysis shows that f0 perturbation measures taken from the low-pass filtered waveform are affected by both amplitude perturbation and random glottal noise, whereas f0 perturbation measures taken from the PPs of the original waveform are affected by noise but not by amplitude perturbation. It is shown for the low-pass filter methods that the effects of amplitude perturbation and noise lead to increased errors in the measurement of f0 perturbation for the synthesized speech waveforms when compared with the synthesized glottal waveforms. Shimmer of the synthesized speech waveform is approximately equal to shimmer of the synthesized glottal source. However, noise and jitter affect measures of amplitude perturbation. The estimation of f0 perturbation from the synthesized speech waveform is shown to be nonlinearly related to f0 perturbation estimation from the synthesized glottal waveform as a consequence of the filtering action of the vocal tract. Low-pass filtering the voiced speech waveform is shown to provide a partial solution to this problem.
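As an illustration of estimator (4) and of a perturbation measure, the sketch below locates positive zero crossings by linear interpolation and computes a relative mean absolute cycle-to-cycle difference. The abstract does not specify its perturbation formula, so this commonly used "local" definition is an assumption, as are the function names. The low-pass filtering step is omitted and the input is assumed to be already filtered.

```python
import numpy as np

def positive_zero_crossing_times(x, fs):
    """Times (s) of negative-to-positive zero crossings, linearly interpolated."""
    x = np.asarray(x, dtype=float)
    idx = np.where((x[:-1] < 0) & (x[1:] >= 0))[0]
    frac = -x[idx] / (x[idx + 1] - x[idx])   # fractional position between the two samples
    return (idx + frac) / fs

def local_perturbation(values):
    """Mean absolute difference of consecutive values divided by their mean.

    Applied to cycle periods this is a jitter-type measure; applied to
    cycle peak amplitudes it is a shimmer-type measure.
    """
    v = np.asarray(values, dtype=float)
    if v.size < 2:
        raise ValueError("need at least two cycles")
    return np.mean(np.abs(np.diff(v))) / np.mean(v)

# A perfectly periodic 100 Hz tone should show essentially no period perturbation.
fs = 10_000
n = np.arange(1000)
tone = np.sin(2 * np.pi * 100 * n / fs - 0.3)
periods = np.diff(positive_zero_crossing_times(tone, fs))
jitter_tone = local_perturbation(periods)

# Alternating 10 ms and 11 ms periods give a relative perturbation of 0.001 / 0.0105.
jitter_alt = local_perturbation([0.010, 0.011, 0.010, 0.011])
```

Because the crossing times depend on the waveform shape near zero, additive noise or amplitude changes shift them, which is consistent with the sensitivity the abstract reports for the low-pass filter methods.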

7.
Voice source characteristics as derived from inverse filtering were analyzed in 6 country singers' speech and singing. Results showed that the closed quotient varied systematically with vocal loudness, and that glottal compliance (the ratio between transglottal AC volume displacement and subglottal pressure) decreased with increases in fundamental frequency but remained unaffected by vocal loudness. No striking differences were found in source characteristics between speech and singing within subjects. The degree of phonatory press, as judged by a panel of 19 expert listeners, appeared related to the range in which the singer was singing and to the sound pressure level gain from a doubling of subglottal pressure.

8.
In this paper, the acoustic-phonetic characteristics of steady apical trills (trill sounds produced by the periodic vibration of the apex of the tongue) are studied. Signal processing methods, namely zero-frequency filtering and zero-time liftering of speech signals, are used to analyze the excitation source and the resonance characteristics of the vocal tract system, respectively. Although it is natural to expect an effect of trilling on the resonances of the vocal tract system, it is interesting to note that trilling influences the glottal source of excitation as well. The excitation characteristics derived using zero-frequency filtering of speech signals are glottal epochs, strength of impulses at the glottal epochs, and instantaneous fundamental frequency of the glottal vibration. Analysis based on zero-time liftering of speech signals is used to study the dynamic resonance characteristics of the vocal tract system during the production of trill sounds. Qualitative analysis of trill sounds in different vowel contexts and the acoustic cues that may help in spotting trills in continuous speech are discussed.

9.
Vocal fold vibratory asymmetry is often associated with inefficient sound production through its impact on source spectral tilt. This association is investigated in both a computational voice production model and a group of 47 human subjects. The model provides indirect control over the degree of left-right phase asymmetry within a nonlinear source-filter framework, and high-speed videoendoscopy provides in vivo measures of vocal fold vibratory asymmetry. Source spectral tilt measures are estimated from the inverse-filtered spectrum of the simulated and recorded radiated acoustic pressure. As expected, model simulations indicate that increasing left-right phase asymmetry induces steeper spectral tilt. Subject data, however, reveal that none of the vibratory asymmetry measures correlates with spectral tilt measures. Probing further into physiological correlates of spectral tilt that might be affected by asymmetry, the glottal area waveform is parameterized to obtain measures of the open phase (open/plateau quotient) and closing phase (speed/closing quotient). Subjects' left-right phase asymmetry exhibits low, but statistically significant, correlations with speed quotient (r=0.45) and closing quotient (r=-0.39). Results call for future studies into the effect of asymmetric vocal fold vibration on glottal airflow and the associated impact on voice source spectral properties and vocal efficiency.

10.
This study presents an approach to visualizing intensity regulation in speech. The method expresses a voice sample in a two-dimensional space using amplitude-domain values extracted from the glottal flow estimated by inverse filtering. The two-dimensional presentation is obtained by expressing a time-domain measure of the glottal pulse, the amplitude quotient (AQ), as a function of the negative peak amplitude of the flow derivative (d(peak)). The regulation of vocal intensity was analyzed with the proposed method from voices varying from extremely soft to very loud, with an SPL range of approximately 55 dB. When vocal intensity was increased, the speech samples first showed a rapidly decreasing trend as expressed on the proposed AQ-d(peak) graph. When intensity was further raised, the location of the samples converged toward a horizontal line, the asymptote of a hypothetical hyperbola. This behavior of the AQ-d(peak) graph indicates that the intensity regulation strategy changes from laryngeal to respiratory mechanisms, and the method makes it possible to quantify how the control mechanisms underlying the regulation of vocal intensity change gradually between the two. The proposed presentation constitutes an easy-to-implement method to visualize the function of voice production in intensity regulation, because the only information needed is the glottal flow waveform estimated by inverse filtering the acoustic speech pressure signal.
