Similar literature
Found 20 similar articles; search took 15 ms.
1.
2.
Modeling sources of listener variability in voice quality assessment is the first step in developing reliable, valid protocols for measuring quality, and provides insight into the reasons that listeners disagree in their quality assessments. This study examined the adequacy of one such model by quantifying the contributions of four factors to interrater variability: instability of listeners' internal standards for different qualities, difficulties isolating individual attributes in voice patterns, scale resolution, and the magnitude of the attribute being measured. One hundred twenty listeners in six experiments assessed vocal quality in tasks that differed in scale resolution, in the presence/absence of comparison stimuli, and in the extent to which the comparison stimuli (if present) matched the target voices. These factors accounted for 84.2% of the variance in the likelihood that listeners would agree exactly in their assessments. Providing listeners with comparison stimuli that matched the target voices doubled the likelihood that they would agree exactly. Listeners also agreed significantly better when assessing quality on continuous versus six-point scales. These results indicate that interrater variability is an issue of task design, not of listener unreliability.

3.
Although laser surgery has been widely advocated for use in the treatment of vocal fold papilloma because it does not incur bleeding, it has been questioned for use in treating Reinke's edema due to the possibility of heat dispersion to normal surrounding tissue and of scarring. We present a series of 8 cases in which laser surgery was the method of treatment for bilateral Reinke's edema. In each case, voice therapy was selected as the initial treatment; laser surgery was performed following voice therapy. Prior to and following surgery, videostroboscopic examinations were performed on the subjects. Only 4 subjects were available for assessment at the 1-month postoperative period. From the audio track of the videotape, the speaking fundamental frequency, perturbation measures for the vowel /i/, and noise-to-harmonic ratio of a completely voiced sentence were obtained. From the videostroboscopic recordings, the symmetry of the vocal folds, the presence or absence of the mucosal wave and the glottic closure pattern, prior to and after surgery, were judged independently by 3 examiners. The fundamental frequencies approximated the normal male and female ranges for those subjects seen 1 month after surgery. In addition, the noise-to-harmonic ratio and the relative average perturbation improved. Stroboscopy revealed irregularities in the symmetry of vocal folds, mucosal wave, and glottic closure 1 month after surgery.

4.
Although there has been continuing interest in voice quality, much of this research has focused on the vocal folds rather than the supraglottal structures. This paper reports the use of videoendoscopy for studying supraglottal participation in various singing tasks. In a preliminary study presented last year by the present authors, CT scanning was used to corroborate videoendoscopic observation. Vocal tract activities observed included variation of laryngeal height with pitch, variation of pharyngeal wall dimension with pitch and vowel, and marked supraglottic constriction with certain vocal imitations. In order to gain a better understanding of vocal training and its effect upon vocal tract physiology, a study was designed using videoendoscopy to observe singers with significant experience and training while performing various vocal tasks. The tasks focused on the following: (1) vocal tract activity associated with pitch changes; (2) the physiology involved in the production of “cover”; (3) the structures involved in the production of vibrato; and (4) the physiology of the singer's “ring.” It would appear that videoendoscopy will become increasingly valuable to the voice community as our understanding of vocal tract physiology improves.

5.
6.
The self-organizing map (a neural network) was applied to the spectral pattern recognition of voice quality in 34 subjects: 15 patients operated on because of insufficient glottal closure and 19 subjects not treated for voice disorders. The voice samples, segments of sustained /a/, were perceptually rated by six experts. A self-organized acoustic feature map was first computed from tokens of /a/ and then used for the analysis of the samples. The locations of the samples on the map were determined and the distances from a normal reference were compared with the perceptual ratings. The map locations corresponded to the degree of audible disorder: the samples judged as normal were overlapping or close to the normal reference, whereas the samples judged as dysphonic were located further away from it. The comparison of pre- and postoperative samples of the patients showed that the perceived improvement of voice quality was also detected by the map.

7.
Pitch is an important attribute of a musical sound. With it the melody of a song is established. With it the beauty of a voice is showcased. But how does pitch affect the perception of voice? Is it used to help distinguish among voices, or does it merely exist in the background, affecting the fine details of a voice but not radically altering the voice? The purpose of this paper is to review some of the evidence on the role of pitch in the perception of voice quality, specifically for the discrimination of one voice quality from another. The objective of the discussion is to understand how pitch affects our perception of voice quality and its importance to the perception of musical sound.

8.
9.
10.
11.
HearFones (HF) have been designed to enhance auditory feedback during phonation. This study investigated the effects of HF (1) on sound perceivable by the subject, (2) on voice quality in reading and singing, and (3) on voice production in speech and singing at the same pitch and sound level.

Test 1: Text reading was recorded with two identical microphones in the ears of a subject. One ear was covered with HF, and the other was free. Four subjects took part in this test. Tests 2 and 3: A reading sample was recorded from 13 subjects and a song from 12 subjects, without and with HF on. Test 4: Six females repeated [pa:p:a] in speaking and singing modes, without and with HF on, at the same pitch and sound level.

Long-term average spectra were made (Tests 1–3), and formant frequencies, fundamental frequency, and sound level were measured (Tests 2 and 3). Subglottic pressure was estimated from oral pressure in [p], and simultaneously electroglottography (EGG) was registered during voicing on [a:] (Test 4). Voice quality in speech and singing was evaluated by three professional voice trainers (Tests 2–4).

HF seemed to enhance sound perceivable across the whole range studied (0–8 kHz), with the greatest enhancement (up to ca. 25 dB) being at 1–3 kHz and at 4–7 kHz. The subjects tended to decrease loudness with HF (when sound level was not being monitored). In more than half of the cases, voice quality was evaluated as “less strained” and “better controlled” with HF. When pitch and loudness were constant, no clear differences were heard, but the closed quotient of the EGG signal was higher and the signal more skewed, suggesting a better glottal closure and/or diminished activity of the thyroarytenoid muscle.


12.
Previous studies have shown that trained listeners are highly reliable in making perceptual judgments of several parameters of normal and pathologic voices. This study investigated objective measures of acoustic characteristics of high and low preference voices as determined by previous perceptual study. Four acoustic parameters were measured including harmonics-to-noise ratio, autocorrelation function, average jitter, and the standard deviation of the fundamental frequency. Useful correlations between perceptual and measured results were identified. Normal voices differ from pathologic voices in terms of the acoustic-perceptual relationships.

13.
In dynamical motor theory, skill acquisition occurs as a modification of preexisting coordination patterns or attractor states. The purpose of this study was to assess how different levels of voice onset, voice quality, and fundamental frequency (F0) combine to form the attractor states common to voice motor control. Three levels of voice onset (glottal, simultaneous, and breathy), voice quality (modal speech, mixed, and falsetto), and fundamental frequency (low, mid, and high) were manipulated by vocally untrained, female subjects. Percent correct of acquisition trials and self-report of effort were used as measures of stable phonations indicative of an attractor state. Using intensity as a covariate, the results provided support for two of the three predicted triads representing attractor states in female speakers: (1) glottal onset/modal speech quality/low F0; and (2) breathy onset/falsetto quality/high F0. The results of this study suggest that certain parameters of voice motor control, such as onset, quality, and F0, exist as part of a dynamical system that can be identified and manipulated in voice motor acquisition and learning.

14.
This paper presents a parameter for objectively evaluating singing voice quality. The power spectrum of the vowel sound /a/ was analyzed by fast Fourier transform. The greatest harmonic peak between 2 and 4 kHz and the greatest harmonic peak between 0 and 2 kHz were identified. The power ratio of these peaks, termed the singing power ratio (SPR), was calculated in 37 singers and 20 nonsingers. The SPR of sung /a/ in singers was significantly greater than in nonsingers. In singers, the SPR of sung /a/ was significantly greater than that of spoken /a/. By digital signal processing, the power spectrum of sung /a/ was varied, and the processed sounds were perceptually analyzed. SPR had a significant relationship with perceptual scores of “ringing” quality. SPR provides an important quantitative measurement for evaluating singing voice quality for all voice types, including soprano.
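
The SPR described in the abstract above can be illustrated with a minimal sketch. It assumes the ratio is expressed in dB and uses a naive DFT on a short synthetic frame; the function names, frame length, sample rate, and the synthetic test tones are all hypothetical choices for illustration, not the authors' procedure.

```python
import cmath
import math


def power_spectrum(samples):
    """One-sided power spectrum via a naive DFT (adequate for short frames)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def singing_power_ratio(samples, sample_rate):
    """Greatest peak power in 2-4 kHz over greatest peak power in 0-2 kHz.

    Assumption: the ratio is reported in dB (10 * log10 of the power ratio).
    """
    power = power_spectrum(samples)
    hz_per_bin = sample_rate / len(samples)
    low = [p for k, p in enumerate(power) if 0 < k * hz_per_bin < 2000]
    high = [p for k, p in enumerate(power) if 2000 <= k * hz_per_bin <= 4000]
    return 10 * math.log10(max(high) / max(low))


def synth_tone_mix(partials, sample_rate=8192, n=512):
    """Hypothetical test signal: a sum of sines given as {frequency_hz: amplitude}."""
    return [sum(a * math.sin(2 * math.pi * f * i / sample_rate)
                for f, a in partials.items())
            for i in range(n)]


# A strong 256 Hz partial and a partial at 3072 Hz with one tenth the amplitude:
# the power ratio is 0.01, so the SPR is close to -20 dB.
dull = synth_tone_mix({256: 1.0, 3072: 0.1})
print(round(singing_power_ratio(dull, 8192), 2))
```

Raising the amplitude of the 3072 Hz partial moves the value toward 0 dB, which is the direction the abstract associates with a "ringing" quality. Real recordings would need windowing and harmonic peak picking, which this sketch omits.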

15.
The objective was to determine whether a correlation exists between the Grade, Roughness, Breathiness, Asthenia, Strain (GRBAS) scale (a subjective measure of voice) and the Multi-Dimensional Voice Program (MDVP) scale (an objective measure of voice). A retrospective review of 37 voice patients (12 male/25 female) was conducted. Each voice was perceptually evaluated using the GRBAS scale by an experienced speech pathologist and acoustically analyzed using the MDVP scale. Statistical analysis using a multivariate regression model identified a significant correlation between the noise-related parameters of MDVP and the components of the GRBAS scale. Grade correlated with voice turbulence index (VTI), noise-to-harmonic ratio (NHR), and soft phonation index (SPI). Roughness correlated with NHR only. Breathiness correlated with SPI only. Asthenia also correlated with SPI only. Of the 19 acoustic variables measured by the MDVP system, only three noise parameters significantly correlated with the GRBAS perceptual voice analysis. Perhaps "noise" is the perceived acoustical quality of the dysphonic voice. A voice quantifying measure such as a "voice index score" could be proposed using the GRBAS scoring and the three clinically relevant MDVP values following further studies.

16.
Despite much research, the relationship between vocal acoustic signals and perceived voice quality is not well understood. The present study used an auditory model proposed by Moore et al. [10] to study how changes in the acoustic spectrum may relate to changes in perceptual ratings of breathiness. Perceptual ratings of breathiness were obtained using a multidimensional scaling (MDS) design. The stimulus distances on the dominant MDS dimension were correlated with several commonly used acoustic measures for voice quality. These distances were also compared with measures obtained from the output of the auditory model. Results show that the partial loudness of the harmonic energy obtained with the aspiration noise acting as a masker was the most important predictor of perceptual ratings of breathiness. Results also demonstrate that measures obtained from the auditory spectrum were better predictors of perceptual ratings of breathiness than were commonly used acoustic spectral measures.

17.
Previous research shows that listeners are sensitive to talker differences in phonetic properties of speech, including voice-onset-time (VOT) in word-initial voiceless stop consonants, and that learning how a talker produces one voiceless stop transfers to another word with the same voiceless stop [Allen, J. S., and Miller, J. L. (2004). J. Acoust. Soc. Am. 115, 3171-3183]. The present experiments examined whether transfer extends to words that begin with different voiceless stops. During training, listeners heard two talkers produce a given voiceless-initial word (e.g., pain). VOTs were manipulated such that one talker produced the voiceless stop with relatively short VOTs and the other with relatively long VOTs. At test, listeners heard a short- and long-VOT variant of the same word (e.g., pain) or a word beginning with a different voiceless stop (e.g., cane or coal), and were asked to select which of the two VOT variants was most representative of a given talker. In all conditions, which variant was selected at test was in line with listeners' exposure during training, and the effect was equally strong for the novel word and the training word. These findings suggest that accommodating talker-specific phonetic detail does not require exposure to each individual phonetic segment.

18.
19.
Seventeen healthy women, 45 to 61 years old, were examined using videofiberstroboscopy during phonation at three loudness levels. Two phoniatricians evaluated glottal closure using category and ratio scales. Transglottal airflow was studied by inverse filtering of the oral airflow signal recorded in a flow mask (Glottal Enterprises System) during the spoken phrase /ba:pa:pa:pa:p/ at three loudness levels. Subglottal pressure was estimated from the intraoral pressure during /p/ occlusion. Running speech and the repeated /pa:/ syllables were perceptually evaluated by three speech pathologists regarding breathiness, hypofunction, and hyperfunction, using continuous scales. Incomplete glottal closure was found in 35 of 46 phonations (76%). The degree of glottal closure increased significantly with raised loudness. Half of the women closed the glottis completely during loud phonation. Posterior glottal chink (PGC) was the most common gap configuration and was found in 28 of 46 phonations (61%). One third of the PGCs were in the cartilaginous glottis (PGCc) only. Two thirds extended into the membranous portion (PGCm); most of these occurred during soft phonation. Peak flow, peak-to-peak (AC) flow, and the maximum rate of change for the flow in the closing phase increased significantly with raised loudness. Minimum flow decreased significantly from normal to loud voice. Breathiness decreased with increased loudness. The results suggest that the incomplete closure patterns PGCc and PGCm during soft phonation ought primarily to be regarded as normal for Swedish women in this age group.

20.
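The local variance distribution (QLS) of a grayscale image is taken as an important feature characterizing the image's structural information. Singular value decomposition is applied to the local variance distribution matrix to obtain the corresponding singular value feature vector. The structural similarity of a degraded image and its original reference image is measured by the angle between the singular value feature vectors of their local variance matrices, which yields a quality assessment of the degraded image. Experimental results show that the local variance distribution highlights the structural features of an image more clearly, and that the assessment results are better than those of the traditional mean squared error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity (SSIM) methods, as well as singular value analysis (SVD) applied directly to the image pixel distribution, with good agreement with human visual perception.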
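
The angle-based comparison described in the abstract above can be written out as a short formula sketch. The symbols are assumptions introduced here for illustration: $V_{\mathrm{ref}}$ and $V_{\mathrm{deg}}$ denote the local variance matrices of the reference and degraded images, and $\boldsymbol{\sigma}$ denotes a vector of singular values. The window size used for the local variance and any normalization are not given in the abstract.

```latex
% Assumed notation (not from the source):
%   V_ref, V_deg     local variance matrices of the reference and degraded images
%   sigma_ref/deg    vectors of singular values of those matrices
\[
\boldsymbol{\sigma}_{\mathrm{ref}} = \operatorname{svals}(V_{\mathrm{ref}}), \qquad
\boldsymbol{\sigma}_{\mathrm{deg}} = \operatorname{svals}(V_{\mathrm{deg}})
\]
\[
\theta = \arccos\!\left(
  \frac{\boldsymbol{\sigma}_{\mathrm{ref}}^{\top}\,\boldsymbol{\sigma}_{\mathrm{deg}}}
       {\lVert \boldsymbol{\sigma}_{\mathrm{ref}} \rVert_2 \,
        \lVert \boldsymbol{\sigma}_{\mathrm{deg}} \rVert_2}
\right)
\]
```

Because singular values are nonnegative, $\theta$ lies in $[0, \pi/2]$; under this reading, a smaller angle indicates that the degraded image is structurally closer to the reference.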
