Similar literature
Found 20 similar documents (search time: 15 ms)
1.
Traditional interval or ordinal rating scale protocols appear to be poorly suited to measuring vocal quality. To investigate why this might be so, listeners were asked to classify pathological voices as having or not having different voice qualities. It was reasoned that this simple task would allow listeners to focus on the kind of quality a voice had, rather than how much of a quality it possessed, and thus might provide evidence for the validity of traditional vocal qualities. In experiment 1, listeners judged whether natural pathological voice samples were or were not primarily breathy and rough. Listener agreement in both tasks was above chance, but listeners agreed poorly that individual voices belonged in particular perceptual classes. To determine whether these results reflect listeners' difficulty agreeing about single perceptual attributes of complex stimuli, listeners in experiment 2 classified natural pathological voices and synthetic stimuli (varying in f0 only) as low pitched or not low pitched. If disagreements derive from difficulties dividing an auditory continuum consistently, then patterns of agreement should be similar for both kinds of stimuli. In fact, listener agreement was significantly better for the synthetic stimuli than for the natural voices. Difficulty isolating single perceptual dimensions of complex stimuli thus appears to be one reason why traditional unidimensional rating protocols are unsuited to measuring pathologic voice quality. Listeners did agree that a few aphonic voices were breathy, and that a few voices with prominent vocal fry and/or interharmonics were rough. These few cases of agreement may have occurred because the acoustic characteristics of the voices in question corresponded to the limiting case of the quality being judged. Values of f0 that generated listener agreement in experiment 2 were more extreme for natural than for synthetic stimuli, consistent with this interpretation.

2.
3.
The current study concerns speaking voice quality in two groups of professional voice users, teachers (n = 35) and actors (n = 36), representing trained and untrained voices. The voice quality of text reading at two intensity levels was acoustically analyzed. The central concept was the speaker's formant (SPF), related to the perceptual characteristics "better normal voice quality" (BNQ) and "worse normal voice quality" (WNQ). The purpose of the current study was to get closer to the origin of the phenomenon of the SPF, and to discover the differences in spectral and formant characteristics between the two professional groups and the two voice quality groups. The acoustic analyses were long-term average spectrum (LTAS) and spectrographical measurements of formant frequencies. At very high intensities, the spectral slope was rather quadrangular without a clear SPF peak. The trained voices had a higher energy level in the SPF region compared with the untrained, significantly so in loud phonation. The SPF seemed to be related to both sufficiently strong overtones and a glottal setting, allowing for a lowering of F4 and a closeness of F3 and F4. However, the existence of SPF also in LTAS of the WNQ voices implies that more research is warranted concerning the formation of SPF, and concerning the acoustic correlates of the BNQ voices.

4.

Objectives

The aim of this study was to identify visual subjective and objective parameters of vocal fold dynamics capable of differentiating healthy from pathologic voices in daily clinical practice using endoscopic high-speed digital imaging (HSI).

Study Design and Methods

Four hundred ninety-six datasets containing 80 healthy and 416 pathologic subjects (232 functional dysphonia (FD), 13 bilateral, and 171 unilateral vocal fold nerve paralysis) were analyzed retrospectively. Videos at 4000 Hz (256 × 256 pixel) were recorded during sustained phonation. Subjective parameters were visually evaluated and complemented by an analysis of objective parameters. Visual subjective parameters were mucosal wave, glottal closure type, glottal closure insufficiency (GI), asymmetries of the vocal folds, and phonovibrogram (PVG) symmetry. After image segmentation, objective parameters were computed: closed quotient, perturbation measures (PMs) of glottal area, and left-right asymmetry values.

Results

HSI evaluation made it possible to distinguish healthy from pathologic voices. For visual subjective parameters, GI, symmetrical behavior, and PVG symmetry exhibited statistically significant differences. For 95% of the data, objective parameters could be computed. Among objective parameters, closed quotient, jitter, shimmer, harmonic-to-noise ratio, and signal-to-noise ratio for the glottal area function differentiated normal from pathologic voices with statistical significance. Linear discriminant analysis combining visual subjective and objective parameters classified 63.2% of the female and 87.5% of the male group accurately for the three-class problem (healthy, FD, and unilateral vocal fold nerve paralysis).

Conclusion

PMs currently applied to acoustic signals can be transferred to clinically beneficial HSI analysis. Combining visual subjective and objective basic parameters succeeds in differentiating pathologic from healthy voices. The presented evaluation can easily be incorporated into everyday clinical practice. However, further research is needed to broaden our understanding of the variability within and across healthy and pathologic vocal fold vibrations for diagnosing voice disorders and therapy control.

5.
This study investigates the relationship between rough voice and the presence of subharmonics, which correspond to smaller yet distinct peaks located between two consecutive harmonic peaks in the power spectrum. Spectrum analysis was undertaken in 389 pathologic voices, of which 20 had subharmonics. Although all 20 voices had roughness perceptually, 8 had normal jitter and/or shimmer. The degree of roughness had a significant inverse relationship with the frequency of subharmonics. By digital signal processing, sound samples with various types of subharmonics were synthesized and perceptually analyzed. Power and frequency of subharmonics in the synthesized sound also had significant relationships with the degree of roughness. Rough voice is acoustically characterized not only by jitter and shimmer but also by the presence of subharmonics in the power spectrum. Subharmonics are important acoustic properties for objective evaluation of rough voices.
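As a rough illustration of the subharmonic peaks described in this abstract, the Python sketch below synthesizes a harmonic tone with one extra component midway between the first and second harmonics and measures its level relative to the fundamental. The 8 kHz sampling rate, the 200 Hz fundamental, the placement of the extra component at 1.5 × f0, and all function names are assumptions made for illustration; the abstract does not specify the study's own synthesis or analysis procedure.

```python
import math


def tone_power(signal, sample_rate, freq):
    """Power of `signal` at `freq`, from a single-frequency Fourier sum."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * freq * i / sample_rate)
             for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq * i / sample_rate)
             for i, x in enumerate(signal))
    return (re * re + im * im) / (n * n)


def synthesize(f0, sub_amp, sample_rate=8000, duration=0.1, n_harmonics=3):
    """Harmonic tone plus one component at 1.5 * f0 (hypothetical subharmonic)."""
    n = int(sample_rate * duration)
    samples = []
    for i in range(n):
        t = i / sample_rate
        # Harmonics k * f0 with amplitudes 1/k.
        x = sum(math.sin(2 * math.pi * k * f0 * t) / k
                for k in range(1, n_harmonics + 1))
        # Extra peak between the first and second harmonics.
        x += sub_amp * math.sin(2 * math.pi * 1.5 * f0 * t)
        samples.append(x)
    return samples


def subharmonic_level_db(signal, sample_rate, f0):
    """Level of the 1.5 * f0 component relative to the fundamental, in dB."""
    p_sub = tone_power(signal, sample_rate, 1.5 * f0)
    p_f0 = tone_power(signal, sample_rate, f0)
    # The small constant avoids log(0) when no subharmonic is present.
    return 10 * math.log10((p_sub + 1e-20) / p_f0)
```

Calling `subharmonic_level_db(synthesize(200.0, 0.5), 8000, 200.0)` compares a subharmonic at half the amplitude of the fundamental against the fundamental itself; varying `sub_amp` changes the subharmonic power, one of the two properties the abstract relates to perceived roughness.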

6.
Vocal training (VT) has, in part, been associated with the distinctions in the physiological, acoustic, and perceptual parameters found in singers' voices versus the voices of nonsingers. This study provides information on the changes in the singing voice as a function of VT over time. Fourteen college voice majors (12 females and 2 males; age range, 17–20 years) were recorded while singing, once a semester, for four consecutive semesters. Acoustic measures included fundamental frequency (F0) and sound pressure level (SPL) of the 10% and 90% levels of the maximum phonational frequency range (MPFR), vibrato pulses per second, vibrato amplitude variation, and the presence of the singer's formant. Results indicated that VT had a significant effect on the MPFR. F0 and SPL of the 90% level of the MPFR and the 90–10% range increased significantly as VT progressed. However, no vibrato or singer's formant differences were detected as a function of training. This longitudinal study not only validates previous cross-sectional research, ie, that VT has a significant effect on the singing voice, but also demonstrates that these effects can be acoustically detected by the fourth semester of college vocal training.

7.
Voice-overs are professional voice users who use their voices to market products in the electronic media. The purposes of this study were to (1) analyze voice-overed and non-overed productions of an advertising text in two groups consisting of 10 male professional voice-overs and 10 male non-voice-overs; and (2) determine specific acoustic features of voice-over productions in both groups. A naïve group of listeners were engaged for the perceptual analysis of the recorded advertising text. The voice-overed production samples from both groups were submitted for analysis of acoustic and temporal features. The following parameters were analyzed: (1) the total text length, (2) the length of the three emphatic pauses, (3) the mean, (4) minimum, and (5) maximum fundamental frequency, and (6) the semitone range. The majority of voice-overs and non-voice-overs were correctly identified by the listeners in both productions. However, voice-overs were more consistently correctly identified than non-voice-overs. The total text length was greater for voice-overs. The pause time distribution was statistically more homogeneous for the voice-overs. The acoustic analysis indicated that the voice-overs had lower values of mean, minimum, and maximum fundamental frequency and a greater range of semitones. The voice-overs carry the voice-overed production features to their non-voice-overed production.
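The fundamental-frequency parameters listed in this abstract can be sketched in a few lines of Python. The semitone range uses the standard conversion of a frequency ratio to semitones, 12 · log2(f_max / f_min); the function names and the input format (a list of per-frame F0 estimates in Hz) are assumptions for illustration, and the abstract does not say how the authors extracted F0.

```python
import math


def semitone_range(f_min_hz, f_max_hz):
    """Distance between two frequencies in semitones: 12 * log2(f_max / f_min)."""
    if f_min_hz <= 0 or f_max_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 12.0 * math.log2(f_max_hz / f_min_hz)


def f0_summary(f0_track_hz):
    """Mean, minimum, maximum F0 and semitone range for voiced F0 estimates."""
    if not f0_track_hz:
        raise ValueError("need at least one F0 value")
    f_min = min(f0_track_hz)
    f_max = max(f0_track_hz)
    return {
        "mean_hz": sum(f0_track_hz) / len(f0_track_hz),
        "min_hz": f_min,
        "max_hz": f_max,
        "semitone_range": semitone_range(f_min, f_max),
    }
```

For example, an F0 track spanning 100 Hz to 200 Hz covers one octave, so `semitone_range(100.0, 200.0)` gives 12 semitones. Expressing the range in semitones rather than hertz makes speakers with different mean F0 comparable.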

8.
This study investigated selected acoustic cues in the speaking voices of five professional singers; cues that may have enabled naïve listeners to differentiate them from nonsingers and other trained singers who were not consistently identified from their speaking voices. Subjects were divided into three groups based on listeners' perceptual judgments. Group I, the identified singers, consisted of five professional singers, three males and two females, with an average identification score, from their speaking utterances, of 79%. Group II, the unidentified singers, consisted of 15 trained singers, seven males and eight females, who, as a group, were identified correctly from their speaking utterances only 52% of the time. Group III consisted of 20 nonsingers who were incorrectly identified from their speaking utterances as singers only 36% of the time, that is, they were correctly identified as nonsingers from their speech 64% of the time. Acoustic parameters chosen for measurement from vowel productions were: (1) percent jitter, (2) percent shimmer, and (3) noise-to-harmonic ratio. The second sentence of the "Rainbow Passage" was selected to compare several frequency and duration measures between the three groups. These were: (1) mean speaking fundamental frequency, (2) standard deviation of the fundamental frequency, (3) sentence duration, (4) word duration, and (5) consonant/vowel ratio. The data indicated that the acoustic parameters that most consistently distinguished the identified singers from the unidentified singers and the nonsingers were fundamental frequency variation and durational differences. The identified singers varied their speaking fundamental frequency significantly more than did both the unidentified singers and the nonsingers. The identified singers also had longer vocalic segments than did the others.
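Percent jitter and percent shimmer, named among the vowel measures above, are commonly computed as the mean absolute difference between consecutive cycles divided by the mean cycle value. The Python sketch below uses that common "local" definition as an assumption; analysis programs differ in the exact formula, and the abstract does not state which variant was used. The function names and the input format (per-cycle periods and peak amplitudes) are hypothetical.

```python
def local_perturbation_percent(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def percent_jitter(periods_ms):
    """Cycle-to-cycle variability of the glottal period."""
    return local_perturbation_percent(periods_ms)


def percent_shimmer(peak_amplitudes):
    """Cycle-to-cycle variability of the peak amplitude."""
    return local_perturbation_percent(peak_amplitudes)
```

A perfectly periodic signal gives 0% for both measures; periods alternating between 4 ms and 6 ms give a mean difference of 2 ms over a mean period of 5 ms, that is, 40% jitter.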

9.
Values for acoustic voice measurements were obtained from 88 normal individuals and 98 pathological cases of mass lesions of the vocal fold and 50 cases of unilateral vocal fold paralysis. Overall, all items reflecting perturbations of pitch and amplitude as well as glottal noise were significantly higher in the groups of patients compared with the normal group. The measurement of normalized noise energy (NNE) was found to be an optimum parameter for discrimination of normal/abnormal voices. The voices of patients with vocal fold nodules and vocal fold polyps were analyzed before endolaryngeal phonomicrosurgery (EPM) and 2 weeks after. Statistically significant (p < 0.01) improvement was achieved both in perceptual and acoustic analysis. EPM resulted in a significant decrease of mean jitter, shimmer, and NNE. Clinically, these measures provided documentable and measurable evidence of vocal function and were helpful for comparing patients with normal speakers. They also were useful for a thorough documentation of patients' voice pathology and for evaluation of the presurgical and postsurgical voice status.

10.
SUMMARY: In recent years, the multiparametric approach for evaluating perceptual rating of voice quality has been advocated. This study evaluates the accuracy of predicting perceived overall severity of voice quality with a minimal set of aerodynamic, voice range profile (phonetogram), and acoustic perturbation measures. One hundred and twelve dysphonic persons (93 women and 19 men) with laryngeal pathologies and 41 normal controls (35 women and six men) with normal voices participated in this study. Perceptual severity judgement was carried out by four listeners rating the G (overall grade) parameter of the GRBAS scale. The minimal set of instrumental measures was selected based on the ability of the measure to discriminate between dysphonic and normal voices, and to attain at least a moderate correlation with perceived overall severity. Results indicated that perceived overall severity was best described by maximum phonation time of sustained /a/, peak intraoral pressure of the consonant-vowel /pi/ strings production, voice range profile area, and acoustic jitter. Direct-entry discriminant function analysis revealed that these four voice measures in combination correctly predicted 67.3% of perceived overall severity levels.

11.
This project was undertaken to provide information about the sexual characteristics of preadolescent children's voices. In one series of experiments, perceptual judgments of sexual identity were obtained in response to 73 children's productions of isolated whispered and normally phonated vowels, normally spoken sentences, and sentences spoken in a monotonous fashion (Bennett and Weinberg, 1978). The purpose of this portion of the project was to describe certain acoustic and temporal characteristics of these children's speech samples, and to assess the relationship of these variables to perceptual judgments of sexual identity. Sexual differences in the frequency location of vocal tract resonances were significantly correlated with listener judgments of child sex in all four utterance conditions. The origin of the observed differences in vocal tract resonance characteristics is discussed with reference to possible sexual differences in vocal tract size as well as certain articulatory behaviors. Average fundamental frequency was significantly related to listeners' sex identifications in two utterance conditions. However, the influence of this variable was considerably less pronounced when compared to vocal tract information. Although certain measures of fundamental frequency variability (mean duration of level inflections and the rate of frequency change associated with upward shifts) were significantly related to perceptual measures of sexual identity, these cues were also interpreted to play a secondary role in defining maleness and femaleness in these children's voices.

12.
The purpose was to determine the clinical value of a multiparametric objective voice evaluation protocol including acoustic and aerodynamic parameters measured mainly on a sustained /a/. This was done by comparison with perceptual analysis of continuous speech by a jury composed of 6 experienced listeners. Voice samples (continuous speech) from 63 male patients with dysphonia and 21 control subjects with normal voices were recorded and assessed by a jury of listeners. The jury was instructed to classify voice samples according to the G (overall dysphonia) component of the GRBAS score on a 4-point scale ranging from 0 for normal to 3 for severe dysphonia. Objective parameters were recorded on an EVA® workstation. As usual with this type of system, parameters were measured mainly on a sustained /a/. Measured parameters included fundamental frequency (F0), intensity, jitter, shimmer, signal-to-noise ratio, Lyapunov coefficient (LC), oral airflow (OAF), maximum phonatory time (MPT), and vocal range (range). Estimated subglottic pressure (ESGP) was determined on a series of /pa/. Discriminant analysis was performed to detect correlation between jury classification and combinations of parameters. Results showed that a nonlinear combination of only six parameters (range, LC, ESGP, MPT, signal-to-noise ratio, and F0) allowed 86% concordance with jury classification. Discussion deals with the relative importance of the different objective parameters for discriminant analysis. Special emphasis is placed on two measurements rarely made in routine clinical workup, i.e., estimated subglottic pressure and Lyapunov coefficient.

13.
The speech signal may be divided into frequency bands, each containing temporal properties of the envelope and fine structure. For maximal speech understanding, listeners must allocate their perceptual resources to the most informative acoustic properties. Understanding this perceptual weighting is essential for the design of assistive listening devices that need to preserve these important speech cues. This study measured the perceptual weighting of young normal-hearing listeners for the envelope and fine structure in each of three frequency bands for sentence materials. Perceptual weights were obtained under two listening contexts: (1) when each acoustic property was presented individually and (2) when multiple acoustic properties were available concurrently. The processing method was designed to vary the availability of each acoustic property independently by adding noise at different levels. Perceptual weights were determined by correlating a listener's performance with the availability of each acoustic property on a trial-by-trial basis. Results demonstrated that weights were (1) equal when acoustic properties were presented individually and (2) biased toward envelope and mid-frequency information when multiple properties were available. Results suggest a complex interaction between the available acoustic properties and the listening context in determining how best to allocate perceptual resources when listening to speech in noise.
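The correlational weighting method summarized in this abstract can be sketched as follows: for each acoustic property, correlate its per-trial availability with the listener's per-trial score, then scale the correlations so they can be compared as relative weights. Normalizing by the sum of absolute correlations is an assumption of this sketch, as are the function names, the property labels, and the toy data; the abstract does not give the study's exact normalization.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def perceptual_weights(availability, scores):
    """Relative weight per property from trial-by-trial correlations.

    availability: dict mapping property name -> availability level per trial
    scores: listener performance per trial (e.g. 1 = correct, 0 = incorrect)
    """
    correlations = {name: pearson(levels, scores)
                    for name, levels in availability.items()}
    total = sum(abs(r) for r in correlations.values())
    # Assumed normalization: weights are correlations scaled to sum to 1 in magnitude.
    return {name: r / total for name, r in correlations.items()}
```

With toy data in which performance tracks one property perfectly and is unrelated to the other, the first property receives the full weight:

```python
weights = perceptual_weights(
    {"envelope_mid": [3, 1, 3, 1, 3, 1], "fine_low": [1, 1, 2, 2, 3, 3]},
    [1, 0, 1, 0, 1, 0],
)
```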

14.
15.
This study explored whether acoustic and perceptual features could distinguish comfortable from maximally projected acting voice. Thirteen professional male actors performed a passage from William Shakespeare's Julius Caesar twice. The first delivery used their comfortably projected voices, whereas the second used maximal projection. Acoustic measures, expert ratings, and self-ratings of projection and voice quality were investigated. Long-term average spectra (LTAS) and sound pressure level (SPL) analyses were conducted. Perceptual variables included projection, breathiness, roughness, and strain. When comparing the intensity difference between the higher (2-4 kHz) and lower (0-2 kHz) regions of the spectrum in voice samples from the maximal projected condition, LTAS analyses demonstrated increased acoustic energy in the higher part of the spectrum. This LTAS pattern was not as evident in the comfortable projected condition. These findings offered some preliminary support for the existence of an actor's formant (prominent peak in the upper part of the spectrum) during maximal projection.
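The comparison of the higher (2-4 kHz) and lower (0-2 kHz) spectral regions can be illustrated with a minimal Python sketch: average the power spectra of consecutive frames to obtain a long-term average spectrum, sum the power on each side of 2 kHz, and express the ratio in dB. The 8 kHz sampling rate (so that the spectrum ends at 4 kHz), the frame length, the absence of windowing and overlap, and the function names are all simplifying assumptions; the abstract does not describe the study's LTAS settings.

```python
import cmath
import math


def power_spectrum(frame):
    """Power at each DFT bin from 0 to the Nyquist frequency (direct DFT)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))) ** 2
            for k in range(n // 2 + 1)]


def ltas(signal, frame_len=128):
    """Long-term average spectrum: mean power spectrum over consecutive frames."""
    starts = range(0, len(signal) - frame_len + 1, frame_len)
    spectra = [power_spectrum(signal[s:s + frame_len]) for s in starts]
    return [sum(column) / len(spectra) for column in zip(*spectra)]


def high_low_difference_db(signal, sample_rate, split_hz=2000.0, frame_len=128):
    """Level of the band at or above split_hz relative to the band below it, in dB."""
    average = ltas(signal, frame_len)
    bin_hz = sample_rate / frame_len
    low = sum(p for k, p in enumerate(average) if k * bin_hz < split_hz)
    high = sum(p for k, p in enumerate(average) if k * bin_hz >= split_hz)
    return 10 * math.log10(high / low)
```

A positive result means more energy above 2 kHz than below it. The direct DFT keeps the sketch dependency-free but is slow; an FFT routine would be the usual choice for real recordings.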

16.
Acoustic and perceptual similarities between Japanese and American English (AE) vowels were investigated in two studies. In study 1, a series of discriminant analyses were performed to determine acoustic similarities between Japanese and AE vowels, each spoken by four native male speakers using F1, F2, and vocalic duration as input parameters. In study 2, the Japanese vowels were presented to native AE listeners in a perceptual assimilation task, in which the listeners categorized each Japanese vowel token as most similar to an AE category and rated its goodness as an exemplar of the chosen AE category. Results showed that the majority of AE listeners assimilated all Japanese vowels into long AE categories, apparently ignoring temporal differences between 1- and 2-mora Japanese vowels. In addition, not all perceptual assimilation patterns reflected context-specific spectral similarity patterns established by discriminant analysis. It was hypothesized that this incongruity between acoustic and perceptual similarity may be due to differences in distributional characteristics of native and non-native vowel categories that affect the listeners' perceptual judgments.

17.
There is size information in natural sounds. For example, as humans grow in height, their vocal tracts increase in length, producing a predictable decrease in the formant frequencies of speech sounds. Recent studies have shown that listeners can make fine discriminations about which of two speakers has the longer vocal tract, supporting the view that the auditory system discriminates changes on the acoustic-scale dimension. Listeners can also recognize vowels scaled well beyond the range of vocal tracts normally experienced, indicating that perception is robust to changes in acoustic scale. This paper reports two perceptual experiments designed to extend research on acoustic scale and size perception to the domain of musical sounds: The first study shows that listeners can discriminate the scale of musical instrument sounds reliably, although not quite as well as for voices. The second experiment shows that listeners can recognize the family of an instrument sound which has been modified in pitch and scale beyond the range of normal experience. We conclude that processing of acoustic scale in music perception is very similar to processing of acoustic scale in speech perception.

18.
19.
Journal of Voice, 2019, 33(6): 838-845

Background

A limited number of experiments have investigated the perception of strain compared to the voice qualities of breathiness and roughness despite its widespread occurrence in patients who have hyperfunctional voice disorders, adductor spasmodic dysphonia, and vocal fold paralysis among others.

Objective

The purpose of this study is to determine the perceptual basis of strain through identification and exploration of acoustic and psychoacoustic measures.

Methods

Twelve listeners evaluated the degree of strain for 28 dysphonic phonation samples on a five-point rating scale task. Computational estimates based on cepstrum, sharpness, and spectral moments (linear and transformed with auditory processing front-end) were correlated to the perceptual ratings.

Results

Perceived strain was strongly correlated with cepstral peak prominence, sharpness, and a subset of the spectral metrics. Spectral energy distribution measures from the output of an auditory processing front-end (ie, excitation pattern and specific loudness pattern) accounted for 77–79% of the model variance for strained voices in combination with the cepstral measure.

Conclusions

Modeling the perception of strain using an auditory front-end prior to acoustic analysis provides better characterization of the perceptual ratings of strain, similar to our prior work on breathiness and roughness. Results also provide evidence that the sharpness model of Fastl and Zwicker (2007) is one of the strong predictors of strain perception.

20.
The MPEG-1 Layer 3 audio compression scheme, commonly known as mp3, has had a great impact in recent years because it achieves high compression rates while preserving high sound quality. Music and speech samples compressed at high bitrates are perceptually indistinguishable from the original samples, but very little was known about how compression acoustically affects the voice signal. A previous work with normal voices showed a high fidelity at high-bitrate compressions both in voice parameters and the amplitude-frequency spectrum. In the present work, dysphonic voices were tested through two studies. In the first study, spectrograms, long-term average spectra (LTAS), and fast Fourier transform (FFT) spectra of compressed and original samples of running speech were compared. In the second study, intensities, formant frequencies, formant bandwidths, and a multidimensional set of voice parameters were tested in a set of sustained phonations. Results showed that compression at high bitrates (96 and 128 kbps) preserved the relevant acoustic properties of the pathological voices. With compression at lower bitrates, fidelity decreases and some important alterations are introduced. Results from both works, Gonzalez and Cervera and this paper, open up the possibility of using MPEG compression at high bitrates to store or transmit high-quality speech recordings without altering their acoustic properties.
