Similar documents
 20 similar documents found
1.
Previous studies have demonstrated that perturbations in voice pitch or loudness feedback lead to compensatory changes in voice F0 or amplitude during production of sustained vowels. Responses to pitch-shifted auditory feedback have also been observed during English and Mandarin speech. The present study investigated whether Mandarin speakers would respond to amplitude-shifted feedback during meaningful speech production. Native speakers of Mandarin produced two-syllable utterances with focus on the first syllable, the second syllable, or none of the syllables, as prompted by corresponding questions. Their acoustic speech signal was fed back to them with loudness shifted by ±3 dB for 200 ms durations. The responses to the feedback perturbations had mean latencies of approximately 142 ms and magnitudes of approximately 0.86 dB. Response magnitudes were greater and latencies were longer when emphasis was placed on the first syllable than when there was no emphasis. Since amplitude is not known for being highly effective in encoding linguistic contrasts, the fact that subjects reacted to amplitude perturbation just as fast as they reacted to F0 perturbations in previous studies provides clear evidence that a highly automatic feedback mechanism is active in controlling both F0 and amplitude of speech production.
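For readers less used to decibel figures, the sketch below converts the level changes quoted in this abstract into linear amplitude ratios. It assumes the shifts are amplitude (sound pressure) levels, so the factor-of-20 convention applies; the abstract does not say so explicitly, and the function names are illustrative only.

```python
import math


def db_to_amplitude_ratio(shift_db: float) -> float:
    """Convert a level change in dB to a linear amplitude ratio.

    Assumes an amplitude (pressure) quantity, hence 20 * log10.
    """
    return 10.0 ** (shift_db / 20.0)


def amplitude_ratio_to_db(ratio: float) -> float:
    """Inverse conversion: linear amplitude ratio to a level change in dB."""
    return 20.0 * math.log10(ratio)


# Feedback perturbations reported in the abstract: +3 dB and -3 dB.
gain_up = db_to_amplitude_ratio(3.0)     # about 1.41
gain_down = db_to_amplitude_ratio(-3.0)  # about 0.71

# Mean response magnitude reported in the abstract: about 0.86 dB.
response_ratio = db_to_amplitude_ratio(0.86)  # about 1.10
```

Under that assumption, a 3 dB shift changes the fed-back amplitude by roughly 41%, while the 0.86 dB mean response corresponds to an amplitude change of roughly 10%.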

2.
An assessment of vocal impairment is presented for separating healthy people from persons with early untreated Parkinson's disease (PD). This study's main purpose was to (a) determine whether voice and speech disorders are present from early stages of PD before starting dopaminergic pharmacotherapy, (b) ascertain the specific characteristics of the PD-related vocal impairment, (c) identify PD-related acoustic signatures for the major part of traditional clinically used measurement methods with respect to their automatic assessment, and (d) design new automatic measurement methods of articulation. The varied speech data were collected from 46 Czech native speakers, 23 with PD. Subsequently, 19 representative measurements were pre-selected, and Wald sequential analysis was then applied to assess the efficiency of each measure and the extent of vocal impairment of each subject. It was found that measurement of the fundamental frequency variations applied to two selected tasks was the best method for separating healthy from PD subjects. On the basis of objective acoustic measures, statistical decision-making theory, and validation from practicing speech therapists, it has been demonstrated that 78% of early untreated PD subjects indicate some form of vocal impairment. The speech defects thus uncovered differ individually in various characteristics including phonation, articulation, and prosody.

3.
Effects of noise on speech production: acoustic and perceptual analyses   Total citations: 4, self-citations: 0, citations by others: 4
Acoustical analyses were carried out on a set of utterances produced by two male speakers talking in quiet and in 80, 90, and 100 dB SPL of masking noise. In addition to replicating previous studies demonstrating increases in amplitude, duration, and vocal pitch while talking in noise, these analyses also found reliable differences in the formant frequencies and short-term spectra of vowels. Perceptual experiments were also conducted to assess the intelligibility of utterances produced in quiet and in noise when they were presented at equal S/N ratios for identification. In each experiment, utterances originally produced in noise were found to be more intelligible than utterances produced in the quiet. The results of the acoustic analyses showed clear and consistent differences in the acoustic-phonetic characteristics of speech produced in quiet versus noisy environments. Moreover, these acoustic differences produced reliable effects on intelligibility. The findings are discussed in terms of: (1) the nature of the acoustic changes that take place when speakers produce speech under adverse conditions such as noise, psychological stress, or high cognitive load; (2) the role of training and feedback in controlling and modifying a talker's speech to improve performance of current speech recognizers; and (3) the development of robust algorithms for recognition of speech in noise.

4.
This paper examines whether correlations between speech perception and speech production exist, and, if so, whether they might provide a way of evaluating different acoustic metrics. The cues listeners use for many phonemic distinctions are not known, often because many different acoustic cues are highly correlated with one another, making it difficult to distinguish among them. Perception-production correlations may provide a new means of doing so. In the present paper, correlations were examined between acoustic measures taken on listeners' perceptual prototypes for a given speech category and on their average production of members of that category. Significant correlations were found for VOT among stop consonants, and for spectral peaks (but not centroids or skewness) for voiceless fricatives. These results suggest that correlations between speech perception and production may provide a methodology for evaluating different proposed acoustic metrics.

5.
Acoustic cues related to the voice source, including harmonic structure and spectral tilt, were examined for relevance to prosodic boundary detection. The measurements considered here comprise five categories: duration, pitch, harmonic structure, spectral tilt, and amplitude. Distributions of the measurements and statistical analysis show that the measurements may be used to differentiate between prosodic categories. Detection experiments on the Boston University Radio Speech Corpus show equal error detection rates around 70% for accent and boundary detection, using only the acoustic measurements described, without any lexical or syntactic information. Further investigation of the detection results shows that duration and amplitude measurements, and, to a lesser degree, pitch measurements, are useful for detecting accents, while all voice source measurements except pitch measurements are useful for boundary detection.

6.
A mathematical speech production model is considered that describes acoustic oscillation propagation in a vocal tract with mobile walls. The wave field function satisfies the Helmholtz equation with boundary conditions of the third kind (impedance type). The impedance mode corresponds to a three-parameter pendulum oscillation model. The experimental research demonstrates the nonlinear character of how the mobility of the vocal tract walls influences the spectral envelope of a speech signal.

7.
The value of any measure of voice production is dependent on its repeatability over time. The purpose of the present study was to determine the consistency of selected acoustic and aerodynamic measures of voice production over 28 days, under various test/retest conditions. Three groups of healthy young adult females sustained three vowels at comfortable, high, and low pitch levels. Subjects in Group 1 chose their own intensity levels, but matched the fundamental frequencies produced at Test 1 during Test 2. Group 2 controlled intensity levels during both tests, but fundamental frequency was free to vary. Group 3 controlled both intensity and fundamental frequency. Measures of fundamental frequency, jitter, maximum phonation time, phonation volume, and flow rate were compared. Subjects who matched both fundamental frequency and intensity showed repeatable, consistent results for all measures during both tests. Controlling intensity but not fundamental frequency resulted in statistically significant differences in fundamental frequency at comfortable and high pitches, but there was minimal effect on other variables. Controlling fundamental frequency but not intensity led to the most inconsistency between tests, affecting both acoustic and aerodynamic measures. Results underscore the need to control the conditions under which measures are obtained.

8.
9.
It appears that temperature instabilities are a major obstacle hindering the use of semiconductor strain gauge pressure transducers in speech research, especially when absolute pressure data are mandatory. In this paper a simple and reliable method for an in vivo calibration of this kind of transducer is described. The most important error source, the drift of the zero pressure level due to temperature changes, is discussed, and an estimation of the measurement accuracy which can be obtained is given. Moreover, some registrations of subglottal, supraglottal, and transglottal pressure are presented. It is shown that the pressure recordings allow us to obtain estimates of the volume flow in the trachea and pharynx. Analysis of those waveforms appears to lead to new insights into the physical processes underlying voice production. Specifically, an independent glottal contribution to the skewing of the glottal flow pulses is identified.

10.
11.
A time-domain model of sound wave propagation in the branching airways of the subglottal system is presented. The model is formulated as an extension to an acoustic transmission-line modeling scheme originally developed for simulating the supraglottal system in the time-domain during speech production [Maeda (1982). Speech Commun. 1, 199-229; Mokhtari et al. (2008). Speech Commun. 50, 179-190]. The approach allows for predictions of time-varying acoustic pressure and volume velocity at any point along the various generations of subglottal airways from trachea to alveoli. In addition, the model can be configured so that its overall structure simulates different geometric forms, including airways that branch in a symmetric or asymmetric pattern. Three subglottal configurations, two symmetric and one asymmetric, were represented based on reported anatomical dimensions of the subglottal airways. Estimates of the acoustic input impedances of these subglottal configurations revealed resonant characteristics similar to those found in the previous studies. Simulations of voiced sound propagation into the subglottal airways, achieved by coupling the subglottal model to a two-mass vocal fold model and a supraglottal tract configured for different vowels, yielded predictions of time-domain sound pressure waveforms below the vocal folds that compare favorably to previous measurements in human subjects.

12.
Laryngeal aerodynamic and acoustic characteristics of African American voice production were examined from vowel samples produced by ten adult female and ten adult male speakers. The data were compared with those for a control group consisting of ten adult female and ten adult male White speakers, matched for age, height, and weight. All measures were analyzed using Cspeech 4.0. Aerodynamic measurements, extracted from a glottal airflow waveform, included maximum flow declination rate, alternating glottal airflow, minimum glottal airflow, and airflow open quotient. Acoustic measures included fundamental frequency and sound pressure level. No significant mean differences between the African American and White speakers were found, except for maximum-flow declination rate. The White speakers produced significantly higher declination rates than the African American speakers. The factor of sex for the African American speakers was statistically significant for the measures of maximum-flow declination rate, alternating glottal airflow, open quotient, and fundamental frequency, consistent with the functioning of the White speakers. The results suggest that during vowel production, where the vocal tract is in a fairly static position, acoustic and aerodynamic characteristics for African American and White speakers are comparable.

13.
An enhanced linear acoustic tomograph is designed for the early medical diagnosis of pathological neoplasms of soft biological tissues. Problems associated with additional prospects for tomographic imaging are discussed.

14.
15.
Twenty-four normal adult women read part of the Rainbow Passage and sustained vowels, three trials each. Utterances were assessed for selected parameters measured by Visi-Pitch (average and SD of fundamental frequency (F0), average and SD of dBA, perturbation, and percent voiced/unvoiced/pause). Assessment of each parameter included measures of central tendency, dispersion, and distribution characteristics (skewness and kurtosis) of the data and of the ranges of values that would include 95% of the scores (95% fiduciary limits). Generally, differences for the group between the three trials were not significant. Intersubject variability for only a few parameters was less than 20% of the parameter's mean. For vowels, variability of jitter was 30–48% of the mean. Eight subjects provided performances 2 months later to obtain an estimate of intrasubject variability over time. There were desirable intrasubject correlations between performances for mean F0, jitter in reading and on vowels /i/ and /a/, and percent of voicing. Inter- and intrasubject variability seems restricted and the data appear to resemble a normally distributed function for mean F0 on reading, jitter on /i/, and percent of voicing. Thus, these parameters may have statistical merit for use in vocal testing.
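The abstract above reports variability as a percentage of the parameter's mean. One common way to express this is the coefficient of variation, sketched below. This assumes the study used the sample standard deviation as its dispersion measure, which the abstract does not confirm, and the sample values are hypothetical, not data from the study.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean.

    Assumption: dispersion is the sample SD; the abstract does not state
    which dispersion measure the authors used.
    """
    m = mean(values)
    if m == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * stdev(values) / m


# Hypothetical mean-F0 values in Hz for three speakers (illustration only).
example_f0_hz = [200.0, 220.0, 180.0]
cv_percent = coefficient_of_variation(example_f0_hz)
```

For these made-up values the sample SD is 20 Hz on a mean of 200 Hz, so the variability is 10% of the mean, which would fall below the 20% threshold mentioned in the abstract.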

16.
Reiterant speech, or nonsense syllable mimicry, has been proposed as a way to study prosody, particularly syllable and word durations, unconfounded by segmental influences. Researchers have shown that segmental influences on durations can be neutralized in reiterant speech. If it is to be a useful tool in the study of prosody, it must also be shown that reiterant speech preserves the suprasegmental duration and intonation differences relevant to perception. In the present study, syllable durations for nonreiterant and reiterant ambiguous sentences were measured to seek evidence of the duration differences which can enable listeners to resolve surface structure ambiguities in nonreiterant speech. These duration patterns were found in both nonreiterant and reiterant speech. A perceptual study tested listeners' perception of these ambiguous sentences as spoken by four "good" speakers--speakers who neutralized intrinsic duration differences and whose sentences were independently rated by skilled listeners as good imitations of normal speech. The listeners were able to choose the correct interpretation when the ambiguous sentences were in reiterant form as well as they did when the sentences were spoken normally. These results support the notion that reiterant speech is like nonreiterant speech in aspects which are important in the study of prosody.

17.
This paper compares two methods for extracting room acoustic parameters from reverberated speech and music. An approach which uses statistical machine learning, previously developed for speech, is extended to work with music. For speech, reverberation time estimations are within a perceptual difference limen of the true value. For music, virtually all early decay time estimations are within a difference limen of the true value. The estimation accuracy is not good enough in other cases due to differences between the simulated data set used to develop the empirical model and real rooms. The second method carries out a maximum likelihood estimation on decay phases at the end of notes or speech utterances. This paper extends the method to estimate parameters relating to the balance of early and late energies in the impulse response. For reverberation time and speech, the method provides estimations which are within the perceptual difference limen of the true value. For other parameters such as clarity, the estimations are not sufficiently accurate due to the natural reverberance of the excitation signals. Speech is a better test signal than music because of the greater periods of silence in the signal, although music is needed for low frequency measurement.

18.
Three alternative speech coding strategies suitable for use with cochlear implants were compared in a study of three normally hearing subjects using an acoustic model of a multiple-channel cochlear implant. The first strategy (F2) presented the amplitude envelope of the speech and the second formant frequency. The second strategy (F0 F2) included the voice fundamental frequency, and the third strategy (F0 F1 F2) presented the first formant frequency as well. Discourse level testing with the speech tracking method showed a clear superiority of the F0 F1 F2 strategy when the auditory information was used to supplement lipreading. Tracking rates averaged over three subjects for nine 10-min sessions were 40 wpm for F2, 52 wpm for F0 F2, and 66 wpm for F0 F1 F2. Vowel and consonant confusion studies and a test of prosodic information were carried out with auditory information only. The vowel test showed a significant difference between the strategies, but no differences were found for the other tests. It was concluded that the amplitude and duration cues common to all three strategies accounted for the levels of consonant and prosodic information received by the subjects, while the different tracking rates were a consequence of the better vowel recognition and the more natural quality of the F0 F1 F2 strategy.

19.
This study aimed to determine whether a correlation exists between the Grade, Roughness, Breathiness, Asthenia, Strain (GRBAS) scale (a subjective measure of voice) and the Multi-Dimensional Voice Program (MDVP) scale (an objective measure of voice). A retrospective review of 37 voice patients (12 male/25 female) was conducted. Each voice was perceptually evaluated using the GRBAS scale by an experienced speech pathologist and acoustically analyzed using the MDVP scale. Statistical analysis using a multivariate regression model identified a significant correlation between the noise-related parameters of MDVP and the components of the GRBAS scale. Grade correlated with voice turbulence index (VTI), noise harmonic ratio (NHR), and soft phonation index (SPI). Roughness correlated with NHR only. Breathiness correlated with SPI only. Asthenia also correlated with SPI only. Of the 19 acoustic variables measured by the MDVP system, only three noise parameters significantly correlated with the GRBAS perceptual voice analysis. Perhaps "noise" is the perceived acoustical quality of the dysphonic voice. A voice quantifying measure such as a "voice index score" could be proposed using the GRBAS scoring and the three clinically relevant MDVP values following further studies.

20.
Laryngeal framework surgery can change the position and tension of the vocal folds safely without direct surgical intervention in the vocal fold proper. Some 23 years of experience with phonosurgery have proved its usefulness in treating dysphonia related to unilateral vocal fold paralysis, vocal fold atrophy, and pitch-related dysphonias. Meanwhile, much information about the mechanism of voice production has been obtained through intraoperative findings of voice and fiberscopic examination of the larynx. Based on such knowledge together with information obtained through model experiments, the human vocal organ was reconsidered mainly from the mechanical viewpoint, and the roles of voice therapy and singing pedagogy were discussed in relation to phonosurgery. The vocal organ may not be an ideal musical organ and is rather vulnerable, but its potential is enormous.
