Similar literature
 Found 20 similar articles; search took 500 ms
1.
Computer models of the process of speech articulation require a detailed knowledge of the vocal tract configurations employed in speech and the application of acoustic theory to calculate the sound waveform. Almost all currently available data on vocal tract dimensions come from x-ray films and are severely limited in quantity and coherence due to restrictions on radiation dosage and intersubject differences. We are using MRI techniques to obtain the pharyngeal dimensions of speakers producing sustained vowels. The fact that MRI does not employ ionizing radiation provides speech research with the opportunity to obtain comprehensive bodies of much-needed data on the articulatory characteristics of single subjects.

2.
3.
Over the last few decades, researchers have been investigating the mechanisms involved in speech production. Image analysis can be a valuable aid in understanding the morphology of the vocal tract, and magnetic resonance imaging has proven to be a reliable and safe way to study these mechanisms. We applied deformable models to magnetic resonance images to study the vocal tract automatically: first to evaluate the shape of the vocal tract in the articulation of some European Portuguese sounds, and then to segment the vocal tract's shape automatically in new images. A point distribution model was built from a set of magnetic resonance images acquired during artificially sustained articulations of 21 sounds; it successfully extracts the main characteristics of the movements of the vocal tract. The combination of that statistical shape model with the gray levels of its points was subsequently used to build active shape models and active appearance models. Those models were then used to segment the modeled vocal tract in new images successfully and automatically. The computational models thus proved useful for speech simulation and rehabilitation, namely to simulate and recognize the compensatory movements of the articulators during speech production.

4.
A listener who recognizes a talker notices characteristic attributes of the talker's speech despite the novelty of each utterance. Accounts of talker perception have often presumed that consistent aspects of an individual's speech, termed indexical properties, are ascribable to a talker's unique anatomy or consistent vocal posture distinct from acoustic correlates of phonetic contrasts. Accordingly, the perception of a talker is acknowledged to occur independently of the perception of a linguistic message. Alternatively, some studies suggest that attention to attributes of a talker includes indexical linguistic attributes conveyed in the articulation of consonants and vowels. This investigation sought direct evidence of attention to phonetic attributes of speech in perceiving talkers. Natural samples and sine-wave replicas derived from them were used in three experiments assessing the perceptual properties of natural and sine-wave sentences; of temporally veridical and reversed natural and sine-wave sentences; and of an acoustic correlate of vocal tract scale to judgments of sine-wave talker similarity. The results revealed that the subjective similarity of individual talkers is preserved in the absence of natural vocal quality, and that local phonetic segmental attributes as well as global characteristics of speech can be exploited when listeners notice characteristics of talkers.

5.
At a cocktail party, listeners must attend selectively to a target speaker and segregate their speech from distracting speech sounds uttered by other speakers. To solve this task, listeners can draw on a variety of vocal, spatial, and temporal cues. Recently, Vestergaard et al. [J. Acoust. Soc. Am. 125, 1114-1124 (2009)] developed a concurrent-syllable task to control temporal glimpsing within segments of concurrent speech, and this allowed them to measure the interaction of glottal pulse rate and vocal tract length and reveal how the auditory system integrates information from independent acoustic modalities to enhance recognition. The current paper shows how the interaction of these acoustic cues evolves as the temporal overlap of syllables is varied. Temporal glimpses as short as 25 ms are observed to improve syllable recognition substantially when the target and distracter have similar vocal characteristics, but not when they are dissimilar. The effect of temporal glimpsing on recognition performance is strongly affected by the form of the syllable (consonant-vowel versus vowel-consonant), but it is independent of other phonetic features such as place and manner of articulation.

6.
This study aimed to assess quantitatively the effect of bilateral subthalamic nucleus (STN) stimulation and medication on hypokinetic parkinsonian dysarthria. Twelve Italian patients (11 males and 1 female) with idiopathic Parkinson's disease (mean age 60.29 ± 7.50 years) and bilateral STN implantation were studied. Neurological assessments and acoustic recordings were performed in four clinical conditions combining stimulation and medication to assess the degree of motor disabilities and speech impairment. Acoustic analysis was performed by means of the Multidimensional Voice Program and the Advanced Motor Speech Profile (Kay Elemetrics, Lincoln Park, NJ). None of the evaluated parameters deteriorated after STN deep brain stimulation. STN stimulation significantly improved motor performance and vocal tremor and gave greater stability to glottal vibration. The effect of stimulation on these parameters was superior to that of levodopa. No significant variations were observed in perceptual evaluation or in acoustic parameters related to prosody, articulation, and intensity after either stimulation or medication. The improvement of acoustic parameters related to glottal vibration and voice tremor was not accompanied by a substantial effect on speech intelligibility. STN stimulation was more effective on global motor limb dysfunctions than on dysarthria, but we did not report negative consequences on speech.

7.
The aim of this study was to investigate how different acoustic parameters, extracted both from speech pressure waveforms and glottal flows, can be used in measuring vocal loading in modern working environments and how these parameters reflect the possible changes in the vocal function during a working day. In addition, correlations between objective acoustic parameters and subjective voice symptoms were addressed. The subjects were 24 female and 8 male customer-service advisors, who mainly use the telephone during their working hours. Speech samples were recorded from continuous speech four times during a working day, and voice symptom questionnaires were completed simultaneously. Among the various objective parameters, only F0 showed a statistically significant increase for both genders. No correlations between the changes in objective and subjective parameters appeared. However, the results encourage researchers within the field of occupational voice use to apply versatile measurement techniques in studying occupational voice loading.

8.
We evaluated acoustic voice characteristics of 18 male patients undergoing radiotherapy. The subjects were seen for voice assessment preradiotherapy and at 1 month, 3 months, 6 months, and 1 year following radiotherapy. A multidimensional voice analysis computer program (IVANS, Avaaz Innovations, 1998) was employed to evaluate measures of traditional frequency and amplitude perturbation as well as time-based and linear prediction (LP) modeled "noise" parameters of the acoustic output in conjunction with perceptual judgments of overall vocal quality. The results indicate deterioration of vocal function immediately following radiotherapy, with gradual and significant improvement in acoustic and perceptual features over 9 to 12 months following the radiation treatment. Measures of glottal noise demonstrated higher sensitivity than frequency-based measures of voice perturbation, with more consistent, less variable changes in acoustical voice output from the preradiation to the 12-month postradiation periods. Future research evaluating vowel type and acoustic perturbation measures with a larger sample of subjects over a longer time period seems warranted.

9.
Information about the acoustic properties of a talker's voice is available in optical displays of speech, and vice versa, as evidenced by perceivers' ability to match faces and voices based on vocal identity. The present investigation used point-light displays (PLDs) of visual speech and sinewave replicas of auditory speech in a cross-modal matching task to assess perceivers' ability to match faces and voices under conditions when only isolated kinematic information about vocal tract articulation was available. These stimuli were also used in a word recognition experiment under auditory-alone and audiovisual conditions. The results showed that isolated kinematic displays provide enough information to match the source of an utterance across sensory modalities. Furthermore, isolated kinematic displays can be integrated to yield better word recognition performance under audiovisual conditions than under auditory-alone conditions. The results are discussed in terms of their implications for describing the nature of speech information and current theories of speech perception and spoken word recognition.

10.
Although listeners routinely perceive both the sex and individual identity of talkers from their speech, explanations of these abilities are incomplete. Here, variation in vocal production-related anatomy was assumed to affect vowel acoustics thought to be critical for indexical cueing. Integrating this approach with source-filter theory, patterns of acoustic parameters that should represent sex and identity were identified. Due to sexual dimorphism, the combination of fundamental frequency (F0, reflecting larynx size) and vocal tract length cues (VTL, reflecting body size) was predicted to provide the strongest acoustic correlates of talker sex. Acoustic measures associated with presumed variations in supralaryngeal vocal tract-related anatomy occurring within sex were expected to be prominent in individual talker identity. These predictions were supported by results of analyses of 2500 tokens of the /ɛ/ phoneme, extracted from the naturally produced speech of 125 subjects. Classification by talker sex was virtually perfect when F0 and VTL were used together, whereas talker classification depended primarily on the various acoustic parameters associated with vocal-tract filtering.

11.

Background

Dysarthria often is an early and prominent clinical feature of progressive supranuclear palsy (PSP). Based on perceptual analyses, speech impairment in PSP reportedly consists of prominent hypokinetic and spastic components with occasional ataxic features.

Objective

To measure objectively and quantitatively different speech parameters in PSP as compared with Parkinson's disease (PD) by acoustical analysis and to correlate these parameters with disease duration, global motor, and speech impairment and with the subtype of disease (Richardson's syndrome [RS] vs parkinsonian type of PSP [PSP-P]).

Patients and Methods

Twenty-six patients with a clinical diagnosis of PSP (n = 14 classified as RS and n = 12 classified as PSP-P) and 30 age- and gender-matched patients with a clinical diagnosis of PD were tested. Speech examination was based on the acoustical analysis of a standardized four-sentence reading task. Several speech variables were measured to assess phonation, intonation variability, speech velocity, and articulatory precision. All participants were assessed with the Unified Parkinson's Disease Rating Scale/Motor Score (UPDRS-III) and staged according to Hoehn and Yahr stages. Global speech intelligibility was evaluated on the basis of the UPDRS-III speech item.

Results

In the PSP group, speech velocity, intonation variability, and the fraction of intraword pauses as a measure of articulatory precision were significantly reduced, whereas the percentage of speech pauses was increased as compared with the PD group. Vowel articulation was found to be impaired only in the male PSP patients. Global speech performance was worse in the PSP group than in the PD group and correlated with some distinct speech dimensions. No differences in speech variables were seen between RS and PSP-P patients.

Conclusions

PSP patients feature a mixed type of dysarthria with hypokinetic and spastic components that differ significantly from the speech performance of PD speakers. This probably reflects the widespread neuropathological changes in PSP comprising basal ganglia as well as pontine and further brainstem regions.

12.

Background  

The cortical activity underlying the perception of vowel identity has typically been addressed by manipulating the first and second formant frequency (F1 & F2) of the speech stimuli. These two values, originating from articulation, are already sufficient for the phonetic characterization of vowel category. In the present study, we investigated how the spectral cues caused by articulation are reflected in cortical speech processing when combined with phonation, the other major part of speech production manifested as the fundamental frequency (F0) and its harmonic integer multiples. To study the combined effects of articulation and phonation we presented vowels with either high (/a/) or low (/u/) formant frequencies which were driven by three different types of excitation: a natural periodic pulseform reflecting the vibration of the vocal folds, an aperiodic noise excitation, or a tonal waveform. The auditory N1m response was recorded with whole-head magnetoencephalography (MEG) from ten human subjects in order to resolve whether brain events reflecting articulation and phonation are specific to the left or right hemisphere of the human brain.

13.
Mammalian vocal production mechanisms are still poorly understood despite their significance for theories of human speech evolution. In particular, it is still unclear to what degree mammals are capable of actively controlling vocal-tract filtering, a defining feature of human speech production. To address this issue, a detailed acoustic analysis of the alarm vocalizations of free-ranging Diana monkeys was conducted. These vocalizations are especially interesting because they convey semantic information about two of the monkeys' natural predators, the leopard and the crowned eagle. Here, vocal tract and sound source parameters in Diana monkey alarm vocalizations are described. It is found that a vocalization-initial formant downward transition distinguishes most reliably between eagle and leopard alarm vocalizations. This finding is discussed as an indication of articulation and alternatively as the result of a strong nasalization effect. It is suggested that the formant modulation is the result of active vocal filtering used by the monkeys to encode semantic information, an ability previously thought to be restricted to human speech.

14.
Quantifiable aspects of vocal fold vibration may be inferred by means of the electrolaryngograph. Changes in inferred vocal fold closed quotient are considered as a possible correlate of acoustic efficiency variation; automatic measures of closed quotient compare favorably with results obtained from inverse filtering of the speech pressure waveform. This article describes closed quotient measures based on electrolaryngographic analysis of 18 trained and untrained male singers, and results show a significant difference in mean vocal fold closed quotient between trained and untrained singers.

15.
This study focuses on the extraction of robust acoustic cues of labial and alveolar voiceless obstruents in German and their acoustic differences in the speech signal to distinguish them in place and manner of articulation. The investigated obstruents include the affricates [pf] and [ts], the fricatives [f] and [s], and the stops [p] and [t]. The target sounds were analyzed in word-initial and word-medial positions. The speech data for the analysis were recorded in a natural environment, deliberately containing background noise to extract robust cues only. Three methods of acoustic analysis were chosen: (1) temporal measurements to distinguish the respective obstruents in manner of articulation, (2) static spectral characteristics in terms of logarithmic distance measure to distinguish place of articulation, and (3) amplitudinal analysis of discrete frequency bands as a dynamic approach to place distinction. The results reveal that the duration of the target phonemes distinguishes these in manner of articulation. Logarithmic distance measure, as well as relative amplitude analysis of discrete frequency bands, identifies place of articulation. The present results contribute to the question of which properties are robust with respect to variation in the speech signal.

16.
The purpose of this study was to explore the potential advantages, both theoretical and applied, of preserving low-frequency acoustic hearing in cochlear implant patients. Several hypotheses are presented that predict that residual low-frequency acoustic hearing along with electric stimulation for high frequencies will provide an advantage over traditional long-electrode cochlear implants for the recognition of speech in competing backgrounds. A simulation experiment in normal-hearing subjects demonstrated a clear advantage for preserving low-frequency residual acoustic hearing for speech recognition in a background of other talkers, but not in steady noise. Three subjects with an implanted "short-electrode" cochlear implant and preserved low-frequency acoustic hearing were also tested on speech recognition in the same competing backgrounds and compared to a larger group of traditional cochlear implant users. Each of the three short-electrode subjects performed better than any of the traditional long-electrode implant subjects for speech recognition in a background of other talkers, but not in steady noise, in general agreement with the simulation studies. When compared to a subgroup of traditional implant users matched according to speech recognition ability in quiet, the short-electrode patients showed a 9-dB advantage in the multitalker background. These experiments provide strong preliminary support for retaining residual low-frequency acoustic hearing in cochlear implant patients. The results are consistent with the idea that better perception of voice pitch, which can aid in separating voices in a background of other talkers, was responsible for this advantage.

17.
Currently, early phonatory changes in amyotrophic lateral sclerosis (ALS) are not well understood. The aim of this study was to compare acoustic parameters of voice in ALS subjects who demonstrated perceptually normal vocal quality on sustained phonation with a control group. We hypothesized that objective analysis of voice would reveal significant differences on specific acoustic parameters of voice compared to the control group. Results revealed statistically significant differences between the two groups on measures related to frequency range and phonatory stability. The findings suggest that early bulbar signs affecting the laryngeal system may be present in patients with ALS before the occurrence of perceptually aberrant vocal characteristics.

18.
Speakers of rhotic dialects of North American English show a range of different tongue configurations for /r/. These variants produce acoustic profiles that are indistinguishable for the first three formants [Delattre, P., and Freeman, D. C., (1968). "A dialect study of American English r's by x-ray motion picture," Linguistics 44, 28-69; Westbury, J. R. et al. (1998), "Differences among speakers in lingual articulation for American English /r/," Speech Commun. 26, 203-206]. It is puzzling why this should be so, given the very different vocal tract configurations involved. In this paper, two subjects whose productions of "retroflex" /r/ and "bunched" /r/ show similar patterns of F1-F3 but very different spacing between F4 and F5 are contrasted. Using finite element analysis and area functions based on magnetic resonance images of the vocal tract for sustained productions, the results of computer vocal tract models are compared to actual speech recordings. In particular, formant-cavity affiliations are explored using formant sensitivity functions and vocal tract simple-tube models. The difference in F4/F5 patterns between the subjects is confirmed for several additional subjects with retroflex and bunched vocal tract configurations. The results suggest that the F4/F5 differences between the variants can be largely explained by differences in whether the long cavity behind the palatal constriction acts as a half- or a quarter-wavelength resonator.

19.
Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.

20.
An important clinical issue concerns the efficacy of current voice therapy approaches in treating voice disorders, such as vocal nodules. Much research focuses on finding reliable methods for documentation of treatment results. In this second treatment study of ten patients with vocal nodules, who participated in a behaviorally based voice therapy program, 11 aerodynamic (transglottal air pressure and glottal waveform) and acoustic (SPL, F0, and spectrum slope) measures were used. Three pretherapy baseline assessments were carried out, followed by one assessment after each of five therapy phases. Measurements were made of two types of speech materials, strings of repeated /pæ/ syllables and sustained /æ/ phonations, in two loudness conditions: comfortable loudness and loud voice. The data were normalized using z-scores, which were based on data from 22 normal subjects. The results showed that the aerodynamic measures reflected the presence of vocal pathology to a higher degree than did the acoustic spectral measures, and they should be useful in studies comparing nodule and normal voice production. Large individual session-to-session variation was found for all measures across pretherapy baseline recordings, which contributed to nonsignificant differences between baseline and therapy data.
