Similar Documents
20 similar documents found.
2.
The question of whether visual information can affect ongoing speech production arises from numerous studies demonstrating an interaction between auditory and visual information during speech perception. In a preliminary study, the effect of delayed visual feedback on speech production was examined. Two of the 13 subjects produced speech errors that were directly related to the delayed visual signal. However, in the main experiment, providing immediate visual feedback of the articulators did not diminish the effects of delayed auditory feedback for 11 speakers.

3.
Application of the chirp z-transform to MRI data (total citations: 1; self-citations: 0; citations by others: 1)
A version of the chirp z-transform (CZT) that enables field-of-view scaling while preserving signal intensity and phase has been implemented. The algorithm is important for all single-point imaging sequences, such as SPRITE, when used with multiple data acquisition for T2* mapping or signal averaging. The CZT is particularly useful for SPRITE imaging of nuclei with short relaxation times, such as sodium at high field. Here, a complete theory of the properties of the CZT is given. The method operates entirely in k-space. It is compared with a conventional interpolation approach that works in image space after a fast Fourier transform has been applied.
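
One standard way to compute a CZT restricted to the unit circle, which evaluates the Fourier sum on an arbitrary, finer or coarser frequency grid, is Bluestein's algorithm. Applied to k-space data, this corresponds to choosing a different output sampling grid, and therefore a rescaled field of view, without interpolating in image space. The sketch below uses only numpy. It is not the authors' implementation. The names `zoom_dft` and `direct_dft` are hypothetical, and it uses the forward-DFT sign convention; conjugate input and output to get the inverse-transform convention.

```python
import numpy as np


def zoom_dft(x, m, f0, df):
    """Chirp z-transform on the unit circle via Bluestein's algorithm.

    Computes X[k] = sum_n x[n] * exp(-2j*pi*(f0 + k*df)*n) for k = 0..m-1,
    with f0 and df in cycles per sample. With f0 = 0, df = 1/len(x) and
    m = len(x) this is the ordinary DFT; other df values rescale the grid.
    """
    x = np.asarray(x, dtype=complex)
    n = x.size
    # FFT length large enough that the circular convolution does not alias.
    length = 1 << int(np.ceil(np.log2(n + m - 1)))
    idx_n = np.arange(n)
    idx_k = np.arange(m)

    # Pre-multiply: shift to the start frequency f0 and apply the input chirp.
    y = np.zeros(length, dtype=complex)
    y[:n] = x * np.exp(-2j * np.pi * f0 * idx_n) * np.exp(-1j * np.pi * df * idx_n**2)

    # Chirp filter v[j] = exp(+i*pi*df*j^2) for j = -(n-1)..(m-1), stored circularly.
    v = np.zeros(length, dtype=complex)
    v[:m] = np.exp(1j * np.pi * df * idx_k**2)
    j = np.arange(1, n)
    v[length - j] = np.exp(1j * np.pi * df * j**2)

    # Fast convolution via FFTs, then the output chirp.
    g = np.fft.ifft(np.fft.fft(y) * np.fft.fft(v))[:m]
    return np.exp(-1j * np.pi * df * idx_k**2) * g


def direct_dft(x, m, f0, df):
    """O(n*m) reference evaluation of the same sum, for checking."""
    n = np.arange(len(x))
    freqs = f0 + df * np.arange(m)
    return np.exp(-2j * np.pi * np.outer(freqs, n)) @ np.asarray(x, dtype=complex)
```

With `df = 1/len(x)` the result matches `np.fft.fft`; a smaller `df` samples the same band more densely. Bluestein's identity nk = (n² + k² − (k − n)²)/2 is what turns the sum into a convolution, so the cost stays O(L log L) rather than O(nm).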

4.
Cluster analysis techniques are gaining widespread use for segmentation of MRI data, especially for volume measurement and 3-D display. This paper describes four improvements to such techniques: (1) the use of intensity simulations to model cluster plots; (2) correction of image nonuniformity; (3) anisotropic smoothing of the data; and (4) automatic isolation of tissues of interest. Simulating cluster plots allows an informed choice of pulse sequence(s) and acquisition parameters. Correction of image nonuniformity and anisotropic smoothing reduce the spread of signal intensity from a single tissue, producing significantly more compact clusters, whilst isolating the tissues of interest prevents their clusters from overlapping with those of tissues not under consideration. These techniques may be used to improve the results of cluster analysis, or the gain may be traded off, for example, to allow images with a lower signal-to-noise ratio, images with a shorter repetition time, or fewer images to be used for segmentation.
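
The abstract does not name the clustering algorithm, so the following is only an illustration of the general idea, using plain k-means as a swapped-in example. Each voxel is a point whose coordinates are its intensities in several images (one column per pulse sequence), and voxels are grouped by proximity in that space. The function names are hypothetical. The example also shows why compact clusters matter: the tighter each tissue's cloud of points, the fewer voxels land nearer the wrong centre.

```python
import numpy as np


def _nearest(feats, centers):
    # Squared Euclidean distance from every voxel to every centre.
    d2 = ((feats[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans_segment(features, init_centers, n_iter=20):
    """Plain k-means on per-voxel intensity vectors (illustrative only).

    features:     (n_voxels, n_channels) intensities, one column per image.
    init_centers: (k, n_channels) starting cluster centres.
    Returns (labels, centers).
    """
    feats = np.asarray(features, dtype=float)
    centers = np.array(init_centers, dtype=float)
    for _ in range(n_iter):
        labels = _nearest(feats, centers)
        for c in range(len(centers)):
            members = feats[labels == c]
            if len(members):  # leave an empty cluster's centre unchanged
                centers[c] = members.mean(axis=0)
    # Final assignment against the converged centres.
    return _nearest(feats, centers), centers
```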

6.
Magnetic resonance imaging (MRI) has served as a valuable tool for studying static postures in speech production. Recent improvements in temporal resolution now make it possible to examine the dynamics of vocal-tract shaping during fluent speech with MRI. The present study uses spiral k-space acquisitions with a low flip-angle gradient-echo pulse sequence on a conventional GE Signa 1.5-T CV/i scanner. This strategy allows acquisition rates of 8-9 images per second and reconstruction rates of 20-24 images per second, making veridical movies of speech production possible. Segmental durations, positions, and interarticulator timing can all be evaluated quantitatively. The data show clear real-time movements of the lips, tongue, and velum. Sample movies and data analysis strategies are presented.

7.
Application of an auditory model to speech recognition (total citations: 3; self-citations: 0; citations by others: 3)
Some aspects of auditory processing are incorporated in a front end for the IBM speech-recognition system [F. Jelinek, "Continuous speech recognition by statistical methods," Proc. IEEE 64 (4), 532-556 (1976)]. This new process includes adaptation, loudness scaling, and mel warping. Tests show that the design is an improvement over previous algorithms.
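
Mel warping maps physical frequency onto a perceptual pitch scale that is roughly linear below 1 kHz and logarithmic above it. The abstract does not say which mel formula the IBM front end used. The sketch below assumes the widely used form mel = 2595 log10(1 + f/700), and the function names are hypothetical. `mel_spaced_frequencies` shows the typical use: placing filter-bank centres evenly on the mel axis, which spaces them increasingly far apart in hertz.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale formula (an assumption, not necessarily the paper's)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_spaced_frequencies(f_lo, f_hi, n):
    """n frequencies in Hz, equally spaced on the mel scale from f_lo to f_hi."""
    m_lo, m_hi = hz_to_mel(f_lo), hz_to_mel(f_hi)
    step = (m_hi - m_lo) / (n - 1)
    return [mel_to_hz(m_lo + i * step) for i in range(n)]
```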

8.
Previous studies have demonstrated that perturbations in voice pitch or loudness feedback lead to compensatory changes in voice F0 or amplitude during production of sustained vowels. Responses to pitch-shifted auditory feedback have also been observed during English and Mandarin speech. The present study investigated whether Mandarin speakers would respond to amplitude-shifted feedback during meaningful speech production. Native speakers of Mandarin produced two-syllable utterances with focus on the first syllable, the second syllable, or neither syllable, as prompted by corresponding questions. Their acoustic speech signal was fed back to them with loudness shifted by ±3 dB for 200 ms. The responses to the feedback perturbations had mean latencies of approximately 142 ms and magnitudes of approximately 0.86 dB. Response magnitudes were greater and latencies were longer when emphasis was placed on the first syllable than when there was no emphasis. Because amplitude is not known to be highly effective in encoding linguistic contrasts, the fact that subjects responded to amplitude perturbations as quickly as they responded to F0 perturbations in previous studies provides clear evidence that a highly automatic feedback mechanism is active in controlling both the F0 and the amplitude of speech.
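
A level shift in decibels corresponds to an amplitude gain of 10^(dB/20), so +3 dB multiplies the waveform by about 1.413 and −3 dB by about 0.708. The sketch below applies such a shift to a 200 ms segment of a recorded signal. It is offline and illustrative: the function name is hypothetical, and the rectangular gain step is a simplification, since a real-time feedback system would typically ramp the gain to avoid audible clicks.

```python
import numpy as np


def shift_loudness(signal, fs, onset_s, shift_db, duration_s=0.2):
    """Return a copy of `signal` with its level shifted by `shift_db` dB
    for `duration_s` seconds starting at `onset_s` (rectangular gain step)."""
    out = np.array(signal, dtype=float, copy=True)
    start = int(round(onset_s * fs))
    stop = min(len(out), start + int(round(duration_s * fs)))
    gain = 10.0 ** (shift_db / 20.0)  # +3 dB -> x1.413, -3 dB -> x0.708
    out[start:stop] *= gain
    return out
```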

9.
We describe an arrangement for simultaneous recording of speech and vocal tract geometry in patients undergoing surgery involving this area. The experimental design is considered from an articulatory-phonetic point of view. The speech signals are recorded with an acoustic-electrical arrangement, while the vocal tract is simultaneously imaged with MRI. A MATLAB-based system controls the timing of speech recording and MR image acquisition. Acoustic MRI noise is removed from the speech signals by an adaptive signal processing algorithm. Finally, a vowel data set from pilot experiments is compared qualitatively both with validation data from an anechoic chamber and with Helmholtz resonances of the vocal tract volume, obtained using FEM.
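
The abstract does not identify the adaptive algorithm. A common textbook choice for this kind of task is a normalized LMS (NLMS) noise canceller. It assumes a second, noise-only reference channel that is correlated with the scanner noise reaching the speech microphone; that channel is an assumption here, not something the abstract describes. The filter learns to predict the noise in the primary channel from the reference, and its prediction error is the cleaned speech estimate. The function name and default parameters are hypothetical.

```python
import numpy as np


def nlms_cancel(primary, reference, n_taps=16, mu=0.1, eps=1e-8):
    """Normalized LMS adaptive noise canceller (illustrative sketch).

    primary:   speech + noise from the speech microphone
    reference: noise-only signal correlated with the noise in `primary`
    Returns the error signal, i.e. the estimate of the noise-free speech.
    """
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    w = np.zeros(n_taps)    # adaptive FIR weights
    buf = np.zeros(n_taps)  # most recent reference samples, newest first
    out = np.empty_like(primary)
    for i in range(len(primary)):
        buf = np.roll(buf, 1)
        buf[0] = reference[i]
        noise_est = w @ buf
        e = primary[i] - noise_est  # the part the filter cannot explain: speech
        # Step size normalized by the reference energy in the buffer.
        w += (mu / (eps + buf @ buf)) * e * buf
        out[i] = e
    return out
```

A small `mu` converges more slowly but lets less of the speech leak into the weight updates. That is the usual trade-off when the wanted signal acts as a disturbance to the adaptation.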

10.
This paper examines whether correlations between speech perception and speech production exist, and, if so, whether they might provide a way of evaluating different acoustic metrics. The cues listeners use for many phonemic distinctions are not known, often because many different acoustic cues are highly correlated with one another, making it difficult to distinguish among them. Perception-production correlations may provide a new means of doing so. In the present paper, correlations were examined between acoustic measures taken on listeners' perceptual prototypes for a given speech category and on their average production of members of that category. Significant correlations were found for VOT among stop consonants, and for spectral peaks (but not centroids or skewness) for voiceless fricatives. These results suggest that correlations between speech perception and production may provide a methodology for evaluating different proposed acoustic metrics.
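
The three fricative measures contrasted above can be computed from a single frame. The sketch uses the usual "spectral moments" approach: the normalized power spectrum is treated as a probability distribution over frequency, giving the centroid as its mean and the skewness as its standardized third moment. The peak is simply the frequency of maximum power. The abstract does not give the paper's exact definitions, windowing, or frequency range, so these are assumptions, and `spectral_measures` is a hypothetical name.

```python
import numpy as np


def spectral_measures(frame, fs):
    """Return (peak_hz, centroid_hz, skewness) of one Hann-windowed frame."""
    x = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    p = power / power.sum()  # spectrum as a distribution over frequency
    centroid = np.sum(freqs * p)
    variance = np.sum((freqs - centroid) ** 2 * p)
    skewness = np.sum((freqs - centroid) ** 3 * p) / variance ** 1.5
    peak = freqs[np.argmax(power)]
    return peak, centroid, skewness
```

The three numbers can disagree: a spectrum with a strong low peak and weaker high-frequency energy has its peak at the low frequency, a centroid pulled upward, and positive skew. That divergence is why the measures can be compared against one another as candidate cues.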

11.
It has been suggested that pauses between words could act as indices of processes such as selection, retrieval, or planning that are required before an utterance is articulated. For normal meaningful phrase utterances, there is hardly any information regarding the relationship between articulation and pause duration and how each relates to the final phrase duration. Such associations could provide insights into the mechanisms underlying the planning and execution of a vocal utterance. To produce a fluent vocal utterance, children might adopt different strategies at different stages of development. We investigate this hypothesis by examining the roles of articulation time and pause duration in meaningful phrase utterances produced by 46 children between the ages of 4 and 8 years who were learning English as a second language. Our results indicate a significant reduction in phrase, word, and interword pause duration with increasing age. A comparison of pause, word, and phrase duration for individual subjects in different age groups indicates a changing relationship between pause and word duration in the production of fluent speech. For the youngest children, a strong correlation between pause and word duration indicates local planning at the word level and thus a greater dependence of pause on the immediate word utterance. In contrast, for the oldest children we find a significant drop in the correlation between word and pause duration, indicating the emergence of articulation and pause planning as two independent processes directed at producing a fluent utterance. Strong correlations between other temporal parameters indicate that the older children adopt a more holistic approach to language production.

13.
It appears that temperature instabilities are a major obstacle to the use of semiconductor strain-gauge pressure transducers in speech research, especially when absolute pressure data are mandatory. In this paper, a simple and reliable method for in vivo calibration of this kind of transducer is described. The most important error source, the drift of the zero-pressure level due to temperature changes, is discussed, and an estimate of the attainable measurement accuracy is given. Moreover, some recordings of subglottal, supraglottal, and transglottal pressure are presented. It is shown that the pressure recordings allow estimates of the volume flow in the trachea and pharynx to be obtained. Analysis of those waveforms appears to yield new insights into the physical processes underlying voice production. Specifically, an independent glottal contribution to the skewing of the glottal flow pulses is identified.
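
The abstract does not describe how the in vivo calibration works. As a generic illustration of correcting zero-level drift only, suppose there are instants at which the true pressure is known to be zero. Those reference instants are a hypothetical assumption, as is the function name. At such instants, the recorded value is pure offset, and the offset can be linearly interpolated between them and subtracted. Transglottal pressure would then be the difference of the corrected subglottal and supraglottal traces.

```python
import numpy as np


def remove_zero_drift(t, pressure, t_ref):
    """Subtract a slowly drifting zero level from a pressure recording.

    t:        sample times (increasing)
    pressure: recorded pressure at those times
    t_ref:    increasing instants at which the true pressure is assumed to
              be zero (hypothetical), so the recording there is pure offset
    """
    t = np.asarray(t, dtype=float)
    p = np.asarray(pressure, dtype=float)
    offset_at_ref = np.interp(t_ref, t, p)       # drift at the reference instants
    offset = np.interp(t, t_ref, offset_at_ref)  # linear drift model in between
    return p - offset
```

Linear interpolation removes a drift that is linear between reference instants exactly, and a slow, smooth drift approximately. Faster temperature transients would need denser reference points.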

16.
Speech intelligibility is known to be relatively unaffected by certain deformations of the acoustic spectrum. These include translations, stretching or contracting dilations, and shearing of the spectrum (represented along the logarithmic frequency axis). It is argued here that such robustness reflects a synergy between vocal production and auditory perception. Thus, on the one hand, it is shown that these spectral distortions are produced by common and unavoidable variations among different speakers pertaining to the length, cross-sectional profile, and losses of their vocal tracts. On the other hand, it is argued that these spectral changes leave the auditory cortical representation of the spectrum largely unchanged except for translations along one of its representational axes. These assertions are supported by analyses of production and perception models. On the production side, a simplified sinusoidal model of the vocal tract is developed which analytically relates a few "articulatory" parameters, such as the extent and location of the vocal tract constriction, to the spectral peaks of the acoustic spectra synthesized from it. The model is evaluated by comparing the identification of synthesized sustained vowels to labeled natural vowels extracted from the TIMIT corpus. On the perception side a "multiscale" model of sound processing is utilized to elucidate the effects of the deformations on the representation of the acoustic spectrum in the primary auditory cortex. Finally, the implications of these results for the perception of generally identifiable classes of sound sources beyond the specific case of speech and the vocal tract are discussed.
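
The link between vocal-tract length and translation on the log-frequency axis can be seen in the simplest case. Take a uniform tube of length L, closed at the glottis and open at the lips, with speed of sound c. This is the textbook quarter-wavelength approximation, not the paper's sinusoidal model. Its resonances all scale by the same factor when the length changes, which is a pure shift of log frequency:

```latex
\begin{aligned}
F_n  &= \frac{(2n-1)\,c}{4L}, \qquad n = 1, 2, 3, \dots \\
F_n' &= \frac{(2n-1)\,c}{4L/\alpha} = \alpha F_n \qquad (L' = L/\alpha) \\
\log F_n' &= \log F_n + \log \alpha \qquad \text{for every } n
\end{aligned}
```

For example, with c ≈ 350 m/s and L = 17.5 cm the resonances are 500, 1500, and 2500 Hz. Shortening the tube by α = 1.2 (L' ≈ 14.6 cm) gives 600, 1800, and 3000 Hz, every peak moved by the same log 1.2.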

17.
A simple operator built from operations such as taking the logarithm, summation, and time delay can, with different values of its frequency and time parameters, produce different transformations of the current spectrum. This operator simulates such properties of auditory perception as temporal and frequency masking, the effects of stimulus onset and offset, selective responses to transients with different rates, detection of amplitude and frequency modulation, and adaptation to the mean signal level.
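
The abstract does not give the operator's formula. The following is only a hypothetical instance built from the three named ingredients. It takes the log of the spectrogram, subtracts the average of several time-delayed log spectra, and treats the delay and the number of terms as the time parameters. Such an operator outputs zero for a steady input (adaptation to the mean level), a positive value at onsets, and a negative value at offsets. Frequency-direction summation, which the text implies through its frequency parameters, is omitted for brevity. The function name is hypothetical.

```python
import numpy as np


def log_delay_operator(spec, delay, n_terms, eps=1e-12):
    """Hypothetical operator: current log spectrum minus the mean of
    `n_terms` earlier log spectra spaced `delay` frames apart.

    spec: (n_frames, n_bins) non-negative magnitude spectrogram.
    Frames without enough history are left at 0.
    """
    log_s = np.log(np.asarray(spec, dtype=float) + eps)
    out = np.zeros_like(log_s)
    span = delay * n_terms
    for t in range(span, log_s.shape[0]):
        past = log_s[t - delay * np.arange(1, n_terms + 1)]  # delayed frames
        out[t] = log_s[t] - past.mean(axis=0)
    return out
```

A longer `delay * n_terms` span makes the operator respond to slower changes. This gives one simple way to obtain selectivity to transients of different rates from a single structure.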

18.
Modeling the peripheral speech motor system can advance the understanding of speech motor control and audiovisual speech perception. A 3-D physical model of the human face is presented. The model represents the soft-tissue biomechanics with a multilayer deformable mesh. The mesh is controlled by a set of modeled facial muscles that use a standard Hill-type representation of muscle dynamics. In a test of the model, recorded intramuscular electromyography (EMG) was used to activate the modeled muscles, and the kinematics of the mesh were compared with 3-D kinematics recorded with OPTOTRAK. Overall, there was a good match between the recorded data and the model's movements. Animations of the model are provided as MPEG movies.
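
A Hill-type muscle is usually written as F = F_max (a f_L(l) f_V(v) + f_P(l)). Here a is activation (which EMG would drive), f_L is the active force-length curve, f_V is the force-velocity curve, and f_P is passive elasticity. The sketch below uses that structure. The specific curve shapes and constants are illustrative assumptions, not the paper's: a Gaussian force-length curve, Hill's hyperbola (1 − v)/(1 + v/k) for shortening, and a quadratic passive term. Lengthening is simply treated as isometric. The function name is hypothetical.

```python
import math


def hill_muscle_force(activation, norm_length, norm_velocity,
                      f_max=1.0, width=0.45, k_hill=0.25, k_passive=4.0):
    """Normalized Hill-type force: F = f_max * (a * fL * fV + fP).

    norm_length:   fibre length / optimal length (1.0 = optimal)
    norm_velocity: shortening velocity / maximum shortening velocity
                   (positive = shortening, 1.0 = v_max)
    Curve shapes and constants are illustrative assumptions.
    """
    a = min(max(activation, 0.0), 1.0)
    # Active force-length: bell-shaped curve peaking at optimal length.
    f_l = math.exp(-((norm_length - 1.0) / width) ** 2)
    # Force-velocity: Hill's hyperbola for shortening; lengthening is
    # clipped to zero velocity (treated as isometric, a simplification).
    v = min(max(norm_velocity, 0.0), 1.0)
    f_v = (1.0 - v) / (1.0 + v / k_hill)
    # Passive elastic force, only when stretched beyond optimal length.
    f_p = k_passive * (norm_length - 1.0) ** 2 if norm_length > 1.0 else 0.0
    return f_max * (a * f_l * f_v + f_p)
```

At optimal length and zero velocity, full activation yields `f_max`. At maximum shortening velocity the active force vanishes, as Hill's equation requires.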

19.
Although some therapies offer an opportunity for cure, recurrence is a major obstacle to achieving complete remission in lung cancer. Precise assessment of lung cancer therefore remains challenging. In recent years, integrated positron emission tomography and computed tomography (PET–CT) and whole-body magnetic resonance imaging (WB-MRI) have been introduced as alternatives to standard multimodality imaging strategies and are now increasingly applied to various malignancies. However, there is little information on the surveillance capability of WB-MRI in patients with lung cancer. We aimed to investigate the clinical potential of WB-MRI as a novel surveillance modality after curative treatment for lung cancer, comparing it with PET–CT. Sixty-two consecutive patients with lung malignancy who underwent both WB-MRI and PET–CT were selected to assess recurrent malignant lesions. Clinical data, including radiologic and pathologic findings, were collected and analyzed retrospectively. At each lymph node station, the ability of WB-MRI to detect malignant lesions correlated significantly with that of PET–CT (γ = 0.86; P < .01). For distant metastases from lung cancer, the correlation coefficients between the two modalities ranged from 0.999 to 1 (P < .01). Based on pathologic confirmation, both modalities showed equivalent diagnostic accuracy (PET–CT: sensitivity 85.71%, specificity 47.27%; WB-MRI: sensitivity 85.71%, specificity 56.25%). This study demonstrates the clinical potential of WB-MRI, together with PET–CT, as a novel surveillance modality for lung cancer after curative treatment.
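
Sensitivity and specificity are ratios over the pathologically confirmed positives and negatives: sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP). The small helper below makes the definitions concrete. Its name is hypothetical, and the counts in the test are hypothetical too. The abstract reports only percentages, not the underlying counts.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # fraction of true lesions detected
    specificity = tn / (tn + fp)  # fraction of non-lesions correctly cleared
    return sensitivity, specificity
```

For instance, 6 detected out of 7 true lesions gives a sensitivity of 85.71%. Because the denominators are the positive and negative totals, two modalities can share the same sensitivity while differing in specificity, as in the comparison above.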

20.
We present results from a pilot study directed at developing an anchorable subjective speech quality test. The test uses multidimensional scaling techniques to obtain quantitative information about the perceptual attributes of speech. In the first phase of the study, subjects ranked perceptual distances between samples of speech produced by two different talkers, one male and one female, and processed by a variety of codecs. The resulting distance matrices were processed to obtain, for each talker, a stimulus space for the various speech samples. In this stimulus space, distances between stimuli correspond to perceptual distances, and the dimensions correspond to the attributes the subjects used in judging perceptual distance. Mean opinion scores (MOS) obtained in an earlier study were found to be highly correlated with position in the stimulus space, and the three dimensions of the stimulus space were found to have identifiable physical and perceptual correlates. In the second phase of the study, we developed techniques for fitting speech generated by a new codec under investigation into a previously established stimulus space. The user is provided with a collection of speech samples and with the stimulus space for these samples, as determined by a large-scale listening test. The user then carries out a much smaller listening test to determine the position of the new stimulus in the previously established stimulus space. The system is anchorable, so different versions of a codec under development can be compared directly, and it provides more detailed information than the single number produced by MOS testing. We suggest that this information could be used to advantage in algorithm development and in the development of objective measures of speech quality.
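
The core step, turning a matrix of pairwise distances into coordinates in a low-dimensional stimulus space, can be sketched with classical (Torgerson) metric MDS. This is the simplest member of the family. The study's subjects ranked distances, which suggests a nonmetric variant may have been used, and the abstract does not say which. The sketch double-centres the squared distances into a Gram matrix and keeps its leading eigenvectors. The function name is hypothetical.

```python
import numpy as np


def classical_mds(dist, n_dims=3):
    """Classical (Torgerson) multidimensional scaling.

    dist: (n, n) symmetric matrix of pairwise distances.
    Returns an (n, n_dims) configuration whose Euclidean distances
    approximate `dist`.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    b = -0.5 * j @ d2 @ j                         # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]    # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale
```

When the distances really are Euclidean in three dimensions, the recovered configuration reproduces them exactly, up to rotation and reflection. With perceptual data, the size of the discarded eigenvalues indicates how well three dimensions explain the judgments.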


Copyright © Beijing Qinyun Technology Development Co., Ltd. Beijing ICP Filing No. 09084417