Similar Articles
20 similar articles found.
1.
The purpose of this study was to quantify the effect of timing errors on the intelligibility of deaf children's speech. Deviant timing patterns were corrected in the recorded speech samples of six deaf children using digital speech processing techniques. The speech waveform was modified to correct timing errors only, leaving all other aspects of the speech unchanged. The following six-stage approximation procedure was used to correct the deviant timing patterns: (1) original, unaltered utterances, (2) correction of pauses only, (3) correction of relative timing, (4) correction of absolute syllable duration, (5) correction of relative timing and pauses, and (6) correction of absolute syllable duration and pauses. Measures of speech intelligibility were obtained for the original and the computer-modified utterances. On the average, the highest intelligibility score was obtained when relative timing errors only were corrected. The correction of this type of error improved the intelligibility of both stressed and unstressed words within a phrase. Improvements in word intelligibility, which occurred when relative timing was corrected, appeared to be closely related to the number of phonemic errors present within a word. The second highest intelligibility score was obtained for the original, unaltered sentences. On the average, the intelligibility scores obtained for the other four forms of timing modification were poorer than those obtained for the original sentences. Thus, the data show that intelligibility improved, on the average, when only one type of error, relative timing, was corrected.

2.
Which acoustic properties of the speech signal differ between rhythmically prominent syllables and non-prominent ones? A production experiment was conducted to identify these acoustic properties. Subjects read out repetitive text to a metronome, trying to match stressed syllables to its beat. The analysis searched for the function of the speech signal that best predicts the timing of the metronome ticks. The most important factor in this function is found to be the contrast in loudness between a syllable and its neighbors. The prominence of a syllable can be deduced from the specific loudness in an (approximately) 360-ms-wide window centered on the syllable in question relative to an (approximately) 800-ms-wide symmetric window.
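The windowed loudness contrast described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: RMS level stands in for Zwicker-style specific loudness, and only the reported approximate window lengths (360 ms and 800 ms) are taken from the abstract; the function and parameter names are illustrative.

```python
import numpy as np

def prominence_contrast(signal, center, sr, short_ms=360, long_ms=800):
    """Loudness contrast of a syllable: level in a short window centered
    on the syllable relative to a wider symmetric window around it.
    RMS level in dB is used here as a crude stand-in for specific loudness."""
    def rms_db(x):
        return 20.0 * np.log10(np.sqrt(np.mean(x ** 2)) + 1e-12)

    def window(ms):
        half = int(sr * ms / 2000)  # half-width in samples
        return signal[max(0, center - half):center + half]

    return rms_db(window(short_ms)) - rms_db(window(long_ms))
```

A syllable louder than its surroundings yields a positive contrast; a flat signal yields zero.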

3.
Previous studies have demonstrated that perturbations in voice pitch or loudness feedback lead to compensatory changes in voice F(0) or amplitude during production of sustained vowels. Responses to pitch-shifted auditory feedback have also been observed during English and Mandarin speech. The present study investigated whether Mandarin speakers would respond to amplitude-shifted feedback during meaningful speech production. Native speakers of Mandarin produced two-syllable utterances with focus on the first syllable, the second syllable, or none of the syllables, as prompted by corresponding questions. Their acoustic speech signal was fed back to them with loudness shifted by +/-3 dB for 200 ms durations. The responses to the feedback perturbations had mean latencies of approximately 142 ms and magnitudes of approximately 0.86 dB. Response magnitudes were greater and latencies were longer when emphasis was placed on the first syllable than when there was no emphasis. Since amplitude is not known for being highly effective in encoding linguistic contrasts, the fact that subjects reacted to amplitude perturbation just as fast as they reacted to F(0) perturbations in previous studies provides clear evidence that a highly automatic feedback mechanism is active in controlling both F(0) and amplitude of speech production.
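The perturbation itself reduces to a brief gain change on the feedback signal. A minimal sketch, assuming a simple multiplicative dB shift over a 200-ms segment (the +/-3 dB and 200 ms figures are from the abstract; the function name and onset parameter are illustrative):

```python
import numpy as np

def perturb_amplitude(signal, sr, onset_s, shift_db=3.0, dur_ms=200):
    """Apply a brief loudness shift (in dB) to one segment of a feedback
    signal, as in amplitude-perturbation paradigms. Positive shift_db
    makes the segment louder, negative makes it softer."""
    out = signal.astype(float)
    start = int(onset_s * sr)
    stop = start + int(sr * dur_ms / 1000)
    out[start:stop] *= 10 ** (shift_db / 20.0)  # dB -> linear gain
    return out
```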

4.
Automatic speech recognition using psychoacoustic models.
An approach to automatic speech recognition is described, which, in a straightforward way, follows the concept of (1) preprocessing in terms of auditory parameters and (2) subsequent classification and recognition. The preprocessing system has been realized in analog hardware, while recognition is carried out on a digital computer. In the preprocessing system, the essential psychoacoustic principles of the perception of loudness, pitch, roughness, and subjective duration are implemented with some approximation. The system essentially consists of 24 bandpass filters, nonlinear transformation of each filter output into specific loudness and specific roughness, and final transformation of these parameters into total loudness, total roughness, and three spectral moments. As a means to further reduce the information flow, continuous selection of dominant parameters is also considered on the basis of psychoacoustic data. The subsequent recognition process is mainly characterized by (1) discrimination between speech and silent periods, (2) detection of syllable peaks and classification of syllable nuclei, and (3) assumption of syllable boundaries and classification of consonant clusters. Though the entire system as yet is far from being complete and perfect, the present results indicate that the concept provides a systematic and promising way towards automatic recognition of continuous speech.
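Reducing a 24-band specific-loudness pattern to a handful of spectral moments can be sketched as below. The abstract does not define its three moments, so this sketch assumes the standard choice (the loudness centroid across bands plus higher central moments); names and the band indexing are illustrative.

```python
import numpy as np

def spectral_moments(specific_loudness, n=3):
    """First n spectral moments of a specific-loudness pattern across
    bands (e.g., a 24-element vector, one value per critical band).
    Moment 1 is the loudness-weighted centroid; moments 2..n are
    central moments about that centroid."""
    bands = np.arange(1, len(specific_loudness) + 1, dtype=float)
    w = specific_loudness / specific_loudness.sum()  # normalize to weights
    centroid = (bands * w).sum()
    moments = [centroid]
    for k in range(2, n + 1):
        moments.append((((bands - centroid) ** k) * w).sum())
    return moments
```

For a flat pattern the centroid sits at the middle band and the odd central moments vanish by symmetry.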

5.
Stray-field techniques are reported for 31P studies of solids for a variety of compounds including bone, bone meal and calcium hydroxyapatite. Long Hahn echo trains produced by the application of many pulses were used as in the long echo-train summation technique. Double-resonance enhancements of 31P by use of both direct and indirect experiments were attempted on a sample of NH4PF6: 31P{19F} double resonance produced, at most, a 26% increase in the initial level of the 31P echo signal.

6.
A stray field (STRAFI) module has been added to the GAMMA magnetic resonance simulation platform in order to facilitate computational investigations of NMR experiments in large static field gradients that are on the order of 50 T/m. The package has been used to examine system response during echo trains generated by the application of shaped pulses. The associated echo amplitude maxima and effective slice thickness are presented. A new accurate method for STRAFI pulse calibration based on relative echo amplitudes is proposed.

7.
A dead region is a region of the cochlea where there are no functioning inner hair cells (IHCs) and/or neurons; it can be characterized in terms of the characteristic frequencies of the IHCs bordering that region. We examined the effect of high-frequency amplification on speech perception for subjects with high-frequency hearing loss with and without dead regions. The limits of any dead regions were defined by measuring psychophysical tuning curves and were confirmed using the TEN test described in Moore et al. [Br. J. Audiol. 34, 205-224 (2000)]. The speech stimuli were vowel-consonant-vowel (VCV) nonsense syllables, using one of three vowels (/i/, /a/, and /u/) and 21 different consonants. In a baseline condition, subjects were tested using broadband stimuli with a nominal input level of 65 dB SPL. Prior to presentation via Sennheiser HD580 earphones, the stimuli were subjected to the frequency-gain characteristic prescribed by the "Cambridge" formula, which is intended to give speech at 65 dB SPL the same overall loudness as for a normal listener, and to make the average loudness of the speech the same for each critical band over the frequency range important for speech intelligibility (in a listener without a dead region). The stimuli for all other conditions were initially subjected to this same frequency-gain characteristic. Then, the speech was low-pass filtered with various cutoff frequencies. For subjects without dead regions, performance generally improved progressively with increasing cutoff frequency. This indicates that they benefited from high-frequency information. For subjects with dead regions, two patterns of performance were observed. For most subjects, performance improved with increasing cutoff frequency until the cutoff frequency was somewhat above the estimated edge frequency of the dead region, but hardly changed with further increases. For a few subjects, performance initially improved with increasing cutoff frequency and then worsened with further increases, although the worsening was significant only for one subject. The results have important implications for the fitting of hearing aids.
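The condition manipulation above is low-pass filtering at a series of cutoff frequencies. A minimal sketch of one such condition, assuming a zero-phase Butterworth design (the abstract does not specify the filter used; names and the filter order are illustrative):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def lowpass_speech(signal, sr, cutoff_hz, order=8):
    """Low-pass filter a speech signal at a given cutoff frequency,
    limiting the audible bandwidth for one listening condition.
    Zero-phase filtering (sosfiltfilt) avoids phase distortion."""
    sos = butter(order, cutoff_hz, btype="low", fs=sr, output="sos")
    return sosfiltfilt(sos, signal)
```

Sweeping `cutoff_hz` from below to above a listener's dead-region edge frequency would reproduce the progression of conditions described.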

8.
The purpose of this study was to investigate if there is an effect of task on determination of habitual loudness. Four tasks commonly used to elicit habitual loudness were compared (automatic speech, elicited speech, spontaneous speech, and reading aloud). Participants were adult female speakers (N=30) with normal voice. A one-way analysis of variance (ANOVA) revealed a statistically significant (p < 0.05) effect of task, with post-hoc analyses indicating that there was a statistically significant difference in habitual loudness elicited via automatic versus spontaneous speech (p < 0.05), and automatic speech versus reading aloud (p < 0.001). The issue of how habitual loudness is defined is considered. Implications of the use of one task for determination of habitual loudness are discussed, as is the possibility of a task effect on determination of other clinically useful vocal parameters.

9.
Preliminary data [M. Epstein and M. Florentine, Ear. Hear. 30, 234-237 (2009)] obtained using speech stimuli from a visually present talker heard via loudspeakers in a sound-attenuating chamber indicate little difference in loudness when listening with one or two ears (i.e., significantly reduced binaural loudness summation, BLS), which is known as "binaural loudness constancy." These data challenge current understanding drawn from laboratory measurements that indicate a tone presented binaurally is louder than the same tone presented monaurally. Twelve normal listeners were presented recorded spondees, monaurally and binaurally across a wide range of levels via earphones and a loudspeaker with and without visual cues. Statistical analyses of binaural-to-monaural ratios of magnitude estimates indicate that the amount of BLS is significantly less for speech presented via a loudspeaker with visual cues than for stimuli with any other combination of test parameters (i.e., speech presented via earphones or a loudspeaker without visual cues, and speech presented via earphones with visual cues). These results indicate that the loudness of a visually present talker in daily environments is little affected by switching between binaural and monaural listening. This supports the phenomenon of binaural loudness constancy and underscores the importance of ecological validity in loudness research.

10.
The indirect auditory feedback from one's own voice arises from sound reflections at the room boundaries or from sound reinforcement systems. The relative variations of indirect auditory feedback are quantified through room acoustic parameters such as the room gain and the voice support, rather than the reverberation time. Fourteen subjects matched the loudness level of their own voice (the autophonic level) to that of a constant and external reference sound, under different synthesized room acoustics conditions. The matching voice levels are used to build a set of equal autophonic level curves. These curves give an indication of the amount of variation in voice level induced by the acoustic environment as a consequence of the sidetone compensation or Lombard effect. In the range of typical rooms for speech, the variations in overall voice level that result in a constant autophonic level are on the order of 2 dB, and more than 3 dB in the 4 kHz octave band. By comparison of these curves with previous studies, it is shown that talkers use acoustic cues other than loudness to adjust their voices when speaking in different rooms.

11.
Three experiments on loudness of sounds with linearly increasing levels were performed: global loudness was measured using direct ratings, loudness change was measured using direct and indirect estimations. Results revealed differences between direct and indirect estimations of loudness change, indicating that the underlying perceptual phenomena are not the same. The effect of ramp size is small for the former and important for the latter. A similar trend was revealed between global loudness and direct estimations of loudness change according to the end level, suggesting they may have been confounded. Measures provided by direct estimations of loudness change are more participant-dependent.

12.
13.
Vocalization and breathing were studied in 40 healthy infants, including five boys and five girls each at ages 5 weeks, 2.5 months, 6.5 months, and 12 months. Breathing was monitored through the use of a variable inductance plethysmograph that enabled estimates of the volume changes of the rib cage, abdomen, and lung, as well as estimates of selected temporal features of the breathing cycle. Four vocalization types were studied intensively. These included cries, whimpers, grunts, and syllable utterances. Breathing behavior was highly variable across the four vocalization types, demonstrating the degrees of freedom of performance available to the infant to accomplish the aeromechanical drive required. Such behavior was influenced by body length, body position, and age, but not by vocalization type and sex. The protocol established is a useful tool for observing the natural course of the emergence of vocalization and breathing during the first year of life.

14.
A voice range profile (VRP) was obtained from each of eight professional actors and compared with two speech range profiles (SRPs). One speech profile was obtained during the dramatic reading of a scene in the laboratory and the other during a performance on stage in a professional theater. The objective was to determine the pitch and loudness ranges used by the actors in speech relative to the VRP. The principal question of interest was whether the actors stayed within the center of the VRP, or whether they tended to drift toward the boundaries of intensity and frequency. A second question was whether the performance within the laboratory accurately reflects that of a stage performance. The results suggest that some subjects tend to exceed the center of the VRP during the stage performance. It is hypothesized that these actors may stress their vocal mechanism during performance and are more likely candidates for vocal injury.

15.
The syllable repetitions of 24 child and eight teenage stutterers were investigated to assess whether the vowels neutralize and, if so, what causes this. In both groups of speakers, the vowel in CV syllable repetitions and the following fluent vowel were excised from conversational speech samples. Acoustic analyses showed the formant frequencies of vowels in syllable repetitions to be appropriate for the intended vowel and the duration of the dysfluent vowels to be shorter than those of the fluent vowels for both groups of speakers. The intensity of the fluent vowels was greater than that of the dysfluent vowels for the teenagers but not the children. For both age groups, excitation waveforms obtained by inverse filtering showed that the excitation spectra associated with dysfluent vowels fell off more rapidly with frequency than did those associated with the fluent vowels. The fundamental frequency of the children's dysfluent speech was higher than their fluent speech while there was no difference in the teenagers' speech. The relationship between the intensities of the glottal volume velocities was the same as that of the speech waveforms. Perceptual tests were also conducted to assess whether duration and the differences found in the source excitation would make children's vowels sound neutral. The experiments show that in children neither vowel duration nor fundamental frequency differences cause the vowels to be perceived as neutral. The results suggest that the low intensity and characteristics of the source of excitation which cause vowels to sound neutral may only occur in late childhood. Furthermore, monitoring stuttered speech for the emergence of neutral vowels may be a way of indexing the progress of the disorder.

16.
Previous research has suggested that speech loudness is determined primarily by the vowel in consonant-vowel-consonant (CVC) monosyllabic words, and that consonant intensity has a negligible effect. The current study further examines the unique aspects of speech loudness by manipulating consonant-vowel intensity ratios (CVRs), while holding the vowel constant at a comfortable listening level (70 dB), to determine the extent to which vowels and consonants contribute differentially to the loudness of monosyllabic words with voiced and voiceless consonants. The loudness of words edited to have CVRs ranging from -6 to +6 dB was compared to that of standard words with unaltered CVR by 10 normal-hearing listeners in an adaptive procedure. Loudness and overall level as a function of CVR were compared for four CVC word types: both voiceless consonants modified; only initial voiceless consonants modified; both voiced consonants modified; and only initial voiced consonants modified. Results indicate that the loudness of CVC monosyllabic words is not based strictly on the level of the vowel; rather, the overall level of the word and the level of the vowel contribute approximately equally. In addition to furthering the basic understanding of speech perception, the current results may be of value for the coding of loudness by hearing aids and cochlear implants.
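Editing a word to a target CVR amounts to rescaling its consonant portions while leaving the vowel untouched. A minimal sketch under that assumption; the segment spans, function name, and RMS-based level definition are illustrative, not the study's editing procedure:

```python
import numpy as np

def set_cvr(word, cons_spans, vowel_span, target_cvr_db):
    """Rescale the consonant portions of a CVC word so the
    consonant-vowel intensity ratio (CVR) equals target_cvr_db,
    with the vowel level held constant. Spans are (start, stop)
    sample indices into the word waveform."""
    out = word.astype(float)

    def rms(x):
        return np.sqrt(np.mean(x ** 2))

    v = rms(out[vowel_span[0]:vowel_span[1]])
    for a, b in cons_spans:
        c = rms(out[a:b])
        gain = (v * 10 ** (target_cvr_db / 20.0)) / c  # bring consonant to target
        out[a:b] *= gain
    return out
```

Sweeping `target_cvr_db` from -6 to +6 dB would generate the stimulus continuum described.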

17.
Seventeen healthy women, 45 to 61 years old, were examined using videofiberstroboscopy during phonation at three loudness levels. Two phoniatricians evaluated glottal closure using category and ratio scales. Transglottal airflow was studied by inverse filtering of the oral airflow signal recorded in a flow mask (Glottal Enterprises System) during the spoken phrase /ba:pa:pa:pa:p/ at three loudness levels. Subglottal pressure was estimated from the intraoral pressure during /p/ occlusion. Running speech and the repeated /pa:/ syllables were perceptually evaluated by three speech pathologists regarding breathiness, hypo-, and hyperfunction, using continuous scales. Incomplete glottal closure was found in 35 of 46 phonations (76%). The degree of glottal closure increased significantly with raised loudness. Half of the women closed the glottis completely during loud phonation. Posterior glottal chink (PGC) was the most common gap configuration and was found in 28 of 46 phonations (61%). One third of the PGCs were in the cartilaginous glottis (PGCc) only. Two thirds extended into the membranous portion (PGCm); most of these occurred during soft phonation. Peak flow, peak-to-peak (AC) flow, and the maximum rate of change for the flow in the closing phase increased significantly with raised loudness. Minimum flow decreased significantly from normal to loud voice. Breathiness decreased with increased loudness. The results suggest that the incomplete closure patterns PGCc and PGCm during soft phonation ought primarily to be regarded as normal for Swedish women in this age group.

18.
19.
I. Introduction. Researches on Chinese synthesis disclose that only when both the segmental and suprasegmental features of the synthetic speech are similar to those of the natural one, the synthetic speech will sound intelligible and natural [1]. Among existing synthetic techniques, the approach based on acoustic parameters can adjust both the segmental and suprasegmental features of synthetic units flexibly and can be considered as the most reasonable synthetic technique in theory. However, the parameter-based synthesizer is over-dependent on the developments of paramet…

20.
Speech motor control timing was examined by means of a multiple correlational analysis involving interarticulatory delay and speech rate as predictor variables, and four subsyllabic time segments of the syllable [ka] as dependent variables. The hypothesis was that the two putative temporal constraints have differential predictive capacity for various segments of the syllable. Results from 11 subjects were in support of the hypothesis. Syllable onset duration was reliably predicted by the linear addition of interarticulatory delay and speech rate, while the duration of the midportion of the syllable was nearly exclusively predicted by the overall speech rate. This model was found to be applicable to all conditions of normal and clenched teeth, context-free and contextual, normally paced and rapid speech production, with minor differences in predictive capacity for different conditions.
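The additive two-predictor model described above can be fit by ordinary least squares. A minimal sketch with synthetic data standing in for the delay/rate/duration measurements; the function name and variable names are illustrative:

```python
import numpy as np

def fit_two_predictors(x1, x2, y):
    """Ordinary least squares for y ~ b0 + b1*x1 + b2*x2, i.e. a segment
    duration predicted by the linear addition of two temporal variables
    (e.g., interarticulatory delay and speech rate).
    Returns the coefficient vector [b0, b1, b2]."""
    X = np.column_stack([np.ones_like(x1), x1, x2])  # design matrix with intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta
```

Comparing the fitted |b1| and |b2| across subsyllabic segments would mirror the differential-predictive-capacity comparison in the study.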


Copyright©北京勤云科技发展有限公司  京ICP备09084417号