Similar Documents (20 results)
1.
The effect of background noise on speech production is an important issue, both from the practical standpoint of developing speech recognition algorithms and from the theoretical standpoint of understanding how speech is tuned to the environment in which it is spoken. Summers et al. [J. Acoust. Soc. Am. 84, 917-928 (1988)] address this issue by experimentally manipulating the level of noise delivered through headphones to two talkers and making several kinds of acoustic measurements on the resulting speech. They indicate that they have replicated effects on amplitude, duration, and pitch and have found effects on spectral tilt and first-formant frequency (F1). The authors regard these acoustic changes as effects in themselves rather than as consequences of a change in vocal effort, and thus treat the change in spectral tilt and the change in F1 equally. In fact, the change in spectral tilt is a well-documented and understood consequence of the change in the glottal waveform that is known to occur with increased effort. The situation with F1 is less clear and is complicated by measurement problems. The bias in linear predictive coding (LPC) techniques related to two of the other changes, fundamental frequency and spectral tilt, is discussed.
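Since the abstract turns on how LPC-based formant estimates can be biased by F0 and spectral tilt, a minimal sketch of autocorrelation-method LPC formant estimation may help make the measurement concrete. This is an illustrative implementation, not the procedure used by Summers et al.; real analyses typically add pre-emphasis and bandwidth-based pruning of root candidates.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_coeffs(x, order):
    """Autocorrelation-method LPC: solve the Yule-Walker normal equations."""
    x = x * np.hamming(len(x))
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]  # lags 0..order
    a = solve_toeplitz(r[:-1], r[1:])       # predictor coefficients a_1..a_p
    return np.concatenate(([1.0], -a))      # A(z) = 1 - sum_k a_k z^{-k}

def formants_from_lpc(a, fs):
    """Formant candidates from angles of the A(z) roots in the upper half-plane."""
    roots = [r for r in np.roots(a) if r.imag > 0]
    freqs = sorted(np.angle(r) * fs / (2 * np.pi) for r in roots)
    return [f for f in freqs if f > 90.0]   # discard near-DC candidates

# Usage (hypothetical frame): formants_from_lpc(lpc_coeffs(frame, 12), fs=10000)
```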

2.
Acoustic parameters were measured for vowels spoken in /hVd/ context by four postlingually deafened recipients of multichannel (Ineraid) cochlear implants. Three of the subjects became totally deaf in adulthood after varying periods of partial hearing loss; the fourth became totally deaf at age four. The subjects received different degrees of perceptual benefit from the prosthesis. Recordings were made before, and at intervals following, speech processor activation. The measured parameters included F1, F2, F0, SPL, duration, and the amplitude difference between the first two harmonic peaks in the log magnitude spectrum (H1-H2). Numerous changes in parameter values were observed from pre- to post-implant, with differences among subjects. Many changes, but not all, were in the direction of normative data, and most changes were consistent with hypotheses about relations among the parameters. Some of the changes tended to enhance phonemic contrasts; others had the opposite effect. For three subjects, H1-H2 changed in a direction consistent with measurements of their average air flow when reading; that relation was more complex for the fourth subject. The results are interpreted with respect to: characteristics of the individual subjects, including vowel identification scores; mechanical interactions among glottal and supraglottal articulations; and hypotheses about the role of auditory feedback in the control of speech production. Almost all the observed differences could be attributed to changes in the average settings of speaking rate, F0, and SPL, which presumably can be perceived without the need for spectral place information. Some observed F2 realignment may be attributable to the reception of spectral cues.
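H1-H2 recurs throughout these abstracts as a spectral correlate of breathiness. The sketch below shows one simple way to measure it from a voiced frame, assuming F0 is already known; the harmonic-peak search width and the absence of formant-correction terms are simplifying assumptions, not the measurement procedure of the study.

```python
import numpy as np

def h1_minus_h2(frame, fs, f0_hz):
    """H1-H2 in dB: level difference between the first two harmonic peaks
    of the log magnitude spectrum. f0_hz is assumed known from a pitch tracker."""
    windowed = frame * np.hanning(len(frame))
    spectrum_db = 20 * np.log10(np.abs(np.fft.rfft(windowed)) + 1e-12)
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)

    def peak_db(target_hz, half_width_hz=40.0):
        # Strongest bin near the expected harmonic frequency.
        band = (freqs > target_hz - half_width_hz) & (freqs < target_hz + half_width_hz)
        return spectrum_db[band].max()

    return peak_db(f0_hz) - peak_db(2 * f0_hz)
```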

3.
The effect of auditory feedback on speech production was investigated in five postlingually deafened adults implanted with the 22-channel Nucleus device. Changes in speech production were measured before implant and 1, 6, and 24 months postimplant. Acoustic measurements included: F1 and F2 of vowels in word-in-isolation and word-in-sentence contexts, voice-onset time (VOT), spectral range of sibilants, fundamental frequency (F0) in word-in-isolation and word-in-sentence contexts, and word and sentence duration. Perceptual ratings of speech quality were made by ten listeners. The significant changes after cochlear implantation included: a decrease in F0, word and sentence duration, and F1 values, and an increase in voiced plosives' voicing lead (from positive to negative VOT values) and in fricatives' spectral range. Significant changes continued until 2 years postimplant, by which time most measured values fell within Hebrew norms. Listeners were found to be sensitive to the acoustic changes in the speech from preimplant to 1, 6, and 24 months postimplant. Results suggest that when hearing is restored in postlingually deafened adults, calibration of speech is not immediate but occurs over time, depending on age at onset of deafness, years of deafness, and perception skills. The results also concur with the hypothesis that the observed changes in some speech parameters are an indirect consequence of intentional changes in other articulatory parameters.

4.
Speech production parameters of three postlingually deafened adults who use cochlear implants were measured: after 24 h of auditory deprivation (which was achieved by turning the subject's speech processor off); after turning the speech processor back on; and after turning the speech processor off again. The measured parameters included vowel acoustics [F1, F2, F0, sound-pressure level (SPL), duration and H1-H2, the amplitude difference between the first two spectral harmonics, a correlate of breathiness] while reading word lists, and average airflow during the reading of passages. Changes in speech processor state (on-to-off or vice versa) were accompanied by numerous changes in speech production parameters. Many changes were in the direction of normalcy, and most were consistent with long-term speech production changes in the same subjects following activation of the processors of their cochlear implants [Perkell et al., J. Acoust. Soc. Am. 91, 2961-2978 (1992)]. Changes in mean airflow were always accompanied by H1-H2 (breathiness) changes in the same direction, probably due to underlying changes in laryngeal posture. Some parameters (different combinations of SPL, F0, H1-H2 and formants for different subjects) showed very rapid changes when turning the speech processor on or off. Parameter changes were faster and more pronounced, however, when the speech processor was turned on than when it was turned off. The picture that emerges from the present study is consistent with a dual role for auditory feedback in speech production: long-term calibration of articulatory parameters as well as feedback mechanisms with relatively short time constants.

5.
This study explored how three types of intensive cognitive load typical of military aviation (load on situation awareness, information processing, or decision-making) affect speech. The utterances of 13 male military pilots were recorded during simulated combat flights. Articulation rate was calculated from the speech samples, and the first formant (F1) and second formant (F2) were tracked in first-syllable short vowels in predefined phoneme environments. Articulation rate correlated negatively (albeit with low coefficients) with loads on situation awareness and decision-making, but not with changes in F1 or F2. Changes were seen in the vowel spectra: mean F1 of front vowels usually increased and their mean F2 decreased as a function of cognitive load, and both F1 and F2 of back vowels increased. The strongest associations were seen between the three types of cognitive load and F1 and F2 changes in back vowels. Because fluent and clear radio communication is vital to safety in aviation, and temporal and spectral changes may affect speech intelligibility, careful use of standard aviation phraseology and training in the production of clear speech under high cognitive load are important measures for diminishing the probability of misunderstandings.

6.
Speakers of rhotic dialects of North American English show a range of different tongue configurations for /r/. These variants produce acoustic profiles that are indistinguishable for the first three formants [Delattre, P., and Freeman, D. C. (1968). "A dialect study of American English r's by x-ray motion picture," Linguistics 44, 28-69; Westbury, J. R., et al. (1998). "Differences among speakers in lingual articulation for American English /r/," Speech Commun. 26, 203-206]. It is puzzling why this should be so, given the very different vocal tract configurations involved. In this paper, two subjects whose productions of "retroflex" /r/ and "bunched" /r/ show similar patterns of F1-F3 but very different spacing between F4 and F5 are contrasted. Using finite element analysis and area functions based on magnetic resonance images of the vocal tract for sustained productions, the results of computer vocal tract models are compared to actual speech recordings. In particular, formant-cavity affiliations are explored using formant sensitivity functions and vocal tract simple-tube models. The difference in F4/F5 patterns between the subjects is confirmed for several additional subjects with retroflex and bunched vocal tract configurations. The results suggest that the F4/F5 differences between the variants can be largely explained by whether the long cavity behind the palatal constriction acts as a half- or a quarter-wavelength resonator.
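The half- versus quarter-wavelength distinction invoked here follows from elementary tube acoustics, and a quick calculation shows how large the resulting F4/F5 differences can be. The cavity length and speed of sound below are assumed round numbers for illustration, not values measured from the study's MRI data.

```python
C = 35000.0  # speed of sound in cm/s (assumed)

def quarter_wave_resonances(length_cm, n=3):
    """Tube closed at one end, open at the other: f_k = (2k - 1) * c / (4L)."""
    return [(2 * k - 1) * C / (4 * length_cm) for k in range(1, n + 1)]

def half_wave_resonances(length_cm, n=3):
    """Tube with like terminations at both ends: f_k = k * c / (2L)."""
    return [k * C / (2 * length_cm) for k in range(1, n + 1)]

# A hypothetical 10-cm back cavity: quarter-wave gives 875, 2625, 4375 Hz;
# half-wave gives 1750, 3500, 5250 Hz -- a very different F4/F5 spacing.
print(quarter_wave_resonances(10.0))
print(half_wave_resonances(10.0))
```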

7.
The effects of ingesting ethanol have been shown to be somewhat variable in humans; to date, there appear to be but few universals. Yet the question often arises: is it possible to determine whether a person is intoxicated by observing them in some manner? A closely related question is: can speech be used for this purpose and, if so, can the degree of intoxication be determined? One of the many issues associated with these questions involves the relationships between a person's paralinguistic characteristics and the presence and level of inebriation. To this end, young, healthy speakers of both sexes were carefully selected and sorted into roughly equal groups of light, moderate, and heavy drinkers. They were asked to produce four types of utterances during a learning phase, when sober, and at four strictly controlled levels of intoxication (three ascending and one descending). The primary motor speech measures employed were speaking fundamental frequency, speech intensity, speaking rate, and nonfluencies. Several statistically significant changes were found with increasing intoxication; the primary ones included rises in F0, task duration, and nonfluencies. Minor gender differences were found, but they lacked statistical significance, as did the small differences among the drinking-category subgroups and the subject groupings related to levels of perceived intoxication. Finally, although it may be concluded that certain changes in speech suprasegmentals will occur as a function of increasing intoxication, these patterns cannot be viewed as universal, since a few subjects (about 20%) exhibited no changes, or changes in the opposite direction.

8.
Auditory feedback during speech production is known to play a role in speech sound acquisition and is also important for the maintenance of accurate articulation. In two studies, the first formant (F1) of monosyllabic consonant-vowel-consonant words (CVCs) was shifted electronically and fed back to participants quickly enough that they perceived the modified speech as their own productions. When feedback was shifted up (experiments 1 and 2) or down (experiment 1), participants compensated by shifting their produced F1 in the opposite frequency direction from baseline. The threshold size of manipulation that initiated a compensation in F1 was usually greater than 60 Hz. When normal feedback was returned, F1 did not return immediately to baseline but showed an exponential deadaptation pattern. Experiment 1 showed that this effect was not influenced by the direction of the F1 shift, with both raising and lowering of F1 exhibiting the same effects. Experiment 2 showed that manipulating the number of trials for which F1 was held at the maximum shift (0, 15, or 45 trials) did not influence the recovery from adaptation. There was a correlation between the lag-one autocorrelation of trial-to-trial changes in F1 in the baseline recordings and the magnitude of compensation: some participants appeared to stabilize their productions from trial to trial more actively than others. The results provide insight into the perceptual control of speech and the representations that govern sensorimotor coordination.
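The lag-one autocorrelation statistic mentioned above is straightforward to compute. A minimal sketch follows, applied to the trial-to-trial changes in F1 as the abstract describes; the F1 values themselves are hypothetical.

```python
import numpy as np

def lag_one_autocorrelation(x):
    """Lag-1 autocorrelation of a mean-removed sequence."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return np.dot(x[:-1], x[1:]) / np.dot(x, x)

# Hypothetical baseline F1 values (Hz) across successive trials.
f1_trials = np.array([602.0, 611.0, 598.0, 605.0, 595.0, 609.0, 600.0])
changes = np.diff(f1_trials)  # trial-to-trial changes, as in the abstract
print(lag_one_autocorrelation(changes))
```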

9.
A perceptual analysis of the French vowel [u] produced by 10 speakers under normal and perturbed conditions (Savariaux et al., 1995) is presented, which aims first at characterizing, in the perceptual domain, the speaker's task for this vowel, and then at understanding the strategies the speakers developed to deal with the lip perturbation. Identification and rating tests showed that the French [u] is perceptually fairly well described in the [F1, (F2-F0)] plane, and that the parameter ((F2-F0) + F1)/2 (all frequencies in bark) provides a good overall correlate of the "grave" feature classically used to describe the vowel [u] across languages. This permitted reanalysis of the speakers' behavior during the perturbation experiment. Three of them succeeded in producing a good [u] in spite of the lip tube, thanks to a combination of limited changes in F1 and (F2-F0), but without producing the strong backward movement of the tongue that would be necessary to keep the [F1, F2] pattern close to the one measured in normal speech. The only speaker who strongly moved his tongue back and maintained F1 and F2 at low values did not produce a perceptually well-rated [u], but additional tests demonstrated that this gesture allowed him to preserve the most important phonetic features of the French [u], which is primarily a back and rounded vowel. It is concluded that speech production is clearly guided by perceptual requirements, and that speakers have a good representation of them, even if not all are able to meet them in perturbed conditions.
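The "grave" correlate is easy to compute once frequencies are on the bark scale. The sketch below uses the Traunmüller (1990) Hz-to-bark formula, which is an assumed choice (the paper may use a different conversion), and it also assumes the subtraction (F2-F0) is carried out after conversion to bark.

```python
def hz_to_bark(f_hz):
    """Traunmüller (1990) Hz-to-bark conversion (assumed choice)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

def grave_parameter(f0_hz, f1_hz, f2_hz):
    """((F2 - F0) + F1) / 2, all terms in bark, per the abstract."""
    z0, z1, z2 = (hz_to_bark(f) for f in (f0_hz, f1_hz, f2_hz))
    return ((z2 - z0) + z1) / 2.0

# Hypothetical French [u]: F0 = 120 Hz, F1 = 300 Hz, F2 = 750 Hz -> ~4.5 bark.
print(grave_parameter(120.0, 300.0, 750.0))
```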

10.
This study investigated whether speech produced in spontaneous interactions when addressing a talker experiencing actual challenging conditions differs in acoustic-phonetic characteristics from speech produced (a) with communicative intent under more ideal conditions and (b) without communicative intent under imaginary challenging conditions (read, clear speech). It also investigated whether acoustic-phonetic modifications made to counteract the effects of a challenging listening condition are tailored to the condition under which communication occurs. Forty talkers were recorded in pairs while engaged in "spot the difference" picture tasks in good and challenging conditions. In the challenging conditions, one talker heard the other either (1) via a three-channel noise vocoder (VOC) or (2) with simultaneous babble noise (BABBLE). Read, clear speech showed more extreme changes in median F0, F0 range, and speaking rate than speech produced to counter the effects of a challenging listening condition. In the VOC condition, where F0 and intensity enhancements are unlikely to aid intelligibility, talkers did not change their F0 median and range; mean energy and vowel F1 increased less than in the BABBLE condition. This suggests that speech production is listener-focused, and that talkers modulate their speech according to their interlocutors' needs, even when not directly experiencing the challenging listening condition.
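A noise vocoder of the kind used in the VOC condition replaces spectral fine structure (including F0 cues) with band-limited noise modulated by each band's amplitude envelope, which is why F0 enhancements would not help the listener. Below is a minimal three-channel sketch; the band edges and filter order are assumptions, not the study's parameters, and fs must exceed twice the highest band edge.

```python
import numpy as np
from scipy.signal import butter, hilbert, sosfilt

def noise_vocoder(x, fs, edges=(100.0, 800.0, 2500.0, 6000.0)):
    """Minimal three-channel noise vocoder sketch: per-band Hilbert
    envelopes modulate band-limited white-noise carriers."""
    rng = np.random.default_rng(0)
    out = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, (lo, hi), btype="bandpass", fs=fs, output="sos")
        band = sosfilt(sos, x)
        envelope = np.abs(hilbert(band))                    # amplitude envelope
        carrier = sosfilt(sos, rng.standard_normal(len(x)))  # band-limited noise
        out += envelope * carrier
    return out
```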

11.
12.
Changes in the speech spectra of vowels and consonants before and after tonsillectomy were investigated to determine the impact of the operation on speech quality. Speech recordings obtained from patients were analyzed using the Kay Elemetrics Multi-Dimensional Voice Program (MDVP Advanced) software. Examination of the time course after the operation revealed that certain speech parameters changed. These changes were mainly in F3 (formant center frequency) and B3 (formant bandwidth) for the vowel /o/, and a slight decrease in B1 and B2 for the vowel /a/. The noise-to-harmonic ratio (NHR) also decreased slightly, suggesting less nasalized vowels. The fricative, glottal consonant /h/ was also affected. The larger the tonsils had been, the greater the changes seen in the speech spectrum. The changes in the speech characteristics (except F3 and B3 for the vowel /o/) tended to recover, suggesting an involvement of auditory feedback and/or the replacement of the tonsils by new soft tissue. Although the changes were minimal and therefore have little effect on the extracted acoustic parameters, they cannot be disregarded for those who rely on their voice professionally, that is, singers, professional speakers, and so forth.

13.
Three alternative speech coding strategies suitable for use with cochlear implants were compared in a study of three normally hearing subjects using an acoustic model of a multiple-channel cochlear implant. The first strategy (F2) presented the amplitude envelope of the speech and the second formant frequency. The second strategy (F0 F2) included the voice fundamental frequency, and the third strategy (F0 F1 F2) presented the first formant frequency as well. Discourse level testing with the speech tracking method showed a clear superiority of the F0 F1 F2 strategy when the auditory information was used to supplement lipreading. Tracking rates averaged over three subjects for nine 10-min sessions were 40 wpm for F2, 52 wpm for F0 F2, and 66 wpm for F0 F1 F2. Vowel and consonant confusion studies and a test of prosodic information were carried out with auditory information only. The vowel test showed a significant difference between the strategies, but no differences were found for the other tests. It was concluded that the amplitude and duration cues common to all three strategies accounted for the levels of consonant and prosodic information received by the subjects, while the different tracking rates were a consequence of the better vowel recognition and the more natural quality of the F0 F1 F2 strategy.

14.
Emotional information in speech is commonly described in terms of prosodic features such as F0, duration, and energy. In this paper, the focus is on how F0 characteristics can be used to effectively parametrize emotional quality in speech signals. Using an analysis-by-synthesis approach, the F0 mean, range, and shape properties of emotional utterances are systematically modified. The results show which aspects of the F0 parameter can be modified without causing significant changes in the perception of the emotion. To model this behavior, the concept of emotional regions is introduced. Emotional regions represent the variability present in emotional speech and provide a new procedure for studying speech cues for judgments of emotion. The method is applied to F0 but can also be used on other aspects of prosody, such as duration or loudness. Statistical analysis of the factors affecting the emotional regions, and a discussion of the effects of F0 modifications on emotion and speech quality perception, are also presented. The results show that F0 range is more important than F0 mean for emotion expression.
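The analysis-by-synthesis manipulations of F0 mean and range reduce to simple operations on the extracted contour. The sketch below shows the core arithmetic; it illustrates the idea only, not the paper's resynthesis pipeline, and the contour values are hypothetical.

```python
import numpy as np

def modify_f0(f0, mean_shift=0.0, range_scale=1.0):
    """Shift an F0 contour's mean (Hz) and rescale its range.

    f0: array of F0 values in Hz (unvoiced frames as np.nan).
    mean_shift: Hz added to the contour mean.
    range_scale: factor applied to deviations around the mean.
    """
    m = np.nanmean(f0)
    return (m + mean_shift) + range_scale * (f0 - m)

# Hypothetical contour: raise the mean 20 Hz and compress the range by half.
contour = np.array([110.0, 150.0, 180.0, 140.0, 120.0])
print(modify_f0(contour, mean_shift=20.0, range_scale=0.5))
```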

15.
I. Introduction. The F0 patterns of speech are important not only for the prosodic features but also for voice source characteristics. Now more and more speech scientists recognize that the voice excitation source in text-to-speech systems plays an important role in both intelligibility and naturalness of synthetic speech. Especially for Chinese, a tone language with a multi-tone system, the tonal patterns, which are mainly demonstrated in the F0 contours, carry lexical meaning. Some comparative studies of the F0 patterns between a tone language (Chinese) and a stress language (En…

16.
The effects on the spectral properties of vowels of variations in vocal effort corresponding to common conversational situations were investigated. A database was recorded in which three degrees of vocal effort were elicited by varying the distance to the interlocutor in three steps (close: 0.4 m; normal: 1.5 m; far: 6 m). The speech materials consisted of isolated French vowels uttered by ten naive speakers in a quiet furnished room. Manual measurements were carried out of the fundamental frequency F0, the frequencies and amplitudes of the first three formants (F1, F2, F3, A1, A2, and A3), and the total amplitude. The speech materials were perceptually validated in three respects: identity of the vowel, gender of the speaker, and vocal effort. Results indicated that the speech materials were appropriate for the study. Acoustic analysis showed that F0 and F1 were highly correlated with vocal effort and varied at rates close to 5 Hz/dB for F0 and 3.5 Hz/dB for F1. Statistically, F2 and F3 did not vary significantly with vocal effort. Formant amplitudes A1, A2, and A3 increased significantly; the amplitudes in the high-frequency range increased more than those in the lower part of the spectrum, revealing a change in spectral tilt. On average, when the overall amplitude increased by 10 dB, A1, A2, and A3 increased by 11, 12.4, and 13 dB, respectively. Using "auditory" dimensions, such as the F1-F0 difference, and a "spectral center of gravity" between adjacent formants to represent vowel features did not reveal better constancy of these parameters with respect to variations in vocal effort and speaker. Thus a global view is evoked, in which all aspects of the signal should be processed simultaneously.
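The reported rates of change make for a quick worked example: a 10 dB rise in overall level predicts roughly a 50 Hz rise in F0 and a 35 Hz rise in F1. The baseline values in the sketch are illustrative, not figures from the study.

```python
def predict_shift(delta_spl_db, f0_hz=120.0, f1_hz=500.0,
                  f0_rate=5.0, f1_rate=3.5):
    """Linear extrapolation from the reported Hz/dB rates;
    baseline F0 and F1 are hypothetical."""
    return f0_hz + f0_rate * delta_spl_db, f1_hz + f1_rate * delta_spl_db

# Raising overall level by 10 dB: F0 120 -> 170 Hz, F1 500 -> 535 Hz.
print(predict_shift(10.0))
```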

17.
The fundamental frequencies (F0) of daily life utterances of Japanese infants and their parents from the infant's birth until about 5 years of age were longitudinally analyzed. The analysis revealed that an infant's F0 mean decreases as a function of month of age. It also showed that within- and between-utterance variability in infant F0 is different before and after the onset of two-word utterances, probably reflecting the difference between linguistic and nonlinguistic utterances. Parents' F0 mean is high in infant-directed speech (IDS) before the onset of two-word utterances, but it gradually decreases and reaches almost the same value as in adult-directed speech after the onset of two-word utterances. The between-utterance variability of parents' F0 in IDS is large before the onset of two-word utterances and it subsequently becomes smaller. It is suggested that these changes of parents' F0 are closely related to the feasibility of communication between infants and parents.

18.
The goal of this study was to establish the ability of normal-hearing listeners to discriminate formant frequency in vowels in everyday speech. Vowel formant discrimination in syllables, phrases, and sentences was measured for high-fidelity (nearly natural) speech synthesized by STRAIGHT [Kawahara et al., Speech Commun. 27, 187-207 (1999)]. Thresholds were measured for changes in F1 and F2 for the vowels /ɪ, ɛ, æ, ʌ/ in /bVd/ syllables. Experimental factors manipulated included phonetic context (syllables, phrases, and sentences), sentence discrimination with the addition of an identification task, and word position. Results showed that neither longer phonetic context nor the addition of the identification task significantly affected thresholds, while thresholds for word-final position showed significantly better performance than for either initial or middle position in sentences. Results suggest that an average of 0.37 bark is required for normal-hearing listeners to discriminate vowel formants in modest-length sentences, elevated by 84% compared to isolated vowels. Vowel formant discrimination in several phonetic contexts was slightly elevated for STRAIGHT-synthesized speech compared to the formant-synthesized speech stimuli reported by Kewley-Port and Zheng [J. Acoust. Soc. Am. 106, 2945-2958 (1999)]. These elevated thresholds appeared related to greater spectral-temporal variability in the high-fidelity speech produced by STRAIGHT than in formant-synthesized speech.
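To see what a 0.37 bark threshold means in Hz, one can convert a candidate formant pair to bark and compare. As in the sketch for item 9, the Traunmüller (1990) conversion is an assumed choice, and the F2 values are hypothetical.

```python
def bark(f_hz):
    """Traunmüller (1990) Hz-to-bark conversion (assumed choice)."""
    return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

# Hypothetical F2 pair in a sentence context: 1700 Hz vs. 1850 Hz.
delta = abs(bark(1850.0) - bark(1700.0))  # about 0.57 bark
print(delta, delta > 0.37)                # exceeds the average sentence threshold
```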

19.
Hu Han, Gu Wentao. 《声学学报》 (Acta Acustica) 2022, 47(2): 276-286
An individual's attachment style can be defined along two dimensions, attachment avoidance and attachment anxiety, and classified into four attachment types according to whether each value is high or low. To explore the effect of attachment style on the phonetic characteristics of intimate speech, 12 young heterosexual couples were recruited, and each participant's attachment avoidance and anxiety scores were measured with the Experiences in Close Relationships inventory. Using a semi-open dating script, participants were induced to produce target sentences in an intimate tone, and then read the same target sentences aloud separately as neutral speech. Based on nine prosodic and voice-quality parameters, the…

20.
While a large portion of the variance among listeners in speech recognition is associated with the audibility of components of the speech waveform, it is not possible to predict individual differences in the accuracy of speech processing strictly from the audiogram. This has suggested that some of the variance may be associated with individual differences in spectral or temporal resolving power, or acuity. Psychoacoustic measures of spectral-temporal acuity with nonspeech stimuli have been shown, however, to correlate only weakly (or not at all) with speech processing. In a replication and extension of an earlier study [Watson et al., J. Acoust. Soc. Am. Suppl. 1, 71, S73 (1982)], 93 normal-hearing college students were tested on speech perception tasks (nonsense syllables, words, and sentences in a noise background) and on six spectral-temporal discrimination tasks using simple and complex nonspeech sounds. Factor analysis showed that the abilities that explain performance on the nonspeech tasks are quite distinct from those that account for performance on the speech tasks. Performance was significantly correlated among speech tasks and among nonspeech tasks. Either (a) auditory spectral-temporal acuity for nonspeech sounds is orthogonal to speech processing abilities, or (b) the appropriate tasks or types of nonspeech stimuli that challenge the abilities required for speech recognition have yet to be identified.
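The factor-analytic logic here, that two families of tasks load on distinct underlying abilities, can be demonstrated on synthetic data. Everything in the sketch (scores, noise level, two-factor structure) is fabricated for illustration; only the design (93 listeners, speech vs. nonspeech tasks) follows the abstract.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Synthetic stand-in for the study's design: per-listener scores on three
# speech tasks and six nonspeech discrimination tasks (all hypothetical).
rng = np.random.default_rng(0)
speech_ability = rng.standard_normal((93, 1))
nonspeech_ability = rng.standard_normal((93, 1))
scores = np.hstack([
    speech_ability + 0.3 * rng.standard_normal((93, 3)),     # speech tasks
    nonspeech_ability + 0.3 * rng.standard_normal((93, 6)),  # nonspeech tasks
])

fa = FactorAnalysis(n_components=2).fit(scores)
# With two distinct underlying abilities, the loadings split cleanly by task
# type, mirroring the finding that the two ability sets are orthogonal.
print(np.round(fa.components_, 2))
```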
