20 similar articles found (search time: 15 ms)
1.
2.
Meister H, Landwehr M, Pyschny V, Grugel L, Walger M. The Journal of the Acoustical Society of America, 2011, 129(5): EL204-EL209
The corruption of intonation contours has detrimental effects on sentence-based speech recognition in normal-hearing listeners [Binns and Culling (2007). J. Acoust. Soc. Am. 122, 1765-1776]. This paper examines whether this finding also applies to cochlear implant (CI) recipients. The subjects' F0-discrimination and speech perception in the presence of noise were measured, using sentences with regular and inverted F0-contours. The results revealed that speech recognition for regular contours was significantly better than for inverted contours. This difference was related to the subjects' F0-discrimination, providing further evidence that the perception of intonation patterns is important for CI-mediated speech recognition in noise.
3.
Four experiments investigated the effect of the fundamental frequency (F0) contour on speech intelligibility against interfering sounds. Speech reception thresholds (SRTs) were measured for sentences with different manipulations of their F0 contours. These manipulations involved either reductions in F0 variation, or complete inversion of the F0 contour. Against speech-shaped noise, a flattened F0 contour had no significant impact on SRTs compared to a normal F0 contour; the mean SRT for the flattened contour was only 0.4 dB higher. The mean SRT for the inverted contour, however, was 1.3 dB higher than for the normal F0 contour. When the sentences were played against a single-talker interferer, the overall effect was greater, with a 2.0 dB difference between normal and flattened conditions, and 3.8 dB between normal and inverted. There was no effect of altering the F0 contour of the interferer, indicating that any abnormality of the F0 contour serves to reduce intelligibility of the target speech, but does not alter the masking produced by interfering speech. Low-pass filtering the F0 contour increased SRTs; elimination of frequencies between 2 and 4 Hz had the greatest effect. Filtering sentences with inverted contours did not have a significant effect on SRTs.
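The two manipulations named in this abstract, reduced F0 variation and complete inversion, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `scale_f0_contour`, the choice to scale excursions about the mean of log-F0, and the convention that unvoiced frames are stored as 0 (or None) are all assumptions made for illustration.

```python
import math

def scale_f0_contour(f0_hz, factor):
    """Scale F0 excursions about the mean log-F0 of the voiced frames.

    Hypothetical helper (not from the paper). factor = 1.0 leaves the
    contour unchanged, 0.0 flattens it to a monotone, and -1.0 mirrors
    it about the mean. Unvoiced frames (0 or None) pass through as-is.
    """
    voiced = [f for f in f0_hz if f]
    if not voiced:
        return list(f0_hz)
    # Assumption: work on a log-frequency scale so rises and falls are
    # treated symmetrically in musical (ratio) terms.
    mean_log = sum(math.log(f) for f in voiced) / len(voiced)
    return [
        math.exp(mean_log + factor * (math.log(f) - mean_log)) if f else f
        for f in f0_hz
    ]

# Toy contour in Hz; the 0.0 marks an unvoiced frame.
contour = [100.0, 200.0, 0.0, 400.0]
flat = scale_f0_contour(contour, 0.0)       # every voiced frame at the mean
inverted = scale_f0_contour(contour, -1.0)  # rises become falls
```

A single scaling factor covers both conditions, so intermediate values such as 0.5 would give a partially flattened contour.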
4.
5.
6.
7.
A novel method based on a statistical model for fundamental-frequency (F0) synthesis in Mandarin text-to-speech is proposed. Specifically, a statistical model is employed to determine the relationship between F0 contour patterns of syllables and linguistic features representing the context. Parameters of the model were empirically estimated from a large training set of sentential utterances. Phonologic rules are then automatically deduced through the training process and implicitly memorized in the model. In the synthesis process, contextual features are extracted from a given input text, and the best estimates of the F0 contour patterns of syllables are then found by a Viterbi algorithm using the well-trained model. This method can be regarded as employing a stochastic grammar to reduce the number of candidate F0 contour patterns at each decision point of synthesis. Although linguistic features on various levels of input text can be incorporated into the model, only some relevant contextual features extracted from neighboring syllables were used in this study. Performance of this method was examined by simulation using a database composed of nine repetitions of 112 declarative sentential utterances of the same text, all spoken by a single speaker. By closely examining the well-trained model, some evidence was found to show that the declination effect as well as several sandhi rules are implicitly contained in the model. Experimental results show that 77.56% of synthesized F0 contours coincide with the VQ-quantized counterpart of the original natural speech. Naturalness of the synthesized speech was confirmed by an informal listening test.
8.
This study examined proportional frequency compression as a strategy for improving speech recognition in listeners with high-frequency sensorineural hearing loss. This method of frequency compression preserved the ratios between the frequencies of the components of natural speech, as well as the temporal envelope of the unprocessed speech stimuli. Nonsense syllables spoken by a female and a male talker were used as the speech materials. Both frequency-compressed speech and the control condition of unprocessed speech were presented with high-pass amplification. For the materials spoken by the female talker, significant increases in speech recognition were observed in slightly less than one-half of the listeners with hearing impairment. For the male-talker materials, one-fifth of the hearing-impaired listeners showed significant recognition improvements. The increases in speech recognition due solely to frequency compression were generally smaller than those solely due to high-pass amplification. The results indicate that while high-pass amplification is still the most effective approach for improving speech recognition of listeners with high-frequency hearing loss, proportional frequency compression can offer significant improvements in addition to those provided by amplification for some patients.
9.
d'Alessandro C, Rilliard A, Le Beux S. The Journal of the Acoustical Society of America, 2011, 129(3): 1594-1604
Intonation stylization is studied using "chironomy," i.e., the analogy between hand gestures and prosodic movements. An intonation mimicking paradigm is used. The task of the ten subjects is to copy the intonation pattern of sentences with the help of a stylus on a graphic tablet, using a system for real-time manual intonation modification. Gestural imitation is compared to vocal imitation of the same sentences (seven for a male speaker, seven for a female speaker). Distance measures between gestural copies, vocal imitations, and original sentences are computed for performance assessment. Perceptual testing is also used for assessing the quality of gestural copies. The perceptual difference between natural and stylized contours is measured using a mean opinion score paradigm for 15 subjects. The results indicate that intonation contours can be stylized with accuracy by chironomic imitation. The results of vocal imitation and chironomic imitation are comparable, but subjects show better imitation results in vocal imitation. The best stylized contours using chironomy seem perceptually indistinguishable or almost indistinguishable from natural contours, particularly for female speech. This indicates that chironomic stylization is effective, and that hand movements can be analogous to intonation movements.
10.
An experiment was performed in which a noise containing frequencies from 10 Hz to 47 Hz was used to mask speech. The behaviour of speech intelligibility with speech presentation level and masking noise level was examined briefly. The infrasonic and low frequency masking noise did reduce the intelligibility of speech. The effect only became significant when the masking noise was presented at levels of 115 dB OASPL or above.
11.
12.
We describe an arrangement for simultaneous recording of speech and vocal tract geometry in patients undergoing surgery involving this area. Experimental design is considered from an articulatory phonetic point of view. The speech signals are recorded with an acoustic-electrical arrangement. The vocal tract is simultaneously imaged with MRI. A MATLAB-based system controls the timing of speech recording and MR image acquisition. The speech signals are cleaned of acoustic MRI noise by an adaptive signal processing algorithm. Finally, a vowel data set from pilot experiments is qualitatively compared both with validation data from the anechoic chamber and with Helmholtz resonances of the vocal tract volume, obtained using FEM.
13.
In tone languages there are potential conflicts in the perception of lexical tone and intonation, as both depend mainly on the differences in fundamental frequency (F0) patterns. The present study investigated the acoustic cues associated with the perception of sentences as questions or statements in Cantonese, as a function of the lexical tone in sentence final position. Cantonese listeners performed intonation identification tasks involving complete sentences, isolated final syllables, and sentences without the final syllable (carriers). Sensitivity (d' scores) was similar for complete sentences and final syllables but was significantly lower for carriers. Sensitivity was also affected by tone identity. These findings show that the perception of questions and statements relies primarily on the F0 characteristics of the final syllables (local F0 cues). A measure of response bias (c) provided evidence for a general bias toward the perception of statements. Logistic regression analyses showed that utterances were accurately classified as questions or statements by using average F0 and F0 interval. Average F0 of carriers (global F0 cue) was also found to be a reliable secondary cue. These findings suggest that the use of F0 cues for the perception of question intonation in tonal languages is likely to be language-specific.
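The sensitivity (d') and response bias (c) measures mentioned in this abstract are standard signal detection quantities, and a minimal Python sketch of the textbook formulas is given here. The abstract does not state how the authors computed them, so the function name `dprime_and_criterion` and the choice to treat "question" as the signal category are assumptions made for illustration.

```python
from statistics import NormalDist

def dprime_and_criterion(hit_rate, false_alarm_rate):
    """Return (d', c) from a hit rate and a false-alarm rate.

    Hypothetical helper using the textbook definitions:
        d' = z(H) - z(F)
        c  = -(z(H) + z(F)) / 2
    Assumption: a "hit" is a question identified as a question and a
    "false alarm" is a statement identified as a question. Both rates
    must lie strictly between 0 and 1, otherwise inv_cdf raises an error.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    d_prime = z_hit - z_fa
    criterion = -0.5 * (z_hit + z_fa)
    return d_prime, criterion

# Made-up rates, for illustration only (not data from the study).
d_prime, criterion = dprime_and_criterion(0.5, 0.1)
```

Under these assumptions a positive c means listeners respond "question" less often than an unbiased observer would, which corresponds to the kind of bias toward statements that the abstract reports.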
14.
It is known that information contained within the filter skirts can provide cues important to speech intelligibility. However, the role of filter slope during temporal smoothing has received little attention. In experiment 1, smoothing filter slope angle was found to have a large effect on the intelligibility of sentences represented by three amplitude-modulated sinusoids. In experiment 2, the use of temporal cues above 16 Hz was examined across various regions of the spectrum. When increases in rate were presented to individual spectral bands, intelligibility only increased when presented in the higher spectral region. This result suggests a greater reliance on higher-rate cues in this region. However, intelligibility was greatest when these cues were distributed across the spectrum, indicating that their effective use is not restricted solely to this region.
15.
Linguistic modality effects on fundamental frequency in speech (cited by: 2; self-citations: 0; citations by others: 2)
This paper examines the effects on fundamental frequency (F0) patterns of modality operators, such as sentential adverbs, modals, negatives, and quantifiers. These words form inherently contrastive classes which have varying tendencies to produce emphasis deviations in F0 contours. Three speakers read a set of 186 sentences and three paragraphs to provide data for F0 analysis. The important words in each sentence were marked intonationally with rises or sharp falls in F0, compared to gradually falling F0 in unemphasized words. These emphasis deviations were measured in terms of F0 variations from the norm; they were larger toward the beginning of sentences, in longer sentences, on syllables surrounded by unemphasized syllables, and in contrastive contexts. Other results showed that embedded clauses tended to have lower F0, and negative contractions were emphasized on their first syllables. Individual speakers differed in overall F0 levels, while using roughly similar emphasis strategies. F0 levels changed in paragraphs, with emphasis going to contextually new information.
16.
I. Introduction. The F0 patterns of speech are important not only for the prosodic features but also for voice source characteristics. Now more and more speech scientists recognized that voice excitation source in text-to-speech systems plays an important role in both intelligibility and naturalness of synthetic speech. Especially, for Chinese, a tone language with multi-tone system, the tonal patterns which are mainly demonstrated in the F0 contours carry lexical meaning. Some comparative studies of the F0 patterns in between tone language (Chinese) and stress language (En…
17.
Cochlear implants allow most patients with profound deafness to successfully communicate under optimal listening conditions. However, the amplitude modulation (AM) information provided by most implants is not sufficient for speech recognition in realistic settings where noise is typically present. This study added slowly varying frequency modulation (FM) to the existing algorithm of an implant simulation and used competing sentences to evaluate FM contributions to speech recognition in noise. Potential FM advantage was evaluated as a function of the number of spectral bands, FM depth, FM rate, and FM band distribution. Barring floor and ceiling effects, significant improvement was observed for all bands from 1 to 32 with the additional FM cue both in quiet and noise. Performance also improved with greater FM depth and rate, which might reflect resolved sidebands under the FM condition. Having FM present in low-frequency bands was more beneficial than in high-frequency bands, and only half of the bands required the presence of FM, regardless of position, to achieve performance similar to when all bands had the FM cue. These results provide insight into the relative contributions of AM and FM to speech communication and the potential advantage of incorporating FM for cochlear implant signal processing.
18.
Yuan J. The Journal of the Acoustical Society of America, 2011, 130(6): 4063-4069
There is a tendency across languages to use a rising pitch contour to convey question intonation and a falling pitch contour to convey a statement. In a lexical tone language such as Mandarin Chinese, rising and falling pitch contours are also used to differentiate lexical meaning. How, then, does the multiplexing of the F(0) channel affect the perception of question and statement intonation in a lexical tone language? This study investigated the effects of lexical tones and focus on the perception of intonation in Mandarin Chinese. The results show that lexical tones and focus impact the perception of sentence intonation. Question intonation was easier for native speakers to identify on a sentence with a final falling tone and more difficult to identify on a sentence with a final rising tone, suggesting that tone identification intervenes in the mapping of F(0) contours to intonational categories and that tone and intonation interact at the phonological level. In contrast, there is no evidence that the interaction between focus and intonation goes beyond the psychoacoustic level. The results provide insights that will be useful for further research on tone and intonation interactions in both acoustic modeling studies and neurobiological studies.
19.
In this study we have simultaneously measured subglottic air pressure, airflow, and vocal intensity during speech in nine healthy subjects. Subglottic air pressure was measured directly by puncture of the cricothyroid membrane. The results show that the interaction between these aerodynamic properties is much more complex than previously believed. Certain trends were seen in most individuals, such as an increase in vocal intensity with increased subglottic air pressure. However, there was considerable variability in the overall aerodynamic properties between subjects and at different frequency and intensity ranges. At certain frequencies several subjects were able to generate significantly louder voices without a comparable increase in subglottic air pressure. We hypothesize that these increases in vocal efficiency are due to changes in vocal fold vibration properties. The relationship between fundamental frequency and subglottic pressure was also noted to vary depending on vocal intensity. Possible mechanisms for these behaviors are discussed.
20.
In human speech, declination of the fundamental frequency (F0) of the voice spans coherent units of an utterance and, therefore, signals where units begin and end. A rapid final fall at the end of an utterance provides a further indication of an utterance's ending. The occurrence of declination is sufficiently widespread across languages that several investigators have suggested it as a language universal. Language universals may be universal because they are part of a species-specific specialization for language or, alternatively, they may constitute conventionalizations of natural dispositions of the vocal tract that may serve a communicative function. Evidence is offered favoring the latter account for declination and the final fall by showing that vocal productions of vervet monkeys (Cercopithecus aethiops) and rhesus macaques (Macaca mulatta) show declination, and vervets show clear evidence of a final fall. Interestingly, the fall in F0 may serve some communicative role in the vocal exchanges of vervets and rhesus, analogous to its signalling function in human language.