首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 0 毫秒
1.
Four experiments investigated the effect of the fundamental frequency (F0) contour on speech intelligibility against interfering sounds. Speech reception thresholds (SRTs) were measured for sentences with different manipulations of their F0 contours. These manipulations involved either reductions in F0 variation, or complete inversion of the F0 contour. Against speech-shaped noise, a flattened F0 contour had no significant impact on SRTs compared to a normal F0 contour; the mean SRT for the flattened contour was only 0.4 dB higher. The mean SRT for the inverted contour, however, was 1.3 dB higher than for the normal F0 contour. When the sentences were played against a single-talker interferer, the overall effect was greater, with a 2.0 dB difference between normal and flattened conditions, and 3.8 dB between normal and inverted. There was no effect of altering the F0 contour of the interferer, indicating that any abnormality of the F0 contour serves to reduce intelligibility of the target speech, but does not alter the masking produced by interfering speech. Low-pass filtering the F0 contour increased SRTs; elimination of frequencies between 2 and 4 Hz had the greatest effect. Filtering sentences with inverted contours did not have a significant effect on SRTs.  相似文献   

2.
Linguistic modality effects on fundamental frequency in speech   总被引:2,自引:0,他引:2  
This paper examines the effects on fundamental frequency (F0) patterns of modality operators, such as sentential adverbs, modals, negatives, and quantifiers. These words form inherently contrastive classes which have varying tendencies to produce emphasis deviations in F0 contours. Three speakers read a set of 186 sentences and three paragraphs to provide data for F0 analysis. The important words in each sentence were marked intonationally with rises or sharp falls in F0, compared to gradually falling F0 in unemphasized words. These emphasis deviations were measured in terms of F0 variations from the norm; they were larger toward the beginning of sentences, in longer sentences, on syllables surrounded by unemphasized syllables, and in contrastive contexts. Other results showed that embedded clauses tended to have lower F0, and negative contractions were emphasized on their first syllables. Individual speakers differed in overall F0 levels, while using roughly similar emphasis strategies. F0 levels changed in paragraphs, with emphasis going to contextually new information.  相似文献   

3.
A series of experiments was carried out to investigate how fundamental frequency declination is perceived by speakers of English. Using linear predictor coded speech, nonsense sentences were constructed in which fundamental frequency on the last stressed syllable had been systematically varied. Listeners were asked to judge which stressed syllable was higher in pitch. Their judgments were found to reflect normalization for expected declination; in general, when two stressed syllables sounded equal in pitch, the second was actually lower. The pattern of normalization reflected certain major features of production patterns: A greater correction for declination was made for wide pitch range stimuli than for narrow pitch range stimuli. The slope of expected declination was less for longer stimuli than for shorter ones. Lastly, amplitude was found to have a significant effect on judgments, suggesting that the amplitude downdrift which normally accompanies fundamental frequency declination may have an important role in the perception of phrasing.  相似文献   

4.
Several experiments have found that changing the intrinsic f0 of a vowel can have an effect on perceived vowel quality. It has been suggested that these shifts may occur because f0 is involved in the specification of vowel quality in the same way as the formant frequencies. Another possibility is that f0 affects vowel quality indirectly, by changing a listener's assumptions about characteristics of a speaker who is likely to have uttered the vowel. In the experiment outlined here, participants were asked to listen to vowels differing in terms of f0 and their formant frequencies and report vowel quality and the apparent speaker's gender and size on a trial-by-trial basis. The results presented here suggest that f0 affects vowel quality mainly indirectly via its effects on the apparent-speaker characteristics; however, f0 may also have some residual direct effects on vowel quality. Furthermore, the formant frequencies were also found to have significant indirect effects on vowel quality by way of their strong influence on the apparent speaker.  相似文献   

5.
This study compared how normal-hearing listeners (NH) and listeners with moderate to moderately severe cochlear hearing loss (HI) use and combine information within and across frequency regions in the perceptual separation of competing vowels with fundamental frequency differences (deltaF0) ranging from 0 to 9 semitones. Following the procedure of Culling and Darwin [J. Acoust. Soc. Am. 93, 3454-3467 (1993)], eight NH listeners and eight HI listeners identified competing vowels with either a consistent or inconsistent harmonic structure. Vowels were amplified to assure audibility for HI listeners. The contribution of frequency region depended on the value of deltaF0 between the competing vowels. When deltaF0 was small, both groups of listeners effectively utilized deltaF0 cues in the low-frequency region. In contrast, HI listeners derived significantly less benefit than NH listeners from deltaF0 cues conveyed by the high-frequency region at small deltaF0's. At larger deltaF0's, both groups combined deltaF0 cues from the low and high formant-frequency regions. Cochlear impairment appears to negatively impact the ability to use F0 cues for within-formant grouping in the high-frequency region. However, cochlear loss does not appear to disrupt the ability to use within-formant F0 cues in the low-frequency region or to group F0 cues across formant regions.  相似文献   

6.
YIN,a fundamental frequency estimator for speech and music   总被引:22,自引:0,他引:22  
An algorithm is presented for the estimation of the fundamental frequency (F0) of speech or musical sounds. It is based on the well-known autocorrelation method with a number of modifications that combine to prevent errors. The algorithm has several desirable features. Error rates are about three times lower than the best competing methods, as evaluated over a database of speech recorded together with a laryngograph signal. There is no upper limit on the frequency search range, so the algorithm is suited for high-pitched voices and music. The algorithm is relatively simple and may be implemented efficiently and with low latency, and it involves few parameters that must be tuned. It is based on a signal model (periodic signal) that may be extended in several ways to handle various forms of aperiodicity that occur in particular applications. Finally, interesting parallels may be drawn with models of auditory processing.  相似文献   

7.
This work proposes a method to reconstruct an acoustic speech signal solely from a stream of mel-frequency cepstral coefficients (MFCCs) as may be encountered in a distributed speech recognition (DSR) system. Previous methods for speech reconstruction have required, in addition to the MFCC vectors, fundamental frequency and voicing components. In this work the voicing classification and fundamental frequency are predicted from the MFCC vectors themselves using two maximum a posteriori (MAP) methods. The first method enables fundamental frequency prediction by modeling the joint density of MFCCs and fundamental frequency using a single Gaussian mixture model (GMM). The second scheme uses a set of hidden Markov models (HMMs) to link together a set of state-dependent GMMs, which enables a more localized modeling of the joint density of MFCCs and fundamental frequency. Experimental results on speaker-independent male and female speech show that accurate voicing classification and fundamental frequency prediction is attained when compared to hand-corrected reference fundamental frequency measurements. The use of the predicted fundamental frequency and voicing for speech reconstruction is shown to give very similar speech quality to that obtained using the reference fundamental frequency and voicing.  相似文献   

8.
A novel method based on a statistical model for the fundamental-frequency (F0) synthesis in Mandarin text-to-speech is proposed. Specifically, a statistical model is employed to determine the relationship between F0 contour patterns of syllables and linguistic features representing the context. Parameters of the model were empirically estimated from a large training set of sentential utterances. Phonologic rules are then automatically deduced through the training process and implicitly memorized in the model. In the synthesis process, contextual features are extracted from a given input text, and the best estimates of F0 contour patterns of syllable are then found by a Viterbi algorithm using the well-trained model. This method can be regarded as employing a stochastic grammar to reduce the number of candidates of F0 contour pattern at each decision point of synthesis. Although linguistic features on various levels of input text can be incorporated into the model, only some relevant contextual features extracted from neighboring syllables were used in this study. Performance of this method was examined by simulation using a database composed of nine repetitions of 112 declarative sentential utterances of the same text, all spoken by a single speaker. By closely examining the well-trained model, some evidence was found to show that the declination effect as well as several sandhi rules are implicitly contained in the model. Experimental results show that 77.56% of synthesized F0 contours coincide with the VQ-quantized counterpart of the original natural speech. Naturalness of the synthesized speech was confirmed by an informal listening test.  相似文献   

9.
The use of across-frequency timing cues and the effect of disrupting these cues were examined across the frequency spectrum by introducing between-band asynchronies to pairs of narrowband temporal speech patterns. Sentence intelligibility by normal-hearing listeners fell when as little as 12.5 ms of asynchrony was introduced and was reduced to floor values by 100 ms. Disruptions to across-frequency timing had similar effects in the low-, mid-, and high-frequency regions, but band pairs having wider frequency separation were less disrupted by small amounts of asynchrony. In experiment 2, it was found that the disruptive influence of asynchrony on adjacent band pairs did not result from disruptions to the complex patterns present in overlapping excitation. The results of experiment 3 suggest that the processing of speech patterns may take place using mechanisms having different sensitivities to exact timing, similar to the dual mechanisms proposed for within- and across-channel gap detection. Preservation of relative timing can be critical to intelligibility. While the use of across-frequency timing cues appears similar across the spectrum, it may differ based on frequency separation. This difference appears to involve a greater reliance on exact timing during the processing of speech energy at proximate frequencies.  相似文献   

10.
言语知觉是心理学的一个领域,其发展与语音学、语音工程和人工智能等许多学科有关.本文简要介绍了在认知心理学和其它相关学科的推动下,言语知觉发展的主线、当前的主要问题和研究现状。  相似文献   

11.
A method of measuring the rate of change of fundamental frequency has been developed in an effort to find acoustic voice parameters that could be useful in psychiatric research. A minicomputer program was used to extract seven parameters from the fundamental frequency contour of tape-recorded speech samples: (1) the average rate of change of the fundamental frequency and (2) its standard deviation, (3) the absolute rate of fundamental frequency change, (4) the total reading time, (5) the percent pause time of the total reading time, (6) the mean, and (7) the standard deviation of the fundamental frequency distribution. The method is demonstrated on (a) a material consisting of synthetic speech and (b) voice recordings of depressed patients who were examined during depression and after improvement.  相似文献   

12.
The ability of listeners to identify pairs of simultaneous synthetic vowels has been investigated in the first of a series of studies on the extraction of phonetic information from multiple-talker waveforms. Both members of the vowel pair had the same onset and offset times and a constant fundamental frequency of 100 Hz. Listeners identified both vowels with an accuracy significantly greater than chance. The pattern of correct responses and confusions was similar for vowels generated by (a) cascade formant synthesis and (b) additive harmonic synthesis that replaced each of the lowest three formants with a single pair of harmonics of equal amplitude. In order to choose an appropriate model for describing listeners' performance, four pattern-matching procedures were evaluated. Each predicted the probability that (i) any individual vowel would be selected as one of the two responses, and (ii) any pair of vowels would be selected. These probabilities were estimated from measures of the similarities of the auditory excitation patterns of the double vowels to those of single-vowel reference patterns. Up to 88% of the variance in individual responses and up to 67% of the variance in pairwise responses could be accounted for by procedures that highlighted spectral peaks and shoulders in the excitation pattern. Procedures that assigned uniform weight to all regions of the excitation pattern gave poorer predictions. These findings support the hypothesis that the auditory system pays particular attention to the frequencies of spectral peaks, and possibly also of shoulders, when identifying vowels. One virtue of this strategy is that the spectral peaks and shoulders can indicate the frequencies of formants when other aspects of spectral shape are obscured by competing sounds.  相似文献   

13.
This paper presents a systematic comparison of various measures of f0 range in female speakers of English and German. F0 range was analyzed along two dimensions, level (i.e., overall f0 height) and span (extent of f0 modulation within a given speech sample). These were examined using two types of measures, one based on "long-term distributional" (LTD) methods, and the other based on specific landmarks in speech that are linguistic in nature ("linguistic" measures). The various methods were used to identify whether and on what basis or bases speakers of these two languages differ in f0 range. Findings yielded significant cross-language differences in both dimensions of f0 range, but effect sizes were found to be larger for span than for level, and for linguistic than for LTD measures. The linguistic measures also uncovered some differences between the two languages in how f0 range varies through an intonation contour. This helps shed light on the relation between intonational structure and f0 range.  相似文献   

14.
15.
As the human ear is dull to the phase in speech, little attention has been paid to phase information in speech coding. In fact, the speech perceptual quality may be degenerated if the phase distortion is very large. The perceptual effect of the STFT (Short time Fourier transform) phase spectrum is studied by auditory subjective hearing tests. Three main conclusions are (1) If the phase information is neglected completely, the subjective quality of the reconstructed speech may be very poor; (2) Whether the neglected phase is in low frequency band or high frequency band, the difference from the original speech can be perceived by ear;(3) It is very difficult for the human ear to perceive the difference of speech quality between original speech and reconstructed speech while the phase quantization step size is shorter than π/7.  相似文献   

16.
Frequency discrimination of spectral envelopes of complex stimuli, frequency selectivity measured with psychophysical tuning curves, and speech perception were determined in hearing-impaired subjects each having a relatively flat, sensory-neural loss. Both the frequency discrimination and speech perception measures were obtained in quiet and noise. Most of these subjects showed abnormal susceptibility to ambient noise with regard to speech perception. Frequency discrimination in quiet and frequency selectivity did not correlate significantly. At low signal-to-noise ratios, frequency discrimination correlated significantly with frequency selectivity. Speech perception in noise correlated significantly with frequency selectivity and with frequency discrimination at low signal-to-noise ratios. The frequency discrimination data are discussed in terms of an excitation-pattern model. However, they neither support nor refute the model.  相似文献   

17.
In this study we have simultaneously measured subglottic air pressure, airflow, and vocal intensity during speech in nine healthy subjects. Subglottic air pressure was measured directly by puncture of the cricothyroid membrane. The results show that the interaction between these aerodynamic properties is much more complex that previously believed. Certain trends were seen in most individuals, such as an increase in vocal intensity with increased subglottic air pressure. However, there was considerable variability in the overall aerodynamic properties between subjects and at different frequency and intensity ranges. At certain frequencies several subjects were able to generate significantly louder voices without a comparable increase in subglottic air pressure. We hypothesize that these increases in vocal efficiency are due to changes in vocal fold vibration properties. The relationship between fundamental frequency and subglottic pressure was also noted to vary depending on vocal intensity. Possible mechanisms for these behaviors are discussed.  相似文献   

18.
Cochlear implant (CI) users have been shown to benefit from residual low-frequency hearing, specifically in pitch related tasks. It remains unclear whether this benefit is dependent on fundamental frequency (F0) or other acoustic cues. Three experiments were conducted to determine the role of F0, as well as its frequency modulated (FM) and amplitude modulated (AM) components, in speech recognition with a competing voice. In simulated CI listeners, the signal-to-noise ratio was varied to estimate the 50% correct response. Simulation results showed that the F0 cue contributes to a significant proportion of the benefit seen with combined acoustic and electric hearing, and additionally that this benefit is due to the FM rather than the AM component. In actual CI users, sentence recognition scores were collected with either the full F0 cue containing both the FM and AM components or the 500-Hz low-pass speech cue containing the F0 and additional harmonics. The F0 cue provided a benefit similar to the low-pass cue for speech in noise, but not in quiet. Poorer CI users benefited more from the F0 cue than better users. These findings suggest that F0 is critical to improving speech perception in noise in combined acoustic and electric hearing.  相似文献   

19.
Four experiments were performed to evaluate a new wearable vibrotactile speech perception aid that extracts fundamental frequency (F0) and displays the extracted F0 as a single-channel temporal or an eight-channel spatio-temporal stimulus. Specifically, we investigated the perception of intonation (i.e., question versus statement) and emphatic stress (i.e., stress on the first, second, or third word) under Visual-Alone (VA), Visual-Tactile (VT), and Tactile-Alone (TA) conditions and compared performance using the temporal and spatio-temporal vibrotactile display. Subjects were adults with normal hearing in experiments I-III and adults with severe to profound hearing impairments in experiment IV. Both versions of the vibrotactile speech perception aid successfully conveyed intonation. Vibrotactile stress information was successfully conveyed, but vibrotactile stress information did not enhance performance in VT conditions beyond performance in VA conditions. In experiment III, which involved only intonation identification, a reliable advantage for the spatio-temporal display was obtained. Differences between subject groups were obtained for intonation identification, with more accurate VT performance by those with normal hearing. Possible effects of long-term hearing status are discussed.  相似文献   

20.
The dependency of the timbre of musical sounds on their fundamental frequency (F0) was examined in three experiments. In experiment I subjects compared the timbres of stimuli produced by a set of 12 musical instruments with equal F0, duration, and loudness. There were three sessions, each at a different F0. In experiment II the same stimuli were rearranged in pairs, each with the same difference in F0, and subjects had to ignore the constant difference in pitch. In experiment III, instruments were paired both with and without an F0 difference within the same session, and subjects had to ignore the variable differences in pitch. Experiment I yielded dissimilarity matrices that were similar at different F0's, suggesting that instruments kept their relative positions within timbre space. Experiment II found that subjects were able to ignore the salient pitch difference while rating timbre dissimilarity. Dissimilarity matrices were symmetrical, suggesting further that the absolute displacement of the set of instruments within timbre space was small. Experiment III extended this result to the case where the pitch difference varied from trial to trial. Multidimensional scaling (MDS) of dissimilarity scores produced solutions (timbre spaces) that varied little across conditions and experiments. MDS solutions were used to test the validity of signal-based predictors of timbre, and in particular their stability as a function of F0. Taken together, the results suggest that timbre differences are perceived independently from differences of pitch, at least for F0 differences smaller than an octave. Timbre differences can be measured between stimuli with different F0's.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号