20 similar documents found (search time: 0 ms)
1.
2.
J. P. Gupta, S. S. Agrawal, R. Ahmed, The Journal of the Acoustical Society of America 49(2), Suppl. 2, 567-568 (1971)
3.
In stuttered repetitions of a syllable, the vowel that occurs often sounds like schwa even when schwa is not intended. In this article, acoustic analyses are reported which show that the spectral properties of stuttered vowels are similar to those of the following fluent vowel, so it would appear that the stutterers are articulating the vowel appropriately. Though the spectral properties of the stuttered vowels are normal, other properties are unusual: the stuttered vowels are low in amplitude and short in duration. In two experiments, the effects of amplitude and duration on perception of these vowels are examined. It is shown that, if the amplitude of stuttered vowels is made normal and their duration is lengthened, they sound more like the intended vowels. These experiments lead to the conclusion that low amplitude and short duration are the factors that cause stuttered vowels to sound like schwa. This differs from the view of certain clinicians and theorists who contend that stutterers actually articulate /ə/'s when these are heard in stuttered speech. Implications for stuttering therapy are considered.
4.
Modeling the perception of concurrent vowels: vowels with different fundamental frequencies
If two vowels with different fundamental frequencies (f0's) are presented simultaneously and monaurally, listeners often hear two talkers producing different vowels on different pitches. This paper describes the evaluation of four computational models of the auditory and perceptual processes which may underlie this ability. Each model involves four stages: (i) frequency analysis using an "auditory" filter bank, (ii) determination of the pitches present in the stimulus, (iii) segregation of the competing speech sources by grouping energy associated with each pitch to create two derived spectral patterns, and (iv) classification of the derived spectral patterns to predict the probabilities of listeners' vowel-identification responses. The "place" models carry out the operations of pitch determination and spectral segregation by analyzing the distribution of rms levels across the channels of the filter bank. The "place-time" models carry out these operations by analyzing the periodicities in the waveforms in each channel. In their "linear" versions, the place and place-time models operate directly on the waveforms emerging from the filters. In their "nonlinear" versions, analogous operations are applied to the output of an additional stage which applies a compressive nonlinearity to the filtered waveforms. Compared to the other three models, the nonlinear place-time model provides the most accurate estimates of the f0's of pairs of concurrent synthetic vowels and comes closest to predicting the identification responses of listeners to such stimuli. Although the model has several limitations, the results are compatible with the idea that a place-time analysis is used to segregate competing sound sources.
5.
M. V. Kondaurova, T. R. Bergeson, L. C. Dilley, The Journal of the Acoustical Society of America 132(2), 1039-1049 (2012)
Recent studies have demonstrated that mothers exaggerate phonetic properties of infant-directed (ID) speech. However, these studies focused on a single acoustic dimension (frequency), whereas speech sounds are composed of multiple acoustic cues. Moreover, little is known about how mothers adjust phonetic properties of speech to children with hearing loss. This study examined mothers' production of frequency and duration cues to the American English tense/lax vowel contrast in speech to profoundly deaf (N = 14) and normal-hearing (N = 14) infants, and to an adult experimenter. First and second formant frequencies and vowel duration of tense (/i/, /u/) and lax (/I/, /U/) vowels were measured. Results demonstrated that for both infant groups mothers hyperarticulated the acoustic vowel space and increased vowel duration in ID speech relative to adult-directed speech. Mean F2 values were decreased for the /u/ vowel and increased for the /I/ vowel, and vowel duration was longer for the /i/, /u/, and /I/ vowels in ID speech. However, neither acoustic cue differed in speech to hearing-impaired or normal-hearing infants. These results suggest that both formant frequencies and vowel duration that differentiate American English tense/lax vowel contrasts are modified in ID speech regardless of the hearing status of the addressee.
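The vowel-space hyperarticulation reported in this abstract is commonly quantified as the area of the polygon spanned by the mean (F1, F2) values of the corner vowels. A minimal sketch of that measure using the shoelace formula; the function name and the formant values below are illustrative assumptions, not data from the study:

```python
def vowel_space_area(corner_vowels):
    """Shoelace area of the polygon spanned by mean (F1, F2) points, in Hz^2.

    corner_vowels: (F1, F2) means listed in order around the polygon.
    """
    n = len(corner_vowels)
    area = 0.0
    for i in range(n):
        f1_a, f2_a = corner_vowels[i]
        f1_b, f2_b = corner_vowels[(i + 1) % n]
        area += f1_a * f2_b - f1_b * f2_a
    return abs(area) / 2.0

# Made-up adult-directed vs. infant-directed corner vowels /i/, /a/, /u/ in Hz
ad = [(300, 2300), (700, 1200), (320, 800)]
id_ = [(270, 2500), (780, 1150), (290, 700)]  # expanded ("hyperarticulated") space
print(vowel_space_area(id_) > vowel_space_area(ad))  # prints True
```

A larger area over the same corner vowels is what "hyperarticulated acoustic vowel space" operationalizes.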
6.
H. Kuwabara, The Journal of the Acoustical Society of America 77(2), 686-694 (1985)
A method is proposed to reduce the ambiguity of vowels in connected speech by normalizing the coarticulation effects. The method is applied to vowels in phonetic environments where great ambiguity would be likely to occur, taking as their features the first and second formant trajectories. The separability between vowel clusters is found to be greatly improved for the vowel samples. In addition, the distribution of the vowels on a feature plane characterized by this method seems to reflect their perceptual nature when presented to listeners without isolation from their phonetic environments. The results suggest that the method proposed here is useful for automatic speech recognition and helps to infer some possible mechanisms underlying dynamic aspects of human speech recognition.
7.
The vowel in part-word repetitions in stuttered speech often sounds neutralized. In the present article, measurements of the excitatory source made during such episodes of dysfluency are reported. These measurements show that, compared with fluent utterances, the glottal volume velocities are lower in amplitude and shorter in duration and that the energy occurs more towards the low-frequency end of the spectrum. In a first perceptual experiment, the effects of varying the amplitude and duration of the glottal source were assessed. The glottal volume velocity recordings of the /ae/ vowels used in the analyses were employed as driving sources for an articulatory synthesizer so that judgments about the vowel quality could be made. With dysfluent glottal sources (either as spoken or by editing a fluent source so that it was low in amplitude and brief), the vowels sounded more neutralized than with fluent glottal sources (as spoken or by editing a dysfluent source to increase its amplitude and lengthen it). In a second perceptual experiment, synthetic glottal volume velocities were used to verify these findings and to assess the influence of the low-frequency emphasis in the dysfluent speech. This experiment showed that spectral bias and duration both cause stuttered vowels to sound neutralized.
8.
The syllable repetitions of 24 child and eight teenage stutterers were investigated to assess whether the vowels neutralize and, if so, what causes this. In both groups of speakers, the vowel in CV syllable repetitions and the following fluent vowel were excised from conversational speech samples. Acoustic analyses showed the formant frequencies of vowels in syllable repetitions to be appropriate for the intended vowel and the duration of the dysfluent vowels to be shorter than those of the fluent vowels for both groups of speakers. The intensity of the fluent vowels was greater than that of the dysfluent vowels for the teenagers but not the children. For both age groups, excitation waveforms obtained by inverse filtering showed that the excitation spectra associated with dysfluent vowels fell off more rapidly with frequency than did those associated with the fluent vowels. The fundamental frequency of the children's dysfluent speech was higher than that of their fluent speech, while there was no difference in the teenagers' speech. The relationship between the intensities of the glottal volume velocities was the same as that of the speech waveforms. Perceptual tests were also conducted to assess whether duration and the differences found in the source excitation would make children's vowels sound neutral. The experiments show that in children neither vowel duration nor fundamental frequency differences cause the vowels to be perceived as neutral. The results suggest that the low intensity and characteristics of the source of excitation which cause vowels to sound neutral may only occur in late childhood. Furthermore, monitoring stuttered speech for the emergence of neutral vowels may be a way of indexing the progress of the disorder.
9.
10.
11.
A. J. Ferreira, The Journal of the Acoustical Society of America 122(4), 2389-2404 (2007)
This paper addresses the problem of automatic identification of vowels uttered in isolation by female and child speakers. In this case, the magnitude spectrum of voiced vowels is sparsely sampled since only frequencies at integer multiples of F0 are significant. This impacts negatively on the performance of vowel identification techniques that either ignore pitch or rely on global shape models. A new pitch-dependent approach to vowel identification is proposed that emerges from the concept of timbre and that defines perceptual spectral clusters (PSC) of harmonic partials. A representative set of static PSC-related features are estimated and their performance is evaluated in automatic classification tests using the Mahalanobis distance. Linear prediction features and Mel-frequency cepstral coefficients (MFCCs) are used as a reference, and a database of five (Portuguese) natural vowel sounds uttered by 44 speakers (including 27 child speakers) is used for training and testing the Gaussian models. Results indicate that PSC features perform better than plain linear prediction features, but perform slightly worse than MFCC features. However, PSC features have the potential to take full advantage of the pitch structure of voiced vowels, namely in the analysis of concurrent voices, or by using pitch as a normalization parameter.
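The classification stage described here, assigning a token to the nearest per-vowel Gaussian model by Mahalanobis distance, can be sketched in a minimal two-feature form. The vowel means and covariances below are invented for illustration; they are not the paper's Portuguese-vowel models:

```python
def mahalanobis2(x, mean, cov):
    """Squared 2-D Mahalanobis distance, inverting the 2x2 covariance by hand."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    return (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
            + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))

# Hypothetical per-vowel Gaussian models over (F1, F2)-like features in Hz
models = {
    "/a/": ([700.0, 1200.0], [[90.0**2, 0.0], [0.0, 150.0**2]]),
    "/i/": ([300.0, 2300.0], [[60.0**2, 0.0], [0.0, 200.0**2]]),
    "/u/": ([320.0, 800.0],  [[60.0**2, 0.0], [0.0, 120.0**2]]),
}

def classify(feature):
    """Label a feature vector with the vowel whose model is nearest."""
    return min(models, key=lambda v: mahalanobis2(feature, *models[v]))

print(classify([310.0, 2250.0]))  # prints /i/
```

Unlike plain Euclidean distance, the Mahalanobis metric discounts dimensions along which a vowel class naturally varies more.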
12.
Human listeners are better able to identify two simultaneous vowels if the fundamental frequencies of the vowels are different. A computational model is presented which, for the first time, is able to simulate this phenomenon at least qualitatively. The first stage of the model is based upon a bank of bandpass filters and inner hair-cell simulators that simulate approximately the most relevant characteristics of the human auditory periphery. The output of each filter/hair-cell channel is then autocorrelated to extract pitch and timbre information. The pooled autocorrelation function (ACF) based on all channels is used to derive a pitch estimate for one of the component vowels from a signal composed of two vowels. Individual channel ACFs showing a pitch peak at this value are combined and used to identify the first vowel using a template matching procedure. The ACFs in the remaining channels are then combined and used to identify the second vowel. Model recognition performance shows a rapid improvement in correct vowel identification as the difference between the fundamental frequencies of two simultaneous vowels increases from zero to one semitone in a manner closely resembling human performance. As this difference increases up to four semitones, performance improves further only slowly, if at all.
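The pitch-extraction stage of such models rests on a basic property of the autocorrelation function: a periodic signal's ACF peaks at lags equal to multiples of its period. A bare-bones sketch of ACF-based f0 estimation on a harmonic-rich synthetic signal (the search range, sampling rate, and test signal are illustrative choices, not the model's parameters):

```python
import math

def autocorrelation_pitch(signal, fs, f_min=60.0, f_max=400.0):
    """Estimate f0 as the lag of the largest ACF value in the search range."""
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    n = len(signal)
    best_lag, best_val = lag_min, float("-inf")
    for lag in range(lag_min, min(lag_max, n - 1) + 1):
        acf = sum(signal[i] * signal[i + lag] for i in range(n - lag))
        if acf > best_val:
            best_val, best_lag = acf, lag
    return fs / best_lag

fs = 8000
# Pulse-like periodic signal at 100 Hz: five harmonics with 1/h amplitudes,
# loosely mimicking the harmonic-rich excitation of a voiced vowel
sig = [sum(math.sin(2 * math.pi * 100 * h * t / fs) / h for h in range(1, 6))
       for t in range(2048)]
print(autocorrelation_pitch(sig, fs))  # prints 100.0
```

In the full model this analysis runs per filterbank channel, and the per-channel ACFs are pooled before the peak is picked.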
13.
14.
15.
Acoustic recognition of voice disorders: a comparative study of running speech versus sustained vowels
F. Klingholtz, The Journal of the Acoustical Society of America 87(5), 2218-2224 (1990)
The signals of running speech and sustained vowels of normals and subjects suffering from dysphonia were analyzed statistically with respect to the signal-to-noise ratio (SNR). The distribution of the SNR measured in multiple overlapping frames in the speech signal was described by a linear combination of the distribution frequencies for SNR = 0 dB, 0 dB less than SNR less than 15 dB, and SNR greater than or equal to 15 dB. The values of the linear combination, the SNR of the vowels, and clinical assignment of the voices to normal and pathologic populations based on laryngoscopic and stroboscopic investigation parameters were used to compare the different evaluations of the voices. The SNR distribution in speech remained stable over signal lengths of more than 30 s. The correlation coefficient between the SNR measure for running speech and the SNR of sustained vowels amounted to only 0.63. The error rate in the discrimination between normal and dysphonic voices amounted to 22.6% in application to sustained vowels and 5.6% when the SNR distribution was used. Possible reasons for the observed discrepancies are discussed, and the results are compared to those of other studies.
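The three-bin SNR distribution described here (fractions of frames at SNR ≈ 0 dB, between 0 and 15 dB, and at or above 15 dB) can be sketched as follows. The framing parameters and the tolerance used to count a frame as "SNR = 0 dB" are illustrative assumptions, not the paper's settings:

```python
def frame_signal(x, size, hop):
    """Split a sample sequence into overlapping frames of `size` samples."""
    return [x[i:i + size] for i in range(0, len(x) - size + 1, hop)]

def snr_distribution(frame_snrs_db, high_db=15.0, zero_tol=0.5):
    """Fractions of frames with SNR ~ 0 dB, 0 < SNR < high_db, SNR >= high_db."""
    n = len(frame_snrs_db)
    low = sum(1 for s in frame_snrs_db if s <= zero_tol) / n
    high = sum(1 for s in frame_snrs_db if s >= high_db) / n
    return low, 1.0 - low - high, high

# Toy per-frame SNRs (dB): near-silent, moderately noisy, and clean frames
snrs = [0.0, 0.2, 7.0, 11.0, 14.0, 16.0, 20.0, 25.0]
print(snr_distribution(snrs))  # prints (0.25, 0.375, 0.375)
```

The resulting triple is the kind of compact voice descriptor that the paper's linear combination operates on.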
16.
17.
Reticent speakers differ from nonreticent speakers in vocal characteristics, such as fundamental frequency, frequency range, fluency, and intensity, which prompt negative impressions on the part of listeners. Waveform and spectrographic analyses were performed on the vocal cues of 19 reticent and nonreticent subjects (57 speech samples). Statistically significant differences were found in fluency between reticent and nonreticent speech. Reticent male speakers also showed significantly higher F0, whereas reticent female speakers demonstrated narrower frequency range. Identification and analysis of these characteristics are required for effective remediation.
18.
A method for the combined use of multiple types of feature information in speech recognition
In feature-based speech recognition research, one often finds that some features yield better recognition performance on certain sounds while other features show the opposite pattern: to some degree, the features are complementary in their recognition characteristics across sounds. Building on the HMM method, one of the principal approaches in current speech recognition research, this paper proposes three effective ways of exploiting this complementarity to improve HMM recognition performance, which the authors call the top-parameter method, the all-parameter method, and the most-reliable-parameter method. In multi-speaker DHMM/VQ recognition of Mandarin digits, the three methods raised the recognition rate from 89% to 92.3%, 95.7%, and 94.3%, respectively. The paper describes the three methods in detail, together with the experimental results and their analysis for multi-speaker DHMM/VQ Mandarin digit recognition.
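The abstract does not spell out how its three combination methods work, so as a generic baseline only (not any of the paper's three methods), here is a sketch of combining complementary feature streams by summing per-class log-likelihood scores; the stream names and score values are invented:

```python
def fuse_log_scores(stream_scores, weights=None):
    """Combine per-class log-likelihoods from several feature streams.

    stream_scores: list of dicts, one per stream, mapping class -> log-likelihood.
    Returns the class with the highest (optionally weighted) summed score.
    """
    classes = stream_scores[0].keys()
    weights = weights or [1.0] * len(stream_scores)
    fused = {c: sum(w * s[c] for w, s in zip(weights, stream_scores))
             for c in classes}
    return max(fused, key=fused.get)

# The streams disagree on a digit; the stronger evidence wins after fusion
cepstral = {"yi": -12.0, "er": -10.0}  # stream 1 slightly favors "er"
formant  = {"yi": -9.0,  "er": -16.0}  # stream 2 strongly favors "yi"
print(fuse_log_scores([cepstral, formant]))  # prints yi
```

Any scheme that lets a feature set vote where it is reliable, as the paper's methods aim to do, reduces to some such weighting of per-stream evidence.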
19.
A robust feature-extraction algorithm for speech signals based on independent component analysis (ICA) is proposed to address the mismatch between training and recognition features under convolutive noise. The algorithm converts the noisy speech signal from the time domain to the frequency domain via the short-time Fourier transform, separates the short-time spectrum of the speech signal from the short-time spectrum of the noisy speech using complex-valued ICA, and then computes Mel-frequency cepstral coefficients (MFCCs) and their first-order deltas from the recovered speech spectrum as feature parameters. In Mandarin digit recognition experiments under simulated and real environments, the proposed algorithm improved recognition accuracy over conventional MFCCs by 34.8% and 32.6%, respectively. The results show that ICA-based speech features are robust under convolutive noise.
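The first-order delta features used above are conventionally computed by regression over a few neighboring frames, d_t = Σ_n n (c_{t+n} − c_{t−n}) / (2 Σ_n n²), with edge frames repeated. A sketch of that standard formula (the window width is an illustrative choice, not the paper's setting):

```python
def delta_features(frames, width=2):
    """First-order regression deltas of a per-frame feature-vector sequence."""
    denom = 2 * sum(n * n for n in range(1, width + 1))
    # Repeat the edge frames so every frame has `width` neighbors on each side
    padded = [frames[0]] * width + list(frames) + [frames[-1]] * width
    deltas = []
    for t in range(width, width + len(frames)):
        d = [0.0] * len(frames[0])
        for n in range(1, width + 1):
            for k in range(len(d)):
                d[k] += n * (padded[t + n][k] - padded[t - n][k])
        deltas.append([v / denom for v in d])
    return deltas

# On a linearly increasing coefficient track, interior deltas equal the slope
track = [[float(t)] for t in range(8)]
print(delta_features(track)[4])  # prints [1.0]
```

The deltas are simply appended to the static MFCC vector to form the final feature parameters.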
20.
The ability of listeners to identify pairs of simultaneous synthetic vowels has been investigated in the first of a series of studies on the extraction of phonetic information from multiple-talker waveforms. Both members of the vowel pair had the same onset and offset times and a constant fundamental frequency of 100 Hz. Listeners identified both vowels with an accuracy significantly greater than chance. The pattern of correct responses and confusions was similar for vowels generated by (a) cascade formant synthesis and (b) additive harmonic synthesis that replaced each of the lowest three formants with a single pair of harmonics of equal amplitude. In order to choose an appropriate model for describing listeners' performance, four pattern-matching procedures were evaluated. Each predicted the probability that (i) any individual vowel would be selected as one of the two responses, and (ii) any pair of vowels would be selected. These probabilities were estimated from measures of the similarities of the auditory excitation patterns of the double vowels to those of single-vowel reference patterns. Up to 88% of the variance in individual responses and up to 67% of the variance in pairwise responses could be accounted for by procedures that highlighted spectral peaks and shoulders in the excitation pattern. Procedures that assigned uniform weight to all regions of the excitation pattern gave poorer predictions. These findings support the hypothesis that the auditory system pays particular attention to the frequencies of spectral peaks, and possibly also of shoulders, when identifying vowels. One virtue of this strategy is that the spectral peaks and shoulders can indicate the frequencies of formants when other aspects of spectral shape are obscured by competing sounds.
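The finding that peak-emphasizing procedures predict responses best can be caricatured as a weighted template match in which local maxima of the excitation pattern receive extra weight. Everything below (the weighting scheme, the toy patterns, the templates) is an illustrative assumption, not the paper's actual procedure:

```python
def local_peak_weights(pattern, peak_weight=3.0):
    """Assign extra weight to channels that are local maxima of the pattern."""
    weights = []
    for i, v in enumerate(pattern):
        left = pattern[i - 1] if i > 0 else float("-inf")
        right = pattern[i + 1] if i < len(pattern) - 1 else float("-inf")
        weights.append(peak_weight if v >= left and v >= right else 1.0)
    return weights

def weighted_distance(pattern, template):
    """Squared distance, with mismatches at spectral peaks counted more."""
    w = local_peak_weights(pattern)
    return sum(wi * (p - t) ** 2 for wi, p, t in zip(w, pattern, template))

def best_match(pattern, templates):
    """Pick the single-vowel reference pattern nearest to the input."""
    return min(templates, key=lambda name: weighted_distance(pattern, templates[name]))

# Toy excitation patterns over six "channels"; peaks stand in for formants
templates = {
    "/a/": [1.0, 4.0, 1.0, 3.0, 1.0, 0.5],
    "/i/": [3.0, 1.0, 0.5, 1.0, 4.0, 1.0],
}
pattern = [1.2, 3.8, 0.9, 3.1, 1.1, 0.4]
print(best_match(pattern, templates))  # prints /a/
```

Uniform weighting corresponds to `peak_weight=1.0`, the variant the paper found to predict listeners' responses less well.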