Similar Documents
20 similar documents retrieved.
1.
In stuttered repetitions of a syllable, the vowel that occurs often sounds like schwa even when schwa is not intended. In this article, acoustic analyses are reported which show that the spectral properties of stuttered vowels are similar to those of the following fluent vowel, so it would appear that the stutterers are articulating the vowel appropriately. Though the spectral properties of the stuttered vowels are normal, other properties are unusual: the stuttered vowels are low in amplitude and short in duration. In two experiments, the effects of amplitude and duration on perception of these vowels are examined. It is shown that, if the amplitude of stuttered vowels is made normal and their duration is lengthened, they sound more like the intended vowels. These experiments lead to the conclusion that low amplitude and short duration are the factors that cause stuttered vowels to sound like schwa. This differs from the view of certain clinicians and theorists who contend that stutterers actually articulate schwas when these are heard in stuttered speech. Implications for stuttering therapy are considered.
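The amplitude and duration manipulations examined in these experiments can be pictured digitally. A minimal sketch, not the authors' procedure (the waveforms, sampling rate, and pitch-period value below are placeholder assumptions): scale a stuttered vowel to the RMS level of the following fluent vowel, and lengthen it crudely by tiling whole pitch periods.

    import numpy as np

    def match_rms(x, reference):
        # Scale x so its RMS amplitude equals that of the reference vowel.
        return x * np.sqrt(np.mean(reference ** 2) / np.mean(x ** 2))

    def lengthen(x, period_samples, target_len):
        # Crude lengthening: repeat one pitch period until target_len samples
        # (a stand-in for the editing used in the perceptual experiments).
        reps = int(np.ceil(target_len / period_samples))
        return np.tile(x[:period_samples], reps)[:target_len]

    fs = 16000
    stuttered = 0.01 * np.random.randn(800)    # placeholder: brief, quiet vowel
    fluent = 0.1 * np.random.randn(2400)       # placeholder: fluent vowel
    edited = lengthen(match_rms(stuttered, fluent),
                      period_samples=160,      # assumed 100-Hz pitch period
                      target_len=len(fluent))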

2.
The vowel in part-word repetitions in stuttered speech often sounds neutralized. In the present article, measurements of the excitatory source made during such episodes of dysfluency are reported. These measurements show that, compared with fluent utterances, the glottal volume velocities are lower in amplitude and shorter in duration and that the energy occurs more towards the low-frequency end of the spectrum. In a first perceptual experiment, the effects of varying the amplitude and duration of the glottal source were assessed. The glottal volume velocity recordings of the /æ/ vowels used in the analyses were employed as driving sources for an articulatory synthesizer so that judgments about the vowel quality could be made. With dysfluent glottal sources (either as spoken or by editing a fluent source so that it was low in amplitude and brief), the vowels sounded more neutralized than with fluent glottal sources (as spoken or by editing a dysfluent source to increase its amplitude and lengthen it). In a second perceptual experiment, synthetic glottal volume velocities were used to verify these findings and to assess the influence of the low-frequency emphasis in the dysfluent speech. This experiment showed that spectral bias and duration both cause stuttered vowels to sound neutralized.
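The idea of driving a synthesizer with a glottal source can be pictured with a basic source-filter model. A minimal sketch, not the articulatory synthesizer used in the study: an impulse train stands in for the recorded glottal volume velocity and excites a cascade of second-order formant resonators, with rough textbook /æ/ formant values as assumptions.

    import numpy as np
    from scipy.signal import lfilter

    def formant_resonator(x, f, bw, fs):
        # Second-order (Klatt-style) resonator at centre frequency f (Hz)
        # with bandwidth bw (Hz).
        T = 1.0 / fs
        C = -np.exp(-2.0 * np.pi * bw * T)
        B = 2.0 * np.exp(-np.pi * bw * T) * np.cos(2.0 * np.pi * f * T)
        A = 1.0 - B - C
        return lfilter([A], [1.0, -B, -C], x)

    fs = 10000
    n = int(0.3 * fs)
    # Impulse train at 100 Hz stands in for the glottal volume velocity;
    # attenuating and truncating it mimics the dysfluent sources above.
    source = (np.arange(n) % (fs // 100) == 0).astype(float)
    vowel = source
    for f, bw in [(660.0, 80.0), (1720.0, 100.0), (2410.0, 120.0)]:
        vowel = formant_resonator(vowel, f, bw, fs)   # rough /æ/ formants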

3.
Studies with adults have demonstrated that acoustic cues cohere in speech perception such that two stimuli cannot be discriminated if separate cues bias responses equally, but oppositely, in each. This study examined whether this kind of coherence exists for children's perception of speech signals, a test that first required that a contrast be found for which adults and children show similar cue weightings. Accordingly, experiment 1 demonstrated that adults, 7-, and 5-year-olds weight F2-onset frequency and gap duration similarly in "spa" versus "sa" decisions. In experiment 2, listeners of these same ages made "same" or "not-the-same" judgments for pairs of stimuli in an AX paradigm when only one cue differed, when the two cues were set within a stimulus to bias the phonetic percept towards the same category (relative to the other stimulus in the pair), and when the two cues were set within a stimulus to bias the phonetic percept towards different categories. Unexpectedly, adults' results contradicted earlier studies: They were able to discriminate stimuli when the two cues conflicted in how they biased phonetic percepts. Results for 7-year-olds replicated those of adults, but were not as strong. Only the results of 5-year-olds revealed the kind of perceptual coherence reported by earlier studies for adults. Thus, it is concluded that perceptual coherence for speech signals is present from an early age, and in fact listeners learn to overcome it under certain conditions.

4.
Speech can remain intelligible for listeners with normal hearing when processed by narrow bandpass filters that transmit only a small fraction of the audible spectrum. Two experiments investigated the basis for the high intelligibility of narrowband speech. Experiment 1 confirmed reports that everyday English sentences can be recognized accurately (82%-98% words correct) when filtered at center frequencies of 1500, 2100, and 3000 Hz. However, narrowband low predictability (LP) sentences were less accurately recognized than high predictability (HP) sentences (20% lower scores), and excised narrowband words were even less intelligible than LP sentences (a further 23% drop). While experiment 1 revealed similar levels of performance for narrowband and broadband sentences at conversational speech levels, experiment 2 showed that speech reception thresholds were substantially (>30 dB) poorer for narrowband sentences. One explanation for this increased disparity between narrowband and broadband speech at threshold (compared to conversational speech levels) is that spectral components in the sloping transition bands of the filters provide important cues for the recognition of narrowband speech, but these components become inaudible as the signal level is reduced. Experiment 2 also showed that performance was degraded by the introduction of a speech masker (a single competing talker). The elevation in threshold was similar for narrowband and broadband speech (11 dB, on average), but because the narrowband sentences required considerably higher sound levels to reach their thresholds in quiet compared to broadband sentences, their target-to-masker ratios were very different (+23 dB for narrowband sentences and -12 dB for broadband sentences). As in experiment 1, performance was better for HP than LP sentences. The LP-HP difference was larger for narrowband than broadband sentences, suggesting that context provides greater benefits when speech is distorted by narrow bandpass filtering.
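Narrowband speech of the kind used in experiment 1 can be approximated with a steep bandpass filter. A minimal sketch under assumed parameters (the fractional-octave bandwidth and filter order are placeholders, not the study's specification):

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def narrowband(x, fc, fs, frac=1.0 / 3.0, order=8):
        # Bandpass speech around centre frequency fc. A high order gives
        # the steep, sloping transition bands discussed above.
        lo = fc * 2.0 ** (-frac / 2.0)
        hi = fc * 2.0 ** (frac / 2.0)
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        return sosfiltfilt(sos, x)

    # e.g., the three centre frequencies of experiment 1:
    # for fc in (1500, 2100, 3000): filtered = narrowband(speech, fc, fs=16000)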

5.
The goals of the present study were to measure acoustic temporal modulation transfer functions (TMTFs) in cochlear implant listeners and examine the relationship between modulation detection and speech recognition abilities. The effects of automatic gain control, presentation level and number of channels on modulation detection thresholds (MDTs) were examined using the listeners' clinical sound processor. The general form of the TMTF was low-pass, consistent with previous studies. The operation of automatic gain control had no effect on MDTs when the stimuli were presented at 65 dBA. MDTs were not dependent on the presentation levels (ranging from 50 to 75 dBA) nor on the number of channels. Significant correlations were found between MDTs and speech recognition scores. The rates of decay of the TMTFs were predictive of speech recognition abilities. Spectral-ripple discrimination was evaluated to examine the relationship between temporal and spectral envelope sensitivities. No correlations were found between the two measures, and 56% of the variance in speech recognition was predicted jointly by the two tasks. The present study suggests that temporal modulation detection measured with the sound processor can serve as a useful measure of the ability of clinical sound processing strategies to deliver clinically pertinent temporal information.
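A TMTF of this kind is measured with sinusoidally amplitude-modulated stimuli whose modulation depth is varied until the listener can no longer distinguish them from an unmodulated stimulus. A minimal sketch of such a stimulus, assuming a white-noise carrier (the study's carrier and adaptive procedure are not specified here):

    import numpy as np

    def sam_noise(fm, depth_db, dur, fs, seed=0):
        # Sinusoidally amplitude-modulated noise; depth_db = 20*log10(m),
        # so 0 dB is full modulation and more negative values are shallower.
        m = 10 ** (depth_db / 20)
        t = np.arange(int(dur * fs)) / fs
        carrier = np.random.default_rng(seed).standard_normal(len(t))
        return (1 + m * np.sin(2 * np.pi * fm * t)) * carrier

    # A TMTF is traced out by finding, at each modulation rate fm, the
    # smallest depth the listener can tell apart from unmodulated noise.
    probe = sam_noise(fm=8.0, depth_db=-12.0, dur=1.0, fs=16000)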

6.
Auditory perception of vowels and consonants in speech
This paper reviews research on the auditory perception of vowels and consonants in speech. Authoritative experiments based on nonsense syllables, conducted more than 80 years ago, indicated that consonants are more important for human speech perception; owing to the academic standing and authority of the experimenters, this conclusion became received wisdom until, nearly 20 years ago, experiments based on natural sentences challenged it and triggered a new round of research. The paper systematically surveys the relative importance of vowels and consonants for speech perception, the influence on speech perception of steady-state information and dynamic boundary information in vowels and consonants, and potential applications of the related research, and closes with a summary and outlook.

7.
The perceptual mechanisms of assimilation and contrast in the phonetic perception of vowels were investigated. In experiment 1, 14 stimulus continua were generated using an /i/-/e/-/a/ vowel continuum. They ranged from a continuum with both ends belonging to the same phonemic category in Japanese, to a continuum with both ends belonging to different phonemic categories. The AXB method was employed and the temporal position of X was changed under three conditions. In each condition ten subjects were required to judge whether X was similar to A or to B. The results demonstrated that assimilation to the temporally closer sound occurs if the phonemic categories of A and B are the same and that contrast to the temporally closer sound occurs if A and B belong to different phonemic categories. It was observed that the transition from assimilation to contrast is continuous except in the /i'/-X-/e/ condition. In experiment 2, the total duration of t1 (between A and X) and t2 (between X and B) was changed under five conditions. One stimulus continuum consisted of the same phonemic category in Japanese and the other consisted of different phonemic categories. Six subjects were required to make similarity judgements of X. The results demonstrated that the occurrence of assimilation and contrast to the temporally closer sound seemed to be constant under each of the five conditions. The present findings suggest that assimilation and contrast are determined by three factors: the temporal position of the three stimuli, the acoustic distance between the three stimuli on the stimulus continuum, and the phonemic categories of the three stimuli.

8.
Japanese 5- to 13-yr-olds who used cochlear implants (CIs) and a comparison group of normally hearing (NH) Japanese children were tested on their perception and production of speech prosody. For the perception task, they were required to judge whether semantically neutral utterances that were normalized for amplitude were spoken in a happy, sad, or angry manner. The performance of NH children was error-free. By contrast, child CI users performed well below ceiling but above chance levels on happy- and sad-sounding utterances but not on angry-sounding utterances. For the production task, children were required to imitate stereotyped Japanese utterances expressing disappointment and surprise as well as culturally typical representations of crow and cat sounds. NH 5- and 6-year-olds produced significantly poorer imitations than older hearing children, but age was unrelated to the imitation quality of child CI users. Overall, child CI users' imitations were significantly poorer than those of NH children, but they did not differ significantly from the imitations of the youngest NH group. Moreover, there was a robust correlation between the performance of child CI users on the perception and production tasks; this implies that difficulties with prosodic perception underlie their difficulties with prosodic imitation.

9.
The signals of running speech and sustained vowels of normals and subjects suffering from dysphonia were analyzed statistically with respect to the signal-to-noise ratio (SNR). The distribution of the SNR measured in multiple overlapping frames in the speech signal was described by a linear combination of the distribution frequencies for SNR = 0 dB, 0 dB < SNR < 15 dB, and SNR ≥ 15 dB. The values of the linear combination, the SNR of the vowels, and clinical assignment of the voices to normal and pathologic populations based on laryngoscopic and stroboscopic investigation parameters were used to compare the different evaluations of the voices. The SNR distribution in speech remained stable over signal lengths of more than 30 s. The correlation coefficient between the SNR measure for running speech and the SNR of sustained vowels amounted to only 0.63. The error rate in the discrimination between normal and dysphonic voices amounted to 22.6% in application to sustained vowels and 5.6% when the SNR distribution was used. Possible reasons for the observed discrepancies are discussed, and the results are compared to those of other studies.
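The three-component description of the frame-wise SNR distribution amounts to binning per-frame SNR values. A minimal sketch, assuming the per-frame SNRs (in dB) from the overlapping frames are already available:

    import numpy as np

    def snr_distribution(frame_snr_db):
        # Relative frequencies of frames at SNR = 0 dB (treated here as the
        # floor bin, an assumption), 0 dB < SNR < 15 dB, and SNR >= 15 dB.
        s = np.asarray(frame_snr_db, dtype=float)
        p_floor = np.mean(s <= 0.0)
        p_mid = np.mean((s > 0.0) & (s < 15.0))
        p_high = np.mean(s >= 15.0)
        return p_floor, p_mid, p_high

    # A linear combination w0*p_floor + w1*p_mid + w2*p_high then serves as
    # the running-speech voice measure compared with the sustained-vowel SNR.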

10.
11.
The extent to which it is necessary to model the dynamic behavior of vowel formants to enable vowel separation has been the subject of debate in recent years. To investigate this issue, a study has been made on the vowels of 132 Australian English speakers (male and female). The degree of vowel separation from the formant values at the target was contrasted to that from modeling the formant contour with discrete cosine transform coefficients. The findings are that, although it is necessary to model the formant contour to separate out the diphthongs, the formant values at the target, plus vowel duration are sufficient to separate out the monophthongs. However, further analysis revealed that there are formant contour differences which benefit the within-class separation of the tense/lax monophthong pairs.
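Modeling a formant contour with discrete cosine transform coefficients reduces each track to a few numbers: the zeroth coefficient reflects the mean, the first the tilt, the second the curvature. A minimal sketch, assuming an F2 track sampled at equal intervals across the vowel (the values are illustrative):

    import numpy as np
    from scipy.fft import dct

    def formant_dct(track, k=3):
        # First k DCT-II coefficients of a formant track: coefficient 0
        # carries the mean, 1 the tilt, 2 the curvature of the contour.
        return dct(np.asarray(track, dtype=float), type=2, norm="ortho")[:k]

    # Illustrative rising F2 contour (Hz) of a diphthong-like token:
    coeffs = formant_dct([1100.0, 1250.0, 1450.0, 1700.0, 1900.0])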

12.
Most investigators agree that the acoustic information for American English vowels includes dynamic (time-varying) parameters as well as static "target" information contained in a single cross section of the syllable. Using the silent-center (SC) paradigm, the present experiment examined the case in which the initial and final portions of stop consonant-vowel-stop consonant (CVC) syllables containing the same vowel but different consonants were recombined into mixed-consonant SC syllables and presented to listeners for vowel identification. Ten vowels were spoken in six different syllables, /bVb, bVd, bVt, dVb, dVd, dVt/, embedded in a carrier sentence. Initial and final transitional portions of these syllables were cross-matched in: (1) silent-center syllables with original syllable durations (silences) preserved (mixed-consonant SC condition) and (2) mixed-consonant SC syllables with syllable duration equated across the ten vowels (fixed duration mixed-consonant SC condition). Vowel-identification accuracy in these two mixed-consonant SC conditions was compared with performance on the original SC and fixed duration SC stimuli, and in initial and final control conditions in which initial and final transitional portions were each presented alone. Vowels were identified highly accurately in both mixed-consonant SC and original syllable SC conditions (only 7%-8% overall errors). Neutralizing duration information led to small, but significant, increases in identification errors in both mixed-consonant and original fixed-duration SC conditions (14%-15% errors), but performance was still much more accurate than for initial and final control conditions (35% and 52% errors, respectively). Acoustical analysis confirmed that direction and extent of formant change from initial to final portions of mixed-consonant stimuli differed from that of original syllables, arguing against a target + offglide explanation of the perceptual results. Results do support the hypothesis that temporal trajectories specifying "style of movement" provide information for the differentiation of American English tense and lax vowels, and that this information is invariant over the place of articulation and voicing of the surrounding stop consonants.

13.
The purpose of this study was to use vocal tract simulation and synthesis as means to determine the acoustic and perceptual effects of changing both the cross-sectional area and location of vocal tract constrictions for six different vowels. Area functions at and near vocal tract constrictions are considered critical to the acoustic output and are also the central point of hypotheses concerning speech targets. Area functions for the six vowels, [symbol: see text], were perturbed by changing the cross-sectional area of the constriction (Ac) and the location of the constriction (Xc). Perturbations for Ac were performed for different values of Xc, producing several series of acoustic continua for the different vowels. Acoustic simulations for the different area functions were made using a frequency domain model of the vocal tract. Each simulated vowel was then synthesized as a 1-s duration steady-state segment. The phoneme boundaries of the perturbed synthesized vowels were determined by formal perception tests. Results of the perturbation analyses showed that formants for each of the vowels were more sensitive to changes in constriction cross-sectional area than changes in constriction location. Vowel perception, however, was highly resistant to both types of changes. Results are discussed in terms of articulatory precision and constriction-related speech production strategies.

14.
If two vowels with different fundamental frequencies (f0's) are presented simultaneously and monaurally, listeners often hear two talkers producing different vowels on different pitches. This paper describes the evaluation of four computational models of the auditory and perceptual processes which may underlie this ability. Each model involves four stages: (i) frequency analysis using an "auditory" filter bank, (ii) determination of the pitches present in the stimulus, (iii) segregation of the competing speech sources by grouping energy associated with each pitch to create two derived spectral patterns, and (iv) classification of the derived spectral patterns to predict the probabilities of listeners' vowel-identification responses. The "place" models carry out the operations of pitch determination and spectral segregation by analyzing the distribution of rms levels across the channels of the filter bank. The "place-time" models carry out these operations by analyzing the periodicities in the waveforms in each channel. In their "linear" versions, the place and place-time models operate directly on the waveforms emerging from the filters. In their "nonlinear" versions, analogous operations are applied to the output of an additional stage which applied a compressive nonlinearity to the filtered waveforms. Compared to the other three models, the nonlinear place-time model provides the most accurate estimates of the f0's of pairs of concurrent synthetic vowels and comes closest to predicting the identification responses of listeners to such stimuli. Although the model has several limitations, the results are compatible with the idea that a place-time analysis is used to segregate competing sound sources.
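The contrast between the "place" and "place-time" stages can be sketched on the outputs of any auditory filter bank. A minimal illustration, assuming channel waveforms are already available; the gammatone filtering and the full two-f0 search over vowel pairs are omitted:

    import numpy as np

    def place_profile(channels):
        # "Place" analysis: one RMS level per filter-bank channel.
        return [np.sqrt(np.mean(ch ** 2)) for ch in channels]

    def place_time_f0(channels, fs, fmin=80.0, fmax=300.0):
        # "Place-time" analysis: pool autocorrelations across channels and
        # take the lag of the largest summary peak as the dominant f0.
        # (A second vowel's f0 would be sought after cancelling the first
        # periodicity, which is omitted here.)
        lags = np.arange(int(fs / fmax), int(fs / fmin))
        summary = np.zeros(len(lags))
        for ch in channels:
            ac = np.correlate(ch, ch, mode="full")[len(ch) - 1:]
            summary += ac[lags]
        return fs / lags[np.argmax(summary)]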

15.
Telecommunications fraud is a frequent form of new-style crime, and a method that can automatically and reliably distinguish genuine from fake speech is urgently needed. To further strengthen the ability of current deep-learning systems to identify synthetic speech and to provide technical support for protecting the security of speech information, this study exploits the ways in which synthetic speech differs acoustically from genuine speech, analyzing and comparing the acoustic properties of the two. An acoustic feature, the root-mean-square angle, was designed to quantify the degree of variation in speech intensity; fused with the fundamental-frequency change rate and narrowband-spectrogram features, it quantifies these acoustic differences and focuses on the key acoustic information in synthetic speech. Feeding the fused acoustic features into a neural network model yielded an equal error rate of 0.6% on the validation set of the FoR dataset and a best equal error rate of 10.8% on the test set. The study thus achieves identification of synthetic speech, confirming the effectiveness of the acoustic features and the feasibility of the approach, and to some extent broadens the research avenues for designing synthetic-speech features.
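The equal error rate quoted above is the operating point at which the rate of rejecting genuine speech equals the rate of accepting synthetic speech. A minimal sketch of computing it from detection scores (the score arrays below are placeholders, not outputs of the described network):

    import numpy as np

    def equal_error_rate(genuine_scores, spoof_scores):
        # Sweep thresholds; EER is where the false-rejection rate on genuine
        # speech meets the false-acceptance rate on synthetic speech.
        thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
        frr = np.array([np.mean(genuine_scores < t) for t in thresholds])
        far = np.array([np.mean(spoof_scores >= t) for t in thresholds])
        i = np.argmin(np.abs(frr - far))
        return (frr[i] + far[i]) / 2

    # Placeholder scores (higher = more likely genuine):
    rng = np.random.default_rng(0)
    eer = equal_error_rate(rng.normal(2.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000))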

16.
17.
The ability of listeners to identify pairs of simultaneous synthetic vowels has been investigated in the first of a series of studies on the extraction of phonetic information from multiple-talker waveforms. Both members of the vowel pair had the same onset and offset times and a constant fundamental frequency of 100 Hz. Listeners identified both vowels with an accuracy significantly greater than chance. The pattern of correct responses and confusions was similar for vowels generated by (a) cascade formant synthesis and (b) additive harmonic synthesis that replaced each of the lowest three formants with a single pair of harmonics of equal amplitude. In order to choose an appropriate model for describing listeners' performance, four pattern-matching procedures were evaluated. Each predicted the probability that (i) any individual vowel would be selected as one of the two responses, and (ii) any pair of vowels would be selected. These probabilities were estimated from measures of the similarities of the auditory excitation patterns of the double vowels to those of single-vowel reference patterns. Up to 88% of the variance in individual responses and up to 67% of the variance in pairwise responses could be accounted for by procedures that highlighted spectral peaks and shoulders in the excitation pattern. Procedures that assigned uniform weight to all regions of the excitation pattern gave poorer predictions. These findings support the hypothesis that the auditory system pays particular attention to the frequencies of spectral peaks, and possibly also of shoulders, when identifying vowels. One virtue of this strategy is that the spectral peaks and shoulders can indicate the frequencies of formants when other aspects of spectral shape are obscured by competing sounds.
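The peak-highlighting pattern-matching procedures can be pictured as a weighted distance between excitation patterns. A minimal sketch of one such measure (the weighting scheme is an illustrative assumption, not the paper's exact procedure):

    import numpy as np

    def peak_weighted_distance(test_pattern, ref_pattern, peak_weight=4.0):
        # Distance between two excitation patterns (dB per channel) that
        # up-weights channels at local maxima of the test pattern, standing
        # in for the peak/shoulder-highlighting procedures described above.
        t = np.asarray(test_pattern, dtype=float)
        r = np.asarray(ref_pattern, dtype=float)
        w = np.ones(len(t))
        peaks = (t[1:-1] >= t[:-2]) & (t[1:-1] >= t[2:])   # local maxima
        w[1:-1][peaks] = peak_weight
        return np.sum(w * (t - r) ** 2) / np.sum(w)

    # Response probabilities can then be modeled as decreasing functions of
    # this distance to each single-vowel reference pattern.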

18.
This study considers an operation of an auditory spectral integration process which may be involved in perceiving dynamic time-varying changes in speech found in diphthongs and glide-type transitions. Does the auditory system need explicit vowel formants to track the dynamic changes over time? Listeners classified diphthongs on the basis of a moving center of gravity (COG) brought about by changing intensity ratio of static spectral components instead of changing an F2. Listeners were unable to detect COG movement only when the F2 change was small (160 Hz) or when the separation between the static components was large (4.95 bark).
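The moving center of gravity is a weighted mean frequency of the static components. A minimal worked sketch (component frequencies and ratios are illustrative, not the study's values): shifting level from the lower to the upper component sweeps the COG upward without any component changing frequency.

    import numpy as np

    def cog(freqs, amps):
        # Amplitude-weighted mean frequency of static spectral components.
        a = np.asarray(amps, dtype=float)
        return float(np.sum(np.asarray(freqs) * a) / np.sum(a))

    # Mimics an F2 glide while the components stay fixed in frequency:
    for ratio in (4.0, 1.0, 0.25):            # lower-to-upper amplitude ratio
        print(cog([1200.0, 2000.0], [ratio, 1.0]))   # 1360, 1600, 1840 Hz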

19.
20.
Native Italian speakers' perception and production of English vowels
This study examined the production and perception of English vowels by highly experienced native Italian speakers of English. The subjects were selected on the basis of the age at which they arrived in Canada and began to learn English, and how much they continued to use Italian. Vowel production accuracy was assessed through an intelligibility test in which native English-speaking listeners attempted to identify vowels spoken by the native Italian subjects. Vowel perception was assessed using a categorial discrimination test. The later in life the native Italian subjects began to learn English, the less accurately they produced and perceived English vowels. Neither of two groups of early Italian/English bilinguals differed significantly from native speakers of English either for production or perception. This finding is consistent with the hypothesis of the speech learning model [Flege, in Speech Perception and Linguistic Experience: Theoretical and Methodological Issues (York Press, Timonium, MD, 1995)] that early bilinguals establish new categories for vowels found in the second language (L2). The significant correlation observed to exist between the measures of L2 vowel production and perception is consistent with another hypothesis of the speech learning model, viz., that the accuracy with which L2 vowels are produced is limited by how accurately they are perceived.
