Similar Documents (20 results)
1.
Listeners identified a phonetically balanced set of consonant-vowel-consonant (CVC) words and nonsense syllables in noise at four signal-to-noise ratios. The identification scores for phonemes and syllables were analyzed using the j-factor model [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84, 101-114 (1988)], which measures the perceptual independence of the parts of a whole. Results indicate that nonsense CVC syllables are perceived as having three independent phonemes, while words show j = 2.34 independent units. Among the words, high-frequency words are perceived as having significantly fewer independent units than low-frequency words. Words with dense phonetic neighborhoods are perceived as having 0.5 more independent units than words with sparse neighborhoods. The neighborhood effect in these data is due almost entirely to density as determined by the initial consonant and vowel, as demonstrated in analyses by subjects and by items, and in correlational analyses of syllable recognition with the neighborhood activation model [Luce and Pisoni, Ear Hear. 19, 1-36 (1998)]. The j factors are interpreted as measuring the increased efficiency with which word-final consonants of words in sparse neighborhoods are perceived during spoken word recognition.
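A minimal sketch of the j-factor computation this model implies, inverting p_w = p_p^j (where p_w is the probability of recognizing the whole and p_p the probability of recognizing one part); the scores below are hypothetical, chosen only to reproduce the j values quoted above:

    import math

    def j_factor(p_whole, p_part):
        # p_whole = p_part ** j  =>  j = log(p_whole) / log(p_part)
        return math.log(p_whole) / math.log(p_part)

    # Hypothetical scores with phonemes recognized 80% of the time:
    print(j_factor(0.512, 0.80))  # 3.00 -> nonsense CVCs: three independent phonemes
    print(j_factor(0.593, 0.80))  # ~2.34 -> words: fewer independent units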

2.
Speech can remain intelligible for listeners with normal hearing when processed by narrow bandpass filters that transmit only a small fraction of the audible spectrum. Two experiments investigated the basis for the high intelligibility of narrowband speech. Experiment 1 confirmed reports that everyday English sentences can be recognized accurately (82%-98% words correct) when filtered at center frequencies of 1500, 2100, and 3000 Hz. However, narrowband low predictability (LP) sentences were less accurately recognized than high predictability (HP) sentences (20% lower scores), and excised narrowband words were even less intelligible than LP sentences (a further 23% drop). While experiment 1 revealed similar levels of performance for narrowband and broadband sentences at conversational speech levels, experiment 2 showed that speech reception thresholds were substantially (>30 dB) poorer for narrowband sentences. One explanation for this increased disparity between narrowband and broadband speech at threshold (compared to conversational speech levels) is that spectral components in the sloping transition bands of the filters provide important cues for the recognition of narrowband speech, but these components become inaudible as the signal level is reduced. Experiment 2 also showed that performance was degraded by the introduction of a speech masker (a single competing talker). The elevation in threshold was similar for narrowband and broadband speech (11 dB, on average), but because the narrowband sentences required considerably higher sound levels to reach their thresholds in quiet compared to broadband sentences, their target-to-masker ratios were very different (+23 dB for narrowband sentences and -12 dB for broadband sentences). As in experiment 1, performance was better for HP than LP sentences. The LP-HP difference was larger for narrowband than broadband sentences, suggesting that context provides greater benefits when speech is distorted by narrow bandpass filtering.

3.
Estimates of the ability to make use of sentence context in 34 postlingually hearing-impaired (HI) individuals were obtained using formulas developed by Boothroyd and Nittrouer [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84, 101-114 (1988)] which relate scores for isolated words to words in meaningful sentences. Sentence materials were constructed by concatenating digitized productions of isolated words to ensure physical equivalence among the test items in the two conditions. Isolated words and words in sentences were tested at three levels of intelligibility (targeting 29%, 50%, and 79% correct). Thus, for each subject, three estimates of context ability, or k factors, were obtained. In addition, auditory, visual, and auditory-visual sentence recognition was evaluated using natural productions of sentence materials. Two main questions were addressed: (1) Is context ability constant for speech materials produced with different degrees of clarity? and (2) What are the relations between individual estimates of k and sentence recognition as a function of presentation modality? Results showed that estimates of k were not constant across different levels of intelligibility: k was greater for the more degraded condition relative to conditions of higher word intelligibility. Estimates of k also were influenced strongly by the test order of isolated words and words in sentences. That is, prior exposure to words in sentences improved later recognition of the same words when presented in isolation (and vice versa), even though the 1500 key words comprising the test materials were presented under degraded (filtered) conditions without feedback. The impact of this order effect was to reduce individual estimates of k for subjects exposed to sentence materials first and to increase estimates of k for subjects exposed to isolated words first. Finally, significant relationships were found between individual k scores and sentence recognition scores in all three presentation modalities, suggesting that k is a useful measure of individual differences in the ability to use sentence context.
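A sketch of how such a k factor can be extracted from paired scores, using the Boothroyd-Nittrouer relation p_s = 1 - (1 - p_i)^k between words in isolation (p_i) and the same words in sentences (p_s); the scores below are hypothetical:

    import math

    def k_factor(p_sentence, p_isolated):
        # p_sentence = 1 - (1 - p_isolated) ** k  =>  k
        return math.log(1.0 - p_sentence) / math.log(1.0 - p_isolated)

    # Hypothetical pair of scores: 50% correct in isolation, 75% in sentences.
    print(k_factor(0.75, 0.50))  # 2.0 -> a strong benefit of sentence context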

4.
The ability to recognize spoken words interrupted by silence was investigated with young normal-hearing listeners and older listeners with and without hearing impairment. Target words from the revised SPIN test by Bilger et al. [J. Speech Hear. Res. 27(1), 32-48 (1984)] were presented in isolation and in the original sentence context using a range of interruption patterns in which portions of speech were replaced with silence. The number of auditory "glimpses" of speech and the glimpse proportion (total duration glimpsed/word duration) were varied using a subset of the SPIN target words that ranged in duration from 300 to 600 ms. The words were presented in isolation, in the context of low-predictability (LP) sentences, and in high-predictability (HP) sentences. The glimpse proportion was found to have a strong influence on word recognition, with relatively little influence of the number of glimpses, glimpse duration, or glimpse rate. Although older listeners tended to recognize fewer interrupted words, there was considerable overlap in recognition scores across listener groups in all conditions, and all groups were affected by interruption parameters and context in much the same way.
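A small illustration (with hypothetical timing values) of the two interruption parameters manipulated here, the number of glimpses and the glimpse proportion:

    def glimpse_stats(glimpses_ms, word_ms):
        # glimpses_ms: list of (onset, offset) spans, in ms, where speech is audible
        total = sum(off - on for on, off in glimpses_ms)
        return len(glimpses_ms), total / word_ms

    # A hypothetical 400-ms word heard through two 100-ms glimpses:
    print(glimpse_stats([(0, 100), (200, 300)], 400))  # (2, 0.5)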

5.
It was investigated whether the model for context effects, developed earlier by Bronkhorst et al. [J. Acoust. Soc. Am. 93, 499-509 (1993)], can be applied to results of sentence tests, used for the evaluation of speech recognition. Data for two German sentence tests, that differed with respect to their semantic content, were analyzed. They had been obtained from normal-hearing listeners using adaptive paradigms in which the signal-to-noise ratio was varied. It appeared that the model can accurately reproduce the complete pattern of scores as a function of signal-to-noise ratio: both sentence recognition scores and proportions of incomplete responses. In addition, it is shown that the model can provide a better account of the relationship between average word recognition probability p(e) and sentence recognition probability p(w) than the relationship p(w) = p(e)^j, which has been used in previous studies. Analysis of the relationship between j and the model parameters shows that j is, nevertheless, a very useful parameter, especially when it is combined with the parameter j', which can be derived using the equivalent relationship p(w,0) = (1 - p(e))^(j'), where p(w,0) is the probability of recognizing none of the words in the sentence. These parameters not only provide complementary information on context effects present in the speech material, but they also can be used to estimate the model parameters. Because the model can be applied to both speech and printed text, an experiment was conducted in which part of the sentences was presented orthographically with 1-3 missing words. The results revealed a large difference between the values of the model parameters for the two presentation modes. This is probably due to the fact that, with speech, subjects can reduce the number of alternatives for a certain word using partial information that they have perceived (i.e., not only using the sentence context). A method for mapping model parameters from one mode to the other is suggested, but the validity of this approach has to be confirmed with additional data.
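A minimal sketch of how j and the complementary j' follow from the two relationships quoted above; the proportions are hypothetical:

    import math

    def j_param(p_all_words, p_word):
        # p(w) = p(e)^j  =>  j = log p(w) / log p(e)
        return math.log(p_all_words) / math.log(p_word)

    def j_prime_param(p_no_words, p_word):
        # p(w,0) = (1 - p(e))^(j')  =>  j' = log p(w,0) / log(1 - p(e))
        return math.log(p_no_words) / math.log(1.0 - p_word)

    # Hypothetical data: words 70% correct; every word right on 45% of sentences,
    # no word right on 2% of sentences.
    print(j_param(0.45, 0.70))        # ~2.24
    print(j_prime_param(0.02, 0.70))  # ~3.25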

6.
The relative importance of different parts of the auditory spectrum to recognition of the Diagnostic Rhyme Test (DRT) and its six speech feature subtests was determined. Three normal hearing subjects were tested twice in each of 70 experimental conditions. The analytical procedures of French and Steinberg [J. Acoust. Soc. Am. 19, 90-119 (1947)] were applied to the data to derive frequency importance functions for each of the DRT subtests and the test as a whole over the frequency range 178-8912 Hz. For the DRT as a whole, the low frequencies were found to be more important than is the case for nonsense syllables. Importance functions for the feature subtests also differed from those for nonsense syllables and from each other as well. These results suggest that test materials loaded with different proportions of particular phonemes have different frequency importance functions. Comparison of the results with those from other studies suggests that importance functions depend to a degree on the available response options as well.

7.
Recognition of speech stimuli consisting of monosyllabic words, sentences, and nonsense syllables was tested in normal subjects and in a subject with a low-frequency sensorineural hearing loss characterized by an absence of functioning sensory units in the apical region of the cochlea, as determined in a previous experiment [C. W. Turner, E. M. Burns, and D. A. Nelson, J. Acoust. Soc. Am. 73, 966-975 (1983)]. Performance of all subjects was close to 100% correct for all stimuli presented unfiltered at a moderate intensity level. When stimuli were low-pass filtered, performance of the hearing-impaired subject fell below that of the normals, but was still considerably above chance. A further diminution in the impaired subject's recognition of nonsense syllables resulted from the addition of a high-pass masking noise, indicating that his performance in the filtered quiet condition was attributable in large part to the contribution of sensory units in basal and midcochlear regions. Normals' performance was also somewhat decreased by the masker, suggesting that they also may have been extracting some low-frequency speech cues from responses of sensory units located in the base of the cochlea.

8.
For all but the most profoundly hearing-impaired (HI) individuals, auditory-visual (AV) speech has been shown consistently to afford more accurate recognition than auditory (A) or visual (V) speech. However, the amount of AV benefit achieved (i.e., the superiority of AV performance in relation to unimodal performance) can differ widely across HI individuals. To begin to explain these individual differences, several factors need to be considered. The most obvious of these are deficient A and V speech recognition skills. However, large differences in individuals' AV recognition scores persist even when unimodal skill levels are taken into account. These remaining differences might be attributable to differing efficiency in the operation of a perceptual process that integrates A and V speech information. There is at present no accepted measure of the putative integration process. In this study, several possible integration measures are compared using both congruent and discrepant AV nonsense syllable and sentence recognition tasks. Correlations were tested among the integration measures, and between each integration measure and independent measures of AV benefit for nonsense syllables and sentences in noise. Integration measures derived from tests using nonsense syllables were significantly correlated with each other; on these measures, HI subjects show generally high levels of integration ability. Integration measures derived from sentence recognition tests were also significantly correlated with each other, but were not significantly correlated with the measures derived from nonsense syllable tests. Similarly, the measures of AV benefit based on nonsense syllable recognition tests were found not to be significantly correlated with the benefit measures based on tests involving sentence materials. Finally, there were significant correlations between AV integration and benefit measures derived from the same class of speech materials, but nonsignificant correlations between integration and benefit measures derived from different classes of materials. These results suggest that the perceptual processes underlying AV benefit and the integration of A and V speech information might not operate in the same way on nonsense syllable and sentence input.

9.
Perception is influenced both by characteristics of the stimulus and by the context in which it is presented. The relative contributions of each of these factors depend, to some extent, on perceiver characteristics. The contributions of word and sentence context to the perception of phonemes within words and words within sentences, respectively, have been well studied for normal, young adults. However, far less is known about these context effects for much younger and older listeners. In the present study, measures of these context effects were obtained from young children (ages 4 years 6 months to 6 years 6 months) and from older adults (over 62 years), and compared with those of the young adults in an earlier study [A. Boothroyd and S. Nittrouer, J. Acoust. Soc. Am. 84, 101-114 (1988)]. Both children and older adults demonstrated poorer overall recognition scores than did young adults. However, responses of children and older adults demonstrated similar context effects, with two exceptions: Children used the semantic constraints of sentences to a lesser extent than did young or older adults, and older adults used lexical constraints to a greater extent than either of the other two groups.

10.
This study investigated the relative contributions of consonants and vowels to the perceptual intelligibility of monosyllabic consonant-vowel-consonant (CVC) words. A noise replacement paradigm presented CVCs with only consonants or only vowels preserved. Results demonstrated no difference between overall word accuracy in these conditions; however, different error patterns were observed. A significant effect of lexical difficulty was demonstrated for both types of replacement, whereas the noise level used during replacement did not influence results. The contribution of consonant and vowel transitional information present at the consonant-vowel boundary was also explored. The proportion of speech presented, regardless of the segmental condition, overwhelmingly predicted performance. Comparisons were made with previous segment replacement results using sentences [Fogerty and Kewley-Port, J. Acoust. Soc. Am. 126, 847-857 (2009)]. Results demonstrated that consonants contribute to intelligibility equally in both isolated CVC words and sentences. However, vowel contributions were mediated by context, with greater contributions to intelligibility in sentence contexts. Therefore, it appears that vowels in sentences carry unique speech cues that greatly facilitate intelligibility but that are not informative and/or present in isolated word contexts. Consonants appear to provide speech cues that are equally available and informative during sentence and isolated word presentations.

11.
A frequency importance function for continuous discourse
Normal hearing subjects estimated the intelligibility of continuous discourse (CD) passages spoken by three talkers (two male and one female) under 135 conditions of filtering and signal-to-noise ratio. The relationship between the intelligibility of CD and the articulation index (the transfer function) was different from any found in ANSI S3.5-1969. Also, the lower frequencies were found to be relatively more important for the intelligibility of CD than for identification of nonsense syllables and other types of speech for which data are available, except for synthetic sentences [Speaks, J. Speech Hear. Res. 10, 289-298 (1967)]. The frequency which divides the auditory spectrum into two equally important halves (the crossover frequency) was found to be about 0.5 oct lower for the CD used in this study than the crossover frequency for male talkers of nonsense syllables found in ANSI S3.5-1969 and about 0.7 oct lower than the one for combined male and female talkers of nonsense syllables reported by French and Steinberg [J. Acoust. Soc. Am. 19, 90-119 (1947)].
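Since these shifts are expressed in octaves (an octave is a factor of 2 in frequency), a 0.5-oct downward shift means dividing the frequency by 2^0.5; a quick illustration using a hypothetical 1600-Hz reference crossover:

    def octave_shift(f_hz, octaves):
        # Shift a frequency by a signed number of octaves.
        return f_hz * 2.0 ** octaves

    f_ref = 1600.0                    # hypothetical nonsense-syllable crossover, Hz
    print(octave_shift(f_ref, -0.5))  # ~1131 Hz, 0.5 oct lower
    print(octave_shift(f_ref, -0.7))  # ~985 Hz, 0.7 oct lower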

12.
Listeners' ability to understand speech in adverse listening conditions is partially due to the redundant nature of speech. Natural redundancies are often lost or altered when speech is filtered, as is done in AI/SII experiments. It is important to study how listeners recognize speech when the speech signal is unfiltered and the entire broadband spectrum is present. A correlational method [R. A. Lutfi, J. Acoust. Soc. Am. 97, 1333-1334 (1995); V. M. Richards and S. Zhu, J. Acoust. Soc. Am. 95, 423-424 (1994)] has been used to determine how listeners use spectral cues to perceive nonsense syllables when the full speech spectrum is present [K. A. Doherty and C. W. Turner, J. Acoust. Soc. Am. 100, 3769-3773 (1996); C. W. Turner et al., J. Acoust. Soc. Am. 104, 1580-1585 (1998)]. The experiments in this study measured spectral-weighting strategies for more naturally occurring speech stimuli, specifically sentences, using a correlational method for normal-hearing listeners. Results indicate that listeners placed the greatest weight on spectral information within bands 2 and 5 (562-1113 and 2807-11,000 Hz), respectively. Spectral-weighting strategies for sentences were also compared to weighting strategies for nonsense syllables measured in a previous study (C. W. Turner et al., 1998). Spectral-weighting strategies for sentences were different from those reported for nonsense syllables.
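The core of the correlational method is to correlate trial-by-trial, per-band SNR perturbations with the listener's correctness and normalize the result; a minimal sketch, assuming the per-trial band SNRs and binary responses are already stored as arrays:

    import numpy as np

    def spectral_weights(band_snrs, correct):
        # band_snrs: trials x bands array of the randomly perturbed SNR per band
        # correct:   length-trials array of 0/1 response accuracy
        band_snrs = np.asarray(band_snrs, dtype=float)
        correct = np.asarray(correct, dtype=float)
        r = np.array([np.corrcoef(band_snrs[:, b], correct)[0, 1]
                      for b in range(band_snrs.shape[1])])
        return r / np.abs(r).sum()  # normalize to relative weights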

13.
Linguistic modality effects on fundamental frequency in speech
This paper examines the effects on fundamental frequency (F0) patterns of modality operators, such as sentential adverbs, modals, negatives, and quantifiers. These words form inherently contrastive classes which have varying tendencies to produce emphasis deviations in F0 contours. Three speakers read a set of 186 sentences and three paragraphs to provide data for F0 analysis. The important words in each sentence were marked intonationally with rises or sharp falls in F0, compared to gradually falling F0 in unemphasized words. These emphasis deviations were measured in terms of F0 variations from the norm; they were larger toward the beginning of sentences, in longer sentences, on syllables surrounded by unemphasized syllables, and in contrastive contexts. Other results showed that embedded clauses tended to have lower F0, and negative contractions were emphasized on their first syllables. Individual speakers differed in overall F0 levels, while using roughly similar emphasis strategies. F0 levels changed in paragraphs, with emphasis going to contextually new information.

14.
While a large portion of the variance among listeners in speech recognition is associated with the audibility of components of the speech waveform, it is not possible to predict individual differences in the accuracy of speech processing strictly from the audiogram. This has suggested that some of the variance may be associated with individual differences in spectral or temporal resolving power, or acuity. Psychoacoustic measures of spectral-temporal acuity with nonspeech stimuli have been shown, however, to correlate only weakly (or not at all) with speech processing. In a replication and extension of an earlier study [Watson et al., J. Acoust. Soc. Am. Suppl. 1, 71, S73 (1982)], 93 normal-hearing college students were tested on speech perception tasks (nonsense syllables, words, and sentences in a noise background) and on six spectral-temporal discrimination tasks using simple and complex nonspeech sounds. Factor analysis showed that the abilities that explain performance on the nonspeech tasks are quite distinct from those that account for performance on the speech tasks. Performance was significantly correlated among speech tasks and among nonspeech tasks. Either (a) auditory spectral-temporal acuity for nonspeech sounds is orthogonal to speech processing abilities, or (b) the appropriate tasks or types of nonspeech stimuli that challenge the abilities required for speech recognition have yet to be identified.

15.
Older adults are known to benefit from supportive context in order to compensate for age-related reductions in perceptual and cognitive processing, including when comprehending spoken language in adverse listening conditions. In the present study, we examine how younger and older adults benefit from two types of contextual support, predictability from sentence context and priming, when identifying target words in noise-vocoded sentences. In the first part of the experiment, benefit from context based on primarily semantic knowledge was evaluated by comparing the accuracy of identification of sentence-final target words that were either highly predictable or not predictable from the sentence context. In the second part of the experiment, benefit from priming was evaluated by comparing the accuracy of identification of target words when noise-vocoded sentences were either primed or not by the presentation of the sentence context without noise vocoding and with the target word replaced with white noise. Younger and older adults benefited from each type of supportive context, with the most benefit realized when both types were combined. Supportive context reduced the number of noise-vocoded bands needed for 50% word identification more for older adults than their younger counterparts.

16.
Perceptual distances among single tokens of American English vowels were established for nonreverberant and reverberant conditions. Fifteen vowels in the phonetic context (b-t), embedded in the sentence "Mark the (b-t) again" were recorded by a male talker. For the reverberant condition, the sentences were played through a room with a reverberation time of 1.2 s. The CVC syllables were removed from the sentences and presented in pairs to ten subjects with audiometrically normal hearing, who judged the similarity of the syllable pairs separately for the nonreverberant and reverberant conditions. The results were analyzed by multidimensional scaling procedures, which showed that the perceptual data were accounted for by a three-dimensional vowel space. Correlations were obtained between the coordinates of the vowels along each dimension and selected acoustic parameters. For both conditions, dimensions 1 and 2 were highly correlated with formant frequencies F2 and F1, respectively, and dimension 3 was correlated with the product of the duration of the vowels and the difference between F3 and F1 expressed on the Bark scale. These observations are discussed in terms of the influence of reverberation on speech perception.

17.
The 36 basic "sentences" in the experiment were six-syllable nonsense sequences of the form DAS a LAS a GAS a or a DAS a BAS a LAS. Either (a) one vowel in the sentence was lengthened or shortened by about 50, 90, or 130 ms by computer-editing routines, or (b) the sentence was left intact (as spoken). The resulting perceptual impression after the vowel change was a change of tempo within the sentence. Vowel changes occurred systematically throughout the sentences, in one of syllables one through five. Reaction time (RT) was recorded to assigned target segments /b, d, or g/ in one of syllables one through six, and RT was compared to targets in tempo-changed versus intact sentences (these were acoustically identical except for the distorted vowel). The listeners responded to over 2000 versions of the sentences. The results were: (a) Tempo changes generally increased segment target RT. (b) Tempo-change effects were ubiquitous; for instance, vowel changes in the first syllable increased RT to targets in later syllables, and the reach of effects spanned four syllables. Both vowel shortening and lengthening increased target RT. (c) Effects attributed to processing time decreased, whereas effects attributed to stimulus expectancy increased, with time into the sentence. (d) Tempo-change effects persisted throughout the experiment despite practice and familiarity with stimuli. The conclusions were: (a) The effects of time distortion of the stimulus on target RT were produced mainly by changes in stimulus-induced expectancy, not changes in processing time. (b) The expected input to perception is the acoustically intact utterance in both its rhythmic and segmental aspects; these aspects are not perceived independently.

18.
A general-purpose real-time speech recognition system: RTSRS(01)
俞铁城 (Yu Tiecheng), Acta Physica Sinica (物理学报) 27(5), 508-515 (1978)
This paper describes a general-purpose real-time speech recognition system, RTSRS(01). Building on earlier work, the parameters of each spoken command are normalized in the time domain and a binary spectrum is adopted, greatly compressing the parameter storage required for reference tokens; at the same time, a new method of computing distances greatly shortens recognition time, so that real-time recognition is possible with a vocabulary of 200 items. Recognition results for a designated speaker were: spoken digits, 99.7%; 20 sentences (7 characters each), 99.7%; 100 four-character idioms, 99.5%; 150 four-character idioms, 99.3%; 200 four-character idioms, 98.8%; 400 four-character idioms, 99.7%. Informal experiments indicate high recognition accuracy for vocabularies of items with different numbers of syllables, and even for spoken English digits or the names of BASIC statements.
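The abstract does not spell out the new distance computation; purely as an illustration of why a binary spectrum makes template matching cheap, time-normalized binary spectra can be compared with a Hamming-style count of mismatched bits. All names and data below are hypothetical:

    def hamming(a, b):
        # a, b: equal-length flattened binary-spectrum templates (0/1 values)
        return sum(x != y for x, y in zip(a, b))

    def recognize(token, references):
        # references: dict mapping command name -> binary-spectrum template
        return min(references, key=lambda name: hamming(token, references[name]))

    refs = {"ling2": (0, 1, 1, 0), "yi1": (1, 1, 0, 0)}  # toy two-word vocabulary
    print(recognize((0, 1, 1, 1), refs))  # "ling2" (closest template)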

19.
The goal of this study was to establish the ability of normal-hearing listeners to discriminate formant frequency in vowels in everyday speech. Vowel formant discrimination in syllables, phrases, and sentences was measured for high-fidelity (nearly natural) speech synthesized by STRAIGHT [Kawahara et al., Speech Commun. 27, 187-207 (1999)]. Thresholds were measured for changes in F1 and F2 for the vowels /ɪ, ɛ, æ, ʌ/ in /bVd/ syllables. Experimental factors manipulated included phonetic context (syllables, phrases, and sentences), sentence discrimination with the addition of an identification task, and word position. Results showed that neither longer phonetic context nor the addition of the identification task significantly affected thresholds, while thresholds for word final position showed significantly better performance than for either initial or middle position in sentences. Results suggest that an average of 0.37 barks is required for normal-hearing listeners to discriminate vowel formants in modest length sentences, elevated by 84% compared to isolated vowels. Vowel formant discrimination in several phonetic contexts was slightly elevated for STRAIGHT-synthesized speech compared to formant-synthesized speech stimuli reported in the study by Kewley-Port and Zheng [J. Acoust. Soc. Am. 106, 2945-2958 (1999)]. These elevated thresholds appeared related to greater spectral-temporal variability for high-fidelity speech produced by STRAIGHT than for formant-synthesized speech.
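The paper does not state which Bark formula it uses; as an illustration only, Traunmüller's (1990) approximation converts between Hz and Bark, which lets a 0.37-bark threshold be expressed in Hz around a given formant:

    def hz_to_bark(f):
        # Traunmuller (1990) approximation of the Bark scale
        return 26.81 * f / (1960.0 + f) - 0.53

    def bark_to_hz(z):
        return 1960.0 * (z + 0.53) / (26.81 - (z + 0.53))

    # Around a hypothetical 500-Hz F1, a 0.37-bark step is roughly:
    print(bark_to_hz(hz_to_bark(500.0) + 0.37) - 500.0)  # ~43 Hz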

20.
This study examined the perception and acoustics of a large corpus of vowels spoken in consonant-vowel-consonant syllables produced in citation-form (lists) and spoken in sentences at normal and rapid rates by a female adult. Listeners correctly categorized the speaking rate of sentence materials as normal or rapid (2% errors) but did not accurately classify the speaking rate of the syllables when they were excised from the sentences (25% errors). In contrast, listeners accurately identified the vowels produced in sentences spoken at both rates when presented the sentences and when presented the excised syllables blocked by speaking rate or randomized. Acoustical analysis showed that formant frequencies at syllable midpoint for vowels in sentence materials showed "target undershoot" relative to citation-form values, but little change over speech rate. Syllable durations varied systematically with vowel identity, speaking rate, and voicing of final consonant. Vowel-inherent-spectral-change was invariant in direction of change over rate and context for most vowels. The temporal location of maximum F1 frequency further differentiated spectrally adjacent lax and tense vowels. It was concluded that listeners were able to utilize these rate- and context-independent dynamic spectrotemporal parameters to identify coarticulated vowels, even when sentential information about speaking rate was not available.
