Similar Articles
20 similar articles found
1.
It was investigated whether the model for context effects, developed earlier by Bronkhorst et al. [J. Acoust. Soc. Am. 93, 499-509 (1993)], can be applied to results of sentence tests used for the evaluation of speech recognition. Data for two German sentence tests that differed with respect to their semantic content were analyzed. They had been obtained from normal-hearing listeners using adaptive paradigms in which the signal-to-noise ratio was varied. It appeared that the model can accurately reproduce the complete pattern of scores as a function of signal-to-noise ratio: both sentence recognition scores and proportions of incomplete responses. In addition, it is shown that the model can provide a better account of the relationship between average word recognition probability p(e) and sentence recognition probability p(w) than the relationship p(w) = p(e)^j, which has been used in previous studies. Analysis of the relationship between j and the model parameters shows that j is, nevertheless, a very useful parameter, especially when it is combined with the parameter j', which can be derived using the equivalent relationship p(w,0) = (1 - p(e))^j', where p(w,0) is the probability of recognizing none of the words in the sentence. These parameters not only provide complementary information on context effects present in the speech material, but they can also be used to estimate the model parameters. Because the model can be applied to both speech and printed text, an experiment was conducted in which part of the sentences was presented orthographically with 1-3 missing words. The results revealed a large difference between the values of the model parameters for the two presentation modes. This is probably due to the fact that, with speech, subjects can reduce the number of alternatives for a certain word using partial information that they have perceived (i.e., not only using the sentence context).
A method for mapping model parameters from one mode to the other is suggested, but the validity of this approach has to be confirmed with additional data.
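Both relationships invert to give closed-form estimates of j and j' from measured scores; a minimal sketch (the numeric values are illustrative, not taken from the paper):

```python
import math

def estimate_j(p_e, p_w):
    """Context exponent j from p(w) = p(e)^j, where p(w) is the probability
    of recognizing the whole sentence and p(e) the average word probability."""
    return math.log(p_w) / math.log(p_e)

def estimate_j_prime(p_e, p_w0):
    """Exponent j' from p(w,0) = (1 - p(e))^j', where p(w,0) is the
    probability of recognizing none of the words in the sentence."""
    return math.log(p_w0) / math.log(1.0 - p_e)

# Illustrative scores: word probability 0.8, whole-sentence probability
# 0.512 = 0.8**3 (j = 3), and zero-words-correct probability 0.04 = 0.2**2
# (j' = 2).
j = estimate_j(0.8, 0.512)
jp = estimate_j_prime(0.8, 0.04)
```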

2.
Mathematical treatment of context effects in phoneme and word recognition
Percent recognition of phonemes and whole syllables, measured in both consonant-vowel-consonant (CVC) words and CVC nonsense syllables, is reported for normal young adults listening at four signal-to-noise (S/N) ratios. Similar data are reported for the recognition of words and whole sentences in three types of sentence: high predictability (HP) sentences, with both semantic and syntactic constraints; low predictability (LP) sentences, with primarily syntactic constraints; and zero predictability (ZP) sentences, with neither semantic nor syntactic constraints. The probability of recognition of speech units in context (pc) is shown to be related to the probability of recognition without context (pi) by the equation pc = 1 - (1 - pi)^k, where k is a constant. The factor k is interpreted as the amount by which the channels of statistically independent information are effectively multiplied when contextual constraints are added. Empirical values of k are approximately 1.3 and 2.7 for word and sentence context, respectively. In a second analysis, the probability of recognition of wholes (pw) is shown to be related to the probability of recognition of the constituent parts (pp) by the equation pw = pp^j, where j represents the effective number of statistically independent parts within a whole. The empirically determined mean values of j for nonsense materials are not significantly different from the number of parts in a whole, as predicted by the underlying theory. In CVC words, the value of j is constant at approximately 2.5. In the four-word HP sentences, it falls from approximately 2.5 to approximately 1.6 as the inherent recognition probability for words falls from 100% to 0%, demonstrating an increasing tendency to perceive HP sentences either as wholes, or not at all, as S/N ratio deteriorates.
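The two equations translate directly into code; a minimal sketch using the paper's empirical k and j values with otherwise arbitrary scores:

```python
def p_context(p_i, k):
    """pc = 1 - (1 - pi)^k: recognition in context from recognition without."""
    return 1.0 - (1.0 - p_i) ** k

def p_whole(p_p, j):
    """pw = pp^j: whole-item recognition from constituent-part recognition."""
    return p_p ** j

# With the paper's empirical k ≈ 1.3 for word context, a phoneme score of
# 0.6 rises to about 0.70 in words; with j ≈ 2.5 for CVC words, the same
# phoneme score predicts a whole-word score of about 0.28.
pc_word = p_context(0.6, 1.3)
pw_cvc = p_whole(0.6, 2.5)
```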

3.
Stimulus selection in adaptive psychophysical procedures
In adaptive psychophysical procedures, the stimulus should be presented at a relatively high level rather than near the middle of the psychometric function, which is often defined as the "threshold" value. For some psychometric functions, the optimal stimulus placement level produces 84% to 94% correct responses in a two-alternative forced-choice task. This result is disquieting because the popular two-down one-up rule tracks a relatively low percentage of correct responses, 70.7%. Computer simulations and a variety of psychometric functions were used to confirm the validity of this analysis. These simulations also demonstrate that the precise form of the psychometric function is not critical in achieving the high efficiencies. Finally, data from human listeners indicate that the standard deviation of threshold estimates is indeed larger when the stimulus presented on each trial is at a stimulus level corresponding to 70.7% rather than 94% correct responses.
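The 70.7% figure follows because the two-down one-up rule is in equilibrium where the probability of two consecutive correct responses equals 0.5, i.e., p = sqrt(0.5) ≈ 0.707. A small simulation sketch, assuming a logistic 2AFC psychometric function with arbitrary midpoint and slope:

```python
import math
import random

def p_correct(level, midpoint=0.0, slope=2.0):
    """Assumed 2AFC logistic psychometric function (guess rate 0.5)."""
    return 0.5 + 0.5 / (1.0 + math.exp(-(level - midpoint) / slope))

def two_down_one_up(n_trials=20000, step=0.5, start=10.0, rng=None):
    """Step down after two consecutive correct responses, up after any error;
    return the mean level after a burn-in of the first quarter of trials."""
    rng = rng or random.Random(0)
    level, correct_run, levels = start, 0, []
    for _ in range(n_trials):
        levels.append(level)
        if rng.random() < p_correct(level):
            correct_run += 1
            if correct_run == 2:
                level -= step
                correct_run = 0
        else:
            level += step
            correct_run = 0
    tail = levels[n_trials // 4:]
    return sum(tail) / len(tail)

# For this function, the 70.7%-correct level is about -0.69; the staircase
# should hover near it.
est = two_down_one_up()
```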

4.
The speech signal contains many acoustic properties that may contribute differently to spoken word recognition. Previous studies have demonstrated that the importance of properties present during consonants or vowels is dependent upon the linguistic context (i.e., words versus sentences). The current study investigated three potentially informative acoustic properties that are present during consonants and vowels for monosyllabic words and sentences. Natural variations in fundamental frequency were either flattened or removed. The speech envelope and temporal fine structure were also investigated by limiting the availability of these cues via noisy signal extraction. Thus, this study investigated the contribution of these acoustic properties, present during either consonants or vowels, to overall word and sentence intelligibility. Results demonstrated that all processing conditions displayed better performance for vowel-only sentences. Greater performance with vowel-only sentences remained, despite removing dynamic cues of the fundamental frequency. Word and sentence comparisons suggest that the speech envelope may be at least partially responsible for additional vowel contributions in sentences. Results suggest that speech information transmitted by the envelope is responsible, in part, for greater vowel contributions in sentences, but is not predictive for isolated words.

5.
The ability to obtain reliable phonetic information from a talker's face during speech perception is an important skill. However, lip-reading abilities vary considerably across individuals. There is currently a lack of normative data on lip-reading abilities in young normal-hearing listeners. This letter describes results obtained from a visual-only sentence recognition experiment using CUNY sentences and provides the mean number of words correct and the standard deviation for different sentence lengths. Additionally, the method for calculating T-scores is provided to facilitate the conversion between raw and standardized scores. This metric can be utilized by clinicians and researchers in lip-reading studies. This statistic provides a useful benchmark for determining whether an individual's lip-reading score falls within the normal range, or whether it is above or below this range.
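T-scores are conventionally a standardization to mean 50 and standard deviation 10 in the normative sample; a sketch with hypothetical norms (the actual CUNY sentence norms are reported in the letter itself):

```python
def t_score(raw, norm_mean, norm_sd):
    """Standardized T-score: 50 + 10 * (raw - norm mean) / norm SD."""
    return 50.0 + 10.0 * (raw - norm_mean) / norm_sd

# Hypothetical norms for some sentence length: mean 32 words correct, SD 8.
# A raw score one SD above the mean maps to T = 60.
t = t_score(40, 32, 8)
```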

6.
Speech intonation and focus location in matched statements and questions
An acoustical study of speech production was conducted to determine the manner in which the location of linguistic focus influences intonational attributes of duration and fundamental voice frequency (F0) in matched statements and questions. Speakers orally read sentences that were preceded by aurally presented stimuli designed to elicit either no focus or focus on the first or last noun phrase of the target sentences. Computer-aided acoustical analysis of word durations showed a localized, large magnitude increase in the duration of the focused word for both statements and questions. Analysis of F0 revealed a more complex pattern of results, with the shape of the F0 topline dependent on sentence type and focus location. For sentences with neutral or sentence-final focus, the difference in the F0 topline between questions and statements was evident only on the last key word, where the F0 peak of questions was considerably higher than that of statements. For sentences with focus on the first key word, there was no difference in peak F0 on the focused item itself, but the F0 toplines of questions and statements diverged quite dramatically following the initial word. The statement contour dropped to a low F0 value for the remainder of the sentence, whereas the question remained quite high in F0 for all subsequent words. In addition, the F0 contour on the focused word was rising in questions and falling in statements, regardless of focus location. The results provide a basis for work on the perception of linguistic focus.

7.
The rationale for a method to quantify the information content of linguistic stimuli, i.e., the linguistic entropy, is developed. The method is an adapted version of the letter-guessing procedure originally devised by Shannon [Bell Syst. Tech. J. 30, 50-64 (1951)]. It is applied to sentences included in a widely used test to measure speech-reception thresholds (SRT) and originally selected to be approximately equally redundant. Results of a first experiment reveal that this method enables one to detect subtle differences between sentences and sentence lists with respect to linguistic entropy. Results of a second experiment show that (1) in young listeners and with the sentences employed, manipulating linguistic entropy can result in an effect on SRT of approximately 4 dB in terms of signal-to-noise ratio; and (2) the range of this effect is approximately the same in elderly listeners.
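Shannon's procedure bounds the entropy of text using the distribution of the number of guesses each letter required. One simple upper-bound estimator (a sketch of the general idea, not necessarily the paper's exact adaptation) is the entropy of the empirical guess-count distribution:

```python
import math
from collections import Counter

def guessing_entropy_upper_bound(guess_counts):
    """Upper-bound entropy estimate (bits/letter) from a Shannon-style
    letter-guessing experiment: the entropy of the empirical distribution
    of the number of guesses each letter required."""
    n = len(guess_counts)
    freqs = Counter(guess_counts)
    return -sum((c / n) * math.log2(c / n) for c in freqs.values())

# Toy data (hypothetical): half the letters guessed on the first try,
# half on the second, giving 1 bit/letter.
h = guessing_entropy_upper_bound([1, 2, 1, 2, 1, 2, 1, 2])
```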

8.
Perception of interrupted speech and the influence of speech materials and memory load were investigated using one or two concurrent square-wave gating functions. Sentences (Experiment 1) and random one-, three-, and five-word sequences (Experiment 2) were interrupted using either a primary gating rate alone (0.5-24 Hz) or a combined primary and faster secondary rate. The secondary rate interrupted only speech left intact after primary gating, reducing the original speech to 25%. In both experiments, intelligibility increased with primary rate, but varied with memory load and speech material (highest for sentences, lowest for five-word sequences). With dual-rate gating of sentences, intelligibility with fast secondary rates was superior to that with single rates and a 25% duty cycle, approaching that of single rates with a 50% duty cycle for some low and high rates. For dual-rate gating of words, the positive effect of fast secondary gating was smaller than for sentences, and the advantage of sentences over word-sequences was not obtained in many dual-rate conditions. These findings suggest that integration of interrupted speech fragments after gating depends on the duration of the gated speech interval and that sufficiently robust acoustic-phonetic word cues are needed to access higher-level contextual sentence information.

9.
Design of a comprehensive Chinese speech database
Language is humanity's most important means of communication, and with the development of modern information technology it has also become an effective medium for communication between humans and machines. In recent years, countries around the world have built speech databases for their own languages as a foundation for speech science research and speech technology development. The speech materials of the comprehensive Chinese database include all Chinese tonal syllables, digit strings, isolated words, and prosodic materials, as well as syllable lists, word lists, sentence lists, and representative short passages for speech intelligibility testing. The database fully reflects the basic characteristics of Chinese in its linguistic, phonetic, and acoustic features. The first problem to be solved was material selection: the usage frequency of various linguistic units was taken into account, so that the corpus not only includes all high-frequency words but also covers a comprehensive range of phonetic phenomena. The database has an open, modular structure and is equipped with a flexible database management system.

10.
The simple up-down adaptive procedure is a common method for measuring speech reception thresholds. It is used by the Dutch speech-in-noise telephone screening test [National Hearing Test; Smits and Houtgast, Ear Hear. 26, 89-95 (2005)]. The test uses digit triplets to measure the speech reception threshold in noise by telephone (SRTT(n)). About 66 000 people took this test within four months of its introduction, and details of all individual measurements were stored. Analyses of this large volume of data revealed that the standard deviation of SRTT(n) estimates increases with hearing loss. This paper presents a calculation model which, using an intelligibility function as input, can determine the standard deviation of SRTT(n) estimates and the bias for the simple up-down procedure. The effects of variations in the slope of the intelligibility function, the guess rate, the starting level, and the heterogeneity of the speech material, as well as the possibilities of optimizing SRTT(n) measurements, were all explored with this model. The predicted decrease in the standard deviation of SRTT(n) estimates as a result of optimizing the speech material was confirmed by measurements in 244 listeners. The paper concludes by discussing possibilities for optimizing the development of comparable tests.
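The reported dependence of the standard deviation on hearing loss is consistent with a shallower intelligibility function producing noisier up-down tracks. A Monte Carlo sketch, assuming a logistic intelligibility function with zero guess rate (all parameter values are illustrative, not the paper's):

```python
import math
import random
import statistics

def intelligibility(snr, srt=-8.0, slope=0.15):
    """Assumed logistic intelligibility function; `slope` is the slope in
    proportion correct per dB at the SRT (midpoint)."""
    return 1.0 / (1.0 + math.exp(-4.0 * slope * (snr - srt)))

def simple_up_down(slope, n_trials=25, step=2.0, start=0.0, rng=None):
    """One simple up-down run: step down after a correct response, up after
    an incorrect one; estimate the SRT as the mean presented level after
    a short burn-in."""
    rng = rng or random.Random()
    level, levels = start, []
    for _ in range(n_trials):
        levels.append(level)
        level += -step if rng.random() < intelligibility(level, slope=slope) else step
    return statistics.mean(levels[4:])

def sd_of_estimates(slope, n_runs=2000, seed=1):
    rng = random.Random(seed)
    return statistics.stdev(simple_up_down(slope, rng=rng) for _ in range(n_runs))

# A shallower intelligibility function should yield noisier SRT estimates.
sd_steep = sd_of_estimates(0.20)
sd_shallow = sd_of_estimates(0.08)
```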

11.
This study investigated the relative contributions of consonants and vowels to the perceptual intelligibility of monosyllabic consonant-vowel-consonant (CVC) words. A noise replacement paradigm presented CVCs with only consonants or only vowels preserved. Results demonstrated no difference between overall word accuracy in these conditions; however, different error patterns were observed. A significant effect of lexical difficulty was demonstrated for both types of replacement, whereas the noise level used during replacement did not influence results. The contribution of consonant and vowel transitional information present at the consonant-vowel boundary was also explored. The proportion of speech presented, regardless of the segmental condition, overwhelmingly predicted performance. Comparisons were made with previous segment replacement results using sentences [Fogerty and Kewley-Port (2009). J. Acoust. Soc. Am. 126, 847-857]. Results demonstrated that consonants contribute to intelligibility equally in both isolated CVC words and sentences. However, vowel contributions were mediated by context, with greater contributions to intelligibility in sentence contexts. Therefore, it appears that vowels in sentences carry unique speech cues that greatly facilitate intelligibility but that are not informative and/or present in isolated word contexts. Consonants appear to provide speech cues that are equally available and informative during sentence and isolated word presentations.

12.
Older adults are known to benefit from supportive context in order to compensate for age-related reductions in perceptual and cognitive processing, including when comprehending spoken language in adverse listening conditions. In the present study, we examine how younger and older adults benefit from two types of contextual support, predictability from sentence context and priming, when identifying target words in noise-vocoded sentences. In the first part of the experiment, benefit from context based on primarily semantic knowledge was evaluated by comparing the accuracy of identification of sentence-final target words that were either highly predictable or not predictable from the sentence context. In the second part of the experiment, benefit from priming was evaluated by comparing the accuracy of identification of target words when noise-vocoded sentences were either primed or not by the presentation of the sentence context without noise vocoding and with the target word replaced with white noise. Younger and older adults benefited from each type of supportive context, with the most benefit realized when both types were combined. Supportive context reduced the number of noise-vocoded bands needed for 50% word identification more for older adults than their younger counterparts.

13.
Word recognition in sentences with and without context was measured in young and aged subjects with normal but not identical audiograms. Benefit derived from context by older adults has been obscured, in part, by the confounding effect of even mildly elevated thresholds, especially as listening conditions vary in difficulty. This problem was addressed here by precisely controlling signal-to-noise ratio across conditions and by accounting for individual differences in signal-to-noise ratio. Pure-tone thresholds and word recognition were measured in quiet and threshold-shaped maskers that shifted quiet thresholds by 20 and 40 dB. Word recognition was measured at several speech levels in each condition. Threshold was defined as the speech level (or signal-to-noise ratio) corresponding to the 50 rau point on the psychometric function. As expected, thresholds and slopes of psychometric functions were different for sentences with context compared to those for sentences without context. These differences were equivalent for young and aged subjects. Individual differences in word recognition among all subjects, young and aged, were accounted for by individual differences in signal-to-noise ratio. With signal-to-noise ratio held constant, word recognition for all subjects remained constant or decreased only slightly as speech and noise levels increased. These results suggest that, given equivalent speech audibility, older and younger listeners derive equivalent benefit from context.

14.
The three experiments reported here compare the effectiveness of natural prosodic and vocal-tract size cues at overcoming spatial cues in selective attention. Listeners heard two simultaneous sentences and decided which of two simultaneous target words came from the attended sentence. Experiment 1 used sentences that had natural differences in pitch and in level caused by a change in the location of the main sentence stress. The sentences' pitch contours were moved apart or together in order to separate out effects due to pitch and those due to other prosodic factors such as intensity. Both pitch and the other prosodic factors had an influence on which target word was reported, but the effects were not strong enough to override the spatial difference produced by an interaural time difference of +/- 91 microseconds. In experiment 2, a large (+/- 15%) difference in apparent vocal-tract size between the speakers of the two sentences had an additional and strong effect, which, in conjunction with the original prosodic differences, overrode an interaural time difference of +/- 181 microseconds. Experiment 3 showed that vocal-tract size differences of +/- 4% or less had no detectable effect. Overall, the results show that prosodic and vocal-tract size cues can override spatial cues in determining which target word belongs in an attended sentence.

15.
This study compares alternative modes of presenting sentences in testing situations where noise is employed and where the required response is only one word in the sentence. The purpose is to establish the extent to which contextual information is transmitted to the listener in the following four presentation modes: (1) acoustical presentation of the test word under noise and written presentation of the rest of the sentence (mode W); (2) acoustical presentation of sentence and test word under uniform noise (mode A); (3) superposition of the previous two modes (mode B); and (4) acoustical presentation of the test word under noise in one ear, immediately following the presentation of the rest of the sentence without noise in the other ear (mode C). Modes B and C are found to be essentially equivalent to mode W. When mode A is used, the intelligibility of the test word is substantially lower than with mode W, especially at low signal-to-noise (S/N) ratios. These results are particularly relevant to testing situations where the primary intent is to assess the utilization of contextual information in perceiving speech.

16.
The 36 basic "sentences" in the experiment were six-syllable nonsense sequences of the form DAS a LAS a GAS a or a DAS a BAS a LAS. Either (a) one vowel in the sentence was lengthened or shortened by about 50, 90, or 130 ms by computer-editing routines, or (b) the sentence was left intact (as spoken). The resulting perceptual impression after the vowel change was a change of tempo within the sentence. Vowel changes occurred systematically throughout the sentences, in one of syllables one through five. Reaction time (RT) was recorded to assigned target segments /b, d, or g/ in one of syllables one through six, and RT was compared to targets in tempo-changed versus intact sentences (these were acoustically identical except for the distorted vowel). The listeners responded to over 2000 versions of the sentences. The results were: (a) Tempo changes generally increased segment target RT. (b) Tempo-change effects were ubiquitous; for instance, vowel changes in the first syllable increased RT to targets in later syllables, and the reach of effects spanned four syllables. Both vowel shortening and lengthening increased target RT. (c) Effects attributed to processing time decreased, whereas effects attributed to stimulus expectancy increased, with time into the sentence. (d) Tempo-change effects persisted throughout the experiment despite practice and familiarity with stimuli. The conclusions were: (a) The effects of time distortion of the stimulus on target RT were produced mainly by changes in stimulus-induced expectancy, not changes in processing time. (b) The expected input to perception is the acoustically intact utterance in both its rhythmic and segmental aspects; these aspects are not perceived independently.

17.
Three experiments explored the resistance to simulated reverberation of various cues for selective attention. Listeners decided which of two simultaneous target words belonged to an attended rather than to a simultaneous unattended sentence. Attended and unattended sentences were spatially separated using interaural time differences (ITDs) of 0, +/-45, +/-91, or +/-181 microseconds. Experiment 1 used sentences resynthesized on a monotone, with sentence pairs having F0 differences of 0, 1, 2, or 4 semitones. Listeners' weak preference for the target word with the same monotonous F0 as the attended sentence was eliminated by reverberation. Experiment 1 also showed that listeners' ability to use ITD differences was seriously impaired by reverberation, although some ability remained for the longest ITD tested. In experiment 2 the sentences were spoken with natural prosody, with sentence stress in different places in the attended and unattended sentences. The overall F0 of each sentence was shifted by a constant amount on a log scale to bring the F0 trajectories of the target words either closer together or further apart. These prosodic manipulations were generally more resistant to reverberation than were the ITD differences. In experiment 3, adding a large difference in vocal-tract size (+/- 15%) to the prosodic cues produced a high level of performance which was very resistant to reverberation. The experiments show that the natural prosody and vocal-tract size differences between talkers that were used retain their efficacy in helping selective attention under conditions of reverberation better than do interaural time differences.

18.
The goal of this study was to establish the ability of normal-hearing listeners to discriminate formant frequency in vowels in everyday speech. Vowel formant discrimination in syllables, phrases, and sentences was measured for high-fidelity (nearly natural) speech synthesized by STRAIGHT [Kawahara et al., Speech Commun. 27, 187-207 (1999)]. Thresholds were measured for changes in F1 and F2 for the vowels /ɪ, ɛ, æ, ʌ/ in /bVd/ syllables. Experimental factors manipulated included phonetic context (syllables, phrases, and sentences), sentence discrimination with the addition of an identification task, and word position. Results showed that neither longer phonetic context nor the addition of the identification task significantly affected thresholds, while thresholds for word final position showed significantly better performance than for either initial or middle position in sentences. Results suggest that an average of 0.37 barks is required for normal-hearing listeners to discriminate vowel formants in modest length sentences, elevated by 84% compared to isolated vowels. Vowel formant discrimination in several phonetic contexts was slightly elevated for STRAIGHT-synthesized speech compared to formant-synthesized speech stimuli reported in the study by Kewley-Port and Zheng [J. Acoust. Soc. Am. 106, 2945-2958 (1999)]. These elevated thresholds appeared related to greater spectral-temporal variability for high-fidelity speech produced by STRAIGHT than for formant-synthesized speech.
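To relate the 0.37-bark criterion to frequency differences in Hz, a Hz-to-Bark conversion is needed; Traunmüller's approximation is one common choice (an assumption here; the abstract does not state which formula the paper used):

```python
def hz_to_bark(f):
    """Traunmüller's (1990) approximation of the Bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

def bark_to_hz(z):
    """Exact inverse of the approximation above."""
    return 1960.0 * (z + 0.53) / (26.28 - z)

# A 0.37-bark threshold around a 500-Hz F1 corresponds to a frequency
# difference of a few tens of Hz.
f1 = 500.0
delta_hz = bark_to_hz(hz_to_bark(f1) + 0.37) - f1
```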

19.
Estimates of the ability to make use of sentence context in 34 postlingually hearing-impaired (HI) individuals were obtained using formulas developed by Boothroyd and Nittrouer [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84, 101-114 (1988)] which relate scores for isolated words to words in meaningful sentences. Sentence materials were constructed by concatenating digitized productions of isolated words to ensure physical equivalence among the test items in the two conditions. Isolated words and words in sentences were tested at three levels of intelligibility (targeting 29%, 50%, and 79% correct). Thus, for each subject, three estimates of context ability, or k factors, were obtained. In addition, auditory, visual, and auditory-visual sentence recognition was evaluated using natural productions of sentence materials. Two main questions were addressed: (1) Is context ability constant for speech materials produced with different degrees of clarity? and (2) What are the relations between individual estimates of k and sentence recognition as a function of presentation modality? Results showed that estimates of k were not constant across different levels of intelligibility: k was greater for the more degraded condition relative to conditions of higher word intelligibility. Estimates of k also were influenced strongly by the test order of isolated words and words in sentences. That is, prior exposure to words in sentences improved later recognition of the same words when presented in isolation (and vice versa), even though the 1500 key words comprising the test materials were presented under degraded (filtered) conditions without feedback. The impact of this order effect was to reduce individual estimates of k for subjects exposed to sentence materials first and to increase estimates of k for subjects exposed to isolated words first.
Finally, significant relationships were found between individual k scores and sentence recognition scores in all three presentation modalities, suggesting that k is a useful measure of individual differences in the ability to use sentence context.
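The Boothroyd and Nittrouer k factor relates the probability of recognizing a word in isolation (p_i) to that of recognizing the same word in a meaningful sentence (p_s) via p_s = 1 - (1 - p_i)^k, which inverts directly; a sketch with illustrative scores (not taken from the paper):

```python
import math

def k_factor(p_isolated, p_in_sentence):
    """Context factor k from p_s = 1 - (1 - p_i)^k, i.e.,
    k = log(1 - p_s) / log(1 - p_i)."""
    return math.log(1.0 - p_in_sentence) / math.log(1.0 - p_isolated)

# Illustrative: recognizing 50% of words in isolation but 75% of the same
# words in sentences gives k = 2.
k = k_factor(0.50, 0.75)
```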

20.
A method is described to select sentence materials for efficient measurement of the speech reception threshold (SRT). The first part of the paper addresses the creation of the sentence materials, the recording procedure, and a listening experiment to evaluate the new speech materials. The result is a set of 1272 sentences, where every sentence has been uttered by two male and two female speakers. In the second part of the paper, a method is described to select subsets with properties that are desired for an efficient measurement of the SRT. For two speakers, this method has been applied to obtain two subsets for measurement of the SRT in stationary noise with the long-term average spectrum of speech. Lastly, a listening experiment has been conducted where the two subsets (each comprising 39 lists of 13 sentences each) are directly compared to the existing sets of Plomp and Mimpen [Audiology 18, 43-52 (1979)] and Smoorenburg [J. Acoust. Soc. Am. 91, 421-437 (1992)]. One of the outcomes is that the newly developed sets can be considered as equivalent to these existing sets.
