Similar Articles (20 results)
1.
Log-linear models, in conjunction with the G² statistic, were developed and applied to several existing sets of consonant confusion data. Significant interactions of consonant error patterns were found with signal-to-noise ratio (S/N), presentation level, vowel context, and low-pass and high-pass filtering. These variables also showed significant interactions with error patterns when categorized on the basis of feature classifications. Patterns of errors were significantly altered by S/N for place of articulation (front, middle, back), voicing, frication, and nasality. Low-pass filtering significantly affected error patterns when categorized by place of articulation, duration, or nasality, whereas high-pass filtering only affected voicing and frication error patterns. This paper also demonstrates the utility of log-linear modeling techniques in applications to confusion matrix analysis: specific effects can be tested; variant cells in a matrix can be isolated with respect to a particular model of interest; diagonal cells can be eliminated from the analysis; and the matrix can be collapsed across levels of variables, with no violation of independence. Finally, log-linear techniques are suggested for development of parsimonious and predictive models of speech perception.
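For context on the machinery: G² is the likelihood-ratio statistic 2 Σ O ln(O/E), comparing observed confusion counts O with the counts E expected under a log-linear model. The sketch below, using invented counts rather than the paper's data, tests an independence model and then a quasi-independence model that deletes the diagonal cells, as the abstract describes:

```python
import numpy as np

def g2(obs, exp):
    """Likelihood-ratio statistic G^2 = 2 * sum O ln(O/E); cells with O = 0 contribute 0."""
    m = obs > 0
    return 2.0 * np.sum(obs[m] * np.log(obs[m] / exp[m]))

# Hypothetical 4x4 confusion matrix (rows = stimulus consonant, cols = response).
obs = np.array([[50.,  5.,  3.,  2.],
                [ 6., 48.,  4.,  2.],
                [ 4.,  6., 45.,  5.],
                [ 3.,  2.,  7., 48.]])

# Independence log-linear model: E_ij = (row_i * col_j) / N.
exp_ind = np.outer(obs.sum(1), obs.sum(0)) / obs.sum()
print(f"G2 (independence): {g2(obs, exp_ind):.1f}")

# Quasi-independence: treat the diagonal (correct responses) as structural
# zeros and fit independence to the off-diagonal cells by iterative
# proportional fitting (IPF).
off = ~np.eye(4, dtype=bool)
o = obs * off
fit = np.where(off, o.sum() / off.sum(), 0.0)
for _ in range(500):                      # alternately match row and column margins
    fit *= np.where(off, (o.sum(1) / np.maximum(fit.sum(1), 1e-12))[:, None], 1.0)
    fit *= np.where(off, (o.sum(0) / np.maximum(fit.sum(0), 1e-12))[None, :], 1.0)
print(f"G2 (quasi-independence, diagonal deleted): {g2(o, fit):.1f}")
```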

2.
Two effects of reverberation on the identification of consonants were evaluated for ten normal-hearing subjects: (1) the overlap of energy of a preceding consonant on the following consonant, called "overlap-masking"; and (2) the internal temporal smearing of energy within each consonant, called "self-masking." The stimuli were eight consonants /p,t,k,f,m,n,l,w/. The consonants were spoken in /s-at/ context (experiment 1) and generated by a speech synthesizer in /s-at/ and /-at/ contexts (experiment 2). In both experiments, identification of consonants was tested in four conditions: (1) quiet, without degradations; (2) with a babble of voices; (3) with noise that was shaped like either natural or synthetic /s/ for the two experiments, respectively; and (4) with room reverberation. The results for the natural and synthetic syllables indicated that the effect of reverberation on identification of consonants following /s/ was not comparable to masking by either the /s/-spectrum-shaped noise or the babble. In addition, the results for the synthetic syllables indicated that most of the errors in reverberation for the /s-at/ context were similar to a sum of errors in two conditions: (1) with /s/-shaped noise causing overlap masking; and (2) with reverberation causing self-masking within each consonant.

3.
Previous research has demonstrated reduced speech recognition when speech is presented at higher-than-normal levels (e.g., above conversational speech levels), particularly in the presence of speech-shaped background noise. Persons with hearing loss frequently listen to speech-in-noise at these levels through hearing aids, which incorporate multiple-channel, wide dynamic range compression. This study examined the interactive effects of signal-to-noise ratio (SNR), speech presentation level, and compression ratio on consonant recognition in noise. Nine subjects with normal hearing identified CV and VC nonsense syllables in a speech-shaped noise at two SNRs (0 and +6 dB), three presentation levels (65, 80, and 95 dB SPL) and four compression ratios (1:1, 2:1, 4:1, and 6:1). Stimuli were processed through a simulated three-channel, fast-acting, wide dynamic range compression hearing aid. Consonant recognition performance decreased as compression ratio increased and presentation level increased. Interaction effects were noted between SNR and compression ratio, as well as between presentation level and compression ratio. Performance decrements due to increases in compression ratio were larger at the better (+6 dB) SNR and at the lowest (65 dB SPL) presentation level. At higher levels (95 dB SPL), such as those experienced by persons with hearing loss, increasing compression ratio did not significantly affect speech intelligibility.
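As a concrete reference, here is a single-channel sketch of the kind of fast-acting wide dynamic range compression simulated in the study; the threshold, time constants, and the full-scale-to-SPL mapping are illustrative assumptions, and the study's three-channel simulator would band-split the signal, compress each band like this, and sum:

```python
import numpy as np

def wdrc_channel(x, fs, ratio=2.0, threshold_db=45.0,
                 attack_ms=5.0, release_ms=50.0, full_scale_spl=100.0):
    """One channel of a fast-acting wide dynamic range compressor (sketch).

    Input level changes above `threshold_db` (in assumed dB SPL) are divided
    by `ratio` at the output; all parameter values are illustrative.
    """
    a_att = np.exp(-1.0 / (fs * attack_ms / 1000.0))    # fast attack coefficient
    a_rel = np.exp(-1.0 / (fs * release_ms / 1000.0))   # slower release coefficient
    env = np.empty_like(x)
    e = 1e-9
    for n, v in enumerate(np.abs(x)):                   # one-pole envelope follower
        a = a_att if v > e else a_rel
        e = a * e + (1.0 - a) * v
        env[n] = e
    # Map digital amplitude to an assumed SPL, then apply the static curve:
    # unity gain below threshold, input/output slope 1/ratio above it.
    level_db = 20.0 * np.log10(np.maximum(env, 1e-9)) + full_scale_spl
    gain_db = np.maximum(level_db - threshold_db, 0.0) * (1.0 / ratio - 1.0)
    return x * 10.0 ** (gain_db / 20.0)

fs = 16000
t = np.arange(fs) / fs
syllable = 0.5 * np.sin(2 * np.pi * 1000 * t)           # stand-in for a CV token
compressed = wdrc_channel(syllable, fs, ratio=4.0)      # one of the study's ratios
```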

4.
In contrast to the availability of consonant confusion studies with adults, to date, no investigators have compared children's consonant confusion patterns in noise to those of adults in a single study. To examine whether children's error patterns are similar to those of adults, three groups of children (24 each aged 4-5, 6-7, and 8-9 years) and 24 adult native speakers of American English (AE) performed a recognition task for 15 AE consonants in /ɑ/-consonant-/ɑ/ nonsense syllables presented in a background of speech-shaped noise. Three signal-to-noise ratios (SNR: 0, +5, and +10 dB) were used. Although performance improved as a function of age, the overall consonant recognition accuracy as a function of SNR improved at a similar rate for all groups. Detailed analyses using phonetic features (manner, place, and voicing) revealed that stop consonants were the most problematic for all groups. In addition, for the younger children, front consonants presented in the 0 dB SNR condition were more error prone than others. These results suggest that children's use of phonetic cues does not develop at the same rate for all phonetic features.

5.
The responses of four high-spontaneous fibers from a damaged cat cochlea responding to naturally uttered consonant-vowel (CV) syllables [m], [p], and [t], each with [a], [i], and [u] in four different levels of noise were simulated using a two-stage computer model. At the lowest noise level [+30 dB signal-to-noise (S/N) ratio], the responses of the models of the three fibers from a heavily damaged portion of the cochlea [characteristic frequencies (CFs) from 1.6 to 2.14 kHz] showed quite different response patterns from those of fibers in normal cochleas: There was little response to the noise alone, the consonant portions of the syllables evoked small-amplitude wide-bandwidth complexes, and the vowel-segment response synchrony was often masked by low-frequency components, especially the first formant. At the next level of noise (S/N = 20 dB), spectral information regarding the murmur segments of the [m] syllables was essentially lost. At the highest noise levels used (S/N = +10 and 0 dB), the noise was almost totally disruptive of coding of the spectral peaks of the consonant portions of the stop CVs. Possible implications of the results with regard to the understanding of speech by hearing-impaired listeners are discussed.

6.
Consonant recognition in quiet using the Nonsense Syllable Test (NST) [Resnick et al., J. Acoust. Soc. Am. Suppl. 1 58, S114 (1975)] was investigated in 62 normal hearing subjects 20 to 65 years of age at their most comfortable listening levels (MCLs) and at 8 dB above and below MCL. Although overall consonant recognition performance was high (as expected for normal listeners), the effects of age decade, relative presentation level, and NST subsets were all significant, as was the interaction of age × level. The interactions of age × NST subset, and age × subset × level were nonsignificant. These findings suggest that consonant recognition decreases with normal aging, particularly below MCL. However, the relative perceptual difficulty of the seven subtests is the same across age groups. Confusion matrices were similar across levels and age groups. Percent information transmitted for several consonant features was calculated from the confusion matrices. Older subjects showed decrements in performance primarily for the features recognized relatively less accurately by the younger subjects. The results suggest that normal hearing older individuals listening in quiet have decreased consonant recognition ability, but that their confusions are similar to those of younger persons.
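"Percent information transmitted" here is the Miller and Nicely (1955) measure: collapse the consonant confusion matrix over a feature classification and express the transmitted information relative to the stimulus entropy. A minimal sketch of that computation, using invented counts and a hypothetical two-class voicing grouping:

```python
import numpy as np

def percent_info_transmitted(conf, feature):
    """Relative information transmitted for a feature (Miller & Nicely, 1955).

    `conf` is an NxN count matrix; `feature` maps each of the N consonants to
    a feature class (e.g., 0 = voiceless, 1 = voiced). Returns 100 * I(X;Y) / H(X).
    """
    k = max(feature) + 1
    fmat = np.zeros((k, k))
    for i, fi in enumerate(feature):          # collapse consonants into feature classes
        for j, fj in enumerate(feature):
            fmat[fi, fj] += conf[i, j]
    p = fmat / fmat.sum()
    px, py = p.sum(1), p.sum(0)
    nz = p > 0
    mi = np.sum(p[nz] * np.log2(p[nz] / (px[:, None] * py[None, :])[nz]))
    hx = -np.sum(px[px > 0] * np.log2(px[px > 0]))
    return 100.0 * mi / hx

# Hypothetical 4-consonant matrix and a binary voicing feature.
conf = np.array([[40., 8., 1., 1.], [9., 39., 1., 1.],
                 [2., 1., 42., 5.], [1., 2., 6., 41.]])
print(f"voicing info transmitted: {percent_info_transmitted(conf, [0, 0, 1, 1]):.1f}%")
```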

7.
The purpose of this study is to specify the contribution of certain frequency regions to consonant place perception for normal-hearing listeners and listeners with high-frequency hearing loss, and to characterize the differences in stop-consonant place perception among these listeners. Stop-consonant recognition and error patterns were examined at various speech-presentation levels and under conditions of low- and high-pass filtering. Subjects included 18 normal-hearing listeners and a homogeneous group of 10 young, hearing-impaired individuals with high-frequency sensorineural hearing loss. Differential filtering effects on consonant place perception were consistent with the spectral composition of acoustic cues. Differences in consonant recognition and error patterns between normal-hearing and hearing-impaired listeners were observed when the stimulus bandwidth included regions of threshold elevation for the hearing-impaired listeners. Thus place-perception differences among listeners are, for the most part, associated with stimulus bandwidths corresponding to regions of hearing loss.

8.
The classic [MN55] confusion matrix experiment (16 consonants, white noise masker) was repeated using computerized procedures similar to those of Phatak and Allen (2007) ["Consonant and vowel confusions in speech-weighted noise," J. Acoust. Soc. Am. 121, 2312-2316]. The consonant scores in white noise can be categorized in three sets: a low-error set [/m/, /n/], an average-error set [/p/, /t/, /k/, /s/, /ʃ/, /d/, /g/, /z/, /ʒ/], and a high-error set [/f/, /θ/, /b/, /v/, /ð/]. The consonant confusions match those from MN55, except for the highly asymmetric voicing confusions of fricatives, biased in favor of voiced consonants. Masking noise can not only reduce the recognition of a consonant, but also perceptually morph it into another consonant. There is a significant and systematic variability in the scores and confusion patterns of different utterances of the same consonant, which can be characterized as (a) confusion heterogeneity, where the competitors in the confusion groups of a consonant vary, and (b) threshold variability, where the confusion threshold [i.e., the signal-to-noise ratio (SNR) and score at which the confusion group is formed] varies. The average consonant error and the errors for most of the individual consonants and consonant sets can be approximated as exponential functions of the articulation index (AI). An AI based on the peak-to-rms ratios of speech can explain the SNR differences across experiments.

9.
Studies on consonant perception under noise conditions typically describe the average consonant error as exponential in the Articulation Index (AI). While this AI formula nicely fits the average error over all consonants, it does not fit the error for any consonant at the utterance level. This study analyzes the error patterns of six stop consonants /p, t, k, b, d, g/ with four vowels (/ɑ/, /ε/, /ɪ/, /æ/) at the individual consonant (i.e., utterance) level. The findings include that the utterance error is essentially zero at signal-to-noise ratios (SNRs) of at least -2 dB for >78% of the stop consonant utterances. For these utterances, the error is essentially a step function of SNR at the utterance's detection threshold. This binary error dependence is consistent with the audibility of a single binary defining acoustic feature, having zero error above the feature's detection threshold. A further 11% of the sounds have high error, defined as ≥20% at SNRs of -2 dB or above. A grand average across many such sounds, having a natural distribution of thresholds, results in the error being exponential in the AI measure, as observed. A detailed analysis of the variance from the AI error is provided, along with a Bernoulli-trials analysis of the statistical significance.
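A rough numerical illustration of the abstract's central argument, under assumed values: if each utterance has a binary (step-function) error at its own feature-detection threshold, averaging across a natural spread of thresholds yields the smooth, roughly exponential grand-average error that AI models fit. The e_min parameterization below is a common textbook form, not the paper's fitted model:

```python
import numpy as np

# Illustrative "error exponential in AI" form: e(AI) = e_min**AI, i.e., 100%
# error at AI = 0 falling to e_min at AI = 1 (assumed form and values).
def avg_error(ai, e_min=0.015):
    return e_min ** np.clip(ai, 0.0, 1.0)

# Per-utterance error as the paper describes it: a step function of SNR at the
# utterance's single defining-feature detection threshold.
def utterance_error(snr_db, threshold_db):
    return np.where(snr_db >= threshold_db, 0.0, 1.0)

# Grand-averaging step functions over a spread of thresholds produces a smooth
# curve of the kind the AI formula fits (the threshold spread here is made up).
rng = np.random.default_rng(0)
thresholds = rng.normal(-8.0, 4.0, size=1000)
snr = np.linspace(-20.0, 10.0, 61)
grand_average = np.mean([utterance_error(snr, t) for t in thresholds], axis=0)
```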

10.
Voice onset time (VOT) signifies the interval between consonant onset and the start of rhythmic vocal-cord vibrations. Differential perception of consonants such as /d/ and /t/ is categorical in American English, with the boundary generally lying at a VOT of 20-40 ms. This study tests whether previously identified response patterns that differentially reflect VOT are maintained in large-scale population activity within primary auditory cortex (A1) of the awake monkey. Multiunit activity and current source density patterns evoked by the syllables /da/ and /ta/ with variable VOTs are examined. Neural representation is determined by the tonotopic organization. Differential response patterns are restricted to lower best-frequency regions. Response peaks time-locked to both consonant and voicing onsets are observed for syllables with a 40- and 60-ms VOT, whereas syllables with a 0- and 20-ms VOT evoke a single response time-locked only to consonant onset. Duration of aspiration noise is represented in higher best-frequency regions. Representation of VOT and aspiration noise in discrete tonotopic areas of A1 suggests that integration of these phonetic cues occurs in secondary areas of auditory cortex. Findings are consistent with the evolving concept that complex stimuli are encoded by synchronized activity in large-scale neuronal ensembles.

11.
In a recent study [S. Gordon-Salant, J. Acoust. Soc. Am. 80, 1599-1607 (1986)], young and elderly normal-hearing listeners demonstrated significant improvements in consonant-vowel (CV) recognition with acoustic modification of the speech signal incorporating increments in the consonant-vowel ratio (CVR). Acoustic modification of consonant duration failed to enhance performance. The present study investigated whether consonant recognition deficits of elderly hearing-impaired listeners would be reduced by these acoustic modifications, as well as by increases in speech level. Performance of elderly hearing-impaired listeners with gradually sloping and sharply sloping sensorineural hearing losses was compared to performance of elderly normal-threshold listeners (reported previously) for recognition of a variety of nonsense syllable stimuli. These stimuli included unmodified CVs, CVs with increases in CVR, CVs with increases in consonant duration, and CVs with increases in both CVR and consonant duration. Stimuli were presented at each of two speech levels in a background of noise. Results obtained from the hearing-impaired listeners agreed with those observed previously from normal-hearing listeners. Differences in performance between the three subject groups as a function of level were also observed.

12.
This study investigated the relative contributions of consonants and vowels to the perceptual intelligibility of monosyllabic consonant-vowel-consonant (CVC) words. A noise replacement paradigm presented CVCs with only consonants or only vowels preserved. Results demonstrated no difference between overall word accuracy in these conditions; however, different error patterns were observed. A significant effect of lexical difficulty was demonstrated for both types of replacement, whereas the noise level used during replacement did not influence results. The contribution of consonant and vowel transitional information present at the consonant-vowel boundary was also explored. The proportion of speech presented, regardless of the segmental condition, overwhelmingly predicted performance. Comparisons were made with previous segment replacement results using sentences [Fogerty and Kewley-Port (2009), J. Acoust. Soc. Am. 126, 847-857]. Results demonstrated that consonants contribute to intelligibility equally in both isolated CVC words and sentences. However, vowel contributions were mediated by context, with greater contributions to intelligibility in sentence contexts. Therefore, it appears that vowels in sentences carry unique speech cues that greatly facilitate intelligibility, which are not informative and/or present during isolated word contexts. Consonants appear to provide speech cues that are equally available and informative during sentence and isolated word presentations.
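A minimal sketch of the noise-replacement paradigm described above: keep chosen segments of a CVC token and overwrite the rest with masking noise. The segment boundaries, noise type (white rather than speech-shaped), and levels here are all illustrative assumptions:

```python
import numpy as np

def noise_replace(signal, fs, keep_spans, noise_rms=0.05, seed=0):
    """Noise-replacement paradigm (sketch): keep the (start, end) spans, in
    seconds, and overwrite everything else with masking noise. White noise
    stands in for the speech-shaped noise a real study would use."""
    out = noise_rms * np.random.default_rng(seed).standard_normal(len(signal))
    for start, end in keep_spans:
        i, j = int(start * fs), int(end * fs)
        out[i:j] = signal[i:j]        # preserved speech segment
    return out

fs = 16000
cvc = np.random.default_rng(1).standard_normal(fs // 2)   # stand-in for a 0.5 s CVC token
# Consonant-only condition: keep initial and final consonants, replace the vowel.
consonants_only = noise_replace(cvc, fs, [(0.00, 0.12), (0.38, 0.50)])
# Vowel-only condition: keep the vowel nucleus, replace both consonants.
vowel_only = noise_replace(cvc, fs, [(0.12, 0.38)])
```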

13.
Responses of auditory-nerve fibers in anesthetized cats to nine different spoken stop- and nasal-consonant/vowel syllables presented at 70 dB SPL in various levels of speech-shaped noise [signal-to-noise (S/N) ratios of 30, 20, 10, and 0 dB] are reported. The temporal aspects of speech encoding were analyzed using spectrograms. The responses of the "lower-spontaneous-rate" fibers (less than 20/s) were found to be more limited than those of the high-spontaneous-rate fibers. The lower-spontaneous-rate fibers did not encode noise-only portions of the stimulus at the lowest noise level (S/N = 30 dB) and only responded to the consonant if there was a formant or major spectral peak near its characteristic frequency. The fibers' responses at the higher noise levels were compared to those obtained at the lowest noise level using the covariance as a quantitative measure of signal degradation. The lower-spontaneous-rate fibers were found to preserve more of their initial temporal encoding than high-spontaneous-rate fibers of the same characteristic frequency. The auditory-nerve fibers' responses were also analyzed for rate-place encoding of the stimuli. The results are similar to those found for temporal encoding.

14.
Binaural speech intelligibility in noise for hearing-impaired listeners
The effect of head-induced interaural time delay (ITD) and interaural level differences (ILD) on binaural speech intelligibility in noise was studied for listeners with symmetrical and asymmetrical sensorineural hearing losses. The material, recorded with a KEMAR manikin in an anechoic room, consisted of speech, presented from the front (0 degrees), and noise, presented at azimuths of 0, 30, and 90 degrees. Derived noise signals, containing either only ITD or only ILD, were generated using a computer. For both groups of subjects, speech-reception thresholds (SRT) for sentences in noise were determined as a function of: (1) noise azimuth, (2) binaural cue, and (3) an interaural difference in overall presentation level, simulating the effect of a monaural hearing aid. Comparison of the mean results with corresponding data obtained previously from normal-hearing listeners shows that the hearing impaired have a 2.5 dB higher SRT in noise when both speech and noise are presented from the front, and 2.6-5.1 dB less binaural gain when the noise azimuth is changed from 0 to 90 degrees. The gain due to ILD varies among the hearing-impaired listeners between 0 dB and normal values of 7 dB or more. It depends on the high-frequency hearing loss at the side presented with the most favorable signal-to-noise (S/N) ratio. The gain due to ITD is nearly normal for the symmetrically impaired (4.2 dB, compared with 4.7 dB for the normal hearing), but only 2.5 dB in the case of asymmetrical impairment. When ITD is introduced in noise already containing ILD, the resulting gain is 2-2.5 dB for all groups. The only marked effect of the interaural difference in overall presentation level is a reduction of the gain due to ILD when the level at the ear with the better S/N ratio is decreased. This implies that an optimal monaural hearing aid (with a moderate gain) will hardly interfere with unmasking through ITD, while it may increase the gain due to ILD by preventing or diminishing threshold effects.
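For concreteness, a sketch of how "ITD-only" and "ILD-only" derived noise signals can be generated. The study derived its cues from KEMAR recordings, so the real ITD and ILD are frequency dependent; the constant broadband values below are a simplifying assumption:

```python
import numpy as np

def itd_only(noise, fs, itd_us=600.0):
    """Derived binaural noise carrying only an interaural time delay:
    identical levels at the two ears, one ear delayed by `itd_us` microseconds."""
    d = int(round(fs * itd_us * 1e-6))
    delayed = np.concatenate([np.zeros(d), noise])[:len(noise)]
    return noise, delayed                                    # (left, right)

def ild_only(noise, ild_db=10.0):
    """Derived binaural noise carrying only an interaural level difference:
    zero delay, one ear attenuated by `ild_db` dB."""
    return noise, noise * 10.0 ** (-ild_db / 20.0)

fs = 44100
masker = np.random.default_rng(3).standard_normal(fs)        # 1 s of noise
left_t, right_t = itd_only(masker, fs)                       # lateralized by timing alone
left_l, right_l = ild_only(masker)                           # lateralized by level alone
```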

15.
In digital speckle pattern interferometry measurements, the heavy noise in speckle-interferometry phase-fringe images severely degrades the results and accuracy of phase unwrapping. To address this problem, a method combining sine/cosine decomposition of the fringes with frequency-domain low-pass filtering is presented, achieving high-accuracy filtering of speckle-interferometry phase-fringe maps. The basic idea is that, before the phase image is filtered, the phase map is mapped through the sine and cosine functions into two images; each image is low-pass filtered in the frequency domain, and the two results are then recombined into a phase map. This decomposition-based frequency-domain filtering effectively preserves the phase-jump information while filtering. Experimental results show that, compared with conventional image-denoising methods, the method removes the speckle noise in the image well while retaining the "peak" information; it is simple and effective, and it resolves the 10%-40% loss of gray-level information that occurs when conventional filtering methods are applied to phase-fringe maps.
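A minimal sketch of the sine/cosine decomposition filter described above: the wrapped phase map is mapped through sin and cos, each image is low-pass filtered in the frequency domain, and the pair is recombined with atan2, which preserves the 2π phase jumps. The ideal circular low-pass filter and cutoff below are illustrative choices, not the paper's exact filter:

```python
import numpy as np

def sin_cos_filter(phase, cutoff=0.08):
    """Sine/cosine decomposition filtering of a wrapped phase-fringe map:
    low-pass filter sin(phase) and cos(phase) separately in the frequency
    domain, then recombine with atan2 to preserve the 2*pi phase jumps."""
    def lowpass(img):
        F = np.fft.fftshift(np.fft.fft2(img))
        ny, nx = img.shape
        y, x = np.ogrid[-(ny // 2):ny - ny // 2, -(nx // 2):nx - nx // 2]
        mask = (x / (cutoff * nx)) ** 2 + (y / (cutoff * ny)) ** 2 <= 1.0
        return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
    return np.arctan2(lowpass(np.sin(phase)), lowpass(np.cos(phase)))

# Quick self-check on a synthetic noisy wrapped-phase image (values made up).
ny, nx = 256, 256
yy, xx = np.mgrid[0:ny, 0:nx]
clean = np.angle(np.exp(1j * (0.15 * xx + 8e-5 * (yy - 128) ** 2)))
noisy = np.angle(np.exp(1j * (clean + 0.8 * np.random.default_rng(2).standard_normal((ny, nx)))))
filtered = sin_cos_filter(noisy)
```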

16.
The ability to localize a click train in the frontal-horizontal plane was measured in quiet and in the presence of a white-noise masker. The experiment tested the effects of signal frequency, signal-to-noise ratio (S/N), and masker location. Clicks were low-pass filtered at 11 kHz in the broadband condition, low-pass filtered at 1.6 kHz in the low-pass condition, and bandpass filtered between 1.6 and 11 kHz in the high-pass condition. The masker was presented at either -90, 0, or +90 deg azimuth. Six signal-to-noise ratios were used, ranging from -9 to +18 dB. Results obtained with four normal-hearing listeners show that (1) for all masker locations and filtering conditions, localization accuracy remains unaffected by noise down to 0-6 dB S/N and decreases at more adverse signal-to-noise ratios; (2) for all filtering conditions and at low signal-to-noise ratios, the effect of noise is greater when noise is presented at +/-90 deg azimuth than at 0 deg azimuth; (3) the effect of noise is similar for all filtering conditions when noise is presented at 0 deg azimuth; and (4) when noise is presented at +/-90 deg azimuth, the effect of noise is similar for the broadband and high-pass conditions, but greater for the low-pass condition. These results suggest that the low- and high-frequency cues used to localize sounds are equally affected when noise is presented at 0 deg azimuth. However, low-frequency cues are less resistant to noise than high-frequency cues when noise is presented at +/-90 deg azimuth. When both low- and high-frequency cues are available, listeners base their decision on the cues providing the most accurate estimate of the direction of the sound source (high-frequency cues). Parallel measures of click detectability suggest that the poorer localization accuracy observed when noise is at +/-90 deg azimuth may be caused by a reduction in the detectability of the signal at the ear ipsilateral to the noise.

17.
English consonant recognition in undegraded and degraded listening conditions was compared for listeners whose primary language was either Japanese or American English. There were ten subjects in each of the two groups, termed the non-native (Japanese) and the native (American) subjects, respectively. The Modified Rhyme Test was degraded either by a babble of voices (S/N = -3 dB) or by room reverberation (reverberation time, T = 1.2 s). The Japanese subjects performed at a lower level than the American subjects in both noise and reverberation, although the performance difference in the undegraded, quiet condition was relatively small. There was no difference between the scores obtained in noise and in reverberation for either group. A limited error analysis revealed some differences in the types of errors made by the two groups of listeners. Implications of the results are discussed in terms of the effects of degraded listening conditions on non-native listeners' speech perception.

18.
The raw point clouds acquired by a laser pose sensor are noisy, and using them directly in pose computation consumes excessive on-board computing resources. To address these problems, a point-cloud filtering and feature-extraction algorithm for pose measurement of non-cooperative space targets is presented. Simulations verify that the algorithm effectively removes random spatial noise and downsamples the point cloud, and that the extracted feature points are robust to changes in target pose and to Gaussian measurement noise. On a full physical test bed for fly-around, approach, and capture of a non-cooperative target, with the raw point clouds acquired by a scanning laser pose sensor as input, the algorithm's performance in actual pose measurement of space targets was verified. The test results show that the algorithm achieves 93.1% downsampling of the raw point cloud and saves 92.9% of the pose-computation time, effectively improving the efficiency of on-board data processing and the real-time performance of attitude computation.
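The paper does not specify its filtering and downsampling operators, so the sketch below uses two generic stand-ins common in point-cloud pipelines: statistical outlier removal for the random noise and voxel-grid downsampling for the data reduction. All parameter values are assumptions:

```python
import numpy as np

def remove_outliers(points, k=8, std_ratio=2.0):
    """Statistical outlier removal: drop points whose mean distance to their
    k nearest neighbors exceeds mean + std_ratio * std (O(N^2) brute force)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    knn = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)   # skip the zero self-distance
    keep = knn <= knn.mean() + std_ratio * knn.std()
    return points[keep]

def voxel_downsample(points, voxel=0.05):
    """Voxel-grid downsampling: replace all points falling in one voxel by
    their centroid. `voxel` is the cell edge length in the cloud's units."""
    keys = np.floor(points / voxel).astype(np.int64)
    uniq, inv = np.unique(keys, axis=0, return_inverse=True)
    inv = inv.reshape(-1)
    counts = np.bincount(inv).astype(float)
    out = np.zeros((len(uniq), 3))
    for dim in range(3):                                 # per-voxel centroid
        out[:, dim] = np.bincount(inv, weights=points[:, dim]) / counts
    return out

cloud = np.random.default_rng(4).normal(size=(2000, 3))  # stand-in for a scan
cloud = remove_outliers(cloud)                           # filter random noise
cloud = voxel_downsample(cloud, voxel=0.2)               # reduction ratio is scene-dependent
```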

19.
Speech perception requires the integration of information from multiple phonetic and phonological dimensions. A sizable literature exists on the relationships between multiple phonetic dimensions and single phonological dimensions (e.g., spectral and temporal cues to stop consonant voicing). A much smaller body of work addresses relationships between phonological dimensions, and much of this has focused on sequences of phones. However, strong assumptions about the relevant set of acoustic cues and/or the (in)dependence between dimensions limit previous findings in important ways. Recent methodological developments in the general recognition theory framework enable tests of a number of these assumptions and provide a more complete model of distinct perceptual and decisional processes in speech sound identification. A hierarchical Bayesian Gaussian general recognition theory model was fit to data from two experiments investigating identification of English labial stop and fricative consonants in onset (syllable initial) and coda (syllable final) position. The results underscore the importance of distinguishing between conceptually distinct processing levels and indicate that, for individual subjects and at the group level, integration of phonological information is partially independent with respect to perception and that patterns of independence and interaction vary with syllable position.

20.
Acoustic information about the place of articulation of a prevocalic nasal consonant is distributed over two distinct signal portions, the nasal murmur and the onset of the following vowel. The spectral properties of these signal portions are perceptually important, as is their relationship (the pattern of spectral change). A series of experiments was conducted to investigate to what extent relational place of articulation information derives from a peripheral auditory interaction, viz., short-term adaptation caused by the murmur. Experimental manipulations intended to disrupt the effects of such adaptation included separation of the murmur and the vowel by intervals of silence, presentation to different ears, and reversal of order. Other tests of the possible role of adaptation included manipulation of murmur duration, murmur-vowel cross splicing, and high-pass filtering of the excised vowel onset. While the results of several experiments were compatible with the peripheral adaptation hypothesis, others did not support it. An alternative hypothesis, that the manner cues provided by the murmur are crucial for accurate place judgments, was also discredited. It was concluded that, at least under good listening conditions, the perception of spectral relationships does not depend on peripheral auditory enhancement and probably rests on a central comparison process.
