Similar articles
Found 20 similar articles (search time: 15 ms)
1.
The auditory system takes advantage of early reflections (ERs) in a room by integrating them with the direct sound (DS) and thereby increasing the effective speech level. In the present paper the benefit from realistic ERs on speech intelligibility in diffuse speech-shaped noise was investigated for normal-hearing and hearing-impaired listeners. Monaural and binaural speech intelligibility tests were performed in a virtual auditory environment where the spectral characteristics of ERs from a simulated room could be preserved. The useful ER energy was derived from the speech intelligibility results and the efficiency of the ERs was determined as the ratio of the useful ER energy to the total ER energy. Even though ER energy contributed to speech intelligibility, DS energy was always more efficient, leading to better speech intelligibility for both groups of listeners. The efficiency loss for the ERs was mainly ascribed to their altered spectrum compared to the DS and to the filtering by the torso, head, and pinna. No binaural processing other than a binaural summation effect could be observed.
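
The efficiency ratio described in this abstract can be illustrated with a minimal Python sketch. The function names, the dB-to-energy conversion, and the assumption that the effective speech level is the DS energy plus the efficiency-weighted ER energy are illustrative choices, not the authors' published procedure.

```python
import math


def er_efficiency(useful_er_energy: float, total_er_energy: float) -> float:
    """Ratio of useful ER energy to total ER energy (linear energy units)."""
    if total_er_energy <= 0:
        raise ValueError("total ER energy must be positive")
    return useful_er_energy / total_er_energy


def db_to_energy(level_db: float) -> float:
    """Convert a level in dB to a relative linear energy."""
    return 10 ** (level_db / 10)


def effective_level_db(ds_db: float, er_db: float, efficiency: float) -> float:
    """Level of DS energy plus efficiency-weighted ER energy.

    Assumption: useful energy adds linearly; this is a simplification
    made for illustration only.
    """
    energy = db_to_energy(ds_db) + efficiency * db_to_energy(er_db)
    return 10 * math.log10(energy)
```

Under these assumptions, an efficiency of 1 means the ERs help as much as the same amount of DS energy would, and an efficiency of 0 means they add nothing to the effective level.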

2.
The ability of subjects to identify vowels in vibrotactile transformations of consonant-vowel syllables was measured for two types of displays: a spectral display (frequency by intensity), and a vocal tract area function display (vocal tract location by cross-sectional area). Both displays were presented to the fingertip via the tactile display of the Optacon transducer. In the first experiments the spectral display was effective for identifying vowels in /b/V/ context when as many as 24 or as few as eight spectral channels were presented to the skin. However, performance fell when the 12- and 8-channel displays were reduced in size to occupy 1/2 or 1/3 of the 24-row tactile matrix. The effect of reducing the size of the display was greater when the spectrum was represented as a solid histogram ("filled" patterns) than when it was represented as a simple spectral contour ("unfilled" patterns). Spatial masking within the filled pattern was postulated as the cause for this decline in performance. Another experiment measured the utility of the spectral display when the syllables were produced by multiple speakers. The resulting increase in response confusions was primarily attributable to variations in the tactile patterns caused by differences in vocal tract resonances among the speakers. The final experiment found an area function display to be inferior to the spectral display for identification of vowels. The results demonstrate that a two-dimensional spectral display is worthy of further development as a basic vibrotactile display for speech.

3.
We study the influence of some types of reflections on the oscillatory processes in a gyrotron. The oscillation stability conditions in the presence of a reflected signal are estimated, the processes in a gyrotron with a fixed structure of the HF field are simulated numerically, and the enrichment of the signal spectrum in the presence of reflections is studied. Institute of Applied Physics of the Russian Academy of Sciences, Nizhny Novgorod, Russia. Translated from Izvestiya Vysshikh Uchebnykh Zavedenii, Radiofizika, Vol. 41, No. 10, pp. 1348–1357, October 1998.

4.
Animals live in cluttered auditory environments, where sounds arrive at the two ears through several paths. Reflections make sound localization difficult, and it is thought that the auditory system deals with this issue by isolating the first wavefront and suppressing later signals. However, in many situations, reflections arrive too early to be suppressed, for example, reflections from the ground in small animals. This paper examines the implications of these early reflections on binaural cues to sound localization, using realistic models of reflecting surfaces and a spherical model of diffraction by the head. The fusion of direct and reflected signals at each ear results in interference patterns in binaural cues as a function of frequency. These cues are maximally modified at frequencies related to the delay between direct and reflected signals, and therefore to the spatial location of the sound source. Thus, natural binaural cues differ from anechoic cues. In particular, the range of interaural time differences is substantially larger than in anechoic environments. Reflections may potentially contribute binaural cues to distance and polar angle when the properties of the reflecting surface are known and stable, for example, for reflections on the ground.
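
The interference pattern mentioned in this abstract can be sketched for the simplest case: one ear, one reflection, and a frequency-independent reflection coefficient. These are assumptions made for illustration; the paper itself uses realistic surface models and head diffraction. The summed signal then has the transfer function H(f) = 1 + a·exp(−j2πfτ), which for a positive coefficient a has notches at odd multiples of 1/(2τ).

```python
import cmath
import math


def combined_gain_db(freq_hz: float, delay_s: float,
                     reflection_coeff: float = 1.0) -> float:
    """Gain (dB) of direct sound plus one delayed reflection at one ear."""
    h = 1 + reflection_coeff * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    # Clamp to avoid log10(0) at an exact notch.
    return 20 * math.log10(max(abs(h), 1e-12))


def notch_frequencies(delay_s: float, f_max_hz: float) -> list[float]:
    """Notch frequencies (2k+1)/(2*delay) up to f_max_hz.

    Valid for a positive reflection coefficient; a sign-inverting
    surface would shift the notches to multiples of 1/delay.
    """
    notches = []
    k = 0
    while (2 * k + 1) / (2 * delay_s) <= f_max_hz:
        notches.append((2 * k + 1) / (2 * delay_s))
        k += 1
    return notches
```

Because the delay between direct and reflected paths generally differs between the two ears, subtracting the two per-ear gains gives a frequency-dependent level difference that depends on source position, which is the kind of cue modification the abstract describes.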

5.
The contribution of extraneous sounds to the perceptual estimation of the first-formant (F1) frequency of voiced vowels was investigated using a continuum of vowels perceived as changing from /ɪ/ to /ɛ/ as F1 was increased. Any phonetic effects of adding extraneous sounds were measured as a change in the position of the phoneme boundary on the continuum. Experiments 1-5 demonstrated that a pair of extraneous tones, mistuned from harmonic values of the fundamental frequency of the vowel, could influence perceived vowel quality when added in the F1 region. Perceived F1 frequency was lowered when the tones were added on the lower skirt of F1, and raised when they were added on the upper skirt. Experiments 6 and 7 demonstrated that adding a narrow-band noise in the F1 region could produce a similar pattern of boundary shifts, despite the differences in temporal properties and timbre between a noise band and a voiced vowel. The data are interpreted using the concept of the harmonic sieve [Duifhuis et al., J. Acoust. Soc. Am. 71, 1568-1580 (1982)]. The results imply a partial failure of the harmonic sieve to exclude extraneous sounds from the perceptual estimation of F1 frequency. Implications for the nature of the hypothetical harmonic sieve are discussed.

6.
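Distribution of early sound in halls   Total citations: 2, self-citations: 0, citations by others: 2
Jiang Guorong, Wang Jiqing. Acta Acustica (《声学学报》), 2000, 25(3): 193-200
The distribution of early reflected sound energy in a hall depends on the hall's volume, shape, and arrangement of sound absorption, and on the position of the receiver; the resulting early sound field differs considerably from a fully diffuse sound field. Consequently, the distribution within a hall of acoustical quality parameters related to early reflected energy cannot be predicted by diffuse-field theory. This paper analyzes the characteristics of the distribution of early reflected sound energy in halls on the basis of measurements in four halls.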

7.
This study examined the effects of mild-to-moderate sensorineural hearing loss on vowel perception abilities of young, hearing-impaired (YHI) adults. Stimuli were presented at a low conversational level with a flat frequency response (approximately 60 dB SPL), and in two gain conditions: (a) high level gain with a flat frequency response (95 dB SPL), and (b) frequency-specific gain shaped according to each listener's hearing loss (designed to simulate the frequency response provided by a linear hearing aid to an input signal of 60 dB SPL). Listeners discriminated changes in the vowels /ɪ e ɛ ʌ æ/ when F1 or F2 varied, and later categorized the vowels. YHI listeners performed better in the two gain conditions than in the conversational level condition. Performances in the two gain conditions were similar, suggesting that upward spread of masking was not seen at these signal levels for these tasks. Results were compared with those from a group of elderly, hearing-impaired (EHI) listeners, reported in Coughlin, Kewley-Port, and Humes [J. Acoust. Soc. Am. 104, 3597-3607 (1998)]. Comparisons revealed no significant differences between the EHI and YHI groups, suggesting that hearing impairment, not age, is the primary contributor to decreased vowel perception in these listeners.

8.
Human listeners are better able to identify two simultaneous vowels if the fundamental frequencies of the vowels are different. A computational model is presented which, for the first time, is able to simulate this phenomenon at least qualitatively. The first stage of the model is based upon a bank of bandpass filters and inner hair-cell simulators that simulate approximately the most relevant characteristics of the human auditory periphery. The output of each filter/hair-cell channel is then autocorrelated to extract pitch and timbre information. The pooled autocorrelation function (ACF) based on all channels is used to derive a pitch estimate for one of the component vowels from a signal composed of two vowels. Individual channel ACFs showing a pitch peak at this value are combined and used to identify the first vowel using a template matching procedure. The ACFs in the remaining channels are then combined and used to identify the second vowel. Model recognition performance shows a rapid improvement in correct vowel identification as the difference between the fundamental frequencies of two simultaneous vowels increases from zero to one semitone in a manner closely resembling human performance. As this difference increases up to four semitones, performance improves further only slowly, if at all.
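
The pooled-ACF pitch step described in this abstract can be sketched in a few lines of Python. This is a heavily simplified illustration, not the published model: the filter bank and hair-cell stages are omitted, each "channel" is simply one harmonic of a 100 Hz tone, and the function names and the 60-400 Hz search range are assumptions.

```python
import math


def autocorrelation(x: list[float], lag: int) -> float:
    """Unnormalized autocorrelation of x at the given lag (in samples)."""
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag))


def pooled_acf_pitch(channels: list[list[float]], sample_rate: float,
                     f0_min: float = 60.0, f0_max: float = 400.0) -> float:
    """Sum the per-channel ACFs and return the F0 of the largest pooled peak."""
    min_lag = int(sample_rate / f0_max)
    max_lag = int(sample_rate / f0_min)
    pooled = {
        lag: sum(autocorrelation(ch, lag) for ch in channels)
        for lag in range(min_lag, max_lag + 1)
    }
    best_lag = max(pooled, key=pooled.get)
    return sample_rate / best_lag


def harmonic(freq_hz: float, sample_rate: float, n_samples: int) -> list[float]:
    """A pure tone, standing in for the output of one auditory channel."""
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate)
            for n in range(n_samples)]


SAMPLE_RATE = 8000
# Three stand-in channels carrying harmonics 1-3 of a 100 Hz fundamental.
channels = [harmonic(100.0 * k, SAMPLE_RATE, 400) for k in (1, 2, 3)]
estimated_f0 = pooled_acf_pitch(channels, SAMPLE_RATE)
```

Each channel's ACF peaks at multiples of its own period, but only the lag equal to the fundamental period is a peak in every channel, so pooling favors it. In the model described above, channels whose individual ACF shows a peak at that lag would then be grouped to identify the first vowel.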

9.
10.
I. Introduction. Loudness is one of the distinguishing characteristics in auditorium acoustics, but has received less attention than other parameters in the past. Generally, the resulting loudness, or the total energy level of a steady sound source in a hall, can be simply predicted by the sum of the direct sound and the reverberant sound. As speech and music are of transient characteristics, the perceived loudness in a hall is more complicated and thus the above prediction is not true. According to the integrating ability of the ear, only the direct sound and the early reflections, i.e., …

11.
The effects of vowels on voice perturbation measures   Total citations: 1, self-citations: 0, citations by others: 1
This study examines voice perturbation parameters of the sustained [a] in English and of the eight vowels in Turkish to discover whether any difference exists between these languages, and whether a correlation exists between voice perturbation parameters and articulatory and acoustic properties of the Turkish vowels. Eight Turkish vowels uttered by 26 healthy nonsmoker volunteer males who are native Turkish speakers were compared with a voice database that includes samples of normal and disordered voices belonging to American English speakers. Fundamental frequencies, the first and second formants, and perturbation parameters, such as jitter percent, pitch perturbation quotient, shimmer percent, and amplitude perturbation quotient of the sustained vowels, were measured. Also, the first and second formants of the sustained [a] in English were measured, and other parameters were obtained from the database. When the voice perturbation parameters in Turkish and English were compared, statistically significant differences were not found. However, when Turkish vowels were compared with each other, statistically significant differences were found among perturbation values. Categorical comparisons of the Turkish vowels like high-low, rounded-unrounded, and front-back revealed significant differences in perturbation values. In correlation analysis, a weak linear inverse relation between jitter percent and the first formant (r=-0.260, p<0.05) was found.
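
Two of the perturbation measures named in this abstract, jitter percent and shimmer percent, can be sketched as follows. The sketch uses the common "local" definition (mean absolute difference between consecutive cycles, relative to the mean); whether the study's analysis software used exactly this formulation is an assumption, and the smoothed quotients (pitch and amplitude perturbation quotient) are not shown.

```python
def mean_abs_successive_diff(values: list[float]) -> float:
    """Mean absolute difference between consecutive values."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return sum(diffs) / len(diffs)


def perturbation_percent(values: list[float]) -> float:
    """Cycle-to-cycle variation as a percentage of the mean value."""
    mean_value = sum(values) / len(values)
    return 100.0 * mean_abs_successive_diff(values) / mean_value


def jitter_percent(periods_ms: list[float]) -> float:
    """Local jitter: perturbation of consecutive glottal periods."""
    return perturbation_percent(periods_ms)


def shimmer_percent(peak_amplitudes: list[float]) -> float:
    """Local shimmer: perturbation of consecutive cycle peak amplitudes."""
    return perturbation_percent(peak_amplitudes)
```

For example, the hypothetical period sequence 10.0, 10.2, 9.8, 10.0 ms has successive differences of 0.2, 0.4, and 0.2 ms and a mean period of 10.0 ms, giving a jitter of roughly 2.7 percent.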

12.
Listeners are more likely to hear a synthetic fricative ambiguous between /s/ and /ʃ/ as /ʃ/ if it is appended to a woman's voice than a man's voice [Strand and Johnson, in Natural Language Processing and Speech Technology: Results of the 3rd KONVENS Conference (Mouton de Gruyter, Berlin, 1996), pp. 14-26]. This study expanded on this finding by replicating the result with a much larger group of male and female talkers than had been examined previously, by examining whether phonetic context mediates the influence of talker sex on fricative identification, and by examining whether talkers' perceived sexual orientation influences fricative identification. Stimuli were created by pairing a synthetic nine-step /s/-/ʃ/ continuum with tokens of /æk/ and /ɪp/ taken from productions of shack and ship by 44 talkers whose perceived sexual orientation had been reported previously [Munson et al., J. Phonetics (in press)]. Listeners participated in a series of two-alternative sack-shack and sip-ship identification experiments. Listeners identified more /ʃ/ tokens for women's voices than for men's voices for both continua. Lesbian/bisexual-sounding women elicited more sack and sip responses than heterosexual-sounding women. No consistent influence of perceived sexual orientation on fricative identification was noted for men's voices. Results suggest that listeners are sensitive to the association between fricatives' center frequencies and perceived sexual orientation in women's voices, but not in men's voices.

13.
14.
This paper presents the results of new studies based on speech intelligibility tests in simulated sound fields and analyses of impulse response measurements in rooms used for speech communication. The speech intelligibility test results confirm the importance of early reflections for achieving good conditions for speech in rooms. The addition of early reflections increased the effective signal-to-noise ratio and related speech intelligibility scores for both impaired and nonimpaired listeners. The new results also show that for common conditions where the direct sound is reduced, it is only possible to understand speech because of the presence of early reflections. Analyses of measured impulse responses in rooms intended for speech show that early reflections can increase the effective signal-to-noise ratio by up to 9 dB. A room acoustics computer model is used to demonstrate that the relative importance of early reflections can be influenced by the room acoustics design.

15.
The ability of listeners to identify pairs of simultaneous synthetic vowels has been investigated in the first of a series of studies on the extraction of phonetic information from multiple-talker waveforms. Both members of the vowel pair had the same onset and offset times and a constant fundamental frequency of 100 Hz. Listeners identified both vowels with an accuracy significantly greater than chance. The pattern of correct responses and confusions was similar for vowels generated by (a) cascade formant synthesis and (b) additive harmonic synthesis that replaced each of the lowest three formants with a single pair of harmonics of equal amplitude. In order to choose an appropriate model for describing listeners' performance, four pattern-matching procedures were evaluated. Each predicted the probability that (i) any individual vowel would be selected as one of the two responses, and (ii) any pair of vowels would be selected. These probabilities were estimated from measures of the similarities of the auditory excitation patterns of the double vowels to those of single-vowel reference patterns. Up to 88% of the variance in individual responses and up to 67% of the variance in pairwise responses could be accounted for by procedures that highlighted spectral peaks and shoulders in the excitation pattern. Procedures that assigned uniform weight to all regions of the excitation pattern gave poorer predictions. These findings support the hypothesis that the auditory system pays particular attention to the frequencies of spectral peaks, and possibly also of shoulders, when identifying vowels. One virtue of this strategy is that the spectral peaks and shoulders can indicate the frequencies of formants when other aspects of spectral shape are obscured by competing sounds.

16.
Abnormalities in cochlear function usually cause broadening of the auditory filters, which reduces speech intelligibility. An attempt has been made to apply a spectral enhancement algorithm to improve the identification of Polish vowels by subjects with cochlear-based hearing impairment. The identification scores of natural (unprocessed) vowels and spectrally enhanced (processed) vowels have been measured for hearing-impaired subjects. Spectral enhancement was found to improve vowel scores by about 10% for those subjects; however, a wide variation in individual performance among subjects was observed. The overall vowel identification scores obtained were 85% for natural vowels and 96% for spectrally enhanced vowels.

17.
If two vowels with different fundamental frequencies (f0's) are presented simultaneously and monaurally, listeners often hear two talkers producing different vowels on different pitches. This paper describes the evaluation of four computational models of the auditory and perceptual processes which may underlie this ability. Each model involves four stages: (i) frequency analysis using an "auditory" filter bank, (ii) determination of the pitches present in the stimulus, (iii) segregation of the competing speech sources by grouping energy associated with each pitch to create two derived spectral patterns, and (iv) classification of the derived spectral patterns to predict the probabilities of listeners' vowel-identification responses. The "place" models carry out the operations of pitch determination and spectral segregation by analyzing the distribution of rms levels across the channels of the filter bank. The "place-time" models carry out these operations by analyzing the periodicities in the waveforms in each channel. In their "linear" versions, the place and place-time models operate directly on the waveforms emerging from the filters. In their "nonlinear" versions, analogous operations are applied to the output of an additional stage which applies a compressive nonlinearity to the filtered waveforms. Compared to the other three models, the nonlinear place-time model provides the most accurate estimates of the f0's of pairs of concurrent synthetic vowels and comes closest to predicting the identification responses of listeners to such stimuli. Although the model has several limitations, the results are compatible with the idea that a place-time analysis is used to segregate competing sound sources.

18.
The electroreflectance spectrum of GaSe near its fundamental (2 eV) edge is reported and shown to be sensitive to multiple light reflections, which give rise to structures previously ascribed to extrinsic states. Electroabsorption is less sensitive to multiple reflection and therefore better suited to quantitative comparison with theory.

19.
On concert hall stages the sound traveling between players consists of the direct sound, a floor reflection and early reflections off players and objects on stage such as instruments and music stands. In smaller music ensembles, the acoustic communication between players is normally good. In larger ensembles, there is a similar situation for short distances between players. However, for ensembles like a symphony orchestra, the number of players on stage results in large distances between some players with many other players sitting in between, which block the direct sound and floor reflection paths. This study investigates the sound levels on stage with and without a large orchestra present, in the absence of any stage enclosure. Sound levels within the octave bands 63-2000 Hz on an empty stage were studied analytically, while sound levels over the same frequency range with players present were investigated in a 1:25 scale model, both without and with risers on stage. The main results are presented in terms of the attenuation introduced by the orchestra, with linear models developed to describe behavior for the octave bands 500-2000 Hz.

20.
A "simple" dichotic pitch arises when a single narrow band possesses a different interaural configuration from a surrounding broadband noise whose interaural configuration is uniform and correlated. Such pitches were created by interaurally decorrelating a narrow band (experiment 1) or by giving a narrow band a different interaural time difference from the noise (experiment 2). Using an adaptive forced-choice procedure, listeners adjusted the interaural intensity difference of "pointers" to match their lateralization to that of the dichotic pitches. The primary determinants of lateralization were the interaural configuration of the broadband noise (experiment 1), the center frequency of the narrow band (experiment 1), and its interaural configuration (experiment 2). The ability of two computational models to predict these results was evaluated. A version of the central-spectrum model [J. Raatgever and F. A. Bilsen, J. Acoust. Soc. Am. 80, 429-441 (1986)] incorporating realistic frequency selectivity accounted for the main results of experiment 1 but not experiment 2. A new "reconstruction-comparison" model accounted for the main results of both experiments. To accommodate the variables shown to influence lateralization, this model segregates evidence of the dichotic pitch from the noise, reconstructs the cross-correlogram of the noise, and compares it with the cross-correlogram of the original stimulus.
