
1.

Speech segregation in rooms: monaural, binaural, and interacting effects of reverberation on target and interferer

Lavandier M Culling JF 《The Journal of the Acoustical Society of America》2008,123(4):2237-2248

Speech reception thresholds were measured in virtual rooms to investigate the influence of reverberation on speech intelligibility for spatially separated targets and interferers. The measurements were realized under headphones, using target sentences and noise or two-voice interferers. The room simulation allowed variation of the absorption coefficient of the room surfaces independently for target and interferer. The direct-to-reverberant ratio and interaural coherence of sources were also varied independently by considering binaural and diotic listening. The main effect of reverberation on the interferer was binaural and mediated by the coherence, in agreement with binaural unmasking theories. It appeared at lower reverberation levels than the effect of reverberation on the target, which was mainly monaural and associated with the direct-to-reverberant ratio, and could be explained by the loss of amplitude modulation in the reverberant speech signals. This effect was slightly smaller when listening binaurally. Reverberation might also be responsible for a disruption of the mechanism by which the auditory system exploits fundamental frequency differences to segregate competing voices, and a disruption of the "listening in the gaps" associated with speech interferers. These disruptions may explain an interaction observed between the effects of reverberation on the targets and two-voice interferers.

2.

Reverberation time and maximum background-noise level for classrooms from a comparative study of speech intelligibility metrics (Total citations: 2; self-citations: 0; citations by others: 2)

Bistafa SR Bradley JS 《The Journal of the Acoustical Society of America》2000,107(2):861-875

Speech intelligibility metrics that take into account sound reflections in the room and the background noise have been compared, assuming diffuse sound field. Under this assumption, sound decays exponentially with a decay constant inversely proportional to reverberation time. Analytical formulas were obtained for each speech intelligibility metric providing a common basis for comparison. These formulas were applied to three sizes of rectangular classrooms. The sound source was the human voice without amplification, and background noise was taken into account by a noise-to-signal ratio. Correlations between the metrics and speech intelligibility are presented and applied to the classrooms under study. Relationships between some speech intelligibility metrics were also established. For each noise-to-signal ratio, the value of each speech intelligibility metric is maximized for a specific reverberation time. For quiet classrooms, the reverberation time that maximizes these speech intelligibility metrics is between 0.1 and 0.3 s. Speech intelligibility of 100% is possible with reverberation times up to 0.4-0.5 s and this is the recommended range. The study suggests "ideal" and "acceptable" maximum background-noise level for classrooms of 25 and 20 dB, respectively, below the voice level at 1 m in front of the talker.
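As an illustration of the diffuse-field assumption in this abstract, the sketch below treats the sound energy decay as a pure exponential whose decay constant is inversely proportional to the reverberation time, and derives an early-to-late energy ratio split at 50 ms. It is not the authors' set of formulas: the function names are hypothetical, and the direct sound and background noise that the paper accounts for are ignored here.

```python
import math

def decay_constant(t60):
    """Energy decay constant k (1/s) so that exp(-k * t) has fallen by 60 dB at t = t60."""
    return 6.0 * math.log(10.0) / t60

def early_to_late_ratio_db(t60, split_s=0.05):
    """Early-to-late energy ratio in dB for a purely exponential decay split at split_s."""
    k = decay_constant(t60)
    late = math.exp(-k * split_s)   # fraction of the energy arriving after split_s
    early = 1.0 - late              # fraction of the energy arriving before split_s
    return 10.0 * math.log10(early / late)

for t60 in (0.3, 0.5, 1.0):
    print(f"T60 = {t60:.1f} s -> early/late = {early_to_late_ratio_db(t60):.1f} dB")
```

Under these assumptions a shorter reverberation time always gives a higher early-to-late ratio; an optimum reverberation time, as reported in the abstract, only emerges once the level of the speech relative to the noise is included.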

3.

A study on the optimal English speech level for Chinese listeners in classrooms

Ming Qin Xuhao Du Jiancheng Tao Xiaojun Qiu 《Applied Acoustics》2016

Speech intelligibility in classrooms affects the learning efficiency of students directly, especially for the students who are using a second language. The speech intelligibility value is determined by many factors such as speech level, signal to noise ratio, and reverberation time in the rooms. This paper investigates the contributions of these factors with subjective tests, especially speech level, which is required for designing the optimal gain for sound amplification systems in classrooms. The test material was generated by mixing the convolution output of the English Coordinate Response Measure corpus and the room impulse responses with the background noise. The subjects are all Chinese students who use English as a second language. It is found that the speech intelligibility increases first and then decreases with the increase of speech level, and the optimal English speech level is about 71 dBA in classrooms for Chinese listeners when the signal to noise ratio and the reverberation time are held constant. Finally, a regression equation is proposed to predict the speech intelligibility based on speech level, signal to noise ratio, and reverberation time.
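The stimulus generation described in this abstract (convolve dry speech with a room impulse response, then add background noise at a chosen signal-to-noise ratio) can be sketched as follows. This is a minimal illustration and not the authors' processing chain: the signals are synthetic random sequences, and all names (`convolve`, `mix_at_snr`, `dry`, `rir`) are hypothetical.

```python
import math
import random

def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def mean_power(x):
    """Mean squared value of a sequence."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech power / noise power equals snr_db, then add it."""
    noise = noise[:len(speech)]
    target_noise_power = mean_power(speech) / 10.0 ** (snr_db / 10.0)
    gain = math.sqrt(target_noise_power / mean_power(noise))
    scaled = [gain * n for n in noise]
    return [s + n for s, n in zip(speech, scaled)], scaled

rng = random.Random(0)
dry = [rng.gauss(0.0, 1.0) for _ in range(200)]                         # stand-in for anechoic speech
rir = [1.0] + [0.5 * 0.9 ** n * rng.gauss(0.0, 1.0) for n in range(1, 50)]  # toy decaying impulse response
wet = convolve(dry, rir)                                                # reverberant signal
noise = [rng.gauss(0.0, 1.0) for _ in range(len(wet))]
mixture, scaled_noise = mix_at_snr(wet, noise, snr_db=10.0)
achieved_snr_db = 10.0 * math.log10(mean_power(wet) / mean_power(scaled_noise))
print(f"achieved SNR: {achieved_snr_db:.2f} dB")
```

The SNR here is defined on the reverberant speech; whether the study defined it on the dry or the reverberant signal is not stated in the abstract.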

4.

Binaural prediction of speech intelligibility in reverberant rooms with multiple noise sources

Lavandier M Jelfs S Culling JF Watkins AJ Raimond AP Makin SJ 《The Journal of the Acoustical Society of America》2012,131(1):218-231

When speech is in competition with interfering sources in rooms, monaural indicators of intelligibility fail to take account of the listener's abilities to separate target speech from interfering sounds using the binaural system. In order to incorporate these segregation abilities and their susceptibility to reverberation, Lavandier and Culling [J. Acoust. Soc. Am. 127, 387-399 (2010)] proposed a model which combines effects of better-ear listening and binaural unmasking. A computationally efficient version of this model is evaluated here under more realistic conditions that include head shadow, multiple stationary noise sources, and real-room acoustics. Three experiments are presented in which speech reception thresholds were measured in the presence of one to three interferers using real-room listening over headphones, simulated by convolving anechoic stimuli with binaural room impulse-responses measured with dummy-head transducers in five rooms. Without fitting any parameter of the model, there was close correspondence between measured and predicted differences in threshold across all tested conditions. The model's components of better-ear listening and binaural unmasking were validated both in isolation and in combination. The computational efficiency of this prediction method allows the generation of complex "intelligibility maps" from room designs.

5.

The benefit of binaural hearing in a cocktail party: effect of location and type of interferer

Hawley ML Litovsky RY Culling JF 《The Journal of the Acoustical Society of America》2004,115(2):833-843

The "cocktail party problem" was studied using virtual stimuli whose spatial locations were generated using anechoic head-related impulse responses from the AUDIS database [Blauert et al., J. Acoust. Soc. Am. 103, 3082 (1998)]. Speech reception thresholds (SRTs) were measured for Harvard IEEE sentences presented from the front in the presence of one, two, or three interfering sources. Four types of interferer were used: (1) other sentences spoken by the same talker, (2) time-reversed sentences of the same talker, (3) speech-spectrum shaped noise, and (4) speech-spectrum shaped noise, modulated by the temporal envelope of the sentences. Each interferer was matched to the spectrum of the target talker. Interferers were placed in several spatial configurations, either coincident with or separated from the target. Binaural advantage was derived by subtracting SRTs from listening with the "better monaural ear" from those for binaural listening. For a single interferer, there was a binaural advantage of 2-4 dB for all interferer types. For two or three interferers, the advantage was 2-4 dB for noise and speech-modulated noise, and 6-7 dB for speech and time-reversed speech. These data suggest that the benefit of binaural hearing for speech intelligibility is especially pronounced when there are multiple voiced interferers at different locations from the target, regardless of spatial configuration; measurements with fewer or with other types of interferers can underestimate this benefit.

6.

Effect of adding artificial reverberation to speech-like masking sound

Yusuke Hioka Jen W. Tang Jacky Wan 《Applied Acoustics》2016

Time-reversed speech has been known to effectively mask information for speech privacy applications. However, the annoyance and distraction caused by the time-reversed speech-like masking sound are higher than those caused by other masking sounds. This study investigates the effects of adding artificial reverberation to the time-reversed speech. Subjective listening tests have been conducted to measure the intelligibility of target speech, and the annoyance and distraction caused by the masking sound. The experimental results suggest that adding artificial reverberation to a speech-like masking sound significantly reduces the annoyance level while maintaining the masking effectiveness of the original masking sound. A trend was also observed that the addition of artificial reverberation could reduce the level of distraction caused by the masking sound.

7.

Chinese speech intelligibility at different speech sound pressure levels and signal-to-noise ratios in simulated classrooms

Peng Jianxin 《Applied Acoustics》2010,71(4):386-390

Speech intelligibility in classrooms can be influenced by background-noise levels, speech sound pressure level (SSPL), reverberation time and signal-to-noise ratio (SNR). The relationship between SSPL and subjective Chinese Mandarin speech intelligibility and the effect of different SNRs on Chinese Mandarin speech intelligibility in the simulated classroom were investigated through room acoustical simulation, auralisation technique and subjective evaluation. Chinese speech intelligibility test signals recorded in an anechoic chamber were convolved with the simulated binaural room impulse responses, and then reproduced over headphones at different SSPLs and SNRs. The results show that Chinese Mandarin speech intelligibility scores increase with increasing SSPL and SNR within a certain range in simulated classrooms. Chinese Mandarin speech intelligibility scores show no significant difference for SNRs of no less than 15 dBA under the same reverberation time condition.

8.

Perceptual validation of virtual room acoustics: Sound localisation and speech understanding

Monika Rychtáriková Tim van den Bogaert Gerrit Vermeir Jan Wouters 《Applied Acoustics》2011,(4):196-204

The reliability of algorithms for room acoustic simulations has often been confirmed on the basis of the verification of predicted room acoustical parameters. This paper presents a complementary perceptual validation procedure consisting of two experiments, respectively dealing with speech intelligibility, and with sound source front–back localisation. The evaluated simulation algorithm, implemented in software ODEON®, is a hybrid method that is based on an image source algorithm for the prediction of early sound reflection and on ray-tracing for the later part, using a stochastic scattering process with secondary sources. The binaural room impulse response (BRIR) is calculated from a simulated room impulse response where information about the arriving time, intensity and spatial direction of each sound reflection is collected and convolved with a measured Head Related Transfer Function (HRTF). The listening stimuli for the speech intelligibility and localisation tests are auralised convolutions of anechoic sound samples with measured and simulated BRIRs. Perception tests were performed with human subjects in two acoustical environments, i.e. an anechoic and reverberant room, by presenting the stimuli to subjects in a natural way, and via headphones by using two non-individualized HRTFs (artificial head and hearing aids placed on the ears of the artificial head) of both a simulated and a real room. Very good correspondence is found between the results obtained with simulated and measured BRIRs, both for speech intelligibility in the presence of noise and for sound source localisation tests. In the anechoic room an increase in speech intelligibility is observed when noise and signal are presented from sources located at different angles. This improvement is not so evident in the reverberant room, with the sound sources at 1-m distance from the listener. Interestingly, the performance of people for front–back localisation is better in the reverberant room than in the anechoic room. The correlation between people’s ability for sound source localisation on one hand, and their ability for recognition of binaurally received speech in reverberation on the other hand, is found to be weak.

9.

Effects of reverberation on perceptual segregation of competing voices

Culling JF Hodder KI Toh CY 《The Journal of the Acoustical Society of America》2003,114(5):2871-2876

Two experiments investigated the effect of reverberation on listeners' ability to perceptually segregate two competing voices. Culling et al. [Speech Commun. 14, 71-96 (1994)] found that for competing synthetic vowels, masked identification thresholds were increased by reverberation only when combined with modulation of fundamental frequency (F0). The present investigation extended this finding to running speech. Speech reception thresholds (SRTs) were measured for a male voice against a single interfering female voice within a virtual room with controlled reverberation. The two voices were either (1) co-located in virtual space at 0 degrees azimuth or (2) separately located at +/-60 degrees azimuth. In experiment 1, target and interfering voices were either normally intonated or resynthesized with a fixed F0. In anechoic conditions, SRTs were lower for normally intonated and for spatially separated sources, while, in reverberant conditions, the SRTs were all the same. In experiment 2, additional conditions employed inverted F0 contours. Inverted F0 contours yielded higher SRTs in all conditions, regardless of reverberation. The results suggest that reverberation can seriously impair listeners' ability to exploit differences in F0 and spatial location between competing voices. The levels of reverberation employed had no effect on speech intelligibility in quiet.

10.

Blind estimation of reverberation time

Ratnam R Jones DL Wheeler BC O'Brien WD Lansing CR Feng AS 《The Journal of the Acoustical Society of America》2003,114(5):2877-2892

The reverberation time (RT) is an important parameter for characterizing the quality of an auditory space. Sounds in reverberant environments are subject to coloration. This affects speech intelligibility and sound localization. Many state-of-the-art audio signal processing algorithms, for example in hearing-aids and telephony, are expected to have the ability to characterize the listening environment, and turn on an appropriate processing strategy accordingly. Thus, a method for characterization of room RT based on passively received microphone signals represents an important enabling technology. Current RT estimators, such as Schroeder's method, depend on a controlled sound source, and thus cannot produce an online, blind RT estimate. Here, a method for estimating RT without prior knowledge of sound sources or room geometry is presented. The diffusive tail of reverberation was modeled as an exponentially damped Gaussian white noise process. The time-constant of the decay, which provided a measure of the RT, was estimated using a maximum-likelihood procedure. The estimates were obtained continuously, and an order-statistics filter was used to extract the most likely RT from the accumulated estimates. The procedure was illustrated for connected speech. Results obtained for simulated and real room data are in good agreement with the real RT values.
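The signal model named in this abstract, an exponentially damped Gaussian white noise process, can be illustrated with a toy maximum-likelihood fit. The sketch below assumes a single clean decay segment y[n] = a^n x[n] with x[n] drawn from N(0, σ²), profiles σ² out of the likelihood, and picks the per-sample decay factor a by a plain grid search. The grid search, the parameter values and all function names are assumptions made for illustration; the paper's own optimisation, its continuous estimation on running speech and its order-statistics filter are not reproduced.

```python
import math
import random

def log_likelihood(y, a):
    """Log-likelihood of y[n] = a**n * x[n], x ~ N(0, sigma^2), with sigma^2 set to its ML value."""
    n_samples = len(y)
    sigma2 = sum((v / a ** n) ** 2 for n, v in enumerate(y)) / n_samples
    sum_n = n_samples * (n_samples - 1) / 2   # sum of the sample indices 0..N-1
    return -0.5 * n_samples * (math.log(2.0 * math.pi * sigma2) + 1.0) - math.log(a) * sum_n

def estimate_decay(y, candidates):
    """Return the candidate decay factor with the highest log-likelihood."""
    return max(candidates, key=lambda a: log_likelihood(y, a))

def t60_from_decay(a, fs):
    """Time in seconds for the amplitude envelope a**n to fall by 60 dB at sample rate fs."""
    return -3.0 * math.log(10.0) / (math.log(a) * fs)

rng = random.Random(1)
fs = 8000                      # hypothetical sample rate in Hz
true_a = 0.995                 # hypothetical per-sample decay factor
tail = [true_a ** n * rng.gauss(0.0, 1.0) for n in range(2000)]
grid = [0.990 + 0.0001 * k for k in range(91)]   # candidates from 0.990 to 0.999
a_hat = estimate_decay(tail, grid)
print(f"estimated decay factor {a_hat:.4f}, T60 about {t60_from_decay(a_hat, fs):.3f} s")
```

On real recordings most segments are not free decays, which is presumably why the abstract describes accumulating estimates and filtering them rather than trusting any single fit.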

11.

Effect of masker type and age on speech intelligibility and spatial release from masking in children and adults

Johnstone PM Litovsky RY 《The Journal of the Acoustical Society of America》2006,120(4):2177-2189

Speech recognition in noisy environments improves when the speech signal is spatially separated from the interfering sound. This effect, known as spatial release from masking (SRM), was recently shown in young children. The present study compared SRM in children of ages 5-7 with adults for interferers introducing energetic, informational, and/or linguistic components. Three types of interferers were used: speech, reversed speech, and modulated white noise. Two female voices with different long-term spectra were also used. Speech reception thresholds (SRTs) were compared for: Quiet (target 0 degrees front, no interferer), Front (target and interferer both 0 degrees front), and Right (interferer 90 degrees right, target 0 degrees front). Children had higher SRTs and greater masking than adults. When spatial cues were not available, adults, but not children, were able to use differences in interferer type to separate the target from the interferer. Both children and adults showed SRM. Children, unlike adults, demonstrated large amounts of SRM for a time-reversed speech interferer. In conclusion, masking and SRM vary with the type of interfering sound, and this variation interacts with age; SRM may not depend on the spectral peculiarities of a particular type of voice when the target speech and interfering speech are different sex talkers.

12.

Pitch-based monaural segregation of reverberant speech

Roman N Wang D 《The Journal of the Acoustical Society of America》2006,120(1):458-469

In everyday listening, both background noise and reverberation degrade the speech signal. Psychoacoustic evidence suggests that human speech perception under reverberant conditions relies mostly on monaural processing. While speech segregation based on periodicity has achieved considerable progress in handling additive noise, little research in monaural segregation has been devoted to reverberant scenarios. Reverberation smears the harmonic structure of speech signals, and our evaluations using a pitch-based segregation algorithm show that an increase in the room reverberation time causes degraded performance due to weakened periodicity in the target signal. We propose a two-stage monaural separation system that combines the inverse filtering of the room impulse response corresponding to target location and a pitch-based speech segregation method. As a result of the first stage, the harmonicity of a signal arriving from target direction is partially restored while signals arriving from other directions are further smeared, and this leads to improved segregation. A systematic evaluation of the system shows that the proposed system results in considerable signal-to-noise ratio gains across different conditions. Potential applications of this system include robust automatic speech recognition and hearing aid design.

13.

The effect of overlap-masking on binaural reverberant word intelligibility

Libbey B Rogers PH 《The Journal of the Acoustical Society of America》2004,116(5):3141-3151

Reverberation interferes with the ability to understand speech in rooms. Overlap-masking explains this degradation by assuming reverberant phonemes endure in time and mask subsequent reverberant phonemes. Most listeners benefit from binaural listening when reverberation exists, indicating that the listener's binaural system processes the two channels to reduce the reverberation. This paper investigates the hypothesis that the binaural word intelligibility advantage found in reverberation is a result of binaural overlap-masking release with the reverberation acting as masking noise. The tests utilize phonetically balanced word lists (ANSI-S3.2 1989), that are presented diotically and binaurally with recorded reverberation and reverberation-like noise. A small room, 62 m³, reverberates the words. These are recorded using two microphones without additional noise sources. The reverberation-like noise is a modified form of these recordings and has a similar spectral content. It does not contain binaural localization cues due to a phase randomization procedure. Listening to the reverberant words binaurally improves the intelligibility by 6.0% over diotic listening. The binaural intelligibility advantage for reverberation-like noise is only 2.6%. This indicates that binaural overlap-masking release is insufficient to explain the entire binaural word intelligibility advantage in reverberation.

14.

The spatial unmasking of speech: evidence for better-ear listening

Edmonds BA Culling JF 《The Journal of the Acoustical Society of America》2006,120(3):1539-1545

Speech reception thresholds (SRTs) were measured for target speech presented concurrently with interfering speech (spoken by a different speaker). In experiment 1, the target and interferer were divided spectrally into high- and low-frequency bands and presented over headphones in three conditions: monaural, dichotic (target and interferer to different ears), and swapped (the low-frequency target band and the high-frequency interferer band were presented to one ear, while the high-frequency target band and the low-frequency interferer band were presented to the other ear). SRTs were highest in the monaural condition and lowest in the dichotic condition; SRTs in the swapped condition were intermediate. In experiment 2, two new conditions were devised such that one target band was presented in isolation to one ear while the other band was presented at the other ear with the interferer. The pattern of SRTs observed in experiment 2 suggests that performance in the swapped condition reflects the intelligibility of the target frequency bands at just one ear; the auditory system appears unable to exploit advantageous target-to-interferer ratios at different ears when segregating target speech from a competing speech interferer.

15.

Subjective scaling of spatial room acoustic parameters influenced by visual environmental cues

Valente DL Braasch J 《The Journal of the Acoustical Society of America》2010,128(4):1952-1964

Although there have been numerous studies investigating subjective spatial impression in rooms, only a few of those studies have addressed the influence of visual cues on the judgment of auditory measures. In the psychophysical study presented here, video footage of five solo music/speech performers was shown for four different listening positions within a general-purpose space. The videos were presented in addition to the acoustic signals, which were auralized using binaural room impulse responses (BRIR) that were recorded in the same general-purpose space. The participants were asked to adjust the direct-to-reverberant energy ratio (D/R ratio) of the BRIR according to their expectation considering the visual cues. They were also directed to rate the apparent source width (ASW) and listener envelopment (LEV) for each condition. Visual cues generated by changing the sound-source position in the multi-purpose space, as well as the makeup of the sound stimuli affected the judgment of spatial impression. Participants also scaled the direct-to-reverberant energy ratio with greater direct sound energy than was measured in the acoustical environment.
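For readers unfamiliar with the D/R ratio that participants adjusted in this study, the sketch below shows one common way to compute it from a single-channel impulse response and to shift it by a chosen number of decibels. The 2.5 ms direct-sound window after the strongest peak, the function names and the toy impulse response are assumptions for illustration; the abstract does not say how the authors split or rescaled their BRIRs.

```python
import math

def split_index(impulse_response, fs, direct_window_s=0.0025):
    """Index separating direct sound from reverberation: strongest peak plus a short window."""
    peak = max(range(len(impulse_response)), key=lambda i: abs(impulse_response[i]))
    return peak + int(round(direct_window_s * fs)) + 1

def direct_to_reverberant_db(impulse_response, fs, direct_window_s=0.0025):
    """Direct-to-reverberant energy ratio in dB (requires a non-zero reverberant part)."""
    split = split_index(impulse_response, fs, direct_window_s)
    direct = sum(v * v for v in impulse_response[:split])
    reverberant = sum(v * v for v in impulse_response[split:])
    return 10.0 * math.log10(direct / reverberant)

def shift_dr(impulse_response, fs, change_db, direct_window_s=0.0025):
    """Return a copy whose direct part is rescaled so the D/R ratio changes by change_db."""
    split = split_index(impulse_response, fs, direct_window_s)
    gain = 10.0 ** (change_db / 20.0)   # amplitude gain applied to the direct part only
    return [gain * v for v in impulse_response[:split]] + list(impulse_response[split:])

toy_ir = [0.0, 1.0, 0.0, 0.1, 0.1]      # toy impulse response at a 1 kHz sample rate
before = direct_to_reverberant_db(toy_ir, fs=1000, direct_window_s=0.001)
after = direct_to_reverberant_db(shift_dr(toy_ir, 1000, 6.0, 0.001), 1000, 0.001)
print(f"D/R before: {before:.2f} dB, after a +6 dB shift: {after:.2f} dB")
```

For a binaural response the same split would be applied to both ear channels so that interaural cues in the direct sound are preserved.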

16.

Feasibility of subjective speech intelligibility assessment based on auralization

Jianxin Peng 《Applied Acoustics》2005,66(5):591-601

Subjective speech intelligibility can be assessed by speech recorded in an anechoic chamber and then convolved with room impulse responses that can be created by acoustic simulation. The speech intelligibility (SI) assessment based on auralization was validated in three rooms. The articulation scores obtained from simulated sound field were compared with the ones from measured sound field and from direct listening in rooms. Results show that the speech intelligibility prediction based on auralization technique with simulated binaural room impulse responses (BRIRs) is in agreement with reality and results from measured BRIRs. When this technique is used with simulated and measured monaural room impulse responses (MRIRs), the predicted results underestimate the reality. It has been shown that auralization technique with simulated BRIRs is capable of assessing subjective speech intelligibility of listening positions in the room.

17.

Speech intelligibility and localization in a multi-source environment (Total citations: 1; self-citations: 0; citations by others: 1)

M L Hawley R Y Litovsky H S Colburn 《The Journal of the Acoustical Society of America》1999,105(6):3436-3448

Natural environments typically contain sound sources other than the source of interest that may interfere with the ability of listeners to extract information about the primary source. Studies of speech intelligibility and localization by normal-hearing listeners in the presence of competing speech are reported on in this work. One, two or three competing sentences [IEEE Trans. Audio Electroacoust. 17(3), 225-246 (1969)] were presented from various locations in the horizontal plane in several spatial configurations relative to a target sentence. Target and competing sentences were spoken by the same male talker and at the same level. All experiments were conducted both in an actual sound field and in a virtual sound field. In the virtual sound field, both binaural and monaural conditions were tested. In the speech intelligibility experiment, there were significant improvements in performance when the target and competing sentences were spatially separated. Performance was similar in the actual sound-field and virtual sound-field binaural listening conditions for speech intelligibility. Although most of these improvements are evident monaurally when using the better ear, binaural listening was necessary for large improvements in some situations. In the localization experiment, target source identification was measured in a seven-alternative absolute identification paradigm with the same competing sentence configurations as for the speech study. Performance in the localization experiment was significantly better in the actual sound-field than in the virtual sound-field binaural listening conditions. Under binaural conditions, localization performance was very good, even in the presence of three competing sentences. Under monaural conditions, performance was much worse. For the localization experiment, there was no significant effect of the number or configuration of the competing sentences tested. For these experiments, the performance in the speech intelligibility experiment was not limited by localization ability.

18.

The effects of spatial separation in distance on the informational and energetic masking of a nearby speech signal

Brungart DS Simpson BD 《The Journal of the Acoustical Society of America》2002,112(2):664-676

Although many studies have shown that intelligibility improves when a speech signal and an interfering sound source are spatially separated in azimuth, little is known about the effect that spatial separation in distance has on the perception of competing sound sources near the head. In this experiment, head-related transfer functions (HRTFs) were used to process stimuli in order to simulate a target talker and a masking sound located at different distances along the listener's interaural axis. One of the signals was always presented at a distance of 1 m, and the other signal was presented 1 m, 25 cm, or 12 cm from the center of the listener's head. The results show that distance separation has very different effects on speech segregation for different types of maskers. When speech-shaped noise was used as the masker, most of the intelligibility advantages of spatial separation could be accounted for by spectral differences in the target and masking signals at the ear with the higher signal-to-noise ratio (SNR). When a same-sex talker was used as the masker, the intelligibility advantages of spatial separation in distance were dominated by binaural effects that produced the same performance improvements as a 4-5-dB increase in the SNR of a diotic stimulus. These results suggest that distance-dependent changes in the interaural difference cues of nearby sources play a much larger role in the reduction of the informational masking produced by an interfering speech signal than in the reduction of the energetic masking produced by an interfering noise source.

19.

Speech intelligibility studies in classrooms (Total citations: 2; self-citations: 0; citations by others: 2)

J S Bradley 《The Journal of the Acoustical Society of America》1986,80(3):846-854

Speech intelligibility tests and acoustical measurements were made in ten occupied classrooms. Octave-band measurements of background noise levels, early decay times, and reverberation times, as well as various early/late sound ratios, and the center time were obtained. Various octave-band useful/detrimental ratios were calculated along with the speech transmission index. The interrelationships of these measures were considered to evaluate which were most appropriate in classrooms, and the best predictors of speech intelligibility scores were identified. From these results ideal design goals for acoustical conditions for classrooms were determined either in terms of the 50-ms useful/detrimental ratios or from combinations of the reverberation time and background noise level.

20.

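Reconstruction of the perceived distance of sound images in wave field synthesis

Li Juan Li Junfeng Yan Yonghong 《声学学报》2013,38(6):743-748

Sound image distance is an important part of sound image position. To describe the perceived position of a sound image in a reconstructed sound field completely, the sound image distance must be reconstructed in the virtual sound field. Based on the mechanism by which human hearing perceives distance, this paper introduces the image-source method into wave field synthesis and controls the perceived sound image distance by simulating reflected sound. Subjective listening tests based on the "law of comparative judgment" show that, when room reflections are simulated, subjects perceive changes in sound image distance more clearly, and the pattern of change is consistent with the mechanism of human distance perception. Therefore, in sound reproduction systems based on wave field synthesis, sound image distance can be reconstructed accurately by simulating reflections and effectively controlling their energy.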