
1.

Pitch-based monaural segregation of reverberant speech

Roman N Wang D 《The Journal of the Acoustical Society of America》2006,120(1):458-469

In everyday listening, both background noise and reverberation degrade the speech signal. Psychoacoustic evidence suggests that human speech perception under reverberant conditions relies mostly on monaural processing. While speech segregation based on periodicity has achieved considerable progress in handling additive noise, little research in monaural segregation has been devoted to reverberant scenarios. Reverberation smears the harmonic structure of speech signals, and our evaluations using a pitch-based segregation algorithm show that an increase in the room reverberation time causes degraded performance due to weakened periodicity in the target signal. We propose a two-stage monaural separation system that combines the inverse filtering of the room impulse response corresponding to target location and a pitch-based speech segregation method. As a result of the first stage, the harmonicity of a signal arriving from target direction is partially restored while signals arriving from other directions are further smeared, and this leads to improved segregation. A systematic evaluation of the system shows that the proposed system results in considerable signal-to-noise ratio gains across different conditions. Potential applications of this system include robust automatic speech recognition and hearing aid design.

2.

Speech segregation in rooms: monaural, binaural, and interacting effects of reverberation on target and interferer

Lavandier M Culling JF 《The Journal of the Acoustical Society of America》2008,123(4):2237-2248

Speech reception thresholds were measured in virtual rooms to investigate the influence of reverberation on speech intelligibility for spatially separated targets and interferers. The measurements were realized under headphones, using target sentences and noise or two-voice interferers. The room simulation allowed variation of the absorption coefficient of the room surfaces independently for target and interferer. The direct-to-reverberant ratio and interaural coherence of sources were also varied independently by considering binaural and diotic listening. The main effect of reverberation on the interferer was binaural and mediated by the coherence, in agreement with binaural unmasking theories. It appeared at lower reverberation levels than the effect of reverberation on the target, which was mainly monaural and associated with the direct-to-reverberant ratio, and could be explained by the loss of amplitude modulation in the reverberant speech signals. This effect was slightly smaller when listening binaurally. Reverberation might also be responsible for a disruption of the mechanism by which the auditory system exploits fundamental frequency differences to segregate competing voices, and a disruption of the "listening in the gaps" associated with speech interferers. These disruptions may explain an interaction observed between the effects of reverberation on the targets and two-voice interferers.

3.

The effect of overlap-masking on binaural reverberant word intelligibility

Libbey B Rogers PH 《The Journal of the Acoustical Society of America》2004,116(5):3141-3151

Reverberation interferes with the ability to understand speech in rooms. Overlap-masking explains this degradation by assuming reverberant phonemes endure in time and mask subsequent reverberant phonemes. Most listeners benefit from binaural listening when reverberation exists, indicating that the listener's binaural system processes the two channels to reduce the reverberation. This paper investigates the hypothesis that the binaural word intelligibility advantage found in reverberation is a result of binaural overlap-masking release with the reverberation acting as masking noise. The tests utilize phonetically balanced word lists (ANSI-S3.2 1989) that are presented diotically and binaurally with recorded reverberation and reverberation-like noise. A small room, 62 m³, reverberates the words. These are recorded using two microphones without additional noise sources. The reverberation-like noise is a modified form of these recordings and has a similar spectral content. It does not contain binaural localization cues due to a phase randomization procedure. Listening to the reverberant words binaurally improves the intelligibility by 6.0% over diotic listening. The binaural intelligibility advantage for reverberation-like noise is only 2.6%. This indicates that binaural overlap-masking release is insufficient to explain the entire binaural word intelligibility advantage in reverberation.

4.

Performance of an adaptive beamforming noise reduction scheme for hearing aid applications. II. Experimental verification of the predictions

Kompis M Dillier N 《The Journal of the Acoustical Society of America》2001,109(3):1134-1143

A method to predict the amount of noise reduction which can be achieved using a two-microphone adaptive beamforming noise reduction system for hearing aids [J. Acoust. Soc. Am. 109, 1123 (2001)] is verified experimentally. Thirty-four experiments are performed in real environments and 58 in simulated environments, and the results are compared to the predictions. In all experiments, one noise source and one target signal source are present. Starting from a setting in a moderately reverberant room (reverberation time 0.42 s, volume 34 m³, distance between listener and either sound source 1 m, length of the adaptive filter 25 ms), eight different parameters of the acoustical environment and three different design parameters of the adaptive beamformer were systematically varied. For those experiments in which the direct-to-reverberant ratio of the noise signal is +3 dB or less, the difference between the predicted and the measured improvement in signal-to-noise ratio (SNR) is -0.21 ± 0.59 dB for real environments and -0.25 ± 0.51 dB for simulated environments (average ± standard deviation). At higher direct-to-reverberant ratios, SNR improvement is systematically underestimated by up to 5.34 dB. The parameters with the greatest influence on the performance of the adaptive beamformer have been found to be the direct-to-reverberant ratio of the noise source, the reverberation time of the acoustic environment, and the length of the adaptive filter.

5.

Voice segregation by difference in fundamental frequency: evidence for harmonic cancellation

Deroche ML Culling JF 《The Journal of the Acoustical Society of America》2011,130(5):2855-2865

Two experiments investigated listeners' ability to use a difference of two semitones in fundamental frequency (F0) to segregate a target voice from harmonic complex tones, with speech-like spectral profiles. Masker partials were in random phase (experiment 1) or in sine phase (experiment 2) and stimuli were presented over headphones. Target's and masker's harmonicity were each distorted by F0 modulation and reverberation. The F0 of each source was manipulated (monotonized or modulated by 2 semitones at 5 Hz) factorially. In addition, all sources were presented from the same location in a virtual room with controlled reverberation, assigned factorially to each source. In both experiments, speech reception thresholds increased by about 2 dB when the F0 of the masker was modulated and increased by about 6 dB when, in addition to F0 modulation, the masker was reverberant. Masker partial phases did not influence the results. The results suggest that F0-segregation relies upon the masker's harmonicity, which is disrupted by rapid modulation. This effect is compounded by reverberation. In addition, F0-segregation was found to be independent of the depth of masker envelope modulations.

6.

Perceptual validation of virtual room acoustics: Sound localisation and speech understanding

Monika Rychtáriková Tim van den Bogaert Gerrit Vermeir Jan Wouters 《Applied Acoustics》2011,(4):196-204

The reliability of algorithms for room acoustic simulations has often been confirmed on the basis of the verification of predicted room acoustical parameters. This paper presents a complementary perceptual validation procedure consisting of two experiments, respectively dealing with speech intelligibility, and with sound source front–back localisation. The evaluated simulation algorithm, implemented in software ODEON®, is a hybrid method that is based on an image source algorithm for the prediction of early sound reflection and on ray-tracing for the later part, using a stochastic scattering process with secondary sources. The binaural room impulse response (BRIR) is calculated from a simulated room impulse response where information about the arriving time, intensity and spatial direction of each sound reflection is collected and convolved with a measured Head Related Transfer Function (HRTF). The listening stimuli for the speech intelligibility and localisation tests are auralised convolutions of anechoic sound samples with measured and simulated BRIRs. Perception tests were performed with human subjects in two acoustical environments, i.e. an anechoic and reverberant room, by presenting the stimuli to subjects in a natural way, and via headphones by using two non-individualized HRTFs (artificial head and hearing aids placed on the ears of the artificial head) of both a simulated and a real room. Very good correspondence is found between the results obtained with simulated and measured BRIRs, both for speech intelligibility in the presence of noise and for sound source localisation tests. In the anechoic room an increase in speech intelligibility is observed when noise and signal are presented from sources located at different angles. This improvement is not so evident in the reverberant room, with the sound sources at 1-m distance from the listener. Interestingly, the performance of people for front–back localisation is better in the reverberant room than in the anechoic room. The correlation between people's ability for sound source localisation on one hand, and their ability for recognition of binaurally received speech in reverberation on the other hand, is found to be weak.

7.

Blind estimation of reverberation time

Ratnam R Jones DL Wheeler BC O'Brien WD Lansing CR Feng AS 《The Journal of the Acoustical Society of America》2003,114(5):2877-2892

The reverberation time (RT) is an important parameter for characterizing the quality of an auditory space. Sounds in reverberant environments are subject to coloration. This affects speech intelligibility and sound localization. Many state-of-the-art audio signal processing algorithms, for example in hearing aids and telephony, are expected to have the ability to characterize the listening environment, and turn on an appropriate processing strategy accordingly. Thus, a method for characterization of room RT based on passively received microphone signals represents an important enabling technology. Current RT estimators, such as Schroeder's method, depend on a controlled sound source, and thus cannot produce an online, blind RT estimate. Here, a method for estimating RT without prior knowledge of sound sources or room geometry is presented. The diffusive tail of reverberation was modeled as an exponentially damped Gaussian white noise process. The time-constant of the decay, which provided a measure of the RT, was estimated using a maximum-likelihood procedure. The estimates were obtained continuously, and an order-statistics filter was used to extract the most likely RT from the accumulated estimates. The procedure was illustrated for connected speech. Results obtained for simulated and real room data are in good agreement with the real RT values.
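
The decay model named in this abstract can be illustrated with a minimal sketch. It assumes a single synthetic decay segment y[n] = a^n x[n] with x[n] Gaussian white noise, and it finds the decay factor by a plain grid search over candidate RT values. The function names, the grid, the sample rate, and the segment length are all hypothetical choices for illustration; the continuous estimation and the order-statistics filter described in the abstract are not reproduced here.

```python
import math
import random

def rt60_to_decay(rt60, fs):
    """Per-sample amplitude factor a with a**(rt60*fs) == 10**-3 (a 60 dB drop)."""
    return math.exp(-3.0 * math.log(10.0) / (rt60 * fs))

def log_likelihood(y, a):
    """Log-likelihood of y[n] = a**n * x[n], x ~ N(0, sigma^2),
    with sigma^2 replaced by its maximum-likelihood value for this a."""
    n_samples = len(y)
    log_a = math.log(a)
    sigma2 = sum(math.exp(-2.0 * n * log_a) * v * v
                 for n, v in enumerate(y)) / n_samples
    return (-0.5 * n_samples * (n_samples - 1) * log_a
            - 0.5 * n_samples * math.log(2.0 * math.pi * sigma2)
            - 0.5 * n_samples)

def estimate_rt60(y, fs, candidates):
    """Pick the candidate RT60 whose decay factor maximizes the likelihood."""
    return max(candidates,
               key=lambda rt: log_likelihood(y, rt60_to_decay(rt, fs)))

# Synthetic example (assumed values): 0.1 s RT60 at 8 kHz, 2000 samples.
random.seed(0)
fs = 8000
true_rt60 = 0.1
a_true = rt60_to_decay(true_rt60, fs)
decay = [a_true ** n * random.gauss(0.0, 1.0) for n in range(2000)]
candidates = [0.05 + 0.005 * k for k in range(96)]
estimate = estimate_rt60(decay, fs, candidates)
print(f"estimated RT60: {estimate:.3f} s")
```

A grid search is used only because it is easy to check by hand; any one-dimensional optimizer over the decay factor would serve the same purpose.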

8.

Using blind source separation techniques to improve speech recognition in bilateral cochlear implant patients

Kokkinakis K Loizou PC 《The Journal of the Acoustical Society of America》2008,123(4):2379-2390

Bilateral cochlear implants seek to restore the advantages of binaural hearing by improving access to binaural cues. Bilateral implant users are currently fitted with two processors, one in each ear, operating independently of one another. In this work, a different approach to bilateral processing is explored based on blind source separation (BSS) by utilizing two implants driven by a single processor. Sentences corrupted by interfering speech or speech-shaped noise are presented to bilateral cochlear implant users at 0 dB signal-to-noise ratio in order to evaluate the performance of the proposed BSS method. Subjects are tested in both anechoic and reverberant settings, wherein the target and masker signals are spatially separated. Results indicate substantial improvements in performance in both anechoic and reverberant settings over the subjects' daily strategies for both masker conditions and at various locations of the masker. It is speculated that such improvements are due to the fact that the proposed BSS algorithm capitalizes on the variations of interaural level differences and interaural time delays present in the mixtures of the signals received by the two microphones, and exploits that information to spatially separate the target from the masker signals.

9.

Spatial unmasking of nearby pure-tone targets in a simulated anechoic environment

Kopco N Shinn-Cunningham BG 《The Journal of the Acoustical Society of America》2003,114(5):2856-2870

Detection thresholds were measured for different spatial configurations of 500- and 1000-Hz pure-tone targets and broadband maskers. Sources were simulated using individually measured head-related transfer functions (HRTFs) for source positions varying in both azimuth and distance. For the spatial configurations tested, thresholds ranged over 50 dB, primarily as a result of large changes in the target-to-masker ratio (TMR) with changes in target and masker locations. Intersubject differences in both HRTFs and in binaural sensitivity were large; however, the overall pattern of results was similar across subjects. As expected, detection thresholds were generally smaller when the target and masker were separated in azimuth than when they were at the same location. However, in some cases, azimuthal separation of target and masker yielded little change or even a small increase in detection threshold. Significant intersubject differences occurred as a result both of differences in monaural and binaural acoustic cues in the individualized HRTFs and of different binaural contributions to performance. Model predictions captured general trends in the pattern of spatial unmasking. However, subject-specific model predictions did not account for the observed individual differences in performance, even after taking into account individual differences in HRTF measurements and overall binaural sensitivity. These results suggest that individuals differ not only in their overall sensitivity to binaural cues, but also in how their binaural sensitivity varies with the spatial position of (and interaural differences in) the masker.

10.

Performance of an adaptive beamforming noise reduction scheme for hearing aid applications. I. Prediction of the signal-to-noise-ratio improvement

Kompis M Dillier N 《The Journal of the Acoustical Society of America》2001,109(3):1123-1133

Adaptive beamformers have been proposed as noise reduction schemes for conventional hearing aids and cochlear implants. A method to predict the amount of noise reduction that can be achieved by a two-microphone adaptive beamformer is presented. The prediction is based on a model of the acoustic environment in which the presence of one acoustic target-signal source and one acoustic noise source in a reverberant enclosure is assumed. The acoustic field is sampled using two omnidirectional microphones mounted close to the ears of a user. The model takes eleven different parameters into account, including reverberation time and size of the room, directionality of the acoustic sources, and design parameters of the beamformer itself, including length of the adaptive filter and delay in the target signal path. An approximation to predict the achievable signal-to-noise improvement based on the model is presented. Potential applications as well as limitations of the proposed prediction method are discussed and a FORTRAN subroutine to predict the achievable signal-to-noise improvement is provided. Experimental verification of the predictions is provided in a companion paper [J. Acoust. Soc. Am. 109, 1134 (2001)].

11.

Comparing the effects of reverberation and of noise on speech recognition in simulated electric-acoustic listening

Helms Tillery K Brown CA Bacon SP 《The Journal of the Acoustical Society of America》2012,131(1):416-423

Cochlear implant users report difficulty understanding speech in both noisy and reverberant environments. Electric-acoustic stimulation (EAS) is known to improve speech intelligibility in noise. However, little is known about the potential benefits of EAS in reverberation, or about how such benefits relate to those observed in noise. The present study used EAS simulations to examine these questions. Sentences were convolved with impulse responses from a model of a room whose estimated reverberation times were varied from 0 to 1 s. These reverberated stimuli were then vocoded to simulate electric stimulation, or presented as a combination of vocoder plus low-pass filtered speech to simulate EAS. Monaural sentence recognition scores were measured in two conditions: reverberated speech and speech in a reverberated noise. The long-term spectrum and amplitude modulations of the noise were equated to the reverberant energy, allowing a comparison of the effects of the interferer (speech vs noise). Results indicate that, at least in simulation, (1) EAS provides significant benefit in reverberation; (2) the benefits of EAS in reverberation may be underestimated by those in a comparable noise; and (3) the EAS benefit in reverberation likely arises from partially preserved cues in this background accessible via the low-frequency acoustic component.

12.

Binaural segregation in multisource reverberant environments

Roman N Srinivasan S Wang D 《The Journal of the Acoustical Society of America》2006,120(6):4040-4051

In a natural environment, speech signals are degraded by both reverberation and concurrent noise sources. While human listening is robust under these conditions using only two ears, current two-microphone algorithms perform poorly. The psychological process of figure-ground segregation suggests that the target signal is perceived as a foreground while the remaining stimuli are perceived as a background. Accordingly, the goal is to estimate an ideal time-frequency (T-F) binary mask, which selects the target if it is stronger than the interference in a local T-F unit. In this paper, a binaural segregation system that extracts the reverberant target signal from multisource reverberant mixtures by utilizing only the location information of target source is proposed. The proposed system combines target cancellation through adaptive filtering and a binary decision rule to estimate the ideal T-F binary mask. The main observation in this work is that the target attenuation in a T-F unit resulting from adaptive filtering is correlated with the relative strength of target to mixture. A comprehensive evaluation shows that the proposed system results in large SNR gains. In addition, comparisons using SNR as well as automatic speech recognition measures show that this system outperforms standard two-microphone beamforming approaches and a recent binaural processor.
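
The ideal T-F binary mask defined in this abstract can be sketched as follows. The sketch assumes the premixed target and interference energies are known for every T-F unit, which is what makes the mask "ideal"; the system described in the abstract instead estimates the mask from adaptive-filter attenuation, and that estimation step is not shown. The function name and the optional `lc_db` threshold parameter are hypothetical.

```python
def ideal_binary_mask(target_energy, interference_energy, lc_db=0.0):
    """Return 1 for each T-F unit whose target energy exceeds the
    interference energy by more than lc_db decibels, else 0.

    Both inputs are lists of rows (one row per frequency channel,
    one column per time frame) of non-negative energies.
    """
    ratio = 10.0 ** (lc_db / 10.0)  # dB threshold as a linear energy ratio
    return [
        [1 if t > ratio * i else 0 for t, i in zip(t_row, i_row)]
        for t_row, i_row in zip(target_energy, interference_energy)
    ]

# Toy energies (assumed values): 2 frequency channels x 3 time frames.
target = [[4.0, 1.0, 0.5],
          [0.1, 9.0, 2.0]]
interference = [[1.0, 2.0, 0.5],
                [0.4, 3.0, 2.5]]
mask = ideal_binary_mask(target, interference)
print(mask)
```

With the default 0 dB criterion a unit is kept only when the target is strictly stronger, so the tied unit in the first row is assigned to the background.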

13.

Tuning in the spatial dimension: evidence from a masked speech identification task

Marrone N Mason CR Kidd G 《The Journal of the Acoustical Society of America》2008,124(2):1146-1158

Spatial release from masking was studied in a three-talker soundfield listening experiment. The target talker was presented at 0 degrees azimuth and the maskers were either colocated or symmetrically positioned around the target, with a different masker talker on each side. The symmetric placement greatly reduced any "better ear" listening advantage. When the maskers were separated from the target by ±15 degrees, the average spatial release from masking was 8 dB. Wider separations increased the release to more than 12 dB. This large effect was eliminated when binaural cues and perceived spatial separation were degraded by covering one ear with an earplug and earmuff. Increasing reverberation in the room increased the target-to-masker ratio (TM) for the separated, but not colocated, conditions, reducing the release from masking, although a significant advantage of spatial separation remained. Time reversing the masker speech improved performance in both the colocated and spatially separated cases but lowered TM the most for the colocated condition, also resulting in a reduction in the spatial release from masking. Overall, the spatial tuning observed appears to depend on the presence of interaural differences that improve the perceptual segregation of sources and facilitate the focus of attention at a point in space.

14.

Application of improved suboptimal detection to bottom reverberation suppression in side-scan sonar

马龙双 许枫 刘佳 蒋立军 《应用声学》2021,40(1):147-148

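When side-scan sonar is used to detect small targets lying on the seabed, bottom reverberation is the main background interference. Bottom reverberation is typically a non-stationary, non-Gaussian, band-limited noise, which limits the performance of filters designed for white-noise conditions. In a reverberant background, the received signal is commonly pre-whitened with an autoregressive model, but in practical side-scan sonar applications, matched filtering applied directly after whitening performs poorly. To address this problem, an improved method is proposed that builds on autoregressive pre-whitening and combines suboptimal detection with multi-resolution dichotomous singular value decomposition. The method first segments the received signal and uses an improved Burg algorithm to estimate the autoregressive model coefficients and order for each segment. It then constructs a whitening filter to pre-whiten the segmented data and applies multi-resolution dichotomous singular value decomposition to the whitened data. Finally, the Otsu method is applied to the original and the processed sonar images for target detection. Simulation and experimental results show that the method markedly improves the signal-to-reverberation ratio and the imaging quality of static small bottom targets in side-scan sonar, which benefits later image-based automatic target detection.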
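
The last stage named in this abstract, Otsu thresholding of the sonar image, can be sketched in isolation. The sketch assumes 8-bit integer pixel values given as a flat list and implements the textbook Otsu criterion (maximize between-class variance); the Burg pre-whitening and the singular value decomposition stages are not shown, and the function name and toy pixel values are hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the gray level t that maximizes the between-class variance
    when pixels <= t form the background and pixels > t the foreground."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the background class so far
    sum0 = 0    # sum of gray levels in the background class so far
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t

# Toy "image" (assumed values): dark background with a few bright target pixels.
pixels = [10] * 50 + [12] * 50 + [200] * 10 + [210] * 10
threshold = otsu_threshold(pixels)
detections = [p > threshold for p in pixels]
print(threshold, sum(detections))
```

Ties between equally good thresholds are resolved toward the lowest gray level, which is one common convention rather than part of the method itself.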

15.

Prediction of the influence of reverberation on binaural speech intelligibility in noise and in quiet

Rennies J Brand T Kollmeier B 《The Journal of the Acoustical Society of America》2011,130(5):2999-3012

Reverberation usually degrades speech intelligibility for spatially separated speech and noise sources since spatial unmasking is reduced and late reflections decrease the fidelity of the received speech signal. The latter effect could not satisfactorily be predicted by a recently presented binaural speech intelligibility model [Beutelmann et al. (2010). J. Acoust. Soc. Am. 127, 2479-2497]. This study therefore evaluated three extensions of the model to improve its predictions: (1) an extension of the speech intelligibility index based on modulation transfer functions, (2) a correction factor based on the room acoustical quantity "definition," and (3) a separation of the speech signal into useful and detrimental parts. The predictions were compared to results of two experiments in which speech reception thresholds were measured in a reverberant room in quiet and in the presence of a noise source for listeners with normal hearing. All extensions yielded better predictions than the original model when the influence of reverberation was strong, while predictions were similar for conditions with less reverberation. Although model (3) differed substantially in the assumed interaction of binaural processing and early reflections, its predictions were very similar to model (2) that achieved the best fit to the data.

16.

Reverberation cancellation in a closed test section of a wind tunnel using a multi-microphone cepstral method

D. Blacodon J. Bulté 《Journal of sound and vibration》2014

Nowadays, although aerodynamic data are still primarily sought after during wind tunnel tests, reliable acoustic measurements also become a priority for aircraft designers. In order to gather both kinds of data, aerodynamic and acoustic tests are carried out simultaneously under the same closed test section. This solution has two major drawbacks: the acoustic signals delivered by microphones may be corrupted by the boundary layer expanding on the wind tunnel walls and by the reverberant noise originating from reflective surfaces. Technological solutions can be deployed to reduce the corruption of the signals by the wind tunnel background noise. Methods based on the power cepstrum can be used to reduce reverberation effects by removing the quefrencies due to the echoes in the cepstral domain.

17.

Speech segregation in rooms: effects of reverberation on both target and interferer

Lavandier M Culling JF 《The Journal of the Acoustical Society of America》2007,122(3):1713

Speech reception thresholds were measured to investigate the influence of a room on speech segregation between a spatially separated target and interferer. The listening tests were realized under headphones. A room simulation allowed selected positioning of the interferer and target, as well as varying the absorption coefficient of the room internal surfaces. The measurements involved target sentences and speech-shaped noise or 2-voice interferers. Four experiments revealed that speech segregation in rooms was not only dependent on the azimuth separation of sound sources, but also on their direct-to-reverberant energy ratio at the listening position. This parameter was varied for interferer and target independently. Speech intelligibility decreased as the direct-to-reverberant ratio of sources was degraded by sound reflections in the room. The influence of the direct-to-reverberant ratio of the interferer was in agreement with binaural unmasking theories, through its effect on interaural coherence. The effect on the target occurred at higher levels of reverberation and was explained by the intrinsic degradation of speech intelligibility in reverberation.
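
The direct-to-reverberant energy ratio discussed in this abstract is commonly computed from a room impulse response by splitting its energy around the direct-sound peak. The sketch below assumes a window of ±2.5 ms around the largest sample counts as direct sound; that window length, the function name, and the toy impulse response are assumptions for illustration and are not taken from the paper.

```python
import math

def direct_to_reverberant_ratio_db(h, fs, window_ms=2.5):
    """Energy ratio (dB) between the direct sound and everything after it.

    The direct sound is taken as the samples within +/- window_ms of the
    largest absolute sample of the impulse response h (sampled at fs Hz).
    """
    peak = max(range(len(h)), key=lambda i: abs(h[i]))
    half = int(round(window_ms * 1e-3 * fs))
    start = max(0, peak - half)
    end = min(len(h), peak + half + 1)
    direct = sum(v * v for v in h[start:end])
    reverberant = sum(v * v for v in h[end:])
    if reverberant == 0.0:
        return math.inf  # no energy after the direct sound (anechoic case)
    return 10.0 * math.log10(direct / reverberant)

# Toy impulse response (assumed values): a unit direct path followed by a
# weak, flat "tail" whose total energy is 0.01.
fs = 8000
h = [0.0] * 10 + [1.0] + [0.0] * 100 + [0.01] * 100
drr = direct_to_reverberant_ratio_db(h, fs)
print(f"DRR = {drr:.1f} dB")
```

Raising the absorption of the simulated surfaces would shrink the tail energy and so raise this ratio, which is the manipulation the abstract describes.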

18.

Finite element optimization of the sound field in a strong-absorption/strong-reflection listening room

朱晓天 籍仙蓉 朱冬冬 《声学学报》2009,34(4):355-361

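To obtain the desired flat reverberation time and a suitable room shape for a listening room, a design method that combines the strong-absorption/strong-reflection concept with finite element optimization is proposed and verified experimentally. The room frequency response and reverberation time of a listening room built according to the optimized design were measured, and the measured data agree well with the analytical calculations and numerical simulations. The reverberation time curve is flat over the entire design frequency range, and after optimization the standard deviation of the room frequency response within the two lowest octaves is reduced by about 6 dB. The deviation between the root-mean-square value of the measured reverberation time at the 1/3-octave center frequencies and the design value is only 0.02 s over the six octaves from 63 Hz to 4 kHz.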

19.

A blind reverberation time estimation method based on maximum likelihood

吴礼福 王华 程义 郭业才 《应用声学》2016,35(4):288-293

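Reverberation is an important phenomenon in room acoustics, and both room design and audio signal processing require the reverberation time to be measured or estimated. This paper improves a blind reverberation time estimation method based on maximum likelihood estimation, that is, a method that estimates the reverberation time from the reverberant speech produced when a talker speaks naturally in a room. The method first determines the optimal boundaries of a speech decay segment, then computes two additional parameters of that segment and uses them to select the segments that meet the criteria, and finally applies maximum likelihood estimation to the qualifying segments to obtain the reverberation time estimate. Simulations under five different reverberation time conditions show that, compared with the existing method, the improved method yields estimates with a smaller deviation from the true reverberation time, lower variance, and higher accuracy.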

20.

Reverberation time measurements in non-diffuse acoustic field by the modal reverberation time

Andrea Prato Federico Casassa Alessandro Schiavi 《Applied Acoustics》2016

The increasing presence of low frequency sources and the lack of acoustic standard measurement procedures make the extension of reverberation time measurements to frequencies below 100 Hz necessary. In typical ordinary rooms with volumes between 30 m³ and 200 m³ the sound field is non-diffuse at such low frequencies, entailing inhomogeneities in space and frequency domains. Presence of standing waves is also the main cause of poor listening quality in terms of clarity and rumble effects. Since standard measurements according to ISO 3382 fail to achieve accurate and precise values in third octave bands due to non-linear decays caused by room modes, a new approach based on reverberation time measurements of single resonant frequencies (the modal reverberation time) has been introduced. From background theory, due to the intrinsic relation between modal decays and half bandwidth of resonant frequencies, two measurement methods have been proposed together with proper measurement procedures: a direct method based on interrupted source signal method, and an indirect method based on half bandwidth measurements. With microphones placed at corners of rectangular rooms in order to detect all modes and maximize SNRs, different source signals were tested. Anti-resonant sine waves and sweep signal turned out to be the most suitable for direct and indirect measurement methods respectively. From spatial measurements in an empty rectangular test room, comparison between direct and indirect methods showed good and significant agreement. This is the first experimental validation of the relation between resonant half bandwidth and modal reverberation time. Furthermore, comparisons between means and standard deviations of modal reverberation times and standard reverberation times in third octave bands confirm the inadequacy of standard procedure to get accurate and precise values at low frequencies with respect to the modal approach. Modal reverberation time measurements applied to furnished ordinary rooms confirm previous results in the limit of modal sound field: for highly damped modes due to furniture or acoustic treatment, the indirect method is not applicable due to strong suppression of modes and the consequent deviation of the acoustic field from a non-diffuse condition to a damped modal condition, while standard reverberation times align with direct method values. In the future, further investigations will be necessary in different rooms to improve uncertainty evaluation.
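
The relation between a mode's bandwidth and its decay that underlies the indirect method can be sketched under a simple assumption: a single isolated mode with a Lorentzian power response and decay constant δ, for which the half-power (-3 dB) bandwidth is B = δ/π and the reverberation time is T60 = 3 ln(10)/δ, about 2.2/B. The abstract does not give the paper's exact measurement procedure, so the bandwidth search below, the function names, and the test values are hypothetical.

```python
import math

def modal_rt60_from_bandwidth(half_power_bandwidth_hz):
    """T60 of one exponentially decaying mode: 3 ln(10) / (pi * B), about 2.2 / B."""
    return 3.0 * math.log(10.0) / (math.pi * half_power_bandwidth_hz)

def half_power_bandwidth(freqs, power):
    """Width of the contiguous band around the peak where the power
    stays at or above half of the peak value (resolution: grid step)."""
    peak = max(range(len(power)), key=lambda i: power[i])
    half = power[peak] / 2.0
    lo = peak
    while lo > 0 and power[lo - 1] >= half:
        lo -= 1
    hi = peak
    while hi < len(power) - 1 and power[hi + 1] >= half:
        hi += 1
    return freqs[hi] - freqs[lo]

# Synthetic single mode (assumed values): 50 Hz resonance with a 2 s decay.
true_rt60 = 2.0
delta = 3.0 * math.log(10.0) / true_rt60          # decay constant in 1/s
f0 = 50.0
freqs = [45.0 + 0.01 * k for k in range(1001)]     # 45 Hz to 55 Hz
power = [1.0 / ((2.0 * math.pi * (f - f0)) ** 2 + delta ** 2) for f in freqs]

bandwidth = half_power_bandwidth(freqs, power)
estimated_rt60 = modal_rt60_from_bandwidth(bandwidth)
print(f"B = {bandwidth:.2f} Hz, T60 = {estimated_rt60:.2f} s")
```

The estimate is limited by the frequency grid step, which mirrors the practical point that narrow, lightly damped modes need fine frequency resolution; overlapping or heavily damped modes violate the single-mode assumption used here.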