期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

柏浩钧张天骐刘鉴兴叶绍鹏《声学学报》2022,47(3):394-404

针对目前有监督语音增强忽略了纯净语音、噪声与带噪语音之间的幅度谱相似性对增强效果影响等问题,提出了一种联合精确比值掩蔽(ARM)与深度神经网络(DNN)的语音增强方法。该方法利用纯净语音与带噪语音、噪声与带噪语音的幅度谱归一化互相关系数,设计了一种基于时频域理想比值掩蔽的精确比值掩蔽作为目标掩蔽;然后以纯净语音和噪声幅度谱为训练目标的DNN为基线,通过该DNN的输出来估计目标掩蔽,并对基线DNN和目标掩蔽进行联合优化,增强语音由目标掩蔽从带噪语音中估计得到;此外,考虑到纯净语音与噪声的区分性信息,采用一种区分性训练函数代替均方误差(MSE)函数作为基线DNN的目标函数,以使网络输出更加准确。实验表明,区分性训练函数提升了基线DNN以及整个联合优化网络的增强效果;在匹配噪声和不匹配噪声下,相比于其它常见DNN方法,本文方法取得了更高的平均客观语音质量评估(PESQ)和短时客观可懂度(STOI),增强后的语音保留了更多语音成分,同时对噪声的抑制效果更加明显。相似文献

2.

一种增强带噪语音可懂度的新算法

姚峰英张敏《声学学报》2002,(6)

一般的语音增强算法在强噪声环境中只能提高信噪比,不能提高可懂度。本文提出用可调节白噪声代替信号中非语音部分的语音可懂度增强处理新算法。实验证明此方法能明显改善强噪声时的语音可懂度,能对低至-10dB的带噪语音信号进行有效的可懂度增强。相似文献

3.

噪声对单耳语言可懂度的影响

王康王鹏麻乘榕毛燕蓉邱小军《声学学报》2016,41(5):776-783

单耳通信时,周边噪声对语言可懂度产生影响。针对信号侧语音信号强度70dB时,研究3种不同类型噪声下,干扰侧不同强度噪声和信号侧不同信噪比情况的语言可懂度。实验结果表明:当信号侧信噪比大于某一阈值时,干扰侧噪声对可懂度不产生显著影响,该阈值同噪声类型有关;而在信号侧低信噪比的情形下,干扰侧适当强度噪声可提高信号侧语言可懂度,最佳干扰噪声级为78—82dB,过大的干扰侧噪声级导致可懂度下降。基于心理声学和生理学的初步机理发现:噪声环境下的语音识别中,对侧耳中耳肌肉伸缩对噪声感知的抑制提高了信号侧语言可懂度。相似文献

4.

提高耳语音可懂度的非对称压缩语音增强方法

周健郑文明王青云赵力《声学学报》2014,39(4):501-508

提出两种基于非对称代价函数的耳语音增强算法,将语音增强过程中的放大失真和压缩失真区分对待。Modified ItakuraSaito (MIS)算法对放大失真给予更多的惩罚,而Kullback-Leibler (KL)算法则对压缩失真给予更多的惩罚。实验结果表明,在低于—6 dB的低信噪比情况中,经MIS算法增强后的耳语音的可懂度相比传统算法有显著提高;而KL算法则获得了同最小均方误差语音增强算法近似的可懂度提高效果,证实了耳语音中的放大失真和压缩失真对于耳语音可懂度的影响并不相同,低信噪比时较大的压缩失真有助于提高耳语音可懂度,而高信噪比时的压缩失真对耳语音可懂度影响较小。相似文献

5.

联合深度神经网络和凸优化的单通道语音增强算法 总被引：1，自引：1，他引：0

下载免费PDF全文

张晓艳张天骐葛宛营白杨柳《声学学报》2021,46(3):471-480

噪声估计的准确性直接影响语音增强算法的好坏,为提升当前语音增强算法的噪声抑制效果,有效求解无约束优化问题,提出一种联合深度神经网络(DNN)和凸优化的时频掩蔽优化算法进行单通道语音增强。首先,提取带噪语音的能量谱作为DNN的输入特征;接着,将噪声与带噪语音的频带内互相关系数(ICC Factor)作为DNN的训练目标;然后,利用DNN模型得到的互相关系数构造凸优化的目标函数;最后,联合DNN和凸优化,利用新混合共轭梯度法迭代处理初始掩蔽,通过新的掩蔽合成增强语音。仿真实验表明,在不同背景噪声的低信噪比下,相比改进前,新的掩蔽使增强语音获得了更好的对数谱距离(LSD)、主观语音质量(PESQ)、短时客观可懂度(STOI)和分段信噪比(segSNR)指标,提升了语音的整体质量并且可以有效抑制噪声。相似文献

6.

提升小波加权自相关函数的基音检测算法* 总被引：1，自引：0，他引：1

下载免费PDF全文

王晨章小兵刘美娟《应用声学》2018,37(2):201-207

随着计算机技术的发展,语音信号处理作为人机交互的重要渠道,其在复杂噪声环境下的特征值检测算法直接关系到计算机的运算效率。基音周期是语音特征值提取的重要参数之一。针对传统基音检测算法在噪声环境下检测精度低的问题,提出了一种基于自适应提升小波变换加权线性预测误差自相关函数的基音检测算法。该方法用多级提升小波近似系数加权求和的方法来弥补自相关函数随着时间延迟量的增加幅值衰减的缺陷;用线性预测误差自相关函数的方法来抑制共振峰的干扰,然后将两种方法结合来突出基音周期处的峰值。实验结果表明,与传统的自相关函数法和小波加权法相比,该方法能有效减弱共振峰的影响,突出基音周期处的峰值,提高基音周期检测精度,鲁棒性更好。相似文献

7.

用于无监督语音降噪的听觉感知鲁棒主成分分析法 总被引：2，自引：0，他引：2

下载免费PDF全文

闵刚邹霞韩伟张雄伟谭薇《声学学报》2017,42(2):246-256

针对现有稀疏低秩分解语音降噪方法对人耳听觉感知特性应用不充分、语音失真易被感知的问题,提出了一种用于语音降噪的听觉感知鲁棒主成分分析法。由于耳蜗基底膜对于频率感知具有非线性特性,该方法采用耳蜗谱图作为语噪分离的基础。此外,选用符合人耳听觉感知特性的板仓-斋田距离度量作为优化目标函数,在稀疏低秩建模过程中引入非负约束以使分解分量更符合实际物理含义,并在交替方向乘子法框架下推导了具有闭合解形式的迭代优化算法。文中方法在语音降噪时是完全无监督的,无需预先训练语音或噪声模型。多种类型噪声和不同信噪比条件下的仿真实验验证了该方法的有效性,噪声抑制效果较目前同类算法更为显著,且降噪后语音的可懂度和总体质量有所提高、至少相当。相似文献

8.

单通道语音增强算法对汉语语音可懂度影响的研究 总被引：1，自引：0，他引：1

杨琳张建平颜永红《声学学报》2010,35(2):248-253

考察了当前常用的几种单通道语音增强算法对汉语语音可懂度的影响。受不同类型噪音干扰的语音经过5种单通道语音增强算法的处理后,播放给具有正常听力水平的被试进行听辩,考察增强后语音的可懂度。实验结果表明,语音增强算法并不能改进语音的可懂度水平;通过分析具体的错误原因,发现听辩错误主要来自于音素错误,与声调关系不大;而且,同英文的辨识结果相比,一些增强算法对于中、英文可懂度影响差异显著。相似文献

9.

扩散噪声下协方差矩阵重构的语音分离与降噪

下载免费PDF全文

曾庆宁王师琦《声学学报》2021,46(5):775-784

针对传统多通道语音分离算法在扩散噪声下性能下降的问题,提出了一种用于语音分离及降噪的空间协方差模型及参数估计方法。该方法将扩散噪声视为独立声源,利用由导向矢量重构的空间协方差矩阵建模目标声源的空间特性,并通过空间协方差分析方法估计用于语音分离的多通道维纳滤波器。同时,还提出了一种联合该方法的后置滤波器参数框架,为输出信号降噪和失真的折中提供了更多选择。在扩散噪声下的单目标和多目标实验中,所提方法的语音提取和分离性能都优于对比算法,联合参数的后置滤波器可提供更为符合人们要求的降噪语音,验证了所提模型与参数估计方法的有效性。相似文献

10.

偏度最大化多通道逆滤波语音去混响研究* 总被引：1，自引：1，他引：0

下载免费PDF全文

郭颖彭任华郑成诗李晓东《应用声学》2019,38(1):58-67

房间混响会降低语音质量和语音可懂度。高阶统计量是衡量非高斯性的重要参量,基于语音非高斯特性可实现语音去混响。本文提出一种基于高阶统计量的多通道语音去混响方法,该方法首次用多通道语音信号线性预测残差的三阶统计量偏度（Skewness）构造代价函数,以去混响重建信号线性预测残差的偏度最大化为目标自适应地更新逆滤波器;同时结合语音信号的产生模型,提出基于偏度准则的线性预测与房间脉冲响应逆滤波联合估计方法,进一步提高去混响算法性能。实验结果表明,该方法相较于已有的基于线性预测残差四阶统计量峰度（Kurtosis）的方法具有更好的去混响效果,且对噪声具有更强的鲁棒性。相似文献

11.

Binaural speech intelligibility in noise for hearing-impaired listeners 总被引：2，自引：0，他引：2

A W Bronkhorst R Plomp 《The Journal of the Acoustical Society of America》1989,86(4):1374-1383

The effect of head-induced interaural time delay (ITD) and interaural level differences (ILD) on binaural speech intelligibility in noise was studied for listeners with symmetrical and asymmetrical sensorineural hearing losses. The material, recorded with a KEMAR manikin in an anechoic room, consisted of speech, presented from the front (0 degree), and noise, presented at azimuths of 0 degree, 30 degrees, and 90 degrees. Derived noise signals, containing either only ITD or only ILD, were generated using a computer. For both groups of subjects, speech-reception thresholds (SRT) for sentences in noise were determined as a function of: (1) noise azimuth, (2) binaural cue, and (3) an interaural difference in overall presentation level, simulating the effect of a monaural hearing acid. Comparison of the mean results with corresponding data obtained previously from normal-hearing listeners shows that the hearing impaired have a 2.5 dB higher SRT in noise when both speech and noise are presented from the front, and 2.6-5.1 dB less binaural gain when the noise azimuth is changed from 0 degree to 90 degrees. The gain due to ILD varies among the hearing-impaired listeners between 0 dB and normal values of 7 dB or more. It depends on the high-frequency hearing loss at the side presented with the most favorable signal-to-noise (S/N) ratio. The gain due to ITD is nearly normal for the symmetrically impaired (4.2 dB, compared with 4.7 dB for the normal hearing), but only 2.5 dB in the case of asymmetrical impairment. When ITD is introduced in noise already containing ILD, the resulting gain is 2-2.5 dB for all groups. The only marked effect of the interaural difference in overall presentation level is a reduction of the gain due to ILD when the level at the ear with the better S/N ratio is decreased. This implies that an optimal monaural hearing aid (with a moderate gain) will hardly interfere with unmasking through ITD, while it may increase the gain due to ILD by preventing or diminishing threshold effects. 相似文献

12.

Array signal processing with two outputs preserving binaural information

Ryouichi Nishimura Yôiti Suzuki Futoshi Asano 《Applied Acoustics》2004,65(7):657-672

A new type of array signal processing combined with a weighted least squares algorithm to enable two-channel output with binaural information is proposed in this paper. This algorithm may be effective for use in a binaural hearing aid because the interaural relationship can be preserved after array signal processing. Retaining spatial information on specified directions while sufficiently suppressing unnecessary ambient noise coming from directions other than those of target sounds is required for this type of algorithm. In order to satisfy these two simultaneous requirements, the proposed algorithm was derived from a constraint algorithm by employing the weighted least squares algorithm. Performance in directivity patterns as well as interaural time difference (ITD) and interaural level difference (ILD) were evaluated. Computer simulations showed that this algorithm yields robust performance in various conditions compared with array signal processing based on a constraint algorithm. 相似文献

13.

Speech perception from monaural and binaural information

Culling JF Edmonds BA Hodder KI 《The Journal of the Acoustical Society of America》2006,119(1):559-565

Two experiments explored the concept of the binaural spectrogram [Culling and Colburn, J. Acoust. Soc. Am. 107, 517-527 (2000)] and its relationship to monaurally derived information. In each experiment, speech was added to noise at an adverse signal-to-noise ratio in the NoS pi binaural configuration. The resulting monaural and binaural cues were analyzed within an array of spectro-temporal bins and then these cues were resynthesized by modulating the intensity and/or interaural correlation of freshly generated noise. Experiment 1 measured the intelligibility of the resynthesized stimuli and compared them with the original NoSo and NoS pi stimuli at a fixed signal-to-noise ratio. While NoS pi stimuli were approximately equal to 50% intelligible, each cue in isolation produced similar (very low) intelligibility to the NoSo condition. The resynthesized combination produced approximately equal to 25% intelligibility. Modulation of interaural correlation below 1.2 kHz and of amplitude above 1.2 kHz was not as effective as their combination across all frequencies. Experiment 2 measured three-point psychometric functions in which the signal-to-noise ratio of the original NoS pi stimulus was increased in 3-dB steps from the level used in experiment 1. Modulation of interaural correlation alone proved to have a flat psychometric function. The functions for NoS pi and for combined monaural and binaural cues appeared similar in slope, but shifted horizontally. The results indicate that for sentence materials, neither fluctuations in interaural correlation nor in monaural intensity are sufficient to support speech recognition at signal-to-noise ratios where 50% intelligibility is achieved in the NoS pi configuration; listeners appear to synergistically combine monaural and binaural information in this task, to some extent within the same frequency region. 相似文献

14.

The effects of spatial separation in distance on the informational and energetic masking of a nearby speech signal

Brungart DS Simpson BD 《The Journal of the Acoustical Society of America》2002,112(2):664-676

Although many studies have shown that intelligibility improves when a speech signal and an interfering sound source are spatially separated in azimuth, little is known about the effect that spatial separation in distance has on the perception of competing sound sources near the head. In this experiment, head-related transfer functions (HRTFs) were used to process stimuli in order to simulate a target talker and a masking sound located at different distances along the listener's interaural axis. One of the signals was always presented at a distance of 1 m, and the other signal was presented 1 m, 25 cm, or 12 cm from the center of the listener's head. The results show that distance separation has very different effects on speech segregation for different types of maskers. When speech-shaped noise was used as the masker, most of the intelligibility advantages of spatial separation could be accounted for by spectral differences in the target and masking signals at the ear with the higher signal-to-noise ratio (SNR). When a same-sex talker was used as the masker, the intelligibility advantages of spatial separation in distance were dominated by binaural effects that produced the same performance improvements as a 4-5-dB increase in the SNR of a diotic stimulus. These results suggest that distance-dependent changes in the interaural difference cues of nearby sources play a much larger role in the reduction of the informational masking produced by an interfering speech signal than in the reduction of the energetic masking produced by an interfering noise source. 相似文献

15.

声学头模对语言清晰度测量的影响

下载免费PDF全文

章斯宇郑晓林孟子厚《声学学报》2015,40(6):894-901

借助声学头模考察了水平面不同语声源和噪声源位置对语言清晰度测量的影响,比较了有声学头模的双耳STIPA与无声学头模常规STIPA测量结果的差异,分别采用录听和现场测听方式进行了同等条件下的汉语听感清晰度主观评价实验,并分析了清晰度主客观结果的相关性。结果表明:声源位置对有声学头模的STIPA以及头模录制信号和真人现场实测的听感清晰度影响显著。无声学头模的STIPA更接近有声学头模时左右耳中较差的劣势耳的STIPA结果。单侧耳与语声源同侧或与噪声源异侧对应的单侧耳听感清晰度更高,语声源和噪声源重叠对应的双耳听感清晰度最低,声源分离可以显著提高双耳听感清晰度。头模录制信号和真人现场实测的听感清晰度与无声学头模STIPA不相关,与有声学头模的STIPA高度相关,其中单侧耳听感清晰度与该单侧耳STIPA高度相关,双耳听感清晰度与左右耳STIPA的较高值相关性最高。相似文献

16.

The influence of dummy head on measuring speech intelligibility

《声学学报：英文版》2016,(2)

In order to investigate the influence of dummy head on measuring speech intelligibility,the objective and subjective speech intelligibility evaluation experiments were respectively carried out for different spatial configurations of a target source and a noise source in the horizontal plane.The differences between standard STIPA measured without a dummy head and binaural STIPA measured with a dummy head were compared and the correlation of subjective speech intelligibility and objective STIPA was analyzed.It is showed that the position of sound source affects significantly on binaural STIPA and subjective intelligibility measured by a dummy head or measured in a real-life scenario.The standard STIPA is closer to the lower value of the two binaural STIPA values.The speech intelligibility is higher for a single ear which is on the same side with the target source or on the other side of the noise source.Binaural speech intelligibility is always the lowest when both target and noise sources are at the same place but once apart the speech intelligibility will increase sharply.It is also found that the subjective intelligibility measured by a dummy head or measured in a real-life scenario is uncorrelated with standard STIPA,but correlated highly with STIPA measured with a dummy head.The subjective intelligibility of one single ear is correlated highly with STIPA measured at the same ear,and the binaural speech intelligibility is in well agreement with the higher value of the two binaural STIPA values. 相似文献

17.

The effect of head-induced interaural time and level differences on speech intelligibility in noise 总被引：1，自引：0，他引：1

A W Bronkhorst R Plomp 《The Journal of the Acoustical Society of America》1988,83(4):1508-1516

A study was made of the effect of interaural time delay (ITD) and acoustic headshadow on binaural speech intelligibility in noise. A free-field condition was simulated by presenting recordings, made with a KEMAR manikin in an anechoic room, through earphones. Recordings were made of speech, reproduced in front of the manikin, and of noise, emanating from seven angles in the azimuthal plane, ranging from 0 degree (frontal) to 180 degrees in steps of 30 degrees. From this noise, two signals were derived, one containing only ITD, the other containing only interaural level differences (ILD) due to headshadow. Using this material, speech reception thresholds (SRT) for sentences in noise were determined for a group of normal-hearing subjects. Results show that (1) for noise azimuths between 30 degrees and 150 degrees, the gain due to ITD lies between 3.9 and 5.1 dB, while the gain due to ILD ranges from 3.5 to 7.8 dB, and (2) ILD decreases the effectiveness of binaural unmasking due to ITD (on the average, the threshold shift drops from 4.6 to 2.6 dB). In a second experiment, also conducted with normal-hearing subjects, similar stimuli were used, but now presented monaurally or with an overall 20-dB attenuation in one channel, in order to simulate hearing loss. In addition, SRTs were determined for noise with fixed ITDs, for comparison with the results obtained with head-induced (frequency dependent) ITDs.(ABSTRACT TRUNCATED AT 250 WORDS) 相似文献

18.

Binaural sluggishness in the perception of tone sequences and speech in noise

Culling JF Colburn HS 《The Journal of the Acoustical Society of America》2000,107(1):517-527

The binaural system is well-known for its sluggish response to changes in the interaural parameters to which it is sensitive. Theories of binaural unmasking have suggested that detection of signals in noise is mediated by detection of differences in interaural correlation. If these theories are correct, improvements in the intelligibility of speech in favorable binaural conditions is most likely mediated by spectro-temporal variations in interaural correlation of the stimulus which mirror the spectro-temporal amplitude modulations of the speech. However, binaural sluggishness should limit the temporal resolution of the representation of speech recovered by this means. The present study tested this prediction in two ways. First, listeners' masked discrimination thresholds for ascending vs descending pure-tone arpeggios were measured as a function of rate of frequency change in the NoSo and NoSpi binaural configurations. Three-tone arpeggios were presented repeatedly and continuously for 1.6 s, masked by a 1.6-s burst of noise. In a two-interval task, listeners determined the interval in which the arpeggios were ascending. The results showed a binaural advantage of 12-14 dB for NoSpi at 3.3 arpeggios per s (arp/s), which reduced to 3-5 dB at 10.4 arp/s. This outcome confirmed that the discrimination of spectro-temporal patterns in noise is susceptible to the effects of binaural sluggishness. Second, listeners' masked speech-reception thresholds were measured in speech-shaped noise using speech which was 1, 1.5, and 2 times the original articulation rate. The articulation rate was increased using a phase-vocoder technique which increased all the modulation frequencies in the speech without altering its pitch. Speech-reception thresholds were, on average, 5.2 dB lower for the NoSpi than for the NoSo configuration, at the original articulation rate. This binaural masking release was reduced to 2.8 dB when the articulation rate was doubled, but the most notable effect was a 6-8 dB increase in thresholds with articulation rate for both configurations. These results suggest that higher modulation frequencies in masked signals cannot be temporally resolved by the binaural system, but that the useful modulation frequencies in speech are sufficiently low (<5 Hz) that they are invulnerable to the effects of binaural sluggishness, even at elevated articulation rates. 相似文献

19.

The influence of spectral characteristics of early reflections on speech intelligibility

Arweiler I Buchholz JM 《The Journal of the Acoustical Society of America》2011,130(2):996-1005

The auditory system takes advantage of early reflections (ERs) in a room by integrating them with the direct sound (DS) and thereby increasing the effective speech level. In the present paper the benefit from realistic ERs on speech intelligibility in diffuse speech-shaped noise was investigated for normal-hearing and hearing-impaired listeners. Monaural and binaural speech intelligibility tests were performed in a virtual auditory environment where the spectral characteristics of ERs from a simulated room could be preserved. The useful ER energy was derived from the speech intelligibility results and the efficiency of the ERs was determined as the ratio of the useful ER energy to the total ER energy. Even though ER energy contributed to speech intelligibility, DS energy was always more efficient, leading to better speech intelligibility for both groups of listeners. The efficiency loss for the ERs was mainly ascribed to their altered spectrum compared to the DS and to the filtering by the torso, head, and pinna. No binaural processing other than a binaural summation effect could be observed. 相似文献

20.

Spatial unmasking of nearby speech sources in a simulated anechoic environment

Shinn-Cunningham BG Schickler J Kopco N Litovsky R 《The Journal of the Acoustical Society of America》2001,110(2):1118-1129

Spatial unmasking of speech has traditionally been studied with target and masker at the same, relatively large distance. The present study investigated spatial unmasking for configurations in which the simulated sources varied in azimuth and could be either near or far from the head. Target sentences and speech-shaped noise maskers were simulated over headphones using head-related transfer functions derived from a spherical-head model. Speech reception thresholds were measured adaptively, varying target level while keeping the masker level constant at the "better" ear. Results demonstrate that small positional changes can result in very large changes in speech intelligibility when sources are near the listener as a result of large changes in the overall level of the stimuli reaching the ears. In addition, the difference in the target-to-masker ratios at the two ears can be substantially larger for nearby sources than for relatively distant sources. Predictions from an existing model of binaural speech intelligibility are in good agreement with results from all conditions comparable to those that have been tested previously. However, small but important deviations between the measured and predicted results are observed for other spatial configurations, suggesting that current theories do not accurately account for speech intelligibility for some of the novel spatial configurations tested. 相似文献