
1.

朱学文杨道淳王炜牟峰徐柏龄《声学学报》2003,28(1):12-16

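A frame-synchronous hybrid wavelet packet analysis method is first introduced. The method combines the variable time-frequency window of wavelet packets with the frame-based processing of the STFT. It satisfies the frame-length requirement of speech signal processing, so it can run in real time, and it also yields an optimal frequency-domain decomposition of the signal; it is a fast wavelet packet algorithm similar to the FFT. On this basis, the method is used to simulate an auditory model and is applied to speech enhancement. Experiments show that good noise removal and good perceptual quality are obtained even at a low signal-to-noise ratio of -5 dB. The method can also be applied to speech coding, synthesis, and recognition.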

2.

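Feature extraction method for auditory cognitive control from cortical potentials evoked by speech conflict in the auditory channel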

于波李海峰马琳王勋达《声学学报》2016,41(6):870-880

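Cognitive psychology has found that cortical potentials are perturbed when information received through vision or hearing is in conflict, which makes it possible to explore the "stimulus-response" mechanism of cognitive conflict control. Visual cognitive conflict experiments are numerous and have been fruitful, whereas the corresponding auditory experiments are few and have reached differing conclusions. This study uses conflicting and non-conflicting speech stimuli, analyzes the resulting EEG signals, and proposes a time-domain feature model based on three-stage auditory cognitive control. It studies a feature extraction method for single-trial EEG data that follows the regularities of cognitive control when speech cognitive conflict arises in the human auditory channel. According to the cognitive regularities obtained, each single-trial EEG sample is divided into three parts. For each stage, the time-domain mean amplitude and the Lempel-Ziv complexity (LZC) are computed, and the features of the three stages are combined as the feature of the auditory cognitive EEG sample. The results show that: (1) the mixed EEG components related to cognitive conflict that were found first, "N1-P2 & N2 & Late-SW", respectively reflect the three stages of auditory cognitive control; (2) a more complete auditory cognitive control process should include time-domain features of three stages: perception stage, 110-140 ms; identification stage, 260-320 ms; resolution stage, 500-700 ms; (3) a feature extraction method for single-trial auditory cognitive control EEG samples is proposed, and jointly using mean amplitude and LZC gives the best recognition rate (99.33%). The experimental results demonstrate that the proposed method can effectively detect auditory cognitive control EEG data and thereby provides an acoustic method for evaluating human cognitive control ability.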
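The per-stage features described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the sampling rate, the assumption that the epoch starts at stimulus onset, the median-threshold binarization, the LZ76 phrase-counting variant of Lempel-Ziv complexity, and the names `lz76_phrase_count` and `stage_features` are all assumptions. Only the three time windows and the pairing of mean amplitude with LZC come from the abstract.

```python
from statistics import mean, median

# Stage windows in milliseconds, as listed in the abstract.
STAGES_MS = {
    "perception": (110, 140),
    "identification": (260, 320),
    "resolution": (500, 700),
}


def lz76_phrase_count(bits):
    """Count phrases in the LZ76 parsing of a symbol string (assumed LZC variant)."""
    i, count, n = 0, 0, len(bits)
    while i < n:
        length = 1
        # Grow the phrase while it already occurs in the text preceding its last symbol.
        while i + length <= n and bits[i:i + length] in bits[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count


def stage_features(epoch, fs_hz):
    """Return [mean amplitude, LZC] per stage for one epoch (hypothetical helper).

    `epoch` is a list of samples assumed to start at stimulus onset;
    `fs_hz` is an integer sampling rate in Hz.
    """
    features = []
    for start_ms, end_ms in STAGES_MS.values():
        segment = epoch[start_ms * fs_hz // 1000:end_ms * fs_hz // 1000]
        threshold = median(segment)  # assumption: binarize around the segment median
        bits = "".join("1" if v > threshold else "0" for v in segment)
        features += [mean(segment), lz76_phrase_count(bits)]
    return features
```

Calling `stage_features(epoch, 1000)` on a one-second epoch sampled at 1 kHz yields a six-element vector (two features for each of the three stages) that could then be passed to a classifier of the reader's choice.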

3.

An asymmetric compression speech enhancement method for improving whispered speech intelligibility

周健郑文明王青云赵力《声学学报》2014,39(4):501-508

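Two whispered speech enhancement algorithms based on asymmetric cost functions are proposed, which treat amplification distortion and attenuation (compression) distortion in speech enhancement differently. The Modified Itakura-Saito (MIS) algorithm penalizes amplification distortion more heavily, whereas the Kullback-Leibler (KL) algorithm penalizes attenuation distortion more heavily. Experimental results show that at low signal-to-noise ratios below -6 dB, whispered speech enhanced by the MIS algorithm is significantly more intelligible than with conventional algorithms, while the KL algorithm achieves an intelligibility improvement similar to that of the minimum mean square error speech enhancement algorithm. This confirms that amplification distortion and attenuation distortion affect whispered speech intelligibility differently: at low SNR, larger attenuation distortion helps improve intelligibility, whereas at high SNR attenuation distortion has little effect on it.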

4.

A speech distortion measure based on auditory characteristics  Cited by: 3 (self-citations: 0; cited by others: 3)

陈国胡修林张蕴玉朱耀庭《声学学报》2000,25(5):463-467

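A speech distortion measure based on auditory characteristics is proposed: the Perceptual Spectrum Distortion (PSD) measure. By simulating the characteristics of human hearing, the measure converts the short-time speech spectrum into a perceptual spectrum consistent with those characteristics, and then uses the perceptual spectrum to quantify the degree of speech distortion. Simulation experiments on speech of different quality and comparison experiments with the Itakura measure show that the PSD measure agrees well with subjective evaluation of speech quality.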

5.

Speech enhancement method for microphone arrays based on beam-ratio decision

曹占中纳跃跃王晓飞付强潘接林颜永红《声学学报》2017,42(4):504-512

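To address the difficulty a single beamformer has in deeply suppressing spatially coherent interference, a speech enhancement method is proposed that combines a minimum variance distortionless response (MVDR) beamformer with a symmetric-subarray delay-and-sum beamformer. A beam output ratio factor is defined. Based on how the magnitude of this factor changes between the target-sound region and the interference region, a method for adjusting the diagonal loading level of the sample covariance matrix is given, and the factor is further used in the post-filtering stage to perform decision filtering on spatial interference. Real-time update methods for the upper and lower thresholds used in decision filtering are described. The proposed algorithm further suppresses spatial interference and noise and can meet real-time requirements. Experiments on a circular microphone array show that the method outperforms the classical diagonal loading algorithm and the sample covariance matrix scanning-reconstruction algorithm in both output signal-to-interference-plus-noise ratio and speech quality.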
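As background for the abstract above, the textbook MVDR weight computation with diagonal loading, w = (R + δI)⁻¹d / (dᴴ(R + δI)⁻¹d), can be sketched in a few lines. This is only the standard building block, not the paper's method: the beam output ratio factor and the rule for adapting the loading level are not given in the abstract, so `loading` is a plain parameter here, and the helper names `solve` and `mvdr_weights` are hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b for a small complex matrix (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Swap in the row with the largest pivot magnitude for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def mvdr_weights(cov, steering, loading):
    """MVDR weights for covariance `cov`, steering vector `steering`, loading δ."""
    n = len(steering)
    # Diagonal loading: add δ to every diagonal entry of the sample covariance.
    loaded = [[cov[i][j] + (loading if i == j else 0.0) for j in range(n)]
              for i in range(n)]
    r_inv_d = solve(loaded, steering)
    # Normalize so that the response toward the steering direction is exactly 1.
    denom = sum(d.conjugate() * x for d, x in zip(steering, r_inv_d))
    return [x / denom for x in r_inv_d]
```

The beamformer output for a snapshot `x` is then `sum(w.conjugate() * xi for w, xi in zip(weights, x))`. A larger `loading` makes the weights more robust to covariance estimation errors at the cost of shallower nulls, which is the trade-off an adaptive loading rule would be tuning.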

6.

A speech enhancement method based on Kalman filtering

沈亚强程仲文《声学学报》1994,19(3):227-233

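This paper applies Kalman filtering, a filtering method for dynamic systems, to enhance speech signals corrupted by additive noise. The additive noise considered includes wideband white noise and colored noise, and the noisy speech has a low signal-to-noise ratio with no reference noise source available. After enhancement, the signal-to-noise ratio of the speech improves by about 7-10 dB.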
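The predict/update cycle that a Kalman-filter enhancer relies on can be illustrated with a deliberately simplified scalar case. The sketch below assumes a first-order autoregressive signal x[k] = a·x[k-1] + w[k] observed in white noise, with `a`, `q` (process noise variance), and `r` (observation noise variance) known in advance. None of these modeling choices come from the abstract; practical speech enhancers typically use higher-order AR models estimated from the noisy signal, and the function name `kalman_ar1` is hypothetical.

```python
def kalman_ar1(noisy, a, q, r):
    """Filter a noisy AR(1) signal with a scalar Kalman filter (illustrative sketch)."""
    x_est, p = 0.0, 1.0  # initial state estimate and its error variance
    filtered = []
    for y in noisy:
        # Predict: propagate the state and its error variance through the AR model.
        x_pred = a * x_est
        p_pred = a * a * p + q
        # Update: blend prediction and observation according to the Kalman gain.
        gain = p_pred / (p_pred + r)
        x_est = x_pred + gain * (y - x_pred)
        p = (1.0 - gain) * p_pred
        filtered.append(x_est)
    return filtered
```

When the observation noise variance `r` is large relative to `q`, the gain shrinks and the filter leans on the signal model; when `r` is small, it follows the observations closely.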

7.

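Adaptive β-order Bayesian perceptual estimation speech enhancement algorithm based on the auditory frequency-domain masking effect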

王玥李平崔杰《声学学报》2013,38(4):501-508

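To find the best balance between noise suppression and speech distortion, an adaptive β-order Bayesian perceptual estimation speech enhancement algorithm based on the auditory frequency-domain masking effect is proposed, with the aim of improving overall speech enhancement performance. The algorithm exploits the masking effect of human hearing: it adaptively adjusts the β value of the β-order Bayesian perceptual estimation algorithm according to the computed frequency-domain masking threshold, so that noise is suppressed only to below the masking threshold, more speech information is retained, and speech distortion is reduced. The performance of the proposed algorithm was evaluated with both objective and subjective measures and compared with the original SNR-based adaptive β-order Bayesian perceptual estimation algorithm. The results show that for SNRs between -10 dB and 5 dB, the overall objective scores of the frequency-domain-masking method are higher than those of the SNR-based algorithm. Subjective evaluation also shows that the frequency-domain-masking method suppresses background noise well while retaining as much speech information as possible.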

8.

Experimental study of auditory perception of phase in speech  Cited by: 2 (self-citations: 0; cited by others: 2)

陈砚圃卜凡亮戴启军卞正中俞铁城《声学学报》2003,28(1):7-11

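Human hearing is relatively insensitive to phase in speech signals, so phase distortion is often ignored when speech signals are processed and coded. In fact, once phase distortion reaches a certain level it noticeably degrades speech quality. To obtain a high-quality vocoder, the phase information of the speech spectral components cannot be ignored. This paper uses subjective listening tests to study how the short-time Fourier transform phase spectrum of speech affects human auditory perception. The test results show that: (1) if the original phase information is discarded entirely, the reconstructed speech contains strong noise and sounds very unnatural; (2) discarding phase information in either the high-frequency band or the low-frequency band produces a perceptible difference; (3) when the phase quantization step is smaller than π/7, the human auditory system cannot distinguish the reconstructed speech from the original.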
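To make the phase quantization step in result (3) concrete, here is a minimal sketch of uniform phase quantization applied to one frame's complex spectrum. The abstract does not describe the quantizer actually used in the listening tests, so uniform rounding to the nearest multiple of the step is an assumption, as is the function name `quantize_phase`.

```python
import cmath
import math


def quantize_phase(spectrum, step):
    """Round each bin's phase to the nearest multiple of `step`, keeping magnitude."""
    quantized = []
    for c in spectrum:
        phase_q = round(cmath.phase(c) / step) * step
        quantized.append(cmath.rect(abs(c), phase_q))
    return quantized


# Example: a step of pi/8 lies below the pi/7 threshold reported in the abstract.
frame_spectrum = [cmath.rect(1.0, 0.3), cmath.rect(0.5, -2.0), cmath.rect(2.0, 1.1)]
coarse = quantize_phase(frame_spectrum, math.pi / 8)
```

With uniform rounding, the phase error in each bin is at most half the step, so a step of π/7 corresponds to a maximum per-bin phase error of π/14.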

9.

Initial/final segmentation of whispered speech based on an auditory model  Cited by: 5 (self-citations: 0; cited by others: 5)

丁慧栗学丽徐柏龄《应用声学》2004,23(2):20-25,44

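This paper analyzes the characteristics of whispered speech and, drawing on basic theory and experimental data from physiological acoustics and psychoacoustics, proposes a method that uses an auditory model to segment whispered speech into initials and finals. The auditory perception model suited to this segmentation is mainly divided into four levels: the cochlea's mechanism for decomposing sound by frequency; the time-domain and frequency-domain nonlinear transformations of the auditory system; and the lateral inhibition mechanism of the central nervous system. The model reflects how humans perceive low-energy speech in noisy environments and is therefore suitable for whispered speech recognition; it gave satisfactory results in whispered speech initial/final segmentation experiments.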

10.

Initial/final segmentation of Chinese speech based on auditory event detection  Cited by: 2 (self-citations: 0; cited by others: 2)

张宝奇张连海屈丹《声学学报》2010,35(6):701-707

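A method for segmenting Chinese initials and finals based on auditory event detection is proposed. The method first filters the speech with a cochlear filter bank, then detects auditory events corresponding to abrupt energy changes in each frequency band, and finally fuses the auditory events across different frequency ranges to determine the initial/final boundaries. Experimental results show that the segmentation accuracy reaches 88.9% for clean speech sampled at 8 kHz, and more than 82.9% for speech at a signal-to-noise ratio of 10 dB.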

11.

Envelope expansion methods for speech enhancement

P M Clarkson S F Bahgat 《The Journal of the Acoustical Society of America》1991,89(3):1378-1382

The last decade has seen increasing interest in techniques for the enhancement of digital speech signals. Significant gains have been made in terms of signal-to-noise ratio (SNR) and quality, but few techniques have produced improvements in intelligibility. A method for speech enhancement based on nonlinear expansion of the spectral envelope is presented. The expansion is consistent with both the long-term spectrum of the speech and with the probability that speech is present in a given sample. Objective SNR measures are used to compare this algorithm with the well-known spectral subtraction method, with an alternative expansion scheme, and with limiting SNRs resulting from perfect recovery of the amplitude spectrum. For the purpose of intelligibility assessments, a simplified version of the algorithm has been implemented on a Texas Instruments TMS320-C25 system. Listening trials with this real-time system, conducted using a modified rhyme test, have produced small, but consistent, improvements in articulation scores.

12.

Auditory discontinuities interact with categorization: implications for speech perception  Cited by: 2 (self-citations: 0; cited by others: 2)

Holt LL Lotto AJ Diehl RL 《The Journal of the Acoustical Society of America》2004,116(3):1763-1773

Behavioral experiments with infants, adults, and nonhuman animals converge with neurophysiological findings to suggest that there is a discontinuity in auditory processing of stimulus components differing in onset time by about 20 ms. This discontinuity has been implicated as a basis for boundaries between speech categories distinguished by voice onset time (VOT). Here, it is investigated how this discontinuity interacts with the learning of novel perceptual categories. Adult listeners were trained to categorize nonspeech stimuli that mimicked certain temporal properties of VOT stimuli. One group of listeners learned categories with a boundary coincident with the perceptual discontinuity. Another group learned categories defined such that the perceptual discontinuity fell within a category. Listeners in the latter group required significantly more experience to reach criterion categorization performance. Evidence of interactions between the perceptual discontinuity and the learned categories extended to generalization tests as well. It has been hypothesized that languages make use of perceptual discontinuities to promote distinctiveness among sounds within a language inventory. The present data suggest that discontinuities interact with category learning. As such, "learnability" may play a predictive role in selection of language sound inventories.

13.

Keyword-dependent single-channel speech enhancement for customized voice wake-up

刘作桢吴愁黎塔赵庆卫《声学学报》2023,48(2):415-424

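A single-channel speech enhancement method for customized voice wake-up is proposed. The method stores the keyword's phoneme information in a text-encoding matrix in advance and adds an attention-based phoneme bias module to a conventional speech enhancement model. The module uses intermediate features of the enhancement model to retrieve the phoneme information of the current frame from the text-encoding matrix and feeds it into the model's subsequent computation, thereby improving the model's enhancement of keyword-related phonemes. Experiments in different noise environments show that the method suppresses noise in the keyword segments more effectively. Compared with a conventional speech enhancement method and with other text-dependent speech enhancement methods, the proposed method achieves relative improvements in customized wake-up performance of 14.3% and 7.6%, respectively.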

14.

Auditory enhancement of changes in spectral amplitude  Cited by: 1 (self-citations: 0; cited by others: 1)

Q Summerfield A Sidwell T Nelson 《The Journal of the Acoustical Society of America》1987,81(3):700-708

An auditory enhancement effect occurs when one component of a harmonic series is omitted for a few hundred milliseconds and then reintroduced: The reintroduced harmonic stands out perceptually. Three experiments are reported that studied a version of this effect in which several components of a harmonic series are enhanced to define the formants of a vowel. Using the accuracy of vowel identification to measure the prominence of the formant peaks in the effective auditory representation, forms of the effect were identified that are qualitatively similar to the incremental and decremental responses seen in primary auditory-nerve fibers. These results are compatible with an origin for the enhancement effect in peripheral auditory adaptation. However, an additional mechanism is required to account for the demonstration [Viemeister and Bacon, J. Acoust. Soc. Am. 71, 1502-1507 (1982)] that enhancement can involve a true gain in the frequency region of the reintroduced component. These effects demonstrate one way in which the auditory system may attenuate the prominence of background noises while preserving the ability to represent changes in spectral amplitude produced by newly arriving signals.

15.

Robustness analysis of the adaptive null-forming beamforming speech enhancement algorithm

楼厦厦郑成诗李晓东《声学学报》2007,32(5):468-476

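The robustness of the two-microphone adaptive null-forming speech enhancement algorithm to microphone mismatch and self-noise is analyzed. The results show that the higher the signal-to-interference ratio, the less robust the algorithm is to microphone mismatch; when the signal-to-interference ratio is very low, the algorithm is robust to phase mismatch. Amplitude mismatch reduces the algorithm's ability to suppress interference but causes very little distortion of the target signal. Compared with microphone mismatch, self-noise has a smaller effect on performance. Constraining the adaptive filter weights and calibrating the microphones can improve the algorithm's performance.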

16.

An iterative noise cross-PSD estimation for two-microphone speech enhancement  Cited by: 2 (self-citations: 0; cited by others: 2)

Mohsen Rahmani Ahmad Akbari Beghdad Ayad 《Applied Acoustics》2009,70(3):514-521

Among various speech enhancement methods, dual-microphone methods are of great importance because of their low implementation cost and because they exploit the spatial-filtering benefits of microphone arrays. Coherence-based methods are well known as efficient two-microphone noise reduction techniques. These techniques do not work well when the received noise signals are correlated, but they can be improved when the cross power spectral density (CPSD) of the noise is available. In this paper, we propose an iterative approach for estimating the noise CPSD for use in coherence-based methods. We compare the proposed iterative noise CPSD estimation with a noise CPSD estimation technique based on a voice activity detector (VAD), each employed separately in a two-microphone speech enhancement system. Evaluation results show that the two-microphone speech enhancement scheme using the proposed noise CPSD estimation technique outperforms the enhancement system using the VAD-based noise CPSD estimation.

17.

An adaptive speech enhancement method for siren noise cancellation

Hui Ding Xiaojun Qiu Boling Xu 《Applied Acoustics》2004,65(4):385-399

Siren noises usually severely disturb the intelligibility of voice communication inside the cabs of police, paramedic and fire vehicles. It is often desired that such unwanted noise be removed from the speech signal. In this paper, a new method is proposed to adaptively cancel siren noises and enhance speech signals. Based on the characteristics of siren noises, an anti-speech filter and a time delayer are employed in the single and dual channel noise cancellation systems to reduce the siren noises. Experimental results demonstrate the effectiveness of the proposed method for canceling siren noises, and the quality of the enhanced speech signal is satisfactory.

18.

Time-frequency speech presence probability estimation with a sequential hidden Markov model for speech enhancement

许春冬夏日升应冬文李军锋《声学学报》2014,39(5):647-654

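Estimating the speech presence probability is one of the core techniques of speech enhancement. Conventional presence probability estimators are heuristic: they do not place the estimation within a single theoretical framework and cannot guarantee an optimal estimate. To address this, a presence probability estimation method based on a sequential hidden Markov model (SHMM) is proposed. An SHMM is built in each subband to describe the time series of the log-power spectral envelope; the envelope sequence is treated as a dynamic first-order Markov chain that moves between a speech state and a noise state; a single Gaussian is used as the probability model of each state; and the posterior probability of the speech state is taken as the speech presence probability. To meet real-time requirements, SHMM parameter estimation is simplified to a first-order recursive process in which the model parameters are updated frame by frame under the maximum likelihood criterion. Experiments show that the temporal correlation captured by the SHMM plays a key role in presence probability estimation and that the method outperforms common heuristic estimators. In segmental SNR (SegSNR) and log-spectral distortion (LSD), speech enhancement with the SHMM algorithm outperforms the classical improved minima controlled recursive averaging (IMCRA) algorithm.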

19.

Intelligibility enhancement for noisy whispered speech using asymmetric cost function

ZHOU Jian; ZHENG Wenming; WANG Qingyun; ZHAO Li 《声学学报：英文版》2014,(3):312-322

We propose two whispered speech enhancement methods based on asymmetric cost functions that treat the amplification and attenuation distortions of whispered speech differently. The modified Itakura-Saito (MIS) distance function penalizes speech amplification distortion more heavily, whereas the Kullback-Leibler (KL) divergence function penalizes speech attenuation distortion more heavily. The experimental results show that the MIS-based method achieves a significant improvement in intelligibility over conventional speech enhancement algorithms when the signal-to-noise ratio (SNR) falls below -6 dB, whereas the KL-based method achieves a result similar to that of the minimum mean square error (MMSE) speech enhancement method. The results show that amplification and attenuation distortions affect the intelligibility of the enhanced whisper differently: larger attenuation distortion may result in better intelligibility at low SNR, whereas attenuation distortion has little effect on intelligibility at high SNR.

20.

Variable step-size subband backward BSS algorithms for speech quality enhancement

《Applied Acoustics》2014

This paper addresses the problem of speech quality improvement using adaptive filtering algorithms. Recently, in Djendi and Bendoumia (2014) [1], we proposed a new two-channel backward algorithm for noise reduction and speech intelligibility enhancement. The main drawback of the proposed two-channel subband algorithm is its poor performance when the number of subbands is high. This shortcoming is clearly visible in the steady-state values. The source of the problem is the fixed step-sizes of the cross-adaptive filtering algorithms, which distort the speech signal when they are chosen large and degrade the convergence speed when they are chosen small. In this paper, we propose four modifications of this algorithm that improve both the convergence speed and the steady-state values, even in very noisy conditions and with a high number of subbands. To confirm the good performance of the four proposed variable-step-size SBBSS algorithms, we carried out several simulations in various noisy environments. In these simulations, we evaluated objective and subjective criteria, namely the system mismatch, the cepstral distance, the output signal-to-noise ratio, and the mean opinion score (MOS), to compare the four proposed variable step-size versions of the SBBSS algorithm with their original versions and with the two-channel fullband backward (2CFB) least mean square algorithm.