Found 17 similar articles (search time: 68 ms)
1.
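《声学学报:英文版》 (Acta Acustica, English edition), 2021, (1)
An improved speech enhancement method based on sparse representation of spectral features is proposed from three perspectives: dual-path dictionary learning, noise power spectrum estimation, and speech magnitude spectrum reconstruction. In the dictionary learning stage, power spectrum and magnitude spectrum features are fused, and discriminative dictionaries are used to reduce the coherence between the speech dictionary and the noise dictionary. In the enhancement stage, a noise power spectrum estimation method is proposed to track and estimate non-stationary noise. Because magnitude spectrum and power spectrum features suit different noises to different degrees, a table of speech reconstruction weights is designed. The two signals recovered from the magnitude spectrum and from the power spectrum are adaptively weighted and recombined, and a phase compensation function is applied to obtain the enhanced speech signal. Experimental results show that, in stationary and non-stationary noise environments, the method achieves an average improvement of 31.6% over speech enhancement methods that use a single spectral feature, improving speech enhancement performance.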
2.
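Recovering clean speech from noisy speech has long been a central problem in signal processing. In recent years, researchers have proposed a number of single-channel speech enhancement algorithms based on dictionary learning and sparse representation. These algorithms exploit the sparsity of speech in the time-frequency domain: they learn the structural features and regularities of the training samples to build corresponding dictionaries, then project the noisy speech onto them to estimate the clean speech. For cases where the training samples and the test data are mismatched, supervised non-negative matrix factorization methods are combined with traditional speech enhancement methods based on statistical models, updating the speech and noise dictionaries during the enhancement stage to estimate the clean speech. This paper first introduces the signal model for single-channel speech enhancement, then describes four typical enhancement methods, and finally discusses possible future research directions.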
3.
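A maximum-likelihood method for estimating the noise log-power spectrum is proposed. A Gaussian mixture model (GMM) is used to build a statistical model of the power spectrum envelope in each frequency band, dividing the temporal envelope into speech and non-speech classes. These classes correspond to the two Gaussian components of the GMM, which describe the statistical distributions of speech and non-speech; the mean of the non-speech component is the optimal estimate of the noise power spectrum. Using sequential learning, the model parameters are updated frame by frame under the maximum-likelihood criterion, and the optimal estimate of the noise power spectrum is produced frame by frame. In addition, because a long absence of speech during sequential updating can easily destabilize the model, an online minimum description length (MDL) criterion is proposed to determine whether speech has been absent for a long time, which keeps the model stable. Experiments show that the algorithm's overall performance is better than that of the classic MS and IMCRA algorithms.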
4.
5.
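Speech compressed sensing with a Karhunen-Loeve incoherent dictionary and segment matching pursuit    Total citations: 1, self-citations: 0, citations by others: 1
Compressed sensing (CS) theory breaks through the theoretical limit of the classical sampling theorem and offers another route to signal compression. Within the CS framework, two contributions are made: to improve how well the speech dictionary matches the signal, an incoherent speech dictionary based on the K-L expansion is designed; and to address shortcomings of existing matching pursuit algorithms (MP, OMP), a segment matching pursuit (Segment MP, SegMP) algorithm is proposed. The speech autocorrelation function is first modeled and the model parameters are estimated in order to construct a speech-adaptive incoherent dictionary. SegMP is then used to take segmented measurements of the sparse speech vector, yielding several low-dimensional vectors. Finally, the dictionary is rebuilt from the model parameters and the signal is reconstructed, achieving compressed sensing of speech. Speech test results show that, compared with existing schemes, the proposed scheme represents the signal sparsely with greater accuracy, gives better reconstruction quality, and reduces computational complexity.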
6.
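Minima-controlled recursive averaging speech enhancement algorithm based on a bidirectional search method    Total citations: 4, self-citations: 0, citations by others: 4
Better speech enhancement depends on accurately estimating the noise and on promptly tracking and updating changes in it. To improve the ability to estimate and update non-stationary noise, this paper is based on …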
7.
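This paper presents a method based on power spectral subtraction for enhancing speech corrupted by white noise. Over-subtraction of the power spectrum is an effective speech enhancement method, and the tonal noise it leaves after processing can be suppressed well by center clipping.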
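As a rough illustration of the two operations named in the abstract above, the following sketch applies per-bin power spectral over-subtraction and a simple center clipper. The over-subtraction factor `alpha`, the spectral floor `beta`, the clipping threshold, and the choice to clip time-domain samples are all illustrative assumptions; the abstract does not give the paper's actual parameters or where in the processing chain clipping is applied.

```python
def over_subtract(noisy_power, noise_power, alpha=2.0, beta=0.01):
    """Per-bin power spectral subtraction.

    alpha (over-subtraction factor) and beta (spectral floor) are
    illustrative values, not taken from the paper.
    """
    cleaned = []
    for p_y, p_n in zip(noisy_power, noise_power):
        p_s = p_y - alpha * p_n      # subtract an inflated noise estimate
        floor = beta * p_n           # small floor instead of negative power
        cleaned.append(max(p_s, floor))
    return cleaned


def center_clip(samples, threshold):
    """Zero samples with magnitude at or below threshold; shift the rest toward zero."""
    out = []
    for x in samples:
        if x > threshold:
            out.append(x - threshold)
        elif x < -threshold:
            out.append(x + threshold)
        else:
            out.append(0.0)
    return out


# Toy power spectra for one frame (three frequency bins).
noisy = [10.0, 4.0, 1.0]
noise = [1.0, 1.0, 1.0]
enhanced = over_subtract(noisy, noise)

# Toy residual signal passed through the center clipper.
clipped = center_clip([0.5, -0.05, 0.2, -0.6], threshold=0.1)
```

In the third bin the over-subtracted value would be negative, so the floor is used instead; flooring is one common way to limit the isolated spectral peaks that are heard as tonal residual noise.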
8.
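A noise-robust voice conversion model that combines mel-spectrogram enhancement with feature disentanglement, called MENR-VC, is proposed. The model uses three encoders to extract speech content, fundamental frequency, and speaker identity vector features, and introduces mutual information as a measure of correlation; minimizing the mutual information disentangles the vector features and enables conversion of speaker identity. To improve the spectral quality of noisy speech, the model enhances the noisy mel-spectrogram with a deep complex convolution recurrent network and uses the result as the input to the speaker encoder. In addition, a mel-spectrogram enhancement loss is introduced during training to improve the model's overall loss function. Simulation results show that, compared with the best comparable noise-robust voice conversion method, the converted speech scores 0.12 and 0.07 higher in mean opinion score for naturalness and speaker similarity, respectively. This addresses the problem that training a voice conversion model on noisy speech makes the deep neural network hard to converge and sharply degrades the quality of the converted speech.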
9.
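Two whispered-speech enhancement algorithms based on asymmetric cost functions are proposed; they treat amplification distortion and compression (attenuation) distortion in the enhancement process differently. The Modified Itakura-Saito (MIS) algorithm penalizes amplification distortion more heavily, whereas the Kullback-Leibler (KL) algorithm penalizes compression distortion more heavily. Experimental results show that at low signal-to-noise ratios (SNRs) below -6 dB, the intelligibility of whispered speech enhanced by the MIS algorithm is significantly higher than with traditional algorithms, while the KL algorithm yields an intelligibility gain similar to that of the minimum mean-square error speech enhancement algorithm. This confirms that amplification distortion and compression distortion affect whispered-speech intelligibility differently: at low SNR, larger compression distortion helps intelligibility, whereas at high SNR compression distortion has little effect on it.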
10.
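To improve the speech enhancement performance and the computation speed of the traditional orthogonal matching pursuit (OMP) algorithm, this study proposes a speech enhancement algorithm based on an improved OMP algorithm, grounded in sparse coding theory. First, the K-singular value decomposition (K-SVD) algorithm is combined with OMP, and an energy threshold is set to improve OMP's enhancement performance. Second, the computation of the sparse signal approximation in traditional OMP is modified to increase speed. The improved-OMP speech enhancement algorithm is compared with the traditional K-SVD speech enhancement algorithm, using PESQ to evaluate the quality of the enhanced speech and NCM to evaluate its intelligibility. With NCM essentially unchanged, PESQ improves by about 12.47% on average, giving better enhancement. The improved OMP algorithm runs nearly twice as fast as the traditional OMP algorithm.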
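For readers unfamiliar with the baseline, the following is a minimal sketch of textbook orthogonal matching pursuit, not the improved variant the abstract describes. The residual-energy stopping tolerance `energy_tol`, the toy dictionary, and the helper names `dot` and `solve` are illustrative assumptions; atoms are assumed to have unit norm.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(A, b):
    """Solve a small square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def omp(atoms, signal, max_atoms, energy_tol=1e-12):
    """Greedy sparse coding; returns {atom index: coefficient}."""
    residual = signal[:]
    support, coeffs = [], []
    limit = min(max_atoms, len(atoms))
    while len(support) < limit and dot(residual, residual) > energy_tol:
        # Pick the unused atom most correlated with the current residual.
        best = max((k for k in range(len(atoms)) if k not in support),
                   key=lambda k: abs(dot(atoms[k], residual)))
        support.append(best)
        # Least-squares fit on all selected atoms (normal equations).
        gram = [[dot(atoms[i], atoms[j]) for j in support] for i in support]
        rhs = [dot(atoms[i], signal) for i in support]
        coeffs = solve(gram, rhs)
        approx = [sum(c * atoms[k][n] for c, k in zip(coeffs, support))
                  for n in range(len(signal))]
        residual = [s - a for s, a in zip(signal, approx)]
    return dict(zip(support, coeffs))


# Toy dictionary of four unit-norm atoms in three dimensions.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.6, 0.8, 0.0]]
signal = [2.0, 0.0, 3.0]          # equals 2 * atoms[0] + 3 * atoms[2]
code = omp(atoms, signal, max_atoms=3)
```

The re-fit over the whole support at each step is what distinguishes OMP from plain matching pursuit; it is also the step that dominates the cost, which is presumably why the abstract targets the approximation computation for speed-ups.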
11.
In this contribution, a novel dual-channel speech enhancement technique is introduced. The proposed approach uses the dissimilarity between the power of the received signals in the two channels as a criterion for speech enhancement and noise reduction. We claim that in near-field conditions, where the distances between the microphones and the sound source are short, the difference in the received power levels at the two microphones is an estimate of the clean speech signal power. We then apply this theory to present an optimum method for speech enhancement. The method is able to cope with transient noise and nearby microphones, which are two of the main problems of previously proposed dual-microphone speech enhancement techniques. Using objective speech quality measures and spectrogram analysis, we show that the proposed method results in improved speech quality.
12.
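Current supervised speech enhancement overlooks how the magnitude-spectrum similarity among clean speech, noise, and noisy speech affects enhancement. To address this, a speech enhancement method that combines an accurate ratio mask (ARM) with a deep neural network (DNN) is proposed. The method uses the normalized cross-correlation coefficients of the magnitude spectra of clean speech and noisy speech, and of noise and noisy speech, to design an accurate ratio mask, based on the time-frequency ideal ratio mask, as the target mask. A DNN whose training targets are the clean speech and noise magnitude spectra is then taken as the baseline; its outputs are used to estimate the target mask, and the baseline DNN and the target mask are jointly optimized, with the enhanced speech estimated from the noisy speech by the target mask. In addition, to take account of the discriminative information between clean speech and noise, a discriminative training function replaces the mean squared error (MSE) function as the objective of the baseline DNN so that the network output is more accurate. Experiments show that the discriminative training function improves the enhancement performance of both the baseline DNN and the whole jointly optimized network. Under matched and mismatched noise, the proposed method achieves higher average perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores than other common DNN methods; the enhanced speech retains more speech components, and the noise is suppressed more noticeably.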
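The accurate ratio mask in the abstract above is described as building on the time-frequency ideal ratio mask (IRM). The sketch below shows only a commonly used square-root form of the IRM, computed from known speech and noise power per time-frequency bin; the correlation-based weighting that defines the ARM is not given in the abstract and is not reproduced here. The function names and toy values are illustrative assumptions.

```python
import math


def ideal_ratio_mask(speech_power, noise_power):
    """Square-root IRM per bin: sqrt(S / (S + N)), with 0 where both are zero."""
    mask = []
    for p_s, p_n in zip(speech_power, noise_power):
        total = p_s + p_n
        mask.append(math.sqrt(p_s / total) if total > 0 else 0.0)
    return mask


def apply_mask(noisy_magnitude, mask):
    """Scale each noisy magnitude bin by its mask value."""
    return [m * y for m, y in zip(mask, noisy_magnitude)]


# Toy power values for three time-frequency bins.
speech = [9.0, 1.0, 0.0]
noise = [0.0, 3.0, 4.0]
mask = ideal_ratio_mask(speech, noise)

# Toy noisy magnitudes for the same three bins.
estimate = apply_mask([3.0, 2.0, 2.0], mask)
```

A mask value near 1 keeps a bin dominated by speech, and a value near 0 suppresses a bin dominated by noise. In a supervised system the mask is not computed from known speech and noise as here but predicted by the network from the noisy input.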
13.
In this paper, a novel single microphone channel-based speech enhancement technique is presented. While most of the conventional nonnegative matrix factorization-based approaches focus on generating a basis matrix of speech and noise for enhancement, the proposed algorithm performs an additional process to reconstruct speech from noisy speech when these two elements are highly overlapped in selected spectral bands. This process involves a log-spectral amplitude based estimator, which provides the spectrotemporal speech presence probability to obtain a more accurate reconstruction. Moreover, the proposed algorithm applies an unsupervised learning method to the input noise, so it is adaptable to any type of environmental noise without a pre-trained dictionary. The experimental results demonstrate that the proposed algorithm obtains improved speech enhancement performance compared with conventional single channel-based approaches.
14.
Avid Avokh 《Applied Acoustics》2010,71(3):262-268
This paper aims to extend previous work on constant directivity beam-formers (CDBs) to the case of multiple desired speech sources by designing a linearly constrained adaptive CDB (LCA-CDB) that preserves the beam-pattern in multiple look directions. The proposed LCA-CDB also adaptively minimizes the transient noise power at the output of the beam-former and produces controlled nulls (controlled in both amplitude and angle) in the beam-pattern. This strengthens the system in removing permanent directional noises and in producing a frequency-invariant beam-pattern with multiple main-lobes and controlled nulls in arbitrary frequency bands. By simulating the system and the acoustical situations, the authors demonstrate the capability of the proposed method to enhance broadband and telephony speech in the presence of various noise sources (transient noise, permanent noise and uncorrelated white Gaussian noise). The simulation results obtained in this study confirm the efficiency of the proposed method in suppressing environmental noises.
15.
Siren noise often severely degrades the intelligibility of voice communication inside the cabs of police, paramedic and fire vehicles, and it is desirable to remove such unwanted noise from the speech signal. In this paper, a new method is proposed to adaptively cancel siren noise and enhance speech signals. Based on the characteristics of siren noise, an anti-speech filter and a time delayer are employed in single-channel and dual-channel noise cancellation systems to reduce the siren noise. Experimental results demonstrate the effectiveness of the proposed method in canceling siren noise, and the quality of the enhanced speech signal is satisfactory.
16.
A robust power spectrum split cancellation-based spectrum sensing method for cognitive radio systems
Spectrum sensing is an essential component of cognitive radio, and the requirement for real-time spectrum sensing without prior information, under fading channels and noise uncertainty, poses a major challenge to classical spectrum sensing algorithms. Based on the stochastic properties of a scalar transformation of the power spectral density (PSD), a novel spectrum sensing algorithm, referred to as the power spectral density split cancellation method (PSC), is proposed in this paper. The PSC uses a scalar value as its test statistic: the ratio of each subband power to the full band power. By exploiting the asymptotic normality and independence of the Fourier transform, the distribution of the ratio and the mathematical expressions for the probabilities of false alarm and detection in different channel models are derived. Further, the exact closed-form expression of the decision threshold is calculated in accordance with the Neyman–Pearson criterion. Analytical and simulation results show that the PSC is invulnerable to noise uncertainty and can achieve excellent detection performance without prior knowledge in additive white Gaussian noise and flat slow fading channels. In addition, the PSC benefits from a low computational cost and can be completed in microseconds.
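The test statistic described above, the ratio of each subband's power to the full-band power, can be sketched as follows. This is only an illustration of the ratio itself: the periodogram estimator, the equal-width subband split, and the function names are assumptions, and the paper's decision threshold is not reproduced because the abstract does not state it. The input is assumed to have nonzero total power.

```python
import cmath
import math


def periodogram(x):
    """Naive DFT-based periodogram, |X[k]|^2 / N, for a short real sequence."""
    n = len(x)
    spectrum = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def subband_power_ratios(x, num_subbands):
    """Ratio of each equal-width subband's power to the power of all subbands."""
    psd = periodogram(x)
    width = len(psd) // num_subbands
    total = sum(psd[: width * num_subbands])
    return [sum(psd[i * width:(i + 1) * width]) / total
            for i in range(num_subbands)]


# A 16-sample tone at DFT bin 2; its mirror image falls in bin 14.
tone = [math.cos(2 * math.pi * 2 * t / 16) for t in range(16)]
ratios = subband_power_ratios(tone, 4)

# Scaling the input leaves the ratios unchanged.
ratios_scaled = subband_power_ratios([10.0 * v for v in tone], 4)
```

Because numerator and denominator scale together, the ratios do not depend on the absolute signal or noise level, which is consistent with the abstract's claim of robustness to noise uncertainty.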
17.
In speech communication, speech signals are unavoidably corrupted by noisy environments such as subways, factories, and restaurants, or by speech from other speakers. Speech enhancement methods have been widely studied to minimize the influence of noise in different linear transform domains, such as the discrete Fourier transform domain, the Karhunen-Loeve transform domain, or the discrete cosine transform domain. Kernel methods, as nonlinear transforms, have received a lot of interest recently and are commonly used in many applications, including audio signal processing. However, this kind of method typically suffers from high computational complexity. In this paper, we propose a speech enhancement algorithm using low-rank approximation in a reproducing kernel Hilbert space to reduce storage space and running time with very little performance loss in the enhanced speech. We also analyze the root mean squared error bound between the enhanced vectors obtained with the approximate kernel matrix and with the full kernel matrix. Simulations show that the proposed method improves the computation speed of the algorithm while achieving performance close to that of the full kernel matrix.