1.

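Single-channel speech enhancement using gender-dependent deep neural network and non-negative matrix factorization models  Total citations: 3, self-citations: 0, citations by others: 3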

李煦 王子腾 王晓飞 付强 颜永红 《声学学报》2019,44(2):221-230

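To recover clean speech from a noisy signal, a single-channel speech enhancement algorithm using gender-dependent models is proposed. In the training stage, separate gender-dependent deep neural network and non-negative matrix factorization (DNN-NMF) models are trained to estimate the NMF weight parameters. In the test stage, an algorithm based on NMF with a group-sparsity penalty determines the gender of the speaker in the test utterance; the corresponding model then estimates the weights, which are combined with the pre-trained dictionary to enhance the speech. Experiments show that the proposed algorithm outperforms several NMF-based and DNN-based algorithms in both noise suppression and speech quality.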
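
The enhancement step described above (weights combined with a pre-trained dictionary) can be sketched roughly as follows. This is only an illustration under stated assumptions: the weights are estimated here with standard multiplicative NMF updates rather than the gender-dependent DNN of the paper, and the function names `estimate_weights` and `enhance`, the Euclidean cost, and the Wiener-style mask are all choices made for this sketch, not details taken from the abstract.

```python
import numpy as np

def estimate_weights(V, W, n_iter=100, eps=1e-12):
    """Estimate non-negative weights H so that V is close to W @ H, with W fixed.

    Assumption: plain multiplicative updates stand in for the DNN-based
    weight estimator described in the abstract.
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        # Multiplicative update for the Euclidean cost; H stays non-negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def enhance(V, W_speech, W_noise, n_iter=100, eps=1e-12):
    """Return an enhanced magnitude spectrogram from a noisy one (V)."""
    W = np.hstack([W_speech, W_noise])      # concatenated pre-trained dictionaries
    H = estimate_weights(V, W, n_iter, eps)
    k = W_speech.shape[1]
    S = W_speech @ H[:k]                    # speech estimate
    N = W_noise @ H[k:]                     # noise estimate
    mask = S / (S + N + eps)                # Wiener-style soft mask in [0, 1]
    return mask * V
```

Because the mask lies between 0 and 1, the output never exceeds the noisy magnitude in any time-frequency bin; the quality of the result depends entirely on how well the two dictionaries separate speech from noise.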

2.

Unsupervised real-time single-channel speech enhancement algorithm based on a sparse and low-rank noise model

《声学学报：英文版》2015,(4)

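Existing dictionary-learning-based enhancement algorithms require prior information and are difficult to run in real time. To address this, an unsupervised single-channel speech enhancement algorithm suited to real-time processing is proposed. First, the algorithm casts unsupervised background-noise modeling as a sparse, low-rank noise decomposition of the noisy speech magnitude spectrum. Next, an incremental non-negative subspace method performs online dictionary learning on the background noise, yielding an adaptive noise dictionary that captures the time-varying characteristics of the noise. Finally, the noisy speech is processed with this noise dictionary in a frame-by-frame iterative manner that is easy to run in real time. Experiments show that, compared with multi-band spectral subtraction and an enhancement algorithm based on low-rank and sparse matrix decomposition, the proposed algorithm performs especially well in noise suppression and achieves better results on several performance metrics.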

3.

Unsupervised real-time single-channel speech enhancement algorithm based on a sparse and low-rank noise model

李轶南 张雄伟 贾冲 陈亮 曾理 《声学学报》2015,40(4):607-614

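Existing dictionary-learning-based enhancement algorithms require prior information and are difficult to run in real time. To address this, an unsupervised single-channel speech enhancement algorithm suited to real-time processing is proposed. First, the algorithm casts unsupervised background-noise modeling as a sparse, low-rank noise decomposition of the noisy speech magnitude spectrum. Next, an incremental non-negative subspace method performs online dictionary learning on the background noise, yielding an adaptive noise dictionary that captures the time-varying characteristics of the noise. Finally, the noisy speech is processed with this noise dictionary in a frame-by-frame iterative manner that is easy to run in real time. Experiments show that, compared with multi-band spectral subtraction and an enhancement algorithm based on low-rank and sparse matrix decomposition, the proposed algorithm performs especially well in noise suppression and achieves better results on several performance metrics.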

4.

Single-channel speech enhancement algorithm based on time-frequency dictionary learning

黄建军 张雄伟 张亚非 邹霞 《声学学报》2012,37(5):539-547

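Earlier speech enhancement algorithms degrade sharply in non-stationary noise. To address this, a new single-channel speech enhancement algorithm based on time-frequency dictionary learning is proposed. First, time-frequency dictionary learning is used to model prior information about the spectral structure of the noise, and this model is incorporated into a convolutive non-negative matrix factorization framework. Next, with the noise time-frequency dictionary held fixed, multiplicative iterative update rules are derived for the time-varying gains and the speech time-frequency dictionary. Finally, these update rules are used to update the time-varying gain coefficients of the speech and noise and the speech time-frequency dictionary; the speech magnitude spectrum is reconstructed by convolving the speech time-frequency dictionary with the time-varying gains, and binary time-frequency masking removes the remaining noise interference. Experiments show that the algorithm achieves better results on several speech quality metrics. In non-stationary noise and at low signal-to-noise ratios, it removes noise more effectively than multi-band spectral subtraction and non-negative sparse coding denoising, and the enhanced speech has better quality.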

5.

Single-channel speech enhancement method using convolutive non-negative matrix factorization with an L1/2 sparsity constraint

《声学学报：英文版》2017,(3)

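To capture the inter-frame correlation of speech signals and to represent speech features with fewer speech bases, a convolutive non-negative matrix factorization (NMF) method with an L1/2 sparsity constraint is proposed for single-channel speech enhancement. First, noise bases are learned from noise. Next, with the noise bases as prior information, the L1/2-sparsity-constrained convolutive NMF learns the speech basis components in the noisy speech. Finally, the clean speech signal is reconstructed from the learned speech bases and coefficients. Experiments in different noise environments show that the method outperforms convolutive NMF with an L1 sparsity constraint as well as traditional statistical speech enhancement methods.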

6.

Single-channel speech enhancement method using convolutive non-negative matrix factorization with an L1/2 sparsity constraint

路成 田猛 周健 王华彬 陶亮 《声学学报》2017,42(3):377-384

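To capture the inter-frame correlation of speech signals and to represent speech features with fewer speech bases, a convolutive non-negative matrix factorization (NMF) method with an L1/2 sparsity constraint is proposed for single-channel speech enhancement. First, noise bases are learned from noise. Next, with the noise bases as prior information, the L1/2-sparsity-constrained convolutive NMF learns the speech basis components in the noisy speech. Finally, the clean speech signal is reconstructed from the learned speech bases and coefficients. Experiments in different noise environments show that the method outperforms convolutive NMF with an L1 sparsity constraint as well as traditional statistical speech enhancement methods.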
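
To make the effect of an L1/2 sparsity penalty concrete, the sketch below applies it to the coefficient matrix of an ordinary (non-convolutive) NMF with a fixed basis. This is a simplification and not the method of the paper: the convolutive structure and the joint learning of speech bases are omitted, and the cost function, the name `l_half_sparse_weights`, and the value of `lam` are assumptions made for illustration. The assumed cost is 0.5 * ||V - WH||^2 + lam * sum(sqrt(H)).

```python
import numpy as np

def l_half_sparse_weights(V, W, lam=0.1, n_iter=200, eps=1e-9):
    """Multiplicative updates for H under an L1/2 sparsity penalty, W fixed.

    Assumed cost: 0.5 * ||V - W @ H||_F^2 + lam * sum(sqrt(H)).
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + 0.1
    for _ in range(n_iter):
        # The gradient of lam * sum(sqrt(H)) is (lam / 2) / sqrt(H); it enters
        # the denominator, so small coefficients are pushed harder toward zero.
        penalty = 0.5 * lam / np.sqrt(H + eps)
        H *= (W.T @ V) / (W.T @ W @ H + penalty + eps)
    return H
```

Setting `lam=0` reduces the update to the standard Euclidean NMF rule, which gives a convenient baseline for comparing how sparse the coefficients become.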

7.

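Voice separation algorithm using sparse non-negative matrix factorization and a deep attractor network in noisy conditions  Total citations: 1, self-citations: 1, citations by others: 0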

葛宛营 张天骐 范聪聪 张天 《声学学报》2021,46(1):55-66

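To separate voices in the presence of noise, a single-channel voice separation algorithm using sparse non-negative matrix factorization and a deep attractor network is proposed. First, dictionary matrices for voice and noise are obtained by training and used as prior information to separate the voice and noise coefficient matrices from the noisy mixture. Next, because different source components in the voice coefficient matrix differ in similarity within the embedding space, a deep attractor network separates that matrix into coefficient matrices for the individual source voices. Finally, the clean separated speech is reconstructed from each separated coefficient matrix and the voice dictionary matrix. Experiments under different noise conditions show that the algorithm suppresses background noise while improving the overall quality of the separated speech, and that it outperforms baseline algorithms that combine speech-noise separation and voice separation models.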

8.

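Speech enhancement method combining magnitude-spectrum and power-spectrum dictionaries  Total citations: 1, self-citations: 0, citations by others: 1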

聂玲子 陈雪勤 赵鹤鸣 《声学学报》2021,46(1):81-91

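An improved speech enhancement method based on sparse representation of spectral features is proposed, addressing dual-path dictionary learning, noise power spectrum estimation, and speech magnitude spectrum reconstruction. In the dictionary learning stage, power-spectrum and magnitude-spectrum features are fused, and discriminative dictionaries are used to reduce the coherence between the speech and noise dictionaries. In the enhancement stage, a noise power spectrum estimation method is proposed to track non-stationary noise. Because magnitude-spectrum and power-spectrum features suit different noises to different degrees, a table of speech reconstruction weights is designed. The two signals recovered from the magnitude spectrum and the power spectrum are reconstructed with adaptive weighting and combined with a phase compensation function to obtain the enhanced speech signal. Experiments show that, in stationary and non-stationary noise, the method achieves an average improvement of 31.6% over speech enhancement methods that use a single spectral feature.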

9.

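Speech compressed sensing with segment matching pursuit and a Karhunen-Loeve incoherent dictionary  Total citations: 1, self-citations: 0, citations by others: 1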

曾理 张雄伟 陈亮 杨吉斌 黄建军 《声学学报》2013,38(4):493-500

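Compressed sensing (CS) theory goes beyond the limits of the classical sampling theorem and offers another route to signal compression. Two contributions are made within the CS framework: to improve how well the speech dictionary matches the signal, an incoherent speech dictionary based on the Karhunen-Loeve (K-L) expansion is designed; and to address shortcomings of existing matching pursuit algorithms (MP, OMP), a segment matching pursuit (Segment MP, SegMP) algorithm is proposed. First, the speech autocorrelation function is modeled and the model parameters are estimated to construct an adaptive incoherent speech dictionary. Next, SegMP takes segment-wise measurements of the sparse speech vector, yielding several low-dimensional vectors. Finally, the dictionary is rebuilt from the model parameters and the signal is reconstructed, achieving compressed sensing of speech. Speech tests show that, compared with existing schemes, the proposed scheme gives a more accurate sparse representation of the signal, better reconstruction quality, and lower computational complexity.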
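
For orientation, the baseline that SegMP is compared against can be sketched as standard orthogonal matching pursuit (OMP). The code below is the textbook algorithm, not the SegMP variant proposed in the paper; the function name `omp` and the random dictionary in the usage note are assumptions made for illustration.

```python
import numpy as np

def omp(D, y, k):
    """Orthogonal matching pursuit: approximate y with at most k columns of D.

    Assumes k >= 1 and that the columns of D are normalized.
    """
    y = np.asarray(y, dtype=float)
    residual = y.copy()
    support = []
    for _ in range(k):
        # Pick the atom most correlated with the current residual.
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        # Re-fit all selected atoms jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x
```

A typical call would build a dictionary with unit-norm columns, for example `D = rng.standard_normal((32, 64)); D /= np.linalg.norm(D, axis=0)`, and then run `omp(D, y, k)` with `k` set to the expected sparsity.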

10.

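Speech reconstruction from mel-frequency cepstral coefficients using an L1/2 sparsity constraint  Total citations: 1, self-citations: 0, citations by others: 1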

周健 刘荣敏 窦云峰 路成 陶亮 《声学学报》2018,43(6):991-999

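A method is proposed for reconstructing the time-domain speech signal from mel-frequency cepstral coefficients (MFCCs) using an L1/2 sparsity constraint. Estimating the speech magnitude spectrum from MFCCs is an underdetermined problem; existing methods either use a minimum mean-square-error estimate of the magnitude spectrum or impose sparsity on it through L1 regularization. Because L1/2 regularization promotes sparsity more strongly than L1 regularization, an L1/2 regularization constraint is introduced when estimating the magnitude spectrum from MFCCs; the resulting sparse magnitude spectrum is then used to estimate the phase spectrum, and the estimated spectrum is used to reconstruct the time-domain speech signal. Experiments show that the speech estimated by the algorithm has higher quality than that of the minimum mean-square-error magnitude spectrum method. In speech reconstruction experiments in noise, the algorithm yields better speech quality than L1-regularized magnitude spectrum estimation and is more robust to noise.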

11.

Comparative intelligibility investigation of single-channel noise-reduction algorithms for Chinese, Japanese, and English

Li J Yang L Zhang J Yan Y Hu Y Akagi M Loizou PC 《The Journal of the Acoustical Society of America》2011,129(5):3291-3301

A large number of single-channel noise-reduction algorithms have been proposed based largely on mathematical principles. Most of these algorithms, however, have been evaluated with English speech. Given the different perceptual cues used by native listeners of different languages, including tonal languages, it is of interest to examine whether there are any language effects when the same noise-reduction algorithm is used to process noisy speech in different languages. This study presents a comparative evaluation of various single-channel noise-reduction algorithms applied to noisy speech taken from three languages: Chinese, Japanese, and English. Clean speech signals (Chinese words and Japanese words) were first corrupted by three types of noise at two signal-to-noise ratios and then processed by five single-channel noise-reduction algorithms. The processed signals were finally presented to normal-hearing listeners for recognition. Intelligibility evaluation showed that the majority of noise-reduction algorithms did not improve speech intelligibility. Consistent with a previous study with the English language, the Wiener filtering algorithm produced small, but statistically significant, improvements in intelligibility for car and white noise conditions. Significant differences between the performances of noise-reduction algorithms across the three languages were observed.
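
The corruption step in such an evaluation, adding noise to clean speech at a prescribed signal-to-noise ratio, can be sketched as below. The function name `mix_at_snr` is hypothetical, and the 5 dB value in the usage note is only an example: the abstract does not state which two SNR levels were used.

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale noise so that clean + noise has the requested SNR in dB.

    Assumes the noise recording is at least as long as the clean signal
    and that neither signal is all zeros.
    """
    clean = np.asarray(clean, dtype=float)
    noise = np.asarray(noise, dtype=float)[: len(clean)]
    p_clean = np.mean(clean ** 2)           # average clean-speech power
    p_noise = np.mean(noise ** 2)           # average noise power before scaling
    # Choose the gain so that p_clean / (scale**2 * p_noise) = 10**(snr_db / 10).
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise
```

For example, `mix_at_snr(clean, noise, 5.0)` returns a noisy signal whose global SNR relative to `clean` is 5 dB.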

12.

Single-channel speech enhancement method using reconstructive NMF with spectrotemporal speech presence probabilities

Seongjae Lee David K. Han Hanseok Ko 《Applied Acoustics》2017

In this paper, a novel single-microphone speech enhancement technique is presented. While most conventional nonnegative matrix factorization-based approaches focus on generating basis matrices of speech and noise for enhancement, the proposed algorithm performs an additional process to reconstruct speech from noisy speech when these two elements overlap heavily in selected spectral bands. This process involves a log-spectral amplitude based estimator, which provides the spectrotemporal speech presence probability to obtain a more accurate reconstruction. Moreover, the proposed algorithm applies an unsupervised learning method to the input noise, so it is adaptable to any type of environmental noise without a pre-trained dictionary. The experimental results demonstrate that the proposed algorithm obtains improved speech enhancement performance compared with conventional single-channel approaches.

13.

Using power level difference for near field dual-microphone speech enhancement

Nima Yousefian Ahmad Akbari Mohsen Rahmani 《Applied Acoustics》2009,70(11-12):1412-1421

In this contribution, a novel dual-channel speech enhancement technique is introduced. The proposed approach uses the dissimilarity between the power of the received signals in the two channels as a criterion for speech enhancement and noise reduction. We claim that in near-field conditions, where the distances between the microphones and the sound source are short, the difference in the received power levels at the two microphones is an estimate of the clean speech signal power. We then apply this theory to present an optimum method for speech enhancement. The method can cope with problems such as transient noise and nearby microphones, which are two of the main problems of previously proposed dual-microphone speech enhancement techniques. Using objective speech quality measures and spectrogram analysis, we show that the proposed method results in improved speech quality.

14.

Three-dimensional sound source localization using distributed microphone arrays

《声学学报：英文版》2017,(2)

To improve the performance of sound source localization based on distributed microphone arrays in noisy and reverberant environments, a sound source localization method is proposed. The method exploits the inherent spatial sparsity to convert the localization problem into a sparse recovery problem based on compressive sensing (CS) theory. A two-step discrete cosine transform (DCT)-based feature extraction covers both short-time and long-time properties of the signal and reduces the dimensions of the sparse model. Moreover, an online dictionary learning (DL) method dynamically adjusts the dictionary to match changes in the audio signals, so that the sparse solution better represents the location estimates. In addition, an improved approximate l0-norm minimization algorithm is proposed to enhance reconstruction performance for sparse signals at low signal-to-noise ratio (SNR). Simulation results demonstrate the effectiveness of the proposed scheme: the locations of multiple sources can be obtained in noisy and reverberant conditions.

15.

Speech enhancement based on the discrete Gabor transform and multi-notch adaptive digital filters

Ergun Erçelebi 《Applied Acoustics》2004,65(8):739-762

This paper presents a new method for speech enhancement based on time-frequency analysis and adaptive digital filtering. The proposed dual-channel method tracks the frequencies of the corrupting signal with the discrete Gabor transform (DGT) and applies a multi-notch adaptive digital filter (MNADF) at those frequencies. Because no a priori knowledge of the noise source statistics is required, the method differs from traditional speech enhancement methods. Specifically, the proposed method was applied to the case where speech quality and intelligibility deteriorate in the presence of background noise. Speech coders and automatic speech recognition (ASR) systems are designed to act on clean speech signals, so speech corrupted by noise must be enhanced before it is processed. The method uses a primary input containing the corrupted speech signal and a reference input containing only the noise. An MNADF was designed instead of a single-notch adaptive digital filter, and the DGT was used to track the frequencies of the corrupting signal, because fast filtering and fast measurement of the time-dependent noise frequency are of great importance in speech enhancement. Different types of noise from the Noisex-92 database were used to degrade real speech signals. Objective measures (speech spectrograms, global signal-to-noise ratio (SNR), segmental SNR (segSNR), and the Itakura-Saito distance) as well as subjective listening tests consistently demonstrated superior enhancement performance of the proposed method over traditional speech enhancement methods such as spectral subtraction.
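
Of the objective measures listed above, segmental SNR is simple enough to sketch directly. The version below is a common formulation rather than the exact one used in the paper: the frame length of 256 samples, the non-overlapping framing, and the clamping of per-frame values to the range -10 dB to 35 dB are all assumptions, as is the function name `segmental_snr`.

```python
import numpy as np

def segmental_snr(clean, enhanced, frame_len=256, lo=-10.0, hi=35.0, eps=1e-12):
    """Average per-frame SNR in dB between a clean and an enhanced signal.

    Assumes time-aligned signals containing at least one full frame.
    """
    clean = np.asarray(clean, dtype=float)
    enhanced = np.asarray(enhanced, dtype=float)
    n_frames = len(clean) // frame_len
    values = []
    for i in range(n_frames):
        s = clean[i * frame_len:(i + 1) * frame_len]
        e = s - enhanced[i * frame_len:(i + 1) * frame_len]   # residual error
        snr = 10 * np.log10((np.sum(s ** 2) + eps) / (np.sum(e ** 2) + eps))
        # Clamp so silent or perfectly matched frames do not dominate the mean.
        values.append(np.clip(snr, lo, hi))
    return float(np.mean(values))
```

Unlike global SNR, this measure weights every frame equally, so errors in quiet speech segments count as much as errors in loud ones.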