1.

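XU Shun CHEN Shaorong LIU Yulin 《声学学报》2007,32(4):375-381

For the underdetermined convolutive mixture model of speech signals, a blind separation method based on nonlinear time-frequency masking is proposed that exploits the approximate W-disjoint orthogonality (W-DO) of independent speech signals in the time-frequency domain. The multi-microphone observations are first normalized in the time-frequency domain so that the representation of the mixture in each time-frequency slot is independent of frequency. A dynamic clustering algorithm then identifies the active source in each time-frequency slot, and a nonlinear function of the deflection angle from the cluster center is used as the time-frequency mask, which yields the separated speech signals. The method resolves the frequency permutation problem of classical frequency-domain blind separation algorithms and effectively suppresses the spatial direction diffusion of the separation matrix. Simulations show better separation performance than the BLUES method, with an average increase in signal-to-noise ratio gain of 1.58 dB.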

2.

Blind speech source separation via nonlinear time-frequency masking

XU Shun CHEN Shaorong LIU Yulin 《声学学报：英文版》2008,27(3):203-214

Aiming at the underdetermined convolutive mixture model, a blind speech source separation method based on nonlinear time-frequency masking is proposed, which exploits the approximate W-disjoint orthogonality (W-DO) property among independent speech signals in the time-frequency domain. In this method, the observed mixture signals from multiple microphones are first normalized to be independent of frequency in the time-frequency domain. A dynamic clustering algorithm is then adopted to obtain the active source information in each time-frequency slot, and a nonlinear function of the deflection angle from the cluster center is selected for time-frequency masking. Finally, the blind separation of the mixed speech signals is achieved by the inverse STFT (short-time Fourier transform). This method not only solves the frequency permutation problem encountered in most classical frequency-domain blind separation techniques, but also suppresses the spatial direction diffusion of the separation matrix. The simulation results demonstrate that the proposed method outperforms the typical BLUES method: the signal-to-noise ratio gain (SNRG) increases by 1.58 dB on average.
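The masking step in this abstract can be illustrated with a minimal Python sketch. It assumes the normalized observation vectors and the cluster centers are already available. The abstract does not specify the nonlinear function, so the logistic mask `soft_angle_mask` and its parameters `theta_t` and `gain` are hypothetical choices made only for illustration.

```python
import numpy as np

def deflection_angles(X, centers):
    """Angle (radians) between each observation vector and each cluster center.

    X:       (n_slots, n_mics) complex, one row per time-frequency slot
    centers: (n_sources, n_mics) complex cluster centers
    """
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Cn = centers / np.linalg.norm(centers, axis=1, keepdims=True)
    cos = np.abs(Xn @ Cn.conj().T)          # |<x, c>| for unit-norm vectors
    return np.arccos(np.clip(cos, 0.0, 1.0))

def soft_angle_mask(theta, theta_t=0.3, gain=20.0):
    """Hypothetical nonlinear mask: near 1 for small angles, near 0 for large ones."""
    return 1.0 / (1.0 + np.exp(gain * (theta - theta_t)))

# Two sources, two microphones, two time-frequency slots (toy values).
centers = np.array([[1.0, 0.0], [0.0, 1.0]], dtype=complex)
X = np.array([[2.0, 0.1], [0.05j, 1.5]], dtype=complex)

theta = deflection_angles(X, centers)
masks = soft_angle_mask(theta)      # masks[:, k] weights the slots for source k
active = np.argmin(theta, axis=1)   # index of the closest cluster center per slot
```

In a full system each column of `masks` would multiply the STFT of a reference microphone before the inverse STFT; that part is omitted here.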

3.

A new efficient two-channel backward algorithm for speech intelligibility enhancement: A subband approach

《Applied Acoustics》2014

This paper addresses the problem of speech intelligibility enhancement by adaptive filtering algorithms combined with subband techniques. The forward and backward blind source separation structures are extensively used in speech enhancement and source separation, and have been widely studied in the literature for convolutive and non-convolutive mixtures. Both structures use two microphones to generate the convolutive or non-convolutive mixing signal, and provide the target and jammer signal components at their outputs. In this paper, we focus on the backward structure used to enhance the speech signal from a convolutive mixture. Furthermore, we propose a subband implementation of this structure to improve its behavior with speech signals. The proposed subband backward BSS (SBBSS) structure substantially improves the convergence speed of the adaptive filtering algorithms when the number of subbands is high. To improve the robustness of the proposed subband structure, we have adapted and applied a new criterion that combines minimization of the System Mismatch and Mean-Errors criteria. Combined with the minimization of this new criterion, the proposed subband backward structure enhances the output speech signal by reducing the distortion and the noise components. The performance of the proposed structure is validated through several objective criteria, which are given and described in this paper.

4.

Single-channel speech enhancement algorithm based on time-frequency dictionary learning

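HUANG Jianjun ZHANG Xiongwei ZHANG Yafei ZOU Xia 《声学学报》2012,37(5):539-547

To address the sharp performance degradation of earlier speech enhancement algorithms in non-stationary noise, a new single-channel speech enhancement algorithm based on time-frequency dictionary learning is proposed. First, time-frequency dictionary learning is used to model prior information about the spectral structure of the noise, and this model is incorporated into the framework of convolutive non-negative matrix factorization. Then, with the noise time-frequency dictionary held fixed, multiplicative iterative update formulas are derived for the time-varying gains and the speech time-frequency dictionary. Finally, these formulas are used to update the time-varying gain coefficients of speech and noise together with the speech time-frequency dictionary; the speech magnitude spectrum is reconstructed by convolving the speech time-frequency dictionary with the time-varying gains, and binary time-frequency masking removes the noise interference. Experiments show that the algorithm achieves better results on several speech quality measures. In non-stationary noise and at low signal-to-noise ratios, it removes noise more effectively than multi-band spectral subtraction and non-negative sparse coding denoising, and the enhanced speech has better quality.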
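The idea of updating gains and a speech dictionary while a noise dictionary stays fixed can be sketched as follows. This is not the paper's convolutive formulation: plain (non-convolutive) NMF with the standard Euclidean multiplicative updates is swapped in to keep the example short. The function names, `n_speech`, and the toy data are assumptions made for illustration.

```python
import numpy as np

def nmf_fixed_noise(V, W_noise, n_speech=4, n_iter=200, eps=1e-9, seed=0):
    """Factor V ~= [W_speech, W_noise] @ H with W_noise held fixed.

    V:       (n_freq, n_frames) non-negative magnitude spectrogram
    W_noise: (n_freq, n_noise) pre-trained non-negative noise dictionary
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W_s = rng.random((n_freq, n_speech)) + eps
    H = rng.random((n_speech + W_noise.shape[1], n_frames)) + eps
    for _ in range(n_iter):
        W = np.hstack([W_s, W_noise])
        H *= (W.T @ V) / (W.T @ W @ H + eps)          # update all gains
        H_s = H[:n_speech]
        W_s *= (V @ H_s.T) / (W @ H @ H_s.T + eps)    # update speech atoms only
    return W_s, H

def binary_mask(W_s, W_noise, H):
    """True where the speech estimate exceeds the noise estimate."""
    n_speech = W_s.shape[1]
    speech = W_s @ H[:n_speech]
    noise = W_noise @ H[n_speech:]
    return speech > noise

# Toy data standing in for a noisy magnitude spectrogram and a noise dictionary.
rng = np.random.default_rng(1)
V = rng.random((16, 30))
W_noise = rng.random((16, 2))

W_s, H = nmf_fixed_noise(V, W_noise)
mask = binary_mask(W_s, W_noise, H)
enhanced = mask * V   # masked magnitude spectrogram
```

Only the speech columns of the dictionary are updated inside the loop, which is what keeps the noise model fixed.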

5.

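Blind speech extraction algorithm based on source direction information and nonlinear time-frequency masking Total citations: 2, self-citations: 0, citations by others: 2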

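XIA Xiuyu HE Peiyu 《声学学报》2013,38(2):224-230

For the underdetermined convolutive mixture model of speech signals, a blind speech extraction algorithm based on source direction information and nonlinear time-frequency masking is proposed. Time-frequency analysis of the low-frequency band of the mixed speech is first used to estimate the instantaneous relative time delay (ITD), and potential-function clustering then estimates the number of sources and their ITDs. Next, the target is locked and accurate direction information for the target speech is extracted. Finally, the target speech is extracted by nonlinear time-frequency masking, exploiting the approximate W-disjoint orthogonality of independent speech signals in the time-frequency domain. Simulations show that the method can lock onto a target in any direction of interest and extract the target speech effectively; under the experimental conditions of the paper, the average signal-to-noise ratio gain is 9.5 dB.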
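One common way to obtain a per-bin delay estimate from two channels is to read it off the inter-channel phase, which is unambiguous only at low frequencies. The sketch below shows that step under the assumption that STFT coefficients of both channels are already available. The function name and the toy signal are hypothetical, and the clustering stage of the abstract is not shown.

```python
import numpy as np

def itd_from_phase(X1, X2, freqs_hz):
    """Per-bin delay (seconds) of channel 2 relative to channel 1.

    X1, X2:   (n_freq, n_frames) complex STFT coefficients
    freqs_hz: (n_freq,) bin center frequencies, all > 0
    Valid only while |2*pi*f*delay| < pi, i.e. without phase wrapping.
    """
    phase = np.angle(X2 * np.conj(X1))
    return -phase / (2.0 * np.pi * freqs_hz[:, None])

# Toy check: channel 2 is channel 1 delayed by 0.1 ms.
freqs = np.linspace(100.0, 2000.0, 20)
true_delay = 1e-4
X1 = np.ones((20, 5), dtype=complex)
X2 = X1 * np.exp(-2j * np.pi * freqs[:, None] * true_delay)

itd = itd_from_phase(X1, X2, freqs)
```

Restricting `freqs` to a low band is what keeps the phase within one cycle for realistic microphone spacings.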

6.

Blind convolutive separation method for speech signals via joint block diagonalization

ZHANG Hua FENG Dazheng PANG Jiyong 《声学学报：英文版》2010,29(1):45-55

A blind speech source separation method for the overdetermined convolutive mixture model in the time domain is proposed via joint block diagonalization, based on the mutual independence and short-time stationarity of the speech signals. Taking the sum of the F-norms of all off-diagonal sub-matrices as the criterion, a novel joint block-diagonalization method is proposed to estimate the whole mixture matrix by minimizing a sequence of quadratic sub-functions corresponding to mixture sub-matrices. Both theoretical analysis and simulations show that the proposed method has much lower complexity and faster convergence than the classical Jacobi-like method with no performance loss. In addition, the channel order and the initialization values have almost no noticeable effect on the convergence speed.

7.

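Blind separation of convolutively mixed speech signals via joint block diagonalization Total citations: 1, self-citations: 0, citations by others: 1

ZHANG Hua FENG Dazheng PANG Jiyong 《声学学报》2009,34(2):167-174

For the convolutive mixture model of speech signals, a joint block diagonalization method based on second-order statistics is proposed that exploits the approximate independence and short-time stationarity of different speech signals to solve the overdetermined convolutive blind separation problem. The method takes the sum of the squared F-norms of all off-diagonal sub-matrices as the criterion for joint block diagonalization performance and converts the original quartic cost function into a set of simpler quadratic sub-cost functions, each used to estimate one sub-matrix of the unitary mixing matrix. The sub-functions are minimized in turn, iteratively searching for the minimum of the cost function, to obtain an estimate of the mixing matrix. Theoretical analysis and experimental results show that the proposed method separates as well as the classical Jacobi-like method while having lower computational complexity, faster convergence, and insensitivity to the transmission channel order and the initial values of the iteration.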

8.

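Speech extraction in in-vehicle scenarios combining blind source separation with multi-speaker state decision Total citations: 2, self-citations: 0, citations by others: 2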

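WANG Zelin CHEN Kai LU Jing 《声学学报》2020,45(5):696-706

In an in-vehicle distributed microphone array scenario, the blind source separation algorithm TRINICON (Triple-N ICA for convolutive mixtures) is combined with multi-speaker state decision to extract the desired speech. Based on the relative positions of the distributed microphone array and the sources, specific initialization conditions for blind source separation are designed to guarantee the mapping between output channels and sources. Based on the frequency response characteristics of the distributed microphone array, a feature vector is designed for multi-speaker decision, and the decision result is introduced into the parameter iteration of the TRINICON algorithm. In simulation evaluations using real in-vehicle recordings, the proposed method is robust across signal-to-noise ratios, effectively improves the convergence speed of the TRINICON algorithm and the signal-to-interference ratio of the speech signal, and ensures accurate channel mapping. The evaluation results show that the method can effectively extract the desired speech in in-vehicle scenarios, providing a reliable and fast-converging solution for acoustic information extraction in complex in-vehicle environments.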

9.

Using power level difference for near field dual-microphone speech enhancement

Nima Yousefian Ahmad Akbari Mohsen Rahmani 《Applied Acoustics》2009,70(11-12):1412-1421

In this contribution, a novel dual-channel speech enhancement technique is introduced. The proposed approach uses the dissimilarity between the power of the received signals in the two channels as a criterion for speech enhancement and noise reduction. We claim that in near-field conditions, where the distances between the microphones and the sound source are short, the difference in the received power levels at the two microphones is an estimate of the clean speech signal power. We then apply this theory to present an optimum method for speech enhancement. The method can cope with problems such as transient noise and nearby microphones, which are two of the main problems of previously proposed dual-microphone speech enhancement techniques. Using objective speech quality measures and spectrogram analysis, we show that the proposed method results in improved speech quality.

10.

Evaluation of Blind Source Separation for different algorithms based on second order statistics and different spatial configurations of directional microphones

Jedrzej Kociński Szymon Drgas Edward Ozimek 《Applied Acoustics》2012,73(2):109-116

The present study is concerned with convolutive Blind Source Separation (BSS) of sound sources, which leads to a significant speech intelligibility enhancement. Two experiments were conducted. In the first experiment, two different convolutive BSS algorithms were compared. Both methods are based on second order statistics, since such an approach is simple and gives satisfactory performance. The data from this experiment suggest that different approaches can yield different speech intelligibility improvements. In the second experiment, the influence of the spatial configuration of the cardioid microphones on BSS performance was measured. The results show that, for the spatial configurations considered, the best separation is obtained when the microphones are directed alternately.

11.

Speech endpoint detection in low-SNRs environment based on perception spectrogram structure boundary parameter

WU Di ZHAO Heming HUANG Chengwei XIAO Zhongzhe ZHANG Xiaojun XU Yishen TAO Zhi 《声学学报：英文版》2014,(4):428-440

The Perception Spectrogram Structure Boundary (PSSB) parameter is proposed for speech endpoint detection as a preprocessing step for speech or speaker recognition. First, speech enhancement based on hearing perception is carried out. Then a two-dimensional enhancement is performed on the sound spectrogram according to the difference between the deterministic distribution characteristic of speech and the random distribution characteristic of noise. Finally, the endpoint decision is made using the PSSB parameter. Experimental results show that, in low-SNR environments from -10 dB to 10 dB, the proposed algorithm can achieve higher accuracy than existing endpoint detection algorithms. A detection accuracy of 75.2% is reached even at the extremely low SNR of -10 dB. The algorithm is therefore suitable for speech endpoint detection in low-SNR environments.

12.

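Speech endpoint detection algorithm using the perception spectrogram structure boundary parameter at low signal-to-noise ratios

WU Di ZHAO Heming TAO Zhi ZHANG Xiaojun XIAO Zhongzhe XU Yishen 《声学学报》2014,39(3):392-399

A speech endpoint detection algorithm using the perception spectrogram structure boundary (PSSB) parameter is proposed for preprocessing speech signals at low signal-to-noise ratios. After speech enhancement based on auditory perception characteristics is applied to the noisy speech, a two-dimensional enhancement is applied to the time-frequency spectrogram of the enhanced speech, based on the difference between the continuous distribution of the speech signal and the random distribution of the residual noise, which further highlights the spectrogram structure of the continuously distributed clean speech. The PSSB parameter is obtained by two-dimensional boundary detection on the spectrogram structure of the enhanced speech and is used for endpoint detection. Experiments show that, in white noise at signal-to-noise ratios from -10 dB to 10 dB, the PSSB-based algorithm detects speech endpoints more effectively than other endpoint detection algorithms. At the very low signal-to-noise ratio of -10 dB, the proposed method still achieves an accuracy of 75.2%. The PSSB-based endpoint detection algorithm is therefore better suited to speech endpoint detection in low-SNR white noise.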

13.

Spatial cues alone produce inaccurate sound segregation: the effect of interaural time differences

A Schwartz JH McDermott B Shinn-Cunningham 《The Journal of the Acoustical Society of America》2012,132(1):357-368

To clarify the role of spatial cues in sound segregation, this study explored whether interaural time differences (ITDs) are sufficient to allow listeners to identify a novel sound source from a mixture of sources. Listeners heard mixtures of two synthetic sounds, a target and distractor, each of which possessed naturalistic spectrotemporal correlations but otherwise lacked strong grouping cues, and which contained either the same or different ITDs. When the task was to judge whether a probe sound matched a source in the preceding mixture, performance improved greatly when the same target was presented repeatedly across distinct distractors, consistent with previous results. In contrast, performance improved only slightly with ITD separation of target and distractor, even when spectrotemporal overlap between target and distractor was reduced. However, when subjects localized, rather than identified, the sources in the mixture, sources with different ITDs were reported as two sources at distinct and accurately identified locations. ITDs alone thus enable listeners to perceptually segregate mixtures of sources, but the perceived content of these sources is inaccurate when other segregation cues, such as harmonicity and common onsets and offsets, do not also promote proper source separation.

14.

Time-frequency spectrum interference structure and source localization in the deep-sea acoustic shadow zone

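LIU Yuhan GUO Lianghao ZHANG Weiyu YAN Chao DONG Ge 《应用声学》2024,43(1):12-23

When passive sonar detects the radiated noise of a surface ship located in the deep-sea acoustic shadow zone, the received signal-to-noise ratio is usually low, which degrades passive source localization. To address this problem, a method is proposed that jointly estimates source range and radial velocity from the interference structure of the time-frequency spectrum of the received signal. First, based on ray acoustics, the relationship between the interference fringe periods of the time-frequency spectrum along the frequency and time axes and the source range and radial velocity is established. A two-dimensional Fourier transform and multi-band processing are then applied to the time-frequency spectrum of the received signal to estimate these fringe periods, from which the source range and radial velocity are finally computed. Simulation and sea-trial data processing results show that, compared with the existing source range estimation method based on the autocorrelation of the received signal, the proposed localization method based on the two-dimensional Fourier transform of the time-frequency spectrum is fairly robust and well suited to passive source localization at low signal-to-noise ratios.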
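The fringe-period estimation step can be illustrated with a small Python sketch: a striped spectrogram is transformed with a two-dimensional FFT, and the location of the strongest peak gives the stripe periods along the time and frequency axes. The function name, the grid spacing, and the synthetic stripe pattern are assumptions for illustration. The multi-band processing and the mapping from periods to range and radial velocity described in the abstract are not reproduced.

```python
import numpy as np

def fringe_periods(S, dt, df):
    """Periods of the dominant stripe pattern in a spectrogram.

    S:  (n_time, n_freq) real-valued spectrogram
    dt: time step between rows (s); df: frequency step between columns (Hz)
    Returns (period along time in s, period along frequency in Hz).
    """
    F = np.abs(np.fft.fft2(S - S.mean()))               # remove DC, then 2-D FFT
    i, j = np.unravel_index(np.argmax(F), F.shape)      # strongest 2-D component
    rate_t = abs(np.fft.fftfreq(S.shape[0], d=dt)[i])   # cycles per second
    rate_f = abs(np.fft.fftfreq(S.shape[1], d=df)[j])   # cycles per hertz
    period_t = 1.0 / rate_t if rate_t > 0 else np.inf
    period_f = 1.0 / rate_f if rate_f > 0 else np.inf
    return period_t, period_f

# Synthetic stripes: 8 s period along time, 20 Hz period along frequency.
t = np.arange(64) * 1.0     # s
f = np.arange(100) * 1.0    # Hz
S = 1.0 + np.cos(2 * np.pi * (t[:, None] / 8.0 + f[None, :] / 20.0))

period_t, period_f = fringe_periods(S, dt=1.0, df=1.0)
```

With noisy data the peak would be broader, so some smoothing or averaging over bands before the peak search would likely be needed.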

15.

Prediction of the dose value of mixed sounds produced by multiple sound sources acting together

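YAN Liang CHEN Ke'an Ruedi Stoop 《物理学报》2014,63(5):54302-054302

Using single-sound samples and artificially synthesized mixed-sound samples, this paper studies how to predict the dose value of a mixed sound formed by several single sound sources acting together. First, a procedure for determining the dose value of a sound sample is proposed, based on short-time processing of the exposure duration, and is used to determine the dose values of the single-sound samples and the synthesized mixed-sound samples. Then, the relationship is analyzed between the dose value of the mixed-sound sample (the total dose, denoted L_Total) and the dose value of each constituent single-sound sample (the component doses, denoted L_i, i = 1, 2, ..., K, where K is the number of single noise samples, i.e. the number of components). This makes it possible to predict the dose value of the mixed sound when all single-sound dose values are known, and provides a theoretical basis for noise source control and total noise dose control in complex acoustic environments and for efficient environmental noise abatement.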

16.

Algorithms for computing the time-corrected instantaneous frequency (reassigned) spectrogram, with applications

Fulop SA Fitz K 《The Journal of the Acoustical Society of America》2006,119(1):360-371

A modification of the spectrogram (log magnitude of the short-time Fourier transform) to more accurately show the instantaneous frequencies of signal components was first proposed in 1976 [Kodera et al., Phys. Earth Planet. Inter. 12, 142-150 (1976)], and has been considered or reinvented a few times since but never widely adopted. This paper presents a unified theoretical picture of this time-frequency analysis method, the time-corrected instantaneous frequency spectrogram, together with detailed implementable algorithms comparing three published techniques for its computation. The new representation is evaluated against the conventional spectrogram for its superior ability to track signal components. The lack of a uniform framework for either mathematics or implementation details which has characterized the disparate literature on the schemes has been remedied here. Fruitful application of the method is shown in the realms of speech phonation analysis, whale song pitch tracking, and additive sound modeling.

17.

A localization approach for multiple sound sources via an expectation maximization algorithm using differential microphone arrays

《声学学报：英文版》2017,(4)

Conventional sound localization approaches with small-sized microphone arrays are usually sensitive to noise and reverberation. To deal with this problem, an approach based on the expectation maximization algorithm with differential microphone arrays (DMAs) is proposed. Firstly, the parameters of a Gaussian mixture model for time-frequency instantaneous direction estimation are estimated through the EM algorithm, and then the direction of each sound source is estimated via time-frequency separation. In order to overcome the weakness of existing time-frequency separation techniques, an improved method, which combines the advantages of both the hard and soft separation methods, is also proposed. The improved time-frequency separation method is shown to be less sensitive to noise and reverberation. Simulation and experimental results demonstrate that the proposed localization approach is superior to its existing counterparts in terms of localization accuracy and robustness.
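The EM step for a Gaussian mixture over direction estimates can be sketched in a few lines of Python. The sketch treats each time-frequency direction estimate as a scalar angle in degrees and fits a one-dimensional mixture. The function name, the quantile-based initialization, and the synthetic angles are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def em_gmm_1d(x, k, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture to samples x with EM.

    Returns (weights, means, variances), each of shape (k,).
    """
    x = np.asarray(x, dtype=float)
    mu = np.quantile(x, (np.arange(k) + 0.5) / k)   # spread initial means over the data
    var = np.full(k, x.var() + 1e-6)
    w = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities, computed in the log domain for stability
        d2 = (x[:, None] - mu[None, :]) ** 2
        log_p = np.log(w) - 0.5 * np.log(2 * np.pi * var) - 0.5 * d2 / var
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and variances
        nk = r.sum(axis=0) + 1e-12
        w = nk / len(x)
        mu = (r * x[:, None]).sum(axis=0) / nk
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var

# Synthetic direction estimates (degrees) from two sources at -40 and 30.
rng = np.random.default_rng(0)
doa = np.concatenate([rng.normal(-40.0, 5.0, 300), rng.normal(30.0, 5.0, 200)])

w, mu, var = em_gmm_1d(doa, k=2)
```

The fitted means play the role of per-source direction estimates, and the responsibilities computed in the E-step correspond to a soft assignment of time-frequency points to sources.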

18.

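A new method for speech spectrum analysis and display

TENG Wenshan 《声学学报》1986,11(1):56-60

This paper presents a method for sound spectrum analysis and display. It aims to provide, in a setting where computers are used for comprehensive study of speech or similar signals, an analysis tool as convenient and intuitive as a sound spectrograph and an oscilloscope in the laboratory. The method uses the computer's graphics display to analyze signal spectra and waveforms interactively. Besides time waveforms, it can display time-varying power spectra in three forms: brightness-modulated, perspective, and contour plots. Spectra that have already been generated can be further examined in cross-section and analyzed quantitatively, and the user can freely specify up to 15 plotting parameters.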

19.

Blind speech separation algorithm for nonlinear mixing models

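HU Yalong LI Shuangtian 《应用声学》2006,25(2):82-89

For the FIR nonlinear mixing model, a blind separation algorithm based on the maximum entropy algorithm is proposed in which a Gaussian mixture model estimate of the probability density function replaces the traditional logarithmic probability density estimate. With an even function as the nonlinear activation function, the weight-vector update formula of the separation algorithm is derived using the expectation-maximization (EM) iterative algorithm. Simulation results compared with the traditional maximum entropy and higher-order cumulant methods show that the new algorithm converges faster, accomplishes the nonlinear speech separation task effectively, suppresses the interfering speech signal, and improves the output signal-to-noise ratio.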

20.

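Speech emotion recognition algorithm imitating the selective attention mechanism

LIANG Ruiyu ZHAO Li TAO Huawei WANG Qingyun ZOU Cairong 《声学学报》2016,41(4):537-544

Selecting effective features has always been the key to speech emotion recognition. To address the selection and construction of speech emotion features, a speech emotion recognition algorithm that imitates the selective attention mechanism is proposed. Considering the time-frequency characteristics of speech, the algorithm first computes the spectrogram of the speech signal. Second, imitating the selective attention mechanism, it computes color, orientation, and intensity feature maps of the spectrogram and normalizes them to form a feature matrix. The feature matrix is then rearranged and reduced in dimension by PCA to form the emotion recognition feature vector. Finally, an improved support vector machine classifier performs the emotion recognition. Recognition experiments on five emotions (anger, fear, happiness, sadness, and surprise) show that the selective-attention-based method achieves good recognition, with an average recognition rate of 85.44%. Compared with prosodic and voice quality features, the recognition rate improves by at least 10%; compared with other spectrogram features, it improves by about 7%.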