Similar literature
 18 similar articles found
1.
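An improved time delay estimation algorithm for speech signals based on quadratic correlation   Total citations: 1; self-citations: 1; citations by others: 0
刘敏  曾毓敏  张铭  李晨 《应用声学》2016,35(3):255-264
Most current research on time delay estimation for speech signals uses the generalized cross-correlation (GCC) algorithm. However, GCC-based time delay estimation is sensitive to noise and reverberation. This paper therefore proposes an improved time delay estimation algorithm for speech signals based on quadratic correlation. The algorithm applies a second cross-correlation to the speech signals and combines it with the Hilbert transform to further sharpen the peak of the second cross-correlation, so that the peak corresponding to the time delay is detected more accurately. Experimental results show that the improved method effectively suppresses noise interference for non-stationary speech signals and performs better under different reverberation conditions.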
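The second-correlation idea in the abstract above can be illustrated with a minimal sketch: the autocorrelation of one channel is cross-correlated with the cross-correlation of the two channels, and the lag of the peak is taken as the delay. This is not the authors' implementation. The Hilbert-transform peak sharpening is omitted, and the function names, the synthetic white-noise source, and the noise level are assumptions made only for this example.

```python
import random

def cross_correlate(a, b):
    """Return (lags, values) where values[i] = sum_n a[n] * b[n + lags[i]]."""
    na, nb = len(a), len(b)
    lags = list(range(-(na - 1), nb))
    values = []
    for lag in lags:
        total = 0.0
        # Only indices where both a[n] and b[n + lag] exist contribute.
        for n in range(max(0, -lag), min(na, nb - lag)):
            total += a[n] * b[n + lag]
        values.append(total)
    return lags, values

def estimate_delay_quadratic(x1, x2):
    """Estimate the delay of x2 relative to x1 (in samples) by second correlation."""
    _, r11 = cross_correlate(x1, x1)      # autocorrelation of channel 1
    _, r12 = cross_correlate(x1, x2)      # first cross-correlation
    lags, rr = cross_correlate(r11, r12)  # second correlation of the two
    peak_index = max(range(len(rr)), key=rr.__getitem__)
    return lags[peak_index]

# Synthetic test signal (assumption): white noise, delayed by 5 samples.
rng = random.Random(0)
true_delay = 5
source = [rng.gauss(0.0, 1.0) for _ in range(200)]
x1 = [s + 0.1 * rng.gauss(0.0, 1.0) for s in source]
delayed = [0.0] * true_delay + source[:-true_delay]
x2 = [s + 0.1 * rng.gauss(0.0, 1.0) for s in delayed]

estimated_delay = estimate_delay_quadratic(x1, x2)
print(estimated_delay)
```

Because `r11` and `r12` share the same lag origin, the index lag of the second correlation equals the physical lag, so no extra offset correction is needed.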

2.
孙兴伟  李军锋  颜永红 《声学学报》2021,46(6):1234-1241
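This paper proposes a dereverberation algorithm that combines a convolutional neural network encoder-decoder model with a reverberation-time attention mechanism. The encoder-decoder model performs the dereverberation, and the reverberation-time attention mechanism counteracts the effect of changing reverberant environments on dereverberation performance. In the encoder, convolution kernels of different sizes process the magnitude spectrum of the reverberant speech to obtain encoded features containing multi-scale contextual information. An attention module then selectively weights the encoded features according to the reverberation-time environment to produce weighted features. Finally, the decoder uses the weighted features to reconstruct the magnitude spectrum of the dereverberated speech. In simulated and real reverberant environments, the algorithm improves the speech-to-reverberation modulation energy ratio by 0.36 dB and 0.66 dB, respectively, over the baseline system. The experimental results show that the algorithm adapts to changes in the reverberant environment and is more robust than the baseline system in real reverberant environments.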

3.
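Maximum likelihood estimation is the optimal method for extracting the micro-motion feature parameters of a target, but solving it directly by grid search is computationally very expensive, and the cost function associated with laser-detected micro-Doppler echo signals is highly nonlinear with multiple local maxima. To address this, an estimation method combining mean likelihood estimation with the Monte Carlo method is proposed. A closed-form expression for the maximum likelihood parameter estimate is given, a compressed likelihood function is then designed to obtain the global maximum, and the parameters are estimated by Monte Carlo sampling and computing the circular mean. The method avoids the dependence of conventional methods on highly accurate initial values and complex iterative algorithms, and it enables joint estimation of the parameters. For multi-component micro-Doppler signals, the method separates the micro-motion components while estimating the parameters, without increasing the complexity of the algorithm. Estimation on simulated and experimental data shows that the method achieves performance close to that of maximum likelihood estimation while effectively reducing computational complexity and ensuring global convergence, and that it achieves both signal separation and parameter estimation.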

4.
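Robust speech recognition based on maximum likelihood polynomial regression   Total citations: 2; self-citations: 0; citations by others: 2
吕勇  吴镇扬 《声学学报》2010,35(1):88-96
To address the shortcoming of the linearity assumption in the maximum likelihood linear regression algorithm, this paper applies polynomial regression to model adaptation and constructs a nonlinear model adaptation algorithm based on maximum likelihood polynomial regression. In the log-spectral domain, the algorithm uses polynomial regression to approximate, for each Mel subband, the nonlinear relationship between the model means of the recognition environment and those of the training environment. The polynomial coefficients are estimated from a small amount of adaptation data in the recognition environment using the EM algorithm under the maximum likelihood criterion. Experimental results show that a second-order polynomial already approximates the nonlinear environmental transformation of the model means well. In noise compensation and speaker adaptation experiments, the error rate of maximum likelihood polynomial regression is clearly lower than that of maximum likelihood linear regression. The proposed algorithm largely overcomes the limitation of the linearity assumption in linear model adaptation and can simultaneously reduce the effects of noise, speaker changes, and other factors on a speech recognition system; it is particularly suited to joint adaptation to speaker and noise.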
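One way to write the per-subband polynomial mapping described in this abstract is sketched below. The notation is an assumption introduced for illustration and is not taken from the paper: \mu_{m} denotes a training-environment model mean in Mel subband m (log-spectral domain), \hat{\mu}_{m} the adapted mean for the recognition environment, a_{m,k} the polynomial coefficients of that subband, K the polynomial order, and M the number of subbands.

```latex
\hat{\mu}_{m} = \sum_{k=0}^{K} a_{m,k}\, \mu_{m}^{k}, \qquad m = 1, \dots, M
```

With K = 1 the mapping is affine in each subband; the abstract reports that K = 2 already approximates the relationship well.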

5.
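Multichannel inverse-filtering speech dereverberation by skewness maximization   Total citations: 1; self-citations: 1; citations by others: 0
Room reverberation degrades speech quality and speech intelligibility. Higher-order statistics are important measures of non-Gaussianity, and speech dereverberation can be achieved by exploiting the non-Gaussian nature of speech. This paper proposes a multichannel speech dereverberation method based on higher-order statistics. For the first time, the method constructs the cost function from the skewness, a third-order statistic, of the linear prediction residual of multichannel speech signals, and adaptively updates the inverse filter to maximize the skewness of the linear prediction residual of the dereverberated signal. In addition, drawing on the speech production model, a method for jointly estimating the linear prediction and the inverse filter of the room impulse response under the skewness criterion is proposed, which further improves dereverberation performance. Experimental results show that the method dereverberates better than existing methods based on the kurtosis, a fourth-order statistic, of the linear prediction residual, and that it is more robust to noise.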
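For reference, the two statistics contrasted in this abstract can be computed as normalized central moments. The sketch below shows only the statistics themselves; the linear prediction residual and the adaptive inverse-filter update are not reproduced, and the function names and the toy `residual` values are assumptions for illustration.

```python
def central_moment(samples, order):
    """Population central moment of the given order."""
    mean = sum(samples) / len(samples)
    return sum((x - mean) ** order for x in samples) / len(samples)

def skewness(samples):
    """Normalized third central moment: m3 / m2**1.5 (0 for symmetric data)."""
    return central_moment(samples, 3) / central_moment(samples, 2) ** 1.5

def kurtosis(samples):
    """Normalized fourth central moment: m4 / m2**2 (3 for a Gaussian)."""
    return central_moment(samples, 4) / central_moment(samples, 2) ** 2

# Toy, strongly one-sided sequence standing in for a prediction residual.
residual = [0.0, 0.0, 0.0, 4.0]
print(skewness(residual), kurtosis(residual))
```

A cost function built on skewness would evaluate `skewness` on the residual of the filtered signal and adjust the filter to increase it; that optimization loop is left out here.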

6.
郑灏  李整林 《应用声学》2012,31(4):272-276
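Numerical generation of time-domain reverberation sequences is important for developing active sonar target simulators. This paper describes a method for generating reverberation time series in range-dependent shallow-water environments, based on the existing normal-mode theory of coherent reverberation in shallow water. The idea is to form the time-domain reverberation sequence by the discrete convolution of a random sequence of the coherent scattered field with the transmitted signal. Taking a homogeneous Pekeris shallow-water environment as an example, the paper presents simulated reverberation time series and systematically checks their spectral characteristics, statistical characteristics, intensity decay, and vertical correlation. The results show that the reverberation characteristics obtained from the simulated time series agree with theory, so the method can be used to simulate shallow-water reverberation sequences.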

7.
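Text-independent speaker recognition using a normalized compensation transformation   Total citations: 10; self-citations: 0; citations by others: 10
In noisy environments, especially when the Gaussian mixture model (GMM), the most commonly used model for speaker recognition, is mismatched, the statistics of its output frame likelihoods need to be compensated. Based on the acoustic characteristics of speaker recognition, this article proposes a nonlinear transformation, the normalized compensation transformation. Theoretical analysis and experimental results show that, compared with the commonly used maximum likelihood (ML) transformation, it raises the recognition rate of the system by up to 3.7% and lowers the error rate by up to 45.1%. The results indicate that the normalized compensation transformation largely overcomes the limitations that arise in text-independent speaker recognition systems and that call for compensating the likelihoods (scores) output by the model: speaker characteristics that keep changing, speech and noise that cannot be separated well, noise-reduction algorithms that damage the speech, and models that do not match well. This also shows that processing the likelihoods output by the model is an effective way to reduce the influence of noise and interference and to improve the speaker recognition rate.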

8.
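Adaptive reverberation cancellation and sound source localization in the spherical harmonic domain   Total citations: 3; self-citations: 0; citations by others: 3
An adaptive reverberation cancellation and sound source localization algorithm in the spherical harmonic domain is proposed. The method improves speech quality through dereverberation and improves the performance of spherical-harmonic-domain localization in reverberant environments. The spherical-harmonic-domain expression of an adaptive reverberation cancellation algorithm based on multichannel linear prediction is derived, an order-by-order dereverberation scheme is proposed for the rigid-sphere model, and the direction of arrival is estimated from the dereverberated signals. Simulations with a 32-element spherical array show that, compared with spherical-harmonic-domain dereverberation without order-by-order processing, the method reduces the computational load by up to about two thirds while significantly improving both the PESQ score and the SRMR of the speech. Tests on experimental data confirm that the method is effective for dereverberation and sound source localization in real acoustic environments.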

9.
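A full feature vector set model and a mutual information evaluation method for text-independent speaker recognition are proposed. The model is formed by clustering a set of a speaker's speech data in feature space and comprehensively reflects the individual characteristics of the speaker's speech. For computing and deciding on the likelihood of a speaker's speech, a mutual information evaluation method is proposed; it jointly analyzes the likelihood in distance space and information space and makes the recognition decision using a maximum mutual information criterion. Experiments analyze speaker recognition performance with the full feature vector set model and the mutual information evaluation algorithm for two feature types, linear prediction cepstral coefficients (LPCC) and Mel-frequency cepstral coefficients (MFCC), and compare it with the Gaussian mixture model. The results show that the full feature vector set model and the mutual information evaluation algorithm adequately reflect the characteristics of a speaker's speech, effectively evaluate the similarity of speaker speech features, and achieve good recognition performance.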

10.
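When constructing reverberant speech datasets, real room impulse responses with long reverberation are scarce and simulated room impulse responses do not match real ones, which degrades the performance of data-driven blind reverberation time estimation models. A room impulse response simulation method based on a conditional generative adversarial network is proposed. The method trains the conditional generative adversarial network on real room impulse responses and can simulate more realistic room impulse responses for a specified reverberation time. Training sets built from room impulse responses simulated by different methods were used to train blind estimation models, and model performance was evaluated in acoustic experiments. The results show that the estimation model trained on room impulse responses simulated by this method has the lowest root-mean-square error at every signal-to-noise ratio tested and significantly outperforms the other models under long reverberation.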

11.
This paper compares two methods for extracting room acoustic parameters from reverberated speech and music. An approach which uses statistical machine learning, previously developed for speech, is extended to work with music. For speech, reverberation time estimates are within a perceptual difference limen of the true value. For music, virtually all early decay time estimates are within a difference limen of the true value. The estimation accuracy is not good enough in other cases due to differences between the simulated data set used to develop the empirical model and real rooms. The second method carries out a maximum likelihood estimation on decay phases at the end of notes or speech utterances. This paper extends the method to estimate parameters relating to the balance of early and late energies in the impulse response. For reverberation time and speech, the method provides estimates which are within the perceptual difference limen of the true value. For other parameters such as clarity, the estimates are not sufficiently accurate due to the natural reverberance of the excitation signals. Speech is a better test signal than music because of the greater periods of silence in the signal, although music is needed for low frequency measurement.

12.
An algorithm for blind estimation of reverberation time (RT) in speech signals is proposed. Analysis is restricted to the free-decaying regions of the signal, where the reverberation effect dominates, yielding a more accurate RT estimate at a reduced computational cost. A spectral decomposition is performed on the reverberant signal and partial RT estimates are determined in all signal subbands, providing more data to the statistical-analysis stage of the algorithm, which yields the final RT estimate. Algorithm performance is assessed using two distinct speech databases, achieving 91% and 97% correlation with the RTs measured by a standard nonblind method, indicating that the proposed method blindly estimates the RT in a reliable and consistent manner.

13.
In everyday listening, both background noise and reverberation degrade the speech signal. Psychoacoustic evidence suggests that human speech perception under reverberant conditions relies mostly on monaural processing. While speech segregation based on periodicity has achieved considerable progress in handling additive noise, little research in monaural segregation has been devoted to reverberant scenarios. Reverberation smears the harmonic structure of speech signals, and our evaluations using a pitch-based segregation algorithm show that an increase in the room reverberation time causes degraded performance due to weakened periodicity in the target signal. We propose a two-stage monaural separation system that combines the inverse filtering of the room impulse response corresponding to the target location and a pitch-based speech segregation method. As a result of the first stage, the harmonicity of a signal arriving from the target direction is partially restored while signals arriving from other directions are further smeared, and this leads to improved segregation. A systematic evaluation shows that the proposed system yields considerable signal-to-noise ratio gains across different conditions. Potential applications of this system include robust automatic speech recognition and hearing aid design.

14.
The reverberation time (RT) is an important parameter for characterizing the quality of an auditory space. Sounds in reverberant environments are subject to coloration. This affects speech intelligibility and sound localization. Many state-of-the-art audio signal processing algorithms, for example in hearing aids and telephony, are expected to have the ability to characterize the listening environment, and turn on an appropriate processing strategy accordingly. Thus, a method for characterization of room RT based on passively received microphone signals represents an important enabling technology. Current RT estimators, such as Schroeder's method, depend on a controlled sound source, and thus cannot produce an online, blind RT estimate. Here, a method for estimating RT without prior knowledge of sound sources or room geometry is presented. The diffusive tail of reverberation was modeled as an exponentially damped Gaussian white noise process. The time-constant of the decay, which provided a measure of the RT, was estimated using a maximum-likelihood procedure. The estimates were obtained continuously, and an order-statistics filter was used to extract the most likely RT from the accumulated estimates. The procedure was illustrated for connected speech. Results obtained for simulated and real room data are in good agreement with the real RT values.
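The decay model named in this abstract, an exponentially damped Gaussian white noise process, admits a compact maximum-likelihood sketch. The code below assumes the model y[n] = a^n x[n] with x[n] ~ N(0, sigma^2), substitutes the closed-form ML value of sigma^2 for each candidate decay, and picks the reverberation time by a plain grid search. The grid search, the function names, the sampling rate, and the synthetic tail are assumptions for illustration; the continuous estimation and the order-statistics filter described in the abstract are not reproduced.

```python
import math
import random

def log_likelihood(y, t60, fs):
    """Concentrated log-likelihood (constants dropped) of y[n] = a**n * x[n]."""
    n_samples = len(y)
    tau = t60 / (3.0 * math.log(10.0))  # decay time constant: T60 = 3 ln(10) * tau
    log_a = -1.0 / (tau * fs)           # ln(a), per-sample amplitude decay
    # ML estimate of sigma^2 for this a: mean of a**(-2n) * y[n]**2
    sigma2 = sum(math.exp(-2.0 * log_a * n) * v * v
                 for n, v in enumerate(y)) / n_samples
    return (-0.5 * n_samples * math.log(sigma2)
            - log_a * n_samples * (n_samples - 1) / 2.0)

def estimate_t60(y, fs, candidates):
    """Return the candidate T60 (seconds) with the highest likelihood."""
    return max(candidates, key=lambda t60: log_likelihood(y, t60, fs))

# Synthetic decaying-noise tail (assumption): T60 = 0.5 s at fs = 8 kHz.
fs = 8000
true_t60 = 0.5
rng = random.Random(1)
a = math.exp(-3.0 * math.log(10.0) / (true_t60 * fs))
tail = [a ** n * rng.gauss(0.0, 1.0) for n in range(4000)]

candidates = [0.10 + 0.01 * i for i in range(141)]  # 0.10 s to 1.50 s
estimated_t60 = estimate_t60(tail, fs, candidates)
print(round(estimated_t60, 2))
```

On real speech the model only holds in free-decay segments, which is why a blind estimator needs some rule for selecting among many local estimates.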

15.
Speech signals recorded with a distant microphone are usually corrupted by room reverberation, which severely degrades the clarity and intelligibility of speech. A speech dereverberation method based on spectral subtraction and spectral line enhancement is proposed in this paper. Following the generalized statistical reverberation model, the power spectrum of late reverberation is estimated and removed from the reverberant speech by the spectral subtraction method. Then, according to the human auditory model, a spectral line enhancement technique based on adaptive post-filtering is adopted to further eliminate the reverberant components between adjacent speech formants. The proposed method can effectively suppress the reverberation and improve the auditory perception of speech. The subjective and objective evaluation results reveal that the perceptual quality of speech is greatly improved by the proposed method.
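The spectral subtraction step can be sketched as follows. This sketch swaps in a simple exponential-decay estimate of the late reverberant power, in which the power observed `delay_frames` earlier is attenuated by exp(-2 * delta * T) with delta = 3 ln(10) / T60, rather than the generalized statistical model the abstract refers to. The spectral line enhancement stage is not shown, and the function name, parameters, and gain floor are assumptions for illustration.

```python
import math

def late_reverb_gain(power_frames, t60, frame_shift, delay_frames, floor=0.1):
    """Power-domain spectral subtraction gain for each frame l and bin k.

    power_frames: list of frames, each a list of short-time power values.
    t60: reverberation time in seconds; frame_shift: hop size in seconds.
    delay_frames: number of frames after which reflections count as "late".
    """
    delta = 3.0 * math.log(10.0) / t60
    decay = math.exp(-2.0 * delta * delay_frames * frame_shift)
    gains = []
    for l, frame in enumerate(power_frames):
        row = []
        for k, power in enumerate(frame):
            if l < delay_frames:
                gain = 1.0    # no earlier frame to predict late reverberation from
            elif power <= 0.0:
                gain = floor  # nothing to keep in an empty bin
            else:
                late = decay * power_frames[l - delay_frames][k]
                gain = max(1.0 - late / power, floor)
            row.append(gain)
        gains.append(row)
    return gains

# Toy input: one frequency bin with constant power over eight frames.
frames = [[2.0] for _ in range(8)]
example_gains = late_reverb_gain(frames, t60=0.6, frame_shift=0.01, delay_frames=5)
print(example_gains[5][0])
```

The gain applies to power spectra; for magnitude spectra its square root would be used. The floor limits the musical-noise artifacts that aggressive subtraction tends to produce.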

16.
Little is known about the extent to which reverberation affects speech intelligibility by cochlear implant (CI) listeners. Experiment 1 assessed CI users' performance using Institute of Electrical and Electronics Engineers (IEEE) sentences corrupted with varying degrees of reverberation. Reverberation times of 0.30, 0.60, 0.80, and 1.0 s were used. Results indicated that for all subjects tested, speech intelligibility decreased exponentially with an increase in reverberation time. A decaying-exponential model provided an excellent fit to the data. Experiment 2 evaluated (offline) a speech coding strategy for reverberation suppression using a channel-selection criterion based on the signal-to-reverberant ratio (SRR) of individual frequency channels. The SRR reflects implicitly the ratio of the energies of the signal originating from the early (and direct) reflections and the signal originating from the late reflections. Channels with SRR larger than a preset threshold were selected, while channels with SRR smaller than the threshold were zeroed out. Results in a highly reverberant scenario indicated that the proposed strategy led to substantial gains (over 60 percentage points) in speech intelligibility over the subjects' daily strategy. Further analysis indicated that the proposed channel-selection criterion reduces the temporal envelope smearing effects introduced by reverberation and also diminishes the self-masking effects responsible for flattened formants.

17.
For a mixture of target speech and noise in anechoic conditions, the ideal binary mask is defined as follows: It selects the time-frequency units where target energy exceeds noise energy by a certain local threshold and cancels the other units. In this study, the definition of the ideal binary mask is extended to reverberant conditions. Given the division between early and late reflections in terms of speech intelligibility, three ideal binary masks can be defined: an ideal binary mask that uses the direct path of the target as the desired signal, an ideal binary mask that uses the direct path and early reflections of the target as the desired signal, and an ideal binary mask that uses the reverberant target as the desired signal. The effects of these ideal binary mask definitions on speech intelligibility are compared across two types of interference: speech-shaped noise and concurrent female speech. As suggested by psychoacoustical studies, the ideal binary mask based on the direct path and early reflections of target speech outperforms the other masks as reverberation time increases and produces substantial reductions in terms of speech reception threshold for normal-hearing listeners.
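The anechoic definition given at the start of this abstract translates directly into code. In the sketch below, a time-frequency unit is kept when its local signal-to-noise ratio in dB exceeds a threshold; the function name, the `lc_db` parameter name, and the toy energy values are assumptions for illustration. The three reverberant variants differ in which component of the target is supplied as `target_energy`; how the remaining target energy is treated is not specified in the abstract and is left open here.

```python
import math

def ideal_binary_mask(target_energy, noise_energy, lc_db=0.0):
    """Return mask[t][f] = 1 where the local SNR in dB exceeds lc_db, else 0."""
    mask = []
    for target_row, noise_row in zip(target_energy, noise_energy):
        row = []
        for target, noise in zip(target_row, noise_row):
            if target <= 0.0:
                keep = False   # no target energy in this unit
            elif noise <= 0.0:
                keep = True    # target present, no noise at all
            else:
                keep = 10.0 * math.log10(target / noise) > lc_db
            row.append(1 if keep else 0)
        mask.append(row)
    return mask

# Toy energies: two time frames by two frequency channels.
target = [[4.0, 0.5], [1.0, 9.0]]
noise = [[1.0, 1.0], [1.0, 3.0]]
mask = ideal_binary_mask(target, noise, lc_db=0.0)
print(mask)
```

Lowering `lc_db` keeps more units, trading interference rejection for target preservation.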

18.
The main drawback of the minimum variance distortionless response (MVDR) beamformer is that it cancels the desired speech signal and degrades in multi-path wave propagation environments. To make the adaptive algorithm robust against room reverberation and to prevent desired signal cancellation, an estimate of the unknown transfer function of the desired speaker is proposed. The estimate is based on the signal and the interference covariance matrices. The estimated transfer function is then applied to the MVDR beamformer. The proposed algorithm was tested on a simulated room with reverberation. The results showed better quality of the restored speech compared to some typical adaptive algorithms.
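For context, the standard MVDR weight vector is given below in its textbook form, not in the paper's notation. The symbols are assumptions for illustration: \mathbf{R} is the interference (or interference-plus-noise) covariance matrix, \mathbf{d} is the transfer-function vector from the desired speaker to the microphones, and \mathbf{w} is the beamformer weight vector, all defined per frequency bin.

```latex
\mathbf{w}_{\mathrm{MVDR}}
  = \arg\min_{\mathbf{w}} \; \mathbf{w}^{H} \mathbf{R}\, \mathbf{w}
    \quad \text{subject to} \quad \mathbf{w}^{H} \mathbf{d} = 1
  \;=\; \frac{\mathbf{R}^{-1} \mathbf{d}}{\mathbf{d}^{H} \mathbf{R}^{-1} \mathbf{d}}
```

When \mathbf{d} models only the direct path while the true propagation is multi-path, the distortionless constraint protects the wrong vector and part of the desired signal is treated as interference, which is the cancellation problem the abstract sets out to avoid by estimating \mathbf{d}.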
