
1.

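Research on a speech bandwidth extension algorithm based on Gaussian mixture models. Total citations: 2 (self-citations: 0; citations by others: 2)

张勇, 胡瑞敏. 《声学学报》 2009, 34(5): 471-480

To reduce high-band spectral distortion, this paper studies the functional relationship between high-band spectral distortion and the mutual information between the feature parameters and the high-band spectral envelope in bandwidth extension algorithms, and on that basis proposes an extended Gaussian mixture model (GMM) bandwidth extension algorithm. First, the algorithm selects the parameters that have high mutual information with the high-band spectral envelope to form the feature vector, and uses a GMM to compute the joint probability density of the feature vector and the high-band spectral envelope. Second, the Expectation-Maximization (EM) algorithm is used to estimate the parameters of the Gaussian components and to compute the posterior probabilities. Finally, the high-band spectral envelope is estimated from the posterior probabilities. Experimental results show that, compared with the conventional GMM bandwidth extension algorithm, the proposed algorithm reduces the average high-band spectral distortion by 0.3 dB and reduces the number of speech frames with spectral distortion above 10 dB by more than 50%.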
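The last step of this abstract, estimating the high-band envelope from component posteriors, can be illustrated with a minimal sketch. It assumes a scalar feature `x` and a scalar envelope value `y` modelled by a joint GMM whose parameters are already trained (the EM step is omitted). The function names and the dictionary layout are hypothetical, and the posterior-weighted conditional mean used here is the usual minimum mean square error estimator for a joint GMM, not necessarily the exact estimator of the paper.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def gmm_mmse_estimate(x, components):
    """Estimate y from x under a joint GMM over (x, y).

    Each component is a dict with keys weight, mean_x, mean_y, var_x, cov_xy
    (a hypothetical layout chosen for this sketch).
    Returns the estimate and the component posteriors p(k | x).
    """
    likelihoods = [c["weight"] * gaussian_pdf(x, c["mean_x"], c["var_x"])
                   for c in components]
    total = sum(likelihoods)
    posteriors = [l / total for l in likelihoods]
    estimate = 0.0
    for p, c in zip(posteriors, components):
        # Conditional mean of y given x inside one Gaussian component.
        cond_mean = c["mean_y"] + c["cov_xy"] / c["var_x"] * (x - c["mean_x"])
        estimate += p * cond_mean
    return estimate, posteriors

# Toy model with two symmetric components (illustrative numbers only).
toy_components = [
    {"weight": 0.5, "mean_x": -1.0, "mean_y": 0.0, "var_x": 1.0, "cov_xy": 0.5},
    {"weight": 0.5, "mean_x": 1.0, "mean_y": 2.0, "var_x": 1.0, "cov_xy": 0.5},
]
y_hat, post = gmm_mmse_estimate(0.0, toy_components)
```

In a real system `x` and `y` would be vectors and the covariance terms would be matrices, but the structure of the estimate stays the same.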

2.

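Research on a bandwidth extension algorithm for narrowband speech. Total citations: 1 (self-citations: 0; citations by others: 1)

张勇, 刘轶. 《声学学报》 2014, 39(6): 764-773

To reduce spectral distortion, a narrowband speech bandwidth extension algorithm based on hidden Markov models (HMMs) is proposed. First, the algorithm selects the parameters that have high mutual information with the wideband spectral envelope to form the feature vector, and estimates the conditional posterior probability from the joint prior probability of the HMM states and the past observed feature vectors. Second, based on the conditional posterior probability, the algorithm combines Bayesian conditional parameter estimation with the minimum mean square error criterion to estimate the wideband spectral envelope. For the estimation of the wideband excitation signal, a mid-frequency excitation extension algorithm is proposed based on the harmonic correlation between the high and low frequencies of the signal. Experimental results show that, compared with the conventional HMM-based bandwidth extension algorithm, the proposed algorithm reduces the average spectral distortion by 0.187 dB and reduces the number of speech frames with spectral distortion above 10 dB by 34.3%.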

3.

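Constrained sequential Gaussian mixture model noise power spectrum estimation for speech enhancement. Total citations: 1 (self-citations: 0; citations by others: 1)

许春冬, 张震, 战鸽, 应冬文, 李军锋, 颜永红. 《声学学报》 2017, 42(5): 633-640

A maximum-likelihood method for estimating the noise log-power spectrum is proposed. A Gaussian mixture model is used to build a statistical model of the power spectrum envelope in each frequency band. The temporal envelope is divided into speech and non-speech classes, which correspond to the two Gaussian components of the mixture and describe the statistical distributions of speech and non-speech, respectively; the mean of the non-speech Gaussian component is the optimal estimate of the noise power spectrum. A sequential learning approach updates the model parameters frame by frame under the maximum likelihood criterion and yields an optimal estimate of the noise power spectrum at each frame. In addition, because a long-term absence of speech during sequential updating can easily destabilize the model, an online minimum description length (MDL) criterion is proposed to detect whether speech has been absent for a long time, which ensures the stability of the model. Experiments show that the algorithm outperforms the classical MS and IMCRA algorithms overall.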

4.

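A recognition algorithm for continuous phonemes using an improved deep belief network

阴法明, 赵焱, 赵力. 《应用声学》 2019, 38(1): 39-44

To improve the phoneme recognition rate in continuous speech recognition, a phoneme recognition algorithm is proposed that is based on restricted Boltzmann machines (RBMs) trained with improved parallel tempering. First, the RBMs are trained with an improved parallel tempering algorithm that uses equal-energy partitioning. The RBMs are then stacked to form a deep belief network, which serves as the base model for pre-training a deep neural network; with a softmax output layer, this gives a deep neural network for detecting the posterior probabilities of phoneme states. Next, the network weights are fine-tuned with the backpropagation algorithm using a small amount of labeled data. Finally, the resulting posterior probabilities are used as the emission probabilities of a hidden Markov model, and a Viterbi decoder performs phoneme recognition. Experiments on the TIMIT corpus show that the recognition rate is about 4.5% higher than that of conventional contrastive-divergence algorithms, and about 1% higher than that of the original parallel tempering algorithm with no increase in computation.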
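The final decoding step described in this abstract, using per-frame state scores as HMM emission probabilities and running a Viterbi decoder, can be sketched as follows. This is a generic log-domain Viterbi search over a toy two-state model, not the authors' decoder; the function name `viterbi` and all probabilities below are illustrative assumptions.

```python
import math

def viterbi(log_init, log_trans, log_emit):
    """Return the most likely state sequence.

    log_init[s]     : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_emit[t][s]  : log emission score of state s at frame t
                      (for example, log network posteriors)
    """
    n_states = len(log_init)
    scores = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    backpointers = []
    for t in range(1, len(log_emit)):
        new_scores, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s at frame t.
            best_prev = max(range(n_states),
                            key=lambda p: scores[p] + log_trans[p][s])
            new_scores.append(scores[best_prev] + log_trans[best_prev][s]
                              + log_emit[t][s])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def log_row(row):
    return [math.log(v) for v in row]

# Toy example: two states, four frames of posterior-like scores.
frame_scores = [[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.2, 0.8]]
best_path = viterbi(log_row([0.6, 0.4]),
                    [log_row([0.9, 0.1]), log_row([0.1, 0.9])],
                    [log_row(r) for r in frame_scores])
```

Working in the log domain avoids numerical underflow on long utterances, which is why the sketch takes log probabilities as input.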

5.

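Research on a voice conversion algorithm based on a hybrid mapping model. Total citations: 3 (self-citations: 0; citations by others: 3)

康永国, 双志伟, 陶建华, 张维. 《声学学报》 2006, 31(6): 555-562

This paper analyzes the over-smoothing of converted features that arises when the Gaussian mixture model (GMM) mapping algorithm is used in voice conversion. It argues that the loss of detail in the converted features caused by inaccurate covariance matrix estimation is the main cause of over-smoothing, and proposes a hybrid mapping algorithm that uses codebook mapping together with the GMM to convert the details of the acoustic features. In addition, a training method that uses phoneme information for fast GMM training is proposed. Objective evaluation shows that the training method using phoneme information improves the performance measure by 12.87% on average over the conventional method, and that the hybrid mapping algorithm, built on the phoneme-based training method, improves the performance measure by 27.13% over the conventional GMM conversion algorithm.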

6.

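Speech enhancement using a frame-synchronous hybrid wavelet packet transform to simulate an auditory model

朱学文, 杨道淳, 王炜, 牟峰, 徐柏龄. 《声学学报》 2003, (1)

The frame-synchronous hybrid wavelet packet analysis method is introduced first. The method combines the variable time-frequency window of wavelet packets with the frame-based processing of the STFT. It satisfies the frame-length requirement of speech signal processing, so that real-time processing is possible, while also achieving the best decomposition of the signal in the frequency domain; it is a fast wavelet packet algorithm similar to the FFT. On this basis, the method is used to simulate an auditory model and is applied to speech enhancement. Experiments show that good noise removal and good perceptual quality are obtained even at a low signal-to-noise ratio of -5 dB. The method can also be applied to speech coding, synthesis, and recognition.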

7.

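An improved voice conversion algorithm using compressed sensing

简志华, 王向文. 《声学学报》 2014, 39(3): 400-406

A voice conversion algorithm based on compressed sensing that takes inter-frame information into account is proposed. Because the vector formed by the line spectral pair parameters of several consecutive speech frames is sparse in the discrete cosine transform domain, compressed sensing is used to compress this vector into a short vector, and the compressed short vector is used as the feature parameter for training the voice conversion function. Experimental results show that, when a suitable number of speech frames is chosen, the algorithm performs 3.21% better than the conventional conversion algorithm based on weighted frequency warping. This indicates that making full and effective use of the correlation between speech frames gives the converted speech more stable inter-frame acoustic characteristics and helps improve the performance of a voice conversion system.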

8.

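Keyword-dependent single-channel speech enhancement for user-defined voice wake-up

刘作桢, 吴愁, 黎塔, 赵庆卫. 《声学学报》 2023, 48(2): 415-424

A single-channel speech enhancement method for user-defined voice wake-up is proposed. The method stores the phoneme information of the keywords in a text encoding matrix in advance and adds an attention-based phoneme bias module to a conventional speech enhancement model. The module uses intermediate features of the speech enhancement model to retrieve the phoneme information of the current frame from the text encoding matrix and feeds it into the subsequent computation of the model, which improves the enhancement of keyword-related phonemes. Experimental results in different noise environments show that the method suppresses noise in the keyword segments more effectively. Compared with a conventional speech enhancement method and with other text-dependent speech enhancement methods, the proposed method achieves relative improvements of 14.3% and 7.6%, respectively, in user-defined voice wake-up performance.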

9.

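Compressed sensing of hyperspectral images based on inter-spectral linear filtering. Total citations: 2 (self-citations: 1; citations by others: 1)

计振兴, 孔繁锵. 《光子学报》 2012, 41(1): 82-86

Exploiting the strong inter-spectral correlation of hyperspectral images, a compressed sensing method for hyperspectral images based on inter-spectral linear filtering is proposed. During compressed reconstruction, the correlation between adjacent spectral bands is used to apply inter-spectral linear filtering between the current reconstructed frame and the reconstructed image of the previous band; this reduces the noise in the reconstructed frame and corrects its contour information, thereby improving reconstruction quality. In the filtering step, the low-frequency coefficients of the reconstructed frame are retained, and its high-frequency coefficients are combined with the high-frequency wavelet coefficients of the previous band's reconstructed image by a linear weighted sum. Experiments show that the method effectively improves image reconstruction quality and reduces reconstruction time.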
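The filtering step in this abstract (keep the low-frequency coefficients of the current frame, blend its high-frequency coefficients with those of the previous band) can be illustrated in one dimension. The sketch below uses a single-level Haar transform on an even-length list instead of a 2-D image wavelet transform, and the weight `alpha` is a hypothetical parameter; the abstract does not state the wavelet or the weights that the authors used.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_forward(signal):
    """One-level Haar transform of an even-length list: (approx, detail)."""
    approx = [(signal[i] + signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert haar_forward."""
    signal = []
    for a, d in zip(approx, detail):
        signal.append((a + d) / SQRT2)
        signal.append((a - d) / SQRT2)
    return signal

def interband_filter(current, previous, alpha):
    """Keep the low-frequency part of `current` and blend high-frequency parts.

    alpha is the weight of the current frame's detail coefficients;
    (1 - alpha) is the weight of the previous band's detail coefficients.
    """
    cur_approx, cur_detail = haar_forward(current)
    _, prev_detail = haar_forward(previous)
    blended = [alpha * c + (1.0 - alpha) * p
               for c, p in zip(cur_detail, prev_detail)]
    return haar_inverse(cur_approx, blended)

# With alpha = 0 the detail comes entirely from the previous band.
filtered = interband_filter([1.0, 3.0], [2.0, 2.0], alpha=0.0)
```

Setting `alpha` to 1 returns the current frame unchanged, which gives a quick sanity check on the transform pair.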

10.

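A point cloud homography iterative closest point registration algorithm for three-dimensional reconstruction

《光学学报》 2015, (5)

Point cloud registration is one of the key techniques of optical three-dimensional (3D) profilometry. Marker-free point cloud registration is mostly performed with the iterative closest point (ICP) algorithm. To improve the performance of ICP, an ICP registration algorithm based on point cloud homography is proposed. The method for establishing homographic point pairs in the algorithm is described, and the coordinate transformation between point clouds is derived. A plaster statue with both high-frequency and low-frequency contours was scanned with a handheld 3D profile scanner, yielding 92 frames of point clouds. With the improved ICP algorithm, 82 frames were registered successfully. Three representative ICP algorithms were also applied to the same 92 frames for comparison. Experiments show that the algorithm is robust, converges quickly, and converges to high accuracy, which helps rapid reconstruction of 3D models.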

11.

Zeming Liu Tian Chen Keming Wei Guanzheng Liu Bin Liu 《Entropy (Basel, Switzerland)》2021,23(12)

Congestive heart failure (CHF) is a chronic cardiovascular condition associated with dysfunction of the autonomic nervous system (ANS). Heart rate variability (HRV) has been widely used to assess the ANS. This paper proposes a new HRV analysis method, which uses an information-based similarity (IBS) transformation and the fuzzy approximate entropy (fApEn) algorithm to obtain the fApEn_IBS index, which is used to observe the complexity of autonomic fluctuations in CHF within 24 h. We used 98 ECG records (54 healthy records and 44 CHF records) from the PhysioNet database. The fApEn_IBS index differed significantly between the control and CHF groups (p < 0.001). Compared with the classical indices low-to-high frequency power ratio (LF/HF) and IBS, the fApEn_IBS index further utilizes the changes in the rhythm of heart rate (HR) fluctuations between RR intervals to fully extract relevant information between adjacent time intervals and significantly improves the performance of CHF screening. The CHF classification accuracy of fApEn_IBS was 84.69%, higher than LF/HF (77.55%) and IBS (83.67%). Moreover, the combination of IBS, fApEn_IBS, and LF/HF reached the highest CHF screening accuracy (98.98%) with the random forest (RF) classifier, indicating that the IBS and LF/HF had good complementarity. Therefore, fApEn_IBS effectively reflects the complexity of autonomic nerves in CHF and is a valuable CHF assessment tool.

12.

Contact phase modulation method for acoustic nonlinear parameter measurement in solid. Total citations: 1 (self-citations: 0; citations by others: 1)

Vila M Vander Meulen F Dos Santos S Haumesser L Bou Matar O 《Ultrasonics》2004,42(1-9):1061-1065

In this work, a new contact method for measuring the nonlinearity parameter beta of solid plates is presented. A high frequency (HF) tone-burst signal of 20 MHz is inserted in the material by a contact transducer (with a suitable coupling). A low frequency (LF) pulse (2.5 MHz) is applied to the other face, in the opposite direction, so that the nonlinear interaction of the two waves takes place during the back propagation toward the HF transducer. This collinear interaction creates a phase modulation of the HF tone-burst which is proportional to the beta coefficient and the particle velocity of the LF wave. To determine this particle velocity in the time domain, an extended self-reciprocity calibration of the contact LF transducer is used. A numerical phase demodulation is then performed, giving the beta coefficient of the sample. The proposed method is validated by nonlinearity parameter measurements in fused silica. The measured nonlinear parameter of fused silica is in good agreement with the literature, in particular the negative sign of this parameter.

13.

Nonlinear propagation delay and pulse distortion resulting from dual frequency band transmit pulse complexes

Hansen R Måsøy SE Tangen TA Angelsen BA 《The Journal of the Acoustical Society of America》2011,129(2):1117-1127

A method of acoustic imaging is discussed that potentially can improve the diagnostic capabilities of medical ultrasound. The method, given the name second order ultrasound field imaging, is achieved by the processing of the received signals from transmitted dual frequency band pulse complexes with at least partly overlapping high frequency (HF) and low frequency (LF) pulses. The transmitted HF pulses are used for image reconstruction whereas the transmitted LF pulses are used to manipulate the elastic properties of the medium observed by the HF imaging pulses. In the present paper, nonlinear propagation effects observed by an HF imaging pulse due to the presence of an LF manipulation pulse are discussed. When using dual frequency band transmit pulse complexes with a large separation in center frequency (e.g., 1:10), these nonlinear propagation effects are manifested as a nonlinear HF propagation delay and an HF pulse distortion different from conventional harmonic distortion. In addition, with different transmit foci for the HF and LF pulses, nonlinear aberration will occur.

14.

Statistical modeling of speech Poincaré sections in combination of frequency analysis to improve speech recognition performance

Jafari A Almasganj F Bidhendi MN 《Chaos (Woodbury, N.Y.)》2010,20(3):033106

This paper introduces a combinational feature extraction approach to improve speech recognition systems. The main idea is to benefit simultaneously from features obtained from Poincaré sections applied to the speech reconstructed phase space (RPS) and from typical Mel frequency cepstral coefficients (MFCCs), which have a proven role in the speech recognition field. With an appropriate dimension, the reconstructed phase space of a speech signal is assured to be topologically equivalent to the dynamics of the speech production system, and could therefore include information that may be absent in linear analysis approaches. Moreover, complicated systems such as the speech production system can present cyclic and oscillatory patterns, and Poincaré sections could be used as an effective tool in the analysis of such trajectories. In this research, a statistical modeling approach based on Gaussian mixture models (GMMs) is applied to Poincaré sections of the speech RPS. A final pruned feature set is obtained by applying an efficient feature selection approach to the combination of the parameters of the GMM model and MFCC-based features. A hidden Markov model-based speech recognition system and the TIMIT speech database are used to evaluate the performance of the proposed feature set by conducting isolated and continuous speech recognition experiments. With the proposed feature set, a 5.7% absolute improvement in isolated phoneme recognition is obtained over MFCC-based features alone.
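The reconstructed phase space mentioned in this abstract is commonly built by time-delay embedding, and a Poincaré section collects the points where the trajectory crosses a chosen surface. The sketch below shows both in a minimal form. The function names, the embedding dimension and delay, and the choice of section (upward crossings of a level on the first coordinate) are illustrative assumptions; the abstract does not specify the authors' settings.

```python
def delay_embed(signal, dim, delay):
    """Time-delay embedding: point i is (s[i], s[i+delay], ..., s[i+(dim-1)*delay])."""
    n_points = len(signal) - (dim - 1) * delay
    if n_points <= 0:
        raise ValueError("signal is too short for this dimension and delay")
    return [tuple(signal[i + k * delay] for k in range(dim))
            for i in range(n_points)]

def poincare_section(trajectory, level):
    """Points reached when the first coordinate crosses `level` going upward."""
    section = []
    for prev, cur in zip(trajectory, trajectory[1:]):
        if prev[0] < level <= cur[0]:
            section.append(cur)
    return section

# A short periodic toy signal standing in for a speech frame.
toy_signal = [0, 1, 0, -1, 0, 1, 0, -1]
trajectory = delay_embed(toy_signal, dim=2, delay=1)
crossings = poincare_section(trajectory, level=0.5)
```

For a periodic signal the section points coincide, as in the toy example; for speech they scatter, and it is that scatter that a statistical model such as a GMM would describe.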

15.

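Age voice conversion combining a group of short-time spectrum universal background models with prosody

惠琳, 俞一彪. 《声学学报》 2017, 42(6): 762-768

A method for age voice conversion is proposed that combines a group of short-time spectrum universal background models (UBMs) with prosodic parameters. For spectral parameter conversion, short-time spectral coefficients are extracted from the speech of each speaker in the same age group and a Gaussian mixture model is built; the speakers are then clustered according to the similarity of their speech features, and one UBM is trained for each cluster, giving a group of UBMs and a set of short-time spectrum conversion functions. After spectral parameter conversion, the formants are further fine-tuned. For prosodic parameter conversion, a single-Gaussian model is built for the fundamental frequency and an average duration ratio model for the speaking rate, from which the conversion functions are derived. Experimental results show that the proposed method has a clear advantage over the conventional bilinear method on measures such as ABX and MOS, and that its rate of change in log-likelihood is 4% higher than that of the single-UBM method. These results indicate that the proposed method gives the converted speech a good tendency toward the target together with good speech quality, a clear improvement over conventional methods.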

16.

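Robust feature extraction for speech signals in convolutive noise environments

吕钊, 吴小培, 张超, 李密. 《声学学报》 2010, 35(4): 465-470

A robust speech feature extraction algorithm based on independent component analysis (ICA) is proposed to address the mismatch between training and recognition features of speech signals in convolutive noise environments. The algorithm transforms the noisy speech signal from the time domain to the frequency domain with the short-time Fourier transform, uses complex-valued ICA to separate the short-time spectrum of the speech signal from that of the noisy speech, and then computes Mel-frequency cepstral coefficients (MFCCs) and their first-order differences from the separated short-time spectrum as the feature parameters. In Mandarin digit speech recognition experiments in simulated and real environments, the proposed algorithm improves recognition accuracy over conventional MFCCs by 34.8% and 32.6%, respectively. The results show that ICA-based speech features are robust in convolutive noise environments.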
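The feature vector in this abstract consists of MFCCs and their first-order differences. A minimal sketch of the difference step is given below, assuming the MFCC frames are already computed. It uses a simple central difference with repeated edge frames, which is one common choice; the abstract does not say which difference formula the authors used, and the function names are hypothetical.

```python
def first_order_difference(frames):
    """Central difference of each coefficient across time, edges repeated."""
    deltas = []
    last = len(frames) - 1
    for t in range(len(frames)):
        prev_frame = frames[max(t - 1, 0)]
        next_frame = frames[min(t + 1, last)]
        deltas.append([(n - p) / 2.0 for p, n in zip(prev_frame, next_frame)])
    return deltas

def append_deltas(frames):
    """Concatenate each static frame with its first-order difference."""
    return [static + delta
            for static, delta in zip(frames, first_order_difference(frames))]

# Three toy frames with two coefficients each (illustrative values only).
toy_frames = [[1.0, 0.0], [2.0, 1.0], [4.0, 1.0]]
toy_deltas = first_order_difference(toy_frames)
toy_features = append_deltas(toy_frames)
```

Each output frame has twice the dimension of the input frame, with the static coefficients first and the differences after them.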

17.

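Speech emotion recognition with a cascaded "generative/discriminative" hybrid model

黄永明, 章国宝, 董飞, 李悦. 《声学学报》 2013, 38(2): 231-240

A speech emotion recognition method using a cascaded "generative/discriminative" hybrid model is proposed. First, 63-dimensional utterance-level features are extracted, the 12 best utterance-level features are selected using Fisher, and a cascaded generative model based on a wavelet neural network (WNN) is built for speech emotion recognition. Then 69-dimensional frame-level features are extracted, 8 feature dimensions are selected with SFS, the Gaussian mixture model (GMM) is made to produce multi-dimensional probability outputs, and a cascaded "generative/discriminative" hybrid model is built for speech emotion recognition. The experimental results show that: (1) the cascaded "generative/discriminative" hybrid model achieves a higher recognition rate than a WNN, GMM, HMM (hidden Markov model), or SVM (support vector machine) alone; (2) the cascaded "generative/discriminative" hybrid model achieves a higher recognition rate than the WNN-based cascaded generative model; (3) the M=13, D-dimensional GMM-MAP/SVM (MAP, maximum a posteriori) serial fusion model is the best cascaded "generative/discriminative" hybrid model, reaching the highest recognition rate of 85.1%.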

18.

Single-image super-resolution with joint-optimization of TV regularization and sparse representation

Jinzheng Lu Bin Wu 《Optik》2014

A super-resolution (SR) reconstruction framework is proposed that combines regularization restoration with learning-based resolution enhancement via sparse representation. From the viewpoint of conventional learning methods, the original image can be split into low frequency (LF) and high frequency (HF) components. The reconstruction mainly focuses on processing the HF part, while the LF part is obtained simply with a typical interpolation function. For a severely blurred single image, we first use regularization restoration to recover it. The regularized output then markedly improves the quality of the LF component used in traditional learning-based methods. Last, edge-preserving image resolution enhancement can be implemented based on the relatively sharp intermediate image obtained in this way and a pre-constructed over-complete dictionary for sparse representation. Specifically, the regularization can favorably weaken the dependence of the atoms on the degradation process. With both techniques, we can noticeably eliminate the blur and the edge artifacts in the enlarged image simultaneously. Various experimental results demonstrate that the proposed approach can produce visually pleasing results for severely blurred images.