Similar Documents
A total of 18 similar documents were found (search time: 78 ms).
1.
黄德智  蔡莲红 《声学学报》2006,31(6):542-548
Building on the source-filter model and using statistical learning methods, a hybrid parametric model for voice conversion is established. The model comprises three parts: a voiced-sound acoustic model, an unvoiced-sound acoustic model, and a prosody compensation model. The voiced acoustic model, based on linear prediction analysis and mel-cepstrum analysis, characterizes the resonance properties of the speaker's vocal tract. The unvoiced acoustic model, based on linear prediction analysis and noise-source analysis, reflects how the speaker produces unvoiced sounds. The prosody compensation model, based on statistical learning, describes the distributions of pitch, energy, and duration. On top of this hybrid parametric model, a voice conversion algorithm is proposed and applied to the conversion of Mandarin syllables. Experimental results show that modeling voiced/unvoiced sounds and prosody separately improves the clarity and intelligibility of the reconstructed speech, reduces the perceptual distance between the reconstructed and target speech, and gives the reconstructed speech the prosodic characteristics of the target speaker.

2.
Nonlinear dynamical characteristics of Mandarin speech and their application to noise reduction
The correlation dimension, minimum embedding dimension, and reconstructed phase portraits of Mandarin speech are analyzed. The results show that Mandarin speech exhibits chaotic characteristics. These nonlinear features can effectively distinguish voiced sounds, unvoiced sounds, and random noise in Mandarin, and can therefore be used for speech denoising. The principle and algorithm of chaotic speech denoising by local projection are introduced, and the algorithm is applied to several typical vowels and consonants with good denoising results.
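Phase-portrait reconstruction of the kind this abstract refers to is conventionally done by delay-coordinate embedding. The sketch below is a generic illustration of that standard technique, not code from the paper; the embedding dimension `m` and delay `tau` are arbitrary illustrative choices.

```python
def delay_embed(x, m, tau):
    """Return the m-dimensional delay vectors (x[i], x[i+tau], ..., x[i+(m-1)*tau])
    used to reconstruct a phase portrait from a scalar time series."""
    n = len(x) - (m - 1) * tau
    return [tuple(x[i + j * tau] for j in range(m)) for i in range(n)]

# Toy "signal"; m and tau are illustrative values, not from the paper.
signal = [0.0, 0.5, 0.9, 1.0, 0.7, 0.2, -0.3, -0.8, -1.0, -0.6]
vectors = delay_embed(signal, m=3, tau=2)
print(len(vectors))   # 6 vectors of dimension 3
print(vectors[0])     # (0.0, 0.9, 0.7)
```

In practice the delay is often chosen at the first minimum of the mutual information and the dimension by a false-nearest-neighbors test; quantities such as the correlation dimension are then estimated from these embedded vectors.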

3.
This paper presents an LPC pitch extraction algorithm that pre-segments voiced regions. Voiced regions of the speech signal are first identified from the difference between the linear prediction coefficients a_0 and a_1, and pitch is then extracted only from the voiced portions. For pitch extraction, the data rate is halved; an inverse filter with eight prediction coefficients is built by the LPC autocorrelation method; the fundamental frequency is extracted from the inverse-filtered error signal with the average magnitude difference function (AMDF); the result is linearly interpolated; and finally nonlinear smoothing is applied. The results are compared with a semi-automatic high-precision algorithm and with the simplified inverse filter tracking (SIFT) algorithm, showing that the proposed algorithm is accurate and effective for continuous speech with background noise below 40 dB. It avoids pitch computation in unvoiced regions and silent gaps, and its voiced/unvoiced decision is fairly accurate.
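The AMDF criterion mentioned in this abstract can be sketched in a few lines. This is a minimal generic implementation, not the paper's algorithm: the inverse filtering, interpolation, and smoothing stages are omitted, and the 60-400 Hz search band is an assumed default.

```python
import math

def amdf_pitch(x, fs, f_lo=60.0, f_hi=400.0):
    """Pick the lag minimizing the average magnitude difference function
    D(k) = mean(|x[n] - x[n+k]|) over a plausible pitch-lag range."""
    k_min, k_max = int(fs / f_hi), int(fs / f_lo)
    best_k, best_d = k_min, float("inf")
    for k in range(k_min, k_max + 1):
        n = len(x) - k
        d = sum(abs(x[i] - x[i + k]) for i in range(n)) / n
        if d < best_d:
            best_k, best_d = k, d
    return fs / best_k  # estimated fundamental frequency in Hz

fs = 8000
x = [math.sin(2 * math.pi * 100.0 * t / fs) for t in range(800)]
print(round(amdf_pitch(x, fs)))  # 100
```

For a perfectly periodic signal the AMDF dips to zero at the true period, which is why the minimum over the lag range is taken; on real speech the residual signal after inverse filtering gives sharper dips than the raw waveform.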

4.
肖东  莫福源  陈庚  马力 《应用声学》2016,35(1):77-83
Transition segments play a non-negligible role in speech clarity, intelligibility, and human auditory perception. In parametric speech coding, whether frames containing transitions are handled properly is key to whether the synthesized speech is clear and intelligible. Taking mixed-excitation linear prediction coding as a reference, this paper classifies speech frames into four categories (silence, unvoiced, voiced, transition) and processes each separately. Building on earlier work on low-bit-rate speech coding (1 kbps), eight transition-frame classification schemes are compared for their effect on the PESQ MOS of the synthesized speech. The analysis shows that different transition frames contribute differently to PESQ MOS: frames transitioning from unvoiced sound or silence to voiced sound contribute the most, and frames between voiced consonants and vowels should not be neglected either.

5.
This paper describes a practical Mandarin speech synthesis system. The system analyzes and synthesizes the speech signal with linear prediction and vector-quantizes the codebook, compressing the bit rate to 1200 bit/s. This greatly reduces speech storage requirements while keeping the synthesized speech reasonably clear. Built around the TMS320C10 digital signal processing chip, this portable synthesizer can synthesize a large amount of speech at low cost and low power consumption; it can be used online or run as a self-contained, battery-powered system. Practical use has shown the multi-function Mandarin speech synthesizer to be practical, widely applicable, and suitable for broad deployment.

6.
At low input signal-to-noise ratios, speech enhancement tends to destroy the weak components of unvoiced speech and thereby distort the envelope of the reconstructed signal. To overcome this, the paper proposes a new speech enhancement method. Following a perceptual model of speech, the method fits the critical bands of speech with an incomplete wavelet packet decomposition and processes voiced and unvoiced speech separately according to sub-band energy. For threshold computation, an adaptive wavelet packet thresholding algorithm is proposed that separates voiced from unvoiced speech based on sub-band signal energy. Simulation experiments, objective evaluation, and listening tests show that at low input SNR the algorithm reduces envelope distortion of the reconstructed signal more effectively than conventional algorithms and markedly improves output SNR without harming the clarity or naturalness of the speech. Combining the algorithm with energy spectral subtraction in a second enhancement pass further improves the quality of the denoised speech.

7.
A study of effective parameters for speaker identification
Speaker identification is an important application of speech recognition. In our system, not all LPC parameters are equally effective. Using statistical analysis, a variance-ratio test was applied to 63 parameters: 12 prediction coefficients, 12 partial correlation coefficients, 12 log-area-ratio coefficients, 12 cepstral coefficients, 12 correlation coefficients, short-time energy, short-time average zero-crossing rate, and pitch. With 97 speech samples of three vowels, collected from ten young men over half a year, as test material, the 15 parameters with the largest variance ratios were selected as recognition features, giving an identification rate of 89.19%; with sample refreshing, the rate reached 97.3%.
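The variance-ratio criterion used to rank parameters can be sketched as follows. This is only an illustration of the criterion (between-speaker variance over mean within-speaker variance, computed per feature), not the authors' procedure, and the data are hypothetical.

```python
def f_ratio(feature_by_speaker):
    """Between-speaker variance divided by mean within-speaker variance
    for one feature; larger values indicate better speaker discrimination."""
    means = [sum(v) / len(v) for v in feature_by_speaker]
    grand = sum(means) / len(means)
    between = sum((m - grand) ** 2 for m in means) / len(means)
    within = sum(
        sum((x - m) ** 2 for x in v) / len(v)
        for v, m in zip(feature_by_speaker, means)
    ) / len(feature_by_speaker)
    return between / within

# Hypothetical data for two speakers: feature A separates them well, B poorly.
feat_a = [[1.0, 1.1, 0.9], [3.0, 3.1, 2.9]]
feat_b = [[1.0, 3.0, 2.0], [1.1, 2.9, 2.1]]
print(f_ratio(feat_a) > f_ratio(feat_b))  # True
```

Ranking all candidate features by this ratio and keeping the top scorers is the selection step the abstract describes for its 63 candidate parameters.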

8.
程山英 《应用声学》2017,25(8):155-158
To meet the real-time requirements of traffic control and guidance systems, ease congestion, and reduce the frequency of traffic accidents, short-term traffic flow must be predicted. Existing methods predict it with K-nearest-neighbor nonparametric regression but do not spell out how the key factors in the prediction model affect the traffic flow, so their predictions are inaccurate and the prediction error is large. This paper therefore proposes a short-term traffic flow prediction method based on a fuzzy neural network. Starting from a sample series of historical short-term traffic flow data, the extracted correlation dimension is taken as the chaotic feature of the flow. The data are then clustered on this feature, so that samples within the same cluster are closer to one another than to samples in other clusters. A Gaussian process regression (GPR) model is built for prediction: the prediction series is first made stationary by differencing, the model is trained, and the GPR model then yields a variance estimate and a confidence interval for the predicted traffic flow, completing the short-term prediction. Experimental results show that the method accurately predicts the real-time state of the transportation system, effectively guides vehicles along optimal routes, reduces interference from natural and human factors in the prediction results, and gives traffic authorities a basis for controlling and managing road conditions.

9.
This paper proposes a time-frequency-energy representation of speech signals, together with an algorithm for computing it, for use in isolated-word speech recognition. The representation has two features: nonlinear time normalization based on the short-time energy gradient, which retains the transitional characteristics of the signal's spectrum while discarding its steady-state characteristics; and a small computational load, making it suitable for real-time application.

11.
The well-known speech production model is considered, where the speech signal is modeled as the output of an all-pole filter driven either by some white noise sequence (unvoiced speech) or by the sum of a periodic excitation and a noise sequence (voiced speech). Approximate maximum-likelihood (ML) estimation algorithms for the unvoiced case are well known. The ML estimator of the parameters is obtained for the voiced speech model. These parameters consist of the parameters of the periodic excitation (pitch parameters) and the parameters of the filter [linear prediction coefficient (LPC) parameters]. The results of the application of the algorithm on simulated and on real speech data are presented.
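The all-pole filter parameters in this model are classically estimated by the Levinson-Durbin recursion on the autocorrelation sequence. The sketch below is a textbook illustration of that standard technique, not the paper's ML estimator for the voiced case.

```python
def lpc(x, order):
    """Levinson-Durbin recursion on the autocorrelation sequence:
    returns a[1..order] such that x[n] ~ sum_j a[j] * x[n - j]."""
    r = [sum(x[i] * x[i + k] for i in range(len(x) - k))
         for k in range(order + 1)]
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:]

# Impulse response of the all-pole filter 1 / (1 - 0.9 z^-1):
x = [0.9 ** n for n in range(50)]
print(round(lpc(x, 1)[0], 3))  # 0.9
```

Driving the recovered all-pole filter with white noise reproduces the unvoiced branch of the production model; the voiced branch replaces the excitation with a pitch-periodic pulse train plus noise.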

12.
Monaural speech segregation has proven to be extremely challenging. While efforts in computational auditory scene analysis have led to considerable progress in voiced speech segregation, little attention has been given to unvoiced speech, which lacks harmonic structure, has weaker energy, and is hence more susceptible to interference. This study proposes a new approach to the problem of segregating unvoiced speech from nonspeech interference. The study first addresses the question of how much speech is unvoiced. The segregation process occurs in two stages: segmentation and grouping. In segmentation, the proposed model decomposes an input mixture into contiguous time-frequency segments by a multiscale analysis of event onsets and offsets. Grouping of unvoiced segments is based on Bayesian classification of acoustic-phonetic features. The proposed model for unvoiced speech segregation joins an existing model for voiced speech segregation to produce an overall system that can deal with both voiced and unvoiced speech. Systematic evaluation shows that the proposed system extracts a majority of unvoiced speech without including much interference, and it performs substantially better than spectral subtraction.

13.
李国锋  刘莹 《应用声学》1996,15(5):41-44
This paper describes a method of reconstructing breathy speech using the complex cepstrum. The speech characteristics of breathy phonation are first analyzed; fundamental-frequency information is then added to the complex cepstrum sequence to restore the speech to normal voicing. The method was applied to the vowel [a] and to real speech segments, with good results in both cases.

15.
A decomposition algorithm that uses a pitch-scaled harmonic filter was evaluated using synthetic signals and applied to mixed-source speech, spoken by three subjects, to separate the voiced and unvoiced parts. Pulsing of the noise component was observed in voiced frication, which was analyzed by complex demodulation of the signal envelope. The timing of the pulsation, represented by the phase of the anharmonic modulation coefficient, showed a step change during a vowel-fricative transition corresponding to the change in location of the noise source within the vocal tract. Analysis of fricatives [see text] demonstrated a relationship between steady-state phase and place, and f0 glides confirmed that the main cause was a place-dependent delay.

16.
It is well known that non-stationary wideband noise is the most difficult to remove in speech enhancement. In this paper a novel speech enhancement algorithm based on the dyadic wavelet transform and the simplified Karhunen-Loeve transform (KLT) is proposed to suppress non-stationary wideband noise. The noisy speech is decomposed into components in the wavelet space and the KLT-based vector space, and the components are processed and reconstructed, respectively, by distinguishing between voiced and unvoiced speech. The method requires neither noise whitening nor SNR pre-calculation. In order to evaluate the performance of this algorithm in more detail, a three-dimensional spectral distortion measure is introduced. Experiments and comparisons between different speech enhancement systems using this distortion measure show that the proposed method avoids the drawbacks of previous methods and shapes and suppresses non-stationary wideband noise more effectively for speech enhancement.

17.
A voiced speech signal can be expressed as a sum of sinusoidal components whose instantaneous frequency and amplitude vary continuously with time. When these parameters are determined from the input, their time-varying nature is a crucial error source for algorithms that assume stationarity within a local analysis segment. To overcome this problem, a new method is proposed, the local vector transform (LVT), which can determine instantaneous frequency and amplitude for nonstationary sinusoids. The method does not assume local stationarity. The effectiveness of LVT was examined in parameter determination for synthesized and naturally uttered speech signals. The instantaneous frequency of the first harmonic component was determined with an accuracy almost equal to that of the time-corrected instantaneous frequency method and higher than that of spectral peak-picking, autocorrelation, and cepstrum methods. The instantaneous amplitude was also determined accurately by LVT, while considerable errors remained in the other algorithms. The signal reconstructed from the parameters determined by LVT agreed well with the corresponding component of voiced speech. These results suggest that the method is effective for analyzing time-varying voiced speech signals.

18.
In this paper, an accurate pitch and voiced/unvoiced determination algorithm for speech analysis is described. The algorithm is called AMPEX (auditory model-based pitch extractor) and it performs a temporal analysis of the outputs emerging from a new auditory model. However, in spite of its use of an auditory model, AMPEX should not be regarded as a substitute for any psychophysical theory of human auditory pitch perception. What is mainly described is the design of a computationally efficient auditory model, the perceptually motivated determination of the model parameters, the conception of a reliable pitch extractor for speech analysis, and the elaboration of an experimental procedure for evaluating the performance of such a pitch extractor. In the course of the evaluation experiment several kinds of speech stimuli, including clean speech, bandpass-filtered speech, and noisy speech, were presented to three different pitch extractors. The experimental results clearly indicate that AMPEX outperforms the best algorithms available.
