共查询到18条相似文献,搜索用时 125 毫秒
1.
针对语音无线通信中带宽资源受限的问题,提出基于压缩采样的低速率语音编码算法。以基尼系数为指标,比较不同稀疏变换域下语音信号的稀疏性,分析常见重构算法对语音信号压缩采样观测信号的重构特性。对标准耳蜗滤波器——伽马啁啾滤波器组的参数进行研究,并以梯度投影稀疏重建(GPSR)算法重构语音信号。利用语音质量感知评估(PESQ)、信噪比和主观听觉测试,对编解码后的合成语音信号进行了质量评估。实验表明,基于压缩感知的语音编码器以4 kbps的低速率对语音进行编码时,PESQ得分可达到3.16,计算复杂度相对较低,可以用于实际的语音编码环境。 相似文献
2.
中远距离(>10 km)水声语音通信时,由于可利用带宽窄、复杂多变等不利因素对信息传输率的制约,语音编码速率应降到尽可能的低。利用水声信道传播时延大的特点,结合人耳听觉感知的特性,在深入研究混合激励线性预测编码(MELP)标准之后,提出一种语音编码速率可调节的变比特率语音编码算法。其平均码速率约600 bps,主观语音质量评估平均得分(PESQ MOS)约2.8分。对该编码算法性能进行了计算机仿真和海上实验验证。实验及仿真表明,在误码率不高于10-3时,本算法表现良好且稳定,合成语音清晰可懂,易于辨认说话人。 相似文献
3.
4.
一种高质量短时延的4kb/s的语音编码算法 总被引:1,自引:0,他引:1
4kb/s低速率语音编码是近年来语音信号处理研究的重要课题,也是国际电信联盟电信委员会(ITU_T)下一步标准化的目标。本文提出一种混合激励的编码算法,它对浊音采用谐波激励,对清音采用码激励,该编码能在短时延(分析帧长为10ms,采样率为8kHz)的情况下获得很好的合成语音质量。本文分析了该算法的原理,并给出了模拟结果。 相似文献
5.
6.
不同语音帧的激励信号复杂性不同,所以采用相同个数的脉冲作为激励信号并不合理。针对这一点,提出了个数可变脉冲线性预测编码算法。该算法不固定脉冲个数,而是根据激励信号的复杂度而确定。个数可变脉冲线性预测编码的目的是用尽量少的脉冲数来满足误差约束,这可以看作一个稀疏表示问题。进而,给出了具体的脉冲搜索算法以及个数可变脉冲线性预测编码方案。实验结果发现增加脉冲可以减少误差,但是前面搜索出的脉冲对误差的贡献要大于后搜索出的脉冲。与G.723.1和G.729比较发现,个数可变脉冲线性预测编码可以在约4.2 kbps的编码速率下获得优于G.723.1的合成语音,但略差于G.729。本文算法的编码时间较长,是下一步需要解决的问题。 相似文献
7.
耳语是一种特殊发音方式,将耳语转换为正常语音是提升耳语质量和可懂度的关键方法。为了充分利用语音的频域和时域相关性实现耳语转换,提出了使用深度卷积神经网络(Deep Convolutional Neural Networks,DCNN)将耳语转换为正常语音。它的卷积层用来提取连续帧语音谱包络之间的频域与时域的相关特征,而全连接层用来拟合耳语在卷积层提取的特征和对应正常语音之间的映射关系。实验结果表明与深度神经网络(Deep Neural Networks,DNN)模型相比,DCNN模型获得的转换后语音的梅尔倒谱失真度(Cepstral Distance,CD)降低了4.64%,而语音质量感知评价(Perceptual Evaluation of Speech Quality,PESQ)、短时客观可懂度(Short-Time Objective Intelligibility,STOI)与平均主观意见分(Mean Opinion Score,MOS)分别提高了5.41%,5.77%,9.68%。 相似文献
8.
9.
语音质量的客观评价可以代替昂贵的人工评分,但是目前客观指标的计算通常需要纯净的参考语音,这在许多实际声学系统中很难获得。为此提出了一种融合辅助目标学习和卷积循环网络(CRN)的非侵入式语音质量评价算法。为降低算法的复杂度,算法采用基于仿人耳听觉特性滤波器的Bark频率倒谱系数(BFCCs)作为CRN的输入。算法首先构建一个卷积神经网络(CNN)从BFCCs中提取帧级特征。然后,构建双向的长短记忆网络,在帧级特征中建模长期的时间依赖性和序列特征。最后,利用自注意力机制自适应地从帧级特征中筛选出有用信息,将其整合至话语层面的特征中,并将这些话语级特征映射为客观得分。为改善质量评测的有效性,算法采用多任务训练策略,引入语音激活检测(VAD)作为辅助学习目标。基于开源数据库的实验显示,与其他非侵入式算法相比,提出的算法和平均主观意见分(MOS)具有更好的相关性。而且,算法参数规模较小且对ITU-T P.808发布的带有主观MOS的失真语音数据库具有良好的泛化能力,接近语音质量感知评估(PESQ)指标的精度。 相似文献
10.
11.
Monaural speech segregation has proven to be extremely challenging. While efforts in computational auditory scene analysis have led to considerable progress in voiced speech segregation, little attention has been given to unvoiced speech, which lacks harmonic structure and has weaker energy, hence more susceptible to interference. This study proposes a new approach to the problem of segregating unvoiced speech from nonspeech interference. The study first addresses the question of how much speech is unvoiced. The segregation process occurs in two stages: Segmentation and grouping. In segmentation, the proposed model decomposes an input mixture into contiguous time-frequency segments by a multiscale analysis of event onsets and offsets. Grouping of unvoiced segments is based on Bayesian classification of acoustic-phonetic features. The proposed model for unvoiced speech segregation joins an existing model for voiced speech segregation to produce an overall system that can deal with both voiced and unvoiced speech. Systematic evaluation shows that the proposed system extracts a majority of unvoiced speech without including much interference, and it performs substantially better than spectral subtraction. 相似文献
12.
13.
14.
15.
为了克服低信噪比输入下,语音增强造成语音清音中的弱分量损失,造成重构信号包络失真的问题。论文提出了一种新的语音增强方法。该方法根据语音感知模型,采用不完全小波包分解拟合语音临界频带,并对语音按子带能量进行清浊音区分处理,在阈值计算上,提出了一种清浊音分离,基于子带信号能量的小波包自适应阈值算法。通过仿真实验,客观评测和听音测试表明,该算法在低信噪比输入时较传统算法,能够更加有效地减少重构信号包络失真,在不损伤语音清晰度和自然度的前提下,使输出信噪比明显提高。将该算法与能量谱减法结合,进行二次增强能进一步提高降噪输出的语音质量。 相似文献
16.
It is well known that the non-stationary wideband noise is the most difficult to be removed in speech enhancement. In this paper a novel speech enhancement algorithm based on the dyadic wavelet transform and the simplified Karhunen-Loeve transform (KLT) is proposed to suppress the non-stationary wideband noise. The noisy speech is decomposed into components by the wavelet space and KLT-based vector space, and the components are processed and reconstructed, respectively, by distinguishing between voiced speech and unvoiced speech. There are no requirements of noise whitening and SNR pre-calculating. In order to evaluate the performance of this algorithm in more detail, a three-dimensional spectral distortion measure is introduced. Experiments and comparison between different speech enhancement systems by means of the distortion measure show that the proposed method has no drawbacks existing in the previous methods and performs better shaping and suppressing of the non-stationary wideband noise for speech enhancement. 相似文献
17.
D Burshtein 《The Journal of the Acoustical Society of America》1992,91(3):1531-1537
The well-known speech production model is considered, where the speech signal is modeled as the output of an all-pole filter driven either by some white noise sequence (unvoiced speech) or by the sum of a periodic excitation and a noise sequence (voiced speech). Approximate maximum-likelihood (ML) estimation algorithms for the unvoiced case are well known. The ML estimator of the parameters is obtained for the voiced speech model. These parameters consist of the parameters of the periodic excitation (pitch parameters) and the parameters of the filter [linear prediction coefficient (LPC) parameters]. The results of the application of the algorithm on simulated and on real speech data are presented. 相似文献
18.
Bettens F Grenez F Schoentgen J 《The Journal of the Acoustical Society of America》2005,117(1):328-337
The article presents an analysis of vocal dysperiodicities in connected speech produced by dysphonic speakers. The processing is based on a comparison of the present speech fragment with future and past fragments. The size of the dysperiodicity estimate is zero for periodic speech signals. A feeble increase of the vocal dysperiodicity is guaranteed to produce a feeble increase of the estimate. No spurious noise boosting occurs owing to cycle insertion and omission errors, or phonetic segment boundary artifacts. Additional objectives of the study have been investigating whether deviations from periodicity are larger or more commonplace in connected speech than in sustained vowels, and whether sentences that comprise frequent voice onsets and offsets are noisier than sentences that comprise few. The corpora contain sustained vowels as well as grammatically- and phonetically matched sentences. An acoustic marker that correlates with the perceived degree of hoarseness summarizes the size of the dysperiodicities. The marker values for sustained vowels have been highly correlated with those for connected speech, and the marker values for sentences that comprise few voiced/unvoiced transients have been highly correlated with the marker values for sentences that comprise many. 相似文献