1.

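武朋辉, 杨百龙, 时磊. 《应用声学》, 2015, 34(1): 17-23

To address the limited bandwidth available in wireless voice communication, a low-bit-rate speech coding algorithm based on compressed sampling is proposed. Using the Gini index as the metric, the sparsity of speech signals is compared across different sparsifying transform domains, and the behavior of common reconstruction algorithms on compressively sampled speech measurements is analyzed. The parameters of the gammachirp filter bank, a standard cochlear filter model, are studied, and the speech signal is reconstructed with the gradient projection for sparse reconstruction (GPSR) algorithm. The quality of the synthesized speech after encoding and decoding is evaluated using Perceptual Evaluation of Speech Quality (PESQ), signal-to-noise ratio, and subjective listening tests. Experiments show that, when coding speech at a low rate of 4 kbps, the compressed-sensing-based codec achieves a PESQ score of 3.16 with relatively low computational complexity, making it usable in practical speech coding settings.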
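
The Gini index mentioned in this abstract can be illustrated with a minimal sketch. It assumes the common definition of the Gini index as a sparsity measure over sorted coefficient magnitudes; the abstract does not give the exact formula the authors used, and the function name `gini_index` and the sample vectors are hypothetical.

```python
def gini_index(coeffs):
    """Gini index of a coefficient vector (assumed definition, not taken from the paper).

    Returns 0.0 when all magnitudes are equal (least sparse) and approaches 1.0
    as the energy concentrates in a few coefficients (most sparse).
    """
    mags = sorted(abs(c) for c in coeffs)  # ascending magnitudes
    n = len(mags)
    l1 = sum(mags)
    if n == 0 or l1 == 0:
        raise ValueError("need at least one non-zero coefficient")
    total = 0.0
    for k, m in enumerate(mags, start=1):
        # Smaller coefficients receive larger weights.
        total += (m / l1) * ((n - k + 0.5) / n)
    return 1.0 - 2.0 * total


# Hypothetical transform-domain coefficients for two candidate bases.
dense = [1.0, 1.0, 1.0, 1.0]   # energy spread evenly
sparse = [0.0, 0.0, 0.0, 1.0]  # energy in a single coefficient

print(gini_index(dense), gini_index(sparse))
```

Under this definition, a transform domain whose coefficients give a higher index would be considered the sparser representation of the speech frame.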

2.

Research on a low-bit-rate speech coding algorithm for underwater acoustic communication

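肖东, 莫福源, 陈庚, 郭圣明, 马力. 《声学学报》, 2013, 38(5): 589-596

In medium- and long-range (>10 km) underwater acoustic voice communication, the narrow usable bandwidth and the complex, time-varying channel constrain the information rate, so the speech coding rate should be as low as possible. Exploiting the large propagation delay of the underwater acoustic channel and the perceptual characteristics of human hearing, and after an in-depth study of the mixed excitation linear prediction (MELP) standard, a variable-bit-rate speech coding algorithm with an adjustable coding rate is proposed. Its average bit rate is about 600 bps, and its mean PESQ MOS score is about 2.8. The algorithm was verified by computer simulation and sea trials. The experiments and simulations show that, for bit error rates no higher than 10^-3, the algorithm performs well and stably, and the synthesized speech is clear, intelligible, and allows the speaker to be recognized easily.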

3.

Quantum speech multi-band excitation algorithm

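梁彦霞, 聂敏, 刘欣, 张美玲, 姜静. 《物理学报》, 2014, (12): 21-25

The classical multi-band excitation (MBE) speech algorithm is applied to the quantum domain; a correspondence between classical and quantum information is proposed, together with the corresponding information measures. The encoding and decoding algorithms of quantum speech MBE are simulated in C. The results show that the waveforms before and after speech analysis and synthesis are similar, and an objective test with PESQ software gives a MOS score of 3.337.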

4.

A high-quality, low-delay 4 kb/s speech coding algorithm (total citations: 1; self-citations: 0; citations by others: 1)

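秦龙, 成立新. 《声学学报》, 2002, (4)

Speech coding at the low rate of 4 kb/s has been an important topic in speech signal processing research in recent years, and is also the next standardization target of the ITU Telecommunication Standardization Sector (ITU-T). This paper proposes a mixed-excitation coding algorithm that uses harmonic excitation for voiced speech and code excitation for unvoiced speech. The coder achieves good synthesized speech quality at short delay (analysis frame length of 10 ms, sampling rate of 8 kHz). The principle of the algorithm is analyzed and simulation results are given.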
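
As a quick check of the figures quoted in this abstract, the frame length and sampling rate fix the number of samples per frame, and the bit rate fixes the average bit budget per frame. The symbols $f_s$, $T_f$, $R$, $N$ and $B$ are introduced here for illustration only; the abstract does not say how the bits are allocated among the coder's parameters.

```latex
\begin{align*}
N &= f_s \, T_f = 8000\ \text{Hz} \times 0.010\ \text{s} = 80\ \text{samples per frame},\\
B &= R \, T_f = 4000\ \text{bit/s} \times 0.010\ \text{s} = 40\ \text{bits per frame},\\
B / N &= 0.5\ \text{bit per sample on average}.
\end{align*}
```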

5.

Improved LVAMDF and a multi-factor pitch detection algorithm

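薛帅强, 陈波, 陈菲. 《应用声学》, 2016, 24(4): 253-256

Building on the classification of speech into silence, unvoiced, and voiced segments, and addressing the random distribution of segments with pronounced periodicity, an improved length-variable average magnitude difference function (LVAMDF) and a multi-factor pitch detection algorithm are proposed. The algorithm clusters the speech signal into segments with pronounced periodicity and segments without, and obtains the pitch period of the former. For the few cases where the pitch period is estimated at double or half the true frequency, identification and correction methods are proposed, with a very high identification and correction rate. On a large amount of real speech, the algorithm accurately detects the pitch-period endpoints of segments with pronounced periodicity, with essentially no frequency-doubling or frequency-halving errors; it is also compared with the AMDF and ACF algorithms.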
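
For orientation, the baseline AMDF that this abstract compares against can be sketched as follows. This is the plain fixed-length AMDF, not the improved LVAMDF of the paper; the function names, the synthetic 100 Hz test tone, and the lag limits are assumptions chosen for illustration.

```python
import math


def amdf(frame, lag):
    """Average magnitude difference between the frame and itself shifted by `lag` samples."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def estimate_pitch_period(frame, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] that minimizes the AMDF."""
    return min(range(min_lag, max_lag + 1), key=lambda lag: amdf(frame, lag))


fs = 8000    # sampling rate in Hz (assumed)
f0 = 100.0   # true fundamental of the synthetic test tone (assumed)
frame = [math.sin(2 * math.pi * f0 * n / fs) for n in range(400)]

# The search range stops below two pitch periods (160 samples); a wider range
# would also contain the near-zero AMDF valley at the doubled period.
period = estimate_pitch_period(frame, min_lag=20, max_lag=140)
print(period, fs / period)
```

The restricted lag range is what keeps this toy example free of octave errors. On real speech the AMDF valleys at multiples of the pitch period are a typical source of the doubling and halving errors that the abstract sets out to identify and correct.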

6.

Research on linear predictive coding with a variable number of pulses

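马震. 《应用声学》, 2017, 36(1): 48-53

The excitation signals of different speech frames differ in complexity, so using the same number of pulses as the excitation for every frame is not reasonable. To address this, a linear predictive coding algorithm with a variable number of pulses is proposed. Rather than fixing the number of pulses, the algorithm determines it from the complexity of the excitation signal. The goal is to satisfy an error constraint with as few pulses as possible, which can be viewed as a sparse representation problem. A concrete pulse search algorithm and a coding scheme are then given. Experiments show that adding pulses reduces the error, but pulses found earlier in the search contribute more to the error reduction than those found later. Compared with G.723.1 and G.729, the proposed coder produces synthesized speech better than G.723.1, though slightly worse than G.729, at a coding rate of about 4.2 kbps. The algorithm's long encoding time is a problem to be solved in future work.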
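
The idea of adding pulses until an error constraint is met can be sketched with a generic greedy (matching-pursuit style) search. This is not the search procedure of the paper, which the abstract does not describe; the function `search_pulses`, the short impulse response `h`, the error threshold, and the test frame are all hypothetical.

```python
def search_pulses(target, h, max_error_ratio=0.01, max_pulses=20):
    """Greedily place pulses until the residual energy falls below a fraction of the target energy.

    target: target signal for one frame
    h:      impulse response of the synthesis filter (hypothetical here)
    Returns (pulses, residual), where pulses is a list of (position, amplitude).
    """
    n = len(target)
    residual = list(target)
    target_energy = sum(x * x for x in target)
    pulses = []
    while (len(pulses) < max_pulses
           and sum(r * r for r in residual) > max_error_ratio * target_energy):
        best = None  # (score, position, amplitude)
        for pos in range(n):
            taps = h[: n - pos]  # the response is truncated at the frame end
            corr = sum(residual[pos + k] * hk for k, hk in enumerate(taps))
            energy = sum(hk * hk for hk in taps)
            if energy == 0.0:
                continue
            score = corr * corr / energy  # error reduction achieved by this pulse
            if best is None or score > best[0]:
                best = (score, pos, corr / energy)
        if best is None or best[0] == 0.0:
            break  # no pulse can reduce the error any further
        _, pos, amp = best
        for k, hk in enumerate(h[: n - pos]):
            residual[pos + k] -= amp * hk
        pulses.append((pos, amp))
    return pulses, residual


# Hypothetical frame: two filtered pulses, at positions 5 and 20.
h = [1.0, 0.5, 0.25]
target = [0.0] * 32
for pos, amp in [(5, 2.0), (20, -1.0)]:
    for k, hk in enumerate(h):
        target[pos + k] += amp * hk

pulses, residual = search_pulses(target, h)
print(pulses)
```

Because each step picks the pulse with the largest error reduction, earlier pulses remove more error than later ones in this sketch, which matches the behavior reported in the abstract. A frame with a more complex excitation would simply run more iterations before reaching the threshold.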

7.

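Converting whispered speech to normal speech using deep convolutional neural networks (total citations: 1; self-citations: 0; citations by others: 1)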

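连海伦, 周健, 胡雨婷, 郑文明. 《声学学报》, 2020, 45(1): 137-144

Whispering is a special mode of phonation, and converting whispered speech to normal speech is a key way to improve its quality and intelligibility. To fully exploit the frequency- and time-domain correlations of speech for whisper conversion, a deep convolutional neural network (DCNN) is proposed for converting whispers to normal speech. Its convolutional layers extract frequency- and time-domain correlation features among the spectral envelopes of consecutive frames, while its fully connected layers fit the mapping between the whisper features extracted by the convolutional layers and the corresponding normal speech. Experimental results show that, compared with a deep neural network (DNN) model, the DCNN model reduces the mel-cepstral distortion (Cepstral Distance, CD) of the converted speech by 4.64%, and improves Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and Mean Opinion Score (MOS) by 5.41%, 5.77%, and 9.68%, respectively.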

8.

Speech enhancement with an auditory model simulated by a frame-synchronous hybrid wavelet packet transform

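朱学文, 杨道淳, 王炜, 牟峰, 徐柏龄. 《声学学报》, 2003, (1)

The frame-synchronous hybrid wavelet packet analysis method is first introduced. It combines the variable time-frequency window of wavelet packets with the frame-based processing of the STFT. It both meets the frame-length requirement of speech signal processing, allowing real-time processing, and yields the best decomposition of the signal in the frequency domain; it is a fast wavelet packet algorithm analogous to the FFT. On this basis, the method is used to simulate an auditory model and applied to speech enhancement. Experiments show that good noise removal and perceptual quality are obtained even at a low signal-to-noise ratio of -5 dB. The method can also be applied to speech coding, synthesis, and recognition.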

9.

A non-intrusive speech quality assessment algorithm combining auxiliary target learning and a convolutional recurrent network

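唐闺臣, 梁瑞宇, 孔凡留, 谢跃, 鞠梦洁. 《声学学报》, 2022, 47(5): 692-702

Objective evaluation of speech quality can replace costly human scoring, but computing current objective metrics usually requires a clean reference signal, which is hard to obtain in many practical acoustic systems. A non-intrusive speech quality assessment algorithm combining auxiliary target learning and a convolutional recurrent network (CRN) is therefore proposed. To reduce complexity, the algorithm uses Bark-frequency cepstral coefficients (BFCCs), based on filters that mimic human auditory characteristics, as the CRN input. A convolutional neural network (CNN) is first built to extract frame-level features from the BFCCs. A bidirectional long short-term memory network is then built to model long-term temporal dependencies and sequence characteristics in the frame-level features. Finally, a self-attention mechanism adaptively selects useful information from the frame-level features, aggregates it into utterance-level features, and maps these to an objective score. To improve the effectiveness of quality assessment, the algorithm adopts a multi-task training strategy that introduces voice activity detection (VAD) as an auxiliary learning target. Experiments on open-source databases show that the proposed algorithm correlates better with the mean opinion score (MOS) than other non-intrusive algorithms. Moreover, the model is small and generalizes well to the distorted-speech database with subjective MOS released with ITU-T P.808, approaching the accuracy of the Perceptual Evaluation of Speech Quality (PESQ) metric.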

10.

Speech coding technology and its hardware implementation

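张鑫, 孙峰, 邓代竹. 《光学与光电技术》, 2009, 7(5): 12-14

Compression coding is one of the key technologies of wireless voice communication. The basic concepts and classification of speech coding are introduced, and the AMBE multi-band excitation compression coding algorithm is selected. With a microcontroller controlling a dedicated speech compression DSP chip, a system solution suited to low-rate, real-time wireless voice and data communication is proposed. The hardware and software designs are completed, achieving synchronous atmospheric transmission of voice and data at low rates. Test results show that speech output of good quality is still obtained at speech coding rates of 2.4 kbps and below.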

11.

Segregation of unvoiced speech from nonspeech interference

Hu G, Wang D. 《The Journal of the Acoustical Society of America》, 2008, 124(2): 1306-1319

Monaural speech segregation has proven to be extremely challenging. While efforts in computational auditory scene analysis have led to considerable progress in voiced speech segregation, little attention has been given to unvoiced speech, which lacks harmonic structure and has weaker energy, and is hence more susceptible to interference. This study proposes a new approach to the problem of segregating unvoiced speech from nonspeech interference. The study first addresses the question of how much speech is unvoiced. The segregation process occurs in two stages: segmentation and grouping. In segmentation, the proposed model decomposes an input mixture into contiguous time-frequency segments by a multiscale analysis of event onsets and offsets. Grouping of unvoiced segments is based on Bayesian classification of acoustic-phonetic features. The proposed model for unvoiced speech segregation joins an existing model for voiced speech segregation to produce an overall system that can deal with both voiced and unvoiced speech. Systematic evaluation shows that the proposed system extracts a majority of unvoiced speech without including much interference, and it performs substantially better than spectral subtraction.

12.

Reconstruction of breathy (whispered) phonation using the cepstrum method

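李国锋, 刘莹. 《应用声学》, 1996, 15(5): 41-44

This paper presents a method for reconstructing breathy (whispered) phonation using the complex cepstrum. The speech characteristics of breathy phonation are first analyzed; fundamental-frequency features are then added to the complex cepstrum sequence to restore normal speech. The vowel [a] and real speech segments were processed, with good results in both cases.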

14.

A study of the characteristics of Chinese speech using a short-time nonlinear prediction method

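徐歆, 胡水清, 陶超, 杜功焕. 《应用声学》, 2003, 22(5): 36-40, 44

Using the short-time nonlinear prediction method as improved by Short [8-10], Chinese syllables and phrases spoken at a normal rate are analyzed. The study reveals a difference in short-time nonlinear predictability between voiced and unvoiced sounds in Chinese speech, and finds that this difference can still be distinguished by the short-time nonlinear prediction method even under strong background noise. These findings offer a possible means of segmenting voiced and unvoiced sounds.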

15.

A new wavelet packet adaptive-threshold speech denoising algorithm

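田玉静, 左红伟, 董玉民, 王超. 《应用声学》, 2011, 30(1): 72-80

To overcome the loss of weak components in unvoiced speech, and the resulting envelope distortion of the reconstructed signal, caused by speech enhancement at low input signal-to-noise ratios, a new speech enhancement method is proposed. Following a speech perception model, the method uses an incomplete wavelet packet decomposition to approximate the critical bands of speech and processes voiced and unvoiced speech separately according to subband energy. For threshold computation, a wavelet packet adaptive-threshold algorithm is proposed that separates voiced and unvoiced speech and is based on subband signal energy. Simulation experiments, objective evaluation, and listening tests show that, at low input SNR, the algorithm reduces envelope distortion of the reconstructed signal more effectively than traditional algorithms and clearly raises the output SNR without harming speech clarity or naturalness. Combining the algorithm with power spectral subtraction for a second enhancement stage further improves the quality of the denoised speech.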

16.

An approach based on simplified KLT and wavelet transform for enhancing speech degraded by non-stationary wideband noise

Hong Wei Lou, Guang Rui Hu. 《Journal of sound and vibration》, 2003, 268(4): 717-729

It is well known that non-stationary wideband noise is the most difficult kind of noise to remove in speech enhancement. In this paper a novel speech enhancement algorithm based on the dyadic wavelet transform and the simplified Karhunen-Loeve transform (KLT) is proposed to suppress non-stationary wideband noise. The noisy speech is decomposed into components by the wavelet space and KLT-based vector space, and the components are processed and reconstructed, respectively, by distinguishing between voiced speech and unvoiced speech. Noise whitening and SNR pre-calculation are not required. In order to evaluate the performance of this algorithm in more detail, a three-dimensional spectral distortion measure is introduced. Experiments and comparison between different speech enhancement systems by means of the distortion measure show that the proposed method avoids the drawbacks of previous methods and performs better shaping and suppression of non-stationary wideband noise for speech enhancement.

17.

Joint modeling and maximum-likelihood estimation of pitch and linear prediction coefficient parameters.

D Burshtein. 《The Journal of the Acoustical Society of America》, 1992, 91(3): 1531-1537

The well-known speech production model is considered, where the speech signal is modeled as the output of an all-pole filter driven either by some white noise sequence (unvoiced speech) or by the sum of a periodic excitation and a noise sequence (voiced speech). Approximate maximum-likelihood (ML) estimation algorithms for the unvoiced case are well known. The ML estimator of the parameters is obtained for the voiced speech model. These parameters consist of the parameters of the periodic excitation (pitch parameters) and the parameters of the filter [linear prediction coefficient (LPC) parameters]. The results of the application of the algorithm on simulated and on real speech data are presented.
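
The production model described in this abstract can be written out as a sketch. The notation is assumed rather than taken from the paper: $p$ is the filter order, $a_k$ are the LPC parameters, $w(n)$ is a white noise sequence, and $e_T(n)$ is a periodic excitation with pitch period $T$; the sign convention and the absence of an explicit gain term are also choices made here.

```latex
\begin{align*}
s(n) &= \sum_{k=1}^{p} a_k \, s(n-k) + u(n),\\
u(n) &=
\begin{cases}
w(n), & \text{unvoiced speech},\\
e_T(n) + w(n), \quad e_T(n+T) = e_T(n), & \text{voiced speech}.
\end{cases}
\end{align*}
```

In this notation, the voiced-case estimator discussed in the abstract would estimate the pitch parameters (those describing $e_T$) jointly with the coefficients $a_k$.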

18.

Estimation of vocal dysperiodicities in disordered connected speech by means of distant-sample bidirectional linear predictive analysis

Bettens F, Grenez F, Schoentgen J. 《The Journal of the Acoustical Society of America》, 2005, 117(1): 328-337

The article presents an analysis of vocal dysperiodicities in connected speech produced by dysphonic speakers. The processing is based on a comparison of the present speech fragment with future and past fragments. The size of the dysperiodicity estimate is zero for periodic speech signals. A feeble increase of the vocal dysperiodicity is guaranteed to produce a feeble increase of the estimate. No spurious noise boosting occurs owing to cycle insertion and omission errors, or phonetic segment boundary artifacts. Additional objectives of the study have been investigating whether deviations from periodicity are larger or more commonplace in connected speech than in sustained vowels, and whether sentences that comprise frequent voice onsets and offsets are noisier than sentences that comprise few. The corpora contain sustained vowels as well as grammatically and phonetically matched sentences. An acoustic marker that correlates with the perceived degree of hoarseness summarizes the size of the dysperiodicities. The marker values for sustained vowels have been highly correlated with those for connected speech, and the marker values for sentences that comprise few voiced/unvoiced transients have been highly correlated with the marker values for sentences that comprise many.