1.

张志浩, 王坤侠 《应用声学》 (Journal of Applied Acoustics) 2022, 41(5): 843-850

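Speech emotion recognition plays an important role in human-computer interaction and affective computing, and a wide range of research methods have been proposed. Recent studies have used convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to extract spatial and temporal features from log-Mel spectrograms, with some success. However, both CNNs and LSTMs produce redundant features during feature extraction, which degrades speech emotion recognition performance. To address this problem, this paper proposes a convolutional-recurrent neural network model based on a spatiotemporal attention mechanism. The log-Mel spectrogram and its first- and second-order differences are used as input; spatial attention is added to the CNN that extracts spatial features, and temporal attention to the LSTM that extracts temporal features, so that the networks better capture the spatial and temporal features of the log-Mel spectrogram that effectively characterize emotion. On the Emo-DB and IEMOCAP speech datasets, the model achieves weighted accuracies of 86.8% and 69.4% and unweighted accuracies of 84.7% and 65.5%, respectively, outperforming most current state-of-the-art methods.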
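As a rough illustration of the three-part input described above (not the authors' code), the log-Mel spectrogram can be stacked with its first- and second-order differences as channels. This sketch assumes the common regression-style delta formula with window width N = 2 and replicated edge frames; the abstract does not state the exact difference definition used in the paper.

```python
import numpy as np


def delta(features: np.ndarray, width: int = 2) -> np.ndarray:
    """Regression-style difference along the last (time) axis.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2),
    with the first/last frame replicated so the output keeps the input shape.
    """
    num_frames = features.shape[-1]
    pad = [(0, 0)] * (features.ndim - 1) + [(width, width)]
    padded = np.pad(features, pad, mode="edge")
    denom = 2.0 * sum(n * n for n in range(1, width + 1))
    out = np.zeros(features.shape, dtype=float)
    for n in range(1, width + 1):
        ahead = padded[..., width + n : width + n + num_frames]
        behind = padded[..., width - n : width - n + num_frames]
        out += n * (ahead - behind)
    return out / denom


def three_channel_input(log_mel: np.ndarray) -> np.ndarray:
    """Stack static log-Mel, first-order and second-order differences as channels."""
    d1 = delta(log_mel)
    d2 = delta(d1)
    return np.stack([log_mel, d1, d2], axis=0)  # shape: (3, n_mels, n_frames)
```

The log-Mel spectrogram itself would normally come from a feature-extraction library; the snippet only covers the differencing and stacking step that feeds the CNN.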

2.

Speech emotion recognition method based on variational mode decomposition

王玮蔚, 张秀再 《应用声学》 (Journal of Applied Acoustics) 2019, 38(2): 237-244

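To address the poor performance of traditional speech emotion feature parameters in emotion classification, this paper proposes a speech emotion recognition method based on variational mode decomposition (VMD). The emotional speech signal is first decomposed by VMD into intrinsic mode functions (IMFs); the selected dominant IMFs are then recombined, after which Mel-frequency cepstral coefficients (MFCCs) and the Hilbert marginal spectrum of each IMF are extracted. To validate the proposed features, experiments were carried out on two speech databases (EMODB and RAVDESS), using an extreme learning machine for emotion classification after feature extraction. The results show that the proposed features achieve better recognition performance than speech emotion features based on empirical mode decomposition (EMD) and ensemble empirical mode decomposition (EEMD), confirming the practicality of the method.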
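The Hilbert marginal spectrum mentioned above accumulates a component's instantaneous amplitude over time at each instantaneous frequency. A minimal sketch for one IMF, assuming the IMF is already available (the VMD step is not shown) and using an arbitrary bin count, could look like this:

```python
import numpy as np


def analytic_signal(x: np.ndarray) -> np.ndarray:
    """FFT-based analytic signal (the same construction scipy.signal.hilbert uses)."""
    n = x.size
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h)


def hilbert_marginal_spectrum(imf: np.ndarray, fs: float, n_bins: int = 50):
    """Sum instantaneous amplitude over time into instantaneous-frequency bins."""
    z = analytic_signal(imf)
    amplitude = np.abs(z)
    phase = np.unwrap(np.angle(z))
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)  # Hz, one value per sample step
    edges = np.linspace(0.0, fs / 2.0, n_bins + 1)
    marginal, _ = np.histogram(inst_freq, bins=edges, weights=amplitude[1:])
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, marginal
```

For a pure 52 Hz tone sampled at 1 kHz, all of the amplitude lands in the 50 to 60 Hz bin; for real IMFs the mass spreads over the frequencies the component visits.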

3.

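Speech emotion recognition based on pitch-parameter normalization and statistical distribution model distance (total citations: 17; self-citations: 0; citations by others: 17)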

王治平, 赵力, 邹采荣 《声学学报》 (Acta Acustica) 2006, 31(1): 28-34

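An improved Parzen-window method is proposed for estimating the pitch probability density, in which an adaptive window is determined by the frequency resolution of the pitch extraction; this keeps the pitch statistical distribution model at high resolution in the low-frequency range while smoothing it in the high-frequency range. A gender-discrimination algorithm based on the pitch distribution patterns of different genders is proposed, reaching a recognition rate of 98% for long sentences. By analyzing gender differences in the pitch mean, pitch variance and pitch statistical distribution model, the pitch parameters are normalized according to gender. The normalized pitch mean and pitch variance, together with the distance between pitch statistical distribution models, are introduced as emotion feature parameters. Finally, the K-nearest-neighbor method is used to recognize emotions in a Chinese emotional speech corpus. Parameters extracted with the conventional method yield a recognition rate of 73.8%, whereas the gender-normalized pitch parameters and pitch distribution distance raise it to 81%.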
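One way to read "an adaptive window determined by the frequency resolution of the pitch extraction" is a Parzen estimate whose kernel width grows with pitch. The sketch below assumes a period-based pitch tracker that quantizes the period to whole samples, so its resolution near frequency f is roughly f²/fs; the `scale` factor and the Gaussian kernel are illustrative assumptions, not values from the paper.

```python
import numpy as np


def adaptive_parzen_pdf(pitch_hz: np.ndarray, grid_hz: np.ndarray,
                        fs: float, scale: float = 1.0) -> np.ndarray:
    """Gaussian Parzen estimate with a per-sample window width.

    Assumption: with the period quantized to whole samples, the frequency step
    near f is about f**2 / fs, so the window widens as pitch increases.
    """
    widths = scale * pitch_hz ** 2 / fs  # narrow at low F0, wide at high F0
    u = (grid_hz[:, None] - pitch_hz[None, :]) / widths[None, :]
    kernels = np.exp(-0.5 * u ** 2) / (np.sqrt(2.0 * np.pi) * widths[None, :])
    return kernels.mean(axis=1)  # average of unit-area kernels integrates to 1
```

Because every kernel has unit area, the estimate remains a valid density whatever widths are chosen; only the trade-off between low-frequency detail and high-frequency smoothness changes.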

4.

Speech emotion recognition algorithm imitating the selective attention mechanism

梁瑞宇, 赵力, 陶华伟, 王青云, 邹采荣 《声学学报》 (Acta Acustica) 2016, 41(4): 537-544

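Selecting effective features has always been the key to speech emotion recognition algorithms. To address the selection and construction of speech emotion features, a speech emotion recognition algorithm imitating the selective attention mechanism is proposed. Taking the time-frequency characteristics of speech into account, the algorithm first computes the spectrogram of the speech signal. Next, imitating selective attention, it computes color, orientation and intensity feature maps of the spectrogram, which are normalized to form a feature matrix. The feature matrix is then rearranged and reduced in dimension with PCA to form the emotion recognition feature vector. Finally, an improved support vector machine classifier performs speech emotion recognition. Recognition experiments on five emotions (anger, fear, happiness, sadness and surprise) show that the selective-attention-based method achieves good recognition performance, with an average recognition rate of 85.44%. Compared with prosodic and voice-quality features, the recognition rate improves by at least 10%; compared with other spectrogram features, it improves by about 7%.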

5.

Speech emotion recognition with a cascaded generative/discriminative hybrid model

黄永明, 章国宝, 董飞, 李悦 《声学学报》 (Acta Acustica) 2013, 38(2): 231-240

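A speech emotion recognition method based on a cascaded generative/discriminative hybrid model is proposed. First, 63-dimensional utterance-level features are extracted, the Fisher criterion is used to select the 12 best of them, and a cascaded generative model built on wavelet neural networks (WNN) performs speech emotion recognition. Then, 69-dimensional frame-level features are extracted, sequential forward selection (SFS) picks the 8 features to be used, and a Gaussian mixture model (GMM) produces multi-dimensional probability outputs, from which a cascaded generative/discriminative hybrid model is built for speech emotion recognition. The experimental results show that: (1) the cascaded generative/discriminative hybrid model achieves higher recognition rates than WNN, GMM, HMM (hidden Markov model) or SVM (support vector machine) alone; (2) the cascaded generative/discriminative hybrid model achieves a higher recognition rate than the WNN-based cascaded generative model; (3) the serially fused D-dimensional GMM-MAP/SVM model (MAP: maximum a posteriori) with M = 13 is the best cascaded generative/discriminative hybrid model, reaching a highest recognition rate of 85.1%.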
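The abstract uses the Fisher criterion to keep 12 of the 63 utterance-level features. A common per-feature form of that criterion ranks each feature by between-class scatter over within-class scatter; the sketch below assumes this multi-class form, which may differ from the paper's exact definition, and does not cover the WNN or GMM stages.

```python
import numpy as np


def fisher_scores(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Per-feature Fisher ratio: between-class scatter / within-class scatter."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        within += len(Xc) * Xc.var(axis=0)
    return between / (within + 1e-12)  # small constant guards constant features


def select_top_k(X: np.ndarray, y: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k features with the largest Fisher ratio, best first."""
    return np.argsort(fisher_scores(X, y))[::-1][:k]
```

For example, `select_top_k(X, y, 12)` on a 63-column utterance-level feature matrix would return the 12 column indices to keep.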

6.

Depression recognition from speech based on deep learning

吴情, 胡维平, 陈丹丹, 肖婷 《应用声学》 (Journal of Applied Acoustics) 2022, 41(5): 837-842

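The number of people with depression is increasing worldwide, and the diagnosis and treatment of depression face a shortage of doctors. To address this problem, a feature-fusion model combining a CNN with an attention-based BLSTM is proposed. The work examines both feature selection and network architecture: several classic speech features are compared, showing that Mel-frequency cepstral coefficients (MFCCs) give the best depression classification, and the MFCCs are then fed into a CNN and into a BLSTM with an attention mechanism to perform depression classification. In experiments on the DAIC-WOZ dataset, the proposed method achieves a speech depression classification precision of 78.06% and an F1 score of 74.68%. Keywords: depression recognition; speech analysis; classification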
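The abstract does not say which attention variant is combined with the BLSTM. A common choice is additive (tanh) attention pooling, which turns the frame-wise BLSTM outputs into one utterance vector; in this sketch `W` and `v` stand in for hypothetical learned parameters and are simply passed in.

```python
import numpy as np


def attention_pool(hidden: np.ndarray, W: np.ndarray, v: np.ndarray):
    """Additive attention over time.

    hidden: (T, d) BLSTM outputs; W: (d, a) projection; v: (a,) scoring vector.
    Returns the weighted sum of hidden states (d,) and the weights (T,).
    """
    scores = np.tanh(hidden @ W) @ v  # (T,) relevance score of each frame
    scores = scores - scores.max()    # numerical stability for the softmax
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ hidden, weights
```

The pooled vector could then be concatenated with the CNN branch's output before the classifier; how the paper fuses the two branches is not specified in the abstract.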

7.

A neurocognitive model of recognition and pitch segregation

McLachlan N 《The Journal of the Acoustical Society of America》 2011, 130(5): 2845-2854

This paper describes a neurocognitive model of pitch segregation in which it is proposed that recognition mechanisms initiate early in auditory processing pathways so that long-term memory templates may be employed to segregate and integrate auditory features. In this model, neural representations of pitch height are primed by the location and pattern of excitation across auditory filter channels in relation to long-term memory templates for common stimuli. Since waveform-driven pitch mechanisms may produce information at multiple frequencies for tonal stimuli, pitch priming was assumed to include competitive inhibition that would allow only one pitch estimation at any time. Consequently, concurrent pitch information must be relayed to short-term memory via a parallel mechanism that employs pitch information contained in the long-term memory template of the chord. Pure tones, harmonic complexes and two-pitch chords of harmonic complexes were correctly classified by the correlation of templates comprising auditory nerve excitation and off-frequency inhibition with the excitation patterns of stimuli. The model then replicated behavioral data for pitch matching of concurrent vowels. Comparison of model outputs to the behavioral data suggests that inability to recognize a stimulus was associated with poor pitch segregation due to the use of inappropriate pitch priming strategies.

8.

HMM-based F0 model of Chinese prosodic words (total citations: 3; self-citations: 1; citations by others: 2)

朱东来, 王仁华, 凌震华, 李威 《声学学报》 (Acta Acustica) 2002, 27(6): 523-528

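A statistical fundamental-frequency (F0) model of Chinese prosodic words based on the hidden Markov model (HMM) is proposed. The model captures the mapping between prosodic context and F0 contour parameters: it can estimate how well a given F0 contour matches a given text, and it can also generate the corresponding F0 contour from text. The method uses the HMM as its basic framework and therefore inherits the advantages of HMM theory. Using the prosodic word as the modeling unit allows the model to capture continuous tone sandhi at the prosodic-word level. Finally, experimental results are presented and application prospects of the model are discussed.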
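Estimating how well an F0 contour matches a text amounts to scoring the contour under the HMM built for that text. A minimal sketch of that scoring step is the log-domain forward algorithm; it assumes one scalar Gaussian F0 emission per state, whereas the paper's actual state layout and F0 contour parameters are not given in the abstract.

```python
import numpy as np


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian; broadcasts over per-state means/variances."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (x - mean) ** 2 / var)


def forward_loglik(f0, log_pi, log_A, means, variances):
    """log p(F0 contour | HMM) via the forward algorithm in the log domain.

    log_pi: (N,) initial log-probs; log_A: (N, N) transition log-probs
    (row = from state, column = to state); means/variances: (N,) per state.
    """
    alpha = log_pi + log_gauss(f0[0], means, variances)
    for x in f0[1:]:
        # logaddexp.reduce sums probabilities over predecessor states and
        # tolerates -inf entries (forbidden transitions, e.g. left-to-right HMMs)
        alpha = (np.logaddexp.reduce(alpha[:, None] + log_A, axis=0)
                 + log_gauss(x, means, variances))
    return np.logaddexp.reduce(alpha)
```

Working in the log domain keeps long contours from underflowing; the same recursion with max in place of log-sum gives the Viterbi alignment used when generating contours.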

9.

Speech recognition based on an improved convolutional neural network algorithm

杨洋, 汪毓铎 《应用声学》 (Journal of Applied Acoustics) 2018, 37(6): 940-946

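To address the poor performance of conventional convolutional neural networks (CNNs) in recognizing continuous speech, an improved CNN algorithm is proposed. The method introduces the Fisher criterion and an L2 regularization constraint into the back-propagation parameter update, so that training minimizes the parameter error while keeping samples of different classes far apart and samples of the same class close together after classification, and keeps the network weights at a suitable order of magnitude, which effectively mitigates overfitting. In addition, a new log activation function that better matches the activation characteristics of biological neurons is used to optimize the CNN, further improving speech recognition accuracy. Experiments on the TIMIT and THCHS30 speech corpora show that, compared with a conventional CNN, the improved algorithm raises the speech recognition rate and generalizes better.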
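The training objective described above combines three pressures: low classification error, a Fisher-style term (small within-class spread, large between-class spread) and an L2 penalty. A minimal NumPy sketch of such a loss follows; the form tr(S_w) - tr(S_b), the weights `alpha` and `lam`, and the idea of applying it to penultimate-layer features are assumptions, and the paper's log activation is not shown because its formula is not given.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed with a stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def fisher_term(features, labels):
    """tr(S_w) - tr(S_b), both averaged over samples (smaller is better)."""
    global_mean = features.mean(axis=0)
    within = 0.0
    between = 0.0
    for c in np.unique(labels):
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        within += ((fc - mu_c) ** 2).sum()
        between += len(fc) * ((mu_c - global_mean) ** 2).sum()
    n = len(features)
    return within / n - between / n


def regularized_loss(logits, features, labels, weights, alpha=0.1, lam=1e-4):
    """Cross-entropy + alpha * Fisher term + lam * squared L2 norm of the weights."""
    l2 = sum((w ** 2).sum() for w in weights)
    return cross_entropy(logits, labels) + alpha * fisher_term(features, labels) + lam * l2
```

In an actual CNN this would be written in an autodiff framework so the gradients of all three terms flow into back-propagation; plain NumPy is used here only to show the arithmetic.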

10.

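Detection and recognition of underwater noise based on multiscale statistical modeling in the wavelet domain (total citations: 3; self-citations: 0; citations by others: 3)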

周越, 韩鹏, 杨杰 《声学学报》 (Acta Acustica) 2003, 28(6): 518-525

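A method is studied for building statistical probability models of underwater noise signals based on a multiscale tree structure in the wavelet domain. Experimental analysis shows that the model features of ship-radiated noise not only differ greatly from those of ocean ambient noise but also vary with the ship's operating conditions. Based on the differences between ocean ambient noise and ship-radiated noise in the model parameters, a method is proposed for detecting ship-radiated noise in ocean ambient noise, and experiments show that its detection performance is much better than that of several previously proposed detection methods. In addition, to better solve the classification of ship-radiated noise, and building on a study of classification based on hidden Markov statistical models, a combined classification method integrating support vector machines and hidden Markov models is proposed, which also gives good results in experiments.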
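The multiscale tree in this abstract is built on wavelet coefficients. The sketch below shows only that first step: an orthonormal Haar decomposition and a simple per-scale statistic. It is not the paper's tree-structured statistical model, and the Haar wavelet and log-variance feature are illustrative assumptions.

```python
import numpy as np


def haar_dwt(x: np.ndarray, levels: int):
    """Orthonormal Haar decomposition; returns (approximation, [details, finest first])."""
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(levels):
        if approx.size % 2:
            raise ValueError("signal length must be divisible by 2**levels")
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))
        approx = (even + odd) / np.sqrt(2.0)
    return approx, details


def per_scale_log_variance(x: np.ndarray, levels: int = 4) -> np.ndarray:
    """Log variance of the detail coefficients at each scale."""
    _, details = haar_dwt(x, levels)
    return np.array([np.log(d.var() + 1e-12) for d in details])
```

Because the transform is orthonormal, signal energy is preserved across scales, so differences in how that energy is distributed can separate noise types; each coefficient at one scale also has two children at the next finer scale, which is the parent-child tree a multiscale model links together.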

11.

Speech emotion recognition fusing intermediate layers of front-end networks with an attention mechanism

朱应俊, 周文君, 朱川, 马建敏 《应用声学》 (Journal of Applied Acoustics) 2023, 42(5): 1090-1098

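To help machines better understand human emotion and improve the human-computer interaction experience, speech features and classification networks can be fused to improve emotion recognition performance. Taking a network-fusion approach, this paper uses two front-end networks: a two-dimensional convolutional neural network based on Mel-frequency cepstral coefficients (MFCCs) and inverse Mel-frequency cepstral coefficients (IMFCCs), and a long short-term memory network based on scattering convolutional network coefficients. The intermediate layers of the front-end networks are extracted as utterance-level feature representations; a squeeze-and-excitation (SE) channel attention mechanism reweights and fuses these intermediate layers, and a deep neural network back-end classifier then outputs the emotion class. Comparative experiments with five-fold cross-validation on a Chinese emotional speech dataset show that SE-channel-attention-based network fusion effectively exploits the strengths of the different front-end networks in speech emotion recognition and improves recognition accuracy.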
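A squeeze-and-excitation block pools each channel to one number, passes the result through a small bottleneck, and rescales every channel by a learned gate in (0, 1). This sketch assumes the front-end intermediate layers have been brought to a common length L and stacked as channels; the weights `W1`, `b1`, `W2`, `b2` are hypothetical learned parameters passed in directly.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def squeeze_excite(x: np.ndarray, W1: np.ndarray, b1: np.ndarray,
                   W2: np.ndarray, b2: np.ndarray) -> np.ndarray:
    """Squeeze-and-excitation over channels.

    x: (C, L) stacked front-end representations, one row per channel.
    W1: (C // r, C) and W2: (C, C // r) form a bottleneck with reduction ratio r.
    """
    s = x.mean(axis=1)                # squeeze: one statistic per channel
    e = np.maximum(0.0, W1 @ s + b1)  # excitation: bottleneck + ReLU
    gate = sigmoid(W2 @ e + b2)       # per-channel weight in (0, 1)
    return x * gate[:, None]          # rescale each channel
```

The reweighted channels would then be flattened or concatenated and passed to the DNN back end; with all weights at zero every gate is 0.5, so the block starts as a uniform scaling and learns which front end to emphasize.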

12.

Speech wideband extension based on Gaussian mixture model

ZHANG Yong, HU Ruimin 《声学学报：英文版》 (Chinese Journal of Acoustics) 2009, 28(4): 362-377

To decrease the spectral distortion of the highband envelope, the functional relationship between spectral distortion and the mutual information between the feature vector and the highband envelope was studied, and an extended Gaussian mixture model (GMM) bandwidth extension algorithm was proposed based on this research. Feature parameters with larger mutual information with the highband envelope were selected to constitute the feature vector, and the GMM was used to compute the joint probability density of the feature vector and the highband envelope. The highband envelope was then estimated from the posterior probabilities computed with the model parameters estimated by the expectation-maximization (EM) algorithm. The experimental results show that the spectral distortion is 0.3 dB lower than that of the traditional GMM-based algorithm, and the number of frames with spectral distortion above 10 dB is reduced by more than 50%.

13.

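Research on a speech bandwidth extension algorithm based on the Gaussian mixture model (total citations: 2; self-citations: 0; citations by others: 2)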

张勇, 胡瑞敏 《声学学报》 (Acta Acustica) 2009, 34(5): 471-480

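To reduce highband spectral distortion, the functional relationship between highband spectral distortion and the mutual information between the feature parameters and the highband spectral envelope in bandwidth extension algorithms was studied, and on this basis an extended Gaussian mixture model bandwidth extension algorithm was proposed. First, the algorithm selects parameters with high mutual information with the highband spectral envelope to form the feature vector, and computes the joint probability density of the feature vector and the highband spectral envelope with a Gaussian mixture model. Second, the expectation-maximization (EM) algorithm is used to estimate the parameters of the Gaussian components and to compute the posterior probabilities. Finally, the highband spectral envelope is estimated from the posterior probabilities. Experimental results show that, compared with the conventional GMM bandwidth extension algorithm, the proposed algorithm reduces the average highband spectral distortion by 0.3 dB and cuts the number of speech frames with spectral distortion greater than 10 dB by more than 50%.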
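A common way to turn joint-GMM posteriors into a highband estimate is the minimum mean-square-error mapping E[y | x] = Σ_k p(k | x) (μ_y,k + Σ_yx,k Σ_xx,k⁻¹ (x - μ_x,k)). This is the standard step in conventional GMM bandwidth extension; the abstract does not spell out the "extended" algorithm's exact estimator, so treat this as a baseline sketch that assumes the GMM parameters have already been trained with EM (training not shown).

```python
import numpy as np


def gauss_pdf(x, mean, cov):
    """Multivariate normal density evaluated at a single point x."""
    d = x - mean
    k = len(x)
    return np.exp(-0.5 * d @ np.linalg.solve(cov, d)) / np.sqrt((2 * np.pi) ** k * np.linalg.det(cov))


def gmm_mmse_highband(x, weights, means, covs, dx):
    """E[y | x] under a joint GMM over z = [x, y].

    x: narrowband feature vector of length dx; weights: (K,);
    means: (K, dx + dy); covs: (K, dx + dy, dx + dy).
    """
    likes = np.array([w * gauss_pdf(x, m[:dx], c[:dx, :dx])
                      for w, m, c in zip(weights, means, covs)])
    post = likes / likes.sum()  # posterior p(k | x)
    estimate = 0.0
    for p, m, c in zip(post, means, covs):
        # conditional mean of the highband part y given x for component k
        cond = m[dx:] + c[dx:, :dx] @ np.linalg.solve(c[:dx, :dx], x - m[:dx])
        estimate = estimate + p * cond
    return estimate
```

With a single component this reduces to linear regression of y on x; with several, the posteriors blend component-wise regressions, which is where the choice of high-mutual-information features matters.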

14.

Automatic modulation recognition based on a multiscale network with statistical features

《Physical Communication》2023

Automatic modulation recognition (AMR) can be used in dynamic spectrum access (DSA) techniques to reduce the pressure on spectrum resources. In this paper, we propose a multiscale convolution-based network model called MSNet-SF, which combines traditional statistical features and deep learning (DL) to balance recognition accuracy and complexity. In the model, feature information is extracted by two multiscale modules, each consisting of a unit convolution and three convolution kernels of different sizes arranged in parallel. Additionally, the sparse connectivity of unit convolution makes the network more lightweight. Five statistical features (four higher-order cumulants (HOCs) and one zero-centered normalized instantaneous magnitude tightness) are also input into the model and fully fused with the main feature map by multiplication, so that long-term and short-term features complement each other. This approach yields a large performance gain at a small cost and greatly reduces the confusion between QAM16 and QAM64. Simulation results on the RML2018.10A dataset show that adding the statistical features improves the model's average recognition accuracy by 4%, and that the model achieves an accuracy above 97% at SNRs of 12 dB and higher.
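The abstract mentions four higher-order cumulants but does not list which ones. The sketch below computes the fourth-order cumulants most often used in modulation classification (C40, C41, C42, plus the second-order C20 and C21) from complex baseband samples; which four MSNet-SF actually uses is an assumption, and the instantaneous-magnitude tightness feature is not shown.

```python
import numpy as np


def hoc_features(iq: np.ndarray) -> dict:
    """Second- and fourth-order cumulants of a complex baseband signal.

    The signal is made zero-mean and scaled to unit power first, so the
    values do not depend on the received signal level.
    """
    x = iq - iq.mean()
    x = x / np.sqrt(np.mean(np.abs(x) ** 2))
    m20 = np.mean(x ** 2)
    m21 = np.mean(np.abs(x) ** 2)
    m40 = np.mean(x ** 4)
    m41 = np.mean(x ** 3 * np.conj(x))
    m42 = np.mean(np.abs(x) ** 4)
    return {
        "C20": m20,
        "C21": m21,
        "C40": m40 - 3 * m20 ** 2,
        "C41": m41 - 3 * m20 * m21,
        "C42": m42 - np.abs(m20) ** 2 - 2 * m21 ** 2,
    }
```

For ideal unit-power symbols these take textbook values (for example C40 = C42 = -1 for QPSK and -2 for BPSK), which is why they are useful hand-crafted complements to learned features.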

15.

Chinese speaker recognition based on the ARMA model (total citations: 3; self-citations: 0; citations by others: 3)

林宝成, 陈永彬 《声学学报》 (Acta Acustica) 1998, 23(3): 229-234

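A text-independent Chinese speaker recognition system that uses only nasal initials is implemented. It exploits three properties: a speaker's nasal cavity is relatively fixed during speech, the pharyngeal cavity is stable when nasals are produced, and Chinese has few nasal initials (only m- and n-, which occur in just 53 and 48 syllables, respectively). A pole-zero (ARMA) model is used to obtain spectral parameters from the pole and zero coefficients of all Chinese syllables with nasal initials. In recognition tests on 20 speakers, the system performs as follows: when each single nasal-initial syllable of every speaker is tested, the overall correct recognition rate is 87.92%; when 3, 4 and 5 randomly chosen nasal-initial syllables per speaker are averaged before testing, the average correct recognition rates reach 91.67%, 95.00%, 96.67% and 99.97%.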
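The abstract does not say how the ARMA parameters are estimated from the nasal syllables. Taking the coefficients as given, the sketch below shows how pole (AR) and zero (MA) coefficients translate into a power spectrum and into pole/zero locations in the z-plane; the coefficient vectors `b` and `a` are hypothetical inputs.

```python
import numpy as np


def arma_power_spectrum(b, a, n_freqs: int = 256):
    """|B(e^jw) / A(e^jw)|^2 on [0, pi] for H(z) = B(z^-1) / A(z^-1).

    b = [b0, b1, ...] (zeros / MA part), a = [1, a1, ...] (poles / AR part).
    """
    w = np.linspace(0.0, np.pi, n_freqs)
    z_inv = np.exp(-1j * w)
    # np.polyval expects the highest power first, so reverse the z^-1 coefficients
    num = np.polyval(np.asarray(b, dtype=float)[::-1], z_inv)
    den = np.polyval(np.asarray(a, dtype=float)[::-1], z_inv)
    return w, np.abs(num / den) ** 2


def poles_and_zeros(b, a):
    """Return (poles, zeros): roots of z^p A and z^q B in the z-plane."""
    return np.roots(a), np.roots(b)
```

The zeros are what an all-pole (LPC) model cannot represent, and they are prominent in nasal sounds because of the side branch formed by the oral cavity; this is the motivation for a pole-zero model here.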

16.

Small-sample speech emotion recognition using global features and a weak-scale fusion strategy

黄永明, 章国宝, 李雄, 达飞鹏 《声学学报》 (Acta Acustica) 2012, 37(3): 330-338

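Speech is a short-time stationary time-frequency signal, so most researchers extract emotion features after dividing it into frames. However, features extracted after framing are local features that cannot accurately reflect the dynamic characteristics of emotional speech, so local features alone often cannot support a robust emotion recognition system. To address this, utterance-level global features are first extracted from the unframed speech signal through multiscale optimal wavelet packet decomposition; 384-dimensional utterance-level local features are then extracted after framing and reduced in dimension with the Fisher criterion. Finally, a weak-scale fusion strategy is proposed to fuse these two kinds of utterance-level features, and an SVM performs emotion classification. Experiments on the Berlin emotional speech database show that, compared with using utterance-level local features alone, the method improves the final recognition rate by 4.2% to 13.8%, and the recognition rate fluctuates little, especially with small samples.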

17.

Feature selection methods in speech emotion recognition

褚钰, 李田港, 叶硕, 叶光明 《应用声学》 (Journal of Applied Acoustics) 2020, 39(2): 223-230

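To address the high prediction error rate and weak generalization of conventional convolutional neural networks in recognizing Chinese speech, the paper first takes the deep convolutional neural network (DCNN) with connectionist temporal classification (CTC) as its object of study and analyzes in depth how different combinations of convolutional, pooling and fully connected layers affect its performance. Building on this model, it then proposes a multi-path convolutional neural network (MCNN) with CTC and, by combining it with SENet, a deep SE-MCNN-CTC acoustic model. This model combines the strengths of MCNN and SENet: it strengthens the propagation of deep information through the convolutional network, avoids gradient problems, and adaptively recalibrates the extracted feature maps. The final experimental results show that SE-MCNN-CTC reduces the error rate by 13.51% relative to DCNN-CTC, reaching a final error rate of 22.21%, and that the improved acoustic model effectively enhances generalization.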
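A CTC acoustic model emits one distribution per frame over the output units plus a "blank" symbol. The simplest way to read off a label sequence is best-path (greedy) decoding: take the most likely symbol per frame, merge consecutive repeats, then drop blanks. This sketch assumes blank index 0; beam search or a language model, which a full system would likely use, is not shown.

```python
import numpy as np


def ctc_greedy_decode(frame_scores: np.ndarray, blank: int = 0) -> list:
    """Best-path CTC decoding.

    frame_scores: (T, V) per-frame probabilities or log-probabilities.
    Repeated symbols are merged unless separated by a blank; blanks are removed.
    """
    best = np.argmax(frame_scores, axis=1)
    out = []
    prev = None
    for s in best:
        if s != prev and s != blank:
            out.append(int(s))
        prev = s
    return out
```

The blank is what lets CTC emit a genuinely doubled unit: the frame path 1, 1, blank, 1 decodes to [1, 1], whereas 1, 1, 1 decodes to [1].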

18.

An improved DNN-HMM speech recognition method (total citations: 1; self-citations: 1; citations by others: 1)

李云红, 梁思程, 贾凯莉, 张秋铭, 宋鹏, 何琛, 王刚毅, 李禹萱 《应用声学》 (Journal of Applied Acoustics) 2019, 38(3): 371-377

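To address the limited modeling capability of acoustic models that combine deep neural networks with hidden Markov models (DNN-HMM) in speech recognition, an improved DNN-HMM speech recognition algorithm is proposed. First, a deep neural network acoustic model is built by combining a deep belief network (DBN) with a deep Boltzmann machine (DBM); then Mel-frequency cepstral coefficients (MFCCs) and log-domain Mel filter-bank coefficients (Fbank) are extracted as acoustic features, and experiments are conducted on the TIMIT speech dataset. The results show that the DNN-HMM model incorporating the DBM outperforms the plain DNN-HMM model: with MFCC features, the word error rate and sentence error rate drop by 1.26% and 0.20%, respectively; with Fbank features using the default filter bank, they drop by 0.48% and 0.82%, respectively, and moderately increasing the number of filters lowers the error rates further. Overall, the study reduces the sentence error rate and word error rate to 21.06% and 3.12%, respectively.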
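The results above are reported as word and sentence error rates. Under the usual definitions, word error rate is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words, and sentence error rate is the fraction of sentences containing any error. A minimal sketch of those definitions (not the scoring tool the authors used):

```python
def edit_distance(ref, hyp) -> int:
    """Minimum substitutions, deletions and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,              # deletion
                         cur[j - 1] + 1,           # insertion
                         prev[j - 1] + (r != h))   # substitution or match
        prev = cur
    return prev[-1]


def word_error_rate(ref: str, hyp: str) -> float:
    words = ref.split()
    return edit_distance(words, hyp.split()) / len(words)


def sentence_error_rate(refs, hyps) -> float:
    """Fraction of sentences whose hypothesis differs from the reference."""
    wrong = sum(r.split() != h.split() for r, h in zip(refs, hyps))
    return wrong / len(refs)
```

For example, reference "a b c d" against hypothesis "a x c" has one substitution and one deletion, giving a word error rate of 0.5.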

19.

Spontaneous speech recognition using a statistical coarticulatory model for the vocal-tract-resonance dynamics

Deng L, Ma J 《The Journal of the Acoustical Society of America》 2000, 108(6): 3036-3048

A statistical coarticulatory model is presented for spontaneous speech recognition, in which knowledge of the dynamic, target-directed behavior of the vocal tract resonances is incorporated into the model design, training, and likelihood computation. The principal advantage of the new model over the conventional HMM is its compact internal structure, which parsimoniously represents long-span context dependence in the observable domain of speech acoustics without additional context-dependent model parameters. The new model is formulated mathematically as a constrained, nonstationary, nonlinear dynamic system, for which a version of the generalized EM algorithm is developed and implemented to learn the compact set of model parameters automatically. A series of speech recognition and model synthesis experiments using spontaneous speech data from the Switchboard corpus is reported. The promise of the new model is demonstrated by its consistently superior performance over a state-of-the-art benchmark HMM system under controlled experimental conditions. Experiments on model synthesis and analysis shed light on the mechanism underlying this superiority in terms of the target-directed behavior and the long-span context-dependence property, both inherent in the designed structure of the new dynamic model of speech.
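The "target-directed" behavior can be illustrated with a first-order system in which the hidden resonance z moves a fixed fraction of the way toward the current segment's target at every frame. This noise-free scalar version, the rate `r`, and the piecewise-constant targets are simplifying assumptions for illustration; the paper's model is statistical, nonstationary and nonlinear (it also maps the hidden dynamics to acoustics), none of which is shown.

```python
import numpy as np


def target_directed_trajectory(targets, durations, r=0.8, z0=None):
    """Noise-free first-order target-directed dynamics, one target per segment.

    Within a segment with target T: z[k+1] = r * z[k] + (1 - r) * T, so z
    approaches T exponentially and its end value carries into the next segment.
    """
    z = targets[0] if z0 is None else z0
    path = []
    for target, n in zip(targets, durations):
        for _ in range(n):
            z = r * z + (1.0 - r) * target
            path.append(z)
    return np.array(path)
```

Because each segment starts from wherever the previous one left off, the trajectory inside a segment depends on earlier targets without any extra context-dependent parameters, which is the compact long-span context dependence the abstract describes.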

20.

Pattern recognition and statistical mechanics

John M. Richardson 《Journal of Statistical Physics》 1969, 1(1): 71-88

A mathematical connection is established between classes of problems in pattern recognition and in statistical mechanics. More explicitly, the former class embraces problems arising from the decision-theoretic approach to the automatic recognition of certain properties of patterns containing many targets. The latter class contains almost all problems involving the statistical mechanics of classical systems of interacting particles. The usefulness of the mathematical connection lies in the fact that it provides a bridge for the transfer of approximation methodologies from one area to the other. As examples of such a transfer this paper presents applications of a least mean square approximation method, which is well known in pattern recognition, to two problems in classical statistical mechanics, namely, the one-dimensional Ising problem and the one-component plasma problem. These problems were chosen because their solutions are well understood (the exact solution of the one-dimensional Ising model and the solution of the one-component plasma that is exact in the low concentration limit are both very well known) and consequently they are appropriate as test beds for the new approximation method. The simplest nontrivial approximate trial functions were used for the calculation of the average values of certain observables and the results were in agreement with the corresponding exact results for the Ising model in the limit of high temperature and for the one-component plasma in the limit of low concentration.