1.
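Emotional speech synthesis is important for human-computer interaction. To address the scarcity of Mandarin speech data for children's emotional speech synthesis and the long model training time, this paper proposes a transfer-learning method for Mandarin children's emotional speech synthesis. First, a deep learning model is trained on a Mandarin speech database to obtain an end-to-end Chinese speech synthesis model. Next, a high-quality, large-sample Chinese emotional speech corpus is used to build an emotional speech synthesis model. Finally, transfer learning is applied to the model with a small, self-collected corpus of Mandarin children's emotional speech, achieving low-resource speech synthesis. In the objective evaluation, the mel-cepstral distortion is 4.91; the two subjective listening-test scores are 3.61 and 4.17. Experimental comparison shows that the method performs well for emotional speech synthesis and outperforms existing state-of-the-art low-resource emotional speech synthesis methods.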
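The mel-cepstral distortion quoted in this abstract is usually computed per frame as (10 / ln 10) · sqrt(2 · Σ (c_d − c'_d)²) and then averaged over frames. The abstract does not say which variant the authors used, so the following is only a sketch of that common definition. The function name is hypothetical, and the code assumes the two utterances are already time-aligned and that the 0th (energy) coefficient has been dropped.

```python
import math


def mel_cepstral_distortion(ref_frames, syn_frames):
    """Mean mel-cepstral distortion (dB) over time-aligned frames.

    Assumption: both inputs hold the same number of frames, each frame is a
    sequence of mel-cepstral coefficients without the 0th coefficient.
    """
    if not ref_frames or len(ref_frames) != len(syn_frames):
        raise ValueError("inputs must be non-empty and frame-aligned")
    scale = 10.0 / math.log(10.0)
    total = 0.0
    for ref, syn in zip(ref_frames, syn_frames):
        # Squared Euclidean distance between the two coefficient vectors.
        squared = sum((r - s) ** 2 for r, s in zip(ref, syn))
        total += scale * math.sqrt(2.0 * squared)
    return total / len(ref_frames)
```

In practice the alignment step (for example dynamic time warping) matters as much as the formula, and it is left out here.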
2.
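Automatic speaker verification systems are a common solution for authenticating a target speaker's identity, but they are vulnerable to attacks that use synthetic speech; synthetic speech detection systems attempt to solve this problem. This paper proposes a synthetic speech detection method based on the Transformer encoder, which uses self-attention to learn long-term dependencies within the input features. Synthetic speech detection does not depend on the abstract semantic features of a sentence, so a model with few parameters can still achieve good detection performance. The paper tests four commonly used synthetic speech detection features on the Transformer encoder. On the logical access dataset of the international-standard ASVspoof2019 challenge, the system based on linear frequency cepstral coefficient features and the Transformer encoder achieves an equal error rate of 3.13% and a tandem detection cost function of 0.0708 with only 0.082 M model parameters, giving good detection performance at a small parameter count.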
3.
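The number of people with depression is rising worldwide, and the diagnosis and treatment of depression face a shortage of doctors. To address this problem, a feature-fusion model combining a CNN with an attention-based BLSTM is proposed. The study covers both feature selection and network architecture: several classic speech features are compared, and mel-frequency cepstral coefficients are found to give the best depression classification results. The mel-frequency cepstral coefficients are then fed separately into the CNN and the attention-based BLSTM network to perform depression classification. In experiments on the DAIC-WOZ dataset, the proposed method reaches a classification precision of 78.06% and an F1 score of 74.68% for speech-based depression detection.
Keywords: depression recognition; speech analysis; classification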
5.
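Transformer networks with self-attention have gradually attracted wide attention in speech recognition research. This paper focuses on combining positional embeddings with speech features and studies positional encoding methods better suited to Mandarin speech recognition models. The experimental results show that replacing sinusoidal positional encoding with a convolutionally encoded input representation better fuses the contextual relationships among speech features with relative position information and yields better recognition results. The trained speech recognition system is based on Tr...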
6.
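A voice gateway mainly interconnects an IP communication system with analog lines outside the system. It is an important device that command posts use to communicate with higher and lower echelons, and it is critical to the smooth command of troops. At present, faults are mainly learned of and handled only after the device raises an alarm and stops working. To address this, a pre-fault prediction method is proposed: the actual output signals of the device's main components are monitored and compared with the device's theoretical output values. Further study found that the theoretical output value is usually a range, which reduces the accuracy of the judgment, so the method was changed to compare the prediction of a grey prediction model with the actual signal; if the difference between the two signals is large, the component is considered to have a latent fault. In simulation, the grey prediction model approach performs better than the first approach.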
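The comparison described in this abstract can be sketched with the classic GM(1,1) grey model. The abstract does not say which grey model variant, which signals, or which threshold the authors used, so everything below is an illustrative assumption: the function names `gm11_fit`, `gm11_predict`, and `is_suspect` are hypothetical, and so is the 5% relative tolerance.

```python
import math


def gm11_fit(x0):
    """Least-squares fit of GM(1,1) to a positive series; returns (a, b)."""
    if len(x0) < 4:
        raise ValueError("GM(1,1) needs at least 4 samples")
    # Accumulated series x1 and its adjacent-mean series z.
    x1, running = [], 0.0
    for value in x0:
        running += value
        x1.append(running)
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = x0[1:]
    m = len(z)
    s_z, s_y = sum(z), sum(y)
    s_zz = sum(v * v for v in z)
    s_zy = sum(v * w for v, w in zip(z, y))
    det = m * s_zz - s_z * s_z
    # Normal equations for the grey equation x0[k] = -a * z[k] + b.
    a = (s_z * s_y - m * s_zy) / det
    b = (s_zz * s_y - s_z * s_zy) / det
    return a, b


def gm11_predict(x0, a, b, k):
    """Model value of the original series at index k >= 1."""
    if abs(a) < 1e-12:
        return b  # degenerate case: the fitted series is flat
    c = x0[0] - b / a
    # Difference of two consecutive accumulated predictions.
    return c * (math.exp(-a * k) - math.exp(-a * (k - 1)))


def is_suspect(history, actual, rel_tol=0.05):
    """Flag a component when the new reading strays from the forecast.

    rel_tol is a hypothetical threshold, not a value from the source.
    """
    a, b = gm11_fit(history)
    predicted = gm11_predict(history, a, b, len(history))
    return abs(actual - predicted) > rel_tol * abs(predicted)
```

A point forecast gives a single reference value to compare against, which is the advantage the abstract attributes to the grey model over a theoretical output range.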
7.
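In today's "information society", modern communication, which conveys information, has become an indispensable part of the national economy. Speech plays an important role in communication, and the information society places higher demands on speech research. Speech can also play a role in broad fields beyond communication. Speech research in China still has a long way to go.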
8.
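The mean squared error function is the most commonly used cost function in deep-learning-based single-channel speech enhancement. However, the magnitude of the mean squared error is not fully correlated with speech quality. To improve performance, this paper introduces two classes of cost functions related to human auditory perception into deep neural network training. The first class is the weighted Euclidean distance cost function, which accounts for the auditory masking effect. The second class comprises the Itakura-Saito cost function, the COSH cost function, and the weighted likelihood ratio cost function, which emphasize the importance of speech spectral peaks and focus on restoring the spectral peak information of clean speech. Using a long short-term memory network, the paper analyzes and compares the performance of the two classes of cost functions in deep-learning-based single-channel speech enhancement and compares them with the mean squared error cost function. The experimental results show that the deep neural network single-channel speech enhancement algorithm based on the weighted Euclidean distance cost function achieves better speech quality and lower residual noise.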
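For orientation, the textbook forms of three of the cost functions named in this abstract can be written over a pair of power spectra as below. This is a sketch under assumptions, not the paper's training loss: the abstract gives no formulas, the perceptual weights for the weighted Euclidean distance are left as a caller-supplied parameter, the weighted likelihood ratio is omitted, and all function names are hypothetical.

```python
import math


def mse(clean, estimate):
    """Baseline mean squared error between two spectra."""
    return sum((p - q) ** 2 for p, q in zip(clean, estimate)) / len(clean)


def weighted_euclidean(clean, estimate, weights):
    """Squared error with per-bin weights (weights are an assumption here)."""
    total = sum(w * (p - q) ** 2 for w, p, q in zip(weights, clean, estimate))
    return total / len(clean)


def itakura_saito(clean, estimate):
    """Itakura-Saito distortion between strictly positive power spectra."""
    ratios = [p / q for p, q in zip(clean, estimate)]
    return sum(r - math.log(r) - 1.0 for r in ratios) / len(ratios)


def cosh_distance(clean, estimate):
    """COSH distortion: the symmetrized Itakura-Saito distortion."""
    forward = itakura_saito(clean, estimate)
    backward = itakura_saito(estimate, clean)
    return 0.5 * (forward + backward)
```

The Itakura-Saito distortion is asymmetric: underestimating a spectral peak costs more than overestimating it by the same ratio. The COSH form removes that asymmetry by averaging both directions.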
9.
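At present, speech acoustics has become one of the most active and closely watched areas of research and development in acoustics in the West. Its applications are increasingly widespread, ranging from information, industrial automation, and transportation to everyday consumer goods and even assistance for people with disabilities. In France, according to statistics, the market for speech synthesis and recognition products alone exceeded 160 million francs in 1985 (1 yuan RMB is currently worth about 2 francs). It is expected to approach 1 billion francs by 1990.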
10.
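Voice activity detection is an important part of voice interaction and communication systems. Its role is to distinguish speech segments from background-noise segments in the input signal. Detection relies mainly on various time-frequency characteristics of speech and noise, among which the periodicity and harmonic structure of voiced speech are widely used features. In a moving car, however, the noise is non-stationary and the signal-to-noise ratio is low, so such features are hard to detect reliably. This paper therefore proposes a relatively robust fast harmonic detection algorithm based on the basic regularities of the harmonic structure of voiced speech, exploiting the fact that the signal-to-noise ratio differs across frequency bands in time-varying noise. The algorithm uses small time-frequency blocks as analysis units and a set of harmonic templates whose fundamental frequency varies on a logarithmic scale to adaptively search for regions with a clear harmonic structure, and uses these regions to detect voiced speech. Experiments show that the algorithm achieves fairly reliable speech/non-speech discrimination in a moving car.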
11.
Prader-Willi syndrome (PWS) is a multisystem disorder caused by DNA abnormalities involving chromosome 15. Major characteristics are infant hypotonia, hypogonadism, mental retardation, a short stature, atypical facial appearance, and the onset of obesity due to insatiable hunger in early childhood. Also, speech and language abnormalities have been reported including voice disorders. These have seldom been studied in detail, however. This paper reports the results of an acoustic and aerodynamic investigation of the voice in 22 individuals with PWS. Two age groups were distinguished, a group of children [chronological age (CA) 6 years, 7 months through 11 years, 7 months; total intelligence quotient (TIQ) 40-88] and a group of adolescents and adults (CA 17 years, 1 month through 29 years, 5 months; TIQ 41-94). Both aerodynamic and acoustic parameters were obtained and compared with normative data from the Belgian Study Group on Voice Disorders. It was found that voice difficulties do commonly occur in individuals with PWS including impairment of frequency levels, voice quality, and poor aerodynamic capabilities.
13.
Jeannette D. Hoit 《Journal of voice》1995,9(4):341-347
This paper examines how breathing differs in the upright and supine body positions. Passive and active forces and associated chest wall motions are described for resting tidal breathing and speech breathing performed in the two positions. Clinical implications are offered regarding evaluation and treatment of breathing behavior in clients with speech and voice disorders.
15.
Margaret Denny 《Journal of voice》2000,14(1):34-46
Variability in inspired lung volume prior to speech is only partially accounted for by speech-related concerns such as the length and loudness of the planned utterance. Control mechanisms known to influence volume variability in non-speech breathing could potentially account for some of this variability, but only if they operate during speech as well. This investigation was designed to test for the presence of several such mechanisms during reading aloud. Lung volumes were recorded from 5 normal females as they read silently, then aloud. Inspired volumes were correlated with the volumes of the previous and following expirations and with inspiratory duration. Coefficients of variation were calculated for inspiratory volume, duration, and mean flow. Time-series analyses were used to compare periodicity in inspired volume for quiet and speech breathing. Control mechanisms operating during both quiet breathing and reading aloud included slow oscillations in inspired volume and minimized variability in mean flow. Inspired volume prior to speech was weakly but significantly correlated with preceding and following expired volume. It is concluded that some control strategies typical of quiet breathing contribute to volume variability in speech breathing.
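Two of the statistics named in this abstract, the coefficient of variation and the correlation between an inspired volume and the preceding expired volume, can be sketched as follows. The abstract does not give the exact estimators, so the sample standard deviation and Pearson's r are assumptions here, and the function names are hypothetical.

```python
import math


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return math.sqrt(variance) / mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_with_previous_expiration(inspired, expired):
    """Correlate each inspired volume with the expiration just before it.

    Assumption: breath i consists of inspired[i] followed by expired[i],
    so inspired[i] is paired with expired[i - 1].
    """
    return pearson_r(inspired[1:], expired[:-1])
```

Pairing each inspiration with the following expiration instead only changes the slicing to `inspired` against `expired` at the same index.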
18.
《Journal of voice》2014,28(4):440-448
Objective: To correlate change in Voice Handicap Index (VHI)-10 scores with corresponding voice laboratory measures across five voice disorders. Study Design: Retrospective study. Methods: One hundred fifty patients aged >18 years with primary diagnosis of vocal fold lesions, primary muscle tension dysphonia-1, atrophy, unilateral vocal fold paralysis (UVFP), and scar. For each group, participants with the largest change in VHI-10 between two periods (TA and TB) were selected. The dates of the VHI-10 values were linked to corresponding acoustic/aerodynamic and audio-perceptual measures. Changes in voice laboratory values were analyzed for correlation with each other and with VHI-10. Results: VHI-10 scores were greater for patients with UVFP than other disorders. The only disorder-specific correlation between voice laboratory measure and VHI-10 was average phonatory airflow in speech for patients with UVFP. Average airflow in repeated phonemes was strongly correlated with average airflow in speech (r = 0.75). Acoustic measures did not significantly change between time points. Conclusions: The lack of correlations between the VHI-10 change scores and voice laboratory measures may be due to differing constructs of each measure; namely, handicap versus physiological function. Presuming corroboration between these measures may be faulty. Average airflow in speech may be the most ecologically valid measure for patients with UVFP. Although aerodynamic measures changed between the time points, acoustic measures did not. Correlations to VHI-10 and change between time points may be found with other acoustic measures.