
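A synthetic speech detection system based on the Transformer encoder
Cite this article: Wan Yi, Yang Feiran, Yang Jun. A synthetic speech detection system based on the Transformer encoder [J]. Applied Acoustics (应用声学), 2023, 42(1): 26-33.
Authors: Wan Yi  Yang Feiran  Yang Jun
Affiliation: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences (all three authors)
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)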
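Abstract: Automatic speaker verification (ASV) is a widely used way to authenticate a target speaker's identity, but it is vulnerable to attacks with synthetic speech; synthetic speech detection systems aim to address this problem. This paper proposes a synthetic speech detection method based on the Transformer encoder, which uses the self-attention mechanism to learn long-term dependencies within the input features. Because synthetic speech detection does not depend on the abstract semantic content of an utterance, a model with a small number of parameters can still detect well. The paper tests four commonly used synthetic speech detection features with the Transformer encoder. On the logical access dataset of the international ASVspoof2019 challenge, the system combining linear frequency cepstral coefficient (LFCC) features with the Transformer encoder achieves an equal error rate (EER) of 3.13% and a tandem detection cost function (t-DCF) of 0.0708 with only 0.082 M model parameters, giving good detection performance at a small model size.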
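
The self-attention mechanism mentioned in the abstract can be illustrated with a minimal single-head sketch in plain Python. This is not the authors' model: the projection matrices `w_q`, `w_k`, `w_v`, the toy frame values, and the function names are all assumptions chosen for illustration. The point it shows is that every output frame is a weighted mix of all input frames, which is how a Transformer encoder can relate frames that are far apart in time.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x:   T frames, each a d_model-dimensional feature vector (one row per frame).
    w_q, w_k, w_v: hypothetical d_model x d_k projection matrices.
    Returns (outputs, attention_weights).
    """
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(q[0])
    k_t = [list(col) for col in zip(*k)]      # transpose K to d_k x T
    scores = matmul(q, k_t)                   # T x T frame-to-frame similarities
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights        # each output mixes all T frames


# Toy input: three frames of a two-dimensional feature (values are made up).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
outputs, attention = self_attention(frames, identity, identity, identity)
```

Each row of `attention` sums to 1 and has one entry per input frame, so the first frame can draw on the last frame as directly as on its neighbour.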

Keywords: automatic speaker verification  synthetic speech detection  Transformer encoder
Received: 2021/11/8
Revised: 2022/12/22

Transformer encoder-based spoofing countermeasure for synthetic speech detection
WAN YI, YANG FEI RAN and YANG JUN. Transformer encoder-based spoofing countermeasure for synthetic speech detection[J]. Applied Acoustics, 2023, 42(1): 26-33.
Authors: WAN YI, YANG FEI RAN and YANG JUN
Institution: Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences (all three authors)
Abstract: Automatic speaker verification (ASV) systems are a commonly used solution for authenticating a target speaker's identity, but they are vulnerable to synthetic speech attacks, a weakness that spoofing countermeasure systems aim to alleviate. In this paper, we introduce a synthetic speech detection method based on the Transformer encoder, which uses the self-attention mechanism to learn long-term dependencies in the input features. Synthetic speech detection does not rely on the abstract semantic features of a sentence, so a model with few parameters can also perform well. We evaluate four commonly used synthetic speech detection features with the Transformer encoder. On the evaluation set of the ASVspoof2019 challenge logical access scenario, the proposed system based on linear frequency cepstral coefficient (LFCC) features and the Transformer encoder achieves an equal error rate (EER) of 3.13% and a tandem detection cost function (t-DCF) of 0.0708 with only 0.082 M model parameters, obtaining good detection performance with a small model.
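
For readers unfamiliar with the equal error rate reported above, the following is a minimal sketch of how an EER can be estimated from detector scores. It is a simple threshold sweep, not the official ASVspoof scoring procedure, and it assumes that higher scores mean "more likely bona fide". The function name `compute_eer` and the toy scores are hypothetical.

```python
def compute_eer(bonafide_scores, spoof_scores):
    """Estimate the equal error rate by sweeping every observed score as a threshold.

    Decision rule (an assumption of this sketch): accept an utterance as
    bona fide when its score is >= the threshold.
    Returns (eer, threshold) at the point where the two error rates are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection rate: bona fide utterances scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed utterances scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Toy scores (made up): one spoofed utterance outscores one bona fide utterance.
bonafide = [0.9, 0.8, 0.7, 0.4]
spoof = [0.6, 0.3, 0.2, 0.1]
eer, threshold = compute_eer(bonafide, spoof)
```

With these toy scores the two error rates meet at a threshold of 0.6, where one of four bona fide utterances is rejected and one of four spoofed utterances is accepted, so the estimated EER is 25%. The sweep is quadratic in the number of scores, which is fine for illustration but slow for a full evaluation set.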
Keywords: automatic speaker verification  synthetic speech detection  Transformer encoder