
Clustering-based speech separation method for gated convolutional networks*
Cite this article: Luo Yu, Hu Weiping, Wu Huanan. Clustering-based speech separation method for gated convolutional networks*[J]. Applied Acoustics (应用声学), 2023, 42(5): 1099-1105.
Authors: Luo Yu, Hu Weiping, Wu Huanan
Institution: School of Electronic Engineering, Guangxi Normal University (all three authors)
Funding: National Natural Science Foundation of China (General Program, Key Program, Major Program)
Received: 2022/8/16
Revised: 2023/8/29

Abstract: Speech separation methods based on deep clustering have been shown to effectively solve the speaker output label permutation problem in mixed speech. However, most existing clustering-based speaker separation methods optimize the embeddings so as to minimize the reconstruction error of each source. This paper takes the time-domain convolutional network (ConvTasNet) as the base network and designs an improved clustering-based gated convolution (Gate-conv Cluster) speech separation method, which performs end-to-end deep-clustering source separation in the time domain through stacked gated convolutional networks. The framework applies nonlinear gated activations in the time-domain convolutional network to extract deep features of the speech signal. It also clusters in a high-dimensional feature space to represent and partition those features, which provides long-term speaker representation information for recovering the different sources. The framework thereby addresses the speaker output label permutation problem and models the long-term dependencies of speech signals. Experiments on the Wall Street Journal dataset show that the method achieves 16.72 dB SDRi (signal-to-distortion ratio improvement) and 16.33 dB Si-SNR (scale-invariant signal-to-noise ratio).

Keywords: deep clustering; gated convolution; speech separation
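
For readers unfamiliar with the Si-SNR metric quoted in the abstract, the following is a minimal sketch of its commonly used definition, written with NumPy. It is not taken from the paper: the function name `si_snr`, the `eps` stabilizer, and the synthetic 440 Hz test signal are all illustrative assumptions.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio (dB) between two 1-D signals.

    Illustrative helper following the commonly used definition; the
    paper's own evaluation code is not available here.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Project the estimate onto the target: the part that matches the source.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target

    # Everything left over is treated as noise/distortion.
    e_noise = estimate - s_target

    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)


# Hypothetical test signal: a 440 Hz tone plus white noise (not real speech).
rng = np.random.default_rng(0)
t = np.arange(8000) / 8000.0
clean = np.sin(2.0 * np.pi * 440.0 * t)
noisy = clean + 0.1 * rng.standard_normal(clean.shape)

score = si_snr(noisy, clean)
score_scaled = si_snr(3.0 * noisy, clean)  # rescaling the estimate should not matter
print(f"Si-SNR: {score:.2f} dB, after rescaling: {score_scaled:.2f} dB")
```

Because the estimate is projected onto the target before the ratio is taken, multiplying the estimate by any nonzero gain leaves the score essentially unchanged, which is why the metric is called scale-invariant.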