
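Research on energy spectrograms fused with voiceprint information in bird recognition
Citation: Yang Chunyong, Qi Hongda, Peng Yanqiu, Yin Bin, Hou Jin, Shu Zhenyu, Chen Shaoping. Research on energy spectrograms fused with voiceprint information in bird recognition[J]. Applied Acoustics (应用声学), 2020, 39(3): 453-463.
Authors: Yang Chunyong  Qi Hongda  Peng Yanqiu  Yin Bin  Hou Jin  Shu Zhenyu  Chen Shaoping
Affiliations: South-Central University for Nationalities (all authors)
Abstract: Bird-song recognition based on the widely used combination of Mel-frequency cepstral coefficients and a Gaussian mixture model (MFCC+GMM) adapts poorly to noisy environments, its models are hard to converge, and its computational complexity is high. This paper proposes a bird recognition method based on energy spectrograms fused with voiceprint information (VPS-BR). The method exploits the multi-dimensional differences that bird calls show on the energy spectrogram to quantitatively identify the voiceprint features of the calls. The energy spectrogram is obtained by color-mapping the decibel energy; the acoustic features expressed by its visual features are then extracted, and species-specific call patterns are derived by analysis and induction. In the feature-extraction step, two descriptors characterize the edge voiceprint of the bird-song spectrogram: the local binary pattern (LBP), which is fast, and the histogram of oriented gradients (HOG), which is robust. In the recognition step, each of the two features is paired with each of three classifiers, support vector machine (SVM), K-nearest neighbors (KNN), and random forest, to build and test recognition models. In cross-validation on a dataset of raw, noisy calls from 15 bird species, the average recognition rate of the VPS-BR models is 11.3% higher than that of the MFCC+GMM model, and the HOG+KNN combination reaches 90.5%, showing good noise robustness and recognition performance. Finally, to address the shortage of sample data, a generative adversarial network (GAN) is used for image augmentation, which raises the recognition rate by a further 1.48%.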
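The pipeline described above starts from a decibel-scaled spectrogram and describes its texture with local binary patterns. The abstract does not state the LBP variant (radius, number of neighbors, uniform-pattern mapping) or how the color-mapped image is turned back into intensities, so the following is only a minimal sketch under stated assumptions: power is converted to dB as 10·log10(P/ref), the basic 3×3, 8-neighbor LBP is computed directly on the dB values (not on a color-mapped image), and a normalized 256-bin histogram serves as the feature vector. The function names are hypothetical and are not the authors' code.

```python
import math
from typing import List

def power_to_db(power: List[List[float]], ref: float = 1.0,
                floor: float = 1e-10) -> List[List[float]]:
    """Convert a power spectrogram (freq x time) to decibels: 10*log10(P/ref).

    `floor` (an assumed value) avoids log10(0) for silent bins.
    """
    return [[10.0 * math.log10(max(p, floor) / ref) for p in row] for row in power]

# Clockwise neighbour offsets starting at the top-left pixel; bit i of the
# LBP code is set when neighbour i is >= the centre value.
_OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_codes(img: List[List[float]]) -> List[List[int]]:
    """Basic 3x3 LBP code for every interior pixel (border pixels are skipped)."""
    h, w = len(img), len(img[0])
    codes = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            centre = img[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(_OFFSETS):
                if img[r + dr][c + dc] >= centre:
                    code |= 1 << bit
            row.append(code)
        codes.append(row)
    return codes

def lbp_histogram(img: List[List[float]]) -> List[float]:
    """Normalised 256-bin histogram of LBP codes, used as a texture feature vector."""
    hist = [0] * 256
    n = 0
    for row in lbp_codes(img):
        for code in row:
            hist[code] += 1
            n += 1
    return [count / n for count in hist] if n else [0.0] * 256
```

On a real recording the spectrogram would be much larger and the histogram would typically be computed per region and concatenated; that design choice, like the variant above, is an assumption rather than something the abstract specifies.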

Keywords: bird recognition  energy spectrogram  local binary pattern  histogram of oriented gradients  generative adversarial network
Received: 2019-07-12
Revised: 2020-04-26

Research on the application of energy spectrum with voiceprint information in bird recognition
Yang Chunyong, Qi Hongda, Peng Yanqiu, Yin Bin, Hou Jin, Shu Zhenyu and Chen Shaoping. Research on the application of energy spectrum with voiceprint information in bird recognition[J]. Applied Acoustics, 2020, 39(3): 453-463.
Authors: Yang Chunyong  Qi Hongda  Peng Yanqiu  Yin Bin  Hou Jin  Shu Zhenyu and Chen Shaoping
Affiliation: South-Central University for Nationalities (all authors)
Abstract: Bird-song recognition based on Mel-frequency cepstral coefficients combined with a Gaussian mixture model (MFCC+GMM) adapts poorly to noisy environments and has high computational complexity. This paper proposes a novel bird recognition method, VPS-BR, that expresses acoustic features through the voice-power spectrogram. It exploits the multi-dimensional differences of bird sounds on the power spectrogram to quantitatively identify the texture features of the sound. In the feature extraction step, the edge texture of the bird's voice-power spectrogram is characterized by the local binary pattern (LBP) and the histogram of oriented gradients (HOG); in the identification step, VPS-BR models are constructed by combining LBP and HOG with a support vector machine (SVM), K-nearest neighbors (KNN), and a random forest. Cross-validation on raw, noisy recordings of 15 bird species from the Xeno-Canto website shows that the VPS-BR models outperform the MFCC+GMM model; the combined HOG and KNN model reaches a recognition rate of 90.5%, showing good noise robustness and recognition performance. Finally, to address the shortage of samples, a generative adversarial network (GAN) is used for image augmentation, which further increases the recognition rate by 1.48%.
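The abstract reports that pairing HOG features with a K-nearest-neighbor classifier gave the best rate under cross-validation. It does not give k, the distance metric, or the fold scheme, so the sketch below assumes Euclidean distance, k = 3, and leave-one-out cross-validation; the toy two-dimensional vectors stand in for HOG or LBP histograms. All names are hypothetical, and this is a generic KNN illustration rather than the authors' implementation.

```python
import math
from collections import Counter
from typing import List, Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_x: List[List[float]], train_y: List[str],
                query: Sequence[float], k: int = 3) -> str:
    """Majority vote among the k training vectors closest to `query`."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    # Counter.most_common keeps first-encountered order for equal counts, so a
    # tie goes to the label whose nearest neighbour is closest to the query.
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(x: List[List[float]], y: List[str], k: int = 3) -> float:
    """Hold out each sample in turn, classify it with the rest, report accuracy."""
    correct = 0
    for i in range(len(x)):
        rest_x = x[:i] + x[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_x, rest_y, x[i], k) == y[i]:
            correct += 1
    return correct / len(x)
```

For example, with two well-separated toy clusters labeled "a" and "b", every held-out sample is recovered correctly. KNN has no training phase beyond storing the feature vectors, which fits small per-species datasets like the one described, though prediction cost grows with the number of stored samples.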
Keywords: Bird recognition  Power spectrogram  HOG  LBP  GAN
This article is indexed in CNKI, VIP (Weipu), and other databases.