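A fast specific-audio recognition method using 2D-Haar acoustic super feature vectors
Cite this article: LÜ Ying, LUO Senlin, GAO Xiaofang, XIE Erman, PAN Limin. A fast specific-audio recognition method using 2D-Haar acoustic super feature vectors[J]. Acta Acustica (声学学报), 2015, 40(5): 739-750. DOI: 10.15949/j.cnki.0371-0025.2015.05.015
Authors: LÜ Ying, LUO Senlin, GAO Xiaofang, XIE Erman, PAN Limin
Affiliation: 1. Information System and Security Countermeasures Experimental Center, Beijing Institute of Technology, Beijing 100081
Funding: Supported by the National 242 Program (2005C48) and the Beijing Institute of Technology Science and Technology Innovation Program (2011CX01015)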
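Abstract: To address the accuracy and speed of specific audio event recognition in big-data audio processing tasks, a fast, generalized method for recognizing specific audio events is proposed, based on 2D-Haar acoustic super feature vectors and the AdaBoost algorithm. First, common acoustic features from multiple consecutive audio frames are assembled into an "acoustic feature map", from which Haar-like acoustic features with up to hundreds of thousands of dimensions are extracted. Next, the AdaBoost.MH algorithm or the faster Random AdaBoost feature selection algorithm is used to select a combination of highly representative Haar-like acoustic feature patterns, which forms the 2D-Haar acoustic super feature vector. Finally, the commonalities and differences among the subclasses of a specific audio event are analyzed; the commonalities are extracted and the differences are de-emphasized, so that training yields a generalized audio event template that supports recognition across multiple subclasses and can accurately detect and locate specific audio events in an audio stream. Experimental results show that, compared with common acoustic features such as MFCC, PLP, and LPCC, the 2D-Haar acoustic super feature vector improves recognition accuracy by about 5%, training speed by 7 to 20 times, and recognition speed by 5 to 10 times. Under the optimal parameter configuration found by grid search, it achieves a precision of 93.38% and a recall of 95.03%, providing an accurate and fast approach to specific audio event recognition on large volumes of data.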
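
As a rough illustration of the "acoustic feature map" and Haar-like feature idea described in the abstract, the sketch below treats a block of consecutive frames as a small 2D array and evaluates one two-rectangle Haar-like feature through a summed-area table. It is not the authors' implementation. The layout (rows are frames, columns are per-frame coefficients such as MFCC dimensions) and the names `integral_image`, `rect_sum`, and `haar_temporal_edge` are assumptions made for this example.

```python
from itertools import accumulate


def integral_image(feature_map):
    """Build a summed-area table with an extra leading zero row and column.

    feature_map is assumed to be a list of frames, each a list of coefficients.
    """
    n_cols = len(feature_map[0])
    table = [[0.0] * (n_cols + 1)]
    for row in feature_map:
        row_cumsum = [0.0] + list(accumulate(row))
        table.append([above + cur for above, cur in zip(table[-1], row_cumsum)])
    return table


def rect_sum(table, top, left, height, width):
    """Sum of the rectangle starting at (top, left), using four table lookups."""
    bottom, right = top + height, left + width
    return (table[bottom][right] - table[top][right]
            - table[bottom][left] + table[top][left])


def haar_temporal_edge(table, top, left, height, width):
    """Two-rectangle feature: earlier half of the frames minus the later half.

    height must be even so the rectangle splits into two equal halves.
    """
    half = height // 2
    earlier = rect_sum(table, top, left, half, width)
    later = rect_sum(table, top + half, left, half, width)
    return earlier - later


# Toy feature map (hypothetical values): 4 frames x 3 coefficients.
toy_map = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9],
           [10, 11, 12]]
toy_table = integral_image(toy_map)
edge_value = haar_temporal_edge(toy_table, top=0, left=0, height=4, width=3)
```

Sliding rectangles of every size and position over a realistic map is what makes the number of candidate features grow so large, which is why a selection step follows.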

Received: 2014-02-09
Revised: 2014-05-14

A rapid audio event detection method by adopting 2D-Haar acoustic super feature vector
LÜ Ying, LUO Senlin, GAO Xiaofang, XIE Erman, PAN Limin. A rapid audio event detection method by adopting 2D-Haar acoustic super feature vector[J]. ACTA ACUSTICA, 2015, 40(5): 739-750. DOI: 10.15949/j.cnki.0371-0025.2015.05.015
Authors: LÜ Ying, LUO Senlin, GAO Xiaofang, XIE Erman, PAN Limin
Affiliation: 1. Information System and Security Countermeasures Experimental Center, Beijing Institute of Technology, Beijing 100081
Abstract: To address the accuracy and speed of audio event detection in mass-data audio processing tasks, a generic method for rapidly recognizing audio events based on the 2D-Haar acoustic super feature vector and AdaBoost is proposed. First, the method combines a number of consecutive audio frames into an "acoustic feature image". Second, it uses AdaBoost.MH or the faster Random AdaBoost feature selection algorithm to select highly representative 2D-Haar pattern combinations, which form the super feature vectors. Third, it analyzes the commonalities and differences between subcategories, extracting the common features and de-emphasizing the differing ones to obtain a generic audio event template. This template supports accurate identification across multiple subclasses and can accurately detect and locate a specific audio event in an audio stream. Experimental results show that the 2D-Haar acoustic super feature vector yields recognition accuracy about 5% higher than MFCC, PLP, LPCC, and other traditional acoustic features, makes training 7 to 20 times faster, and makes recognition 5 to 10 times faster. Under the optimal parameter configuration found by grid search, it achieves an average precision of 93.38% and an average recall of 95.03%. Overall, it provides an accurate and fast mass-data processing method for audio event detection.
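
The feature selection step can be pictured with a much simpler stand-in than the algorithms named in the abstract. The sketch below uses plain binary (discrete) AdaBoost with one-feature decision stumps, not AdaBoost.MH or Random AdaBoost, and records which feature each boosting round picks. The function names, the toy data, and the labels in {-1, +1} are all assumptions made for illustration only.

```python
import math


def stump_predict(value, threshold, polarity):
    """Predict +polarity when value >= threshold, otherwise -polarity."""
    return polarity if value >= threshold else -polarity


def best_stump(X, y, weights):
    """Exhaustively find the stump with the lowest weighted error.

    Returns (error, feature_index, threshold, polarity).
    """
    best = (float("inf"), None, None, None)
    for j in range(len(X[0])):
        for threshold in sorted({row[j] for row in X}):
            for polarity in (1, -1):
                error = sum(w for row, label, w in zip(X, y, weights)
                            if stump_predict(row[j], threshold, polarity) != label)
                if error < best[0]:
                    best = (error, j, threshold, polarity)
    return best


def adaboost_select(X, y, rounds):
    """Run discrete AdaBoost and return one (feature, threshold, polarity, alpha) per round."""
    n = len(X)
    weights = [1.0 / n] * n
    chosen = []
    for _ in range(rounds):
        error, j, threshold, polarity = best_stump(X, y, weights)
        # Clamp the error so the log below stays finite for a perfect stump.
        error = min(max(error, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - error) / error)
        chosen.append((j, threshold, polarity, alpha))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [w * math.exp(-alpha * label * stump_predict(row[j], threshold, polarity))
                   for row, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return chosen


# Toy data (hypothetical): each row holds three candidate Haar-like feature values.
X = [[0.3, 1.0, 5.0],
     [0.9, 2.0, 1.0],
     [0.1, 8.0, 4.0],
     [0.7, 9.0, 2.0]]
y = [-1, -1, 1, 1]
selected = [j for j, _, _, _ in adaboost_select(X, y, rounds=1)]
```

In this toy setup only the second column separates the two classes cleanly, so it is the feature the first round keeps. Concatenating the features kept over many rounds is one way to read the "super feature vector" construction, under the simplifications stated above.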