Similar literature (19 entries)
1.
张少康, 田德艳. 《应用声学》, 2019, 38(2): 267-272
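Traditional underwater acoustic target recognition and classification methods rely heavily on human-machine interaction and cannot meet the need of future unmanned underwater platforms to recognize and classify underwater acoustic targets intelligently. To address this problem, an intelligent recognition and classification method for underwater acoustic targets based on Mel-frequency cepstral coefficients (MFCC) is proposed. The method extracts MFCC features of underwater acoustic targets and uses a long short-term memory (LSTM) network to build an intelligent recognition and classification model. The method was validated on real underwater acoustic signals. The results show that it can recognize and classify target noise without relying on manually extracted features, and thus has intelligent recognition and classification capability.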
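
As a minimal sketch of the frequency warping behind MFCC features, the block below converts between Hz and mels and places filter centre frequencies evenly on the mel scale. The abstract does not say which mel formula or how many filters the authors used; the common 2595·log10(1 + f/700) variant and the function names here are assumptions for illustration only.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """Convert Hz to mels using the 2595*log10(1 + f/700) variant (an assumption)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(f_min: float, f_max: float, n_filters: int) -> list:
    """Centre frequencies (Hz) of n_filters spaced evenly in mels, strictly between f_min and f_max."""
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_filters)]
```

With this spacing the filters are dense at low frequencies and sparse at high frequencies, which is the property that makes mel-based features resemble auditory frequency resolution.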

2.
Two-step supervised-learning binaural sound source distance estimation in rooms
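A two-step supervised-learning algorithm for binaural sound source distance estimation in indoor environments is proposed. The algorithm estimates the source azimuth in advance to overcome the adverse effect of azimuth variation on distance estimation performance. In the first step, a deep neural network model estimates the azimuth of the source, and the binaural signals are grouped by azimuth. In the second step, an independent deep neural network model for each azimuth estimates the source distance, where the distance features are selected binaural cues and statistical properties of the binaural signals. In simulated and real environments, the distance estimation accuracy of the proposed two-step algorithm is roughly 3% to 5% higher than that of existing algorithms, and roughly 5% to 10% higher under various mismatched conditions. The experimental results show that using source azimuth information can effectively improve the performance of binaural sound source distance estimation.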
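
The two-step structure described above can be sketched as a simple dispatch: one model picks the azimuth class, and the distance model trained for that azimuth produces the estimate. Everything below is hypothetical scaffolding; the toy callables stand in for the deep neural networks, whose architecture and features the abstract does not specify.

```python
from typing import Callable, Dict, Sequence

Features = Sequence[float]


def two_step_distance(
    features: Features,
    azimuth_model: Callable[[Features], int],
    distance_models: Dict[int, Callable[[Features], float]],
) -> float:
    """Step 1: estimate the azimuth class. Step 2: apply that azimuth's distance model."""
    azimuth = azimuth_model(features)
    return distance_models[azimuth](features)


# Toy stand-ins (hypothetical, not trained networks).
def toy_azimuth(features: Features) -> int:
    """Pretend the sign of the first feature separates 0 degrees from 90 degrees."""
    return 0 if features[0] >= 0 else 90


toy_distance = {
    0: lambda f: 1.0 + abs(f[1]),
    90: lambda f: 2.0 + abs(f[1]),
}
```

Keeping one distance model per azimuth means each model only has to learn distance cues for a single direction, which is the motivation the abstract gives for estimating azimuth first.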

3.
赵乾坤, 刘峰, 梁秀兵, 汪涛, 宋永强. 《应用声学》, 2023, 42(5): 1033-1041
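Passive recognition of underwater acoustic targets is one of the research hotspots in underwater acoustic signal processing. Irregular noise interference in the marine environment makes passive recognition techniques based on traditional methods perform poorly in practical application scenarios. This paper adopts a ship-radiated-noise target recognition method based on a Time Delay Neural Network (TDNN) model. The method models the acoustic fingerprint features of the target using its short-time stationarity and long-time correlation: a Mel spectrogram extracts primary features of the target signal, a deep learning model that combines an attention mechanism with a time delay neural network extracts high-level features, and cosine similarity is finally used to assign targets to classes. Tested on the ShipsEar dataset and on self-collected data, the method achieves recognition accuracies of 79.2% and 73.9% respectively, which demonstrates its effectiveness.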
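
The final cosine-similarity step can be illustrated with a short sketch. It assumes each class is represented by one reference embedding and that a test embedding is assigned to the class with the highest cosine similarity; the abstract does not describe the scoring details, so the function names and the single-template design are assumptions.

```python
import math
from typing import Dict, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(embedding: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the label whose template is most similar to the embedding."""
    return max(templates, key=lambda label: cosine_similarity(embedding, templates[label]))
```

Because cosine similarity ignores vector length, this scoring depends on the direction of the embedding rather than on signal level.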

4.
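Underwater acoustic target recognition has long been one of the key problems in underwater acoustics. Deep learning methods can solve target recognition problems effectively, but the scarcity of underwater acoustic samples limits their application. This paper proposes a deep-learning target recognition method for underwater acoustic signals based on data augmentation. The method takes the Mel power spectrum as the network input feature, applies transformations such as stretching and masking to the original signal in the time domain and the time-frequency domain to expand the data and improve generalization, and finally uses an improved VGG network model for target classification. Experimental results show that the underwater target recognition accuracy of the proposed method (95.2%) is better than that of four other methods used for comparison, which demonstrates that both the proposed network model and the data augmentation method help improve target classification performance.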
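
A minimal sketch of the masking part of such augmentation, assuming a Mel power spectrogram stored as a NumPy array of shape (frequency bins, time frames). It zeroes one random frequency band and one random time span. The mask widths, the fill value of zero, and the function name are assumptions; the abstract does not give the authors' settings.

```python
import numpy as np


def mask_spectrogram(spec, max_freq_width, max_time_width, rng):
    """Return a copy of spec with one random frequency band and one random time span set to zero."""
    n_freq, n_time = spec.shape
    if max_freq_width > n_freq or max_time_width > n_time:
        raise ValueError("mask width exceeds spectrogram size")
    out = spec.copy()
    # Frequency mask: width in [0, max_freq_width], start chosen so the band fits.
    f_width = int(rng.integers(0, max_freq_width + 1))
    f_start = int(rng.integers(0, n_freq - f_width + 1))
    out[f_start:f_start + f_width, :] = 0.0
    # Time mask: width in [0, max_time_width], start chosen so the span fits.
    t_width = int(rng.integers(0, max_time_width + 1))
    t_start = int(rng.integers(0, n_time - t_width + 1))
    out[:, t_start:t_start + t_width] = 0.0
    return out
```

Calling the function several times with `rng = np.random.default_rng(seed)` yields differently masked copies of one recording, which is how masking enlarges a small training set.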

5.
郭洋, 周翊, 管鲁阳, 鲍明. 《应用声学》, 2019, 38(1): 8-15
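To address the robustness of continuous recognition during target motion in helicopter detection, a helicopter acoustic feature extraction and recognition framework based on a composite deep neural network is proposed. The composite network combines a convolutional neural network and a long short-term memory network in a parallel structure to optimize helicopter acoustic features and perform helicopter type recognition. The convolutional neural network is modified to suit the characteristics of helicopter acoustic signals, so that the composite network refines the acoustic feature representation on the basis of the short-time spectrum and extracts correlation information between successive frames, making up for the inability of common acoustic target recognition methods to fully exploit the time history of the target signal. Tests on real field experiment data show that, compared with traditional recognition methods, the algorithm significantly improves both the robustness of continuous recognition and the recognition accuracy once the helicopter enters the effective detection range.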

6.
Off-grid binaural sound source localization based on sparse representation and feature weighting
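In traditional binaural sound source localization methods based on a head-related transfer function (HRTF) database, the localized angle is usually restricted to the discrete measurement points of the database. When the azimuth spacing of the HRTF measurements is large, the performance of such algorithms degrades significantly; this is the typical off-grid problem. This paper proposes an off-grid binaural sound source localization algorithm based on weighted wideband sparse Bayesian learning. The algorithm first builds a sparse representation model of the off-grid binaural signals, then weights each frequency bin with a binaural coherent-to-diffuse energy ratio feature to reduce the influence of noise and reverberation, and finally estimates the azimuth of the off-grid source by weighted wideband sparse Bayesian learning. Experimental results show that the algorithm achieves high localization accuracy and robustness in a variety of complex acoustic environments and, in particular, improves localization performance under off-grid conditions.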

7.
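To solve the small-sample pattern recognition problem for underwater acoustic targets and effectively improve recognition accuracy in complex marine environments, an intelligent recognition method for underwater acoustic targets is proposed, based on empirical mode decomposition (EMD), the feature distance evaluation technique (FDET), and combined support vector machines (CSVMs). First, signal processing methods such as filtering, Hilbert envelope demodulation, and EMD are used to preprocess the radiated noise signals of underwater acoustic targets, and seven feature sets containing time-domain and frequency-domain statistical features of the original and preprocessed signals are extracted. Then, FDET selects seven sensitive feature sets from the original feature sets. Finally, the seven sensitive feature sets are fed into seven support vector machine classifiers, and a genetic algorithm combines the results of the seven classifiers to form the CSVMs classifier, which performs intelligent recognition of underwater acoustic targets. The method was applied to the recognition of ships and other underwater acoustic targets. The results show that its recognition performance is better than that of a single SVM classifier, and that the sensitive feature sets obtained by FDET clearly improve recognition accuracy.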

8.
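Feature extraction is a key step in underwater passive sonar target classification and recognition. An auditory-domain tensor feature extraction method based on the Patterson-Holdsworth auditory cochlear model is proposed. The impulse responses of the cochlear model filters are treated as basis functions for signal decomposition. The center frequencies of the channels are determined on either the nonlinear scale of the auditory model or a conventional linear scale; the gain and bandwidth of each channel are then calculated, and the order and phase parameters of the impulse response are quantized to obtain the decomposition basis. Following the principle of signal decomposition, a third-order tensor feature of size (number of channels) × (number of orders) × (number of phases) is obtained, and underwater passive sonar targets are classified and recognized by computing the similarity between the tensor features of test samples and those of training samples. Classification experiments on passive sonar targets recorded at sea show that the extracted tensor features classify well, that dividing center frequencies on the equivalent rectangular bandwidth (ERB) scale of the auditory model outperforms the linear scale, and that the method can improve the target indication capability of passive sonar.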
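
To make the ERB-scale channel placement concrete, the sketch below spaces center frequencies evenly on the ERB-rate scale and computes the bandwidth of each channel. It assumes the widely used Glasberg and Moore formulas, ERB(f) = 24.7(4.37f/1000 + 1) Hz and ERB-rate = 21.4·log10(4.37f/1000 + 1); the abstract does not state which formulas or channel counts the authors used, and the function names are illustrative.

```python
import math


def erb_bandwidth(f_hz: float) -> float:
    """Equivalent rectangular bandwidth in Hz at center frequency f_hz (Glasberg-Moore form, assumed)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_rate(f_hz: float) -> float:
    """Map Hz to the ERB-rate scale."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def erb_rate_to_hz(erb_rate: float) -> float:
    """Inverse of hz_to_erb_rate."""
    return (10.0 ** (erb_rate / 21.4) - 1.0) * 1000.0 / 4.37


def erb_spaced_centers(f_min: float, f_max: float, n_channels: int) -> list:
    """Center frequencies (Hz) evenly spaced on the ERB-rate scale, endpoints included."""
    if n_channels < 2:
        raise ValueError("need at least two channels")
    e_lo, e_hi = hz_to_erb_rate(f_min), hz_to_erb_rate(f_max)
    step = (e_hi - e_lo) / (n_channels - 1)
    return [erb_rate_to_hz(e_lo + i * step) for i in range(n_channels)]
```

Compared with a linear division, this spacing places more channels at low frequencies, where ship-radiated line spectra are typically concentrated.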

9.
谢菠荪, 刘路路, 江建亮. 《声学学报》, 2021, 46(6): 1223-1233
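One goal of binaural reproduction is to create the perception of virtual sources at different directions and distances in headphone reproduction. This paper studies a simplified signal processing method for reproducing the direction and distance information of free-field virtual sources with dynamic binaural Ambisonics. The method has two steps. The first step, based on the spherical harmonic decomposition of the target sound field, synthesizes the signals that reconstruct the target sound field order by order in near-field Ambisonics reproduction over loudspeakers. The second step uses virtual loudspeaker reproduction: dynamic head-related transfer function (HRTF) filtering converts the Ambisonics loudspeaker signals into binaural signals, which are reproduced over headphones. The influence of the order of dynamic binaural Ambisonics on localization is further studied to provide a basis for simplifying the signal processing. Analysis of the binaural sound pressures produced in reproduction shows that 5th-order dynamic binaural Ambisonics is sufficient to provide the important information for auditory directional localization and distance perception. Psychoacoustic experiments also show that, combined with the loudness factor related to source distance, 5th-order dynamic binaural Ambisonics can produce the auditory perception of free-field virtual sources at different directions and at different near-field distances below 1.0 m. The method requires only far-field non-individualized HRTF processing for 48 uniformly distributed spatial directions at a fixed distance, which simplifies the signal processing.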
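
As a small worked example of why the order matters for processing cost, a full-sphere spherical harmonic expansion truncated at order N has (N + 1)^2 components. The sketch below computes that count; relating it to the 48 virtual loudspeaker directions is an inference made here, not a statement from the abstract.

```python
def ambisonic_channel_count(order: int) -> int:
    """Number of spherical harmonic components in a full-sphere expansion up to the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


# A 5th-order expansion has 36 components, fewer than the 48 directions quoted above.
fifth_order_channels = ambisonic_channel_count(5)
```

Each extra order adds 2N + 1 components, so holding the order at 5 rather than going higher keeps the number of signals to synthesize and filter modest.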

10.
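A speech recognition model based on joint modeling of frame features and segment features is proposed. The model uses segment features that describe spectral parameter trajectories, which explicitly models the inter-frame correlation of the speech signal at the segment scale. It uses a segment-feature-dependent non-stationary time series generation model to capture the correlation between segment features and frame features, and at the frame scale it implicitly models inter-frame correlation through a parameterized mean trajectory function. The paper presents a segmentation algorithm based on optimizing the joint statistical distance of frame and segment features, and a model parameter estimation algorithm with embedded EM iterations. Recognition experiments on speaker-independent isolated Mandarin finals and on multi-speaker basic Mandarin syllables show that the model outperforms both the standard HMM and the trended HMM.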

11.
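Properly equalizing the headphone-to-eardrum transfer function can effectively improve headphone sound reproduction. The amplitude peaks and notches caused by the filtering effects of the pinna and ear canal aid auditory perception, and amplitude equalization that targets a flat magnitude response cannot preserve appropriate peaks and notches. This paper proposes a smoothing method for the headphone-to-eardrum transfer function based on roex filters and Mel-frequency cepstral coefficients. The method models the perceptual characteristics of human hearing and smooths the transfer function so that the equalized magnitude response keeps the corresponding peaks and notches and over-equalization of the peaks and notches is avoided. Experimental results show that amplitude equalization with smoothing of the headphone-to-eardrum transfer function significantly improves headphone timbre, and that equalization based on Mel-frequency cepstral coefficient smoothing gives the most significant improvement.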

12.
There are many approaches to achieving high-performance speech enhancement. Modeling the human auditory system is a good approach, since human beings can focus on target speech under concurrent speech conditions. One example of a binaural model is the time domain binaural model. However, this model has a high computational cost because the algorithm is based on auto-correlation, which is computationally intensive. Another example is the frequency domain binaural model proposed by Nakashima et al. [Nakashima H, Chisaki Y, Usagawa T, Ebata M. Frequency domain binaural model based on interaural phase and level differences. Acoust Sci Technol 2003;24(4):172-8]. Since the frequency domain binaural model uses the fast Fourier transform, its computational cost is much lower than that of the time domain binaural model. Therefore, it is not difficult to perform real-time processing using recent hardware such as digital signal processors and even laptop personal computers. However, the quality of the segregated sound obtained using the frequency domain binaural model depends on system parameters such as the frequency resolution and the frame shift length for overlap-add in the time domain. This paper introduces the construction of a prototype hearing assistant system based on the frequency domain binaural model. The detailed implementation techniques and parameter tuning are described. The proposed system runs in real time after parameter tuning. The directional attenuation levels, that is, the directivity patterns of the proposed system, are measured. Finally, it is shown that the prototype can extract sounds coming from specific directions in real time.
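
The interaural phase and level differences that the frequency domain binaural model relies on can be computed per frequency bin from one FFT frame of each ear signal. The block below is a minimal sketch under assumed choices (Hann window, one frame, a small `eps` to avoid division by zero); it is not the implementation of Nakashima et al., and the function name is hypothetical.

```python
import numpy as np


def interaural_cues(left, right, eps=1e-12):
    """Return (IPD in radians, ILD in dB) per FFT bin for one frame of left/right ear signals."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    if left.shape != right.shape:
        raise ValueError("left and right frames must have the same length")
    window = np.hanning(len(left))
    spec_l = np.fft.rfft(left * window)
    spec_r = np.fft.rfft(right * window)
    # Phase of the cross-spectrum: positive when the left channel leads in phase.
    ipd = np.angle(spec_l * np.conj(spec_r))
    # Level ratio in dB: positive when the left channel is stronger.
    ild = 20.0 * np.log10((np.abs(spec_l) + eps) / (np.abs(spec_r) + eps))
    return ipd, ild
```

A direction-selective system could then keep only the bins whose cues match the target direction; the frame length sets the frequency resolution and the frame shift sets the overlap-add behaviour, which are the tuning parameters the abstract mentions.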

13.
This article presents a quantitative binaural signal detection model which extends the monaural model described by Dau et al. [J. Acoust. Soc. Am. 99, 3615-3622 (1996)]. The model is divided into three stages. The first stage comprises peripheral preprocessing in the right and left monaural channels. The second stage is a binaural processor which produces a time-dependent internal representation of the binaurally presented stimuli. This stage is based on the Jeffress delay line extended with tapped attenuator lines. Through this extension, the internal representation codes both interaural time and intensity differences. In contrast to most present-day models, which are based on excitatory-excitatory interaction, the binaural interaction in the present model is based on contralateral inhibition of ipsilateral signals. The last stage, a central processor, extracts a decision variable that can be used to detect the presence of a signal in a detection task, but could also derive information about the position and the compactness of a sound source. In two accompanying articles, the model predictions are compared with data obtained with human observers in a great variety of experimental conditions.

14.
A scheme for analyzing the timbre of spatial sound with a binaural auditory model is proposed, and Ambisonics is taken as an example for analysis. Ambisonics is a spatial sound system based on physical sound field reconstruction. The errors and timbre colorations in the final reconstructed sound field depend on the spatial aliasing errors in both the recording and reproduction stages of Ambisonics. The binaural loudness level spectra in Ambisonics reconstruction are calculated using Moore's revised loudness model and then compared with the result for a real sound source, so as to evaluate the timbre coloration in Ambisonics quantitatively. The results indicate that, in the case of ideal independent signals, the high-frequency limit and the radius of the region without perceived timbre coloration increase with the order of Ambisonics. On the other hand, in the case of recording with a microphone array, once the high-frequency limit of the microphone array exceeds that of sound field reconstruction, array recording has little influence on the binaural loudness level spectra, and thus on the timbre of the final reconstruction, up to the high-frequency limit of reproduction. Based on the binaural auditory model analysis, a scheme for optimizing the design of Ambisonics recording and reproduction is also suggested. The subjective experiment yields results consistent with those of the binaural model, which verifies the effectiveness of the model analysis.

15.
This and two accompanying articles [Breebaart et al., J. Acoust. Soc. Am. 110, 1074-1088 (2001); 110, 1105-1117 (2001)] describe a computational model for the signal processing in the binaural auditory system. The model consists of several stages of monaural and binaural preprocessing combined with an optimal detector. In the present article the model is tested and validated by comparing its predictions with experimental data for binaural discrimination and masking conditions as a function of the spectral parameters of both masker and signal. For this purpose, the model is used as an artificial observer in a three-interval, forced-choice adaptive procedure. All model parameters were kept constant for all simulations described in this and the subsequent article. The effects of the following experimental parameters were investigated: center frequency of both masker and target, bandwidth of masker and target, the interaural phase relations of masker and target, and the level of the masker. Several phenomena that occur in binaural listening conditions can be accounted for. These include the wider effective binaural critical bandwidth observed in band-widening NoS(pi) conditions, the different masker-level dependence of binaural detection thresholds for narrow- and for wide-band maskers, the unification of IID and ITD sensitivity with binaural detection data, and the dependence of binaural thresholds on frequency.

16.
刘阳, 谢菠荪. 《声学学报》, 2015, 40(5): 717-729
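A general method for analyzing the timbre of spatial sound with a binaural auditory model is proposed, with Ambisonics analyzed as an example. Ambisonics is a spatial sound system based on physical sound field reconstruction; the errors and timbre changes in the final reconstructed sound field are caused jointly by the spatial aliasing errors of microphone recording and of reproduction. The binaural loudness level spectra of the Ambisonics reconstructed sound field are calculated with Moore's revised binaural loudness model and compared with those of the target sound field, so as to evaluate the timbre change of the reconstructed field quantitatively. The results show that, with ideal recorded signals, the upper frequency limit and the size of the region for reproduction without timbre change increase with the Ambisonics order. For recording with a microphone array, as long as the upper frequency limit of the array exceeds that of Ambisonics reproduction, the influence of the array's spatial aliasing error on the final reconstructed sound field and its perceived timbre is negligible below the upper frequency limit of reproduction. On this basis, an optimized design method for Ambisonics systems that jointly considers recording and reproduction performance is proposed. Psychoacoustic experiments give results consistent with the binaural auditory model, which also verifies the validity of the model analysis.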

17.
Although the speech transmission index (STI) is a well-accepted and standardized method for objective prediction of speech intelligibility in a wide range of environments and applications, it is essentially a monaural model. Advantages of binaural hearing in speech intelligibility are disregarded. In specific conditions, this leads to considerable mismatches between subjective intelligibility and the STI. A binaural version of the STI was developed based on interaural cross correlograms, which shows a considerably improved correspondence with subjective intelligibility in dichotic listening conditions. The new binaural STI is designed to be a relatively simple model, which adds only a few parameters to the original standardized STI and changes none of the existing model parameters. For monaural conditions, the outcome is identical to the standardized STI. The new model was validated on a set of 39 dichotic listening conditions, featuring anechoic, classroom, listening room, and strongly echoic environments. For these 39 conditions, speech intelligibility [consonant-vowel-consonant (CVC) word score] and binaural STI were measured. On the basis of these conditions, the relation between binaural STI and CVC word scores closely matches the STI reference curve (standardized relation between STI and CVC word score) for monaural listening. A better-ear STI appears to perform quite well in relation to the binaural STI model; the monaural STI performs poorly in these cases.

18.
A new voice conversion method in the cepstrum eigenspace, based on a structured Gaussian mixture model, is proposed for non-parallel corpora without joint training. For each speaker, the cepstrum features of speech are extracted and mapped to the eigenspace formed by the eigenvectors of their scatter matrix, and the Structured Gaussian Mixture Model in the EigenSpace (SGMM-ES) is then trained. The source and target speakers' SGMM-ES are matched according to the Acoustic Universal Structure (AUS) principle to obtain the spectrum transform function. Experimental results show that the speaker identification rate of the converted speech reaches 95.25% and the average cepstrum distortion is 1.25, which are 0.8% and 7.3% higher, respectively, than the performance of the SGMM method. ABX and MOS evaluations indicate that the conversion performance is quite close to that of the traditional method under the parallel corpora condition. The results show that the eigenspace-based structured Gaussian mixture model is effective for voice conversion with non-parallel corpora.

19.
It is shown that a simple cross-correlation model is not adequate to explain both binaural masking level difference (MLD) and spatial selective attention. The reason is that for a low-intensity signal in the NoS(pi) condition, the maximal activity in the binaural analyzer as a function of interaural delay in a single spectral channel is independent of signal intensity. On the other hand, if detection ability is associated with the isolation of tonically firing units, MLD is simply explained as the increase in firing synchronization as a function of the signal's interaural phase difference (IPD). Quantitative results are presented based on numerical solutions of the model.
