Similar articles
Found 20 similar articles; search took 78 ms
1.
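To address the low recognition rate of audio retrieval in the presence of background sound and noise, an audio fingerprint retrieval algorithm based on silence masking and frequency-domain segmentation is proposed. First, endpoint detection is used to preprocess the speech: the valid speech frames are recombined, and fingerprint features are extracted from them using the energy differences between adjacent sub-bands, which effectively resolves the poor robustness of fingerprint features taken from silent frames. Then, during retrieval matching, the audio fingerprints are segmented and weighted over different frequency intervals according to how different audio signals are distributed in the frequency domain, so that the similarity between the template and the query audio is computed more precisely. Experiments show that, compared with the Philips baseline algorithm, the proposed algorithm doubles retrieval speed, and on data sets corrupted by background sound and similar interference its average precision and recall improve by 17.94 and 4.66 percentage points (absolute), respectively; compared with the latest Philips algorithm, average precision and recall improve by 13.68 and 2.45 percentage points (absolute), respectively.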
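The fingerprinting and weighted-matching steps described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the sub-band energies have already been computed for the non-silent frames, uses the sign-of-energy-difference bit rule commonly associated with the Philips fingerprint, and treats the per-band `weights` as hypothetical values, since the abstract does not give the actual frequency intervals or weights.

```python
from typing import List, Sequence


def fingerprint_bits(energies: Sequence[Sequence[float]]) -> List[List[int]]:
    """Derive fingerprint bits from sub-band energies.

    energies[n][m] is the energy of frame n in sub-band m (assumed to be
    computed elsewhere, after silent frames have been dropped).
    Bit (n, m) is 1 when the adjacent-band energy difference grows from
    frame n-1 to frame n, otherwise 0.
    """
    bits = []
    for n in range(1, len(energies)):
        prev, cur = energies[n - 1], energies[n]
        row = []
        for m in range(len(cur) - 1):
            diff = (cur[m] - cur[m + 1]) - (prev[m] - prev[m + 1])
            row.append(1 if diff > 0 else 0)
        bits.append(row)
    return bits


def weighted_bit_error(a, b, weights) -> float:
    """Weighted fraction of differing bits between two fingerprints.

    weights[m] is a hypothetical weight for bit position (frequency band) m.
    """
    total = 0.0
    err = 0.0
    for row_a, row_b in zip(a, b):
        for m, (x, y) in enumerate(zip(row_a, row_b)):
            total += weights[m]
            if x != y:
                err += weights[m]
    return err / total if total else 0.0
```

A lower weighted bit error means a closer match between the template and the query; bands that are assumed to be more reliable would receive larger weights.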

2.
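Segment-based real-time audio retrieval method   Total citations: 1, self-citations: 0, citations by others: 1
A segment-based real-time audio retrieval method is proposed, and control strategies for real-time retrieval are discussed. The method divides the retrieval target into a sequence of segments and uses a retrieval window to control which segments take part in retrieval. In multi-target retrieval, audio class information is used to speed up the search. Experiments show that the method is fast, controllable, and well suited to real-time use, and that it is robust to missing data, with recall and precision reaching 100% and 99.7%, respectively; classifying the audio effectively speeds up multi-target retrieval, and the audio classification method achieves an average accuracy of 95.7%. The method addresses the long response lag in audio retrieval and the linear decline in retrieval speed as the target length increases.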

3.
4.
5.
Sound indexing and segmentation of digital documents, especially on the internet and in digital libraries, are very useful for simplifying and accelerating multimedia document retrieval. One can imagine retrieving multimedia files not only by keywords but also by the semantic content of their speech. The main difficulty of this operation lies in the parameterization and modelling of the sound track and in the discrimination of speech, music and noise segments. In this paper, we present a Speech/Music/Noise indexing interface designed for audio discrimination in multimedia documents. The program uses a statistical method based on ANN and HMM classifiers. After pre-emphasis and segmentation, the audio segments are analysed by cepstral acoustic analysis. The developed system was evaluated on a database consisting of songs with Arabic speech segments under several noisy environments.

6.
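Semantic description of noise based on auditory perception is a fundamental problem in research on noise sound quality, yet existing studies have not linked semantic descriptions to physical information such as the noise source, spectral characteristics, and the operating state of the product. This paper conducted subjective evaluation experiments on three typical groups of noise: aircraft cabin noise, vehicle noise, and air purifier noise. The semantic spaces of the three groups were described through multidimensional scaling and principal component analysis, the descriptors of the different noise types were analyzed systematically, and the links between descriptors and the physical attributes of the noise were explained. The study found that aircraft cabin noise, vehicle noise, and air purifier noise can be described by 4-, 4-, and 3-dimensional semantic spaces, respectively; that the semantic descriptions of different noise types have both common and individual features, with the main dimension of all three groups related to the sense of noisiness, whereas the individual descriptors are closely tied to the physical attributes of the sound source; and that sound quality modeling and its applications should consider the influence of both common and individual descriptors on auditory perception, with targeted measures taken to improve product sound quality. The paper describes and analyzes noise characteristics semantically from the standpoint of auditory perception, and the results can support research on product sound quality and noise control.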

7.
8.
9.
10.
Heart sound signals reflect valuable information about heart condition. Previous studies have suggested that the information contained in single-channel heart sound signals can be used to detect coronary artery disease (CAD). However, the accuracy achieved with a single-channel heart sound signal is not satisfactory. This paper proposed a method based on multi-domain feature fusion of multi-channel heart sound signals, in which entropy features and cross entropy features are also included. A total of 36 subjects were enrolled in the data collection, including 21 CAD patients and 15 non-CAD subjects. For each subject, five-channel heart sound signals were recorded synchronously for 5 min. After data segmentation and quality evaluation, 553 samples were left in the CAD group and 438 samples in the non-CAD group. The time-domain, frequency-domain, entropy, and cross entropy features were extracted. After feature selection, the optimal feature set was fed into a support vector machine for classification. The results showed that, from single-channel to multi-channel, the classification accuracy increased from 78.75% to 86.70%. After adding entropy features and cross entropy features, the classification accuracy increased further to 90.92%. The study indicated that the method based on multi-domain feature fusion of multi-channel heart sound signals could provide more information for CAD detection, and that entropy features and cross entropy features played an important role in it.

11.
A large number of vocalization studies on mammals are based on time-frequency analysis of the produced sounds. The patterns extracted from the time-frequency representations determine the classification into the different sound categories. However, there are situations where this pattern-related recognition does not allow a precise characterization of the vocalization to be obtained. In these situations, a feasible alternative, which can help by giving the dominant component of the sound, is to measure the strength of the tonal and pulsed constituent units. In this work, the use of a ratio of pulsed to tonal strength is proposed to objectively measure the distribution of energy between these two components. This pulsed to tonal ratio (PTR) can be computed with the aid of the discrete cosine transform. It is demonstrated that the PTR can be obtained with a relatively simple expression without having to go through the time-frequency representation. This work presents examples that show how the PTR can be used to distinguish between two very similar Beluga whale sounds and how to dynamically track the power distribution between the pulsed and tonal components in non-stationary signals.

12.
杨帆  杨杰朝 《应用声学》2014,33(6):554-559
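Many researchers already use LabVIEW-based virtual audio signal analysis systems in their work, and LabVIEW is becoming a principal tool for measuring and analyzing audio signals. However, existing results all use physical acoustic units such as frequency and do not also provide the cent notation commonly used in musical practice, which limits the general applicability of the results. This paper analyzes the conversion between frequency and cents and uses LabVIEW as the platform to write a frequency-to-cent conversion application. The results show that the LabVIEW-based application computes stably and is widely applicable. The design makes it easy to convert physical acoustic units into musical acoustic units, so that research results can be extended to a broader range of acoustic research.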
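The frequency-to-cent conversion this abstract refers to follows the standard definition of the cent: an interval of c cents between frequencies f and f_ref satisfies c = 1200 * log2(f / f_ref). The sketch below is written in Python rather than LabVIEW, and the function names and the 440 Hz default reference are illustrative assumptions, not taken from the paper.

```python
import math


def freq_to_cents(freq_hz: float, ref_hz: float = 440.0) -> float:
    """Interval in cents from ref_hz up to freq_hz (negative if below)."""
    if freq_hz <= 0 or ref_hz <= 0:
        raise ValueError("frequencies must be positive")
    return 1200.0 * math.log2(freq_hz / ref_hz)


def cents_to_freq(cents: float, ref_hz: float = 440.0) -> float:
    """Frequency that lies the given number of cents above ref_hz."""
    if ref_hz <= 0:
        raise ValueError("reference frequency must be positive")
    return ref_hz * 2.0 ** (cents / 1200.0)
```

One octave (a frequency ratio of 2) corresponds to 1200 cents, and one equal-tempered semitone to 100 cents.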

13.
Xi Lu  Yiping Cao  Pei Lu  Aiping Zhai 《Optik》2012,123(8):697-702
In this paper, the Arnold transformation and the double random-phase encoding technique, both widely used in digital image information hiding, are applied to digital audio information hiding. The digital audio is transformed into a 2-D image called a sound map; the sound map is then divided into many windows, and each window is encrypted using the Arnold transformation. Finally, the sound map is re-encrypted using the double random-phase encoding technique. This method offers advantages for digital audio information hiding: improved security and high attack immunity.
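The Arnold scrambling step can be illustrated on a single square window. This is a sketch under stated assumptions: it uses the common form of the Arnold (cat) map, (x, y) -> ((x + y) mod N, (x + 2y) mod N), on an N x N window, and the helper `to_square` is a hypothetical way of laying audio samples out as a sound map. The abstract does not specify the window size, the number of iterations, or the sample layout, and the double random-phase encoding stage is not shown.

```python
def to_square(samples, n):
    """Hypothetical helper: lay the first n*n samples out row by row."""
    if len(samples) < n * n:
        raise ValueError("not enough samples for an n x n window")
    return [list(samples[r * n:(r + 1) * n]) for r in range(n)]


def arnold(window, rounds=1):
    """Scramble a square window with the Arnold map, applied `rounds` times."""
    n = len(window)
    for _ in range(rounds):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                out[(x + y) % n][(x + 2 * y) % n] = window[x][y]
        window = out
    return window


def arnold_inverse(window, rounds=1):
    """Undo `rounds` applications of the Arnold map."""
    n = len(window)
    for _ in range(rounds):
        out = [[0] * n for _ in range(n)]
        for x in range(n):
            for y in range(n):
                # Inverse of (x, y) -> (x + y, x + 2y) is (2x - y, y - x) mod n.
                out[(2 * x - y) % n][(y - x) % n] = window[x][y]
        window = out
    return window
```

Because the map is a bijection on the N x N grid, the number of rounds can act as part of the key: the receiver needs the same value to run `arnold_inverse`.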

14.
A method for analyzing and displaying electroglottographic (EGG) signals (and their first derivative, DEGG) is introduced: the electroglottographic wavegram ("wavegram" hereafter). To construct a wavegram, the time-varying fundamental frequency is measured and consecutive individual glottal cycles are identified. Each cycle is locally normalized in duration and amplitude, the signal values are encoded by color intensity and the cycles are concatenated to display the entire voice sample in a single image, similar to sound spectrography. The wavegram provides an intuitive means for quickly assessing vocal fold contact phenomena and their variation over time. Variations in vocal fold contact appear here as a sequence of events rather than single phenomena, taking place over a certain period of time, and changing with pitch, loudness and register. Multiple DEGG peaks are revealed in wavegrams to behave systematically, indicating subtle changes of vocal fold oscillatory regime. As such, EGG wavegrams promise to reveal more information on vocal fold contacting and de-contacting events than previous methods.
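The per-cycle normalization behind the wavegram can be sketched as below. The abstract does not say how cycles are resampled or scaled, so this example assumes that the cycle boundaries are already known, that linear interpolation is used to bring every cycle to a common length, and that amplitudes are min-max scaled to [0, 1]; all function names are illustrative.

```python
def normalize_cycle(cycle, n_points):
    """Resample one cycle to n_points samples and scale it to [0, 1]."""
    if len(cycle) < 2 or n_points < 2:
        raise ValueError("need at least two samples in and out")
    resampled = []
    for k in range(n_points):
        pos = k * (len(cycle) - 1) / (n_points - 1)
        i = int(pos)
        frac = pos - i
        nxt = cycle[min(i + 1, len(cycle) - 1)]
        resampled.append(cycle[i] * (1.0 - frac) + nxt * frac)
    lo, hi = min(resampled), max(resampled)
    if hi == lo:
        return [0.0] * n_points
    return [(v - lo) / (hi - lo) for v in resampled]


def wavegram(signal, boundaries, n_points):
    """One normalized column per glottal cycle.

    boundaries lists the sample indices where cycles start; the last entry
    marks the end of the final cycle. Each column would be drawn as a strip
    of color intensities, with time running across the columns.
    """
    columns = []
    for start, end in zip(boundaries, boundaries[1:]):
        columns.append(normalize_cycle(signal[start:end], n_points))
    return columns
```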

15.
李晗  陈克安  田旭华 《应用声学》2016,35(4):294-301
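Using the admittance function of a plate structure as the link, a relationship is established between impact sound signal features and sound source characteristics, yielding features closely related to sound source attributes for use in target classification. For a struck rectangular plate simply supported on all four edges, a signal parameter identification algorithm is used to obtain 6-dimensional admittance features related to the physical attributes of the sound source, and 80-dimensional timbre features are extracted from the impact sound samples. Correlation analysis between the timbre features and the admittance features yields a set of signal features related to the physical attributes of the sound source. Classification with a BP neural network shows that the classification performance is the best in its group when the subset of signal features related to a specific physical attribute of the sound source is used.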

16.
This paper describes field measurements to assess innovative correlation techniques for the study of meteorological and topographical effects on sound propagation. To take advantage of the properties of coded signals in a time-varying system, the correlation signal is produced by the modulation of a code sequence onto an acoustic carrier. An established method of increasing signal-to-noise ratio is to use correlation techniques with maximum length sequences. However, this standard method is restricted in its use outdoors because of the time-variant nature of the atmosphere. On the other hand, the correlation properties of a signal formed by modulating a code sequence directly onto a carrier may be exploited in a time-varying environment. An experiment is described in which the correlation properties of the spread spectrum signal are demonstrated and are used to calculate accurate times of flight that compare well with sonic anemometer measurements of speed of sound. The results illustrate that an acoustical spread spectrum system can provide significantly improved ways of measuring sound propagation outdoors.
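The correlation idea can be illustrated with a short maximum length sequence and a delay search. This is a baseband sketch only: it omits the acoustic carrier modulation the paper relies on, uses a 15-chip sequence from a 4-bit shift register chosen purely for illustration, and the sample rate and path length in the usage note are made-up values.

```python
def mls_15():
    """15-chip maximum length sequence (+1/-1) from a 4-bit shift register."""
    reg = [1, 0, 0, 0]
    chips = []
    for _ in range(15):
        chips.append(1 if reg[3] else -1)
        new_bit = reg[3] ^ reg[2]  # feedback taps of the register
        reg = [new_bit] + reg[:3]
    return chips


def estimate_delay(received, code):
    """Lag (in samples) at which the code correlates best with the signal."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(len(received) - len(code) + 1):
        val = sum(c * received[lag + i] for i, c in enumerate(code))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag


def time_of_flight(delay_samples, sample_rate_hz):
    """Convert a delay in samples to seconds."""
    return delay_samples / sample_rate_hz
```

For example, with a hypothetical sample rate of 1000 Hz and a 3.43 m path, an estimated delay of 10 samples gives a time of flight of 0.01 s and a speed of sound of 343 m/s.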

17.
Experiments were performed to study the production of broadband sound in confined pulsating jets through orifices with a time-varying area. The goal was to better understand broadband sound generation at the human glottis during voicing. The broadband component was extracted from measured sound signals by the elimination of the periodic component through ensemble averaging. Comparisons were made between the probability density functions of the broadband sound in pulsating jets and of comparable stationary jets. The results indicate that the quasi-steady approximation may be valid for the broadband component when the turbulence is well established and the turbulence kinetic energy is comparatively large. A wavelet analysis of the broadband sound showed that random sound production was modulated at the driving frequency. Two distinct sound production peaks were observed during one cycle, presumably associated firstly with jet formation and secondly with flow deceleration during orifice closing. Most high-frequency sound was produced during the closing phase. Deviations from quasi-steady behavior were observed. As the driving frequency increased, sound production during the opening phase was reduced, possibly due to the shorter time available for turbulence to develop. These results may be useful for better quality voice synthesis.
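The ensemble-averaging step mentioned in this abstract can be sketched as follows: the signal is cut into cycles of the driving period, the cycles are averaged sample by sample to estimate the periodic component, and that average is subtracted from every cycle to leave the broadband residual. The sketch assumes the period is a known whole number of samples and drops any incomplete final cycle; the paper's actual processing details are not given in the abstract.

```python
def split_periodic_broadband(signal, period):
    """Return (periodic_cycle, broadband) via ensemble averaging.

    periodic_cycle is the sample-by-sample mean over all complete cycles;
    broadband is the signal minus that mean, cycle by cycle.
    """
    if period <= 0:
        raise ValueError("period must be a positive number of samples")
    n_cycles = len(signal) // period
    if n_cycles == 0:
        raise ValueError("signal is shorter than one period")
    cycles = [signal[k * period:(k + 1) * period] for k in range(n_cycles)]
    periodic_cycle = [
        sum(cycle[i] for cycle in cycles) / n_cycles for i in range(period)
    ]
    broadband = [
        cycle[i] - periodic_cycle[i] for cycle in cycles for i in range(period)
    ]
    return periodic_cycle, broadband
```

The more cycles are averaged, the better the random component cancels in the mean, so the residual approaches the broadband part of the sound.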

18.
19.
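Prediction of 13C NMR chemical shifts of cyclohexanes   Total citations: 1, self-citations: 0, citations by others: 1
A computer-assisted method is used to predict 13C NMR chemical shifts. The method comprises computing molecular topological indices and geometric parameter descriptors, reducing the selected descriptors by variable compression, and performing multiple regression analysis between the chemical shifts of the selected resonating carbons and their extracted descriptors to obtain the corresponding mathematical model. The 13C chemical shifts of 45 secondary carbon atoms in cyclohexanes were predicted, with a standard error of about 1.4 ppm.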

20.
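A computer-assisted method is used to predict 13C NMR chemical shifts. The method comprises computing molecular topological indices and geometric parameter descriptors, reducing the selected descriptors by variable compression, and performing multiple regression analysis between the chemical shifts of the selected resonating carbons and their extracted descriptors to obtain the corresponding mathematical model. The 13C chemical shifts of 45 secondary carbon atoms in cyclohexanes were predicted, with a standard error of about 1.4 ppm.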
