Similar Articles
19 similar articles found (search time: 109 ms)
1.
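This paper proposes a time-frequency-energy representation of speech signals for isolated-word speech recognition and gives an algorithm for computing it. The representation has two features. First, it applies a nonlinear time normalization based on the short-time energy gradient, which preserves the transitional frequency-domain characteristics of the speech signal and discards its steady-state characteristics. Second, its computational cost is low, which makes it suitable for real-time applications.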
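
The abstract does not spell out the normalization rule. One plausible reading is sketched below, under stated assumptions: frames of fixed length, log short-time energy, and a warp that picks output frames spaced uniformly along the cumulative absolute energy gradient. Transitions are then sampled densely and steady states sparsely. The frame sizes, the warping rule, and the names `short_time_log_energy` and `energy_gradient_warp` are all hypothetical, not the authors' algorithm.

```python
import numpy as np

def short_time_log_energy(x, frame_len=256, hop=128):
    """Log energy per frame; trailing samples that do not fill a frame are dropped."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n_frames)])
    return np.log(np.sum(frames ** 2, axis=1) + 1e-12)

def energy_gradient_warp(features, log_energy, n_out=20):
    """Resample a frame sequence to n_out frames spaced uniformly along the
    cumulative |energy gradient| (assumed warping rule, not from the paper).

    Frames in transitions (large gradient) are picked densely and frames in
    steady-state regions sparsely.
    """
    grad = np.abs(np.diff(log_energy, prepend=log_energy[0]))
    cum = np.cumsum(grad + 1e-6)  # tiny floor keeps `cum` strictly increasing
    targets = np.linspace(cum[0], cum[-1], n_out)
    idx = np.clip(np.searchsorted(cum, targets), 0, len(cum) - 1)
    return np.asarray(features)[idx], idx
```

For a word that is silence, then a vowel-like tone, then silence, most of the 20 output frames land near the two energy jumps.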

2.
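Recovering clean speech from noisy speech has long been an active topic in signal processing. In recent years, researchers have proposed several single-channel speech enhancement algorithms based on dictionary learning and sparse representation. These algorithms exploit the sparsity of speech in the time-frequency domain. They build dictionaries by learning the structure and regularities of training samples, then project the noisy speech onto these dictionaries to estimate the clean speech. To handle mismatch between training samples and test data, supervised non-negative matrix factorization (NMF) has been combined with traditional statistical-model-based enhancement methods, so that the speech and noise dictionaries are updated during the enhancement stage to estimate the clean speech. This paper first introduces the signal model for single-channel speech enhancement. It then describes four typical enhancement methods and finally discusses possible future research directions.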
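
The supervised NMF idea can be illustrated with a minimal sketch under several assumptions:

- the input is a magnitude spectrogram;
- pre-trained speech and noise dictionaries `W_s` and `W_n` are held fixed;
- only the activations are estimated, with KL-divergence multiplicative updates;
- a Wiener-like soft mask is applied at the end.

The dictionary-updating step mentioned in the abstract is not included, and all names are illustrative.

```python
import numpy as np

def nmf_activations(V, W, n_iter=200, eps=1e-12, seed=0):
    """Estimate H >= 0 with V ~= W @ H for a fixed dictionary W (KL updates)."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    for _ in range(n_iter):
        WH = W @ H + eps
        # Multiplicative update for the KL divergence: H <- H * W^T(V/WH) / W^T 1
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
    return H

def supervised_nmf_enhance(V_noisy, W_s, W_n, n_iter=200):
    """Return (estimated clean magnitude, soft mask) from fixed dictionaries."""
    W = np.hstack([W_s, W_n])
    H = nmf_activations(V_noisy, W, n_iter)
    k = W_s.shape[1]
    S = W_s @ H[:k]          # speech part of the model
    N = W_n @ H[k:]          # noise part of the model
    mask = S / (S + N + 1e-12)
    return mask * V_noisy, mask
```

In practice `W_s` and `W_n` would come from running NMF on clean speech and on noise recordings offline.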

3.
Auditory-Model-Based Initial/Final Segmentation of Whispered Speech   Total citations: 5 (self-citations: 0, by others: 5)
Ding Hui, Li Xueli, Xu Boling. Applied Acoustics (《应用声学》), 2004, 23(2): 20-25, 44
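This paper analyzes the characteristics of whispered speech. Drawing on the basic theory and experimental data of physiological and psychological acoustics, it proposes a method that uses an auditory model to segment whispered speech into syllable initials and finals. The auditory perception model built for this task has four main levels:
1. frequency decomposition of sound by the cochlea;
2. time-domain nonlinear transformation in the auditory system;
3. frequency-domain nonlinear transformation in the auditory system;
4. lateral inhibition in the central nervous system.
The model reflects how humans perceive low-energy speech in noisy environments, which makes it suitable for whispered speech recognition. It gave satisfactory results in initial/final segmentation experiments on whispered speech.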

4.
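For sparse representation and compressed sensing of speech signals, this paper introduces auditory perception into the selection of sparse coefficients. It uses the masking threshold to select the important coefficients, yielding a sparse representation that better matches auditory perception. In comparative coefficient-selection experiments on one voiced frame, the masking-threshold method gave a better sparse representation than the energy-threshold method. To verify the effect of auditory perception on speech compressed sensing, test speech was observed and reconstructed by compressed sensing with both methods. Performance was evaluated with objective and subjective measures, including compression ratio, signal-to-noise ratio, and mean opinion score (MOS). The results show that the masking-threshold method effectively increases the compression ratio while keeping the subjective auditory quality of the reconstructed speech high.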
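
For reference, the energy-threshold baseline can be sketched as keeping the smallest set of transform coefficients whose energy reaches a given fraction of the total. The 99% default and the name `energy_threshold_select` are assumptions. The paper's masking-threshold variant would replace the energy criterion with a psychoacoustic masking curve, which is not reproduced here.

```python
import numpy as np

def energy_threshold_select(coeffs, energy_fraction=0.99):
    """Keep the largest-energy coefficients that together hold
    `energy_fraction` of the total energy; zero the rest.

    Returns (sparse_coeffs, number_kept).
    """
    c = np.asarray(coeffs, dtype=float)
    energy = c ** 2
    order = np.argsort(energy)[::-1]          # largest energy first
    cum = np.cumsum(energy[order])
    k = int(np.searchsorted(cum, energy_fraction * cum[-1])) + 1
    keep = order[:k]
    sparse = np.zeros_like(c)
    sparse[keep] = c[keep]
    return sparse, k
```

With coefficients `[3, 0.1, 4, 0.2]` and a 99% energy target, only the two large coefficients survive.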

5.
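Under low signal-to-noise ratio (SNR) and sudden background noise, existing deep learning models perform poorly at single-channel speech enhancement. Humans, by contrast, can use the long-term correlations of speech to form an integrated perception of different speech signals. Modeling the long-term dependencies of speech should therefore improve enhancement under these conditions. Inspired by this, the authors propose TU-net, a model that combines a multi-head attention mechanism with a U-net deep network for end-to-end, time-domain, single-channel speech enhancement. TU-net uses the U-net encoder and decoder layers for multi-scale feature fusion of the noisy speech. It uses multi-head attention to implement a dual-path Transformer that computes the speech mask and better models long-term correlations. Losses are computed in the time domain, the time-frequency domain, and the perceptual domain, and training is guided by their weighted combination. Simulation results show that under low SNR and sudden background noise, TU-net outperforms comparable single-channel enhancement networks on several metrics, while using relatively few model parameters. These metrics include Perceptual Evaluation of Speech Quality (PESQ), Short-Time Objective Intelligibility (STOI), and SNR gain.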
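
A weighted multi-domain loss of this kind can be sketched with an L1 waveform term and an L1 term on STFT magnitudes. The weights, the STFT settings, and the name `combined_loss` are illustrative assumptions. The abstract does not define TU-net's perceptual-domain term, so it is omitted.

```python
import numpy as np

def stft_mag(x, n_fft=512, hop=128):
    """Magnitude STFT with a Hann window (frames that do not fit are dropped)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def combined_loss(est, ref, w_time=0.5, w_tf=0.5):
    """Weighted sum of a time-domain L1 loss and a TF-magnitude L1 loss."""
    time_loss = np.mean(np.abs(est - ref))
    tf_loss = np.mean(np.abs(stft_mag(est) - stft_mag(ref)))
    return w_time * time_loss + w_tf * tf_loss
```

The two terms are complementary. A sign-inverted estimate has zero magnitude-spectrum loss but a large waveform loss, which is one reason to combine domains.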

6.
Luo Yu, Hu Weiping, Wu Huanan. Applied Acoustics (《应用声学》), 2023, 42(5): 1099-1105
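Deep-clustering-based speech separation has been shown to solve the speaker output label permutation problem in mixed speech effectively. However, most existing clustering-based speaker separation methods optimize the embeddings to minimize the reconstruction error of each source. Taking the time-domain convolutional network ConvTasNet as the base network, this paper designs an improved clustering-based gated convolution (Gate-conv Cluster) speech separation method. It performs end-to-end deep-clustering source separation in the time domain through stacked gated convolutional networks. The framework applies nonlinear gated activations in the temporal convolutional network to extract deep features of the speech signal. At the same time, clustering in a high-dimensional feature space represents and partitions these features, providing long-term speaker representation information for recovering the different sources. The framework thus solves the label permutation problem and models the long-term dependencies of speech. Experiments on the Wall Street Journal (WSJ) dataset show that the method achieves 16.72 dB SDRi (signal-to-distortion ratio improvement) and 16.33 dB Si-SNR (scale-invariant signal-to-noise ratio).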
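
The scale-invariant SNR reported above can be computed as follows. The estimate is projected onto the zero-mean reference, and the target part is compared with the residual. The sketch also shows the usual "improvement" form, measured relative to the unprocessed mixture. SDRi, the other reported metric, requires a BSS-eval style decomposition and is not shown. Function names are illustrative.

```python
import numpy as np

def si_snr(est, ref, eps=1e-8):
    """Scale-invariant SNR in dB; both signals are made zero-mean first."""
    est = np.asarray(est, dtype=float)
    ref = np.asarray(ref, dtype=float)
    est = est - est.mean()
    ref = ref - ref.mean()
    # Projection of the estimate onto the reference = target component
    s_target = (est @ ref) / (ref @ ref + eps) * ref
    e_noise = est - s_target
    return float(10 * np.log10((s_target @ s_target + eps) / (e_noise @ e_noise + eps)))

def si_snr_improvement(est, mix, ref):
    """SI-SNR gain of the separated estimate over the input mixture."""
    return si_snr(est, ref) - si_snr(mix, ref)
```

An estimate containing the reference plus an orthogonal interferer of equal energy scores 0 dB. Halving the interferer's amplitude relative to the mixture gives about a 6.02 dB improvement.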

7.
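An underwater ranging method based on the wavelet transform is proposed. Using the energy consistency of the signal and the band-pass filtering property of wavelets, within a bivariate spline interpolation framework, the method combines time- and frequency-domain information. It works in two steps:
1. The time-domain signal is decomposed and filtered with wavelets in the time domain, recovering relatively complete effective time-domain information.
2. The preprocessed signal is then decomposed with wavelets in the frequency domain. The target is located by finding the position of the energy extremum in the signal's time-frequency domain, which yields an accurate range.
Continuous-light underwater ranging experiments were conducted in water of different attenuation lengths to analyze how the method performs for continuous-light underwater detection. With an output power of at most 2.3 W, the method accurately measured targets within 8 attenuation lengths, with a ranging error below 1 cm.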

8.
Wan Baikun. Physics Experimentation (《物理实验》), 1992, 12(3): 144-147, 143
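In recent years, the important role of phase information in physical signal detection has attracted growing attention. In general, a physical signal can be represented in either the time domain or the frequency domain. In the frequency domain, the signal consists of two kinds of information for each frequency component: amplitude and phase. X-ray crystal diffraction, acoustic spectrum analysis, laser holographic measurement, and image signal proc…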

9.
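Wavelet Multiresolution Analysis and Reconstruction of Ultrasonic Inspection Signals from Debonded Interfaces   Total citations: 6 (self-citations: 1, by others: 5)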
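For ultrasonic echo signals from steel-rubber layered bonded structures, a wavelet feature-parameter extraction algorithm is proposed that exploits the multiresolution time-frequency properties of the wavelet transform. Based on a time-scale distribution analysis of the inspection signals, the time-domain and frequency-domain features of echoes from debonded interfaces are reconstructed. Experimental results indicate that the method is promising for modern industrial NDT&E (nondestructive testing and evaluation).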
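
The multiresolution decomposition and reconstruction step can be illustrated with an orthonormal Haar wavelet in plain NumPy. The abstract states neither the wavelet family nor the number of levels, so Haar, the function names, and the length restriction (divisible by 2**levels) are assumptions. Zeroing or keeping selected detail levels before reconstruction is how individual scale bands would be isolated.

```python
import numpy as np

def haar_dwt(x, levels):
    """Multilevel orthonormal Haar DWT. Returns [cA_L, cD_L, ..., cD_1]."""
    a = np.asarray(x, dtype=float)
    if len(a) % (2 ** levels):
        raise ValueError("signal length must be divisible by 2**levels")
    details = []
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / np.sqrt(2))  # detail (high-pass) band
        a = (even + odd) / np.sqrt(2)              # approximation (low-pass)
    return [a] + details[::-1]

def haar_idwt(coeffs):
    """Inverse of haar_dwt: rebuild the signal from [cA_L, cD_L, ..., cD_1]."""
    a = coeffs[0]
    for d in coeffs[1:]:
        out = np.empty(2 * len(a))
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a
```

Because the transform is orthonormal, reconstruction is exact and the energy of the coefficients equals the signal energy.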

10.
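A typical back-door coupling target was constructed, and its echo characteristics were simulated in both the time and frequency domains. When strong coupling occurs in the target's aperture-cavity structure, amplitude dips appear in the frequency-domain echo. At the strong-coupling frequencies, the time-domain echo is double-peaked, clearly unlike echoes without strong coupling. Simulations that varied the size and shape of the back-door coupling target confirmed these echo features. Using them, the back-door strong-coupling microwave parameters of an unknown target cavity can be extracted from its echo signal.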

11.
This paper presents a new speech enhancement method based on time-frequency analysis and adaptive digital filtering. The proposed dual-channel method tracks the frequencies of the corrupting signal with the discrete Gabor transform (DGT) and applies a multi-notch adaptive digital filter (MNADF) at those frequencies. Because it requires no a priori knowledge of the noise source statistics, it differs from traditional speech enhancement methods. Specifically, the method targets the case where background noise degrades speech quality and intelligibility. Speech coders and automatic speech recognition (ASR) systems are designed to operate on clean speech, so speech corrupted by noise must be enhanced before such processing. The method uses a primary input containing the corrupted speech and a reference input containing only the noise. Fast filtering and fast measurement of the time-varying noise frequency are essential in speech enhancement. For that reason, we designed an MNADF, which filters quickly, instead of a single-notch adaptive digital filter, and we used the DGT to track the frequencies of the corrupting signal. Different types of noise from the Noisex-92 database were used to degrade real speech signals. The evaluation used objective measures (global signal-to-noise ratio (SNR), segmental SNR (segSNR), and the Itakura-Saito distance), inspection of speech spectrograms, and subjective listening tests. All of these consistently showed that the proposed method outperforms traditional speech enhancement methods such as spectral subtraction. Combining the MNADF with the DGT yielded excellent speech enhancement.
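
The notch stage alone can be sketched with the classic LMS adaptive noise canceller, which uses one cosine/sine reference pair per interfering frequency. The sketch makes several assumptions:

- the interfering frequencies are already known, so the DGT tracking step is not implemented;
- the references are synthesized rather than taken from a second microphone;
- this is a textbook formulation, not necessarily the authors' MNADF.

`multi_notch_lms` and `mu` are illustrative names.

```python
import numpy as np

def multi_notch_lms(primary, freqs_hz, fs, mu=0.01):
    """Cancel sinusoidal interference at known frequencies from `primary`.

    Each frequency gets a cosine/sine reference pair whose weights are adapted
    by LMS so that their weighted sum tracks the interference. The error
    signal (primary minus estimate) is the enhanced output.
    """
    primary = np.asarray(primary, dtype=float)
    n = np.arange(len(primary))
    refs = np.concatenate([
        np.stack([np.cos(2 * np.pi * f * n / fs), np.sin(2 * np.pi * f * n / fs)])
        for f in freqs_hz
    ])                                   # shape (2 * len(freqs_hz), N)
    w = np.zeros(refs.shape[0])
    out = np.empty_like(primary)
    for i in range(len(primary)):
        r = refs[:, i]
        e = primary[i] - w @ r           # enhanced sample = residual
        w += 2 * mu * e * r              # LMS weight update
        out[i] = e
    return out
```

A larger `mu` converges faster but widens each notch, so more speech energy near the interfering frequencies is removed.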

12.
Voice conversion (VC) has recently emerged as a new branch of speech synthesis that deals with speaker identity. In this work, linear prediction (LP) analysis is applied to speech signals to obtain acoustic parameters related to speaker identity: the fundamental frequency (pitch), the voicing decision, the signal energy, and the vocal tract parameters. Once these parameters are established for two speakers, designated the source and target speakers, statistical mapping functions are applied to modify them. The mapping functions are derived so that the source parameters come to resemble those of the target. Finally, the modified parameters are used to synthesize the new speech signal. To illustrate the feasibility of the approach, easy-to-use voice conversion software was developed. The technique gave satisfactory results, with the synthesized speech virtually matching that of the target speaker.
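
The LP analysis step can be sketched with the standard autocorrelation method and the Levinson-Durbin recursion. The predictor convention is x[n] ≈ Σ a[k]·x[n−k]. The Hamming window and the function name `lpc` are assumptions, and the paper's statistical mapping functions are not implemented.

```python
import numpy as np

def lpc(frame, order):
    """LP coefficients a[1..p] with x[n] ~= sum_k a[k] x[n-k], via the
    autocorrelation method and Levinson-Durbin. Returns (a, prediction_error)."""
    x = np.asarray(frame, dtype=float) * np.hamming(len(frame))
    r = np.array([np.dot(x[:len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)              # a[0] unused; a[1..p] are the predictor
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient k_i = (r[i] - sum_j a_j r[i-j]) / E_{i-1}
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err
        a_prev = a.copy()
        a[i] = k
        a[1:i] = a_prev[1:i] - k * a_prev[i - 1:0:-1]
        err *= 1 - k * k                 # E_i = (1 - k_i^2) E_{i-1}
    return a[1:], err
```

On a synthetic second-order autoregressive signal, the recursion recovers coefficients close to the generating ones.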

13.
This paper addresses speech intelligibility enhancement using adaptive filtering algorithms combined with subband techniques. Two structures, the forward and backward blind source separation (BSS) structures, are widely used in speech enhancement and source separation. Both have been studied extensively with convolutive and non-convolutive mixtures. Each uses two microphones to capture the convolutive or non-convolutive mixture and outputs the target and jammer signal components. Here we focus on the backward structure for enhancing speech from a convolutive mixture, and we propose a subband implementation of it to improve its behavior with speech signals. The proposed subband-backward BSS (SBBSS) structure greatly improves the convergence speed of the adaptive filtering algorithms when a large number of subbands is used. To improve its robustness, we adapted and applied a new criterion that combines minimization of the system mismatch and of the mean error. Combined with this criterion, the subband backward structure enhances the output speech signal by reducing both distortion and noise components. Its performance is validated with several objective criteria, which are described in the paper.

14.
The purpose of this study was to determine whether individuals' speech and voice differ when reading the same news before and after attending a radio announcing course. Twenty-five students of a radio announcing course in Sao Paulo city participated (17 men and 8 women, aged 19 to 55 years). The readings were recorded in a professional audio studio, and the speech samples underwent perceptual and acoustic analysis. For the perceptual analysis, the samples were presented in random pairs. Five trained speech pathologists identified each recording as pre- or posttraining and justified their choices by indicating which parameters best supported their judgment: type of voice, articulation and pronunciation, loudness, pitch, resonance, speech rate, respiratory coordination, and use of emphasis. The acoustic parameters analyzed were mean, minimum, and maximum fundamental frequency, frequency range, text duration, and pause duration. In the perceptual analysis, the posttraining samples were judged the better productions in 80% of the evaluations. Emphasis was the parameter that most often characterized the readings (70.4%), followed by type of voice (44.8%) and pitch (40.8%). Acoustic analysis showed a higher mean fundamental frequency and a wider frequency range posttraining, indicating richer modulation in the posttraining readings. Readings of the same news differ before and after a radio announcing course, and the posttraining reading was considered the better production, which indicates a positive effect of the training.

15.
The intelligibility of sentences processed to remove temporal envelope information, as far as possible, was assessed. Sentences were filtered into N analysis channels, and each channel signal was divided by its Hilbert envelope, which removes envelope information while leaving the temporal fine structure (TFS) intact. The channel signals were then combined to give TFS speech. The study also assessed the effect of adding low-level low-noise noise (LNN) to each channel signal before processing. Adding LNN reduced the amplification of low-level signal portions that contained large excursions in instantaneous frequency. It improved the intelligibility of TFS speech for simple sentences but not for more complex sentences, and it reduced the time needed to reach a stable level of performance. The recovery of envelope cues by peripheral auditory filtering was investigated by measuring the intelligibility of 'recovered-envelope speech'. This was formed by filtering TFS speech with an array of simulated auditory filters and using the envelopes at the filter outputs to modulate sinusoids at the filter center frequencies (i.e., tone vocoding). The intelligibility of both TFS speech and recovered-envelope speech fell as N increased. However, TFS speech remained highly intelligible at values of N for which recovered-envelope speech was poorly intelligible.
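
The core TFS step, dividing a channel signal by its Hilbert envelope, can be sketched in NumPy. The analytic signal uses the standard FFT construction, the same one `scipy.signal.hilbert` uses. The analysis filterbank that produces the N channels and the LNN step are not shown, and the function names are illustrative.

```python
import numpy as np

def analytic_signal(x):
    """FFT-based analytic signal: keep DC/Nyquist, double positive, zero negative bins."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    h = np.zeros(N)
    if N % 2 == 0:
        h[0] = h[N // 2] = 1
        h[1:N // 2] = 2
    else:
        h[0] = 1
        h[1:(N + 1) // 2] = 2
    return np.fft.ifft(np.fft.fft(x) * h)

def temporal_fine_structure(channel, eps=1e-12):
    """Divide one (band-limited) channel by its Hilbert envelope.

    Returns (tfs, envelope); tfs = cos(instantaneous phase) of the channel.
    """
    env = np.abs(analytic_signal(channel))
    return np.asarray(channel, dtype=float) / (env + eps), env
```

For an amplitude-modulated tone, the envelope and the carrier separate cleanly.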

16.
In this paper, a single-channel speech enhancement algorithm based on non-linear, multi-band Adaptive Gain Control (AGC) is proposed. The algorithm requires neither Signal-to-Noise Ratio (SNR) estimation nor noise parameter estimation. It reduces background noise in the temporal domain rather than the spectral domain, using a non-linear, automatically adjustable gain function for multi-band AGC. The gain function varies in time and is derived from the temporal envelope of each frequency band. It strongly compresses frequency regions where noise is present and lightly compresses regions where speech is present. Objective evaluation with the PESQ (Perceptual Evaluation of Speech Quality) metric shows that the proposed algorithm outperforms three benchmarks: spectral subtraction, the Wiener filter based on a priori SNR estimation, and a band-pass modulation filtering algorithm. In addition, blind subjective tests show that it introduces less musical noise than the benchmark algorithms and was preferred 78.8% of the time in terms of signal quality. The algorithm was implemented on a miniature low-power digital signal processor to validate its feasibility and complexity for smart hearing protection in noisy environments.

17.
Many studies of informational masking have shown that interfering speech reduces speech intelligibility. This effect is often used to secure privacy in public spaces, and such applications require estimates of how much masking is needed. Masking effects are generally estimated from spectral information, such as excitation patterns. However, informational masking can hardly be estimated from spectral information alone, so we estimated it using time-domain information. Specifically, we calculated the cepstra of the magnitude histograms of the signal envelopes. If these cepstra differ between the target and the masker, the signals are dissimilar in the time domain, and the informational masking effect should be low. We therefore used the histograms' cepstral distance (HCD) to estimate signal similarity.

In the first experiment, the HCD was used to estimate the similarity of five maskers to the target: random noise, music, female speech, male speech, and the target speaker's own speech. Male and female speech were more similar to the target speech than music and noise were, and the target speaker's own speech was the most similar of all. In the second experiment, a listening test was carried out to verify the HCD. It used a double masker as an effective informational masker; this masker has characteristics similar to reversed speech. The listening test results suggest that the masking effect of the double masker has the same relation to the HCD. This suggests that informational masking can be estimated from signal similarity using the HCD.
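
The HCD pipeline (envelope, magnitude histogram, cepstrum, distance) can be sketched as follows. Every concrete choice here is an assumption, not taken from the paper:

- a moving average of |x| as the envelope;
- RMS level normalization;
- 64 histogram bins over [0, 4];
- the real cepstrum truncated to 20 coefficients;
- Euclidean distance between the truncated cepstra.

All function names are hypothetical.

```python
import numpy as np

def envelope(x, win=80):
    """Crude amplitude envelope: moving average of |x| (assumed smoother)."""
    kernel = np.ones(win) / win
    return np.convolve(np.abs(np.asarray(x, dtype=float)), kernel, mode="same")

def histogram_cepstrum(x, n_bins=64, n_ceps=20, max_level=4.0):
    """Real cepstrum of the level-normalized envelope magnitude histogram."""
    env = envelope(x)
    env = env / (np.sqrt(np.mean(env ** 2)) + 1e-12)   # assumed RMS normalization
    hist, _ = np.histogram(env, bins=n_bins, range=(0.0, max_level))
    hist = hist / hist.sum()                          # probability mass per bin
    log_mag = np.log(np.abs(np.fft.rfft(hist)) + 1e-12)
    ceps = np.fft.irfft(log_mag, n=n_bins)            # real cepstrum
    return ceps[:n_ceps]

def hcd(x, y, **kwargs):
    """Histogram cepstral distance: Euclidean distance between the cepstra."""
    return float(np.linalg.norm(histogram_cepstrum(x, **kwargs)
                                - histogram_cepstrum(y, **kwargs)))
```

A continuous noise and an on/off (speech-like) burst pattern have very different envelope histograms, so their distance is nonzero, while a signal compared with itself gives zero.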

18.
Isolated-Character Recognition of Mandarin Whispered Speech   Total citations: 6 (self-citations: 0, by others: 6)
Yang Lili, Lin Wei, Xu Boling. Applied Acoustics (《应用声学》), 2006, 25(3): 187-192
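Whispered speech recognition has broad application prospects and is an entirely new research topic. However, characteristics of whispered speech, such as its low sound level and the absence of a fundamental frequency, make recognition difficult. Based on the production model of whispered speech and its acoustic characteristics, this paper builds an isolated-character recognition system for Mandarin whispered speech. Because whispered speech has a low SNR, it must first be enhanced. Tone information is also used in the recognition system to improve performance. Experimental results show that MFCCs combined with the amplitude envelope can serve as feature parameters for automatic recognition of Mandarin whispered speech. Using HMMs on a small vocabulary, the system reached a recognition rate of 90.4%.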
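
The recognition stage trains one HMM per word and picks the model with the highest likelihood. It can be sketched with the forward algorithm in the log domain. For brevity, the sketch assumes discrete (for example, vector-quantized) observation symbols rather than continuous MFCC and envelope vectors. The two toy word models are hypothetical.

```python
import math

def to_log(p):
    """Natural log that maps probability 0 to -inf."""
    return math.log(p) if p > 0 else -math.inf

def logsumexp(vals):
    m = max(vals)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in vals))

def log_forward(obs, pi, A, B):
    """log P(obs | HMM) via the forward algorithm in the log domain.

    pi[i]: initial probability, A[i][j]: transition probability,
    B[i][k]: probability of emitting symbol k in state i.
    """
    n = len(pi)
    alpha = [to_log(pi[i]) + to_log(B[i][obs[0]]) for i in range(n)]
    for o in obs[1:]:
        alpha = [logsumexp([alpha[i] + to_log(A[i][j]) for i in range(n)])
                 + to_log(B[j][o]) for j in range(n)]
    return logsumexp(alpha)

def recognize(obs, models):
    """Isolated-word decision: name of the model with the highest likelihood."""
    return max(models, key=lambda name: log_forward(obs, *models[name]))
```

With two left-to-right models, one favoring the symbol order 0→1 and the other 1→0, the sequence `[0, 0, 1, 1]` is assigned to the first word.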

19.
Xu Xin, Hu Shuiqing, Tao Chao, Du Gonghuan. Applied Acoustics (《应用声学》), 2003, 22(5): 36-40, 44
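This paper applies the short-time nonlinear prediction method as improved by Short [8-10] to analyze Mandarin syllables and phrases spoken at a normal rate. The analysis reveals differences in the short-time nonlinear predictability of voiced and unvoiced sounds in Mandarin. It also finds that short-time nonlinear prediction can still distinguish these differences under strong background noise. This offers a possible means of segmenting voiced and unvoiced sounds.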
