首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到19条相似文献,搜索用时 78 毫秒
1.
本文提出了语音信号的一种时域-频域-能量表示,并给出了算法,可用于孤立词语音识别,这种时域-频域-能量表示有两个特点,基于短时能量梯度的非线性时间规正,可保留语音信号频域的过滤特性,丢掉其稳态特性,计算量小,适于实时应用。  相似文献   

2.
如何从带噪语音信号中恢复出干净的语音信号一直都是信号处理领域的热点问题。近年来研究者相继提出了一些基于字典学习和稀疏表示的单通道语音增强算法,这些算法利用语音信号在时频域上的稀疏特性,通过学习训练数据样本的结构特征和规律来构造相应的字典,再对带噪语音信号进行投影以估计出干净语音信号。针对训练样本与测试数据不匹配的情况,有监督类的非负矩阵分解方法与基于统计模型的传统语音增强方法相结合,在增强阶段对语音字典和噪声字典进行更新,从而估计出干净语音信号。本文首先介绍了单通道情况下语音增强的信号模型,然后对4种典型的增强方法进行了阐述,最后对未来可能的研究热点进行了展望。  相似文献   

3.
基于听觉模型的耳语音的声韵切分   总被引:5,自引:0,他引:5       下载免费PDF全文
丁慧  栗学丽  徐柏龄 《应用声学》2004,23(2):20-25,44
本文分析了耳语音的特点,并根据生理声学及心理声学的基本理论与实验资料,提出了一种利用听觉模型来进行耳语音声韵切分的方法。这种适用于耳语音声韵切分的听觉感知模型主要分为四个层次:耳蜗对声音频率的分解机理;听觉系统的时域和频域非线性变化;中枢神经系统的侧抑制机理。这种模型能反映在噪声环境下人对低能量语音的听觉感知特性,因而适于耳语音识别,在耳语音声韵母切分实验中得到了满意的结果。  相似文献   

4.
本文针对语音信号稀疏表示及压缩感知问题,将听觉感知引入稀疏系数筛选过程,用掩蔽阈值筛选重要系数,以得到更符合听觉感受的语音稀疏表示。通过对一帧浊音信号分别采用掩蔽阈值和能量阈值方法进行系数筛选对比实验,结果表明掩蔽阈值法具有更好的稀疏表示效果。为验证听觉感知对语音压缩感知性能的影响,与能量阈值法对照对测试语音进行压缩感知观测和重构,通过压缩比、信噪比、主观平均意见分等主客观指标评价其性能,结果表明,掩蔽阈值法可有效地提高压缩比且保证重构语音具有较高的主观听觉质量。  相似文献   

5.
在低信噪比和突发背景噪声条件下,已有的深度学习网络模型在单通道语音增强方面效果并不理想,而人类可以利用语音的长时相关性对不同的语音信号形成综合感知。因此刻画语音的长时依赖关系有助于改进低信噪比和突发背景噪声下的增强性能。受该特性的启发,提出一种融合多头注意力机制和U-net深度网络的增强模型TU-net,实现基于时域的端到端单通道语音增强。TU-net网络模型采用U-net网络的编解码层对带噪语音信号进行多尺度特征融合,并利用多头注意力机制实现双路径Transformer,用于计算语音掩模,更好地建模长时相关性。该模型在时域、时频域和感知域计算损失函数,并通过加权组合损失函数指导训练。仿真实验结果表明,TU-net在低信噪比和突发背景噪声条件下增强语音信号的语音质量感知评估(PESQ)、短时客观可懂度(STOI)和信噪比增益等多个评价指标都优于同类的单通道增强网络模型,且保持相对较少的网络模型参数。  相似文献   

6.
罗宇  胡维平  吴华楠 《应用声学》2023,42(5):1099-1105
基于深度聚类的语音分离方法已被证明能有效地解决混合语音中说话人输出标签排列的问题,然而,现有关于聚类进行说话人分离方法,大多数是优化嵌入使每个源的重建误差最小化。本文以时域卷积网络(ConvTasNet)为基础网络,设计了一种改进基于聚类的门控卷积(Gate-conv Cluster)语音分离方法,在时域上通过堆叠的门控卷积网络,实现端到端深度聚类的源分离。该框架将非线性门控激活用于时域卷积网络中,提取语音信号的深层次特征;同时在高维特征空间中聚类对语音信号的特征进行表示和划分,为恢复不同信号源提供了一个长期的说话者表示信息。该框架解决了说话人输出标签排列问题并对语音信号的长期依赖性进行建模。通过华尔街日报数据集进行实验得出,该方法在SDRi(信源失真比)和Si-SNR(尺度不变信源噪声比)指标上分别达到了16.72 dB和16.33 dB的效果。  相似文献   

7.
本文介绍了一种实用的汉语语音合成系统,本系统采用线性预测方法对语音信号进行分析与合成,并对码本进行矢量量化,码率压缩到1200bit/s,大幅度降低了语音的存储能量,合成出的语音仍具有较高的清晰度.这种以数字信号处理专用芯片TMS320C10为处理器的便携式语音合成器,合成语音数量大,成本低,功耗较小,既可联机使用,也可用电池供电,自成系统.实际应用证明,汉语多功能语音合成器实用性强,用途广泛,适于推广.  相似文献   

8.
提出一种基于小波变换思想的水下测距方法.根据信号的能量一致性以及小波的带通滤波特性,并以二元样条插值为架构,实现信号的时频结合.该方法先将时域信号进行小波时域分解滤波,获得较为完整的时域有效信息,然后对初步处理的时域信号进行小波频域分解,通过找寻信号时频域对应的能量最值位置锁定目标,实现精确测距目的.进行不同衰减长度水体的连续光水下测距实验,分析该方法对连续光水下探测的影响.经实验验证,该测距方法在输出功率2.3 W内,成功实现对8个衰减长度内目标的准确测量,其测距精度小于1 cm.  相似文献   

9.
万柏坤 《物理实验》1992,12(3):144-147,143
近年来,相位信息在物理信号检测中的重要作用正越来越为人注目。一般说来,物理信号既可在时域中表示,亦可在频域中表示。在频域中,信号将由各频率分量的振幅和相位两部分信息组成。X线晶体衍射、声音频谱分析、激光全息测量和图象信号处  相似文献   

10.
基于数学形态滤波的语音信号基音特征提取   总被引:4,自引:1,他引:3  
数学形态滤波是一种关于信号形状处理的非线性变换,它能简化信号、消除较小分量而保留信号的基本形状特征.本文基于数学形态滤波方法提出了两个分别在时域和频域提取语音信号基音周期的方案,在频域提取基音周期的同时还能提取出语音信号的谱包络。它们具有简单、直观和计算效率高等特点。由于数学形态滤波运算是并行的、局部的,新方案适于并行化处理和易于硬件化实现。实验结果表明,选择合理的数学形态滤波参数以及线性预测编码参数,能获得准确的语音信号基音特征。  相似文献   

11.
The voice conversion (VC) technique recently has emerged as a new branch of speech synthesis dealing with speaker identity. In this work, a linear prediction (LP) analysis is carried out on speech signals to obtain acoustical parameters related to speaker identity - the speech fundamental frequency, or pitch, voicing decision, signal energy, and vocal tract parameters. Once these parameters are established for two different speakers designated as source and target speakers, statistical mapping functions can then be applied to modify the established parameters. The mapping functions are derived from these parameters in such a way that the source parameters resemble those of the target. Finally, the modified parameters are used to produce the new speech signal. To illustrate the feasibility of the proposed approach, a simple to use voice conversion software has been developed. This VC technique has shown satisfactory results. The synthesized speech signal virtually matching that of the target speaker.  相似文献   

12.
This paper presents a new method to speech enhancement based on time-frequency analysis and adaptive digital filtering. The proposed method for dual-channel speech enhancement was developed by tracking frequencies of corrupting signal by the discrete Gabor transform (DGT) and implementing multi-notch adaptive digital filter (MNADF) at those frequencies. Since no a priori knowledge of the noise source statistics is required this method differs from traditional speech enhancement methods. Specifically, the proposed method was applied to the case where speech quality and intelligibility deteriorate in the presence of background noise. Speech coders and automatic speech recognition (ASR) systems are designed to act on clean speech signals. Therefore, corrupted speech signals by the noise must be enhanced before their processing. The method uses a primary input containing the corrupted speech signal while a reference input containing the noise only. In this paper, we designed MNADF instead of single-notch adaptive digital filter and used DGT to track frequencies of corrupting signal because fast filtering process and fast measure of the time-dependent noise frequency are of great importance in speech enhancement process. Therefore, MNADF was implemented to take advantage of fast filtering process. Different types of noises from Noisex-92 database were used to degrade real speech signals. Objective measures, the study of the speech spectrograms and global signal-to-noise ratio (SNR), segmental SNR (segSNR), Itakura-Saito distance measure as well as subjective listing test demonstrated consistently superior enhancement performance of the proposed method over traditional speech enhancement method such as spectral subtraction. Combining MNADF and DGT, excellent speech enhancement was obtained.  相似文献   

13.
为了克服低信噪比输入下,语音增强造成语音清音中的弱分量损失,造成重构信号包络失真的问题。论文提出了一种新的语音增强方法。该方法根据语音感知模型,采用不完全小波包分解拟合语音临界频带,并对语音按子带能量进行清浊音区分处理,在阈值计算上,提出了一种清浊音分离,基于子带信号能量的小波包自适应阈值算法。通过仿真实验,客观评测和听音测试表明,该算法在低信噪比输入时较传统算法,能够更加有效地减少重构信号包络失真,在不损伤语音清晰度和自然度的前提下,使输出信噪比明显提高。将该算法与能量谱减法结合,进行二次增强能进一步提高降噪输出的语音质量。  相似文献   

14.
S.K. Tang 《Applied Acoustics》2008,69(12):1318-1331
A survey on the speech related acoustical parameters in the Hong Kong classrooms having standardized architectural layouts is carried out in the present study. Results suggest that these acoustical parameters are highly correlated with each other even across different octave bands. It is also found that the relationships between parameters of different kinds do not depend on the frequency bands. Besides, the present results indicate that the sound pulse decay inside a not very reverberant classroom consists of an initial fast decay, leading to deviations of the field survey results from those predicted by the exponential decay under the uniform sound energy decay assumption. It is believed that the strong correlations between the various speech related acoustical parameters and the regression information obtained in the present study can help the estimation of the speech quality of the classrooms in the design stage.  相似文献   

15.
为了提高传统正交匹配追踪(Orthogonal Matching Pursuit,OMP )算法的语音增强性能和运算速度,本研究基于稀疏编码理论,提出了一种改进的OMP算法的语音增强算法。其一,将K-奇异值分解(K-singular value decomposition,K-SVD)算法与OMP算法相结合,通过设置能量阈值的方法,提高OMP算法的语音增强性能;其二,通过改进传统OMP算法中信号稀疏逼近的计算方法,提高算法的运算速度。改进的OMP算法的语音增强算法与传统K-SVD语音增强算法相比,采用PESQ评价增强语音的质量,NCM评价语音的可懂度。在NCM的值基本保持不变的情况下,PESQ的值平均提高约12.47%,取得了更好的增强效果。取得了更好的增强效果。改进的OMP算法的运算速度与传统OMP算法相比提高近一倍。  相似文献   

16.
This paper addresses the problem of speech intelligibility enhancement by adaptive filtering algorithms employed with subband techniques. The two structures named the forward and backward blind source separation structures are extensively used in the speech enhancement and source separation areas, and largely studied in the literature with convolutive and non-convolutive mixtures. These two structures use two-microphones to generate the convolutive/non-convolutive mixing signal, and provide at the outputs the target and the jammer signal components. In this paper, we focus our interest on the backward structure employed to enhance the speech signal from a convolutive mixture. Furthermore, we propose a subband implementation of this structure to improve its behavior with speech signal. The new proposed subband-Backward BSS (SBBSS) structure allows a very important improvement of the convergence speed of the adaptive filtering algorithms when the subband-number is selected high. In order to improve the robustness of the proposed subband structure, we have adapted then applied a new criterion that combines the System Mismatch and the Mean-Errors criterion minimization. The proposed subband backward structure, when it is combined with this new criterion minimization, allows to enhance the output speech signal by reducing the distortion and the noise components. The performance of the proposed subband backward structure is validated through several objective criteria which are given and described in this paper.  相似文献   

17.
The purpose of this study was to determine whether individuals show differences in speech and voice during reading of the same news before and after attending a radio announcing course. Twenty-five students of a Radio Announcing Course in Sao Paulo city, 17 men and 8 women, aged 19 to 55 years, participated in this study. The readings were recorded in a professional audio studio, and the speech samples were submitted to perceptual and acoustic analysis. For the perceptual analysis, the samples were randomly presented in pairs and five trained speech pathologists identified each recording as pre- and posttraining, and also justified their choices by indicating what parameters better based their judgment: type of voice, articulation and pronunciation, loudness, pitch, resonance, speech rate, respiratory coordination, and use of emphasis. The acoustic parameters analyzed were mean, minimum, and maximum fundamental frequency, frequency range, text duration, and pause duration. The perceptual analysis showed that the posttraining speech samples were considered the best productions in 80% of the evaluations. Emphasis characterized the readings (70.4%), followed by type of voice (44.8%) and pitch (40.8%). Acoustic analysis showed higher mean fundamental frequency and increase of frequency range posttraining. These results indicated richer modulation in the posttraining readings. There are differences in the readings of the same news pre- and posttraining in a radio announcing course, and the posttraining reading was considered the best production, indicating the positive effect of the training.  相似文献   

18.
The intelligibility of sentences processed to remove temporal envelope information, as far as possible, was assessed. Sentences were filtered into N analysis channels, and each channel signal was divided by its Hilbert envelope to remove envelope information but leave temporal fine structure (TFS) intact. Channel signals were combined to give TFS speech. The effect of adding low-level low-noise noise (LNN) to each channel signal before processing was assessed. The addition of LNN reduced the amplification of low-level signal portions that contained large excursions in instantaneous frequency, and improved the intelligibility of simple TFS speech sentences, but not more complex sentences. It also reduced the time needed to reach a stable level of performance. The recovery of envelope cues by peripheral auditory filtering was investigated by measuring the intelligibility of 'recovered-envelope speech', formed by filtering TFS speech with an array of simulated auditory filters, and using the envelopes at the output of these filters to modulate sinusoids with frequencies equal to the filter center frequencies (i.e., tone vocoding). The intelligibility of TFS speech and recovered-envelope speech fell as N increased, although TFS speech was still highly intelligible for values of N for which the intelligibility of recovered-envelope speech was low.  相似文献   

19.
In this paper, a single-channel speech enhancement algorithm based on non-linear and multi-band Adaptive Gain Control (AGC) is proposed. The algorithm requires neither Signal-to-Noise Ratio (SNR) nor noise parameters estimation. It reduces the background noise in the temporal domain rather than the spectral domain using a non-linear and automatically adjustable gain function for multi-band AGC. The gain function varies in time and is deduced from the temporal envelope of each frequency band to highly compress the frequency regions where noise is present and lightly compress the frequency regions where speech is present. Objective evaluation using the PESQ (Perceptual Evaluation of Speech Quality) metric shows that the proposed algorithm performs better than three benchmarks, namely: the spectral subtraction, the Wiener filter based on a priori SNR estimation and a band-pass modulation filtering algorithm. In addition, blind subjective tests show that the proposed algorithm introduces less musical noise compared to the benchmark algorithms and was preferred 78.8% of the time in terms of signal quality. The proposed algorithm is implemented in a miniature low power digital signal processor to validate its feasibility and complexity for smart hearing protection in noisy environments.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号