Similar articles
 20 similar articles found (search time: 109 ms)
1.
A novel image fusion algorithm based on the wavelet-based contourlet transform (WBCT) and principal component analysis (PCA) is proposed. The PCA method is adopted for the low-frequency components. For the high-frequency components, the coefficient with the greater activity measure is chosen and a region consistency test is then performed. Experiments show that the proposed method preserves edge and texture information better than the wavelet transform and Laplacian pyramid (LP) methods do in image fusion. Four indicators of fused image quality are given to compare the proposed method with the other methods.
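
The PCA rule for the low-frequency components in item 1 can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so this assumes the common PCA fusion scheme in which the two low-frequency subbands are weighted by the normalized components of the principal eigenvector of their 2x2 covariance matrix. The function names `pca_fusion_weights` and `fuse_low_frequency` are hypothetical, and taking absolute values of the eigenvector components is a simplification for the sign ambiguity.

```python
import numpy as np

def pca_fusion_weights(band_a, band_b):
    """Return two non-negative weights that sum to 1 (assumed PCA fusion rule)."""
    # Treat each subband as one variable observed at every coefficient position.
    data = np.stack([band_a.ravel(), band_b.ravel()])   # shape (2, N)
    cov = np.cov(data)                                   # 2x2 covariance matrix
    # eigh returns eigenvalues in ascending order, so the last column
    # is the eigenvector of the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(cov)
    principal = np.abs(eigvecs[:, -1])                   # resolve sign ambiguity
    return principal / principal.sum()

def fuse_low_frequency(band_a, band_b):
    """Weighted sum of the two low-frequency subbands."""
    w_a, w_b = pca_fusion_weights(band_a, band_b)
    return w_a * band_a + w_b * band_b
```

With two identical subbands the weights are equal, and a subband with larger variance receives the larger weight.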

2.
A novel image fusion algorithm based on the wavelet-based contourlet transform (WBCT) and principal component analysis (PCA) is proposed. The PCA method is adopted for the low-frequency components. For the high-frequency components, the coefficient with the greater activity measure is chosen and a region consistency test is then performed. Experiments show that the proposed method preserves edge and texture information better than the wavelet transform and Laplacian pyramid (LP) methods do in image fusion. Four indicators of fused image quality are given to compare the proposed method with the other methods.

3.
A speaker model called the complete feature corpus (CFC) and an evaluation algorithm based on mutual information (MIE) are proposed for text-independent speaker identification. The CFC model represents the speech and pronunciation characteristics of a speaker by a feature vector corpus trained from typical speech samples. The MIE algorithm uses a multi-step minimax search matching scheme to evaluate the similarity of speech features between the input speech and the models in both distance space and information space. The maximum mutual information (MMI) decision criterion is used to decide the speaker's identity. Experiments comparing performance with the GMM method show that the proposed model and evaluation algorithm are effective and outperform the ordinary GMM method.

4.
A discriminative framework for tone model integration in continuous speech recognition is proposed. The method uses model-dependent weights to scale the probabilities of the hidden Markov models based on spectral features and of the tone models based on tonal features. The weights are discriminatively trained by the minimum phone error criterion. An update equation for the model weights based on the extended Baum-Welch algorithm is derived. Various schemes of model weight combination are evaluated, and a smoothing technique is introduced to make training robust to overfitting. The proposed method is evaluated on tonal syllable output and character output speech recognition tasks. The experimental results show that the proposed method obtains 9.5% and 4.7% relative error reductions over a global weight on the two tasks, owing to better interpolation of the given models. This demonstrates the effectiveness of discriminatively trained model weights for tone model integration.

5.
A new sub-aperture overlapping area fusion algorithm based on the wavelet transform is proposed to retain as much of the high-frequency content of the measurements as possible in the sub-aperture overlapping areas. The principles of sub-aperture stitching are briefly introduced, and the fusion algorithm based on the wavelet transform is demonstrated. The experimental results indicate that the new algorithm improves the retention of high-frequency measurement components.

6.
A method for objective evaluation of the pronunciation of finals in standard Chinese is presented. The formant pattern of the final is selected as the main feature, and an improved evaluation algorithm based on the Support Vector Machine is proposed. The algorithm employs a two-level classification strategy: a full-classification model and a sub-classification model are trained for each final. Pronunciation quality is evaluated from the classification results of this two-level strategy with a scoring model for each final. The new evaluation method is compared with traditional methods such as the Hidden Markov Model (HMM) posterior probability scoring method using Mel-Frequency Cepstrum Coefficient (MFCC) features, and the results show that the proposed method effectively improves performance. The correlation between human and machine scores reaches 82%.

7.
The performance of linear prediction analysis of speech deteriorates rapidly in noisy environments. To tackle this issue, an improved noise-robust sparse linear prediction algorithm is proposed. First, the linear prediction residual of speech is modeled with a Student-t distribution, and the additive noise is incorporated explicitly to increase robustness, so that a probabilistic model for sparse linear prediction of speech is built. Variational Bayesian inference is then used to approximate the intractable posterior distributions of the model parameters, and the optimal linear prediction parameters are estimated robustly. The experimental results demonstrate the advantage of the developed algorithm on several different metrics compared with the traditional algorithm and with the l1-norm minimization based sparse linear prediction algorithm proposed in recent years. It is concluded that the proposed algorithm is more robust to noise and can improve speech quality in applications.

8.
Holographic storage scheme based on digital signal processing   Cited by: 2 (self-citations: 0, citations by others: 2)
In this paper, a holographic storage scheme for multimedia data storage and retrieval based on digital signal processing (DSP) is designed. A communication model for the holographic storage system is obtained by analogy with a traditional communication system, and it embodies many characteristics of holographic storage. Several new DSP methods, including two-dimensional (2-D) shifting interleaving, encoding and decoding of the modulation-array (MA) code, and soft decision, are then proposed and employed in the system. The experimental results show that these measures effectively reduce the influence of noise. A segment of multimedia data, including video and audio, is retrieved successfully after holographic storage by using these techniques.

9.
The multitaper spectrum has lower variance than the traditional periodogram. The noise spectrum and the noise to noisy signal spectrum ratio (NNSR) were estimated from the multitaper spectrum of the noisy signal; the pre-enhanced speech for calculating the noise masking threshold was obtained by the spectral amplitude subtraction method, whose gain is a function of the NNSR; the final enhanced speech was obtained by suppressing the Fourier spectrum of the noisy speech with a psychoacoustical weighting rule incorporating the noise masking threshold. Because of the low variance of the multitaper spectrum, a modified offset formula was proposed to calculate the noise masking threshold, and speech reconstructed with this modification shows an improvement in MBSD (Modified Bark Spectral Distortion). When an upper limit of less than one is further imposed on the psychoacoustical weighting rule, the higher the input SNR (> 0 dB), the greater the improvement in segmental SNR and overall SNR. Informal listening tests show that speech enhanced by the proposed method has little distortion, and that the background noise is greatly reduced and free of musical noise.
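
The variance claim at the start of item 9 can be checked numerically with a minimal sketch. The abstract does not say which taper family is used, so this assumes orthonormal sine tapers as a simple stand-in for the more common Slepian (DPSS) tapers. The function names and the choice of 8 tapers are hypothetical.

```python
import numpy as np

def sine_tapers(n_samples, n_tapers):
    """Orthonormal sine tapers, one per row (assumed taper family)."""
    n = np.arange(1, n_samples + 1)
    k = np.arange(1, n_tapers + 1)[:, None]
    return np.sqrt(2.0 / (n_samples + 1)) * np.sin(np.pi * k * n / (n_samples + 1))

def multitaper_psd(x, n_tapers=8):
    """Average the squared spectra obtained with each taper."""
    tapers = sine_tapers(len(x), n_tapers)               # shape (K, N)
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    return spectra.mean(axis=0)

def periodogram(x):
    """Classical periodogram with a rectangular window."""
    return np.abs(np.fft.rfft(x)) ** 2 / len(x)

# White noise has a flat true spectrum, so the spread of each estimate
# across frequency bins reflects the variance of the estimator.
rng = np.random.default_rng(0)
noise = rng.standard_normal(1024)
mt_estimate = multitaper_psd(noise)
pg_estimate = periodogram(noise)
```

Comparing `mt_estimate.var()` with `pg_estimate.var()` shows the smoother multitaper estimate; averaging over more tapers lowers the variance further at the cost of frequency resolution.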

10.
It is well known that the human auditory system performs far better than automatic speech recognition (ASR) systems, and that the fractional Fourier transform (FrFT) has unique advantages in non-stationary signal processing. In this paper, a Gammatone filterbank is applied to speech signals for front-end temporal filtering, and acoustic features of the output subband signals are then extracted with the fractional Fourier transform. Considering the critical effect of the transform order on the FrFT, an order adaptation method based on the instantaneous frequency is proposed, and its performance is compared with that of a method based on the ambiguity function. ASR experiments are conducted on clean and noisy Putonghua digits. The results show that the proposed features achieve a significantly higher recognition rate than the MFCC baseline, and that the order adaptation method based on instantaneous frequency has much lower complexity than the one based on the ambiguity function. Furthermore, the FrFT-based features achieve the highest recognition rate when the proposed order adaptation method is used.

11.
This paper presents an accurate speech detection algorithm for improving the performance of speech recognition systems working in noisy environments. The proposed method is based on a hard-decision clustering approach in which a set of prototypes is used to characterize the noisy channel. The presence of speech is detected by a decision rule formulated in terms of an averaged distance between the observation vector and a cluster-based noise model. The algorithm benefits from contextual information, a strategy that considers not only a single speech frame but also a neighborhood of data in order to smooth the decision function and improve the robustness of speech detection. The proposed scheme has a reduced computational cost, making it suitable for real-time applications, i.e., automated speech recognition systems. An exhaustive analysis is conducted on the AURORA 2 and AURORA 3 databases to assess the performance of the algorithm and to compare it with existing standard voice activity detection (VAD) methods. The results show significant improvements in detection accuracy and speech recognition rate over standard VADs such as ITU-T G.729, ETSI GSM AMR, and ETSI AFE for distributed speech recognition, and over a representative set of recently reported VAD algorithms.

12.
A multistream phoneme recognition framework is proposed based on forming streams from different spectrotemporal modulations of speech. Phoneme posterior probabilities are estimated from each stream separately and combined at the output level. A statistical model of the final estimated posterior probabilities is used to characterize the system performance. During operation, the best fusion architecture is chosen automatically to maximize the similarity of the output statistics to those of the clean condition. Results on phoneme recognition from noisy speech indicate the effectiveness of the proposed method.

13.
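An improved U-Net model for end-to-end speech enhancement, the Attention Dilated Convolution U-Net (ADC-U-Net), is designed. Compared with the baseline U-Net, it adds dilated convolutions to reduce the information loss caused by sampling, and it introduces an attention mechanism that draws on more contextual information from the noisy speech to extract deeper and richer features. Unlike traditional speech enhancement methods, the proposed model does not need the three steps of feature extraction, feature denoising, and speech reconstruction; it avoids dependence on explicit features, and the network instead learns implicit features at multiple levels and scales. The quality and intelligibility of the enhanced speech are evaluated with several subjective and objective measures. The experimental data show that the proposed algorithm performs well in both noise suppression and adaptability to noise, and delivers good speech quality and intelligibility compared with the baseline U-Net and other models.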
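
A minimal sketch of the dilated convolution idea in item 13: a dilated kernel widens the receptive field by spacing out its taps instead of down-sampling the signal. This is a plain 1-D illustration in NumPy, not the ADC-U-Net layer itself; the function name `dilated_conv1d` is hypothetical, and the operation is the cross-correlation form used by deep learning frameworks, with no padding.

```python
import numpy as np

def dilated_conv1d(x, kernel, dilation=1):
    """Valid-mode 1-D dilated cross-correlation.

    Tap i of the kernel reads the input at offset i * dilation, so a kernel
    of length k covers (k - 1) * dilation + 1 input samples.
    """
    x = np.asarray(x, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    span = (len(kernel) - 1) * dilation
    out = np.zeros(len(x) - span)
    for i, tap in enumerate(kernel):
        start = i * dilation
        out += tap * x[start:start + len(out)]
    return out

def receptive_field(kernel_size, dilation):
    """Number of input samples seen by one output sample."""
    return (kernel_size - 1) * dilation + 1
```

For example, a 3-tap kernel with dilation 2 sees 5 input samples per output sample, while the same kernel with dilation 1 sees only 3.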

14.
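To separate voices in the presence of noise, a single-channel voice separation algorithm using sparse non-negative matrix factorization and a deep attractor network is proposed. First, dictionary matrices for voice and noise are obtained by training and used as prior information to separate the coefficient matrices of voice and noise from the noisy mixed speech. Then, because different source components in the voice coefficient matrix differ in their similarity in the embedding space, a deep attractor network separates it into coefficient matrices for the individual sources. Finally, the clean separated speech is reconstructed from the separated coefficient matrices and the voice dictionary matrix. Experimental results under different noise conditions show that the proposed algorithm suppresses background noise while improving the overall quality of the separated speech, outperforming the comparison algorithm that incorporates a voice/noise separation model.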
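
The first stage of item 14 (separating voice and noise coefficient matrices with pre-trained dictionaries) can be sketched as follows. The abstract gives no cost function or update rule, so this assumes a Euclidean cost with an L1 sparsity penalty and the standard multiplicative update for the activations with the dictionary held fixed. All names (`sparse_nmf_activations`, `split_voice_noise`, the `sparsity` value) are hypothetical, and the deep attractor network stage is not covered.

```python
import numpy as np

def objective(V, W, H, sparsity):
    """0.5 * ||V - W H||_F^2 + sparsity * sum(H)."""
    return 0.5 * np.sum((V - W @ H) ** 2) + sparsity * np.sum(H)

def sparse_nmf_activations(V, W, sparsity=0.1, n_iter=200, seed=0, eps=1e-12):
    """Estimate non-negative activations H for a fixed dictionary W."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    WtV = W.T @ V
    WtW = W.T @ W
    for _ in range(n_iter):
        # Multiplicative update: keeps H non-negative; the sparsity term
        # in the denominator shrinks small activations toward zero.
        H *= WtV / (WtW @ H + sparsity + eps)
    return H

def split_voice_noise(V, W_voice, W_noise, sparsity=0.1, n_iter=200):
    """Decompose a magnitude spectrogram V into voice and noise parts."""
    W = np.hstack([W_voice, W_noise])
    H = sparse_nmf_activations(V, W, sparsity, n_iter)
    k = W_voice.shape[1]
    return W_voice @ H[:k], W_noise @ H[k:]
```

In this sketch `V` is a non-negative magnitude spectrogram (frequency bins by frames), and the two dictionaries would come from separate training on clean voice and on noise.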

15.
This letter points out that, although low-pass filtering has been used in the audio signal domain to prevent aliasing noise from entering the baseband of speech signals, an antialiasing process in the speech feature domain is still needed to prevent high modulation frequency components from entering the baseband of speech features. The existence of aliasing noise in speech features is revealed via spectral analysis of speech feature streams. A method for suppressing such aliasing noise is proposed. Experiments on large vocabulary speech recognition show that antialiasing processing of speech features can improve speech recognition, especially for noisy speech.

16.
17.
牛丽红  倪国强 《光学技术》2005,31(3):420-423
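Target features from multiple sensors are often high-dimensional and contain considerable redundant information and noise. To reduce the cost of data acquisition and improve the performance and efficiency of the target recognizer, a feature optimization method based on a genetic algorithm (GA) is proposed for multi-sensor target recognition systems. The GA is combined with a neural-network target classifier, and feedback from the recognition results steers the direction of genetic evolution, thereby achieving feature optimization. To overcome premature convergence of the GA, an improved GA that combines correlation-based selection with adaptive genetic operators is proposed. Simulation results verify the effectiveness of the method.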

18.
19.
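Recovering clean speech from noisy speech signals has long been a central problem in signal processing. In recent years, researchers have proposed a number of single-channel speech enhancement algorithms based on dictionary learning and sparse representation. These algorithms exploit the sparsity of speech in the time-frequency domain: they construct dictionaries by learning the structural features and regularities of training samples, then project the noisy speech onto them to estimate the clean speech. For cases where the training samples and test data are mismatched, supervised non-negative matrix factorization is combined with traditional speech enhancement based on statistical models, and the speech and noise dictionaries are updated during the enhancement stage to estimate the clean speech. This paper first introduces the signal model for single-channel speech enhancement, then describes four typical enhancement methods, and finally discusses possible future research directions.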

20.
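Multi-scale image fusion algorithm based on sub-pixel region-weighted energy features   Cited by: 4 (self-citations: 0, citations by others: 4)
Each pixel in a rectangular or circular region is divided into sub-pixels and a weight is assigned to each sub-pixel, yielding a composite weighted region energy based on sub-pixels. The fusion algorithm first decomposes the source images into pyramids, then applies a choose-max fusion rule based on the sub-pixel weighted region energy feature to the high-frequency detail components of the pyramids and averages the low-frequency coarse components, which gives the pyramid decomposition of the fused image. Finally, the fused image is reconstructed. Simulation results show that the new algorithm fuses better than multi-resolution image fusion algorithms that use the conventional region energy feature as the fusion rule, and that it improves fused image quality as measured by sharpness and entropy.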
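
The two fusion rules in item 20 (choose-max by region energy for detail components, averaging for coarse components) can be sketched as below. This is a simplification: the weights here are given per whole pixel in a square window, whereas the abstract derives them from a sub-pixel division of rectangular and circular regions that is not specified. The function names and the uniform 3x3 window in the usage note are hypothetical.

```python
import numpy as np

def region_energy(band, weights):
    """Weighted sum of squared coefficients in a window around each pixel.

    `weights` is assumed to be a square array with odd side length.
    """
    radius = weights.shape[0] // 2
    rows, cols = band.shape
    squared = np.pad(band.astype(float) ** 2, radius, mode="edge")
    energy = np.zeros((rows, cols))
    for dy in range(weights.shape[0]):
        for dx in range(weights.shape[1]):
            energy += weights[dy, dx] * squared[dy:dy + rows, dx:dx + cols]
    return energy

def fuse_detail(band_a, band_b, weights):
    """Choose-max rule: keep the coefficient whose region energy is larger."""
    take_a = region_energy(band_a, weights) >= region_energy(band_b, weights)
    return np.where(take_a, band_a, band_b)

def fuse_coarse(band_a, band_b):
    """Averaging rule for the low-frequency (coarse) component."""
    return 0.5 * (band_a + band_b)
```

A call such as `fuse_detail(a, b, np.ones((3, 3)) / 9.0)` applies the rule with a uniform window; a sub-pixel weighting scheme would supply a different `weights` array.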
