1.

The Study of Full Light Speech Signal Collection System

JIA Bo JIN Yaqiu ZHANG Wei HU Li YE Kunzhen 《Chinese Optics Letters (中国光学快报, English edition)》2001,10(3)

The demodulation characteristics of 3×3 optical fiber couplers are analyzed, and their application to coherent communication systems and speech signal collection is pointed out. Experiments verify the feasibility of an all-optical speech signal collection system.

2.

On application of adaptive decorrelation filtering to assistive listening

Zhao Y Yen KC Soli S Gao S Vermiglio A 《The Journal of the Acoustical Society of America》2002,111(2):1077-1085

This paper describes an application of the multichannel signal processing technique of adaptive decorrelation filtering to the design of an assistive listening system. A simulated "dinner table" scenario was studied. The speech signal of a desired talker was corrupted by three simultaneous speech jammers and by a speech-shaped diffusive noise. The technique of adaptive decorrelation filtering processing was used to extract the desired speech from the interference speech and noise. The effectiveness of the assistive listening system was evaluated by observing improvements in A-weighted signal-to-noise ratio (SNR) and in sentence intelligibility, where the latter was evaluated in a listening test with eight normal hearing subjects and three subjects with hearing impairments. Significant improvements in SNR and sentence intelligibility were achieved with the use of the assistive listening system. For subjects with normal hearing, the speech reception threshold was improved by 3 to 5 dBA, and for subjects with hearing impairments, the threshold was improved by 4 to 8 dBA.

3.

Pitch-based monaural segregation of reverberant speech

Roman N Wang D 《The Journal of the Acoustical Society of America》2006,120(1):458-469

In everyday listening, both background noise and reverberation degrade the speech signal. Psychoacoustic evidence suggests that human speech perception under reverberant conditions relies mostly on monaural processing. While speech segregation based on periodicity has achieved considerable progress in handling additive noise, little research in monaural segregation has been devoted to reverberant scenarios. Reverberation smears the harmonic structure of speech signals, and our evaluations using a pitch-based segregation algorithm show that an increase in the room reverberation time causes degraded performance due to weakened periodicity in the target signal. We propose a two-stage monaural separation system that combines the inverse filtering of the room impulse response corresponding to target location and a pitch-based speech segregation method. As a result of the first stage, the harmonicity of a signal arriving from target direction is partially restored while signals arriving from other directions are further smeared, and this leads to improved segregation. A systematic evaluation of the system shows that the proposed system results in considerable signal-to-noise ratio gains across different conditions. Potential applications of this system include robust automatic speech recognition and hearing aid design.

4.

Multiresolution information measures applied to speech recognition

María E. Torres Hugo L. Rufiner Diego H. Milone 《Physica A》2007,385(1):319-332

Considerable advances in automatic speech recognition have been made in the last decades, thanks especially to the use of hidden Markov models. In the field of speech signal analysis, different techniques have been developed. However, deterioration in the performance of speech recognizers has been observed when they are trained with clean signals and tested with noisy signals. This is still an open problem in this field. Continuous multiresolution entropy has been shown to be robust to additive noise in applications to different physiological signals. In previous works we have included Shannon and Tsallis entropies, and their corresponding divergences, in different speech analysis and recognition systems. In this paper we present an extension of the continuous multiresolution entropy to different divergences and we propose them as new dimensions for the pre-processing stage of a speech recognition system. This approach takes into account information about changes in the dynamics of the speech signal at different scales. The methods proposed here are tested with speech signals corrupted with babble and white noise. Their performance is compared with classical mel cepstral parametrization. The results suggest that these continuous multiresolution entropy related measures provide valuable information to the speech recognition system and that they could be included as an extra component in the pre-processing stage.

5.

Across-ear interference from parametrically degraded synthetic speech signals in a dichotic cocktail-party listening task

Brungart DS Simpson BD Darwin CJ Arbogast TL Kidd G 《The Journal of the Acoustical Society of America》2005,117(1):292-304

Recent results have shown that listeners attending to the quieter of two speech signals in one ear (the target ear) are highly susceptible to interference from normal or time-reversed speech signals presented in the unattended ear. However, speech-shaped noise signals have little impact on the segregation of speech in the opposite ear. This suggests that there is a fundamental difference between the across-ear interference effects of speech and nonspeech signals. In this experiment, the intelligibility and contralateral-ear masking characteristics of three synthetic speech signals with parametrically adjustable speech-like properties were examined: (1) a modulated noise-band (MNB) speech signal composed of fixed-frequency bands of envelope-modulated noise; (2) a modulated sine-band (MSB) speech signal composed of fixed-frequency amplitude-modulated sinewaves; and (3) a "sinewave speech" signal composed of sine waves tracking the first four formants of speech. In all three cases, a systematic decrease in performance in the two-talker target-ear listening task was found as the number of bands in the contralateral speech-like masker increased. These results suggest that speech-like fluctuations in the spectral envelope of a signal play an important role in determining the amount of across-ear interference that a signal will produce in a dichotic cocktail-party listening task.

6.

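Research on noise reduction in speech communication

TIAN Yujing ZUO Hongwei WANG Chao 《Journal of Applied Acoustics (应用声学)》2020,39(6):932-939

In a speech communication system, transmission over the channel inevitably introduces intersymbol interference (ISI) and signal distortion, and the speech is also contaminated by noise. Building on an analysis of the adaptive blind equalization algorithm CMA (constant modulus algorithm) and of improved blind equalization algorithms, and considering that adaptive blind equalization has limited ability to control speech noise, this paper combines adaptive blind equalization with a wavelet-packet masking-threshold denoising algorithm to form a new baseband speech enhancement method. Simulation results show that adaptive blind equalization makes the constellation diagram clear and compact and effectively reduces the bit error rate. The study confirms that, under severe ISI and distortion of the speech signal, the method provides stable noise reduction in both white and colored noise environments, and that denoising is accompanied by good perceived quality for Mandarin Chinese speech.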
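For readers unfamiliar with the constant modulus algorithm named in the abstract above, the following is a minimal sketch of a textbook CMA tap update, not the authors' implementation. The function name `cma_equalize`, the step size `mu`, the target modulus `r2`, and the attenuated QPSK test signal are all assumptions chosen for illustration.

```python
def cma_equalize(x, num_taps=1, mu=0.1, r2=1.0):
    """Blind-equalize complex samples x with the constant modulus algorithm."""
    # Centre-spike initialisation: one tap at 1, the rest at 0.
    w = [0j] * num_taps
    w[num_taps // 2] = 1 + 0j
    y_out = []
    for n in range(num_taps - 1, len(x)):
        # Regressor: the num_taps most recent samples, newest first.
        u = [x[n - k] for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u))
        # CMA error term: pushes |y|^2 towards the constant r2.
        e = (abs(y) ** 2 - r2) * y
        # Stochastic-gradient step on J = E[(|y|^2 - r2)^2].
        w = [wk - mu * e * uk.conjugate() for wk, uk in zip(w, u)]
        y_out.append(y)
    return w, y_out


# Hypothetical test signal: unit-modulus QPSK symbols attenuated by 0.5.
symbols = [1, 1j, -1, -1j]
received = [0.5 * symbols[n % 4] for n in range(2000)]
taps, equalized = cma_equalize(received)
```

With a single tap and a pure gain error, the tap magnitude settles near 2, which restores the unit modulus of the symbols. A dispersive channel would need more taps and a smaller step size.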

7.

Proposals for the directive characterisation of electroacoustic emitters

J. Pfretzschner A. Moreno 《Applied Acoustics》1981,14(3):197-205

In this paper, the concept of directivity is generalised to acoustic sources radiating transients or signals that evolve with time. (These are the most common cases in power or reinforcement electroacoustic systems.) The generalisation proposed is based upon the anamorphism relating the signal levels emitted into free space in different directions. The relationship between the signals observed in two arbitrary directions is essentially independent of time. Therefore, the anamorphical relationship offers the possibility of obtaining the directivity patterns simply by using as a test signal the signal commonly emitted by the system (i.e. speech in a reinforcement system for a conference room). This principle and method can be applied without major restrictions to any other system, or piece or part of machinery emitting acoustic energy in discontinuous form. Concerning electroacoustic sources, it appears advantageous to replace the usual test signal consisting of pure tones by the signal proper to the system (music, speech, etc.) filtered into the standardised frequency bands. The complete signal (not filtered) can also give significant results. As a simplifying and reasonable compromise regarding the directivity for speech and music, bands of white noise are proposed as test signals.

8.

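Small-sample speech emotion recognition using global features and a weak-scale fusion strategy

HUANG Yongming ZHANG Guobao LI Xiong DA Feipeng 《Acta Acustica (声学学报)》2012,37(3):330-338

Speech is a short-time stationary time-frequency signal, so most researchers extract emotional features after framing. However, features extracted after framing are local features and cannot accurately reflect the dynamic characteristics of emotional speech, so relying on local features alone often fails to yield a robust emotion recognition system. To address this problem, utterance-level global features are first extracted from the unframed speech signal by multiscale optimal wavelet packet decomposition; 384-dimensional utterance-level local features are then extracted after framing and reduced in dimension using the Fisher criterion; finally, a weak-scale fusion strategy is proposed to fuse the two kinds of utterance-level features, and an SVM performs the emotion classification. Experiments on the Berlin emotional speech database show that, compared with using utterance-level local features alone, the proposed method raises the final recognition rate by 4.2% to 13.8%, and that the recognition rate fluctuates little, particularly with small samples.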
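The abstract above mentions dimension reduction by the Fisher criterion without giving details. One common reading, assumed here and not confirmed by the source, is to rank each feature by its ratio of between-class to within-class scatter and keep the top-ranked ones. The function names and the toy data below are hypothetical.

```python
def fisher_scores(samples, labels):
    """Return one Fisher ratio per feature: between-class over within-class scatter."""
    num_features = len(samples[0])
    classes = sorted(set(labels))
    scores = []
    for j in range(num_features):
        column = [row[j] for row in samples]
        overall_mean = sum(column) / len(column)
        between = 0.0
        within = 0.0
        for c in classes:
            values = [v for v, lab in zip(column, labels) if lab == c]
            class_mean = sum(values) / len(values)
            between += len(values) * (class_mean - overall_mean) ** 2
            within += sum((v - class_mean) ** 2 for v in values)
        # A tiny constant guards against division by zero for constant features.
        scores.append(between / (within + 1e-12))
    return scores


def select_features(samples, labels, keep):
    """Indices of the `keep` features with the largest Fisher ratio."""
    scores = fisher_scores(samples, labels)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:keep]


# Toy data (made up): feature 0 separates the two classes, feature 1 does not.
toy_samples = [[0.0, 1.0], [0.2, 5.0], [1.0, 1.2], [1.2, 5.2]]
toy_labels = [0, 0, 1, 1]
toy_scores = fisher_scores(toy_samples, toy_labels)
kept = select_features(toy_samples, toy_labels, keep=1)
```

On the toy data the first feature scores far higher than the second, so it is the one kept. In the setting of the abstract the same ranking would be applied to the 384 local features before fusion.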

9.

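Speech extraction in in-car scenarios combining blind source separation and multi-talker state decision (Cited by: 1; self-citations: 0; citations by others: 1)

WANG Zelin CHEN Kai LU Jing 《Acta Acustica (声学学报)》2020,45(5):696-706

In an in-car distributed microphone array scenario, the blind source separation algorithm TRINICON (Triple-N ICA for convolutive mixtures) is combined with multi-talker state decision to extract the desired speech. Based on the relative positions of the distributed microphone array and the sound sources, specific initialization conditions for blind source separation are designed to guarantee the mapping between output channels and sources; based on the frequency response characteristics of the distributed microphone array, a feature vector is designed for multi-talker decision, and the decision result is fed into the parameter iteration of the TRINICON algorithm. In simulation evaluations using real in-car recordings, the proposed method is highly robust across signal-to-noise ratios, effectively improves the convergence speed of TRINICON and the signal-to-interference ratio of the speech signal, and ensures accurate channel mapping. The evaluation results show that the method can effectively extract the desired speech in in-car scenarios, providing a reliable and fast-converging solution for acoustic information extraction in complex in-car environments.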

10.

Calculation of selective filters of a device for primary analysis of speech signals

L. S. Chudnovskii V. M. Ageev 《Acoustical Physics》2014,60(4):436-441

The amplitude-frequency responses of filters for primary analysis of speech signals, which have a low quality factor and a high rolloff factor in the high-frequency range, are calculated using the linear theory of speech production and psychoacoustic measurement data. The frequency resolution of the filter system for a sinusoidal signal is 40–200 Hz. The modulation-frequency resolution of amplitude- and frequency-modulated signals is 3–6 Hz. The aforementioned features of the calculated filters are close to the amplitude-frequency responses of biological auditory systems at the level of the eighth nerve.

11.

Noise reduction using three-step gain factor and iterative-directional-median filter

《Applied Acoustics》2014

Musical residual noise is a major problem for a speech enhancement system. This noise is very annoying to the human ear and can significantly deteriorate the perception quality of enhanced speech. In this study, we aim at reducing the quantity of musical residual noise by a two-stage speech enhancement approach. In the first stage a preprocessor enhances noisy speech using an algorithm which combines the two-step-decision-directed and the Virag methods. In the second stage the enhanced speech signal is post-processed by an iterative-directional-median filter to significantly reduce the quantity of residual noise, while maintaining the harmonic spectra. Experimental results show that the proposed approach can significantly improve the performance of a speech enhancement system by reducing the quantity of residual noise.

12.

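A low-complexity dereverberation algorithm based on Kalman filtering (Cited by: 1; self-citations: 1; citations by others: 0)

QI Yuanlei YANG Feiran YANG Jun 《Journal of Applied Acoustics (应用声学)》2018,37(4):559-566

In applications such as teleconferencing and smart speakers, the microphone is often in the far field of the sound source. Reverberation masks the direct sound that arrives later, degrading both the speech quality of the microphone signal and the accuracy of speech recognition systems. Multichannel linear prediction is a classic blind dereverberation algorithm, but it usually has high computational complexity. This paper proposes a simplified Kalman filter update that, by diagonalizing the error covariance matrix of the Kalman filter state vector, reduces the complexity of adaptive multichannel linear prediction dereverberation. A comparison with an existing block-diagonal simplification shows that the proposed simplification further reduces the complexity of the original Kalman filter algorithm while preserving speech quality.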

13.

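Volterra prediction model for speech signal series

ZHANG Yumei HU Xiaojun WU Xiaojun BAI Shulin LU Gang 《Acta Physica Sinica (物理学报)》2015,64(20):200507-200507

Given English phonemes, words, and sentences were recorded and preprocessed. The mutual information method and Cao's method were used to determine the delay time and the embedding dimension, respectively, of the recorded speech signal series, in order to reconstruct the phase space of the series. By computing the largest Lyapunov exponent of the recorded series, the chaotic character of the speech signals was identified, and the series were judged to be chaotic. Introducing the Volterra series, a nonlinear prediction model of speech signals with an explicit structure is proposed. To overcome the inherent shortcomings of the least mean square algorithm when updating the Volterra model coefficients, a second-order Volterra model based on the Davidon-Fletcher-Powell algorithm (DFPSOVF) is constructed on the basis of the least squares method, using a variable convergence factor technique based on an a posteriori error assumption, and is applied to the prediction of chaotic speech signal series. Simulation results show that the DFPSOVF nonlinear prediction model achieves better prediction accuracy than a linear prediction model for both single-frame and multi-frame speech signals, reflects the trends and regularities of the speech series well, and fully meets the requirements of speech prediction; the memory length of the prediction model can be chosen according to the embedding dimension of the speech series. The proposed model may open a new path for speech signal reconstruction and compression coding, improving the complexity and effectiveness of speech signal processing methods.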
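Two building blocks named in the abstract above, phase-space reconstruction by time-delay embedding and a second-order Volterra predictor, can be sketched as follows. This is only an illustration of the model structure: the DFP-based coefficient update of the paper is not shown, and the function names, kernel values, and toy series are assumptions.

```python
def delay_embed(x, dim, tau):
    """Phase-space reconstruction: vectors [x(n), x(n - tau), ..., x(n - (dim-1)*tau)]."""
    start = (dim - 1) * tau
    return [[x[n - i * tau] for i in range(dim)] for n in range(start, len(x))]


def volterra2_predict(state, h0, h1, h2):
    """Second-order Volterra output for one embedded state vector.

    h1[i] weights state[i]; h2[i][j] (for j >= i) weights state[i] * state[j].
    """
    m = len(state)
    linear = sum(h1[i] * state[i] for i in range(m))
    quadratic = sum(h2[i][j] * state[i] * state[j]
                    for i in range(m) for j in range(i, m))
    return h0 + linear + quadratic


# Made-up numbers, only to show the shapes involved.
series = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
states = delay_embed(series, dim=3, tau=2)
prediction = volterra2_predict([1.0, 2.0], h0=0.5, h1=[1.0, -1.0],
                               h2=[[0.5, 0.0], [0.0, 0.25]])
```

Here `dim` plays the role of the embedding dimension and `tau` the delay time. Using the embedding dimension as the memory length of the predictor matches the suggestion at the end of the abstract.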

14.

Fractal dimensions of speech sounds: computation and application to automatic speech recognition (Cited by: 4; self-citations: 0; citations by others: 4)

Maragos P Potamianos A 《The Journal of the Acoustical Society of America》1999,105(3):1925-1932

The dynamics of airflow during speech production may often result in some small or large degree of turbulence. In this paper, the geometry of speech turbulence as reflected in the fragmentation of the time signal is quantified by using fractal models. An efficient algorithm for estimating the short-time fractal dimension of speech signals based on multiscale morphological filtering is described, and its potential for speech segmentation and phonetic classification discussed. Also reported are experimental results on using the short-time fractal dimension of speech signals at multiple scales as additional features in an automatic speech-recognition system using hidden Markov models, which provide a modest improvement in speech-recognition performance.
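A rough sketch of how a fractal dimension can be estimated with multiscale morphological filtering is given below. It follows the general morphological covering idea (dilate and erode the signal at several scales, then fit the growth of the covered area), and is not the authors' exact algorithm. The flat structuring element, the scale range `max_scale`, and the function names are assumptions.

```python
import math


def flat_dilate(x, radius):
    """Running maximum over a window of half-width `radius` (clipped at the ends)."""
    n = len(x)
    return [max(x[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def flat_erode(x, radius):
    """Running minimum over the same window."""
    n = len(x)
    return [min(x[max(0, i - radius):min(n, i + radius + 1)]) for i in range(n)]


def fractal_dimension(x, max_scale=8):
    """Estimate D from the growth of the morphological cover area with scale."""
    log_scale, log_area = [], []
    for s in range(1, max_scale + 1):
        upper = flat_dilate(x, s)
        lower = flat_erode(x, s)
        area = sum(u - l for u, l in zip(upper, lower))
        log_scale.append(math.log(s))
        log_area.append(math.log(area))  # assumes a non-constant signal (area > 0)
    # Least-squares slope of log(area) against log(scale).
    mean_s = sum(log_scale) / len(log_scale)
    mean_a = sum(log_area) / len(log_area)
    slope = (sum((ls - mean_s) * (la - mean_a) for ls, la in zip(log_scale, log_area))
             / sum((ls - mean_s) ** 2 for ls in log_scale))
    # Cover area grows like scale**(2 - D), so D = 2 - slope.
    return 2.0 - slope


# Sanity check on a straight ramp, whose graph is a smooth curve (D close to 1).
ramp = [float(i) for i in range(1000)]
ramp_dimension = fractal_dimension(ramp)
```

For a short-time estimate, the same function would be applied to successive frames of the speech signal. More fragmented (turbulent) frames are expected to give values further above 1.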

15.

Processing speech signal using auditory-like filterbank provides least uncertainty about articulatory gestures

Ghosh PK Goldstein LM Narayanan SS 《The Journal of the Acoustical Society of America》2011,129(6):4014-4022

Understanding how the human speech production system is related to the human auditory system has been a perennial subject of inquiry. To investigate the production-perception link, in this paper, a computational analysis has been performed using the articulatory movement data obtained during speech production with concurrently recorded acoustic speech signals from multiple subjects in three different languages: English, Cantonese, and Georgian. The form of articulatory gestures during speech production varies across languages, and this variation is considered to be reflected in the articulatory position and kinematics. The auditory processing of the acoustic speech signal is modeled by a parametric representation of the cochlear filterbank which allows for realizing various candidate filterbank structures by changing the parameter value. Using mathematical communication theory, it is found that the uncertainty about the articulatory gestures in each language is maximally reduced when the acoustic speech signal is represented using the output of a filterbank similar to the empirically established cochlear filterbank in the human auditory system. Possible interpretations of this finding are discussed.

16.

Statistical modeling of speech Poincaré sections in combination of frequency analysis to improve speech recognition performance

Jafari A Almasganj F Bidhendi MN 《Chaos (Woodbury, N.Y.)》2010,20(3):033106

This paper introduces a combinational feature extraction approach to improve speech recognition systems. The main idea is to simultaneously benefit from some features obtained from Poincaré section applied to speech reconstructed phase space (RPS) and typical Mel frequency cepstral coefficients (MFCCs) which have a proved role in speech recognition field. With an appropriate dimension, the reconstructed phase space of speech signal is assured to be topologically equivalent to the dynamics of the speech production system, and could therefore include information that may be absent in linear analysis approaches. Moreover, complicated systems such as speech production system can present cyclic and oscillatory patterns and Poincaré sections could be used as an effective tool in analysis of such trajectories. In this research, a statistical modeling approach based on Gaussian mixture models (GMMs) is applied to Poincaré sections of speech RPS. A final pruned feature set is obtained by applying an efficient feature selection approach to the combination of the parameters of the GMM model and MFCC-based features. A hidden Markov model-based speech recognition system and TIMIT speech database are used to evaluate the performance of the proposed feature set by conducting isolated and continuous speech recognition experiments. By the proposed feature set, 5.7% absolute isolated phoneme recognition improvement is obtained against only MFCC-based features.

17.

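Initial-final segmentation of whispered speech based on an auditory model (Cited by: 5; self-citations: 0; citations by others: 5)

DING Hui LI Xueli XU Boling 《Journal of Applied Acoustics (应用声学)》2004,23(2):20-25,44

This paper analyzes the characteristics of whispered speech and, drawing on basic theory and experimental data from physiological acoustics and psychoacoustics, proposes a method that uses an auditory model to segment whispered speech into initials and finals. The auditory perception model suited to this task is described as consisting mainly of four levels: the mechanism by which the cochlea decomposes sound by frequency; the nonlinear time-domain and frequency-domain transformations of the auditory system; and the lateral inhibition mechanism of the central nervous system. The model reflects human auditory perception of low-energy speech in noisy environments and is therefore suited to whispered speech recognition; satisfactory results were obtained in experiments on initial-final segmentation of whispered speech.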

18.

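A review of single-channel speech enhancement algorithms based on dictionary learning and sparse representation (Cited by: 1; self-citations: 0; citations by others: 1)

YE Zhongfu ZHU Yuanyuan JIA Xiangyu 《Journal of Applied Acoustics (应用声学)》2019,38(4):645-652

Recovering clean speech from noisy speech has long been an active problem in signal processing. In recent years researchers have proposed a number of single-channel speech enhancement algorithms based on dictionary learning and sparse representation. These algorithms exploit the sparsity of speech in the time-frequency domain: they construct dictionaries by learning the structural features and regularities of training samples, and then project the noisy speech onto the dictionaries to estimate the clean speech. For cases where the training samples do not match the test data, supervised non-negative matrix factorization is combined with traditional speech enhancement based on statistical models, and the speech and noise dictionaries are updated during the enhancement stage to estimate the clean speech. This paper first introduces the signal model for single-channel speech enhancement, then describes four typical enhancement methods, and finally discusses possible future research directions.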

19.

Differentiation of speech and nonspeech processing within primary auditory cortex

Whalen DH Benson RR Richardson M Swainson B Clark VP Lai S Mencl WE Fulbright RK Constable RT Liberman AM 《The Journal of the Acoustical Society of America》2006,119(1):575-581

Primary auditory cortex (PAC), located in Heschl's gyrus (HG), is the earliest cortical level at which sounds are processed. Standard theories of speech perception assume that signal components are given a representation in PAC which are then matched to speech templates in auditory association cortex. An alternative holds that speech activates a specialized system in cortex that does not use the primitives of PAC. Functional magnetic resonance imaging revealed different brain activation patterns in listening to speech and nonspeech sounds across different levels of complexity. Sensitivity to speech was observed in association cortex, as expected. Further, activation in HG increased with increasing levels of complexity with added fundamentals for both nonspeech and speech stimuli, but only for nonspeech when separate sources (release bursts/fricative noises or their nonspeech analogs) were added. These results are consistent with the existence of a specialized speech system which bypasses more typical processes at the earliest cortical level.

20.

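Lecture 4: Current status and prospects of speech signal processing (Cited by: 1; self-citations: 0; citations by others: 1)

LI Changli 《Physics (物理)》2005,34(4):300-306

This article briefly reviews the history of the formation and development of speech signal processing as a branch discipline and points out its place and role in modern information science and technology. It introduces several important application topics, such as low-bit-rate speech coding, rule-based speech synthesis and text-to-speech systems, and speech recognition and human-machine spoken dialogue, which remain active areas of research. Finally, it looks ahead to the prospects of speech signal processing and notes that many difficult problems in the field still await investigation.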