Similar literature
Found 20 similar articles (search time: 144 ms)
1.
Measurement of pitch by subharmonic summation   Total citations: 4, self-citations: 0, citations by others: 4
In order to account for the phenomenon of virtual pitch, various theories assume implicitly or explicitly that each spectral component introduces a series of subharmonics. The spectral-compression method for pitch determination can be viewed as a direct implementation of this principle. The widespread application of this principle in pitch determination is, however, impeded by numerical problems with respect to accuracy and computational efficiency. A modified algorithm is described that solves these problems. Its performance is tested for normal speech and "telephone" speech, i.e., speech high-pass filtered at 300 Hz. The algorithm outperforms the harmonic-sieve method for pitch determination, while its computational requirements are about the same. The algorithm is described in terms of nonlinear system theory, i.e., subharmonic summation. It is argued that the favorable performance of the subharmonic-summation algorithm stems from its corresponding more closely with current pitch-perception theories than does the harmonic sieve.
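
The subharmonic-summation principle described in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published algorithm: the function name `subharmonic_sum`, the decay factor 0.84, the 3% matching tolerance, and the candidate grid are all chosen for the example. Each candidate fundamental collects the amplitudes of spectral components that lie near its harmonics, with higher harmonics weighted less.

```python
def subharmonic_sum(partials, f0_candidates, n_harmonics=8, decay=0.84, tol=0.03):
    """Score each candidate f0 by summing weighted partial amplitudes.

    partials: list of (frequency_hz, amplitude) pairs.
    decay and tol are illustrative assumptions, not values from the abstract.
    """
    scores = []
    for f0 in f0_candidates:
        score = 0.0
        for n in range(1, n_harmonics + 1):
            target = n * f0
            weight = decay ** (n - 1)  # higher harmonics contribute less
            for freq, amp in partials:
                # count a partial if it lies within tol (relative) of harmonic n
                if abs(freq - target) <= tol * target:
                    score += weight * amp
        scores.append(score)
    best = f0_candidates[scores.index(max(scores))]
    return best, scores


# "Telephone-like" input: harmonics of 200 Hz with the fundamental removed.
partials = [(400.0, 1.0), (600.0, 1.0), (800.0, 1.0), (1000.0, 1.0)]
candidates = [float(f) for f in range(80, 501, 10)]
best_f0, _ = subharmonic_sum(partials, candidates)
print(best_f0)
```

In this toy input the 200 Hz candidate scores highest even though no component is present at 200 Hz, which is the virtual-pitch behavior the abstract refers to.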

2.
Two experiments investigated the effects of critical bandwidth and frequency region on the use of temporal envelope cues for speech. In both experiments, spectral details were reduced using vocoder processing. In experiment 1, consonant identification scores were measured in a condition for which the cutoff frequency of the envelope extractor was half the critical bandwidth (HCB) of the auditory filters centered on each analysis band. Results showed that performance was similar to that obtained in conditions for which the envelope cutoff was set to 160 Hz or above. Experiment 2 evaluated the impact of setting the cutoff frequency of the envelope extractor to values of 4, 8, and 16 Hz or to HCB in one or two contiguous bands for an eight-band vocoder. The cutoff was set to 16 Hz for all the other bands. Overall, consonant identification was not affected by removing envelope fluctuations above 4 Hz in the low- and high-frequency bands. In contrast, speech intelligibility decreased as the cutoff frequency was decreased in the midfrequency region from 16 to 4 Hz. The behavioral results were fairly consistent with a physical analysis of the stimuli, suggesting that clearly measurable envelope fluctuations cannot be attenuated without affecting speech intelligibility.

3.
《Journal of voice》2019,33(6):851-859
Purpose: The pitch-shift reflex (PSR) is the adaptation of the fundamental frequency during phonation and speech and reflects auditory feedback control. Speakers without voice and speech disorders mostly compensate for a pitch change in the auditory feedback by adapting their fundamental frequency in the opposite direction. Dysphonic patients often display problems with the auditory perception and control of their voice during therapy. Our study focuses on the auditory and kinesthetic control mechanisms of patients with muscle tension dysphonia (MTD) and of speakers without voice and speech problems. The main purpose of the study is to compare the functionality of the control mechanisms in phonation and speech between patients with MTD and normal speakers. Method: Sixty-one healthy subjects (17 male, 44 female) and 22 patients with MTD (7 male, 15 female) participated in two paradigms: sustained phonation (vowel /a/) and speech ([‘mama]). In both paradigms the fundamental frequency of the auditory feedback was increased synthetically. For the analysis of the PSR, the electroencephalogram, electroglottography, the voice signal, and high-speed endoscopy data were recorded simultaneously. The PSR in the electroencephalogram was detected via the N100 and the mismatch negativity. Statistical tests were applied to detect the PSR in the physiological response within the electroglottography, voice, and high-speed endoscopy signals. The results were compared between the two groups. Results: No differences were found between the controls and patients with MTD regarding latency and magnitude of the perception of the pitch shift in either paradigm, but differences were found in the magnitude of the behavioral response. In both groups, differences were also found between the “no pitch” and “pitch” conditions of the two paradigms regarding vocal fold dynamics and voice quality. Patients with MTD showed more vibrational irregularities during the PSR than the controls, especially regarding the symmetry of vocal fold dynamics. Conclusion: Patients with MTD seem to have a disturbed interaction between auditory and kinesthetic feedback, inducing the execution of an overriding behavioral response.

4.
In this paper we present a model called the Modified Phase-Opponency (MPO) model for single-channel speech enhancement when the speech is corrupted by additive noise. The MPO model is based on the auditory PO model, proposed for detection of tones in noise. The PO model includes a physiologically realistic mechanism for processing the information in neural discharge times and exploits the frequency-dependent phase properties of the tuned filters in the auditory periphery by using a cross-auditory-nerve-fiber coincidence detection for extracting temporal cues. The MPO model alters the components of the PO model such that the basic functionality of the PO model is maintained but the properties of the model can be analyzed and modified independently. The MPO-based speech enhancement scheme does not need to estimate the noise characteristics nor does it assume that the noise satisfies any statistical model. The MPO technique leads to the lowest value of the LPC-based objective measures and the highest value of the perceptual evaluation of speech quality measure compared to other methods when the speech signals are corrupted by fluctuating noise. Combining the MPO speech enhancement technique with our aperiodicity, periodicity, and pitch detector further improves its performance.

5.
A probabilistic psychophysical model for monaural communication from the auditory nerve to the brain is given in the form of a tonotopic display of stimulus spectrum, termed central spectrum. The model builds upon prior research demonstrating the potential of neural timing cues from the auditory nerve for conveying information on complex spectra, and was designed to meet the quantified demands of the psychophysics of frequency measurement. The central spectrum magnitude at each frequency is determined by the response of the auditory-nerve fiber with characteristic frequency matching that frequency. An interval histogram from each fiber is passed through a filter matched to the characteristic frequency of the fiber. This output versus characteristic frequency defines the central spectrum. Detailed analysis demonstrates that efficient probabilistic processing of the central spectrum describes known psychophysical properties of frequency measurement in discrimination and periodicity pitch experiments. Psychophysical models based upon the central spectrum model followed by optimum probabilistic pattern recognition are potentially relevant for predicting human communication limits in response to arbitrary sounds of speech and music.

6.
Pitch detection is an important part of speech recognition and speech processing. In this paper, a pitch detection algorithm based on the second-generation wavelet transform was developed. The proposed algorithm reduces the computational load of algorithms based on the classical wavelet transform. The proposed pitch detection algorithm was tested on both real and synthetic speech signals. Experiments were carried out under noisy conditions to evaluate the accuracy and robustness of the proposed algorithm. Results showed that the proposed algorithm was robust to noise and provided accurate estimates of the pitch period for both low-pitched and high-pitched speakers. Moreover, different wavelet filters obtained using the second-generation wavelet transform were considered to see their effects on the proposed algorithm. The Haar filter showed good performance compared to the other wavelet filters.

7.
王玥  李平  崔杰 《声学学报》2013,38(4):501-508
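To find the best balance between noise suppression and speech distortion, an adaptive β-order Bayesian perceptual estimation speech enhancement algorithm based on the auditory frequency-domain masking effect is proposed, with the aim of improving overall speech enhancement performance. The algorithm exploits the masking effect of human hearing: it adaptively adjusts the value of β in the β-order Bayesian perceptual estimation speech enhancement algorithm according to the computed frequency-domain masking threshold, so that noise is suppressed only to just below the masking threshold, more speech information is retained, and speech distortion is reduced. The performance of the proposed algorithm was evaluated both objectively and subjectively and compared with the original SNR-based adaptive β-order Bayesian perceptual estimation speech enhancement algorithm. The results show that, for SNRs between -10 dB and 5 dB, the composite objective scores of the frequency-domain-masking β-order Bayesian perceptual estimation method are consistently higher than those of the SNR-based adaptive algorithm. The subjective evaluation also shows that the frequency-domain-masking method suppresses background noise well while retaining as much speech information as possible.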

8.
How fast speakers can change pitch voluntarily is potentially an important articulatory constraint for speech production. Previous attempts at assessing the maximum speed of pitch change have helped improve understanding of certain aspects of pitch production in speech. However, since only "response time" (the time needed to complete the middle 75% of a pitch shift) was measured in previous studies, direct comparisons with speech data have been difficult. In the present study, a new experimental paradigm was adopted in which subjects produced rapid successions of pitch shifts by imitating synthesized model pitch undulation patterns. This permitted the measurement of the duration of entire pitch shifts. Native speakers of English and Mandarin participated as subjects. The speed of pitch change was measured both in terms of response time and excursion time (the time needed to complete the entire pitch shift). Results show that excursion time is nearly twice as long as response time. This suggests that physiological limitation on the speed of pitch movement is greater than has been recognized. Also, it is found that the maximum speed of pitch change varies quite linearly with excursion size, and that it is different for pitch rises and falls. Comparisons of present data with data on speed of pitch change from studies of real speech found them to be largely comparable. This suggests that the maximum speed of pitch change is often approached in speech, and that the role of physiological constraints in determining the shape and alignment of F0 contours in speech is probably greater than has been appreciated.

9.
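Pitch detection algorithm based on a lifting-wavelet-weighted autocorrelation function*   Total citations: 1, self-citations: 0, citations by others: 1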
王晨  章小兵  刘美娟 《应用声学》2018,37(2):201-207
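With the development of computer technology, speech signal processing has become an important channel for human-computer interaction, and the feature-detection algorithms it uses in complex noise environments directly affect computational efficiency. The pitch period is one of the important parameters in speech feature extraction. To address the low detection accuracy of traditional pitch detection algorithms in noisy environments, a pitch detection algorithm is proposed that is based on the autocorrelation function of the linear prediction error weighted by an adaptive lifting wavelet transform. The method uses a weighted sum of multi-level lifting-wavelet approximation coefficients to compensate for the decay in autocorrelation amplitude as the time lag increases, and uses the autocorrelation function of the linear prediction error to suppress formant interference; the two are then combined to emphasize the peak at the pitch period. Experimental results show that, compared with the traditional autocorrelation method and the wavelet-weighting method, the proposed method effectively reduces the influence of formants, emphasizes the peak at the pitch period, improves pitch period detection accuracy, and is more robust.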

10.
In intonation research, prominence-lending pitch movements have either been described on a linear or on a logarithmic frequency scale. An experiment has been carried out to check whether pitch movements in speech intonation are perceived on one of these two scales or on a psychoacoustic scale representing the frequency selectivity of the auditory system. This last scale is intermediary between the other two scales. Subjects matched the excursion size of prominence-lending pitch movements in utterances resynthesized in different pitch registers. Their task was to adjust the excursion size in a comparison stimulus in such a way that it lent equal prominence to the corresponding syllable in a fixed test stimulus. The comparison stimulus and the test stimulus had pitches running parallel on either the logarithmic frequency scale, the psychoacoustic scale, or the linear frequency scale. In one-half of the experimental sessions, the test stimulus was presented in the low register, while the comparison stimulus was presented in the high register, and, conversely, for the other half of the sessions. The result is that, in all cases, stimuli are matched in such a way that the average excursion sizes in different registers are equal on the psychoacoustic scale.
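
To make the three scales concrete, the sketch below transfers a pitch excursion from a low register to a high register so that its size is preserved on a psychoacoustic scale. The abstract does not name the scale it used; the ERB-rate formula of Glasberg and Moore is assumed here purely for illustration, and the function names and the 100/150/200 Hz values are hypothetical.

```python
import math


def hz_to_erb_rate(f_hz):
    # ERB-rate scale (Glasberg and Moore); an assumed stand-in for the
    # "psychoacoustic scale" mentioned in the abstract.
    return 21.4 * math.log10(0.00437 * f_hz + 1.0)


def erb_rate_to_hz(e):
    # Inverse of hz_to_erb_rate.
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437


def matched_peak(low_base, low_peak, high_base):
    """Peak frequency in the high register giving the same ERB-rate excursion."""
    excursion = hz_to_erb_rate(low_peak) - hz_to_erb_rate(low_base)
    return erb_rate_to_hz(hz_to_erb_rate(high_base) + excursion)


# A 100 -> 150 Hz rise moved to a register starting at 200 Hz.
peak = matched_peak(100.0, 150.0, 200.0)
linear_peak = 200.0 + (150.0 - 100.0)  # equal excursion in Hz
log_peak = 200.0 * (150.0 / 100.0)     # equal excursion in semitones
print(linear_peak, round(peak, 1), log_peak)
```

Under this assumption the matched peak falls between the linear and logarithmic predictions, consistent with the abstract's remark that the psychoacoustic scale is intermediary between the other two.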

11.
The relationship between auditory perception and vocal production has been typically investigated by evaluating the effect of either altered or degraded auditory feedback on speech production in either normal hearing or hearing-impaired individuals. Our goal in the present study was to examine this relationship in individuals with superior auditory abilities. Thirteen professional musicians and thirteen nonmusicians, with no vocal or singing training, participated in this study. For vocal production accuracy, subjects were presented with three tones. They were asked to reproduce the pitch using the vowel /a/. This procedure was repeated three times. The fundamental frequency of each production was measured using an autocorrelation pitch detection algorithm designed for this study. The musicians' superior auditory abilities (compared to the nonmusicians) were established in a frequency discrimination task reported elsewhere. Results indicate that (a) musicians had better vocal production accuracy than nonmusicians (production errors of half a semitone compared to 1.3 semitones, respectively); (b) frequency discrimination thresholds explain 43% of the variance of the production data; and (c) all subjects with superior frequency discrimination thresholds showed accurate vocal production; the reverse relationship, however, does not hold true. In this study we provide empirical evidence for the importance of auditory feedback on vocal production in listeners with superior auditory skills.
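
The measurement chain in this abstract (estimate the fundamental frequency by autocorrelation, then express the production error in semitones) can be sketched as follows. This is a generic textbook autocorrelation estimator, not the algorithm designed for the study, and the function names, the 80 to 500 Hz search range, the 8 kHz sample rate, and the 196 Hz target are assumptions for the example.

```python
import math


def autocorr_pitch(samples, sample_rate, f_min=80.0, f_max=500.0):
    """Estimate f0 as sample_rate / lag of the largest autocorrelation value."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)
    best_lag, best_value = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        # unnormalized autocorrelation at this lag
        value = sum(samples[i] * samples[i + lag]
                    for i in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag


def semitone_error(produced_hz, target_hz):
    """Signed distance in equal-tempered semitones (12 per octave)."""
    return 12.0 * math.log2(produced_hz / target_hz)


# Synthetic "production": 0.1 s of a 200 Hz tone sampled at 8 kHz.
rate = 8000
tone = [math.sin(2.0 * math.pi * 200.0 * i / rate) for i in range(800)]
produced = autocorr_pitch(tone, rate)
print(produced, round(semitone_error(produced, 196.0), 2))
```

A real voice signal would need framing, voicing decisions, and protection against octave errors, none of which this sketch attempts.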

12.
Pitch is the most important auditory perception characteristic of sound with respect to speech intelligibility and music appreciation, and corresponds to a frequency of sound stimulus. However, in some cases, we can perceive virtual pitch, where the corresponding frequency component does not exist in the stimulating sound. This virtual pitch contains a deviation from the de Boer pitch shift formula, which is known as second pitch shift. It has been theoretically suggested that nonlinear dynamics in the cochlea or in the neural network produce a nonlinear resonance with a frequency corresponding to the virtual pitch; however, there is no direct experimental observation to support this theory. The second virtual pitch shift, expressed via basilar membrane nonlinear vibration temporal patterns, and consistent with psychoacoustic experiments, is observed in situ in the cochlea via laser interferometry.

13.
The well-known speech production model is considered, where the speech signal is modeled as the output of an all-pole filter driven either by some white noise sequence (unvoiced speech) or by the sum of a periodic excitation and a noise sequence (voiced speech). Approximate maximum-likelihood (ML) estimation algorithms for the unvoiced case are well known. The ML estimator of the parameters is obtained for the voiced speech model. These parameters consist of the parameters of the periodic excitation (pitch parameters) and the parameters of the filter [linear prediction coefficient (LPC) parameters]. The results of the application of the algorithm on simulated and on real speech data are presented.
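
The signal model in this abstract (an all-pole filter driven by noise alone or by a periodic excitation plus noise) can be simulated directly. The sketch below only generates data from the model; it does not implement the ML estimator. The sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p, the coefficient values, the pulse-train excitation, and the function names are assumptions for illustration.

```python
import random


def all_pole_filter(excitation, a):
    """y[n] = x[n] - sum_k a[k-1] * y[n-k], i.e. filtering by 1 / A(z)."""
    y = []
    for n, x in enumerate(excitation):
        acc = x
        for k, coeff in enumerate(a, start=1):
            if n - k >= 0:
                acc -= coeff * y[n - k]
        y.append(acc)
    return y


def make_excitation(n_samples, pitch_period=None, noise_std=0.01, seed=0):
    """White Gaussian noise, plus a unit pulse every pitch_period samples if given."""
    rng = random.Random(seed)
    out = []
    for n in range(n_samples):
        pulse = 1.0 if pitch_period and n % pitch_period == 0 else 0.0
        out.append(pulse + rng.gauss(0.0, noise_std))
    return out


coeffs = [-1.3, 0.8]  # hypothetical stable second-order filter (pole radius ~0.89)
unvoiced = all_pole_filter(make_excitation(200), coeffs)
voiced = all_pole_filter(make_excitation(200, pitch_period=40), coeffs)
print(len(unvoiced), len(voiced))
```

Passing a bare impulse through `all_pole_filter` gives the filter's impulse response, which is a quick way to check that a chosen coefficient set decays rather than blows up.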

14.
朱斯语  姬培锋  杨军 《应用声学》2017,36(6):481-489
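To evaluate objectively the differences in auditory perception between Chinese traditional instruments and Western instruments, sound samples from 15 typical Chinese and Western instruments were used to build perceptual-space models of the 15 instruments relating timbre, loudness, and timbral brightness. These models can predict how the timbral brightness of different instruments is perceived at a given pitch and loudness. In addition, the models were used to compare how the three auditory perceptual attributes vary for plucked instruments, bowed string instruments, and different types of wind instruments. The results show that, for typical Chinese and Western instruments, timbral brightness increases with loudness, but the size of this effect depends on the pitch register and the loudness range. The timbral brightness of Chinese traditional instruments increases with pitch, whereas that of Western instruments does not change noticeably as pitch increases.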

15.
Perceptual linear predictive (PLP) analysis of speech   Total citations: 31, self-citations: 0, citations by others: 31
A new technique for the analysis of speech, the perceptual linear predictive (PLP) technique, is presented and examined. This technique uses three concepts from the psychophysics of hearing to derive an estimate of the auditory spectrum: (1) the critical-band spectral resolution, (2) the equal-loudness curve, and (3) the intensity-loudness power law. The auditory spectrum is then approximated by an autoregressive all-pole model. A 5th-order all-pole model is effective in suppressing speaker-dependent details of the auditory spectrum. In comparison with conventional linear predictive (LP) analysis, PLP analysis is more consistent with human hearing. The effective second formant F2' and the 3.5-Bark spectral-peak integration theories of vowel perception are well accounted for. PLP analysis is computationally efficient and yields a low-dimensional representation of speech. These properties are found to be useful in speaker-independent automatic-speech recognition.

16.
If two vowels with different fundamental frequencies (f0's) are presented simultaneously and monaurally, listeners often hear two talkers producing different vowels on different pitches. This paper describes the evaluation of four computational models of the auditory and perceptual processes which may underlie this ability. Each model involves four stages: (i) frequency analysis using an "auditory" filter bank, (ii) determination of the pitches present in the stimulus, (iii) segregation of the competing speech sources by grouping energy associated with each pitch to create two derived spectral patterns, and (iv) classification of the derived spectral patterns to predict the probabilities of listeners' vowel-identification responses. The "place" models carry out the operations of pitch determination and spectral segregation by analyzing the distribution of rms levels across the channels of the filter bank. The "place-time" models carry out these operations by analyzing the periodicities in the waveforms in each channel. In their "linear" versions, the place and place-time models operate directly on the waveforms emerging from the filters. In their "nonlinear" versions, analogous operations are applied to the output of an additional stage which applies a compressive nonlinearity to the filtered waveforms. Compared to the other three models, the nonlinear place-time model provides the most accurate estimates of the f0's of pairs of concurrent synthetic vowels and comes closest to predicting the identification responses of listeners to such stimuli. Although the model has several limitations, the results are compatible with the idea that a place-time analysis is used to segregate competing sound sources.

17.
王辉  张玲华 《声学学报》2012,37(5):534-538
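The adaptive beamforming algorithm is one of the core algorithms of digital hearing aids. To address the speech leakage that is unavoidable in adaptive beamforming, this paper first studies the traditional GSC-structured adaptive beamforming algorithm theoretically and then proposes a Chinese speech processing technique that compensates for the leaked speech. The technique uses the fundamental-frequency information characteristic of Chinese speech to adjust the envelope of the speech magnitude spectrum, increasing the similarity between the spectral envelope and the shape of the fundamental-frequency contour in order to improve intelligibility. Because leaked speech suffers considerable loss in high-frequency unvoiced consonant segments, unvoiced consonants are amplified in the frequency domain, raising their energy without altering the formant structure, while the noise energy leaked by the GSC algorithm during speech pauses is reduced, making the speech easier to discriminate. Simulation results show that this Chinese speech processing compensates for the speech leakage caused by adaptive beamforming and improves speech intelligibility.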

18.
戴明扬  徐柏龄 《应用声学》2001,20(6):6-12,44
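This paper proposes a robust method for extracting speaker feature parameters based on a model of human hearing. In this method, an auditory correlogram characterizing auditory nerve activity is first obtained from a Gammatone auditory filter bank and the Meddis inner hair cell firing model. Drawing on the phase-locking and two-tone suppression properties of auditory nerve firing, the frequency component with the largest amplitude in each band of the auditory correlogram is taken as the feature parameter characterizing that band, so that the feature parameters of all bands form a feature vector characterizing the current speech segment. A DCT is then applied to further remove the correlation between the feature parameters and reduce the dimension of the feature vector. Validity tests show that this feature vector essentially reflects the spectral envelope of the input speech; noise-robustness experiments show that, under Gaussian white noise and car noise, this feature parameter has lower relative distortion than LPCC and MFCC; and text-independent speaker identification based on vector quantization shows that, for three types of noise, this feature parameter achieves good recognition results at low SNR.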

19.
马英  于向飞 《应用声学》2010,29(5):387-390
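In speech signal analysis, many methods already exist for extracting the pitch period. The existing short-time average magnitude difference function (AMDF) method requires only addition, subtraction, and absolute-value operations, so its computational cost is far lower than that of the short-time autocorrelation function. In addition, the valleys of the AMDF used to extract the pitch period are sharper than the peaks of the autocorrelation function, giving relatively fewer misjudgments and greater robustness. However, the traditional AMDF algorithm places fairly strict requirements on window length: a short window leads to large errors. This paper proposes an improved algorithm that addresses this shortcoming and gives fairly accurate results regardless of window length, greatly extending the range of application of the AMDF algorithm. Combining it with homomorphic processing gives still better results.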
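
For reference, the classical short-time AMDF that this abstract takes as its starting point can be sketched as below. This is the traditional form only; the paper's improved algorithm is not described in the abstract and is not reproduced here. The function names, the 30 ms frame, and the lag search range are assumptions for the example.

```python
import math


def amdf(frame, lag):
    """Sum of |x[i] - x[i + lag]| over the part of the frame where both exist."""
    return sum(abs(frame[i] - frame[i + lag]) for i in range(len(frame) - lag))


def amdf_pitch_period(frame, lag_min, lag_max):
    """Return the lag (in samples) at the deepest AMDF valley in the search range."""
    return min(range(lag_min, lag_max + 1), key=lambda lag: amdf(frame, lag))


# 30 ms frame of a 200 Hz tone at 8 kHz: the true period is 40 samples.
rate = 8000
frame = [math.sin(2.0 * math.pi * 200.0 * i / rate) for i in range(240)]
period = amdf_pitch_period(frame, lag_min=20, lag_max=70)
print(period, rate / period)
```

Because the sum runs over fewer terms as the lag grows, this form tends to drift downward at large lags when the frame is short, which is one way to see the window-length sensitivity the abstract mentions. The search range here is deliberately kept below two periods to sidestep that effect.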

20.
A method for the analysis of vocal tract parameters is developed, aimed to perform quantitative analysis of rigidity from speech signals of Parkinsonian patients. The cross-sectional area function of the vocal tract is calculated using pitch synchronous autoregressive moving average (ARMA) analysis. The changes in Parkinsonian subjects of the cross-sectional area during the utterance of sustained sounds are attributed to both Parkinsonian tremor and rigidity. In order to isolate the effects of the rigidity on the vocal tract from those of the tremor, an adaptive tremor cancellation (ATC) algorithm is developed, based on the correlation of tremor signals extracted from different locations of the speech production system.
