Similar articles
 Found 20 similar articles (search time: 31 ms)
1.
Monaural speech segregation has proven to be extremely challenging. While efforts in computational auditory scene analysis have led to considerable progress in voiced speech segregation, little attention has been given to unvoiced speech, which lacks harmonic structure and has weaker energy, and is hence more susceptible to interference. This study proposes a new approach to the problem of segregating unvoiced speech from nonspeech interference. The study first addresses the question of how much speech is unvoiced. The segregation process occurs in two stages: segmentation and grouping. In segmentation, the proposed model decomposes an input mixture into contiguous time-frequency segments by a multiscale analysis of event onsets and offsets. Grouping of unvoiced segments is based on Bayesian classification of acoustic-phonetic features. The proposed model for unvoiced speech segregation joins an existing model for voiced speech segregation to produce an overall system that can deal with both voiced and unvoiced speech. Systematic evaluation shows that the proposed system extracts a majority of unvoiced speech without including much interference, and it performs substantially better than spectral subtraction.

2.
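Li Guofeng (李国锋), Liu Ying (刘莹). Applied Acoustics (《应用声学》), 1996, 15(5): 41-44
This paper presents a method for reconstructing breathy (whispered) phonation using the complex cepstrum. The speech characteristics of breathy phonation are analyzed first; fundamental-frequency features are then added to the complex cepstrum sequence to restore normal-sounding speech. The method was applied to the vowel [a] and to real speech segments, with good results in both cases.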

3.
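Reconstruction of breathy phonation using the cepstrum method (total citations: 1; self-citations: 0; citations by others: 1)
Li Guofeng (李国锋), Liu Ying (刘莹). Applied Acoustics (《应用声学》), 1996, 15(5): 41-44
This paper presents a method for reconstructing breathy (whispered) phonation using the complex cepstrum. The speech characteristics of breathy phonation are analyzed first; fundamental-frequency features are then added to the complex cepstrum sequence to restore normal-sounding speech. The method was applied to the vowel [a] and to real speech segments, with good results in both cases.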

4.
Contours of equal loudness and threshold of hearing under binaural free-field conditions for the frequency range 20–15 000 Hz were standardized internationally in 1961. This paper describes an extension of the data in the low-frequency range down to 3.15 Hz, at levels from threshold to 70 phon. The latter corresponds to nearly 140 dB sound pressure level at the lowest frequency. Direct loudness comparisons were made between tones at intervals of an octave, and the resulting contours were checked by numerical loudness estimation.

5.
Our study is focused on a phenomenon often encountered in flow-carrying pipes: flow instabilities caused by geometric features may generate acoustic signals and, thereafter, interact with these signals in such a way that powerful pure tones are produced. A modern example is found in the so-called ‘singing risers’, the gas pipes connecting gas production platforms to the transport network. However, the flow-generated resonance in a fully corrugated circular pipe may be silenced by the addition of relatively low-frequency flow oscillations induced by an acoustic generator. The experiments reported here, aimed at investigating in more detail the coupling between the flow in the pipe, the acoustically generated flow oscillations and the resulting emitted noise, are performed in a specifically designed facility. A rectangular transparent channel with glass walls enables us to use optical techniques to describe in detail the flow field in the vicinity of the corrugations, in addition to more standard hot-wire anemometry and acoustic pressure measurements with microphones, with and without the acoustically generated low-frequency oscillations.

6.
A method is described for the automatic recognition of transient animal sounds. Automatic recognition can be used in wild animal research, including studies of behavior, population, and impact of anthropogenic noise. The method described here, spectrogram correlation, is well-suited to recognition of animal sounds consisting of tones and frequency sweeps. For a sound type of interest, a two-dimensional synthetic kernel is constructed and cross-correlated with a spectrogram of a recording, producing a recognition function: the likelihood at each point in time that the sound type was present. A threshold is applied to this function to obtain discrete detection events, instants at which the sound type of interest was likely to be present. An extension of this method handles the temporal variation commonly present in animal sounds. Spectrogram correlation was compared to three other methods that have been used for automatic call recognition: matched filters, neural networks, and hidden Markov models. The test data set consisted of bowhead whale (Balaena mysticetus) end notes from songs recorded in Alaska in 1986 and 1988. The method had a success rate of about 97.5% on this problem, and the comparison indicated that it could be especially useful for detecting a call type when relatively few (5-200) instances of the call type are known.
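The kernel-correlate-threshold pipeline described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `correlate_kernel` and `detect`, the toy 4 x 10 spectrogram, the diagonal "upsweep" kernel with negative flanks, and the threshold value are all assumptions made here for the example.

```python
def correlate_kernel(spectrogram, kernel):
    """Slide the kernel along the time axis and return one score per offset.

    spectrogram: list of frequency rows, each a list of time-frame magnitudes.
    kernel: same number of rows as the spectrogram, fewer columns.
    """
    n_freq = len(kernel)
    k_len = len(kernel[0])
    n_time = len(spectrogram[0])
    scores = []
    for t in range(n_time - k_len + 1):
        total = 0.0
        for f in range(n_freq):
            for j in range(k_len):
                total += kernel[f][j] * spectrogram[f][t + j]
        scores.append(total)
    return scores


def detect(scores, threshold):
    """Return offsets where the score is a local peak at or above threshold."""
    events = []
    for t, s in enumerate(scores):
        left = scores[t - 1] if t > 0 else float("-inf")
        right = scores[t + 1] if t + 1 < len(scores) else float("-inf")
        if s >= threshold and s > left and s >= right:
            events.append(t)
    return events


# Toy spectrogram (hypothetical data): a one-bin-per-frame upsweep at frame 3.
n_freq, n_time = 4, 10
spec = [[0.0] * n_time for _ in range(n_freq)]
for f in range(n_freq):
    spec[f][3 + f] = 1.0

# Synthetic kernel: +1 along the expected sweep, negative elsewhere so that
# each row sums to zero and broadband energy does not raise the score.
kernel = [[1.0 if j == f else -1.0 / 3.0 for j in range(4)] for f in range(4)]

scores = correlate_kernel(spec, kernel)
events = detect(scores, threshold=2.0)
```

Here the recognition function peaks only where the sweep aligns with the kernel; in practice the threshold would be tuned on labelled recordings.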

7.
Previous non-invasive brain research has reported auditory cortical sensitivity to periodicity as reflected by larger and more anterior responses to periodic than to aperiodic vowels. The current study investigated whether there is a lower fundamental frequency (F0) limit for this effect. Auditory evoked fields (AEFs) elicited by natural-sounding 400 ms periodic and aperiodic vowel stimuli were measured with magnetoencephalography. Vowel F0 ranged from normal male speech (113 Hz) to exceptionally low values (9 Hz). Both the auditory N1m and sustained fields were larger in amplitude for periodic than for aperiodic vowels. The AEF sources for periodic vowels were also anterior to those for the aperiodic vowels. Importantly, the AEF amplitudes and locations were unaffected by the F0 decrement of the periodic vowels. However, the N1m latency increased monotonically as F0 was decreased down to 19 Hz, below which this trend broke down. Also, a cascade of transient N1m-like responses was observed in the lowest F0 condition. Thus, the auditory system seems capable of extracting the periodicity even from very low F0 vowels. The behavior of the N1m latency and the emergence of a response cascade at very low F0 values may reflect the lower limit of pitch perception.

8.
9.
The dynamics of airflow during speech production may often result in some small or large degree of turbulence. In this paper, the geometry of speech turbulence as reflected in the fragmentation of the time signal is quantified by using fractal models. An efficient algorithm for estimating the short-time fractal dimension of speech signals based on multiscale morphological filtering is described, and its potential for speech segmentation and phonetic classification is discussed. Also reported are experimental results on using the short-time fractal dimension of speech signals at multiple scales as additional features in an automatic speech-recognition system using hidden Markov models, which provide a modest improvement in speech-recognition performance.
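One common way to estimate a fractal dimension with multiscale morphological filtering is the covering method: dilate and erode the frame with flat structuring elements of growing half-width r, measure the area between the two envelopes, and fit the slope of log(area / r^2) against log(1 / r). The sketch below assumes that formulation; the abstract does not give the exact algorithm, and the names `dilate`, `erode`, `fractal_dimension` and the choice of `max_scale=8` are illustrative assumptions.

```python
import math
import random


def dilate(x, r):
    """Flat dilation: running maximum over a window of half-width r."""
    n = len(x)
    return [max(x[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def erode(x, r):
    """Flat erosion: running minimum over a window of half-width r."""
    n = len(x)
    return [min(x[max(0, i - r):min(n, i + r + 1)]) for i in range(n)]


def fractal_dimension(x, max_scale=8):
    """Estimate the fractal dimension of one frame by morphological covering."""
    xs, ys = [], []
    for r in range(1, max_scale + 1):
        area = sum(d - e for d, e in zip(dilate(x, r), erode(x, r)))
        if area > 0:
            xs.append(math.log(1.0 / r))
            ys.append(math.log(area / (r * r)))
    if len(xs) < 2:
        raise ValueError("signal too flat to estimate a dimension")
    # Least-squares slope of log(area / r^2) versus log(1 / r).
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    den = sum((a - mx) ** 2 for a in xs)
    return num / den


# Hypothetical test frames: a smooth ramp and uniform white noise.
ramp = [float(i) for i in range(1000)]
rng = random.Random(0)
noise = [rng.random() for _ in range(1000)]
```

A smooth frame such as `ramp` should give a value near 1, while a highly fragmented frame such as `noise` should give a clearly larger value; a short-time contour would be obtained by calling `fractal_dimension` on successive frames.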

10.
A composite auditory model for processing speech sounds (total citations: 2; self-citations: 0; citations by others: 2)
A composite inner-ear model, containing the middle ear, basilar membrane (BM), hair cells, and hair-cell/nerve-fiber synapses, is presented. The model incorporates either a linear-BM stage or a nonlinear one. The model with the nonlinear BM generally shows a high degree of success in reproducing the qualitative aspects of experimentally recorded cat auditory-nerve-fiber responses to speech. In modeling fiber population responses to speech and speech in noise, it was found that the BM nonlinearity allows bands of fibers in the model to synchronize strongly to a common spectral peak in the stimulus. A cross-channel correlation algorithm has been devised to further process the model's population outputs. With output from the nonlinear-BM model, the cross-channel correlation values are appreciably reduced only at those channels whose CFs coincide with the formant frequencies. This observation also holds, to a large extent, for noisy speech.
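The abstract does not specify the cross-channel correlation algorithm. A plausible minimal form, assumed here purely for illustration, is the normalized (Pearson) correlation between the outputs of neighbouring channels: channels locked to the same stimulus component correlate strongly, while neighbours locked to different components do not. The names `pearson` and `cross_channel_correlation` and the three sinusoidal "channel outputs" are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    ma = sum(a) / n
    mb = sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def cross_channel_correlation(channels):
    """Correlate each channel's output with that of its next neighbour."""
    return [pearson(channels[i], channels[i + 1])
            for i in range(len(channels) - 1)]


# Hypothetical channel outputs: channels 0 and 1 follow the same component
# (5 cycles per frame); channel 2 follows a different one (9 cycles).
n = 400
ch0 = [math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
ch1 = [0.5 * math.sin(2 * math.pi * 5 * t / n) for t in range(n)]
ch2 = [math.sin(2 * math.pi * 9 * t / n) for t in range(n)]

corr = cross_channel_correlation([ch0, ch1, ch2])
```

In this toy case the first pair is fully correlated and the second pair is essentially uncorrelated, so a drop in the correlation profile marks the boundary between channel groups dominated by different components.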

11.
Pitch detection is an important part of speech recognition and speech processing. In this paper, a pitch detection algorithm based on the second-generation wavelet transform was developed. The proposed algorithm reduces the computational load relative to algorithms based on the classical wavelet transform. The proposed pitch detection algorithm was tested on both real and synthetic speech signals. Experiments were carried out under noisy conditions to evaluate the accuracy and robustness of the proposed algorithm. Results showed that the proposed algorithm was robust to noise and provided accurate estimates of the pitch period for both low-pitched and high-pitched speakers. Moreover, different wavelet filters obtained using the second-generation wavelet transform were considered to examine their effect on the proposed algorithm. The Haar filter showed good performance compared to the other wavelet filters.
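Second-generation wavelets are usually built with the lifting scheme (split, predict, update), which works in place and avoids explicit convolution; that is one reason such transforms can be cheaper than classical filter-bank implementations. The sketch below shows a single Haar lifting step and its inverse. It illustrates the transform only: the abstract does not describe how the pitch period is derived from the coefficients, so that stage is not sketched, and the function names `haar_lift` and `haar_unlift` are assumptions.

```python
def haar_lift(signal):
    """One Haar lifting step: returns (approximation, detail) coefficients."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    even = signal[0::2]  # split into even-indexed samples
    odd = signal[1::2]   # and odd-indexed samples
    # Predict: each odd sample is predicted by its even neighbour.
    detail = [o - e for o, e in zip(odd, even)]
    # Update: adjust the even samples so they hold the pairwise mean.
    approx = [e + d / 2.0 for e, d in zip(even, detail)]
    return approx, detail


def haar_unlift(approx, detail):
    """Invert haar_lift by undoing the update and predict steps in reverse."""
    even = [a - d / 2.0 for a, d in zip(approx, detail)]
    odd = [d + e for d, e in zip(detail, even)]
    signal = []
    for e, o in zip(even, odd):
        signal.extend([e, o])
    return signal


# Small worked example on a hypothetical four-sample frame.
approx, detail = haar_lift([2.0, 4.0, 6.0, 8.0])
restored = haar_unlift(approx, detail)
```

Applying `haar_lift` repeatedly to the approximation gives a multilevel decomposition; the inverse reconstructs the input exactly because each lifting step is undone by the same arithmetic with the sign flipped.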

12.
Adaptation to the acoustic world following cochlear implantation does not typically include formal training or extensive audiological rehabilitation. Can cochlear implant (CI) users benefit from formal training, and if so, what type of training is best? This study used a pre-/posttest design to evaluate the efficacy of training and generalization of perceptual learning in normal-hearing subjects listening to CI simulations (eight-channel sinewave vocoder). Five groups of subjects were trained on words (simple/complex), sentences (meaningful/anomalous), or environmental sounds, and then were tested using an open-set identification task. Subjects were trained on only one set of materials but were tested on all stimuli. All groups showed significant improvement due to training, which successfully generalized to some, but not all, stimulus materials. For easier tasks, all types of training generalized equally well. For more difficult tasks, training specificity was observed. Training on speech did not generalize to the recognition of environmental sounds; however, explicit training on environmental sounds successfully generalized to speech. These data demonstrate that the perceptual learning of degraded speech is highly context dependent and that the type of training and the specific stimulus materials that a subject experiences during perceptual learning have a substantial impact on generalization to new materials.

13.
14.
There is information in speech sounds about the length of the vocal tract; specifically, as a child grows, the resonators in the vocal tract grow and the formant frequencies of the vowels decrease. It has been hypothesized that the auditory system applies a scale transform to all sounds to segregate size information from resonator shape information, and thereby enhance both size perception and speech recognition [Irino and Patterson, Speech Commun. 36, 181-203 (2002)]. This paper describes size discrimination experiments and vowel recognition experiments designed to provide evidence for an auditory scaling mechanism. Vowels were scaled to represent people with vocal tracts much longer and shorter than normal, and with pitches much higher and lower than normal. The results of the discrimination experiments show that listeners can make fine judgments about the relative size of speakers, and they can do so for vowels scaled well beyond the normal range. Similarly, the recognition experiments show good performance for vowels in the normal range, and for vowels scaled well beyond the normal range of experience. Together, the experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds.

15.
This study investigated whether speech-like maskers without linguistic content produce informational masking of speech. The target stimuli were nonsense Chinese Mandarin sentences. In experiment I, the masker contained harmonics the fundamental frequency (F0) of which was sinusoidally modulated and the mean F0 of which was varied. The magnitude of informational masking was evaluated by measuring the change in intelligibility (releasing effect) produced by inducing a perceived spatial separation of the target speech and masker via the precedence effect. The releasing effect was small and was only clear when the target and masker had the same mean F0, suggesting that informational masking was small. Performance with the harmonic maskers was better than with a steady speech-shaped noise (SSN) masker. In experiments II and III, the maskers were speech-like synthesized signals, alternating between segments with harmonic structure and segments composed of SSN. Performance was much worse than for experiment I, and worse than when an SSN masker was used, suggesting that substantial informational masking occurred. The similarity of the F0 contours of the target and masker had little effect. The informational masking effect was not influenced by whether or not the noise-like segments of the masker were synchronous with the unvoiced segments of the target speech.

16.
The speech understanding of persons with "flat" hearing loss (HI) was compared to that of a normal-hearing (NH) control group to examine how hearing loss affects the contribution of speech information in various frequency regions. Speech understanding in noise was assessed at multiple low- and high-pass filter cutoff frequencies. Noise levels were chosen to ensure that the noise, rather than quiet thresholds, determined audibility. The performance of HI subjects was compared to that of an NH group listening at the same signal-to-noise ratio and a comparable presentation level. Although absolute speech scores for the HI group were reduced, performance improvements as the speech and noise bandwidth increased were comparable between groups. These data suggest that the presence of hearing loss results in a uniform, rather than frequency-specific, deficit in the contribution of speech information. Measures of auditory thresholds in noise and speech intelligibility index (SII) calculations were also performed. These data suggest that differences in performance between the HI and NH groups are due primarily to audibility differences between groups. Measures of auditory thresholds in noise showed the "effective masking spectrum" of the noise was greater for the HI than the NH subjects.

17.
18.

Background  

The speech signal contains information both about phonological features such as place of articulation and about non-phonological features such as speaker identity. These are different aspects of the 'what'-processing stream (speaker vs. speech content), and here we show that they can be further segregated as they may occur in parallel but within different neural substrates. Subjects listened to two different vowels, each spoken by two different speakers. During one block, they were asked to identify a given vowel irrespective of the speaker (phonological categorization), while during the other block the speaker had to be identified irrespective of the vowel (speaker categorization). Auditory evoked fields were recorded using 148-channel magnetoencephalography (MEG), and magnetic source imaging was obtained for 17 subjects.

19.
The intensity of a noise-induced startle response can be reduced by the presentation of an otherwise neutral stimulus immediately before the noise ("prepulse inhibition" or PPI). This effect has been used to study the detection of gaps and other stimuli, but has been applied infrequently to complex stimuli or the ability to discriminate among multiple stimuli. To address both issues and explore the potential of PPI, rats were presented with a series of five tasks, most contrasting a pair of speech sounds. One of these (the "standard" stimulus) occurred frequently but rarely preceded the startle stimulus. The second occurred infrequently (as an "oddball") and always preceded a noise. In each such task, startle responses were inhibited more by the oddball than by the standard stimulus, usually within the first test. This suggests that PPI can be adapted to studies of the discrimination of speech and other complex sounds, and that this method can provide useful information on subjects' ability to discriminate with greater ease and speed than other methods.

20.
The question of whether visual information can affect ongoing speech production arises from numerous studies demonstrating an interaction between auditory and visual information during speech perception. In a preliminary study, the effect of delayed visual feedback on speech production was examined. Two of the 13 subjects demonstrated speech errors that were directly related to the delayed visual signal. However, in the main experiment, providing immediate visual feedback of the articulators did not diminish the effects of delayed auditory feedback for 11 speakers.
