Similar Documents
20 similar documents found
1.
Automatic detection and classification of short and nonstationary events in noisy signals is widely considered to be a difficult task for traditional frequency domain and even time–frequency domain approaches. A novel method for audio signal classification is introduced. It is based on statistical properties of the temporal fine structure of audio events. Artificially generated random signals and unvoiced stop consonants of speech are used to evaluate the method. The results show improved recognition accuracy in comparison to traditional approaches.

2.
The speech production skills of 12 dysphasic children and 12 normal children were compared. The dysphasic children had significantly greater difficulty than the normal children in producing stop consonants. In addition, seven of the dysphasic children, who had difficulty perceiving initial stop consonants, had greater difficulty producing stop consonants than the remaining five dysphasic children, who showed no such perceptual difficulty. A detailed phonetic analysis indicated that the dysphasic children seldom omitted stops or substituted nonstop for stop consonants; instead, their errors were predominantly errors of voicing or place of articulation. Acoustic analyses suggested that the voicing errors were related to a lack of precise control over the timing of speech events, specifically voice onset time for initial stops and the duration of the vowel preceding final stops. However, the number of voicing errors on final stops was greater than would be expected from the lack of vowel-duration differentiation alone; these errors also appeared to be related to a tendency of the dysphasic children to produce final stops with exaggerated aspiration. The possible relationship between poor timing control in these children's speech production and auditory temporal processing deficits in their speech perception is discussed.

3.
Knowledge-based speech recognition systems extract acoustic cues from the signal to identify speech characteristics. In channel-degraded telephone speech, acoustic cues, especially those for stop consonant place of articulation, are expected to be degraded or absent. To investigate the use of knowledge-based methods in degraded environments, feature extrapolation based on Gaussian mixture models is examined for acoustic-phonetic features. This process is applied to a stop-place detection module that uses burst-release and vowel-onset cues for English consonant-vowel tokens. Results show that classification performance on telephone channel-degraded speech is improved, with extrapolated acoustic-phonetic features reaching or exceeding the performance obtained with estimated mel-frequency cepstral coefficients (MFCCs). Results also show that acoustic-phonetic features can be combined with MFCCs for best performance, suggesting that these features provide information complementary to MFCCs.

4.
5.
A set of phonetic studies based on analysis of the TIMIT speech database is presented. Using a database-oriented methodology, these studies report new results on speaker-dependent variation due to the talker's sex and dialect region, including effects on stop release frequency, speaking rate, vowel reduction, flapping, and the use of glottal stops. TIMIT was found to be fertile ground for gathering acoustic-phonetic knowledge relevant both to the phonetic classification and recognition goals for which TIMIT was designed and to linguists attempting to describe regularity and variability in the pronunciation of read English speech.

6.
The paper considers the inverse problem of finding the shape of a voice-source pulse from a specified segment of a speech signal, using a special mathematical model that relates these quantities. A variational method for solving this inverse problem is proposed for two new parametric classes of sources: a piecewise-linear source and an A-source. The error in the resulting approximate solutions is analyzed, and a technique for estimating this error numerically, based on the theory of a posteriori accuracy estimates for ill-posed problems, is proposed. The adequacy of the proposed source models and the behavior of the a posteriori accuracy estimates for such sources were studied by computer using various types of voice signals. Numerical experiments with speech signals showed that these a posteriori estimates, which represent upper bounds on the possible error in solving the inverse problem, behave satisfactorily. The estimated most probable error in determining the source-pulse shapes for the investigated speech material averages about 7%. It is noted that the a posteriori accuracy estimates can serve as a quality criterion for determining the voice-source pulse shape in the speaker-identification problem.

7.
A method of identifying low-energy nuclei from the readings of an ionization mass spectrometer is considered. The principles of identifying particles recorded by a multilayer detector are discussed, together with the fundamentals of a probabilistic approach to this problem. The method permits correct analysis of the experimental data obtained. The determination of the charge and isotopic composition of radiation consisting of a mixture of different nuclei is considered. Results of calculations by this method, which allow the optimal detector parameters to be determined for a specific physical problem, are outlined. This approach is also applicable to the analysis of data from other types of measuring apparatus. Translated from Izvestiya Vysshikh Uchebnykh Zavedenii, Fizika, No. 7, pp. 84–88, July, 1991.

8.
A new method of coding the speech envelope in continuous interleaved sampling (CIS) processors for cochlear implants is proposed. This enhanced envelope incorporates the rapid adaptation seen in auditory-nerve responses to sound stimuli. Two strategies, one using the standard envelope (CIS) and one using the enhanced envelope (EECIS), were tested perceptually with six postlingually deafened users of the LAURA cochlear implant. The tests included identification of stop consonants in three different vowel contexts and of monosyllabic consonant-vowel-consonant (CVC) words. Correct identification scores improved significantly for stop consonants in intervocalic /a/ context (p = 0.026): average scores rose from 46% correct with CIS to 55% with EECIS. This improvement was mainly due to better transmission of place of articulation. The differences in identification scores for stop consonants in /i/ and /u/ contexts were not significant. Identification scores for the medial vowels of the CVC words were significantly higher with the EECIS strategy: average scores increased from 39% to 46% correct (p = 0.018). No significant differences were observed for the initial and final consonants of the CVC words. These results demonstrate that including rapid adaptation in cochlear-implant speech processing can improve speech intelligibility.
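The abstract does not describe how either envelope is computed. As a rough illustration of a conventional CIS channel envelope, the following minimal Python sketch assumes full-wave rectification followed by a first-order low-pass filter; the function name `cis_envelope`, the 200 Hz default cutoff, and the filter type are illustrative assumptions rather than LAURA processor parameters, and the rapid-adaptation stage that distinguishes EECIS is deliberately not modelled.

```python
import math


def cis_envelope(band_signal, fs, cutoff_hz=200.0):
    """Envelope of one band-pass channel: full-wave rectify, then low-pass.

    Illustrative sketch only: uses a first-order (one-pole) IIR low-pass
    with cutoff `cutoff_hz` (Hz); `fs` is the sample rate in Hz.
    """
    # One-pole smoothing coefficient corresponding to the chosen cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    env = []
    state = 0.0
    for x in band_signal:
        # y[n] = y[n-1] + alpha * (|x[n]| - y[n-1])
        state += alpha * (abs(x) - state)
        env.append(state)
    return env
```

Feeding in one band-pass-filtered channel yields a slowly varying envelope that would then modulate that electrode's pulse train; an EECIS-style variant would add an adaptation stage after this point, which is not sketched here.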

9.
Infrared Physics, 1987, 27(4): 267-273
This paper discusses the application of a signal processing technique known as Adaptive Noise Cancellation to reducing noise at the output of a pyroelectric detector. The detection system is considered in relation to self-modulating sources, so that the input signals can be treated as pseudo-periodic. Two forms of adaptive processing, namely the Adaptive Line Enhancer and spike sequence input, are compared and contrasted; both methods are shown to improve the signal-to-noise ratio and thus increase detector performance.
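The Adaptive Line Enhancer mentioned above is usually built as an adaptive linear predictor: each sample is predicted from a delayed copy of the input, so periodic (or pseudo-periodic) components, which remain correlated across the delay, are passed, while broadband noise, which does not, is suppressed. A minimal Python sketch, assuming the standard LMS weight update; `taps`, `delay`, and `mu` are illustrative parameters, not values from the paper, and the spike-sequence variant is not shown.

```python
def adaptive_line_enhancer(x, taps=16, delay=1, mu=0.01):
    """LMS adaptive line enhancer (illustrative sketch).

    Each output y[n] is a linear prediction of x[n] from the delayed samples
    x[n-delay], ..., x[n-delay-taps+1]. Periodic components stay correlated
    across the delay and are predicted; broadband noise is not.
    Returns (enhanced output y, prediction error e).
    """
    w = [0.0] * taps
    y = [0.0] * len(x)
    e = [0.0] * len(x)
    for n in range(len(x)):
        # Delayed reference vector; samples before the start are taken as zero.
        u = [x[n - delay - k] if n - delay - k >= 0 else 0.0 for k in range(taps)]
        y[n] = sum(wk * uk for wk, uk in zip(w, u))
        e[n] = x[n] - y[n]
        # LMS update: w <- w + 2 * mu * e[n] * u
        w = [wk + 2 * mu * e[n] * uk for wk, uk in zip(w, u)]
    return y, e
```

The step size `mu` trades convergence speed against steady-state misadjustment; as a common rule of thumb it should stay well below 1 / (`taps` × input power) for the update to remain stable.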

10.
To investigate possible auditory factors in the perception of stops and glides (e.g., /b/ vs /w/), two-category labeling performance was compared on several series of /ba/-/wa/ stimuli and on corresponding nonspeech stimulus series that modeled the first-formant trajectories and amplitude rise times of the speech items. In most respects, performance on the speech and nonspeech stimuli was closely parallel. Transition duration proved to be an effective cue for both the stop/glide distinction and the nonspeech distinction between abrupt and gradual onsets, and the category boundaries along the transition-duration dimension did not differ significantly in the two cases. When the stop/glide distinction was signaled by variation in transition duration, there was a reliable stimulus-length effect: a longer vowel shifted the category boundary toward greater transition durations. A similar effect was observed for the corresponding nonspeech stimuli. Variation in rise time had only a small effect in signaling both the stop/glide distinction and the nonspeech distinction between abrupt and gradual onsets. There was, however, one discrepancy between the speech and nonspeech performance: when the stop/glide distinction was cued by rise-time variation, there was a stimulus-length effect, but no such effect occurred for the corresponding nonspeech stimuli. On balance, the results suggest that there are significant auditory commonalities between the perception of stops and glides and the perception of acoustically analogous nonspeech stimuli.

11.
A filterbank-based algorithm for time-varying spectral analysis is proposed. The algorithm, an enhanced realization of the conventional spectrogram, consists of hundreds or thousands of highly overlapping wideband filter/detector stages followed by a peak detector that probes the filter/detector outputs at very short time intervals. Analysis of synthetic modulated signals illustrates how the proposed method demodulates them. The resulting spectrogram-like display, referred to as a "fine structure spectrogram," shows the fine structure of the modulations in substantially more detail than conventional spectrograms. Errors are evaluated as a function of various parameters of single- and two-component synthetic modulated signals and of the analysis system. In speech, the fine structure spectrogram can detect small frequency and amplitude modulations in the formants. It also appears to identify additional significant time-frequency components in speech that other methods do not detect, making it potentially useful in speech processing applications.

12.
We have performed a search for scalar top quark (stop) pair production in the inclusive electron-muon-missing transverse energy final state, using a sample of $p\bar{p}$ events corresponding to 108.3 pb$^{-1}$ of data collected with the D0 detector at Fermilab. The search is done in the framework of the minimal supersymmetric standard model, assuming that the sneutrino is the lightest supersymmetric particle. For the dominant decays of the lightest stop, $\tilde{t} \to b\tilde{\chi}_1^+$ and $\tilde{t} \to b\ell\tilde{\nu}$, no evidence for signal is found. We derive cross-section limits as a function of the stop ($\tilde{t}$), chargino ($\tilde{\chi}_1^+$), and sneutrino ($\tilde{\nu}$) masses.

13.
This article reports on investigations of the relative roles of simultaneous and nonsimultaneous masking in detection thresholds measured with natural speech utterances. Thresholds were obtained for 15-ms probe tones placed in the stop or flap closures of /ada/ and /idi/. Threshold elevations due to simultaneous and nonsimultaneous masking could be explained by the dynamics of the neighboring speech spectra. Nonsimultaneous effects were related to spectra extending at least 30 ms on either side of the probe tone. Although simultaneous masking is usually stronger than nonsimultaneous masking, the relative amplitude of adjacent speech segments in natural speech is high enough near formant regions to cause noticeable nonsimultaneous masking effects.

14.
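An excessive concentration of lubricating-oil mist in the crankcase of a large diesel engine can lead to explosions. To monitor oil-mist concentration accurately and in real time, an oil-mist detector based on single-optical-path light scattering was developed. The design principles of the detector's opto-mechatronic system, its temperature-compensation measures, and its signal-processing techniques are described. The detector has a response time of ≤1 s and a fiducial error of ≤10%, and provides audible and visual alarms as well as engine slow-down and shutdown functions. Experimental results show that the detector is simple in structure and reliable in performance.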
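The ≤10% figure above is a fiducial (reference) error, conventionally the largest absolute indication error expressed as a percentage of the instrument's full-scale range. A minimal Python sketch of that calculation, assuming this conventional definition; the function name and the example values are hypothetical, not measurements from the paper.

```python
def fiducial_error_percent(readings, references, full_scale):
    """Maximum fiducial error, in percent of full scale.

    readings:   detector indications (hypothetical units)
    references: true (reference) concentrations at the same points
    full_scale: measurement span of the instrument, in the same units
    """
    if full_scale <= 0:
        raise ValueError("full_scale must be positive")
    max_abs_error = max(abs(r - ref) for r, ref in zip(readings, references))
    return 100.0 * max_abs_error / full_scale
```

For example, readings of 0.11, 0.52, and 0.93 against reference values of 0.10, 0.50, and 1.00 on a span of 2.0 (arbitrary units) give a maximum error of 0.07, i.e. 3.5% of full scale.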

15.
Results are presented of an investigation of optical image detectors designed for the most problematic spectral range, the near-VUV. The possibility of using a dual-stage image detection system to appreciably lower the sensitivity threshold and make computer data processing feasible is considered. The integration of a UV module into a wideband image detector is studied.

16.
C. Milsténe, A. Sopczak. Pramana, 2007, 69(5): 921-926
A vertex detector concept of the Linear Collider Flavour Identification (LCFI) collaboration, which studies pixel detectors for heavy-quark flavour identification, has been implemented in simulations for c-quark tagging in scalar top studies. The production and decay of scalar top quarks (stops) is particularly interesting for vertex detector development because, for light stops, only two c-quarks and missing energy (from undetected neutralinos) are produced. Previous studies investigated the vertex detector design in scenarios with large mass differences between the stop and the neutralino, corresponding to large visible energy in the detector. In this study we investigate how tagging performance depends on the vertex detector design in a scenario with small visible energy at the International Linear Collider (ILC).

17.
Considerable advances in automatic speech recognition have been made in recent decades, thanks especially to the use of hidden Markov models. In the field of speech signal analysis, different techniques have been developed. However, the performance of speech recognizers has been observed to deteriorate when they are trained with clean signals and tested with noisy signals, and this remains an open problem in the field. Continuous multiresolution entropy has been shown to be robust to additive noise in applications to various physiological signals. In previous work we included Shannon and Tsallis entropies, and their corresponding divergences, in different speech analysis and recognition systems. In this paper we extend continuous multiresolution entropy to different divergences and propose them as new dimensions for the pre-processing stage of a speech recognition system. This approach takes into account changes in the dynamics of the speech signal at different scales. The proposed methods are tested with speech signals corrupted by babble and white noise, and their performance is compared with the classical mel-cepstral parametrization. The results suggest that these continuous multiresolution entropy measures provide valuable information to the speech recognition system and could be included as an extra component in the pre-processing stage.
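The Shannon and Tsallis entropies mentioned above can be computed on any discrete probability distribution. A minimal Python sketch, assuming the distribution comes from normalised coefficient energies (for instance, wavelet coefficients in one analysis window); the helper names are hypothetical, and the authors' windowing, wavelet choice, and divergence definitions are not given in the abstract and are not reproduced here.

```python
import math


def shannon_entropy(p):
    """H(p) = -sum p_k ln p_k (nats); zero-probability terms contribute 0."""
    return -sum(pk * math.log(pk) for pk in p if pk > 0)


def tsallis_entropy(p, q):
    """S_q(p) = (1 - sum p_k**q) / (q - 1); tends to H(p) as q -> 1."""
    if q == 1:
        return shannon_entropy(p)
    return (1.0 - sum(pk ** q for pk in p if pk > 0)) / (q - 1.0)


def energy_distribution(coeffs):
    """Normalise squared coefficients (e.g. per-scale energies) to sum to 1."""
    energies = [c * c for c in coeffs]
    total = sum(energies)
    if total == 0:
        raise ValueError("all coefficients are zero")
    return [en / total for en in energies]
```

Tsallis entropy reduces to Shannon entropy (in nats) as q approaches 1, which is why the `q == 1` case falls back to `shannon_entropy`.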

18.
The problem of calibrating an AFM cantilever from its thermal noise spectrum is considered. This method requires a large amount of preliminary work and the consideration of numerous factors. Using a synchronous detector allowed us to relax the requirements on the measuring system and increase the accuracy of the spring-constant measurement. The corresponding experimental results are presented.
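A synchronous (lock-in) detector measures only the part of a signal that is coherent with a reference: the input is multiplied by in-phase and quadrature references at the known frequency and averaged, so noise and components at other frequencies average toward zero. A minimal Python sketch of that principle, assuming a digitised signal and a known reference frequency; the function `lock_in` and its parameters are illustrative, and the abstract does not describe the authors' actual detector or how it was combined with the thermal-noise method.

```python
import math


def lock_in(signal, fs, f_ref):
    """Estimate amplitude and phase of the component of `signal` at f_ref.

    signal: samples taken at rate fs (Hz). Averaging over a whole number of
    reference periods gives the cleanest rejection of other frequencies.
    Returns (amplitude, phase_radians) for a model A*cos(2*pi*f_ref*t + phi).
    """
    n = len(signal)
    i_sum = q_sum = 0.0
    for k, s in enumerate(signal):
        t = k / fs
        i_sum += s * math.cos(2 * math.pi * f_ref * t)  # in-phase product
        q_sum += s * math.sin(2 * math.pi * f_ref * t)  # quadrature product
    # The mean of cos^2 is 1/2, hence the factor of 2 to recover A.
    i_avg = 2 * i_sum / n  # equals A*cos(phi)
    q_avg = 2 * q_sum / n  # equals -A*sin(phi)
    return math.hypot(i_avg, q_avg), math.atan2(-q_avg, i_avg)
```

Averaging over a whole number of reference periods, as in a 1 s record at 1 kHz with a 50 Hz reference, makes the rejection of other harmonic frequencies exact; longer averaging narrows the effective detection bandwidth.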

19.
Sentences spoken "clearly" are significantly more intelligible than those spoken "conversationally" for hearing-impaired listeners in a variety of backgrounds [Picheny et al., J. Speech Hear. Res. 28, 96-103 (1985); Uchanski et al., ibid. 39, 494-509 (1996); Payton et al., J. Acoust. Soc. Am. 95, 1581-1592 (1994)]. While producing clear speech, however, talkers often reduce their speaking rate significantly [Picheny et al., J. Speech Hear. Res. 29, 434-446 (1986); Uchanski et al., ibid. 39, 494-509 (1996)]. Yet speaking slowly is not solely responsible for the intelligibility benefit of clear speech (over conversational speech), since a recent study [Krause and Braida, J. Acoust. Soc. Am. 112, 2165-2172 (2002)] showed that talkers can produce clear speech at normal rates with training. This finding suggests that clear speech has inherent acoustic properties, independent of rate, that contribute to improved intelligibility. Identifying these acoustic properties could lead to improved signal processing schemes for hearing aids. To gain insight into these acoustical properties, conversational and clear speech produced at normal speaking rates were analyzed at three levels of detail (global, phonological, and phonetic). Although results suggest that talkers may have employed different strategies to achieve clear speech at normal rates, two global-level properties were identified that appear likely to be linked to the improvements in intelligibility provided by clear/normal speech: increased energy in the 1000-3000-Hz range of long-term spectra and increased modulation depth of low frequency modulations of the intensity envelope. Other phonological and phonetic differences associated with clear/normal speech include changes in (1) frequency of stop burst releases, (2) VOT of word-initial voiceless stop consonants, and (3) short-term vowel spectra.

20.
This paper examines whether correlations between speech perception and speech production exist, and, if so, whether they might provide a way of evaluating different acoustic metrics. The cues listeners use for many phonemic distinctions are not known, often because many different acoustic cues are highly correlated with one another, making it difficult to distinguish among them. Perception-production correlations may provide a new means of doing so. In the present paper, correlations were examined between acoustic measures taken on listeners' perceptual prototypes for a given speech category and on their average production of members of that category. Significant correlations were found for VOT among stop consonants, and for spectral peaks (but not centroids or skewness) for voiceless fricatives. These results suggest that correlations between speech perception and production may provide a methodology for evaluating different proposed acoustic metrics.
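Spectral centroid and skewness are usually computed as moments of the power spectrum treated as a probability distribution over frequency, with the spectral peak taken as the frequency of maximum power. A minimal Python sketch, assuming the spectrum is already available as parallel lists of frequencies and powers; the function name is hypothetical, and the abstract does not state how the authors computed their spectra (window, FFT size, frequency range).

```python
def spectral_moments(freqs, power):
    """Return (peak_frequency, centroid, skewness) of a power spectrum.

    The spectrum is normalised to sum to 1 and treated as a probability
    distribution over frequency (the usual spectral-moments approach).
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("power spectrum must have positive total energy")
    p = [pw / total for pw in power]
    peak = freqs[max(range(len(power)), key=lambda i: power[i])]
    centroid = sum(f * w for f, w in zip(freqs, p))                    # 1st moment
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs, p))  # 2nd central
    third = sum((f - centroid) ** 3 * w for f, w in zip(freqs, p))     # 3rd central
    skewness = third / variance ** 1.5 if variance > 0 else 0.0
    return peak, centroid, skewness
```

A spectrum that is symmetric about its peak has zero skewness; energy concentrated at low frequencies with a tail toward high frequencies gives positive skewness.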


Copyright © Beijing Qinyun Technology Development Co., Ltd.  Beijing ICP Registration No. 09084417