Similar literature
1.
Two methods are described for speaker normalizing vowel spectral features: one is a multivariable linear transformation of the features and the other is a polynomial warping of the frequency scale. Both normalization algorithms minimize the mean-square error between the transformed data of each speaker and vowel target values obtained from a "typical speaker." These normalization techniques were evaluated both for formants and a form of cepstral coefficients (DCTCs) as spectral parameters, for both static and dynamic features, and with and without fundamental frequency (F0) as an additional feature. The normalizations were tested with a series of automatic classification experiments for vowels. For all conditions, automatic vowel classification rates increased for speaker-normalized data compared to rates obtained for nonnormalized parameters. Typical classification rates for vowel test data for nonnormalized and normalized features respectively are as follows: static formants--69%/79%; formant trajectories--76%/84%; static DCTCs--75%/84%; DCTC trajectories--84%/91%. The linear transformation methods increased the classification rates slightly more than the polynomial frequency warping. The addition of F0 improved the automatic recognition results for nonnormalized vowel spectral features by as much as 5.8%. However, the addition of F0 to speaker-normalized spectral features resulted in much smaller increases in automatic recognition rates.
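
The abstract above does not give the form of its transformation, so the following is only a minimal sketch of the underlying idea: fitting a transformation that minimizes the mean-square error between one speaker's values and "typical speaker" targets. It is deliberately reduced to a single feature with a gain and an offset, which is an assumption and not the multivariable transformation the authors evaluated. The function names and the formant numbers are hypothetical.

```python
from statistics import fmean


def fit_affine(speaker_values, target_values):
    """Return (gain, offset) minimizing sum((gain * x + offset - t) ** 2)."""
    if len(speaker_values) != len(target_values) or len(speaker_values) < 2:
        raise ValueError("need two or more paired observations")
    mean_x = fmean(speaker_values)
    mean_t = fmean(target_values)
    # Closed-form ordinary least squares for one predictor.
    sxx = sum((x - mean_x) ** 2 for x in speaker_values)
    if sxx == 0:
        raise ValueError("speaker values must not all be identical")
    sxt = sum((x - mean_x) * (t - mean_t)
              for x, t in zip(speaker_values, target_values))
    gain = sxt / sxx
    offset = mean_t - gain * mean_x
    return gain, offset


def normalize(values, gain, offset):
    """Apply the fitted transformation to a speaker's feature values."""
    return [gain * v + offset for v in values]


# Hypothetical F1 values (Hz) for three vowels from one speaker, and
# hypothetical "typical speaker" targets for the same vowels.
speaker_f1 = [300.0, 500.0, 800.0]
target_f1 = [290.0, 470.0, 740.0]

gain, offset = fit_affine(speaker_f1, target_f1)
normalized_f1 = normalize(speaker_f1, gain, offset)
print(gain, offset, normalized_f1)
```

A multivariable version would replace the scalar gain with a matrix fitted over all features jointly; the least-squares criterion stays the same.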

2.
Five different psychophysical procedures were used to measure level-discrimination (also called intensity discrimination) thresholds for 1-kHz tones at two levels (30 and 90 dB SPL) and two durations (10 and 500 ms). The procedures were the classic transformed up-down staircase method with a two-alternative forced-choice (2AFC) paradigm (UPD), 15- and 50-trial implementations of the method of maximum likelihood (MML) with a cued yes-no paradigm, and 18-trial implementations of ZEST using both cued yes-no and 2AFC paradigms. Results obtained from nine normal listeners show that estimates of level-discrimination thresholds for the four conditions are similar across all five procedures when different points of convergence are accounted for. The variance of threshold estimates within listener and condition was smallest for UPD, largest for the MML with 15 trials, and statistically indistinguishable among the others. The sweat factors ranged from 5.5 for MML with 50 trials to about 1.4 for UPD and ZEST. Simulations show that ideal performance of procedures may be far from real-life experience and that these deviations are likely to depend on complex interactions between listener behavior and parameter choices used for implementing the procedures. Therefore, empirical verification is important for judging the effectiveness of psychophysical procedures.  相似文献   
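
As a rough illustration of the transformed up-down staircase mentioned above, the sketch below implements a 2-down/1-up rule, which converges toward the 70.7%-correct point of the psychometric function. The abstract does not say which rule the study used, so the 2-down/1-up choice, the function name, and the deterministic toy listener are all assumptions made for illustration.

```python
def two_down_one_up(respond, start_level, step, n_reversals, max_trials=10_000):
    """Run a 2-down/1-up staircase and return the levels at each reversal.

    respond(level) must return True for a correct response at that level.
    """
    level = start_level
    correct_in_row = 0
    last_direction = 0      # -1 = last move was down, +1 = up, 0 = no move yet
    reversals = []
    for _ in range(max_trials):
        if len(reversals) >= n_reversals:
            break
        if respond(level):
            correct_in_row += 1
            if correct_in_row < 2:
                continue    # a second correct response is needed to step down
            direction = -1
        else:
            direction = +1  # a single error makes the task easier
        correct_in_row = 0
        if last_direction and direction != last_direction:
            reversals.append(level)
        last_direction = direction
        level += direction * step
    return reversals


# Hypothetical listener who is always correct at or above 5.0 dB and
# always wrong below it. Real listeners respond probabilistically.
def toy_listener(level):
    return level >= 5.0


reversal_levels = two_down_one_up(toy_listener, start_level=8.0, step=1.0,
                                  n_reversals=4)
threshold_estimate = sum(reversal_levels) / len(reversal_levels)
print(reversal_levels, threshold_estimate)
```

Averaging the reversal levels is one common way to turn a track into a threshold estimate; the `max_trials` guard only prevents an endless run when the listener never reverses.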

3.
4.
The present study aimed to examine the size of the acoustic vowel space in talkers who had previously been identified as having slow and fast habitual speaking rates [Tsao, Y.-C. and Weismer, G. (1997) J. Speech Lang. Hear. Res. 40, 858-866]. Within talkers, it is fairly well known that faster speaking rates result in a compression of the vowel space relative to that measured for slower rates, so the current study was completed to determine if the same differences in the size of the vowel space occur across talkers who differ significantly in their habitual speaking rates. Results indicated that there was no difference in the average size of the vowel space for slow vs fast talkers, and no relationship across talkers between vowel duration and formant frequencies. One difference between the slow and fast talkers was in intertalker variability of the vowel spaces, which was clearly greater for the slow talkers, for both speaker sexes. Results are discussed relative to theories of speech production and vowel normalization in speech perception.  相似文献   

5.
Cross-generational and cross-dialectal variation in vowels among speakers of American English was examined in terms of vowel identification by listeners and vowel classification using pattern recognition. Listeners from Western North Carolina and Southeastern Wisconsin identified 12 vowel categories produced by 120 speakers stratified by age (old adults, young adults, and children), gender, and dialect. The vowels /?, o, ?, u/ were well identified by both groups of listeners. The majority of confusions were for the front /i, ɪ, e, ɛ, æ/, the low back /ɑ, ɔ/ and the monophthongal North Carolina /aɪ/. For selected vowels, generational differences in acoustic vowel characteristics were perceptually salient, suggesting listeners' responsiveness to sound change. Female exemplars and native-dialect variants produced higher identification rates. Linear discriminant analyses which examined dialect and generational classification accuracy showed that sampling the formant pattern at vowel midpoint only is insufficient to separate the vowels. Two sample points near onset and offset provided enough information for successful classification. The models trained on one dialect classified the vowels from the other dialect with much lower accuracy. The results strongly support the importance of dynamic information in accurate classification of cross-generational and cross-dialectal variations.

6.
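A new method for vowel detection in speech signals   Total citations: 1, self-citations: 0, citations by others: 1
Qu Dan (屈丹), Wang Bingxi (王炳锡). Acta Acustica (《声学学报》), 2003, 28(1): 17-20
A new method for detecting vowels in speech signals is presented. The method is based on spectral analysis of the acoustic speech signal, requires no training process, and is applicable to multiple languages. The algorithm was tested on English, Chinese, Japanese, and French speech from the OGI multi-language speech corpus; an improved algorithm is also given, together with the detection rates of both algorithms. The experimental results show that the method is an effective way to detect vowels.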

7.
Recently, a new method for quantitatively comparing NMR spectra of control and treated samples, in order to examine the possible occurring variations in cell metabolism and/or structure in response to numerous physical, chemical, and biological agents, was proposed. This method is based upon the utilization of the maximum superposition normalization algorithm (MaSNAl) operative in the frequency domain and based upon maximizing, by an opportune sign variable measure, the spectral region in which control and treated spectra are superimposed. Although the frequency-domain MaSNAl algorithm was very precise in normalizing spectra, it showed some limitations in relation to the signal-to-noise ratio and to the degree of diversity of the two spectra being analyzed. In particular, it can rarely be applied to spectra with a small number of visible signals not buried in the noise such as generally in vivo spectra. In this paper, a time-domain normalization algorithm is presented. Specifically, it consists in minimizing the rank of a Hankel matrix constructed with the difference of the two free induction decay signals. The algorithm, denoted MiRaNAl (minimum rank normalization algorithm), was tested by Monte Carlo simulations as well as experimentally by comparing two samples of known contents both with the new algorithms and with an older method using a standard. Finally, the algorithm was applied to real spectra of cell samples showing how it can be used to obtain qualitative and quantitative biological information.  相似文献   

8.
A new approach to fringe normalization is proposed that uses Zernike polynomial fitting to cancel background illumination in an interferogram. With this method, background illumination is suppressed, high-frequency noise is filtered out, and contrast is improved by normalization. The main idea is to fit the interferogram with Zernike polynomials in order to cancel the background illumination; the high-frequency noise is then removed with a Wiener filter. Finally, local-region contrast modulation is used to enhance fringe contrast. The method can overcome the non-uniform illumination of fringe patterns that results from marginal reflection of optical components and from a non-uniform light source.

9.
There is increasing use of high-resolution NMR spectroscopy to examine variations in cell metabolism and/or structure in response to numerous physical, chemical, and biological agents. In these types of studies, in order to obtain relative quantitative information, a comparison between signal intensities of control samples and treated or exposed ones is often conducted. The methods thus far developed for this purpose are not directly related to the overall intrinsic properties of the samples, but rather to the addition of external substances of known concentrations or to indirect measurement of internal substances. In this paper, a new method for quantitatively comparing the spectra of cell samples is presented. It depends on a normalization algorithm which takes into consideration all cell metabolites present in the sample. In particular, the algorithm is based on maximizing, by an opportune sign variable measure, the spectral region in which the two spectra are superimposed. The algorithm was tested by Monte Carlo simulations as well as experimentally by comparing two samples of known contents with the new method and with an older method using a standard. At the end, the algorithm was applied to real spectra of cell samples to show how it could be used to obtain qualitative and quantitative biological information.  相似文献   

10.
The purpose of this paper is to propose and evaluate a new model of vowel perception which assumes that vowel identity is recognized by a template-matching process involving the comparison of narrow band input spectra with a set of smoothed spectral-shape templates that are learned through ordinary exposure to speech. In the present simulation of this process, the input spectra are computed over a sufficiently long window to resolve individual harmonics of voiced speech. Prior to template creation and pattern matching, the narrow band spectra are amplitude equalized by a spectrum-level normalization process, and the information-bearing spectral peaks are enhanced by a "flooring" procedure that zeroes out spectral values below a threshold function consisting of a center-weighted running average of spectral amplitudes. Templates for each vowel category are created simply by averaging the narrow band spectra of like vowels spoken by a panel of talkers. In the present implementation, separate templates are used for men, women, and children. The pattern matching is implemented with a simple city-block distance measure given by the sum of the channel-by-channel differences between the narrow band input spectrum (level-equalized and floored) and each vowel template. Spectral movement is taken into account by computing the distance measure at several points throughout the course of the vowel. The input spectrum is assigned to the vowel template that results in the smallest difference accumulated over the sequence of spectral slices. The model was evaluated using a large database consisting of 12 vowels in /hVd/ context spoken by 45 men, 48 women, and 46 children. The narrow band model classified vowels in this database with a degree of accuracy (91.4%) approaching that of human listeners.  相似文献   
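
The pattern-matching step described above can be sketched as follows: a city-block distance (sum of absolute channel-by-channel differences) is accumulated over several spectral slices, and the vowel whose template gives the smallest total is chosen. This is a toy sketch only. The three-channel spectra, the two vowel labels, and the function names are hypothetical, and the level equalization and flooring steps are assumed to have been applied already.

```python
def city_block(spectrum, template):
    """Sum of absolute channel-by-channel differences between two spectra."""
    if len(spectrum) != len(template):
        raise ValueError("spectrum and template need the same channel count")
    return sum(abs(s - t) for s, t in zip(spectrum, template))


def classify(slices, templates):
    """Pick the vowel whose template sequence is closest to the input.

    slices:    list of spectra, one per time point in the vowel.
    templates: dict mapping vowel label -> list of template spectra,
               one per time point, in the same order as `slices`.
    """
    def accumulated_distance(vowel):
        return sum(city_block(s, t)
                   for s, t in zip(slices, templates[vowel]))

    return min(templates, key=accumulated_distance)


# Hypothetical templates: two time slices, three channels each.
templates = {
    "i": [[9.0, 1.0, 6.0], [8.0, 1.0, 7.0]],
    "a": [[2.0, 8.0, 3.0], [3.0, 7.0, 3.0]],
}
# Hypothetical input token, sampled at the same two time points.
token = [[3.0, 7.0, 2.0], [3.0, 6.0, 4.0]]

predicted = classify(token, templates)
print(predicted)
```

In the model described above, separate template sets are kept for men, women, and children; here that would correspond to calling `classify` with a different `templates` dictionary per talker group.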

11.
In psychoacoustic studies there is often a need to assess performance indices quickly and reliably. The aim of this study was to establish a quick and reliable procedure for evaluating thresholds in backward masking and frequency discrimination tasks. Based on simulations, four procedures likely to produce the best results were selected, and data were collected from 20 naive adult listeners on each. Each procedure used one of two adaptive methods (staircase or maximum-likelihood estimation, each targeting the 79% correct point on the psychometric function) and one of two response paradigms (3-interval, 2-alternative forced-choice AXB or 3-interval, 3-alternative forced-choice oddball). All procedures yielded statistically equivalent threshold estimates in both backward masking and frequency discrimination, with a trend to lower thresholds for oddball procedures in frequency discrimination. Oddball procedures were both more efficient and more reliable (test-retest) in backward masking, but all four procedures were equally efficient and reliable in frequency discrimination. Fitted psychometric functions yielded similar thresholds to averaging over reversals in staircase procedures. Learning was observed across threshold-assessment blocks and experimental sessions. In four additional groups, each of ten listeners, trained on the different procedures, no differences in performance improvement or rate of learning were observed, suggesting that learning is independent of procedure.

12.
A geometrical method for computing overlap between vowel distributions, the spectral overlap assessment metric (SOAM), is applied to an investigation of spectral (F1, F2) and temporal (duration) relations in three different types of systems: one claimed to exhibit primary quality (American English), one primary quantity (Jamaican Creole), and one about which no claims have been made (Jamaican English). Shapes, orientations, and proximities of pairs of vowel distributions involved in phonological oppositions are modeled using best-fit ellipses (in F1 x F2 space) and ellipsoids (F1 x F2 x duration). Overlap fractions computed for each pair suggest that spectral and temporal features interact differently in the three varieties and oppositions. Under a two-dimensional analysis, two of three American English oppositions show no overlap; the third shows partial overlap. All Jamaican Creole oppositions exhibit complete overlap when F1 and F2 alone are modeled, but no or partial overlap with incorporation of a factor for duration. Jamaican English three-dimensional overlap fractions resemble two-dimensional results for American English. A multidimensional analysis tool such as SOAM appears to provide a more objective basis for simultaneously investigating spectral and temporal relations within vowel systems. Normalization methods and the SOAM method are described in an extended appendix.  相似文献   

13.
A subspace time-domain algorithm for automated NMR spectral normalization   Total citations: 2, self-citations: 0, citations by others: 2
Recently, two methods have been proposed for quantitatively comparing NMR spectra of control and treated samples, in order to examine the possible occurring variations in cell metabolism and/or structure in response to numerous physical, chemical, and biological agents. These methods are the maximum superposition normalization algorithm (MaSNAl) and the minimum rank normalization algorithm (MiRaNAl). In this paper a new subspace-based time-domain normalization algorithm, denoted by SuTdNAl (subspace time-domain normalization algorithm), is presented. By the determination of the intersection of the column spaces of two Hankel matrices, the common signal poles and further on the components having proportionally varying amplitudes are detected. The method has the advantage that it is computationally less intensive than the MaSNAl and the MiRaNAl. Furthermore, no approximate estimate of the normalization factor is required. The algorithm was tested by Monte Carlo simulations on a set of simulation signals. It was shown that the SuTdNAl has a statistical performance similar to that of the MiRaNAl, which itself is an improvement over the MaSNAl. Furthermore, two samples of known contents are compared with the MiRaNAl, the SuTdNAl, and an older method using a standard. Finally, the SuTdNAl is tested on a realistic simulation example derived from an in vitro measurement on cells.  相似文献   

14.
In order to compare the local-energy method with the variation principle as an optimization technique, numerical results are given for the ground states of the hydrogen molecule ion and the helium atom. Although the optimum wave functions obtained from the two methods are very similar, expectation values of various operators are given more accurately by the variation-principle functions. Conditions are discussed under which the local-energy method could be superior to the variation principle as an optimization technique.  相似文献   

15.
A model of the vocal-tract area function is described that consists of four tiers. The first tier is a vowel substrate defined by a system of spatial eigenmodes and a neutral area function determined from MRI-based vocal-tract data. The input parameters to the first tier are coefficient values that, when multiplied by the appropriate eigenmode and added to the neutral area function, construct a desired vowel. The second tier consists of a consonant shaping function defined along the length of the vocal tract that can be used to modify the vowel substrate such that a constriction is formed. Input parameters consist of the location, area, and range of the constriction. Location and area roughly correspond to the standard phonetic specifications of place and degree of constriction, whereas the range defines the amount of vocal-tract length over which the constriction will influence the tract shape. The third tier allows length modifications for articulatory maneuvers such as lip rounding/spreading and larynx lowering/raising. Finally, the fourth tier provides control of the level of acoustic coupling of the vocal tract to the nasal tract. All parameters can be specified either as static or time varying, which allows for multiple levels of coarticulation or coproduction.  相似文献   

16.
Four multiple-channel cochlear implant patients were tested with synthesized versions of the words "hid, head, had, hud, hod, hood" containing 1, 2, or 3 formants, and with a natural 2-formant version of the same words. The formant frequencies were encoded in terms of the positions of electrical stimulation in the cochlea. Loudness, duration, and fundamental frequency were kept fixed within the synthetic stimulus sets. The average recognition scores were 47%, 61%, 62%, and 79% for the synthesized 1-, 2-, and 3-formant vowels and the natural vowels, respectively. These scores showed that the place coding of the first and second formant frequencies accounted for a large part of the vowel recognition of cochlear implant patients using these coding schemes. The recognition of the natural stimuli was significantly higher than recognition of the synthetic stimuli, indicating that extra cues such as loudness, duration, and fundamental frequency contributed to recognition of the spoken words.

17.
This article describes the results of two experiments. Experiment 1 was a cross-sectional study designed to explore developmental and cross-linguistic variation in the vowel space of 10- to 18-month-old infants, exposed to either Canadian English or Canadian French. Acoustic parameters of the infant vowel space were described (specifically the mean and standard deviation of the first and second formant frequencies) and then used to derive the grave, acute, compact, and diffuse features of the vowel space across age. A decline in mean F1 with age for French-learning infants and a decline in mean F2 with age for English-learning infants were observed. A developmental expansion of the vowel space into the high-front and high-back regions was also evident. In experiment 2, the Variable Linear Articulatory Model was used to model the infant vowel space taking into consideration vocal tract size and morphology. Two simulations were performed, one with full range of movement for all articulatory parameters, and the other for movement of jaw and lip parameters only. These simulated vowel spaces were used to aid in the interpretation of the developmental changes and cross-linguistic influences on vowel production in experiment 1.

18.
Within-subject variation of three vocal frequency perturbation indices was compared across multiple sessions. The magnitude of jitter factor (JF), pitch perturbation quotient (PPQ), and directional perturbation factor (DPF) was measured every other day for 33 consecutive days for ten female and five male normal young adult speakers. Perturbation measures were calculated using a zero-crossing analysis of taped [i] and [u] productions. Pearson product-moment correlations among the three perturbation indices were calculated to examine their relation over time. Coefficients of variation for JF, PPQ, and DPF were considered indicative of the temporal stability of the three measures. JF and PPQ provided redundant information about laryngeal behaviors in steady-state productions. DPF, however, appeared to measure different laryngeal behaviors. Also, JF and PPQ varied considerably within individuals across sessions while DPF was the more temporally stable measure. Multiple sampling sessions and measurement of both the magnitude and direction of period differences are advised for future investigations of vocal frequency perturbation.
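
The abstract above gives no formulas, so the sketch below uses formulations that are commonly cited for two of the indices and should be read as assumptions: JF as the mean absolute difference between the frequencies of adjacent cycles, divided by the mean frequency and multiplied by 100, and DPF as the percentage of adjacent-period differences at which the algebraic sign changes. PPQ is left out because its smoothing window varies between sources. The function names and the sample values are hypothetical.

```python
def jitter_factor(frequencies):
    """Mean absolute cycle-to-cycle frequency difference, as a percentage
    of the mean frequency (assumed formulation of JF)."""
    if len(frequencies) < 2:
        raise ValueError("need two or more cycles")
    diffs = [abs(b - a) for a, b in zip(frequencies, frequencies[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_frequency = sum(frequencies) / len(frequencies)
    return 100.0 * mean_diff / mean_frequency


def directional_perturbation_factor(periods):
    """Percentage of period differences at which the sign changes relative
    to the previous difference (assumed formulation of DPF)."""
    if len(periods) < 3:
        raise ValueError("need three or more cycles")
    diffs = [b - a for a, b in zip(periods, periods[1:])]
    sign_changes = sum(1 for d1, d2 in zip(diffs, diffs[1:]) if d1 * d2 < 0)
    return 100.0 * sign_changes / len(diffs)


# Hypothetical cycle-by-cycle values from a sustained vowel.
cycle_frequencies_hz = [100.0, 102.0, 100.0, 102.0]
cycle_periods_ms = [10.0, 11.0, 10.0, 11.0, 10.0]

jf = jitter_factor(cycle_frequencies_hz)
dpf = directional_perturbation_factor(cycle_periods_ms)
print(jf, dpf)
```

The two functions respond to different properties of the signal: JF grows with the size of the cycle-to-cycle changes, whereas DPF depends only on how often their direction flips, which is consistent with the abstract's remark that DPF appears to capture different behavior from JF and PPQ.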

19.
The formant hypothesis of vowel perception, where the lowest two or three formant frequencies are essential cues for vowel quality perception, is widely accepted. There has, however, been some controversy suggesting that formant frequencies are not sufficient and that the whole spectral shape is necessary for perception. Three psychophysical experiments were performed to study this question. In the first experiment, the first or second formant peak of stimuli was suppressed as much as possible while still maintaining the original spectral shape. The responses to these stimuli were not radically different from the ones for the unsuppressed control. In the second experiment, F2-suppressed stimuli, whose amplitude ratios of high- to low-frequency components were systematically changed, were used. The results indicate that the ratio changes can affect perceived vowel quality, especially its place of articulation. In the third experiment, the full-formant stimuli, whose amplitude ratios were changed from the original and whose F2's were kept constant, were used. The results suggest that the amplitude ratio is equal to or more effective than F2 as a cue for place of articulation. We conclude that formant frequencies are not exclusive cues and that the whole spectral shape can be crucial for vowel perception.

20.
We report a dual-band normalization technique for in vivo quantification of the metabolic biomarker, protoporphyrin IX (PpIX), during brain tumor resection procedures. The accuracy of the approach was optimized in tissue simulating phantoms with varying absorption and scattering properties, validated with fluorimetric assessments on ex vivo brain tissue, and tested on human data acquired in vivo during fluorescence-guided surgery of brain tumors. The results demonstrate that the dual-band normalization technique allows PpIX concentrations to be accurately quantified by correction with reflectance data recorded and integrated within only two narrow wavelength intervals. The simplicity of the method lends itself to the enticing prospect that the method could be applicable to wide-field applications in quantitative fluorescence imaging and dosimetry in photodynamic therapy.  相似文献   
