Similar Articles
20 similar articles found (search time: 15 ms)
1.
Some effects of talker variability on spoken word recognition   (Total citations: 2; self-citations: 0; citations by others: 2)
The perceptual consequences of trial-to-trial changes in the voice of the talker on spoken word recognition were examined. The results from a series of experiments using perceptual identification and naming tasks demonstrated that perceptual performance decreases when the voice of the talker changes from trial to trial compared to performance when the voice on each trial remains the same. In addition, the effects of talker variability on word recognition appeared to be more robust and less dependent on task than the effects of word frequency and lexical structure. Possible hypotheses regarding the nature of the processes giving rise to these effects are discussed, with particular attention to the idea that the processing of information about the talker's voice is intimately related to early perceptual processes that extract acoustic-phonetic information from the speech signal.

2.
In this study, the effect of articulation rate and speaking style on the perceived speech rate is investigated. The articulation rate is measured both in terms of the intended phones, i.e., phones present in the assumed canonical form, and as the number of actual, realized phones per second. The combination of these measures reflects the deletion of phones, which is related to speaking style. The effect of the two rate measures on the perceived speech rate is compared in two listening experiments on the basis of a set of intonation phrases with carefully balanced intended and realized phone rates, selected from a German database of spontaneous speech. Because the balance between input-oriented (effort) and output-oriented (communicative) constraints may be different at fast versus slow speech rates, the effect of articulation rate is compared both for fast and for slow phrases from the database. The effect of the listeners' own speaking habits is also investigated to evaluate if listeners' perception is based on a projection of their own behavior as a speaker. It is shown that listener judgments reflect both the intended and realized phone rates, and that their judgments are independent of the constraint balance and their own speaking habits.
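As an illustration of the two rate measures, the minimal Python sketch below assumes that each phrase is summarized by a count of intended (canonical) phones, a count of realized phones, and an articulation duration in seconds. The function name `phone_rates` and the `deletion_ratio` proxy are illustrative assumptions, not the procedure used in the study.

```python
def phone_rates(intended_phones: int, realized_phones: int, duration_s: float) -> dict[str, float]:
    """Intended and realized phone rates (phones per second) for one phrase.

    intended_phones: phones in the assumed canonical form of the phrase
    realized_phones: phones actually produced
    duration_s: articulation time of the phrase in seconds (assumed to exclude pauses)
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    if intended_phones <= 0 or realized_phones < 0:
        raise ValueError("intended count must be positive, realized count non-negative")
    return {
        "intended_rate": intended_phones / duration_s,
        "realized_rate": realized_phones / duration_s,
        # Fraction of canonical phones not realized: a simple proxy for phone deletion.
        "deletion_ratio": 1.0 - realized_phones / intended_phones,
    }

# Hypothetical phrase: 20 canonical phones, 16 realized, 2.0 s of articulation.
print(phone_rates(20, 16, 2.0))
```

Two phrases with the same intended rate can differ in realized rate, which is the kind of contrast the balanced stimulus set is designed to separate.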

3.
Listeners identified a phonetically balanced set of consonant-vowel-consonant (CVC) words and nonsense syllables in noise at four signal-to-noise ratios. The identification scores for phonemes and syllables were analyzed using the j-factor model [Boothroyd and Nittrouer, J. Acoust. Soc. Am. 84, 101-114 (1988)], which measures the perceptual independence of the parts of a whole. Results indicate that nonsense CVC syllables are perceived as having three independent phonemes, while words show j = 2.34 independent units. Among the words, high-frequency words are perceived as having significantly fewer independent units than low-frequency words. Words with dense phonetic neighborhoods are perceived as having 0.5 more independent units than words with sparse neighborhoods. The neighborhood effect in these data is due almost entirely to density as determined by the initial consonant and vowel, demonstrated in analyses by subjects and items, and correlation analyses of syllable recognition with the neighborhood activation model [Luce and Pisoni, Ear Hear. 19, 1-36 (1998)]. The j factors are interpreted as measuring increased efficiency of the perception of word-final consonants of words in sparse neighborhoods during spoken word recognition.
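The j-factor model relates the probability of recognizing a whole to the probability of recognizing its parts as p_whole = p_part ** j, so j = log(p_whole) / log(p_part). The sketch below computes j from two scores; the probabilities in the example are hypothetical, not data from the study.

```python
import math

def j_factor(p_whole: float, p_part: float) -> float:
    """Number of perceptually independent parts, from p_whole = p_part ** j."""
    if not (0.0 < p_whole < 1.0 and 0.0 < p_part < 1.0):
        raise ValueError("probabilities must lie strictly between 0 and 1")
    return math.log(p_whole) / math.log(p_part)

# Hypothetical scores: each phoneme is identified with p = 0.8.
# If the three phonemes of a CVC were perceived independently, p_whole = 0.8 ** 3.
print(j_factor(0.8 ** 3, 0.8))   # close to 3: fully independent phonemes
print(j_factor(0.65, 0.8))       # below 3: whole recognized more often than independence predicts
```

A j value below the number of phonemes, as reported for words, means the whole is recognized more often than independent phoneme perception would predict.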

4.
The relative abilities of word frequency, contextual diversity, and semantic distinctiveness to predict accuracy of spoken word recognition in noise were compared using two data sets. Word frequency is the number of times a word appears in a corpus of text. Contextual diversity is the number of different documents in which the word appears in that corpus. Semantic distinctiveness takes into account the number of different semantic contexts in which the word appears. Semantic distinctiveness and contextual diversity were both able to explain variance above and beyond that explained by word frequency, which by itself explained little unique variance.
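The difference between word frequency and contextual diversity can be shown with a short sketch over a tokenized corpus. The three-document corpus is hypothetical, and semantic distinctiveness is left out because it requires a model of semantic contexts that the definition above does not spell out.

```python
from collections import Counter

def frequency_and_diversity(documents: list[list[str]]) -> tuple[Counter, Counter]:
    """Return (word frequency, contextual diversity) counts over a tokenized corpus."""
    frequency: Counter = Counter()
    diversity: Counter = Counter()
    for tokens in documents:
        frequency.update(tokens)        # every occurrence counts
        diversity.update(set(tokens))   # each document counts at most once per word
    return frequency, diversity

# Hypothetical three-document corpus.
docs = [["the", "cat", "sat"], ["the", "dog"], ["cat", "cat", "cat"]]
freq, cd = frequency_and_diversity(docs)
print(freq["cat"], cd["cat"])  # prints: 4 2
```

Here "cat" has a higher frequency than "the" (4 vs. 2) but the same contextual diversity (2 documents each), which is how the two measures can come apart.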

5.
To determine if the speaking fundamental frequency (F0) profiles of English and Mandarin differ, a variety of voice samples from male and female speakers were compared. The two languages' F0 profiles were sometimes found to differ, but these differences depended on the particular speech samples being compared. Most notably, the physiological F0 ranges of the speakers, determined from tone sweeps, hardly differed between the two languages, indicating that the English and Mandarin speakers' voices are comparable. Their use of F0 in single-word utterances was, however, quite different, with the Mandarin speakers having higher maximums and means, and larger ranges, even when only the Mandarin high falling tone was compared with English. In contrast, for a prose passage, the two languages were more similar, differing only in the mean F0, Mandarin again being higher. The study thus contributes to the growing literature showing that languages can differ in their F0 profile, but highlights the fact that the choice of speech materials to compare can be critical.

6.
7.
Thresholds (F0DLs) were measured for discrimination of the fundamental frequency (F0) of a group of harmonics (group B) embedded in harmonics with a fixed F0. Miyazono and Moore [(2009). Acoust. Sci. & Tech. 30, 383–386] found a large training effect for tones with high harmonics in group B, when the harmonics were added in cosine phase. It is shown here that this effect was due to use of a cue related to pitch pulse asynchrony (PPA). When PPA cues were disrupted by introducing a temporal offset between the envelope peaks of the harmonics in group B and the remaining harmonics, F0DLs increased markedly. Perceptual learning was examined using a training stimulus with cosine-phase harmonics, F0 = 50 Hz, and high harmonics in group B, under conditions where PPA was not useful. Learning occurred, and it transferred to other cosine-phase tones, but not to random-phase tones. A similar experiment with F0 = 100 Hz showed a learning effect that transferred to a cosine-phase tone with mainly high unresolved harmonics, but not to cosine-phase tones with low harmonics, and not to random-phase tones. The learning found here appears to be specific to tones for which F0 discrimination is based on distinct peaks in the temporal envelope.

8.
Nineteen trained soprano singers aged 18–30 years performed vocal tasks designed to assess average speaking fundamental frequency (SFF) during spontaneous speech and reading. Vocal range and perceptual characteristics of singing at low intensity and high frequency were also assessed, and subjects completed a survey of vocal habits and symptoms. Recorded signals were digitized and then analyzed for SFF using the Kay Computerized Speech Lab program. Subjects were assigned to a normal-voice or impaired-voice group based on ratings of the perceptual tasks and the survey results. Data analysis showed group differences in mean SFF, no differences in vocal range, higher mean SFF values for reading than for speaking, and a 58% ability to perceive speaking at a low pitch. The role of speaking at too low a pitch as a cause of vocal symptoms, and the need to differentiate voice classifications in vocal performance studies, are discussed.

9.
Cross-language perception studies report influences of speech style and consonantal context on perceived similarity and discrimination of non-native vowels by inexperienced and experienced listeners. Detailed acoustic comparisons of distributions of vowels produced by native speakers of North German (NG), Parisian French (PF) and New York English (AE) in citation (di)syllables and in sentences (surrounded by labial and alveolar stops) are reported here. Results of within- and cross-language discriminant analyses reveal striking dissimilarities across languages in the spectral/temporal variation of coarticulated vowels. As expected, vocalic duration was most important in differentiating NG vowels; it did not contribute to PF vowel classification. Spectrally, NG long vowels showed little coarticulatory change, but back/low short vowels were fronted/raised in alveolar context. PF vowels showed greater coarticulatory effects overall; back and front rounded vowels were fronted, low and mid-low vowels were raised in both sentence contexts. AE mid to high back vowels were extremely fronted in alveolar contexts, with little change in mid-low and low long vowels. Cross-language discriminant analyses revealed varying patterns of spectral (dis)similarity across speech styles and consonantal contexts that could, in part, account for AE listeners' perception of German and French front rounded vowels, and "similar" mid-high to mid-low vowels.

10.
11.
Differences in speaking frequency and intensity across tonal dialects have not been widely investigated. The purposes of this study were (1) to compare the speaking frequency and speaking intensity ranges of Mandarin and Min, and (2) to compare the speaking frequency and intensity ranges of Mandarin and Min with those of American English. The subjects were 80 normal Taiwanese adults divided into two dialect groups, Mandarin and Min. The speaking F0, the highest speaking F0, the lowest speaking F0, the maximum range of speaking F0, and the corresponding intensity measures were obtained from reading in each subject's native dialect. Statistical analysis revealed that Min speakers had a significantly greater maximum range of speaking intensity and a lower minimum speaking intensity than Mandarin speakers, which indicated tonal effects among speakers of the Min dialect. Moreover, Mandarin and Min speakers had a greater maximum range of speaking F0 and a greater maximum range of speaking intensity than American English speakers. These data may serve as an assessment tool for Mandarin and Min speakers.

12.
Although some cochlear implant (CI) listeners can show good word recognition accuracy, it is not clear how they perceive and use the various acoustic cues that contribute to phonetic perceptions. In this study, the use of acoustic cues was assessed for normal-hearing (NH) listeners in optimal and spectrally degraded conditions, and also for CI listeners. Two experiments tested the tense/lax vowel contrast (varying in formant structure, vowel-inherent spectral change, and vowel duration) and the word-final fricative voicing contrast (varying in F1 transition, vowel duration, consonant duration, and consonant voicing). Identification results were modeled using mixed-effects logistic regression. These experiments suggested that under spectrally degraded conditions, NH listeners decrease their use of formant cues and increase their use of durational cues. Compared to NH listeners, CI listeners showed decreased use of spectral cues (formant structure and formant change) and of consonant voicing, and showed greater use of durational cues (especially for the fricative contrast). The results suggest that although NH and CI listeners may show similar accuracy on basic tests of word, phoneme or feature recognition, they may be using different perceptual strategies in the process.

13.
Slow amplitude modulation of the human voice was approximated by a sinusoidal wave. The theoretical effects of smoothing-window size, F0, and modulation frequency on the window-averaged amplitude, as well as on calculated shimmer, were derived mathematically. The theoretical predictions were then tested using idealized and real voice signals from normal speakers. The theoretical and experimental results suggest that shimmer (when calculated using a smoothing window) is a function of window duration and modulation frequency. When window duration is defined as a constant number of pitch periods, it varies from speaker to speaker depending on F0. It may therefore be undesirable to use local smoothing windows with a constant number of cycles for shimmer computation, especially when comparing voices that have known low-frequency amplitude modulations but notably different fundamental frequencies.
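The dependence of windowed shimmer on window duration can be reproduced numerically. The sketch below is an assumption-laden illustration, not the authors' derivation: it models one peak amplitude per glottal cycle under slow sinusoidal modulation and defines shimmer as the mean relative deviation of each cycle's amplitude from a centered moving average over an odd number of cycles. The function names and all parameter values are hypothetical.

```python
import math

def cycle_amplitudes(f0_hz: float, mod_hz: float, depth: float, n_cycles: int) -> list[float]:
    """Idealized per-cycle peak amplitudes under slow sinusoidal amplitude modulation."""
    if not 0.0 <= depth < 1.0:
        raise ValueError("depth must be in [0, 1) so amplitudes stay positive")
    # Cycle i starts at t = i / f0; the modulator is evaluated at that instant.
    return [1.0 + depth * math.sin(2 * math.pi * mod_hz * i / f0_hz) for i in range(n_cycles)]

def windowed_shimmer(amps: list[float], window: int) -> float:
    """Mean relative deviation of each cycle amplitude from a centered moving average.

    window is an odd number of cycles (pitch periods); window=1 gives zero shimmer.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number of cycles")
    half = window // 2
    if len(amps) <= 2 * half:
        raise ValueError("not enough cycles for this window")
    ratios = []
    for i in range(half, len(amps) - half):
        local_mean = sum(amps[i - half:i + half + 1]) / window
        ratios.append(abs(amps[i] - local_mean) / local_mean)
    return sum(ratios) / len(ratios)

# Same 4-Hz, 10% modulation; an 11-cycle window spans 110 ms at 100 Hz but 55 ms at 200 Hz.
for f0 in (100.0, 200.0):
    amps = cycle_amplitudes(f0, mod_hz=4.0, depth=0.1, n_cycles=500)
    print(f0, [round(windowed_shimmer(amps, w), 5) for w in (3, 11)])
```

Even though the signal contains no cycle-to-cycle irregularity, the computed shimmer grows with the number of cycles in the window and, for a fixed cycle count, is larger for the lower-F0 voice, because the window then covers a longer stretch of the modulation.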

14.
This article describes experiments carried out in order to gain a deeper understanding of the mechanisms underlying variation of vocal loudness in singers. Ten singers, two of whom are famous professional opera tenor soloists, phonated at different pitches and different loudnesses. Their voice source characteristics were analyzed by inverse filtering the oral airflow signal. It was found that the main physiological variable underlying loudness variation is subglottal pressure (Ps). The voice source property determining most of the loudness variation is the amplitude of the negative peak of the differentiated flow signal, as predicted by previous research. Increases in this amplitude are achieved by (a) increasing the pulse amplitude of the flow waveform; (b) moving the moment of vocal fold contact earlier in time, closer to the center of the pulse; and (c) skewing the pulses. The last-mentioned alternative seems to depend on both Ps and the ratio between the fundamental frequency and the first formant. On average, the singers doubled Ps when they increased fundamental frequency by one octave, and a doubling of the excess Ps over threshold caused the sound pressure level (SPL) to increase by 8–9 dB for neutral phonation, and by less when the mode of phonation was changed to pressed. A shift in mode of phonation from flow through neutral to pressed was associated with a reduction of the peak glottal permittance, i.e., the ratio of peak transglottal airflow to Ps. Flow phonation had the most favorable relationship between Ps and SPL.

15.
A method of measuring the rate of change of fundamental frequency has been developed in an effort to find acoustic voice parameters that could be useful in psychiatric research. A minicomputer program was used to extract seven parameters from the fundamental frequency contour of tape-recorded speech samples: (1) the average rate of change of the fundamental frequency and (2) its standard deviation, (3) the absolute rate of fundamental frequency change, (4) the total reading time, (5) the percent pause time of the total reading time, (6) the mean, and (7) the standard deviation of the fundamental frequency distribution. The method is demonstrated on (a) a material consisting of synthetic speech and (b) voice recordings of depressed patients who were examined during depression and after improvement.
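The seven parameters can be sketched for an F0 contour sampled at a fixed frame step. This is a hypothetical reconstruction, not the original minicomputer program: it assumes unvoiced or silent frames are coded as 0, treats every such frame as pause time (a real system would need a pause-duration criterion so that unvoiced consonants are not counted), computes the rate of change only between adjacent voiced frames, reads "average rate" as the signed mean and "absolute rate" as the mean of absolute rates, and uses population standard deviations.

```python
import statistics

def f0_contour_parameters(f0_hz: list[float], frame_s: float) -> dict[str, float]:
    """Seven summary parameters of an F0 contour (0 = unvoiced/pause frame, assumed)."""
    if frame_s <= 0:
        raise ValueError("frame_s must be positive")
    voiced = [f for f in f0_hz if f > 0]
    # Rate of F0 change in Hz/s, only where both neighbouring frames are voiced.
    rates = [(b - a) / frame_s for a, b in zip(f0_hz, f0_hz[1:]) if a > 0 and b > 0]
    if len(voiced) < 2 or not rates:
        raise ValueError("need at least two adjacent voiced frames")
    return {
        "mean_rate": statistics.fmean(rates),                   # (1) average rate of change
        "sd_rate": statistics.pstdev(rates),                    # (2) its standard deviation
        "abs_rate": statistics.fmean(abs(r) for r in rates),    # (3) absolute rate of change
        "total_time_s": len(f0_hz) * frame_s,                   # (4) total reading time
        "pause_pct": 100.0 * (len(f0_hz) - len(voiced)) / len(f0_hz),  # (5) percent pause time
        "mean_f0": statistics.fmean(voiced),                    # (6) mean F0
        "sd_f0": statistics.pstdev(voiced),                     # (7) SD of F0
    }

# Hypothetical 10-ms contour: a rise, a two-frame pause, then a fall.
print(f0_contour_parameters([100, 110, 120, 0, 0, 120, 100], 0.01))
```

In this toy contour the rises and the fall cancel, so the average rate is near zero while the absolute rate is not, which illustrates why both measures were extracted.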

16.
The detection of 500- or 2000-Hz pure-tone signals in unmodulated and modulated noise was investigated in normal-hearing and sensorineural hearing-impaired listeners, as a function of noise bandwidth. Square-wave modulation rates of 15 and 40 Hz were used in the modulated noise conditions. A notched noise measure of frequency selectivity and a gap detection measure of temporal resolution were also obtained on each subject. The modulated noise results indicated a masking release that increased as a function of increasing noise bandwidth, and as a function of decreasing modulation rate for both groups of listeners. However, the improvement of threshold with increasing modulated noise bandwidth was often greatly reduced among the sensorineural hearing-impaired listeners. It was hypothesized that the masking release in modulated noise may be due to several types of processes including across-critical band analysis (CMR), within-critical band analysis, and suppression. Within-band effects appeared to be especially large at the higher frequency region and lower modulation rate. In agreement with previous research, there was a significant correlation between frequency selectivity and masking release in modulated noise. At the 500-Hz region, masking release was correlated more highly with the filter skirt and tail measures than with the filter passband measure. At the 2000-Hz region, masking release was correlated more with the filter passband and skirt measures than with the filter tail measure. The correlation between gap detection and masking release was significant at the 40-Hz modulation rate, but not at the 15-Hz modulation rate. The results of this study suggest that masking release in modulated noise is limited by frequency selectivity at low modulation rates, and by both frequency selectivity and temporal resolution at high modulation rates. However, even when the present measures of frequency selectivity and temporal resolution are both taken into account, significant variance in masking release still remains unaccounted for.

17.
Much research has explored how spoken word recognition is influenced by the architecture and dynamics of the mental lexicon (e.g., Luce and Pisoni, 1998; McClelland and Elman, 1986). A more recent question is whether the processes underlying word recognition are unique to the auditory domain, or whether visually perceived (lipread) speech may also be sensitive to the structure of the mental lexicon (Auer, 2002; Mattys, Bernstein, and Auer, 2002). The current research was designed to test the hypothesis that both aurally and visually perceived spoken words are isolated in the mental lexicon as a function of their modality-specific perceptual similarity to other words. Lexical competition (the extent to which perceptually similar words influence recognition of a stimulus word) was quantified using metrics that are well-established in the literature, as well as a statistical method for calculating perceptual confusability based on the phi-square statistic. Both auditory and visual spoken word recognition were influenced by modality-specific lexical competition as well as stimulus word frequency. These findings extend the scope of activation-competition models of spoken word recognition and reinforce the hypothesis (Auer, 2002; Mattys et al., 2002) that perceptual and cognitive properties underlying spoken word recognition are not specific to the auditory domain. In addition, the results support the use of the phi-square statistic as a better predictor of lexical competition than metrics currently used in models of spoken word recognition.
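The phi-square statistic can be illustrated on two rows of a confusion matrix, where each row counts the responses given to one stimulus. The sketch below uses the standard definition, the chi-square of the 2 × k table formed by the two rows divided by the total count, so 0 means identical response distributions and 1 means no overlap. The counts are hypothetical, and how the study aggregated these values into a lexical-competition metric is not reproduced here.

```python
def phi_square(row_a: list[int], row_b: list[int]) -> float:
    """Phi-square dissimilarity between two confusion-matrix rows (response counts)."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have the same response categories")
    n_a, n_b = sum(row_a), sum(row_b)
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each row needs at least one response")
    n = n_a + n_b
    chi2 = 0.0
    for a, b in zip(row_a, row_b):
        col = a + b
        if col == 0:
            continue  # a response category never used by either stimulus adds nothing
        exp_a = n_a * col / n  # expected counts if both stimuli shared one distribution
        exp_b = n_b * col / n
        chi2 += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return chi2 / n

# Hypothetical response counts for two stimuli over three response categories.
print(phi_square([8, 2, 0], [3, 6, 1]))
```

One common convention derives a perceptual similarity score as 1 minus phi-square, so that stimuli with overlapping confusion patterns count as more similar.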

18.
19.
20.
Proton decay is investigated within the framework of the theory of gravitation and of the concept of neutron oscillation. It is shown that the mechanism of proton decay is very sensitive to the value of the fundamental length.


Copyright © Beijing Qinyun Technology Development Co., Ltd. (Beijing ICP registration No. 09084417)