Similar Documents
20 similar documents retrieved (search time: 406 ms)
1.
The goal of this study was to measure detection thresholds for 12 isolated American English vowels, naturally spoken by three male and three female talkers, for young normal-hearing listeners in the presence of long-term speech-shaped (LTSS) noise presented at 70 dB sound pressure level. Vowel duration was equalized to 170 ms, and the spectrum of the LTSS noise was identical to the long-term average spectrum of 12-talker babble. Given the same duration, detection thresholds differed by 19 dB across the 72 vowel tokens. Thresholds showed a roughly U-shaped pattern as a function of vowel category across talkers, with the lowest thresholds for /i/ and /æ/ and the highest for /u/. Both vowel category and talker had a significant effect on vowel detectability. Detection thresholds predicted from three excitation-pattern metrics using a simulation model closely matched the thresholds obtained from human listeners, suggesting that listeners use a constant metric in the excitation pattern of the vowel to detect the signal in noise, independent of vowel category and talker. Application of the simulation model to predicting vowel detection thresholds in noise is also discussed.
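The abstract specifies the LTSS noise spectrum but not how such noise is generated. A minimal numpy sketch of one standard recipe (keep the babble's long-term magnitude spectrum, randomize the phase), assuming a one-dimensional `babble` waveform array is available; this is an illustration, not the authors' exact procedure:

```python
import numpy as np

def speech_shaped_noise(babble, n_samples, rng=None):
    """One common recipe for long-term speech-shaped (LTSS) noise:
    impose the babble's long-term magnitude spectrum on random phases.
    Assumes n_samples <= len(babble). Level calibration (e.g., to
    70 dB SPL) would happen separately at playback."""
    rng = np.random.default_rng() if rng is None else rng
    magnitude = np.abs(np.fft.rfft(babble))            # long-term average spectrum
    phase = rng.uniform(0.0, 2.0 * np.pi, magnitude.shape)
    noise = np.fft.irfft(magnitude * np.exp(1j * phase), n=len(babble))
    noise = noise[:n_samples]
    return noise / np.max(np.abs(noise))               # peak-normalize
```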

2.
Acoustic and perceptual similarities between Japanese and American English (AE) vowels were investigated in two studies. In study 1, a series of discriminant analyses were performed to determine acoustic similarities between Japanese and AE vowels, each spoken by four native male speakers, using F1, F2, and vocalic duration as input parameters. In study 2, the Japanese vowels were presented to native AE listeners in a perceptual assimilation task, in which the listeners categorized each Japanese vowel token as most similar to an AE category and rated its goodness as an exemplar of the chosen AE category. Results showed that the majority of AE listeners assimilated all Japanese vowels into long AE categories, apparently ignoring temporal differences between 1- and 2-mora Japanese vowels. In addition, not all perceptual assimilation patterns reflected context-specific spectral similarity patterns established by discriminant analysis. It was hypothesized that this incongruity between acoustic and perceptual similarity may be due to differences in distributional characteristics of native and non-native vowel categories that affect the listeners' perceptual judgments.
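A minimal sketch of the kind of discriminant analysis described, with F1, F2, and vocalic duration as the input parameters; the category means and token counts below are synthetic stand-ins, not measurements from the paper:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Hypothetical tokens: 40 per vowel, features = [F1 (Hz), F2 (Hz), duration (ms)].
means = {"i": (300, 2300, 200), "a": (750, 1300, 230), "u": (330, 900, 210)}
X = np.vstack([rng.normal(m, (40, 120, 25), size=(40, 3)) for m in means.values()])
y = np.repeat(list(means.keys()), 40)

lda = LinearDiscriminantAnalysis().fit(X, y)
# Classifying one language's tokens with a model trained on the other
# language's tokens yields a cross-language acoustic-similarity pattern.
print(lda.predict([[320, 2200, 190]]))        # most similar category
print(lda.predict_proba([[320, 2200, 190]]))  # posterior over categories
```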

3.
Current speech perception models propose that relative perceptual difficulties with non-native segmental contrasts can be predicted from cross-language phonetic similarities. Japanese (J) listeners performed a categorical discrimination task in which nine contrasts (six adjacent-height pairs, three front/back pairs) involving eight American English (AE) vowels [iː, ɪ, ɛ, æː, ɑː, ʌ, ʊ, uː] in /hVbə/ disyllables were tested. The listeners also completed a perceptual assimilation task (categorization as J vowels with category-goodness ratings). Perceptual assimilation patterns (quantified as categorization overlap scores) were highly predictive of discrimination accuracy (r_s = 0.93). The results suggested that J listeners used both spectral and temporal information in discriminating the vowel contrasts.
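One plausible formalization of a categorization overlap score (the shared proportion of assimilation responses for the two members of a contrast), together with the Spearman correlation the abstract reports; the paper's exact formula may differ, and all numbers here are made up:

```python
import numpy as np
from scipy.stats import spearmanr

def overlap_score(p1, p2):
    """Summed shared response proportion across native (J) categories
    for the two members of a non-native contrast."""
    return np.minimum(p1, p2).sum()

# Hypothetical assimilation proportions over five J response categories:
p_vowel_a = np.array([0.70, 0.20, 0.10, 0.00, 0.00])
p_vowel_b = np.array([0.60, 0.25, 0.10, 0.05, 0.00])
print(overlap_score(p_vowel_a, p_vowel_b))  # high overlap -> predict poor discrimination

# Relating overlap to discrimination accuracy across nine contrasts
# (made-up values, perfectly inversely ranked by construction):
overlaps = np.array([0.9, 0.2, 0.5, 0.7, 0.1, 0.4, 0.8, 0.3, 0.6])
accuracy = np.array([0.55, 0.95, 0.80, 0.65, 0.98, 0.85, 0.60, 0.92, 0.75])
rho, p = spearmanr(overlaps, accuracy)
print(rho, p)
```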

4.
Current theories of cross-language speech perception claim that patterns of perceptual assimilation of non-native segments to native categories predict relative difficulties in learning to perceive (and produce) non-native phones. Cross-language spectral similarity of North German (NG) and American English (AE) vowels produced in isolated hVC(a) (di)syllables (study 1) and in hVC syllables embedded in a short sentence (study 2) was determined by discriminant analyses, to examine the extent to which acoustic similarity was predictive of perceptual similarity patterns. The perceptual assimilation of NG vowels to native AE vowel categories by AE listeners with no German language experience was then assessed directly. Both studies showed that acoustic similarity of AE and NG vowels did not always predict perceptual similarity, especially for "new" NG front rounded vowels and for "similar" NG front and back mid and mid-low vowels. Both acoustic and perceptual similarity of NG and AE vowels varied as a function of the prosodic context, although vowel duration differences did not affect perceptual assimilation patterns. When duration and spectral similarity were in conflict, AE listeners assimilated vowels on the basis of spectral similarity in both prosodic contexts.

5.
The interlanguage speech intelligibility benefit
This study investigated how native language background influences the intelligibility of speech by non-native talkers for non-native listeners from either the same or a different native language background as the talker. Native talkers of Chinese (n = 2), Korean (n = 2), and English (n = 1) were recorded reading simple English sentences. Native listeners of English (n = 21), Chinese (n = 21), Korean (n = 10), and a mixed group from various native language backgrounds (n = 12) then performed a sentence recognition task with the recordings from the five talkers. Results showed that for native English listeners, the native English talker was most intelligible. However, for non-native listeners, speech from a relatively high proficiency non-native talker from the same native language background was as intelligible as speech from a native talker, giving rise to the "matched interlanguage speech intelligibility benefit." Furthermore, this interlanguage intelligibility benefit extended to the situation where the non-native talker and listeners came from different language backgrounds, giving rise to the "mismatched interlanguage speech intelligibility benefit." These findings shed light on the nature of the talker-listener interaction during speech communication.

6.
In a follow-up study to Bent and Bradlow (2003), carrier sentences containing familiar keywords were read aloud by five talkers (Korean high proficiency; Korean low proficiency; Saudi Arabian high proficiency; Saudi Arabian low proficiency; native English). The intelligibility of these keywords to 50 listeners in four first-language groups (Korean, n = 10; Saudi Arabian, n = 10; native English, n = 10; other mixed first languages, n = 20) was measured in a word recognition test. In each case, the non-native listeners found the non-native low-proficiency talkers who did not share their first language the least intelligible, at statistically significant levels, while not finding the low-proficiency talker who shared their own first language similarly unintelligible. These findings indicate a mismatched interlanguage speech intelligibility detriment for low-proficiency non-native speakers and a potential intelligibility problem between mismatched first-language low-proficiency speakers unfamiliar with each other's accents in English. There was no strong evidence of an intelligibility benefit for the high-proficiency non-native talkers among listeners from a different first-language background, nor any indication that the native talkers were more intelligible than the high-proficiency non-native talkers to any of the listeners.

7.
Several studies have demonstrated that when talkers are instructed to speak clearly, the resulting speech is significantly more intelligible than speech produced in ordinary conversation. These speech intelligibility improvements are accompanied by a wide variety of acoustic changes. The current study explored the relationship between acoustic properties of vowels and their identification in clear and conversational speech, for young normal-hearing (YNH) and elderly hearing-impaired (EHI) listeners. Monosyllabic words excised from sentences spoken either clearly or conversationally by a male talker were presented in 12-talker babble for vowel identification. While vowel intelligibility was significantly higher in clear speech than in conversational speech for the YNH listeners, no clear speech advantage was found for the EHI group. Regression analyses were used to assess the relative importance of spectral target, dynamic formant movement, and duration information for perception of individual vowels. For both listener groups, all three types of information emerged as primary cues to vowel identity. However, the relative importance of the three cues for individual vowels differed greatly for the YNH and EHI listeners. This suggests that hearing loss alters the way acoustic cues are used for identifying vowels.

8.
Native American English and non-native (Dutch) listeners identified either the consonant or the vowel in all possible American English CV and VC syllables. The syllables were embedded in multispeaker babble at three signal-to-noise ratios (0, 8, and 16 dB). The phoneme identification performance of the non-native listeners was less accurate than that of the native listeners. All listeners were adversely affected by noise. With these isolated syllables, initial segments were harder to identify than final segments. Crucially, the effects of language background and noise did not interact; the performance asymmetry between the native and non-native groups was not significantly different across signal-to-noise ratios. It is concluded that the frequently reported disproportionate difficulty of non-native listening under disadvantageous conditions is not due to a disproportionate increase in phoneme misidentifications.
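Mixing syllables with multispeaker babble at the stated signal-to-noise ratios follows a standard recipe: scale the noise so the speech-to-noise power ratio hits the target. A sketch, with `speech` and `noise` assumed to be numpy waveform arrays (not the authors' exact stimulus pipeline):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then add it to the speech."""
    noise = noise[:len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

# e.g., the three conditions in the study:
# for snr in (0, 8, 16): stimulus = mix_at_snr(syllable, babble, snr)
```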

9.
The purpose of this study was to examine the effect of reduced vowel working space on dysarthric talkers' speech intelligibility using both acoustic and perceptual approaches. In experiment 1, the acoustic-perceptual relationship between vowel working space area and speech intelligibility was examined in Mandarin-speaking young adults with cerebral palsy. Subjects read aloud 18 bisyllabic words containing the vowels /i/, /a/, and /u/ at their normal speaking rate. Each talker's words were identified by three normal listeners. The percentages of correctly identified vowels and words were calculated as vowel intelligibility and word intelligibility, respectively. Results revealed that talkers with cerebral palsy exhibited smaller vowel working space areas than ten age-matched controls. The vowel working space area was significantly correlated with vowel intelligibility (r=0.632, p<0.005) and with word intelligibility (r=0.684, p<0.005). Experiment 2 examined whether tokens from expanded vowel working spaces were perceived as better vowel exemplars and were represented with greater perceptual distances than tokens from reduced vowel working spaces. The results of the perceptual experiment support this prediction. The distorted vowels of talkers with cerebral palsy occupy a smaller acoustic space, which results in shrunken inter-vowel perceptual distances for listeners.
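Vowel working space area for the corner vowels /i/, /a/, /u/ is conventionally the area of the triangle they span in the F1-F2 plane, computable with the shoelace formula. A sketch with illustrative formant values (not the paper's data); the resulting per-talker areas could then be correlated with intelligibility scores as in the abstract:

```python
import numpy as np

def vowel_space_area(points):
    """Area of the polygon spanned by corner vowels in the F1-F2 plane
    (shoelace formula); with /i/, /a/, /u/ this is the triangular
    vowel working space."""
    f1, f2 = np.asarray(points, dtype=float).T
    return 0.5 * abs(np.dot(f1, np.roll(f2, -1)) - np.dot(f2, np.roll(f1, -1)))

# Hypothetical mean formants (Hz) for one talker's corner vowels:
corners = [(300, 2300),   # /i/
           (750, 1300),   # /a/
           (330, 900)]    # /u/
print(vowel_space_area(corners), "Hz^2")
```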

10.
Cross-language perception studies report influences of speech style and consonantal context on perceived similarity and discrimination of non-native vowels by inexperienced and experienced listeners. Detailed acoustic comparisons of distributions of vowels produced by native speakers of North German (NG), Parisian French (PF) and New York English (AE) in citation (di)syllables and in sentences (surrounded by labial and alveolar stops) are reported here. Results of within- and cross-language discriminant analyses reveal striking dissimilarities across languages in the spectral/temporal variation of coarticulated vowels. As expected, vocalic duration was most important in differentiating NG vowels; it did not contribute to PF vowel classification. Spectrally, NG long vowels showed little coarticulatory change, but back/low short vowels were fronted/raised in alveolar context. PF vowels showed greater coarticulatory effects overall; back and front rounded vowels were fronted, low and mid-low vowels were raised in both sentence contexts. AE mid to high back vowels were extremely fronted in alveolar contexts, with little change in mid-low and low long vowels. Cross-language discriminant analyses revealed varying patterns of spectral (dis)similarity across speech styles and consonantal contexts that could, in part, account for AE listeners' perception of German and French front rounded vowels, and "similar" mid-high to mid-low vowels.

11.
Cross-generational and cross-dialectal variation in vowels among speakers of American English was examined in terms of vowel identification by listeners and vowel classification using pattern recognition. Listeners from Western North Carolina and Southeastern Wisconsin identified 12 vowel categories produced by 120 speakers stratified by age (older adults, young adults, and children), gender, and dialect. The vowels /ʌ, o, ʊ, u/ were well identified by both groups of listeners. The majority of confusions involved the front vowels /i, ɪ, e, ɛ, æ/, the low back vowels /ɑ, ɔ/, and the monophthongal North Carolina /aɪ/. For selected vowels, generational differences in acoustic vowel characteristics were perceptually salient, suggesting listeners' responsiveness to sound change. Female exemplars and native-dialect variants produced higher identification rates. Linear discriminant analyses examining dialect and generational classification accuracy showed that sampling the formant pattern at vowel midpoint only is insufficient to separate the vowels; two sample points near onset and offset provided enough information for successful classification. Models trained on one dialect classified the vowels from the other dialect with much lower accuracy. The results strongly support the importance of dynamic information in the accurate classification of cross-generational and cross-dialectal variation.
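The midpoint-versus-dynamic comparison reported above can be sketched as two LDA feature sets, one sampling the formant pattern only at vowel midpoint and one using samples near onset and offset. The data here are synthetic and only illustrate the analysis pipeline, not the paper's result:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 240
labels = np.repeat(np.arange(12), 20)            # 12 hypothetical vowel categories
# Synthetic F1/F2 "tracks" sampled at 20%, 50%, and 80% of vowel duration.
tracks = rng.normal(size=(n, 3, 2)) + labels[:, None, None] * 0.5

X_mid = tracks[:, 1, :]                          # midpoint sample only
X_dyn = tracks[:, [0, 2], :].reshape(n, -1)      # samples near onset and offset

for name, X in (("midpoint only", X_mid), ("onset+offset", X_dyn)):
    acc = cross_val_score(LinearDiscriminantAnalysis(), X, labels, cv=5).mean()
    print(name, round(acc, 2))
```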

12.
The intelligibility of speech is sustained at lower signal-to-noise ratios when the speech has a different interaural configuration from the noise. This paper argues that the advantage arises in part because listeners combine evidence of the spectrum of speech in the across-frequency profile of interaural decorrelation with evidence in the across-frequency profile of intensity. To support the argument, three experiments examined the ability of listeners to integrate and segregate evidence of vowel formants in these two profiles. In experiment 1, listeners achieved accurate identification of the members of a small set of vowels whose first formant was defined by a peak in one profile and whose second formant was defined by a peak in the other profile. This result demonstrates that integration is possible. Experiment 2 demonstrated that integration is not mandatory, insofar as listeners could report the identity of a vowel defined entirely in one profile despite the presence of a competing vowel in the other profile. The presence of the competing vowel reduced accuracy of identification, however, showing that segregation was incomplete. Experiment 3 demonstrated that segregation of the binaural vowel, in particular, can be increased by the introduction of an onset asynchrony between the competing vowels. The results of experiments 2 and 3 show that the intrinsic cues for segregation of the profiles are relatively weak. Overall, the results are compatible with the argument that listeners can integrate evidence of spectral peaks from the two profiles.

13.
Neural-population interactions resulting from excitation overlap in multi-channel cochlear implants (CIs) may blur the "internal" auditory representation of complex sounds such as vowels. In experiment I, confusion matrices for eight German steady-state vowel-like signals were obtained from seven CI listeners. Identification performance ranged between 42% and 74% correct. On the basis of an information transmission analysis across all vowels, pairs of the most and least frequently confused vowels were selected for each subject. In experiment II, vowel masking patterns (VMPs) were obtained using the previously selected vowels as maskers. The VMPs were found to resemble the "electrical" vowel spectra to a large extent, indicating a relatively weak effect of neural-population interactions. Correlating the vowel identification data with VMP spectral similarity, measured by means of several spectral distance metrics, showed that the CI listeners identified the vowels based on differences in the between-peak spectral information as well as the location of spectral peaks. The effect of the nonlinear amplitude mapping of acoustic into "electrical" vowels, as performed in the implant processors, was evaluated separately and compared to the effect of neural-population interactions. Amplitude mapping was found to cause more blurring than neural-population interactions. Subjects exhibiting strong blurring effects yielded lower overall vowel identification scores.
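The abstract does not name its spectral distance metrics; as an illustration of the kind of comparison involved, here are two simple metrics over hypothetical per-channel dB patterns (an RMS level difference and a shape correlation):

```python
import numpy as np

def spectral_distances(vmp_db, vowel_db):
    """RMS level difference and shape correlation between a vowel masking
    pattern (VMP) and an 'electrical' vowel spectrum, both in dB per channel."""
    rms_diff = np.sqrt(np.mean((vmp_db - vowel_db) ** 2))
    shape_corr = np.corrcoef(vmp_db, vowel_db)[0, 1]
    return rms_diff, shape_corr

# Hypothetical 20-channel dB patterns:
rng = np.random.default_rng(2)
vowel = rng.uniform(0.0, 40.0, 20)
vmp = vowel + rng.normal(0.0, 3.0, 20)   # a pattern resembling the vowel
print(spectral_distances(vmp, vowel))
```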

14.
Many studies have noted great variability in speech perception ability among postlingually deafened adults with cochlear implants. This study examined phoneme misperceptions for 30 cochlear implant listeners using either the Nucleus-22 or Clarion version 1.2 device to examine whether listeners with better overall speech perception differed qualitatively from poorer listeners in their perception of vowel and consonant features. In the first analysis, simple regressions were used to predict the mean percent-correct scores for consonants and vowels for the better group of listeners from those of the poorer group. A strong relationship between the two groups was found for consonant identification, and a weak, nonsignificant relationship was found for vowel identification. In the second analysis, it was found that less information was transmitted for consonant and vowel features to the poorer listeners than to the better listeners; however, the pattern of information transmission was similar across groups. Taken together, results suggest that the performance difference between the two groups is primarily quantitative. The results underscore the importance of examining individuals' perception of individual phoneme features when attempting to relate speech perception to other predictor variables.
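Information transmission analysis in the Miller and Nicely (1955) tradition computes the mutual information between stimulus and response from a confusion matrix, relative to the stimulus entropy; per-feature scores come from first collapsing the matrix by feature value. A sketch on a made-up confusion matrix (the study's actual feature groupings are not given in the abstract):

```python
import numpy as np

def relative_info_transmitted(confusions):
    """Mutual information between stimulus and response, divided by
    stimulus entropy, from a raw confusion-count matrix
    (rows = stimuli, columns = responses)."""
    p = confusions / confusions.sum()
    px = p.sum(axis=1, keepdims=True)      # stimulus marginal
    py = p.sum(axis=0, keepdims=True)      # response marginal
    nz = p > 0
    mi = np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz]))
    hx = -np.sum(px[px > 0] * np.log2(px[px > 0]))
    return mi / hx

# Hypothetical 3x3 confusion matrix:
conf = np.array([[45,  3,  2],
                 [ 5, 40,  5],
                 [ 2,  8, 40]])
print(relative_info_transmitted(conf))     # 1.0 would mean perfect transmission
```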

15.
Humans were trained to categorize problem non-native phonemes using an animal psychoacoustic procedure that trains monkeys to greater than 90% correct phoneme identification [Sinnott and Gilmore, Percept. Psychophys. 66, 1341-1350 (2004)]. This procedure uses a manual left-versus-right response on a lever, a continuously repeated stimulus on each trial, extensive feedback for errors in the form of a repeated correction procedure, and training until asymptotic levels of performance. Here, Japanese listeners categorized the English liquid contrast /r-l/, and English listeners categorized the Middle Eastern dental-retroflex contrast /d-D/. Consonant-vowel stimuli were constructed using four talkers and four vowels. Native listeners and phoneme contrasts familiar to all listeners were included as controls. Responses were analyzed using percent correct, response time, and vowel-context effects as measures. All measures indicated native-like Japanese perception of /r-l/ after 32 daily training sessions, but this was not the case for English perception of /d-D/. The results are related to the concept of "robust" (more easily recovered) versus "fragile" (more easily lost) phonetic contrasts [Burnham, Appl. Psycholing. 7, 207-240 (1986)].

16.
17.
An evaluation of vowel normalization procedures for the purpose of studying language variation is presented. The procedures were compared on how effectively they (a) preserve phonemic information, (b) preserve information about the talker's regional background (or sociolinguistic information), and (c) minimize anatomical/physiological variation in acoustic representations of vowels. Recordings were made for 80 female talkers and 80 male talkers of Dutch. These talkers were stratified according to their gender and regional background. The normalization procedures were applied to measurements of the fundamental frequency and the first three formant frequencies for a large set of vowel tokens. The normalization procedures were evaluated through statistical pattern analysis. The results show that normalization procedures that use information across multiple vowels ("vowel-extrinsic" information) to normalize a single vowel token performed better than those that include only information contained in the vowel token itself ("vowel-intrinsic" information). Furthermore, the results show that normalization procedures that operate on individual formants performed better than those that use information across multiple formants (e.g., "formant-extrinsic" F2-F1).
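Lobanov z-score normalization is a standard member of the "vowel-extrinsic, formant-intrinsic" class that this evaluation favors; it is shown here only as an illustrative example of that class, not as the study's specific winning procedure:

```python
import numpy as np

def lobanov(formants):
    """Lobanov z-score normalization: each formant is standardized
    across all of one talker's vowel tokens (vowel-extrinsic,
    formant-intrinsic). `formants`: (n_tokens, n_formants) array."""
    f = np.asarray(formants, dtype=float)
    return (f - f.mean(axis=0)) / f.std(axis=0)

# Hypothetical F1/F2/F3 (Hz) for a few tokens from one talker:
tokens = [[310, 2250, 2900],
          [740, 1310, 2500],
          [350,  940, 2350]]
print(lobanov(tokens))
```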

18.
For each of five vowels [i e a o u] following [t], a continuum from non-nasal to nasal was synthesized. Nasalization was introduced by inserting a pole-zero pair in the vicinity of the first formant in an all-pole transfer function. The frequencies and spacing of the pole and zero were systematically varied to change the degree of nasalization. The selection of stimulus parameters was determined from acoustic theory and the results of pilot experiments. The stimuli were presented for identification and discrimination to listeners whose language included a non-nasal–nasal vowel opposition (Gujarati, Hindi, and Bengali) and to American listeners. There were no significant differences between language groups in the 50% crossover points of the identification functions. Some vowels were more influenced by range and context effects than were others. The language groups showed some differences in the shape of the discrimination functions for some vowels. On the basis of the results, it is postulated that (1) there is a basic acoustic property of nasality, independent of the vowel, to which the auditory system responds in a distinctive way regardless of language background; and (2) there are one or more additional acoustic properties that may be used to various degrees in different languages to enhance the contrast between a nasal vowel and its non-nasal congener. A proposed candidate for the basic acoustic property is a measure of the degree of prominence of the spectral peak in the vicinity of the first formant. Additional secondary properties include shifts in the center of gravity of the low-frequency spectral prominence, leading to a change in perceived vowel height, and changes in overall spectral balance.
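A sketch of the pole-zero manipulation described above, comparing the frequency response near F1 of an all-pole "oral" configuration with a version containing an inserted pole-zero pair; the formant frequencies, bandwidths, and sampling rate are illustrative assumptions, not the stimulus parameters of the study:

```python
import numpy as np
from scipy.signal import freqz, zpk2tf

fs = 10000.0  # assumed sampling rate (Hz)

def resonance(freq_hz, bw_hz):
    """Complex-conjugate pole (or zero) pair at the given frequency/bandwidth."""
    r = np.exp(-np.pi * bw_hz / fs)
    w = 2 * np.pi * freq_hz / fs
    return [r * np.exp(1j * w), r * np.exp(-1j * w)]

# All-pole oral vowel: three formant resonances (illustrative values).
oral_poles = resonance(700, 90) + resonance(1200, 110) + resonance(2600, 170)

# Nasalized version: insert a pole-zero pair in the vicinity of F1;
# widening the pole-zero spacing increases the degree of nasalization.
nasal_poles = oral_poles + resonance(400, 100)
nasal_zeros = resonance(550, 100)

for name, zeros, poles in (("oral", [], oral_poles), ("nasal", nasal_zeros, nasal_poles)):
    b, a = zpk2tf(zeros, poles, 1.0)
    w, h = freqz(b.real, a.real, worN=2048, fs=fs)
    region = (w > 300) & (w < 1000)          # around the F1 prominence
    peak_db = 20 * np.log10(np.max(np.abs(h[region])))
    print(name, round(float(peak_db), 1), "dB peak near F1")
```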

19.
The amount of acoustic information that native and non-native listeners need for syllable identification was investigated by comparing the performance of monolingual English speakers and native Spanish speakers with either an earlier or a later age of immersion in an English-speaking environment. Duration-preserved silent-center syllables retaining 10, 20, 30, or 40 ms of the consonant-vowel and vowel-consonant transitions were created for the target vowels /i, ɪ, eɪ, ɛ, æ/ and /ɑ/, spoken by two males in /bVb/ context. Duration-neutral syllables were created by editing the silent portion to equate the duration of all vowels. Listeners identified the syllables in a six-alternative forced-choice task. The earlier learners identified the whole-word and 40 ms duration-preserved syllables as accurately as the monolingual listeners, but identified the silent-center syllables significantly less accurately overall. Only the monolingual listener group identified syllables significantly more accurately in the duration-preserved than in the duration-neutral condition, suggesting that the non-native listeners were unable to recover from the syllable disruption sufficiently to access the duration cues in the silent-center syllables. This effect was most pronounced for the later learners, who also showed the most vowel confusions and the greatest decrease in performance from the whole-word to the 40 ms transition condition.
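A sketch of the duration-preserved silent-center editing idea: zero out the vocalic center while retaining a fixed number of milliseconds of the CV and VC transitions. The sample-index arguments stand in for hand annotations and are hypothetical:

```python
import numpy as np

def silent_center(syllable, fs, keep_ms, vowel_start, vowel_end):
    """Duration-preserved silent-center token: keep `keep_ms` of the
    transitions at each vowel edge and silence the middle, so overall
    duration is unchanged. `vowel_start`/`vowel_end` are sample indices
    of the vocalic portion."""
    out = syllable.copy()
    keep = int(round(keep_ms * fs / 1000.0))
    out[vowel_start + keep : vowel_end - keep] = 0.0
    return out

# e.g., retain 40 ms on each side for a /bVb/ token sampled at 16 kHz:
# token_40 = silent_center(token, 16000, 40, onset_idx, offset_idx)
```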

20.
People vary in the intelligibility of their speech. This study investigated whether the across-talker intelligibility differences observed in normal-hearing listeners are also found in cochlear implant (CI) users. Speech perception for male, female, and child pairs of talkers differing in intelligibility was assessed with actual and simulated CI processing and in normal hearing. While overall speech recognition was, as expected, poorer for CI users, differences in intelligibility across talkers were consistent across all listener groups. This suggests that the primary determinants of intelligibility differences are preserved in the CI-processed signal, though no single critical acoustic property could be identified.
