Similar Literature
20 similar articles found (search time: 15 ms)
1.
Training American listeners to perceive Mandarin tones has been shown to be effective, with trainees' identification improving by 21%. Improvement also generalized to new stimuli and new talkers, and was retained when tested six months after training [Y. Wang et al., J. Acoust. Soc. Am. 106, 3649-3658 (1999)]. The present study investigates whether the perceptual gains in tone contrasts transferred to production. Before their perception pretest and after their post-test, the trainees were recorded producing a list of Mandarin words. Their productions were first judged by native Mandarin listeners in an identification task. Identification of trainees' post-test tone productions improved by 18% relative to their pretest productions, indicating significant tone production improvement after perceptual training. Acoustic analyses of the pre- and post-training productions further reveal the nature of the improvement, showing that post-training tone contours approximate native norms to a greater degree than pretraining tone contours. Furthermore, pitch height and pitch contour are not mastered in parallel, with the former being more resistant to improvement than the latter. These results are discussed in terms of the relationship between non-native tone perception and production as well as learning at the suprasegmental level.

2.
Speaker variability and noise are two common sources of acoustic variability. The goal of this study was to examine whether these two sources of acoustic variability affected native and non-native perception of Mandarin fricatives to different degrees. Multispeaker Mandarin fricative stimuli were presented to 40 native and 52 non-native listeners in two presentation formats (blocked by speaker and mixed across speakers). The stimuli were also mixed with speech-shaped noise to create five levels of signal-to-noise ratios. The results showed that noise affected non-native identification disproportionately. By contrast, the effect of speaker variability was comparable between the native and non-native listeners. Confusion patterns were interpreted with reference to the results of acoustic analysis, suggesting native and non-native listeners used distinct acoustic cues for fricative identification. It was concluded that not all sources of acoustic variability are treated equally by native and non-native listeners. Whereas noise compromised non-native fricative perception disproportionately, speaker variability did not pose a special challenge to the non-native listeners.
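Graded signal-to-noise conditions like the five levels above are typically constructed by scaling the masker to hit a target SNR before mixing. A minimal sketch, assuming digitized waveforms held as NumPy arrays and SNR defined over average power; the study's exact mixing procedure is not specified here:

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that 10*log10(P_speech / P_noise) equals `snr_db`,
    then add it to `speech`. Power is the mean squared sample value."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve snr_db = 10*log10(p_speech / (gain**2 * p_noise)) for gain.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

Sweeping `snr_db` over, say, -10 to +10 dB then yields a graded set of listening conditions of the kind used in such experiments.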

3.
This study explored how across-talker differences influence non-native vowel perception. American English (AE) and Korean listeners were presented with recordings of 10 AE vowels in /bVd/ context. The stimuli were mixed with noise and presented for identification in a 10-alternative forced-choice task. The two listener groups heard recordings of the vowels produced by 10 talkers at three signal-to-noise ratios. Overall the AE listeners identified the vowels 22% more accurately than the Korean listeners. There was a wide range of identification accuracy scores across talkers for both AE and Korean listeners. At each signal-to-noise ratio, the across-talker intelligibility scores were highly correlated for AE and Korean listeners. Acoustic analysis was conducted for two vowel pairs that exhibited variable accuracy across talkers for Korean listeners but high identification accuracy for AE listeners. Results demonstrated that Korean listeners' error patterns for these four vowels were strongly influenced by variability in vowel production that was within the normal range for AE talkers. These results suggest that non-native listeners are strongly influenced by across-talker variability, perhaps because of the difficulty they have forming native-like vowel categories.

4.
This paper formalizes and tests two key assumptions of the concept of suprasegmental timing: segmental independence and suprasegmental mediation. Segmental independence holds that the duration of a suprasegmental unit such as a syllable or foot is only minimally dependent on its segments. Suprasegmental mediation states that the duration of a segment is determined by the duration of its suprasegmental unit and its identity, but not directly by the specific prosodic context responsible for suprasegmental unit duration. Both assumptions are made by various versions of the isochrony hypothesis [I. Lehiste, J. Phonetics 5, 253-263 (1977)], and by the syllable timing hypothesis [W. Campbell, Speech Commun. 9, 57-62 (1990)]. The validity of these assumptions was studied using the syllable as suprasegmental unit in American English and Mandarin Chinese. To avoid unnatural timing patterns that might be induced when reading carrier phrase material, meaningful, nonrepetitive sentences were used with a wide range of lengths. Segmental independence was tested by measuring how the average duration of a syllable in a fixed prosodic context depends on its segmental composition. A strong association was found; in many cases the increase in average syllabic duration when one segment was substituted for another (e.g., bin versus pin) was the same as the difference in average duration between the two segments (i.e., [b] versus [p]). Thus, the [i] and [n] were not compressed to make room for the longer [p], which is inconsistent with segmental independence. Syllabic mediation was tested by measuring which locations in a syllable are most strongly affected by various contextual factors, including phrasal position, within-word position, tone, and lexical stress. Systematic differences were found between these factors in terms of the intrasyllabic locus of maximal effect. These and earlier results obtained by van Son and van Santen [R. J. J. H. van Son and J. P. H. van Santen, "Modeling the interaction between factors affecting consonant duration," Proceedings Eurospeech-97, 1997, pp. 319-322] showing a three-way interaction between consonantal identity (coronals vs labials), within-word position of the syllable, and stress of surrounding vowels, imply that segmental duration cannot be predicted by compressing or elongating segments to fit into a predetermined syllabic time interval. In conclusion, while there is little doubt that suprasegmental units play important predictive and explanatory roles as phonological units, the concept of suprasegmental timing is less promising.

5.
Recognition of three suprasegmental aspects of speech--the number of syllables in a word, the stress pattern of a word, and rising or falling intonation patterns--was compared through a single-channel tactile device and through a 24-channel tactile vocoder, using two groups of normal-hearing subjects. All subjects received an initial pretest on three recognition tasks, one for each prosodic feature. Half the subjects from each group then received 12 h of training with feedback on the tasks and stimuli used in the pretest. All subjects received a post-test which contained physically different stimuli from those previously tested. Performance was significantly better on the syllable-number and syllabic stress tasks with the single-channel than with the multichannel device on both the pre- and post-tests; no difference was found for the intonation task. Performance on the post-test was poorer for all trained subjects compared to their final training results, suggesting that cues learned in training were not readily transferable to new stimuli, even those with similar prosodic characteristics. Overall, the results provide support for the notion that certain prosodic features of speech may be conveyed more readily when the waveform envelope is preserved.

6.
This study explored the relationship between music and speech by examining absolute pitch and lexical tone perception. Taiwanese-speaking musicians were asked to identify musical tones without a reference pitch and multispeaker Taiwanese level tones without acoustic cues typically present for speaker normalization. The results showed that a high percentage of the participants (65% with an exact match required and 81% with one-semitone errors allowed) possessed absolute pitch, as measured by the musical tone identification task. A negative correlation was found between occurrence of absolute pitch and age of onset of musical training, suggesting that the acquisition of absolute pitch resembles the acquisition of speech. The participants were able to identify multispeaker Taiwanese level tones with above-chance accuracy, even though the acoustic cues typically present for speaker normalization were not available in the stimuli. No correlations were found between the performance in musical tone identification and the performance in Taiwanese tone identification. Potential reasons for the lack of association between the two tasks are discussed.
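The two scoring criteria above (an exact match required vs one-semitone errors allowed) amount to comparing responses and targets on a log-frequency scale, where one equal-tempered semitone is a factor of 2^(1/12). A small sketch; the A4 = 440 Hz reference and the frequency-pair response format are illustrative assumptions, not details from the study:

```python
import math

A4 = 440.0  # reference pitch, assumed for illustration

def semitone_index(freq):
    # Integer semitone number relative to A4 (A4 -> 0, A#4 -> 1, G#4 -> -1, ...)
    return round(12 * math.log2(freq / A4))

def score(pairs):
    """pairs: (response_freq_hz, target_freq_hz) tuples. Returns the fraction
    of exact-semitone matches and the fraction within one semitone."""
    diffs = [abs(semitone_index(r) - semitone_index(t)) for r, t in pairs]
    exact = sum(d == 0 for d in diffs) / len(diffs)
    within_one = sum(d <= 1 for d in diffs) / len(diffs)
    return exact, within_one
```

For example, responses of A4, A#4, and C5 to a target of A4 score 1/3 exact and 2/3 within one semitone.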

7.
Previous studies have demonstrated that motor control of segmental features of speech relies to some extent on sensory feedback. Control of voice fundamental frequency (F0) has been shown to be modulated by perturbations in voice pitch feedback during various phonatory tasks and in Mandarin speech. The present study was designed to determine if voice F0 is modulated in a task-dependent manner during production of suprasegmental features of English speech. English speakers received pitch-modulated voice feedback (+/-50, 100, and 200 cents, 200 ms duration) during a sustained vowel task and a speech task. Response magnitudes during speech (mean 31.5 cents) were larger than during the vowels (mean 21.6 cents), response magnitudes increased as a function of stimulus magnitude during speech but not vowels, and responses to downward pitch-shift stimuli were larger than those to upward stimuli. Response latencies were shorter in speech (mean 122 ms) compared to vowels (mean 154 ms). These findings support previous research suggesting the audio-vocal system is involved in the control of suprasegmental features of English speech by correcting for errors between voice pitch feedback and the desired F0.
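Pitch-shift perturbations like the +/-50, 100, and 200 cent stimuli above correspond to scaling the feedback frequency by 2^(cents/1200), since 1200 cents span an octave and 100 cents a semitone. A minimal illustration; the 200 Hz baseline F0 is an assumed value for the example, not a figure from the study:

```python
def cents_to_ratio(cents):
    """Frequency ratio for a pitch shift given in cents
    (1200 cents = 1 octave, 100 cents = 1 equal-tempered semitone)."""
    return 2 ** (cents / 1200)

f0 = 200.0  # assumed baseline voice F0, for illustration only
shifted = {c: round(f0 * cents_to_ratio(c), 2) for c in (-200, -100, -50, 50, 100, 200)}
```

A +200 cent shift of a 200 Hz voice thus moves the heard pitch up by a whole tone, to roughly 224.5 Hz.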

8.
The intelligibility of speech pronounced by non-native talkers is generally lower than speech pronounced by native talkers, especially under adverse conditions, such as high levels of background noise. The effect of foreign accent on speech intelligibility was investigated quantitatively through a series of experiments involving voices of 15 talkers, differing in language background, age of second-language (L2) acquisition and experience with the target language (Dutch). Overall speech intelligibility of L2 talkers in noise is predicted with a reasonable accuracy from accent ratings by native listeners, as well as from the self-ratings for proficiency of L2 talkers. For non-native speech, unlike native speech, the intelligibility of short messages (sentences) cannot be fully predicted by phoneme-based intelligibility tests. Although incorrect recognition of specific phonemes certainly occurs as a result of foreign accent, the effect of reduced phoneme recognition on the intelligibility of sentences may range from severe to virtually absent, depending on (for instance) the speech-to-noise ratio. Objective acoustic-phonetic analyses of accented speech were also carried out, but satisfactory overall predictions of speech intelligibility could not be obtained with relatively simple acoustic-phonetic measures.

9.
The interlanguage speech intelligibility benefit
This study investigated how native language background influences the intelligibility of speech by non-native talkers for non-native listeners from either the same or a different native language background as the talker. Native talkers of Chinese (n = 2), Korean (n = 2), and English (n = 1) were recorded reading simple English sentences. Native listeners of English (n = 21), Chinese (n = 21), Korean (n = 10), and a mixed group from various native language backgrounds (n = 12) then performed a sentence recognition task with the recordings from the five talkers. Results showed that for native English listeners, the native English talker was most intelligible. However, for non-native listeners, speech from a relatively high proficiency non-native talker from the same native language background was as intelligible as speech from a native talker, giving rise to the "matched interlanguage speech intelligibility benefit." Furthermore, this interlanguage intelligibility benefit extended to the situation where the non-native talker and listeners came from different language backgrounds, giving rise to the "mismatched interlanguage speech intelligibility benefit." These findings shed light on the nature of the talker-listener interaction during speech communication.

10.
The present study investigated the extent to which native English listeners' perception of Japanese length contrasts can be modified with perceptual training, and how their performance is affected by factors that influence segment duration, which is a primary correlate of Japanese length contrasts. Listeners were trained in a minimal-pair identification paradigm with feedback, using isolated words contrasting in vowel length, produced at a normal speaking rate. Experiment 1 tested listeners using stimuli varying in speaking rate, presentation context (in isolation versus embedded in carrier sentences), and type of length contrast. Experiment 2 examined whether performance varied by the position of the contrast within the word, and by whether the test talkers were professionally trained or not. Results did not show that trained listeners improved overall performance to a greater extent than untrained control participants. Training improved perception of trained contrast types, generalized to nonprofessional talkers' productions, and improved performance in difficult within-word positions. However, training did not enable listeners to cope with speaking rate variation, and did not generalize to untrained contrast types. These results suggest that perceptual training improves non-native listeners' perception of Japanese length contrasts only to a limited extent.

11.
Three experiments were conducted to study the effect of segmental and suprasegmental corrections on the intelligibility and judged quality of deaf speech. By means of digital signal processing techniques, including LPC analysis, transformations of separate speech sounds, temporal structure, and intonation were carried out on 30 Dutch sentences spoken by ten deaf children. The transformed sentences were tested for intelligibility and acceptability by presenting them to inexperienced listeners. In experiment 1, LPC based reflection coefficients describing segmental characteristics of deaf speakers were replaced by those of hearing speakers. A complete segmental correction caused a dramatic increase in intelligibility from 24% to 72%, which, for a major part, was due to correction of vowels. Experiment 2 revealed that correction of temporal structure and intonation caused only a small improvement from 24% to about 34%. Combination of segmental and suprasegmental corrections yielded almost perfectly understandable sentences, due to a more than additive effect of the two corrections. Quality judgments, collected in experiment 3, were in close agreement with the intelligibility measures. The results show that, in order for these speakers to become more intelligible, improving their articulation is more important than improving their production of temporal structure and intonation.
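The LPC manipulation described above rests on representing each analysis frame by reflection (PARCOR) coefficients, which the Levinson-Durbin recursion derives from the frame's autocorrelation sequence. A textbook sketch of that recursion, assuming a NumPy array as the frame; the analysis order, framing, and windowing of the actual study are not reproduced here:

```python
import numpy as np

def lpc_reflection(frame, order):
    """Reflection (PARCOR) coefficients of `frame` via the Levinson-Durbin
    recursion on its autocorrelation (the autocorrelation method of LPC)."""
    n = len(frame)
    # Autocorrelation at lags 0..order (lag 0 sits at index n-1 of the full output).
    r = np.correlate(frame, frame, mode="full")[n - 1:n + order]
    a = np.zeros(order + 1)   # prediction polynomial, a[0] = 1
    a[0] = 1.0
    e = r[0]                  # prediction error energy
    k = np.zeros(order)       # reflection coefficients
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        ki = -acc / e
        k[i - 1] = ki
        # Order update: a[j] += ki * a[i-j] for j = 1..i-1, and a[i] = ki.
        a[1:i + 1] = a[1:i + 1] + ki * np.concatenate((a[i - 1:0:-1], [1.0]))
        e *= 1.0 - ki * ki
    return k
```

Because the reflection coefficients are bounded by 1 in magnitude, they can be swapped between speakers (as in experiment 1) while keeping the resynthesis filter stable.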

12.
Humans were trained to categorize problem non-native phonemes using an animal psychoacoustic procedure that trains monkeys to greater than 90% correct in phoneme identification [Sinnott and Gilmore, Percept. Psychophys. 66, 1341-1350 (2004)]. This procedure uses a manual left versus right response on a lever, a continuously repeated stimulus on each trial, extensive feedback for errors in the form of a repeated correction procedure, and training until asymptotic levels of performance. Here, Japanese listeners categorized the English liquid contrast /r-l/, and English listeners categorized the Middle Eastern dental-retroflex contrast /d-D/. Consonant-vowel stimuli were constructed using four talkers and four vowels. Native listeners and phoneme contrasts familiar to all listeners were included as controls. Responses were analyzed using percent correct, response time, and vowel context effects as measures. All measures indicated nativelike Japanese perception of /r-l/ after 32 daily training sessions, but this was not the case for English perception of /d-D/. Results are related to the concept of "robust" (more easily recovered) versus "fragile" (more easily lost) phonetic contrasts [Burnham, Appl. Psycholing. 7, 207-240 (1986)].

13.
Adult non-native speech perception is subject to influence from multiple factors, including linguistic and extralinguistic experience such as musical training. The present research examines how linguistic and musical factors influence non-native word identification and lexical tone perception. Groups of native tone language (Thai) and non-tone language listeners (English), each subdivided into musician and non-musician groups, engaged in Cantonese tone word training. Participants learned to identify words minimally distinguished by five Cantonese tones during training, also completing musical aptitude and phonemic tone identification tasks. First, the findings suggest that either musical experience or a tone language background leads to significantly better non-native word learning proficiency, as compared to those with neither musical training nor tone language experience. Moreover, the combination of tone language and musical experience did not provide an additional advantage for Thai musicians above and beyond either experience alone. Musicianship was found to be more advantageous than a tone language background for tone identification. Finally, tone identification and musical aptitude scores were significantly correlated with word learning success for English but not Thai listeners. These findings point to a dynamic influence of musical and linguistic experience, both at the tone identification level and at the word learning stage.

14.
Mandarin perceivers were tested in visual lexical-tone identification before and after learning. Baseline performance was only slightly above chance, although there appeared to be some visual information in the speakers' neck and head movements. When participants were taught to use this visible information in two experiments, visual tone identification improved significantly. There appears to be a relationship between the production of lexical tones and the visible movements of the neck, head, and mouth, and this information can be effectively used after a short training session.

15.
In a follow-up study to that of Bent and Bradlow (2003), carrier sentences containing familiar keywords were read aloud by five talkers (Korean high proficiency; Korean low proficiency; Saudi Arabian high proficiency; Saudi Arabian low proficiency; native English). The intelligibility of these keywords to 50 listeners in four first language groups (Korean, n = 10; Saudi Arabian, n = 10; native English, n = 10; other mixed first languages, n = 20) was measured in a word recognition test. In each case, the non-native listeners found the non-native low-proficiency talkers who did not share the same first language as the listeners the least intelligible, at statistically significant levels, while not finding the low-proficiency talker who shared their own first language similarly unintelligible. These findings indicate a mismatched interlanguage speech intelligibility detriment for low-proficiency non-native speakers and a potential intelligibility problem between mismatched first language low-proficiency speakers unfamiliar with each other's accents in English. There was no strong evidence to support either an intelligibility benefit for the high-proficiency non-native talkers to the listeners from a different first language background or to indicate that the native talkers were more intelligible than the high-proficiency non-native talkers to any of the listeners.

16.
Previous studies have shown improved sensitivity to native-language contrasts and reduced sensitivity to non-native phonetic contrasts when comparing 6-8 and 10-12-month-old infants. This developmental pattern is interpreted as reflecting the onset of language-specific processing around the first birthday. However, generalization of this finding is limited by the fact that studies have yielded inconsistent results and that insufficient numbers of phonetic contrasts have been tested developmentally; this is especially true for native-language phonetic contrasts. Three experiments assessed the effects of language experience on affricate-fricative contrasts in a cross-language study of English and Mandarin adults and infants. Experiment 1 showed that English-speaking adults score lower than Mandarin-speaking adults on Mandarin alveolo-palatal affricate-fricative discrimination. Experiment 2 examined developmental change in the discrimination of this contrast in English- and Mandarin-learning infants between 6 and 12 months of age. The results demonstrated that native-language performance significantly improved with age while performance on the non-native contrast decreased. Experiment 3 replicated the perceptual improvement for a native contrast: 6-8 and 10-12-month-old English-learning infants showed a performance increase at the older age. The results add to our knowledge of the developmental patterns of native and non-native phonetic perception.

17.
Training Japanese listeners to identify English /r/ and /l/: a first report
Native speakers of Japanese learning English generally have difficulty differentiating the phonemes /r/ and /l/, even after years of experience with English. Previous research that attempted to train Japanese listeners to distinguish this contrast using synthetic stimuli reported little success, especially when transfer to natural tokens containing /r/ and /l/ was tested. In the present study, a different training procedure that emphasized variability among stimulus tokens was used. Japanese subjects were trained in a minimal pair identification paradigm using multiple natural exemplars contrasting /r/ and /l/ from a variety of phonetic environments as stimuli. A pretest-posttest design containing natural tokens was used to assess the effects of training. Results from six subjects showed that the new procedure was more robust than earlier training techniques. Small but reliable differences in performance were obtained between pretest and posttest scores. The results demonstrate the importance of stimulus variability and task-related factors in training nonnative speakers to perceive novel phonetic contrasts that are not distinctive in their native language.

18.
Perception of second language speech sounds is influenced by one's first language. For example, speakers of American English have difficulty perceiving dental versus retroflex stop consonants in Hindi although English has both dental and retroflex allophones of alveolar stops. Japanese, unlike English, has a contrast similar to Hindi, specifically, the Japanese /d/ versus the flapped /r/ which is sometimes produced as a retroflex. This study compared American and Japanese speakers' identification of the Hindi contrast in CV syllable contexts where C varied in voicing and aspiration. The study then evaluated the participants' increase in identifying the distinction after training with a computer-interactive program. Training sessions progressively increased in difficulty by decreasing the extent of vowel truncation in stimuli and by adding new speakers. Although all participants improved significantly, Japanese participants were more accurate than Americans in distinguishing the contrast on pretest, during training, and on posttest. Transfer was observed to three new consonantal contexts, a new vowel context, and a new speaker's productions. Some abstract aspect of the contrast was apparently learned during training. It is suggested that allophonic experience with dental and retroflex stops may be detrimental to perception of the new contrast.

19.
Vowel identification was tested in quiet, noise, and reverberation with 20 normal-hearing subjects and 20 hearing-impaired subjects. Stimuli were 15 English vowels spoken in a /b-t/ context by six male talkers. Each talker produced five tokens of each vowel. In quiet, all stimuli were identified by two judges as the intended targets. The stimuli were degraded by reverberation or speech-spectrum noise. Vowel identification scores depended upon talker, listening condition, and subject type. The relationship between identification errors and spectral details of the vowels is discussed.

20.
Understanding speech in background noise, talker identification, and vocal emotion recognition are challenging for cochlear implant (CI) users due to poor spectral resolution and limited pitch cues with the CI. Recent studies have shown that bimodal CI users, that is, those CI users who wear a hearing aid (HA) in their non-implanted ear, receive benefit for understanding speech both in quiet and in noise. This study compared the efficacy of talker-identification training in two groups of young normal-hearing adults, listening to either acoustic simulations of unilateral CI or bimodal (CI+HA) hearing. Training resulted in improved identification of talkers for both groups with better overall performance for simulated bimodal hearing. Generalization of learning to sentence and emotion recognition also was assessed in both subject groups. Sentence recognition in quiet and in noise improved for both groups, no matter if the talkers had been heard during training or not. Generalization to improvements in emotion recognition for two unfamiliar talkers also was noted for both groups with the simulated bimodal-hearing group showing better overall emotion-recognition performance. Improvements in sentence recognition were retained a month after training in both groups. These results have potential implications for aural rehabilitation of conventional and bimodal CI users.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号