期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Adapting to suprasegmental lexical stress errors in foreign-accented speech

E Reinisch A Weber 《The Journal of the Acoustical Society of America》2012,132(2):1165-1176

Can native listeners rapidly adapt to suprasegmental mispronunciations in foreign-accented speech? To address this question, an exposure-test paradigm was used to test whether Dutch listeners can improve their understanding of non-canonical lexical stress in Hungarian-accented Dutch. During exposure, one group of listeners heard a Dutch story with only initially stressed words, whereas another group also heard 28 words with canonical second-syllable stress (e.g., EEKhorn, "squirrel" was replaced by koNIJN "rabbit"; capitals indicate stress). The 28 words, however, were non-canonically marked by the Hungarian speaker with high pitch and amplitude on the initial syllable, both of which are stress cues in Dutch. After exposure, listeners' eye movements were tracked to Dutch target-competitor pairs with segmental overlap but different stress patterns, while they listened to new words from the same Hungarian speaker (e.g., HERsens, herSTEL, "brain," "recovery"). Listeners who had previously heard non-canonically produced words distinguished target-competitor pairs better than listeners who had only been exposed to Hungarian accent with canonical forms of lexical stress. Even a short exposure thus allows listeners to tune into speaker-specific realizations of words' suprasegmental make-up, and use this information for word recognition. 相似文献

2.

Perception of intrusive /r/ in English by native, cross-language and cross-dialect listeners

Tuinman A Mitterer H Cutler A 《The Journal of the Acoustical Society of America》2011,130(3):1643-1652

In sequences such as law and order, speakers of British English often insert /r/ between law and and. Acoustic analyses revealed such "intrusive" /r/ to be significantly shorter than canonical /r/. In a 2AFC experiment, native listeners heard British English sentences in which /r/ duration was manipulated across a word boundary [e.g., saw (r)ice], and orthographic and semantic factors were varied. These listeners responded categorically on the basis of acoustic evidence for /r/ alone, reporting ice after short /r/s, rice after long /r/s; orthographic and semantic factors had no effect. Dutch listeners proficient in English who heard the same materials relied less on durational cues than the native listeners, and were affected by both orthography and semantic bias. American English listeners produced intermediate responses to the same materials, being sensitive to duration (less so than native, more so than Dutch listeners), and to orthography (less so than the Dutch), but insensitive to the semantic manipulation. Listeners from language communities without common use of intrusive /r/ may thus interpret intrusive /r/ as canonical /r/, with a language difference increasing this propensity more than a dialect difference. Native listeners, however, efficiently distinguish intrusive from canonical /r/ by exploiting the relevant acoustic variation. 相似文献

3.

Lexical frequency and acoustic reduction in spoken Dutch

Pluymaekers M Ernestus M Baayen RH 《The Journal of the Acoustical Society of America》2005,118(4):2561-2569

This study investigates the effects of lexical frequency on the durational reduction of morphologically complex words in spoken Dutch. The hypothesis that high-frequency words are more reduced than low-frequency words was tested by comparing the durations of affixes occurring in different carrier words. Four Dutch affixes were investigated, each occurring in a large number of words with different frequencies. The materials came from a large database of face-to-face conversations. For each word containing a target affix, one token was randomly selected for acoustic analysis. Measurements were made of the duration of the affix as a whole and the durations of the individual segments in the affix. For three of the four affixes, a higher frequency of the carrier word led to shorter realizations of the affix as a whole, individual segments in the affix, or both. Other relevant factors were the sex and age of the speaker, segmental context, and speech rate. To accommodate for these findings, models of speech production should allow word frequency to affect the acoustic realizations of lower-level units, such as individual speech sounds occurring in affixes. 相似文献

4.

Processing unattended speech

Rivenez M Darwin CJ Guillaume A 《The Journal of the Acoustical Society of America》2006,119(6):4027-4040

Three experiments examine the effect of a difference in fundamental frequency (F0) range between two simultaneous voices on the processing of unattended speech. Previous experiments have only found evidence for the processing of nominally unattended speech when it has consisted of isolated words which could have attracted the listener's attention. A paradigm recently used by Dupoux et al. [J. Exp. Psychol.: Human Percept. Perform. 29(1), 172-184 (2003)] was modified so that participants had to detect a target word belonging to a specific category presented in a rapid list of words in the attended ear. In the unattended ear, concatenated sentences were presented, some containing a repetition prime presented just before a target word. Primes speeded category detection by 25 ms when the two messages were in a difference F0 range. This priming effect was unaffected by whether the target was led to the left or the right ear, but disappeared when there was no F0 range difference between the messages. Finally, it was replicated when participants were compelled to focus on the attended message in order to perform a second task. The results demonstrate that repetition priming can be produced by words in unattended continuous speech provided that there is a difference in F0 range between the voices. 相似文献

5.

The relative roles of vowels and consonants in discriminating talker identity versus word meaning

Owren MJ Cardillo GC 《The Journal of the Acoustical Society of America》2006,119(3):1727-1739

Three experiments tested the hypothesis that vowels play a disproportionate role in hearing talker identity, while consonants are more important in perceiving word meaning. In each study, listeners heard 128 stimuli consisting of two different words. Stimuli were balanced for same/different meaning, same/different talker, and male/female talker. The first word in each was intact, while the second was either intact (Experiment 1), or had vowels ("Consonants-Only") or consonants wels-Only") replaced by silence (Experiments 2, 3). Different listeners performed a same/ different judgment of either talker identity (Talker) or word meaning (Meaning). Baseline testing in Experiment 1 showed above-chance performance in both, with greater accuracy for Meaning. In Experiment 2, Talker identity was more accurately judged from Vowels-Only stimuli, with modestly better overall Meaning performance with Consonants-Only stimuli. However, performance with vowel-initial Vowels-Only stimuli in particular was most accurate of all. Editing Vowels-Only stimuli further in Experiment 3 had no effect on Talker discrimination, while dramatically reducing accuracy in the Meaning condition, including both vowel-initial and consonant-initial Vowels-Only stimuli. Overall, results confirmed a priori predictions, but are largely inconsistent with recent tests of vowels and consonants in sentence comprehension. These discrepancies and possible implications for the evolutionary origins of speech are discussed. 相似文献

6.

Lexical frequency and voice assimilation

Ernestus M Lahey M Verhees F Baayen RH 《The Journal of the Acoustical Society of America》2006,120(2):1040-1051

Acoustic duration and degree of vowel reduction are known to correlate with a word's frequency of occurrence. The present study broadens the research on the role of frequency in speech production to voice assimilation. The test case was regressive voice assimilation in Dutch. Clusters from a corpus of read speech were more often perceived as unassimilated in lower-frequency words and as either completely voiced (regressive assimilation) or, unexpectedly, as completely voiceless (progressive assimilation) in higher-frequency words. Frequency did not predict the voice classifications over and above important acoustic cues to voicing, suggesting that the frequency effects on the classifications were carried exclusively by the acoustic signal. The duration of the cluster and the period of glottal vibration during the cluster decreased while the duration of the release noises increased with frequency. This indicates that speakers reduce articulatory effort for higher-frequency words, with some acoustic cues signaling more voicing and others less voicing. A higher frequency leads not only to acoustic reduction but also to more assimilation. 相似文献

7.

The role of vowel and consonant fundamental frequency, envelope, and temporal fine structure cues to the intelligibility of words and sentences

Fogerty D Humes LE 《The Journal of the Acoustical Society of America》2012,131(2):1490-1501

The speech signal contains many acoustic properties that may contribute differently to spoken word recognition. Previous studies have demonstrated that the importance of properties present during consonants or vowels is dependent upon the linguistic context (i.e., words versus sentences). The current study investigated three potentially informative acoustic properties that are present during consonants and vowels for monosyllabic words and sentences. Natural variations in fundamental frequency were either flattened or removed. The speech envelope and temporal fine structure were also investigated by limiting the availability of these cues via noisy signal extraction. Thus, this study investigated the contribution of these acoustic properties, present during either consonants or vowels, to overall word and sentence intelligibility. Results demonstrated that all processing conditions displayed better performance for vowel-only sentences. Greater performance with vowel-only sentences remained, despite removing dynamic cues of the fundamental frequency. Word and sentence comparisons suggest that the speech envelope may be at least partially responsible for additional vowel contributions in sentences. Results suggest that speech information transmitted by the envelope is responsible, in part, for greater vowel contributions in sentences, but is not predictive for isolated words. 相似文献

8.

Dutch and English listeners' interpretation of vowel duration

van der Feest SV Swingley D 《The Journal of the Acoustical Society of America》2011,129(3):EL57-EL63

相似文献

9.

Perceiving unstressed vowels in foreign-accented English

Braun B Lemhöfer K Mani N 《The Journal of the Acoustical Society of America》2011,129(1):376-387

This paper investigated how foreign-accented stress cues affect on-line speech comprehension in British speakers of English. While unstressed English vowels are usually reduced to /?/, Dutch speakers of English only slightly centralize them. Speakers of both languages differentiate stress by suprasegmentals (duration and intensity). In a cross-modal priming experiment, English listeners heard sentences ending in monosyllabic prime fragments--produced by either an English or a Dutch speaker of English--and performed lexical decisions on visual targets. Primes were either stress-matching ("ab" excised from absurd), stress-mismatching ("ab" from absence), or unrelated ("pro" from profound) with respect to the target (e.g., ABSURD). Results showed a priming effect for stress-matching primes only when produced by the English speaker, suggesting that vowel quality is a more important cue to word stress than suprasegmental information. Furthermore, for visual targets with word-initial secondary stress that do not require vowel reduction (e.g., CAMPAIGN), resembling the Dutch way of realizing stress, there was a priming effect for both speakers. Hence, our data suggest that Dutch-accented English is not harder to understand in general, but it is in instances where the language-specific implementation of lexical stress differs across languages. 相似文献

10.

Quantifying the intelligibility of speech in noise for non-native talkers

van Wijngaarden SJ Steeneken HJ Houtgast T 《The Journal of the Acoustical Society of America》2002,112(6):3004-3013

The intelligibility of speech pronounced by non-native talkers is generally lower than speech pronounced by native talkers, especially under adverse conditions, such as high levels of background noise. The effect of foreign accent on speech intelligibility was investigated quantitatively through a series of experiments involving voices of 15 talkers, differing in language background, age of second-language (L2) acquisition and experience with the target language (Dutch). Overall speech intelligibility of L2 talkers in noise is predicted with a reasonable accuracy from accent ratings by native listeners, as well as from the self-ratings for proficiency of L2 talkers. For non-native speech, unlike native speech, the intelligibility of short messages (sentences) cannot be fully predicted by phoneme-based intelligibility tests. Although incorrect recognition of specific phonemes certainly occurs as a result of foreign accent, the effect of reduced phoneme recognition on the intelligibility of sentences may range from severe to virtually absent, depending on (for instance) the speech-to-noise ratio. Objective acoustic-phonetic analyses of accented speech were also carried out, but satisfactory overall predictions of speech intelligibility could not be obtained with relatively simple acoustic-phonetic measures. 相似文献

11.

Longitudinal changes in speech recognition in older persons

Dubno JR Lee FS Matthews LJ Ahlstrom JB Horwitz AR Mills JH 《The Journal of the Acoustical Society of America》2008,123(1):462-475

Recognition of isolated monosyllabic words in quiet and recognition of key words in low- and high-context sentences in babble were measured in a large sample of older persons enrolled in a longitudinal study of age-related hearing loss. Repeated measures were obtained yearly or every 2 to 3 years. To control for concurrent changes in pure-tone thresholds and speech levels, speech-recognition scores were adjusted using an importance-weighted speech-audibility metric (AI). Linear-regression slope estimated the rate of change in adjusted speech-recognition scores. Recognition of words in quiet declined significantly faster with age than predicted by declines in speech audibility. As subjects aged, observed scores deviated increasingly from AI-predicted scores, but this effect did not accelerate with age. Rate of decline in word recognition was significantly faster for females than males and for females with high serum progesterone levels, whereas noise history had no effect. Rate of decline did not accelerate with age but increased with degree of hearing loss, suggesting that with more severe injury to the auditory system, impairments to auditory function other than reduced audibility resulted in faster declines in word recognition as subjects aged. Recognition of key words in low- and high-context sentences in babble did not decline significantly with age. 相似文献

12.

Effects of cross-language voice training on speech perception: whose familiar voices are more intelligible?

Levi SV Winters SJ Pisoni DB 《The Journal of the Acoustical Society of America》2011,130(6):4053-4062

Previous research has shown that familiarity with a talker's voice can improve linguistic processing (herein, "Familiar Talker Advantage"), but this benefit is constrained by the context in which the talker's voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers' voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correct for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers' voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers. 相似文献

13.

Effectiveness of spatial cues, prosody, and talker characteristics in selective attention

Darwin CJ Hukin RW 《The Journal of the Acoustical Society of America》2000,107(2):970-977

The three experiments reported here compare the effectiveness of natural prosodic and vocal-tract size cues at overcoming spatial cues in selective attention. Listeners heard two simultaneous sentences and decided which of two simultaneous target words came from the attended sentence. Experiment 1 used sentences that had natural differences in pitch and in level caused by a change in the location of the main sentence stress. The sentences' pitch contours were moved apart or together in order to separate out effects due to pitch and those due to other prosodic factors such as intensity. Both pitch and the other prosodic factors had an influence on which target word was reported, but the effects were not strong enough to override the spatial difference produced by an interaural time difference of +/- 91 microseconds. In experiment 2, a large (+/- 15%) difference in apparent vocal-tract size between the speakers of the two sentences had an additional and strong effect, which, in conjunction with the original prosodic differences overrode an interaural time difference of +/- 181 microseconds. Experiment 3 showed that vocal-tract size differences of +/- 4% or less had no detectable effect. Overall, the results show that prosodic and vocal-tract size cues can override spatial cues in determining which target word belongs in an attended sentence. 相似文献

14.

Mathematical treatment of context effects in phoneme and word recognition 总被引：2，自引：0，他引：2

A Boothroyd S Nittrouer 《The Journal of the Acoustical Society of America》1988,84(1):101-114

Percent recognition of phonemes and whole syllables, measured in both consonant-vowel-consonant (CVC) words and CVC nonsense syllables, is reported for normal young adults listening at four signal-to-noise (S/N) ratios. Similar data are reported for the recognition of words and whole sentences in three types of sentence: high predictability (HP) sentences, with both semantic and syntactic constraints; low predictability (LP) sentences, with primarily syntactic constraints; and zero predictability (ZP) sentences, with neither semantic nor syntactic constraints. The probability of recognition of speech units in context (pc) is shown to be related to the probability of recognition without context (pi) by the equation pc = 1 - (1-pi)k, where k is a constant. The factor k is interpreted as the amount by which the channels of statistically independent information are effectively multiplied when contextual constraints are added. Empirical values of k are approximately 1.3 and 2.7 for word and sentence context, respectively. In a second analysis, the probability of recognition of wholes (pw) is shown to be related to the probability of recognition of the constituent parts (pp) by the equation pw = pjp, where j represents the effective number of statistically independent parts within a whole. The empirically determined mean values of j for nonsense materials are not significantly different from the number of parts in a whole, as predicted by the underlying theory. In CVC words, the value of j is constant at approximately 2.5. In the four-word HP sentences, it falls from approximately 2.5 to approximately 1.6 as the inherent recognition probability for words falls from 100% to 0%, demonstrating an increasing tendency to perceive HP sentences either as wholes, or not at all, as S/N ratio deteriorates. 相似文献

15.

Vongpaisal T Trehub SE Glenn Schellenberg E van Lieshout P 《The Journal of the Acoustical Society of America》2012,131(1):501-508

Temporal information provided by cochlear implants enables successful speech perception in quiet, but limited spectral information precludes comparable success in voice perception. Talker identification and speech decoding by young hearing children (5-7 yr), older hearing children (10-12 yr), and hearing adults were examined by means of vocoder simulations of cochlear implant processing. In Experiment 1, listeners heard vocoder simulations of sentences from a man, woman, and girl and were required to identify the talker from a closed set. Younger children identified talkers more poorly than older listeners, but all age groups showed similar benefit from increased spectral information. In Experiment 2, children and adults provided verbatim repetition of vocoded sentences from the same talkers. The youngest children had more difficulty than older listeners, but all age groups showed comparable benefit from increasing spectral resolution. At comparable levels of spectral degradation, performance on the open-set task of speech decoding was considerably more accurate than on the closed-set task of talker identification. Hearing children's ability to identify talkers and decode speech from spectrally degraded material sheds light on the difficulty of these domains for child implant users. 相似文献

16.

Effect of age, presentation method, and learning on identification of noise-vocoded words

Sheldon S Pichora-Fuller MK Schneider BA 《The Journal of the Acoustical Society of America》2008,123(1):476-488

Noise vocoding was used to investigate the ability of younger and older adults with normal audiometric thresholds in the speech range to use amplitude envelope cues to identify words. In Experiment 1, four 50-word lists were tested, with each word presented initially with one frequency band and the number of bands being incremented until it was correctly identified by the listener. Both age groups required an average of 5.25 bands for 50% correct word identification and performance improved across the four lists. In Experiment 2, the same participants who completed Experiment 1 identified words in four blocked noise-vocoded conditions (16, 8, 4, 2 bands). Compared to Experiment 1, both age groups required more bands to reach the 50% correct word identification threshold in Experiment 2, 6.13, and 8.55 bands, respectively, with younger adults outperforming older adults. Experiment 3 was identical to Experiment 2 except the participants had no prior experience with noise-vocoded speech. Again, younger adults outperformed older adults, with thresholds of 6.67 and 8.97 bands, respectively. The finding of age effects in Experiments 2 and 3, but not in Experiment 1, seems more likely to be related to differences in the presentation methods than to experience with noise vocoding. 相似文献

17.

The acoustic and perceptual effects of two noise-suppression algorithms

Zakis JA Wise C 《The Journal of the Acoustical Society of America》2007,121(1):433-441

Internal noise generated by hearing-aid circuits can be audible and objectionable to aid users, and may lead to the rejection of hearing aids. Two expansion algorithms were developed to suppress internal noise below a threshold level. The multiple-channel algorithm's expansion thresholds followed the 55-dB SPL long-term average speech spectrum, while the single-channel algorithm suppressed sounds below 45 dBA. With the recommended settings in static conditions, the single-channel algorithm provided lower noise levels, which were perceived as quieter by most normal-hearing participants. However, in dynamic conditions "pumping" noises were more noticeable with the single-channel algorithm. For impaired-hearing listeners fitted with the ADRO amplification strategy, both algorithms maintained speech understanding for words in sentences presented at 55 dB SPL in quiet (99.3% correct). Mean sentence reception thresholds in quiet were 39.4, 40.7, and 41.8 dB SPL without noise suppression, and with the single- and multiple-channel algorithms, respectively. The increase in the sentence reception threshold was statistically significant for the multiple-channel algorithm, but not the single-channel algorithm. Thus, both algorithms suppressed noise without affecting the intelligibility of speech presented at 55 dB SPL, with the single-channel algorithm providing marginally greater noise suppression in static conditions, and the multiple-channel algorithm avoiding pumping noises. 相似文献

18.

Speaker-independent factors affecting the perception of foreign accent in a second language

Levi SV Winters SJ Pisoni DB 《The Journal of the Acoustical Society of America》2007,121(4):2327-2338

Previous research on foreign accent perception has largely focused on speaker-dependent factors such as age of learning and length of residence. Factors that are independent of a speaker's language learning history have also been shown to affect perception of second language speech. The present study examined the effects of two such factors--listening context and lexical frequency--on the perception of foreign-accented speech. Listeners rated foreign accent in two listening contexts: auditory-only, where listeners only heard the target stimuli, and auditory + orthography, where listeners were presented with both an auditory signal and an orthographic display of the target word. Results revealed that higher frequency words were consistently rated as less accented than lower frequency words. The effect of the listening context emerged in two interactions: the auditory + orthography context reduced the effects of lexical frequency, but increased the perceived differences between native and non-native speakers. Acoustic measurements revealed some production differences for words of different levels of lexical frequency, though these differences could not account for all of the observed interactions from the perceptual experiment. These results suggest that factors independent of the speakers' actual speech articulations can influence the perception of degree of foreign accent. 相似文献

19.

Acoustic and linguistic factors in the perception of bandpass-filtered speech

Stickney GS Assmann PF 《The Journal of the Acoustical Society of America》2001,109(3):1157-1165

Speech can remain intelligible for listeners with normal hearing when processed by narrow bandpass filters that transmit only a small fraction of the audible spectrum. Two experiments investigated the basis for the high intelligibility of narrowband speech. Experiment 1 confirmed reports that everyday English sentences can be recognized accurately (82%-98% words correct) when filtered at center frequencies of 1500, 2100, and 3000 Hz. However, narrowband low predictability (LP) sentences were less accurately recognized than high predictability (HP) sentences (20% lower scores), and excised narrowband words were even less intelligible than LP sentences (a further 23% drop). While experiment 1 revealed similar levels of performance for narrowband and broadband sentences at conversational speech levels, experiment 2 showed that speech reception thresholds were substantially (>30 dB) poorer for narrowband sentences. One explanation for this increased disparity between narrowband and broadband speech at threshold (compared to conversational speech levels) is that spectral components in the sloping transition bands of the filters provide important cues for the recognition of narrowband speech, but these components become inaudible as the signal level is reduced. Experiment 2 also showed that performance was degraded by the introduction of a speech masker (a single competing talker). The elevation in threshold was similar for narrowband and broadband speech (11 dB, on average), but because the narrowband sentences required considerably higher sound levels to reach their thresholds in quiet compared to broadband sentences, their target-to-masker ratios were very different (+23 dB for narrowband sentences and -12 dB for broadband sentences). As in experiment 1, performance was better for HP than LP sentences. The LP-HP difference was larger for narrowband than broadband sentences, suggesting that context provides greater benefits when speech is distorted by narrow bandpass filtering. 相似文献

20.

Effects of age and hearing loss on the recognition of interrupted words in isolation and in sentences

Kidd GR Humes LE 《The Journal of the Acoustical Society of America》2012,131(2):1434-1448

The ability to recognize spoken words interrupted by silence was investigated with young normal-hearing listeners and older listeners with and without hearing impairment. Target words from the revised SPIN test by Bilger et al. [J. Speech Hear. Res. 27(1), 32-48 (1984)] were presented in isolation and in the original sentence context using a range of interruption patterns in which portions of speech were replaced with silence. The number of auditory "glimpses" of speech and the glimpse proportion (total duration glimpsed/word duration) were varied using a subset of the SPIN target words that ranged in duration from 300 to 600 ms. The words were presented in isolation, in the context of low-predictability (LP) sentences, and in high-predictability (HP) sentences. The glimpse proportion was found to have a strong influence on word recognition, with relatively little influence of the number of glimpses, glimpse duration, or glimpse rate. Although older listeners tended to recognize fewer interrupted words, there was considerable overlap in recognition scores across listener groups in all conditions, and all groups were affected by interruption parameters and context in much the same way. 相似文献