Similar Documents
20 similar documents found (search time: 23 ms)
1.
Cross-language research using synthetic voice-onset-time series over the past 40 years suggests that the Spanish lead contrast is acoustically less salient than the English lag contrast. This study examined monkey identification of a labial consonant-vowel voice-onset-time (CV VOT) series (-60 to +70 ms) in order to obtain a linguistically unbiased estimate of lead versus lag salience. Comparisons were made with both English and Spanish adult human listeners. In a classic two-choice identification test, monkey and Spanish functions were quite variable and showed evidence of sensitivity to three types of voicing cues (lead versus simultaneous versus lag). In contrast, English functions were highly categorical and showed sensitivity to only two types of cues (combined lead/simultaneous versus lag). Next, listeners were explicitly trained via feedback to differentiate stimuli crossing lead and lag boundaries. Here, monkey and Spanish performance was initially more symmetrical than English performance, with the latter showing reduced sensitivity to the lead boundary, but group differences disappeared after extended training. These results provide evidence for perceptual loss in English listeners for aspects of Spanish voicing lead perception.

2.
For stimuli modeling stop consonants varying in the acoustic correlates of voice onset time (VOT), human listeners are more likely to perceive stimuli with lower f0's as voiced consonants--a pattern of perception that follows regularities in English speech production. The present study examines the basis of this observation. One hypothesis is that lower f0's enhance perception of voiced stops by virtue of perceptual interactions that arise from the operating characteristics of the auditory system. A second hypothesis is that this perceptual pattern develops as a result of experience with f0-voicing covariation. In a test of these hypotheses, Japanese quail learned to respond to stimuli drawn from a series varying in VOT through training with one of three patterns of f0-voicing covariation. Voicing and f0 varied in the natural pattern (shorter VOT, lower f0), in an inverse pattern (shorter VOT, higher f0), or in a random pattern (no f0-voicing covariation). Birds trained with stimuli that had no f0-voicing covariation exhibited no effect of f0 on response to novel stimuli varying in VOT. For the other groups, birds' responses followed the experienced pattern of covariation. These results suggest f0 does not exert an obligatory influence on categorization of consonants as [VOICE] and emphasize the learnability of covariation among acoustic characteristics of speech.
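The three training regimes described above (natural, inverse, and random f0-VOT covariation) can be sketched as a stimulus generator. The specific VOT range, f0 values, and short/long boundary below are illustrative assumptions, and `make_stimuli` is a hypothetical helper, not part of the study:

```python
import random

def make_stimuli(pattern, n=60, seed=0):
    """Generate (VOT_ms, f0_Hz) training pairs under one covariation pattern.

    Assumed illustrative values: VOTs span -20 to +60 ms in 10-ms steps,
    f0 is 'low' (100 Hz) or 'high' (140 Hz), and VOTs of 20 ms or less
    count as 'shorter'. None of these numbers come from the abstract.
    """
    rng = random.Random(seed)
    stimuli = []
    for _ in range(n):
        vot = rng.choice(range(-20, 61, 10))
        if pattern == "natural":      # shorter VOT paired with lower f0
            f0 = 100 if vot <= 20 else 140
        elif pattern == "inverse":    # shorter VOT paired with higher f0
            f0 = 140 if vot <= 20 else 100
        else:                         # random: no f0-voicing covariation
            f0 = rng.choice([100, 140])
        stimuli.append((vot, f0))
    return stimuli
```

Under this sketch, the "random" group's training set carries no information about f0 given VOT, which is what licenses the study's conclusion that any f0 effect in the other groups was learned from the covariation.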

3.
This study examines English speakers' relative weighting of two voicing cues in production and perception. Participants repeated words differing in initial consonant voicing ([b] or [p]) and labeled synthesized tokens ranging between [ba] and [pa] orthogonally according to voice onset time (VOT) and onset f0. Discriminant function analysis and logistic regression were used to calculate individuals' relative weighting of each cue. Production results showed a significant negative correlation of VOT and onset f0, while perception results showed a trend toward a positive correlation. No significant correlations were found across perception and production, suggesting a complex relationship between the two domains.
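The logistic-regression approach to perceptual cue weighting can be sketched as follows: fit listeners' binary [pa] responses against standardized VOT and onset-f0 predictors, then read relative weights off the coefficient magnitudes. The simulated labeling data, the generating coefficients, and the `fit_logistic`/`zscore` helpers are all illustrative assumptions, not the study's actual procedure or data:

```python
import math
import random

def zscore(col):
    """Standardize a list of values to zero mean, unit (population) SD."""
    m = sum(col) / len(col)
    s = (sum((v - m) ** 2 for v in col) / len(col)) ** 0.5
    return [(v - m) / s for v in col]

def fit_logistic(X, y, lr=0.5, steps=2000):
    """Plain gradient-descent logistic regression; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for j in range(d):
                gw[j] += (p - yi) * xi[j]
            gb += p - yi
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b

# Simulated labeling data: [pa] responses driven mostly by VOT, weakly by f0.
rng = random.Random(1)
vot = [rng.choice(range(0, 65, 5)) for _ in range(300)]   # ms
f0 = [rng.choice([110, 125, 140]) for _ in range(300)]    # Hz
resp = [1 if 0.25 * v + 0.02 * f + rng.gauss(0, 1) > 10 else 0
        for v, f in zip(vot, f0)]

X = list(zip(zscore(vot), zscore(f0)))
(w_vot, w_f0), b = fit_logistic(X, resp)
# Relative weight of VOT: its share of the total standardized coefficient mass.
rel_weight_vot = abs(w_vot) / (abs(w_vot) + abs(w_f0))
```

Because the predictors are standardized, the coefficient ratio is a unit-free measure of relative cue reliance, which is the quantity the study correlates across perception and production.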

4.
Recent work [Iverson et al. (2003) Cognition, 87, B47-57] has suggested that Japanese adults have difficulty learning English /r/ and /l/ because they are overly sensitive to acoustic cues that are not reliable for /r/-/l/ categorization (e.g., F2 frequency). This study investigated whether cue weightings are altered by auditory training, and compared the effectiveness of different training techniques. Separate groups of subjects received High Variability Phonetic Training (natural words from multiple talkers), and 3 techniques in which the natural recordings were altered via signal processing (All Enhancement, with F3 contrast maximized and closure duration lengthened; Perceptual Fading, with F3 enhancement reduced during training; and Secondary Cue Variability, with variation in F2 and durations increased during training). The results demonstrated that all of the training techniques improved /r/-/l/ identification by Japanese listeners, but there were no differences between the techniques. Training also altered the use of secondary acoustic cues; listeners became biased to identify stimuli as English /l/ when the cues made them similar to the Japanese /r/ category, and reduced their use of secondary acoustic cues for stimuli that were dissimilar to Japanese /r/. The results suggest that both category assimilation and perceptual interference affect English /r/ and /l/ acquisition.

5.
The experiments reported employed nonspeech analogs of speech stimuli to examine the perceptual interaction between first-formant onset frequency and voice-onset time, acoustic cues to the voicing distinction in English initial stop consonants. The nonspeech stimuli comprised two pure tones varying in relative onset time, and listeners were asked to judge the simultaneity of tone onsets. These judgments were affected by the frequency of the lower tone in a manner that parallels the influence of first-formant onset frequency on voicing judgments. This effect was shown to occur regardless of prior learning and to be systematic over a wide range of lower tone frequencies including frequencies beyond the range of possible first-formant frequencies of speech, suggesting that the effect in speech is not attributable to (tacit) knowledge of production constraints, as some current theories suggest.

6.
Behavioral experiments with infants, adults, and nonhuman animals converge with neurophysiological findings to suggest that there is a discontinuity in auditory processing of stimulus components differing in onset time by about 20 ms. This discontinuity has been implicated as a basis for boundaries between speech categories distinguished by voice onset time (VOT). Here, it is investigated how this discontinuity interacts with the learning of novel perceptual categories. Adult listeners were trained to categorize nonspeech stimuli that mimicked certain temporal properties of VOT stimuli. One group of listeners learned categories with a boundary coincident with the perceptual discontinuity. Another group learned categories defined such that the perceptual discontinuity fell within a category. Listeners in the latter group required significantly more experience to reach criterion categorization performance. Evidence of interactions between the perceptual discontinuity and the learned categories extended to generalization tests as well. It has been hypothesized that languages make use of perceptual discontinuities to promote distinctiveness among sounds within a language inventory. The present data suggest that discontinuities interact with category learning. As such, "learnability" may play a predictive role in selection of language sound inventories.

7.
Studies with adults have demonstrated that acoustic cues cohere in speech perception such that two stimuli cannot be discriminated if separate cues bias responses equally, but oppositely, in each. This study examined whether this kind of coherence exists for children's perception of speech signals, a test that first required that a contrast be found for which adults and children show similar cue weightings. Accordingly, experiment 1 demonstrated that adults, 7-, and 5-year-olds weight F2-onset frequency and gap duration similarly in "spa" versus "sa" decisions. In experiment 2, listeners of these same ages made "same" or "not-the-same" judgments for pairs of stimuli in an AX paradigm when only one cue differed, when the two cues were set within a stimulus to bias the phonetic percept towards the same category (relative to the other stimulus in the pair), and when the two cues were set within a stimulus to bias the phonetic percept towards different categories. Unexpectedly, adults' results contradicted earlier studies: They were able to discriminate stimuli when the two cues conflicted in how they biased phonetic percepts. Results for 7-year-olds replicated those of adults, but were not as strong. Only the results of 5-year-olds revealed the kind of perceptual coherence reported by earlier studies for adults. Thus, it is concluded that perceptual coherence for speech signals is present from an early age, and in fact listeners learn to overcome it under certain conditions.

8.
Different patterns of performance across vowels and consonants in tests of categorization and discrimination indicate that vowels tend to be perceived more continuously, or less categorically, than consonants. The present experiments examined whether analogous differences in perception would arise in nonspeech sounds that share critical transient acoustic cues of consonants and steady-state spectral cues of simplified synthetic vowels. Listeners were trained to categorize novel nonspeech sounds varying along a continuum defined by a steady-state cue, a rapidly-changing cue, or both cues. Listeners' categorization of stimuli varying on the rapidly changing cue showed a sharp category boundary and posttraining discrimination was well predicted from the assumption of categorical perception. Listeners more accurately discriminated but less accurately categorized steady-state nonspeech stimuli. When listeners categorized stimuli defined by both rapidly-changing and steady-state cues, discrimination performance was accurate and the categorization function exhibited a sharp boundary. These data are similar to those found in experiments with dynamic vowels, which are defined by both steady-state and rapidly-changing acoustic cues. A general account for the speech and nonspeech patterns is proposed based on the supposition that the perceptual trace of rapidly-changing sounds decays faster than the trace of steady-state sounds.

9.
Voice onset time (VOT) is a perceptual cue to the voicing contrast of stops in word-initial position. The current study aims to characterize VOT acoustically and perceptually in one of the major South Indian languages, Tulu. Stimuli consisted of 2 pairs of meaningful words with bilabial [/p/-/b/] and velar [/k/-/g/] stops in the initial position. These words were uttered by 8 normal native speakers of Tulu and recorded using Praat software. Both spectrogram and waveform views were used to identify the VOT. For the perceptual experiment, 4 adult native speakers of Tulu were asked to identify stimuli in which prevoicing was truncated in steps of 5 to 7 ms until the lead VOT was 0 ms, and in which silence was added after the burst in 5 ms steps until the lag VOT was 50 ms. Reaction time and identification accuracy were measured. Results of the acoustic measurements showed no significant mean difference between the lead VOTs of the two voiced consonants. However, there was a significant difference between the mean lag VOTs of the voiceless consonants. Results of the perceptual measurements showed that as lead VOT was reduced, the probability of /g/ responses decreased, whereas changing VOT had little effect on reaction time and identification of /b/ responses. These results probably indicate that VOT is not necessary to perceive voiceless consonants in Tulu but is necessary in the perception of voiced consonants. Thus, VOT is a consonant-specific cue in Tulu.
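The edited continuum described above (prevoicing truncated down to 0 ms, then silence added after the burst) amounts to a simple sequence of VOT values. A minimal sketch, assuming a uniform 5-ms step (the study used 5-7 ms steps for truncation) and a hypothetical helper name:

```python
def vot_series(lead_ms, step=5, max_lag=50):
    """Return the VOT values (ms) of an edited stimulus continuum.

    Negative values: residual prevoicing (lead VOT), truncated toward 0
    in `step`-ms increments. Non-negative values: silent gap inserted
    after the burst (lag VOT), growing to `max_lag` ms in `step`-ms
    increments. `lead_ms` is the original token's prevoicing duration.
    """
    lead_part = list(range(-lead_ms, 0, step))      # e.g. -20, -15, -10, -5
    lag_part = list(range(0, max_lag + 1, step))    # e.g. 0, 5, ..., 50
    return lead_part + lag_part
```

For a token with 20 ms of prevoicing this yields the 15-member series -20 ... +50 ms, spanning both the lead and lag regions probed in the identification task.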

10.
One naturally spoken token of each of the words petal and pedal was computer-edited to produce stimuli varying in voice onset time (VOT), silent closure duration, and initial /e/ vowel duration. These stimuli were then played, in the sentence frame "Push the button for the----," to four adult and four 6-year-old listeners who responded by pressing a button associated with a flower (petal) or a bicycle (pedal). Among the findings of interest were the following: (a) VOT was statistically the strongest cue for both listener groups, followed by closure duration and initial vowel duration; (b) VOT was relatively stronger for children than for adults, whereas closure and initial vowel durations were relatively stronger for adults than for children; (c) except for a probable ceiling/floor effect, there were no statistically significant interactions among the three acoustic cues, although there were interactions between those cues and both listener group (adults versus children) and the token from which the stimulus had been derived (petal versus pedal).

11.
Previous studies [Lisker, J. Acoust. Soc. Am. 57, 1547-1551 (1975); Summerfield and Haggard, J. Acoust. Soc. Am. 62, 435-448 (1977)] have shown that voice onset time (VOT) and the onset frequency of the first formant are important perceptual cues of voicing in syllable-initial plosives. Most prior work, however, has focused on speech perception in quiet environments. The present study seeks to determine which cues are important for the perception of voicing in syllable-initial plosives in the presence of noise. Perceptual experiments were conducted using consonant-vowel syllables naturally spoken by four talkers, presented in various levels of additive white Gaussian noise. Plosives sharing the same place of articulation and vowel context (e.g., /pa,ba/) were presented to subjects in two-alternative forced-choice identification tasks, and a threshold signal-to-noise-ratio (SNR) value (corresponding to the 79% correct classification score) was estimated for each voiced/voiceless pair. The threshold SNR values were then correlated with several acoustic measurements of the speech tokens. Results indicate that the onset frequency of the first formant is critical in perceiving voicing in syllable-initial plosives in additive white Gaussian noise, while the VOT duration is not.
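Estimating a threshold SNR at a fixed criterion (79% correct here) means locating where the identification-vs-SNR function crosses that criterion. The abstract does not say how the crossing was computed; the sketch below uses simple linear interpolation between measured points, and `threshold_snr` is a hypothetical helper name:

```python
def threshold_snr(snrs, pcorrect, criterion=0.79):
    """Interpolate the SNR (dB) at which proportion correct first crosses
    `criterion`. `snrs` must be ascending; `pcorrect` holds the matching
    classification scores (0-1). Raises if the criterion is never bracketed.
    """
    points = list(zip(snrs, pcorrect))
    for (s0, p0), (s1, p1) in zip(points, points[1:]):
        if p0 < criterion <= p1:
            # Linear interpolation between the two bracketing points.
            return s0 + (criterion - p0) * (s1 - s0) / (p1 - p0)
    raise ValueError("criterion not bracketed by the measured scores")
```

For example, scores of 70% at -5 dB and 88% at 0 dB place the 79% threshold exactly halfway, at -2.5 dB. A psychometric-function fit (e.g., a logistic) would be the more robust alternative when the data are noisy.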

12.
Voice onset time (VOT) is an important speech cue for the perception of voicing and aspiration in word-initial stops. Preaspiration, an [h]-like sound between a vowel and the following stop, can be cued by voice offset time (VOffT), a cue which in most respects mirrors VOT. In Icelandic, VOffT is much more sensitive to the duration of the preceding vowel than VOT is to the duration of the following vowel. This has been explained by noting that preaspiration can only follow a phonemically short vowel. Lengthening of the vowel, either by changing its duration or by moving the spectrum towards that appropriate for a long vowel, will thus demand a longer VOffT to cue preaspiration. An experiment is reported showing that this greater effect of vowel quantity on the perception of VOffT than on the perception of VOT cannot be explained by the effect of F1 frequency at vowel offset.

13.
Although some cochlear implant (CI) listeners can show good word recognition accuracy, it is not clear how they perceive and use the various acoustic cues that contribute to phonetic perceptions. In this study, the use of acoustic cues was assessed for normal-hearing (NH) listeners in optimal and spectrally degraded conditions, and also for CI listeners. Two experiments tested the tense/lax vowel contrast (varying in formant structure, vowel-inherent spectral change, and vowel duration) and the word-final fricative voicing contrast (varying in F1 transition, vowel duration, consonant duration, and consonant voicing). Identification results were modeled using mixed-effects logistic regression. These experiments suggested that under spectrally-degraded conditions, NH listeners decrease their use of formant cues and increase their use of durational cues. Compared to NH listeners, CI listeners showed decreased use of spectral cues like formant structure and formant change and consonant voicing, and showed greater use of durational cues (especially for the fricative contrast). The results suggest that although NH and CI listeners may show similar accuracy on basic tests of word, phoneme or feature recognition, they may be using different perceptual strategies in the process.

14.
Voice onset time (VOT) signifies the interval between consonant onset and the start of rhythmic vocal-cord vibrations. Differential perception of consonants such as /d/ and /t/ is categorical in American English, with the boundary generally lying at a VOT of 20-40 ms. This study tests whether previously identified response patterns that differentially reflect VOT are maintained in large-scale population activity within primary auditory cortex (A1) of the awake monkey. Multiunit activity and current source density patterns evoked by the syllables /da/ and /ta/ with variable VOTs are examined. Neural representation is determined by the tonotopic organization. Differential response patterns are restricted to lower best-frequency regions. Response peaks time-locked to both consonant and voicing onsets are observed for syllables with a 40- and 60-ms VOT, whereas syllables with a 0- and 20-ms VOT evoke a single response time-locked only to consonant onset. Duration of aspiration noise is represented in higher best-frequency regions. Representation of VOT and aspiration noise in discrete tonotopic areas of A1 suggest that integration of these phonetic cues occurs in secondary areas of auditory cortex. Findings are consistent with the evolving concept that complex stimuli are encoded by synchronized activity in large-scale neuronal ensembles.

15.
The effects of stimulus frequency and bandwidth on distance perception were examined for nearby sources in simulated reverberant space. Sources to the side [containing reverberation-related cues and interaural level difference (ILD) cues] and to the front (without ILDs) were simulated. Listeners judged the distance of noise bursts presented at a randomly roving level from simulated distances ranging from 0.15 to 1.7 m. Six stimuli were tested, varying in center frequency (300-5700 Hz) and bandwidth (200-5400 Hz). Performance, measured as the correlation between simulated and response distances, was worse for frontal than for lateral sources. For both simulated directions, performance was inversely proportional to the low-frequency stimulus cutoff, independent of stimulus bandwidth. The dependence of performance on frequency was stronger for frontal sources. These correlation results were well summarized by considering how mean response, as opposed to response variance, changed with stimulus direction and spectrum: (1) little bias was observed for lateral sources, but listeners consistently overestimated distance for frontal nearby sources; (2) for both directions, increasing the low-frequency cut-off reduced the range of responses. These results are consistent with the hypothesis that listeners used a direction-independent but frequency-dependent mapping of a reverberation-related cue, not the ILD cue, to judge source distance.
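The performance measure above, the correlation between simulated and response distances, is a Pearson correlation over paired trials. A minimal self-contained sketch (the abstract does not specify the implementation; `pearson_r` is an illustrative helper):

```python
import math

def pearson_r(x, y):
    """Pearson correlation between paired samples x and y.

    Here x would hold the simulated source distances and y the listener's
    distance responses, one pair per trial; r near 1 means judgments track
    the simulated distances well, regardless of any constant bias.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Note that correlation is insensitive to additive bias, which is why the abstract analyzes mean response (bias) separately from the correlation-based performance measure.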

16.
Traditional accounts of speech perception generally hold that listeners use isolable acoustic "cues" to label phonemes. For syllable-final stops, duration of the preceding vocalic portion and formant transitions at syllable's end have been considered the primary cues to voicing decisions. The current experiment tried to extend traditional accounts by asking two questions concerning voicing decisions by adults and children: (1) What weight is given to vocalic duration versus spectral structure, both at syllable's end and across the syllable? (2) Does the naturalness of stimuli affect labeling? Adults and children (4, 6, and 8 years old) labeled synthetic stimuli that varied in vocalic duration and spectral structure, either at syllable's end or earlier in the syllable. Results showed that all listeners weighted dynamic spectral structure, both at syllable's end and earlier in the syllable, more than vocalic duration, and listeners performed with these synthetic stimuli as listeners had performed previously with natural stimuli. The conclusion for accounts of human speech perception is that rather than simply gathering acoustic cues and summing them to derive strings of phonemic segments, listeners are able to attend to global spectral structure, and use it to help recover explicitly phonetic structure.

17.
Humans and monkeys were compared in their differential sensitivity to various acoustic cues underlying voicing contrasts specified by voice-onset time (VOT) in utterance-initial stop consonants. A low-uncertainty repeating-standard AX procedure and positive-reinforcement operant conditioning techniques were used to measure difference limens (DLs) along a VOT continuum from -70 ms (prevoiced /ba/) to 0 ms (/ba/) to +70 ms (/pa/). For all contrasts tested, human sensitivity was more acute than that of monkeys. For voicing lag, which spans a phonemic contrast in English, human DLs for a /ba/ (standard)-to-/pa/ (target) continuum averaged 8.3 ms compared to 17 ms for monkeys. Human DLs for a /pa/-to-/ba/ continuum averaged 11 ms compared to 25 ms for monkeys. Larger species differences occurred for voicing lead, which is phonemically nondistinctive in English. Human DLs for a /ba/-to-prevoiced-/ba/ continuum averaged 8.2 ms and were four times lower than those of monkeys (35 ms). Monkeys did not reliably discriminate prevoiced /ba/ from /ba/, whereas human DLs averaged 18 ms. The effects of eliminating cues in the English VOT contrasts were also examined. Removal of the aspiration noise in /pa/ greatly increased the DLs and reaction times for both humans and monkeys, but straightening out the F1 transition in /ba/ had only minor effects. Results suggest that quantitative differences in sensitivity should be considered when using monkeys to model the psychoacoustic level of human speech perception.

18.
In tone languages there are potential conflicts in the perception of lexical tone and intonation, as both depend mainly on differences in fundamental frequency (F0) patterns. The present study investigated the acoustic cues associated with the perception of sentences as questions or statements in Cantonese, as a function of the lexical tone in sentence-final position. Cantonese listeners performed intonation identification tasks involving complete sentences, isolated final syllables, and sentences without the final syllable (carriers). Sensitivity (d′ scores) was similar for complete sentences and final syllables but was significantly lower for carriers. Sensitivity was also affected by tone identity. These findings show that the perception of questions and statements relies primarily on the F0 characteristics of the final syllables (local F0 cues). A measure of response bias (c) provided evidence for a general bias toward the perception of statements. Logistic regression analyses showed that utterances were accurately classified as questions or statements by using average F0 and F0 interval. Average F0 of carriers (a global F0 cue) was also found to be a reliable secondary cue. These findings suggest that the use of F0 cues for the perception of question intonation in tone languages is likely to be language-specific.
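The sensitivity (d′) and response bias (c) measures above come from standard signal detection theory: treating "question" responses to questions as hits and "question" responses to statements as false alarms, d′ = z(H) - z(F) and c = -(z(H) + z(F))/2, where z is the inverse of the standard normal CDF. A minimal sketch using only the standard library (the hit/false-alarm framing of this task is an assumption; the formulas themselves are standard):

```python
from statistics import NormalDist

def dprime_and_c(hit_rate, fa_rate):
    """Return (d-prime, criterion c) from hit and false-alarm rates.

    Rates must lie strictly between 0 and 1 (apply a correction such as
    replacing 0/1 with 1/(2N) and 1 - 1/(2N) before calling, if needed).
    Positive c indicates a bias toward the 'no' response, e.g. the
    statement bias reported in the study.
    """
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    d = z(hit_rate) - z(fa_rate)
    c = -(z(hit_rate) + z(fa_rate)) / 2
    return d, c
```

With symmetric rates (e.g., H = 0.84, F = 0.16) the criterion c is zero and d′ is about 2, i.e., good discriminability with no bias toward either response.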

19.
Voice onset time (VOT) is a temporal cue that can distinguish consonants such as /d/ from /t/. It has previously been shown that neurons' responses to the onset of voicing are strongly dependent on their static spectral sensitivity. This study examined the relation between temporal resolution, determined from responses to sinusoidally amplitude-modulated (SAM) tones, and responses to syllables with different VOTs. Responses to syllables and SAM tones were obtained from low-frequency neurons in the inferior colliculus (IC) of the chinchilla. VOT and modulation period varied from 10 to 70 ms in 10-ms steps, and discharge rates elicited by stimuli whose amplitude envelopes were modulated over the same temporal interval were compared. Neurons that respond preferentially to syllables with particular VOTs might be expected to respond best to the SAM tones with comparable modulation periods. However, no consistent agreement between responses to VOT syllables and to SAM tones was obtained. These results confirm the previous suggestion that IC neurons' selectivity for VOT is determined by spectral rather than temporal sensitivity.

20.
The primary aim of this study was to determine if adults whose native language permits neither voiced nor voiceless stops to occur in word-final position can master the English word-final /t/-/d/ contrast. Native English-speaking listeners identified the voicing feature in word-final stops produced by talkers in five groups: native speakers of English, experienced and inexperienced native Spanish speakers of English, and experienced and inexperienced native Mandarin speakers of English. Contrary to hypothesis, the experienced second language (L2) learners' stops were not identified significantly better than stops produced by the inexperienced L2 learners; and their stops were correctly identified significantly less often than stops produced by the native English speakers. Acoustic analyses revealed that the native English speakers made vowels significantly longer before /d/ than /t/, produced /t/-final words with a higher F1 offset frequency than /d/-final words, produced more closure voicing in /d/ than /t/, and sustained closure longer for /t/ than /d/. The L2 learners produced the same kinds of acoustic differences between /t/ and /d/, but theirs were usually of significantly smaller magnitude. Taken together, the results suggest that only a few of the 40 L2 learners examined in the present study had mastered the English word-final /t/-/d/ contrast. Several possible explanations for this negative finding are presented. Multiple regression analyses revealed that the native English listeners made perceptual use of the small, albeit significant, vowel duration differences produced in minimal pairs by the nonnative speakers. A significantly stronger correlation existed between vowel duration differences and the listeners' identifications of final stops in minimal pairs when the perceptual judgments were obtained in an "edited" condition (where post-vocalic cues were removed) than in a "full cue" condition. This suggested that listeners may modify their identification of stops based on the availability of acoustic cues.
