Similar Documents
20 similar documents found (search time: 125 ms)
1.
Using the phenomenon of duplex perception, previous researchers have shown that certain manipulations affect the perception of formant transitions as speech but not their perception as nonspeech "chirps," a dissociation that is consistent with the hypothesized distinction between speech and nonspeech modes of perception [Liberman et al., Percept. Psychophys. 30, 133-143 (1981); Mann and Liberman, Cognition 14, 211-235 (1983)]. The present study supports this interpretation of duplex perception by showing the existence of a "double dissociation" between the speech and chirp percepts. Five experiments compared the effects of stimulus onset asynchrony, backward masking, and transition intensity on the two sides of duplex percepts. It was found that certain manipulations penalized the chirp side but not the speech side, whereas other manipulations had the opposite effect of penalizing the speech side but not the chirp side. In addition, although effects on the speech side of duplex percepts have appeared to be much the same as in the case of normal (electronically fused) speech stimuli, the present study discovered that manipulations that impaired the chirp side of duplex percepts had considerably less effect on the perception of isolated chirps. Thus it would seem that duplex perception makes chirp perception more vulnerable to the effects of stimulus degradation. Several explanations of the data are discussed, among them the view that speech perception may take precedence over other forms of auditory perception [Mattingly and Liberman, in Signals and Sense: Local and Global Order in Perceptual Maps, edited by G. M. Edelman, W. E. Gall, and W. M. Cowan (Wiley, New York, in press); Whalen and Liberman, Science 237, 169-171 (1987)].

2.
Speech perception requires the integration of information from multiple phonetic and phonological dimensions. A sizable literature exists on the relationships between multiple phonetic dimensions and single phonological dimensions (e.g., spectral and temporal cues to stop consonant voicing). A much smaller body of work addresses relationships between phonological dimensions, and much of this has focused on sequences of phones. However, strong assumptions about the relevant set of acoustic cues and/or the (in)dependence between dimensions limit previous findings in important ways. Recent methodological developments in the general recognition theory framework enable tests of a number of these assumptions and provide a more complete model of distinct perceptual and decisional processes in speech sound identification. A hierarchical Bayesian Gaussian general recognition theory model was fit to data from two experiments investigating identification of English labial stop and fricative consonants in onset (syllable initial) and coda (syllable final) position. The results underscore the importance of distinguishing between conceptually distinct processing levels and indicate that, for individual subjects and at the group level, integration of phonological information is partially independent with respect to perception and that patterns of independence and interaction vary with syllable position.
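General recognition theory treats each stimulus as a distribution in a multidimensional perceptual space, with decision bounds partitioning that space into response regions. The following is a minimal numerical sketch of the Gaussian setup, not the authors' hierarchical Bayesian model; the means, noise parameters, and axis-aligned decision rule are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Four stimuli crossing two binary phonological dimensions, each
# represented by a mean in a 2-D perceptual space (values invented).
means = {(0, 0): (-1.0, -1.0), (0, 1): (-1.0, 1.0),
         (1, 0): (1.0, -1.0), (1, 1): (1.0, 1.0)}

def identify(stim, n=2000, rho=0.0, sd=1.0):
    """Simulate identification responses for one stimulus.

    Perceptual effects are bivariate Gaussian; rho != 0 would violate
    perceptual independence. Decision bounds are the axes (x=0, y=0),
    i.e., an independent decision rule on each dimension.
    """
    mx, my = means[stim]
    cov = [[sd ** 2, rho * sd ** 2], [rho * sd ** 2, sd ** 2]]
    samples = rng.multivariate_normal([mx, my], cov, size=n)
    return (samples > 0).astype(int)

# With well-separated means, most responses match the stimulus label.
resp = identify((1, 1))
accuracy = np.mean((resp[:, 0] == 1) & (resp[:, 1] == 1))
```

Fitting the model to real identification-confusion data, as in the study, would estimate the means, covariances, and bounds rather than assume them.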

3.
4.
This study examined the ability of six-month-old infants to recognize the perceptual similarity of syllables sharing a phonetic segment when variations were introduced in phonetic environment and talker. Infants in a "phonetic" group were visually reinforced for head turns when a change occurred from a background category of labial nasals to a comparison category of alveolar nasals. The infants were initially trained on a [ma]-[na] contrast produced by a male talker. Novel tokens differing in vowel environment and talker were introduced over several stages of increasing complexity. In the most complex stage infants were required to make a head turn when a change occurred from [ma,mi,mu] to [na,ni,nu], with the tokens in each category produced by both male and female talkers. A "nonphonetic" control group was tested using the same pool of stimuli as the phonetic condition. The only difference was that the stimuli in the background and comparison categories were chosen in such a way that the sounds could not be organized by acoustic or phonetic characteristics. Infants in the phonetic group transferred training to novel tokens produced by different talkers and in different vowel contexts. However, infants in the nonphonetic control group had difficulty learning the phonetically unrelated tokens that were introduced as the experiment progressed. These findings suggest that infants recognize the similarity of nasal consonants sharing place of articulation independent of variation in talker and vowel context.

5.
On the role of spectral transition for speech perception
This paper examines the relationship between dynamic spectral features and the identification of Japanese syllables modified by initial and/or final truncation. The experiments confirm several main points. "Perceptual critical points," where the percent correct identification of the truncated syllable as a function of the truncation position changes abruptly, are related to maximum spectral transition positions. A speech wave of approximately 10 ms in duration that includes the maximum spectral transition position bears the most important information for consonant and syllable perception. Consonant and vowel identification scores simultaneously change as a function of the truncation position in the short period, including the 10-ms period for final truncation. This suggests that crucial information for both vowel and consonant identification is contained across the same initial part of each syllable. The spectral transition is more crucial than unvoiced and buzz bar periods for consonant (syllable) perception, although the latter features are of some perceptual importance. Also, vowel nuclei are not necessary for either vowel or syllable perception.
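A "spectral transition" measure can be operationalized as the frame-to-frame change in the short-time spectrum, with the maximum transition position falling where that change peaks. The sketch below is a simplified stand-in (log-magnitude FFT distance rather than the regression-based cepstral measure typically used in this line of work); the frame length and the two-tone test signal are illustrative.

```python
import numpy as np

def spectral_transition(x, fs, frame_ms=10):
    """Frame-to-frame spectral change: Euclidean distance between
    log-magnitude spectra of successive non-overlapping frames."""
    n = int(fs * frame_ms / 1000)
    frames = x[: len(x) // n * n].reshape(-1, n)
    spec = np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-9)
    return np.linalg.norm(np.diff(spec, axis=0), axis=1)

# A 500-Hz tone abruptly followed by a 2-kHz tone: the transition
# measure should peak at the frame pair straddling the splice point.
fs = 16000
t = np.arange(int(0.1 * fs)) / fs
x = np.concatenate([np.sin(2 * np.pi * 500 * t),
                    np.sin(2 * np.pi * 2000 * t)])
st = spectral_transition(x, fs)
peak_frame = int(np.argmax(st))  # frame index just before the splice
```

On real syllables the peak of such a measure would mark the candidate "perceptual critical point" region described above.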

6.
In an isolated syllable, a formant will tend to be segregated perceptually if its fundamental frequency (F0) differs from that of the other formants. This study explored whether similar results are found for sentences, and specifically whether differences in F0 (ΔF0) also influence across-formant grouping in circumstances where the exclusion or inclusion of the manipulated formant critically determines speech intelligibility. Three-formant (F1 + F2 + F3) analogues of almost continuously voiced natural sentences were synthesized using a monotonous glottal source (F0 = 150 Hz). Perceptual organization was probed by presenting stimuli dichotically (F1 + F2C + F3; F2), where F2C is a competitor for F2 that listeners must resist to optimize recognition. Competitors were created using time-reversed frequency and amplitude contours of F2, and F0 was manipulated (ΔF0 = ±8, ±2, or 0 semitones relative to the other formants). Adding F2C typically reduced intelligibility, and this reduction was greatest when ΔF0 = 0. There was an additional effect of absolute F0 for F2C, such that competitor efficacy was greater for higher F0s. However, competitor efficacy was not due to energetic masking of F3 by F2C. The results are consistent with the proposal that a grouping "primitive" based on common F0 influences the fusion and segregation of concurrent formants in sentence perception.
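The semitone offsets map onto Hz through the standard equal-tempered relation f = f0 · 2^(n/12); with the study's monotone F0 of 150 Hz, the competitor F0 values follow directly (the helper name below is just illustrative).

```python
# Convert a semitone offset to Hz relative to a reference F0.
# With the study's F0 = 150 Hz, this yields the +/-8, +/-2, and
# 0 semitone competitor F0s.
def shift_f0(f0_hz, semitones):
    return f0_hz * 2 ** (semitones / 12)

f0 = 150.0
competitor_f0s = {st: round(shift_f0(f0, st), 1)
                  for st in (-8, -2, 0, 2, 8)}
# e.g., +8 semitones is roughly 238 Hz, -8 roughly 94 Hz
```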

7.
The contribution of the nasal murmur and vocalic formant transition to the perception of the [m]-[n] distinction by adult listeners was investigated for speakers of different ages in both consonant-vowel (CV) and vowel-consonant (VC) syllables. Three children in each of the speaker groups aged 3, 5, and 7 years, and three adult females and three adult males, produced CV and VC syllables consisting of either [m] or [n] and followed or preceded by [i æ u a], respectively. Two productions of each syllable were edited into seven murmur and transition segments. Across speaker groups, a segment including the last 25 ms of the murmur and the first 25 ms of the vowel yielded higher perceptual identification of place of articulation than any other segment edited from the CV syllable. In contrast, the corresponding vowel+murmur segment in the VC syllable position improved nasal identification relative to other segment types for only the adult talkers. Overall, the CV syllable was perceptually more distinctive than the VC syllable, but this distinctiveness interacted with speaker group and stimulus duration. As predicted by previous studies and the current results of perceptual testing, acoustic analyses of adult syllable productions showed systematic differences between labial and alveolar places of articulation, but these differences were only marginally observed in the youngest children's speech. Also as predicted by the current perceptual results, these acoustic properties differentiating place of articulation of nasal consonants were reliably different for CV syllables compared to VC syllables. A series of comparisons of perceptual data across speaker groups, segment types, and syllable shape provided strong support, in adult speakers, for the "discontinuity hypothesis" [K. N. Stevens, in Phonetic Linguistics: Essays in Honor of Peter Ladefoged, edited by V. A. Fromkin (Academic, London, 1985), pp. 243-255], according to which spectral discontinuities at acoustic boundaries provide critical cues to the perception of place of articulation. In child speakers, the perceptual support for the "discontinuity hypothesis" was weaker and the results were indicative of developmental changes in speech production.

8.
Traditional accounts of speech perception generally hold that listeners use isolable acoustic "cues" to label phonemes. For syllable-final stops, duration of the preceding vocalic portion and formant transitions at syllable's end have been considered the primary cues to voicing decisions. The current experiment tried to extend traditional accounts by asking two questions concerning voicing decisions by adults and children: (1) What weight is given to vocalic duration versus spectral structure, both at syllable's end and across the syllable? (2) Does the naturalness of stimuli affect labeling? Adults and children (4, 6, and 8 years old) labeled synthetic stimuli that varied in vocalic duration and spectral structure, either at syllable's end or earlier in the syllable. Results showed that all listeners weighted dynamic spectral structure, both at syllable's end and earlier in the syllable, more than vocalic duration, and listeners performed with these synthetic stimuli as listeners had performed previously with natural stimuli. The conclusion for accounts of human speech perception is that rather than simply gathering acoustic cues and summing them to derive strings of phonemic segments, listeners are able to attend to global spectral structure, and use it to help recover explicitly phonetic structure.

9.
Four experiments explored the relative contributions of spectral content and phonetic labeling in effects of context on vowel perception. Two 10-step series of CVC syllables ([bVb] and [dVd]) varying acoustically in F2 midpoint frequency and varying perceptually in vowel height from [delta] to [epsilon] were synthesized. In a forced-choice identification task, listeners more often labeled vowels as [delta] in [dVd] context than in [bVb] context. To examine whether spectral content predicts this effect, nonspeech-speech hybrid series were created by appending 70-ms sine-wave glides following the trajectory of CVC F2's to 60-ms members of a steady-state vowel series varying in F2 frequency. In addition, a second hybrid series was created by appending constant-frequency sine-wave tones equivalent in frequency to CVC F2 onset/offset frequencies. Vowels flanked by frequency-modulated glides or steady-state tones modeling [dVd] were more often labeled as [delta] than were the same vowels surrounded by nonspeech modeling [bVb]. These results suggest that spectral content is important in understanding vowel context effects. A final experiment tested whether spectral content can modulate vowel perception when phonetic labeling remains intact. Voiceless consonants, with lower-amplitude more-diffuse spectra, were found to exert less of an influence on vowel perception than do their voiced counterparts. The data are discussed in terms of a general perceptual account of context effects in speech perception.
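A sine-wave glide of the kind appended to the vowels can be synthesized by integrating a time-varying instantaneous frequency into phase. The sketch below is a generic implementation, not the authors' synthesis procedure; the 1800→1200 Hz trajectory, sample rate, and 70-ms duration are illustrative values.

```python
import numpy as np

def sine_glide(f_start, f_end, dur_s, fs):
    """Sine wave whose instantaneous frequency moves linearly from
    f_start to f_end; phase is the cumulative integral of frequency."""
    n = int(dur_s * fs)
    inst_f = np.linspace(f_start, f_end, n)      # Hz, per sample
    phase = 2 * np.pi * np.cumsum(inst_f) / fs   # running phase, rad
    return np.sin(phase)

fs = 10000
glide = sine_glide(1800, 1200, 0.070, fs)  # a 70-ms falling glide
```

A constant-frequency tone, as in the second hybrid series, is just the special case f_start == f_end.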

10.
How do children develop the ability to recognize phonetic structure in their native language with the accuracy and efficiency of adults? In particular, how do children learn what information in speech signals is relevant to linguistic structure in their native language, and what information is not? These questions are the focus of considerable investigation, including several studies by Catherine Mayo and Alice Turk. In a proposed Letter by Mayo and Turk, the comparative role of the isolated consonant-vowel formant transition in children's and adults' speech perception was questioned. Although Mayo and Turk ultimately decided to withdraw their letter, this note, originally written as a reply to their letter, was retained. It highlights the fact that the isolated formant transition must be viewed as part of a more global aspect of structure in the acoustic speech stream, one that arises from the rather slowly changing adjustments made in vocal-tract geometry. Only by maintaining this perspective of acoustic speech structure can we ensure that we design stimuli that provide valid tests of our hypotheses and interpret results in a meaningful way.

11.
Discrimination of speech-sound pairs drawn from a computer-generated continuum in which syllables varied along the place of articulation phonetic feature (/b,d,g/) was tested with macaques. The acoustic feature that was varied along the two-formant 15-step continuum was the starting frequency of the second-formant transition. Discrimination of stimulus pairs separated by two steps was tested along the entire continuum in a same-different task. Results demonstrated that peaks in the discrimination functions occur for macaques at the "phonetic boundaries" which separate the /b-d/ and /d-g/ categories for human listeners. The data support two conclusions. First, although current theoretical accounts of place perception by human adults suggest that isolated second-formant transitions are "secondary" cues, learned by association with primary cues, the animal data are more compatible with the notion that second-formant transitions are sufficient to allow the appropriate partitioning of a place continuum in the absence of associative pairing with other more complex cues. Second, we discuss two potential roles played by audition in the evolution of the acoustics of language. One is that audition provided a set of "natural psychophysical boundaries," based on rather simple acoustic properties, which guided the selection of the phonetic repertoire but did not solely determine it; the other is that audition provided a set of rules for the formation of "natural classes" of sound and that phonetic units met those criteria. The data provided in this experiment provide support for the former. Experiments that could more clearly differentiate the two hypotheses are described.

12.
Integral processing of phonemes: evidence for a phonetic mode of perception
To investigate the extent and locus of integral processing in speech perception, a speeded classification task was utilized with a set of noise-tone analogs of the fricative-vowel syllables [fæ], [ʃæ], [fu], and [ʃu]. Unlike the stimuli used in previous studies of selective perception of syllables, these stimuli did not contain consonant-vowel transitions. Subjects were asked to classify on the basis of one of the two syllable components. Some subjects were told that the stimuli were computer-generated noise-tone sequences. These subjects processed the noise and tone separably. Irrelevant variation of the noise did not affect reaction times (RTs) for the classification of the tone, and vice versa. Other subjects were instructed to treat the stimuli as speech. For these subjects, irrelevant variation of the fricative increased RTs for the classification of the vowel, and vice versa. A second experiment employed naturally spoken fricative-vowel syllables with the same task. Classification RTs showed a pattern of integrality in that irrelevant variation of either component increased RTs to the other. These results indicate that knowledge of coarticulation (or its acoustic consequences) is a basic element of speech perception. Furthermore, the use of this knowledge in phonetic coding is mandatory, even in situations where the stimuli do not contain coarticulatory information.

13.
We investigate analytically and experimentally various aspects of the angular chirp of ultrashort laser pulses. This type of chirp is easily produced by slight misalignment of standard pulse stretcher and/or compressor setups. Angular chirp leads to tilted pulse fronts in the near field and to a strong reduction of intensity in the focus. The effect is rather difficult to observe with standard diagnostic techniques. We present a method that is based on interferometric field autocorrelation and allows us to measure the angular chirp reliably. Suggestions on how to avoid this effect are outlined as well.
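The link between residual angular dispersion and the near-field pulse-front tilt is commonly written tan γ = λ0 · |dθ/dλ|. A back-of-the-envelope sketch under that relation, with an assumed 800-nm center wavelength and an illustrative 50 µrad/nm of angular chirp (neither value is from the paper):

```python
import math

# Pulse-front tilt from residual angular dispersion, using the
# standard relation tan(gamma) = lambda0 * |d(theta)/d(lambda)|.
lam0 = 800e-9             # assumed center wavelength, m
ang_disp = 50e-6 / 1e-9   # assumed d(theta)/d(lambda): 50 urad/nm, in rad/m
tilt_deg = math.degrees(math.atan(lam0 * ang_disp))  # a few degrees
```

Even a small residual angular chirp thus produces a noticeable tilt, consistent with the abstract's point that the effect degrades focused intensity while evading standard diagnostics.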

14.
Single-molecule experiments on polymeric DNA show that the molecule can be overstretched at nearly constant force by about 70% beyond its relaxed contour length. In this publication we use steered molecular dynamics (MD) simulation to study the effect of structural defects on force-extension curves and structures at high elongation in a 30 base pair duplex pulled by its torsionally unconstrained 5'-5' ends. The defect-free duplex shows a plateau in the force-extension curve at 120 pN in which large segments with inclined and paired bases ("S-DNA") near both ends of the duplex coexist with a central B-type segment separated from the former by small denaturation bubbles. In the presence of a base mismatch or a nick, force-extension curves are very similar to the ones of the defect-free duplex. For the duplex with a base mismatch, S-type segments with highly inclined base pairs are not observed; rather, the overstretched duplex consists of B-type segments separated by denaturation bubbles. The nicked duplex evolves, via a two-step transition, into a two-domain structure characterized by a large S-type segment coexisting with several short S-type segments which are separated by short denaturation bubbles. Our results suggest that in the presence of nicks the force-extension curve of highly elongated duplex DNA might reflect locally highly inhomogeneous stretching.

15.
The amount of acoustic information that native and non-native listeners need for syllable identification was investigated by comparing the performance of monolingual English speakers and native Spanish speakers with either an earlier or a later age of immersion in an English-speaking environment. Duration-preserved silent-center syllables retaining 10, 20, 30, or 40 ms of the consonant-vowel and vowel-consonant transitions were created for the target vowels /i, ɪ, eɪ, ɛ, æ/ and /ɑ/, spoken by two males in /bVb/ context. Duration-neutral syllables were created by editing the silent portion to equate the duration of all vowels. Listeners identified the syllables in a six-alternative forced-choice task. The earlier learners identified the whole-word and 40 ms duration-preserved syllables as accurately as the monolingual listeners, but identified the silent-center syllables significantly less accurately overall. Only the monolingual listener group identified syllables significantly more accurately in the duration-preserved than in the duration-neutral condition, suggesting that the non-native listeners were unable to recover from the syllable disruption sufficiently to access the duration cues in the silent-center syllables. This effect was most pronounced for the later learners, who also showed the most vowel confusions and the greatest decrease in performance from the whole word to the 40 ms transition condition.

16.
Perceptual distances among single tokens of American English vowels were established for nonreverberant and reverberant conditions. Fifteen vowels in the phonetic context (b-t), embedded in the sentence "Mark the (b-t) again" were recorded by a male talker. For the reverberant condition, the sentences were played through a room with a reverberation time of 1.2 s. The CVC syllables were removed from the sentences and presented in pairs to ten subjects with audiometrically normal hearing, who judged the similarity of the syllable pairs separately for the nonreverberant and reverberant conditions. The results were analyzed by multidimensional scaling procedures, which showed that the perceptual data were accounted for by a three-dimensional vowel space. Correlations were obtained between the coordinates of the vowels along each dimension and selected acoustic parameters. For both conditions, dimensions 1 and 2 were highly correlated with formant frequencies F2 and F1, respectively, and dimension 3 was correlated with the product of the duration of the vowels and the difference between F3 and F1 expressed on the Bark scale. These observations are discussed in terms of the influence of reverberation on speech perception.
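The third dimension's acoustic correlate, duration times the F3−F1 difference in Bark, can be computed once a Bark transform is chosen. The sketch below uses the Zwicker–Terhardt approximation; the paper does not specify which Bark formula was used, and the formant and duration values are purely illustrative.

```python
import math

def bark(f_hz):
    """Zwicker-Terhardt approximation to the Bark scale (one common
    choice; an assumption here, not necessarily the paper's formula)."""
    return (13 * math.atan(0.00076 * f_hz)
            + 3.5 * math.atan((f_hz / 7500) ** 2))

def dim3_predictor(duration_s, f1_hz, f3_hz):
    # duration times the F3 - F1 difference expressed in Bark
    return duration_s * (bark(f3_hz) - bark(f1_hz))

p = dim3_predictor(0.150, 500, 2500)  # illustrative vowel values
```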

17.
Femtosecond time-resolved selective control of multilevel systems using the superposition of two identical, frequency-chirped fields is proposed and demonstrated. By adjusting the delay between the pulses, a selected transition of the Rb doublet was brought into the "holes" of the interference pattern and remained unexcited, thus allowing the laser field to manipulate another transition as if it were an isolated two-level system. Because it is based on light interference, this technique requires neither strong driving-field intensities nor control of the chirp direction to achieve selectivity.
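Two identical pulses separated by a delay τ interfere to give a power spectrum modulated by 1 + cos(ωτ), so destructive "holes" fall at ω = (2k+1)π/τ; tuning τ parks a hole on the transition to be left unexcited. A numerical sketch under these assumptions (the 1-ps delay and frequency grid are illustrative):

```python
import numpy as np

# Spectral fringe pattern of two identical pulses delayed by tau:
# I(omega) is proportional to 1 + cos(omega * tau), with destructive
# "holes" at omega = (2k + 1) * pi / tau.
tau = 1e-12                              # assumed 1 ps delay
omega = np.linspace(0, 6e12, 30001)      # rad/s grid up to 6e12
fringe = 1 + np.cos(omega * tau)
first_hole = omega[np.argmin(fringe)]    # should sit near pi / tau
```

Shifting τ slides the comb of holes across the spectrum, which is the delay-adjustment mechanism the abstract describes.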

18.
19.
Speech motor control timing was examined by means of a multiple correlational analysis involving interarticulatory delay and speech rate as predictor variables, and four subsyllabic time segments of the syllable [ka] as dependent variables. The hypothesis was that the two putative temporal constraints have differential predictive capacity for various segments of the syllable. Results from 11 subjects were in support of the hypothesis. Syllable onset duration was reliably predicted by the linear addition of interarticulatory delay and speech rate, while the duration of the midportion of the syllable was nearly exclusively predicted by the overall speech rate. This model was found to be applicable to all conditions of normal and clenched teeth, context-free and contextual, normally paced and rapid speech production, with minor differences in predictive capacity for different conditions.
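The "linear addition" of interarticulatory delay and speech rate amounts to ordinary two-predictor multiple regression. A self-contained sketch on synthetic data follows; the coefficients, ranges, and noise level are invented for illustration, not taken from the study.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins: interarticulatory delay and speech rate as
# predictors of a subsyllabic segment duration (coefficients invented).
n = 200
delay = rng.uniform(0.01, 0.05, n)   # interarticulatory delay, s
rate = rng.uniform(3.0, 7.0, n)      # speech rate, syllables/s
duration = 0.08 + 1.2 * delay - 0.006 * rate + rng.normal(0, 0.001, n)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(n), delay, rate])
beta, *_ = np.linalg.lstsq(X, duration, rcond=None)

# Multiple correlation R: correlation of fitted with observed values.
r = np.corrcoef(X @ beta, duration)[0, 1]
```

Comparing such fits segment by segment (onset vs. midportion) is what reveals the differential predictive capacity the abstract reports.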

20.
This paper reports acoustic measurements and results from a series of perceptual experiments on the voiced-voiceless distinction for syllable-final stop consonants in absolute final position and in the context of a following syllable beginning with a different stop consonant. The focus is on temporal cues to the distinction, with vowel duration and silent closure duration as the primary and secondary dimensions, respectively. The main results are that adding a second syllable to a monosyllable increases the number of voiced stop consonant responses, as does shortening of the closure duration in disyllables. Both of these effects are consistent with temporal regularities in speech production: Vowel durations are shorter in the first syllable of disyllables than in monosyllables, and closure durations are shorter for voiced than for voiceless stops in disyllabic utterances of this type. While the perceptual effects thus may derive from two separate sources of tacit phonetic knowledge available to listeners, the data are also consistent with an interpretation in terms of a single effect: temporal proximity of the following context.
