Similar literature
 20 similar documents found.
1.
Which acoustic properties of the speech signal differ between rhythmically prominent syllables and non-prominent ones? A production experiment was conducted to identify these acoustic properties. Subjects read out repetitive text to a metronome, trying to match stressed syllables to its beat. The analysis searched for the function of the speech signal that best predicts the timing of the metronome ticks. The most important factor in this function is found to be the contrast in loudness between a syllable and its neighbors. The prominence of a syllable can be deduced from the specific loudness in an (approximately) 360 ms window centered on the syllable in question relative to an (approximately) 800-ms-wide symmetric window.
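
The loudness contrast described in this abstract can be illustrated with a minimal sketch. The abstract does not give the exact function, so the ratio of window means used here, the frame-based loudness track, and the name `loudness_contrast` are all assumptions made for illustration only.

```python
def loudness_contrast(loudness, frame_ms, center, inner_ms=360.0, outer_ms=800.0):
    """Ratio of mean loudness in a narrow window to mean loudness in a wide
    window, both centered on frame index `center`.

    Assumption: the abstract only says the 360 ms window is taken "relative to"
    the 800 ms window; a ratio of means is one plausible reading.
    """
    def window_mean(width_ms):
        half = int(round(width_ms / frame_ms / 2))
        lo = max(0, center - half)
        hi = min(len(loudness), center + half + 1)
        segment = loudness[lo:hi]
        return sum(segment) / len(segment)

    return window_mean(inner_ms) / window_mean(outer_ms)


# Toy loudness track at 10 ms per frame: one loud syllable between quiet stretches.
track = [1.0] * 40 + [4.0] * 20 + [1.0] * 40
on_syllable = loudness_contrast(track, frame_ms=10.0, center=50)
off_syllable = loudness_contrast(track, frame_ms=10.0, center=15)
print(on_syllable, off_syllable)
```

In this toy track the contrast is above 1 when the window is centered on the loud syllable and below 1 when it is centered on a quiet stretch next to it.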

2.
Dynamic specification of coarticulated vowels spoken in sentence context   Cited by: 3 (self-citations: 0, citations by others: 3)
According to a dynamic specification account, coarticulated vowels are identified on the basis of time-varying acoustic information, rather than solely on the basis of "target" information contained within a single spectral cross section of an acoustic syllable. Three experiments utilizing digitally segmented portions of consonant-vowel-consonant (CVC) syllables spoken rapidly in a carrier sentence were designed to examine the relative contribution of (1) target information available in vocalic nuclei, (2) intrinsic duration information specified by syllable length, and (3) dynamic spectral information defined over syllable onsets and offsets. In experiments 1 and 2, vowels produced in three consonantal contexts by an adult male were examined. Results showed that vowels in silent-center (SC) syllables (in which vocalic nuclei were attenuated to silence leaving initial and final transitional portions in their original temporal relationship) were perceived relatively accurately, although not as well as unmodified syllables (experiment 1); random versus blocked presentation of consonantal contexts did not affect performance. Error rates were slightly greater for vowels in SC syllables in which intrinsic duration differences were neutralized by equating the duration of silent intervals between initial and final transitional portions. However, performance was significantly better than when only initial transitions or final transitions were presented alone (experiment 2). Experiment 3 employed CVC stimuli produced by another adult male, and included six consonantal contexts. Both SC syllables and excised syllable nuclei with appropriate intrinsic durations were identified no less accurately than unmodified controls. Neutralizing duration differences in SC syllables increased identification errors only slightly, while truncating excised syllable nuclei yielded a greater increase in errors. These results demonstrate that time-varying information is necessary for accurate identification of coarticulated vowels. Two hypotheses about the nature of the dynamic information specified over syllable onsets and offsets are discussed.

3.
In a series of four experiments, the ability of 3- to 4-month-old infants to form categorical representations of syllable-initial consonants in monosyllabic stimuli (experiments 1 and 2) and of initial and final syllables in bisyllabic stimuli (experiments 3 and 4, respectively) was investigated. Experiment 1 yielded no evidence of categorical representations for the initial consonant. However, the results indicated that the four or six stimuli presented during the initial phase of familiarization had been remembered. The results of experiment 2, which employed a less stringent familiarization criterion, replicated the findings of experiment 1, although there was some evidence for categorization for infants whose familiarization performance more closely matched the weaker criterion. In experiment 3, there was strong evidence for a categorical representation of the initial syllable of bisyllabic stimuli for infants experiencing six familiar stimuli. In experiment 4, there was less robust evidence of categorization of the final syllable of bisyllabic stimuli, but again only when six familiar stimuli were experienced. The results were discussed in terms of the earliest representation of speech being syllables that could be modified by the rhythmic nature of the infant's native language.

4.
Stress is an important parameter for prosody processing in speech synthesis. In this paper, we compare the acoustic features of neutral tone syllables and strong stress syllables with those of moderate stress syllables, including pitch, syllable duration, intensity and pause length after the syllable. The relations between duration and pitch, and between the Third Tone (T3) and pitch, are also studied. Three stress prediction models based on ANN, i.e. the acoustic model, the linguistic model and the mixed model, are presented for predicting Chinese sentential stress. The results show that the mixed model performs better than the other two models. In order to solve the problem of the diversity of manual labeling, an evaluation index of support ratio is proposed.

5.
The perceptual multidimensional structure of Chinese syllables is studied by psychophysical experiment. The results indicate that F0 and duration are related to the two main dimensions of the perceptual structure of Chinese syllables. Prosodic characteristics, such as the position of a syllable in the prosodic hierarchy and its stress, lead to different distributions of syllables in the perceptual space.

6.
Integral processing of phonemes: evidence for a phonetic mode of perception   Cited by: 1 (self-citations: 0, citations by others: 1)
To investigate the extent and locus of integral processing in speech perception, a speeded classification task was utilized with a set of noise-tone analogs of the fricative-vowel syllables [fæ], [ʃæ], [fu], and [ʃu]. Unlike the stimuli used in previous studies of selective perception of syllables, these stimuli did not contain consonant-vowel transitions. Subjects were asked to classify on the basis of one of the two syllable components. Some subjects were told that the stimuli were computer generated noise-tone sequences. These subjects processed the noise and tone separably. Irrelevant variation of the noise did not affect reaction times (RTs) for the classification of the tone, and vice versa. Other subjects were instructed to treat the stimuli as speech. For these subjects, irrelevant variation of the fricative increased RTs for the classification of the vowel, and vice versa. A second experiment employed naturally spoken fricative-vowel syllables with the same task. Classification RTs showed a pattern of integrality in that irrelevant variation of either component increased RTs to the other. These results indicate that knowledge of coarticulation (or its acoustic consequences) is a basic element of speech perception. Furthermore, the use of this knowledge in phonetic coding is mandatory, even in situations where the stimuli do not contain coarticulatory information.

7.
8.
The study measured listener sensitivity to increments in the inter-onset intervals (IOIs) of successive 20-ms 4000-Hz tone bursts in isochronous sequences. The stimulus sequences contained two to six tone bursts, separated equally by silent intervals, with tonal IOIs ranging from 25 to 100 ms. Difference limens (DLs) for increments of the tonal IOIs were measured to assess listener sensitivity to changes of sequence rate. Comparative DLs were also measured for increments of a single interval located within six-tone isochronous sequences with different tone rates. Listeners included younger normal-hearing adults and two groups of older adults with and without high-frequency sensorineural hearing loss. The results, expressed as Weber fractions (DL/IOI), revealed that discrimination improved as the sequence tone rate decreased and the number of tonal components increased. Discrimination of a single sequence interval also improved as the number of sequence components increased from two to six but only for brief intervals and fast sequence rates. Discrimination performance of the older listeners with and without hearing loss was equivalent and significantly poorer than that of the younger listeners. The discrimination results are examined and discussed within the context of multiple-look mechanisms and possible age-related differences in the sensory coding of signal onsets.
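
The Weber fraction used in this abstract is simply DL/IOI. The short sketch below shows the arithmetic; the threshold values are invented for illustration and are not data from the study, and the function name `weber_fraction` is hypothetical.

```python
def weber_fraction(dl_ms, ioi_ms):
    """Difference limen divided by the baseline inter-onset interval."""
    if ioi_ms <= 0:
        raise ValueError("IOI must be positive")
    return dl_ms / ioi_ms


# Hypothetical difference limens (ms) keyed by baseline IOI (ms); NOT study data.
hypothetical_dls = {25: 5.0, 50: 7.5, 100: 10.0}
fractions = {ioi: weber_fraction(dl, ioi) for ioi, dl in hypothetical_dls.items()}
for ioi, fraction in fractions.items():
    print(f"IOI {ioi} ms: Weber fraction {fraction:.2f}")
```

A smaller fraction means finer relative discrimination, so in these made-up numbers the slowest sequence (100 ms IOI) is discriminated best in relative terms even though its absolute DL is the largest.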

9.
Segmental duration patterns have long been used to support the proposal that syllables are basic speech planning units, but production experiments almost always confound syllable and word boundaries. The current study tried to remedy this problem by comparing word-internal and word-peripheral consonantal duration patterns. Stress and sequencing were used to vary the nominal location of word-internal boundaries in American English productions of disyllabic nonsense words with medial consonant sequences. The word-internal patterns were compared to those that occurred at the edges of words, where boundary location was held constant and only stress and sequence order were varied. The English patterns were then compared to patterns from Russian and Finnish. All three languages showed similar effects of stress and sequencing on consonantal duration, but an independent effect of syllable position was observed only in English and only at a word boundary. English also showed stronger effects of stress and sequencing across a word boundary than within a word. Finnish showed the opposite pattern, whereas Russian showed little difference between word-internal and word-peripheral patterns. Overall, the results suggest that the suprasegmental units of motor planning are language-specific and that the word may be a more relevant planning unit in English.

10.
Systems designed to recognize continuous speech must be able to adapt to many types of acoustic variation, including variations in stress. A speaker-dependent recognition study was conducted on a group of stressed and destressed syllables. These syllables, some containing the short vowel /I/ and others the long vowel /ae/, were excised from continuous speech and transformed into arrays of cepstral coefficients at two levels of precision. From these data, four types of template dictionaries varying in size and stress composition were formed by a time-warping procedure. Recognition performance data were gathered from listeners and from a computer recognition algorithm that also employed warping. It was found that for a significant portion of the data base, stressed and destressed versions of the same syllable are sufficiently different from one another as to justify the use of separate dictionary templates. Second, destressed syllables exhibit roughly the same acoustic variance as their stressed counterparts. Third, long vowels tend to be involved in proportionally fewer cross-vowel errors but tend to diminish the warping algorithm's ability to discriminate consonantal information. Finally, the pattern of consonant errors that listeners make as a function of vowel length shows significant differences from that produced by the computer.
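
Template matching with time warping, as mentioned in this abstract, is commonly done with dynamic time warping (DTW). The abstract does not specify the warping procedure, so the textbook DTW below is a stand-in, and the one-dimensional "cepstral" vectors are toy values chosen only to show the idea.

```python
import math


def dtw_distance(a, b):
    """Classic dynamic time warping cost between two sequences of feature vectors.

    Uses Euclidean frame distance and the standard step pattern
    (insertion, deletion, match). This is a generic sketch, not the
    procedure used in the study.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            frame_dist = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = frame_dist + min(
                cost[i - 1][j],      # stretch sequence b
                cost[i][j - 1],      # stretch sequence a
                cost[i - 1][j - 1],  # advance both
            )
    return cost[n][m]


# Toy one-dimensional feature tracks (hypothetical values).
template = [[0.0], [1.0], [2.0]]
slow_version = [[0.0], [0.0], [1.0], [1.0], [2.0], [2.0]]
different = [[2.0], [1.0], [0.0]]
print(dtw_distance(template, slow_version))
print(dtw_distance(template, different))
```

The slowed-down token aligns with the template at zero cost, whereas the reversed track does not. This is the property that lets a single template absorb duration differences such as those between stressed and destressed syllables, up to the point where their spectra differ too much.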

11.
Two experiments were conducted to investigate whether or not anchoring and selective adaptation induce basically the same psychological effects. The purpose of the first experiment is to show how an audiovisual anchor modifies the perception of consonant-vowel (CV) syllables. The anchors were two purely acoustical, two purely optical, and three audiovisual CV syllables. The results were compared with those of audiovisual speech selective-adaptation experiments conducted by Roberts and Summerfield [Percept. Psychophys. 30, 309-314 (1981)] and Saldaña and Rosenblum [J. Acoust. Soc. Am. 95, 3658-3661 (1994)]. The audiovisual anchoring effects were found to be very similar to the audiovisual selective-adaptation effects, but the incompatible audiovisual anchor produced more auditory-based contrast than the purely acoustical anchor or the compatible audiovisual anchor. This difference in contrast had not been found in the previous selective-adaptation experiments. The second experiment was conducted to directly compare audiovisual anchoring and selective-adaptation effects under the same stimuli and with the same subjects. It was found that the compatible audiovisual syllable (AbVb) caused more contrast in selective adaptation than in anchoring, although the discrepant audiovisual syllable (AbVg) caused no difference between anchoring and selective adaptation. It was also found that the anchor AbVg caused more auditory-based contrast than the anchor AbVb. It is suggested that the mechanisms behind these results are different.

12.
Nowadays, it is widely believed that the temporal structure of the auditory nerve fibers' response to sound stimuli plays an important role in auditory perception. An influential hypothesis is that information is extracted from this temporal structure by neural operations akin to an autocorrelation algorithm. The goal of the present work was to test this hypothesis. The stimuli consisted of sequences of unipolar clicks that were high-pass filtered and mixed with low-pass noise so as to exclude spectral cues. In experiment 1, "interfering" clicks were inserted in an otherwise periodic (isochronous) click sequence. Each click belonging to the periodic sequence was followed, after a random portion of the period, by one interfering click. This disrupted the detection of temporal regularity, even when the interfering clicks were 5 dB less intense than the periodic clicks. Experiments 2-4 used click sequences that showed a single peak in their autocorrelation functions. For some sequences, this peak originated from "first-order" temporal regularities, that is from the temporal relations between consecutive clicks. For other sequences, the peak originated instead from "second-order" regularities, relative to nonconsecutive clicks. The detection of second-order regularities appeared to be much more difficult than the detection of comparable first-order regularities. Overall, these results do not tally with the current autocorrelation models of temporal processing. They suggest that the extraction of temporal information from a group of closely spaced spectral components makes no use of time intervals between nonconsecutive peaks of the amplitude envelope.
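
The distinction between first-order and higher-order intervals in this abstract can be made concrete with interval histograms. The sketch below is only an illustration: the click times are invented integers in milliseconds, the interferer offsets are fixed rather than random, and the function names are hypothetical. An all-order interval histogram is used as a simple stand-in for the autocorrelation of a click train.

```python
from collections import Counter


def first_order_intervals(times):
    """Histogram of intervals between consecutive clicks only."""
    return Counter(b - a for a, b in zip(times, times[1:]))


def all_order_intervals(times, max_interval):
    """Histogram of intervals between every pair of clicks (consecutive or not),
    up to max_interval. For a click train this mirrors its autocorrelation."""
    counts = Counter()
    for i, a in enumerate(times):
        for b in times[i + 1:]:
            if b - a <= max_interval:
                counts[b - a] += 1
    return counts


# Periodic clicks every 10 ms, each followed by one interfering click
# at an arbitrary offset within the period (made-up offsets).
periodic = [0, 10, 20, 30, 40, 50]
interfering = [3, 17, 22, 36, 44]
clicks = sorted(periodic + interfering)

first = first_order_intervals(clicks)
every = all_order_intervals(clicks, max_interval=10)
print("first-order count at 10 ms:", first[10])
print("all-order most common:", every.most_common(1))
```

In this toy sequence the 10 ms period never appears among consecutive intervals, yet it is the most frequent interval once nonconsecutive clicks are included. An autocorrelation-like analysis would therefore still "see" the periodicity, which is the prediction the experiments described above set out to test.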

13.
The Mongolian gerbil (Meriones unguiculatus) has been an important model system in auditory physiology, but its natural sounds are not well known. Vocalizations produced by colonies of adult gerbils were recorded during various social interactions in a standard laboratory animal-rearing facility. Sound recordings were made continuously for 24 h. This species exhibited a rich repertoire of vocalizations that varied in spectrotemporal structure. Calls were classified into 13 distinct syllable types. These syllables were further categorized into eight simple syllables and five composite syllables, which could be described by combinations of two to three simple syllables. The durations of individual syllables ranged from 30 to 330 ms with fundamental frequencies of 5 to 50 kHz. Those with lower fundamental frequencies typically contained more harmonic components (up to nine). Analysis of syllable sequences indicated that syllables may be combined into three types of simple phrases. These results provide a basis for future studies not only of the behavioral significance of vocalization, but also of the neural basis of vocal communication in the Mongolian gerbil.

14.
The contribution of the nasal murmur and vocalic formant transition to the perception of the [m]-[n] distinction by adult listeners was investigated for speakers of different ages in both consonant-vowel (CV) and vowel-consonant (VC) syllables. Three children in each of the speaker groups 3, 5, and 7 years old, and three adult females and three adult males produced CV and VC syllables consisting of either [m] or [n] and followed or preceded by [i ae u a], respectively. Two productions of each syllable were edited into seven murmur and transition segments. Across speaker groups, a segment including the last 25 ms of the murmur and the first 25 ms of the vowel yielded higher perceptual identification of place of articulation than any other segment edited from the CV syllable. In contrast, the corresponding vowel+murmur segment in the VC syllable position improved nasal identification relative to other segment types for only the adult talkers. Overall, the CV syllable was perceptually more distinctive than the VC syllable, but this distinctiveness interacted with speaker group and stimulus duration. As predicted by previous studies and the current results of perceptual testing, acoustic analyses of adult syllable productions showed systematic differences between labial and alveolar places of articulation, but these differences were only marginally observed in the youngest children's speech. Also predicted by the current perceptual results, these acoustic properties differentiating place of articulation of nasal consonants were reliably different for CV syllables compared to VC syllables. A series of comparisons of perceptual data across speaker groups, segment types, and syllable shape provided strong support, in adult speakers, for the "discontinuity hypothesis" [K. N. Stevens, in Phonetic Linguistics: Essays in Honor of Peter Ladefoged, edited by V. A. Fromkin (Academic, London, 1985), pp. 243-255], according to which spectral discontinuities at acoustic boundaries provide critical cues to the perception of place of articulation. In child speakers, the perceptual support for the "discontinuity hypothesis" was weaker and the results indicative of developmental changes in speech production.

15.
A model of a spectrum target prediction mechanism is proposed and evaluated by comparing predicted values with results of psychoacoustic experiments. When the trajectory of the cepstrally smoothed LPC spectrum is approximated by a second-order critically damped system, the proposed model can estimate target values using short-period spectrum sequences (50 ms) without being given the onset positions of the spectral transition. Additionally, this model decreases the length of transitional sounds and recovers vowel characteristics neutralized by coarticulation. Moreover, this model compensates for the transitions of syllables and extracts stable characteristics from syllable transitions. This model is applicable to coarticulation recovery in speech signal processing.
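
Why a critically damped second-order trajectory allows a target to be estimated from a short stretch without knowing the onset can be shown with a small worked sketch. This is not the model from the paper: it assumes a single scalar feature, noise-free samples, and a known time constant (hence a known per-step decay factor `r`), all of which are simplifications introduced here. A critically damped response has the form y(t) = T + (a + b t) e^{-t/τ}, so any three equally spaced samples satisfy y2 - 2 r y1 + r² y0 = T (1 - r)², with r = e^{-Δ/τ}, regardless of where the transition started.

```python
import math


def estimate_target(y0, y1, y2, r):
    """Target T of a critically damped trajectory from three equally spaced samples.

    Assumption: r = exp(-step / tau) is known. The identity
    y2 - 2*r*y1 + r**2*y0 = T*(1 - r)**2 holds for any three consecutive
    samples, so the onset time is not needed.
    """
    return (y2 - 2.0 * r * y1 + r * r * y0) / (1.0 - r) ** 2


# Hypothetical trajectory parameters (illustrative only).
true_target, a, b = 2.0, -1.5, 0.3
tau_ms, step_ms = 20.0, 10.0
r = math.exp(-step_ms / tau_ms)


def trajectory(t_ms):
    return true_target + (a + b * t_ms) * math.exp(-t_ms / tau_ms)


# Samples taken mid-transition, starting at an arbitrary time after onset.
start_ms = 7.0
samples = [trajectory(start_ms + k * step_ms) for k in range(3)]
estimate = estimate_target(*samples, r)
print(estimate)
```

The three samples span only 20 ms and never reach the target, yet the estimate recovers it. With real, noisy spectra one would fit over more frames rather than solve from exactly three.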

16.
Speech motor control timing was examined by means of a multiple correlational analysis involving interarticulatory delay and speech rate as predictor variables, and four subsyllabic time segments of the syllable [ka] as dependent variables. The hypothesis was that the two putative temporal constraints have differential predictive capacity for various segments of the syllable. Results from 11 subjects were in support of the hypothesis. Syllable onset duration was reliably predicted by the linear addition of interarticulatory delay and speech rate, while the duration of the midportion of the syllable was nearly exclusively predicted by the overall speech rate. This model was found to be applicable to all conditions of normal and clenched teeth, context-free and contextual, normally paced and rapid speech production, with minor differences in predictive capacity for different conditions.

17.
For all but the most profoundly hearing-impaired (HI) individuals, auditory-visual (AV) speech has been shown consistently to afford more accurate recognition than auditory (A) or visual (V) speech. However, the amount of AV benefit achieved (i.e., the superiority of AV performance in relation to unimodal performance) can differ widely across HI individuals. To begin to explain these individual differences, several factors need to be considered. The most obvious of these are deficient A and V speech recognition skills. However, large differences in individuals' AV recognition scores persist even when unimodal skill levels are taken into account. These remaining differences might be attributable to differing efficiency in the operation of a perceptual process that integrates A and V speech information. There is at present no accepted measure of the putative integration process. In this study, several possible integration measures are compared using both congruent and discrepant AV nonsense syllable and sentence recognition tasks. Correlations were tested among the integration measures, and between each integration measure and independent measures of AV benefit for nonsense syllables and sentences in noise. Integration measures derived from tests using nonsense syllables were significantly correlated with each other; on these measures, HI subjects show generally high levels of integration ability. Integration measures derived from sentence recognition tests were also significantly correlated with each other, but were not significantly correlated with the measures derived from nonsense syllable tests. Similarly, the measures of AV benefit based on nonsense syllable recognition tests were found not to be significantly correlated with the benefit measures based on tests involving sentence materials. Finally, there were significant correlations between AV integration and benefit measures derived from the same class of speech materials, but nonsignificant correlations between integration and benefit measures derived from different classes of materials. These results suggest that the perceptual processes underlying AV benefit and the integration of A and V speech information might not operate in the same way on nonsense syllable and sentence input.

18.
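Design of a Comprehensive Chinese Speech Database   Cited by: 1 (self-citations: 1, citations by others: 0)
Language is the most important tool of human communication, and with the development of modern information technology it has also become an effective means of communication between humans and machines. In recent years, countries around the world have been building national speech databases as a foundation for research in speech science and for the development of speech technology. The speech materials in the comprehensive Chinese database include all tonal syllables of Chinese, digit strings, words, and materials on prosodic features, as well as the syllable lists, word lists, and sentence lists used in speech intelligibility tests, and representative short passages. In its linguistic, phonetic, and acoustic characteristics, the database fully reflects the basic features of Chinese. The first problem to be solved is the selection of speech material: the usage frequency of each kind of linguistic unit must be considered, so that the material not only includes all high-frequency words but also reflects a fairly complete range of phonetic phenomena. The database has an open, modular structure and is equipped with a flexible database management system.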

19.
Young deaf children using a cochlear implant develop speech abilities on the basis of speech temporal-envelope signals distributed over a limited number of frequency bands. A Headturn Preference Procedure was used to measure looking times in 6-month-old, normal-hearing infants during presentation of repeating or alternating sequences composed of different tokens of /aba/ and /apa/ processed to retain envelope information below 64 Hz while degrading temporal fine structure cues. Infants attended longer to the alternating sequences, indicating that they perceive the voicing contrast on the basis of envelope cues alone in the absence of fine spectral and temporal structure information.

20.
The articulatory kinematics of final lengthening   Cited by: 4 (self-citations: 0, citations by others: 4)
In order to understand better the phonetic control of final lengthening, the articulation of phrase-final syllables was compared with that of two other contexts known to increase syllable duration: accent and slow tempo. The kinematics of jaw movements in [pap] sequences and of lower lip movements in [pe] sequences for four subjects were interpreted in terms of a task-dynamic model. There was evidence of two different control strategies: decreasing intragestural stiffness to slow down some part of the syllable, and changing intergestural phasing to decrease overlap of the vowel gesture by the consonant. The first was used in slowing down tempo, whereas the second was used to increase the duration of accented syllables over unaccented syllables. Both strategies were implicated in phrase-final lengthening. In accented syllables, final closing gestures generally were longer and slower, but not more displaced. The two slowest subjects, however, used the other strategy in their slow-tempo final syllables. Final lengthening in reduced syllables was more difficult to interpret. The relationship between peak velocity and displacement suggested that a lesser stiffness is obscured by an increased gestural amplitude. Thus, by comparison to lengthening for accent, final lengthening is like a localized change in speaking tempo, although it cannot be equated directly with the specification of stiffness.
