Similar Documents
20 similar documents found (search time: 370 ms)
1.
Recent studies have demonstrated that mothers exaggerate phonetic properties of infant-directed (ID) speech. However, these studies focused on a single acoustic dimension (frequency), whereas speech sounds are composed of multiple acoustic cues. Moreover, little is known about how mothers adjust phonetic properties of speech to children with hearing loss. This study examined mothers' production of frequency and duration cues to the American English tense/lax vowel contrast in speech to profoundly deaf (N = 14) and normal-hearing (N = 14) infants, and to an adult experimenter. First and second formant frequencies and vowel duration of tense (/i/, /u/) and lax (/ɪ/, /ʊ/) vowels were measured. Results demonstrated that for both infant groups mothers hyperarticulated the acoustic vowel space and increased vowel duration in ID speech relative to adult-directed speech. Mean F2 values were decreased for the /u/ vowel and increased for the /ɪ/ vowel, and vowel duration was longer for the /i/, /u/, and /ɪ/ vowels in ID speech. However, neither acoustic cue differed in speech to hearing-impaired versus normal-hearing infants. These results suggest that both the formant frequencies and the vowel durations that differentiate American English tense/lax vowel contrasts are modified in ID speech regardless of the hearing status of the addressee.
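Vowel-space hyperarticulation of the kind reported above is commonly quantified as the area of the polygon spanned by mean (F1, F2) positions of the measured vowels. A minimal sketch using the shoelace formula; the formant values below are hypothetical, purely for illustration, and are not taken from the study:

```python
def vowel_space_area(corners):
    """Area (in Hz^2) of the polygon spanned by (F1, F2) vowel means,
    computed with the shoelace formula. Vertices must be listed in
    order around the polygon."""
    n = len(corners)
    s = 0.0
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Hypothetical mean (F1, F2) values in Hz for /i/, /u/, /ʊ/, /ɪ/
ad = [(300, 2300), (350, 900), (450, 1000), (430, 2000)]   # adult-directed
id_ = [(270, 2500), (320, 800), (450, 1000), (430, 2000)]  # infant-directed
print(vowel_space_area(id_) > vowel_space_area(ad))  # prints True
```

A larger area in the ID condition than in the adult-directed condition would indicate an expanded (hyperarticulated) vowel space.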

2.
The ability of baboons to discriminate changes in the formant structures of a synthetic baboon grunt call and an acoustically similar human vowel (/ɛ/) was examined to determine how comparable baboons are to humans in discriminating small changes in vowel sounds, and whether any species-specific advantage in discriminability might exist when baboons discriminate their own vocalizations. Baboons were trained to press and hold down a lever to produce a pulsed train of a standard sound (e.g., /ɛ/ or a baboon grunt call), and to release the lever only when a variant of the sound occurred. Synthetic variants of each sound had the same first and third through fifth formants (F1 and F3-5), but varied in the location of the second formant (F2). Thresholds for F2 frequency changes were 55 and 67 Hz for the grunt and vowel stimuli, respectively, and were not statistically different from one another. Baboons discriminated changes in vowel formant structures comparable to those discriminated by humans. No distinct advantage in discrimination performance was observed when the baboons discriminated the synthetic grunt vocalizations.

3.
A quantitative perceptual model of human vowel recognition based upon psychoacoustic and speech perception data is described. At an intermediate auditory stage of processing, the specific bark difference level of the model represents the pattern of peripheral auditory excitation as the distance in critical bands (barks) between neighboring formants and between the fundamental frequency (F0) and first formant (F1). At a higher, phonetic stage of processing, represented by the critical bark difference level of the model, the transformed vowels may be dichotomously classified based on whether the difference between formants in each dimension falls within or exceeds the critical distance of 3 bark for the spectral center of gravity effect [Chistovich et al., Hear. Res. 1, 185-195 (1979)]. Vowel transformations and classifications correspond well to several major phonetic dimensions and features by which vowels are perceived and traditionally classified. The F1-F0 dimension represents vowel height, and high vowels have F1-F0 differences within 3 bark. The F3-F2 dimension corresponds to vowel place of articulation, and front vowels have F3-F2 differences of less than 3 bark. As an inherent, speaker-independent normalization procedure, the model provides excellent vowel clustering while it greatly reduces between-speaker variability. It offers robust normalization through feature classification because gross binary categorization allows for considerable acoustic variability. There was generally less formant and bark difference variability for closely spaced formants than for widely spaced formants. These findings agree with independently observed perceptual results and support Stevens' quantal theory of vowel production and perceptual constraints on production predicted from the critical bark difference level of the model.
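The model's two stages can be sketched directly: convert F0 and the formants to the bark scale, then apply the 3-bark criterion in the F1-F0 and F3-F2 dimensions. The bark conversion below uses Traunmüller's (1990) approximation, which is one common choice and an assumption here, as are the formant values:

```python
def hz_to_bark(f):
    """Traunmüller (1990) approximation of the Zwicker bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

def vowel_features(f0, f1, f2, f3, critical=3.0):
    """Dichotomous classification at the critical bark difference level:
    F1-F0 < 3 bark -> [+high]; F3-F2 < 3 bark -> [+front]."""
    b0, b1, b2, b3 = (hz_to_bark(f) for f in (f0, f1, f2, f3))
    return {"high": (b1 - b0) < critical, "front": (b3 - b2) < critical}

# Hypothetical formant values for a male /i/ (a high front vowel)
print(vowel_features(f0=120, f1=270, f2=2290, f3=3010))
```

Because the criterion operates on bark differences rather than absolute frequencies, the same classification rule applies across talkers, which is the source of the model's speaker-independent normalization.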

4.
Changes in the speech spectrum of vowels and consonants before and after tonsillectomy were investigated to find out the impact of the operation on speech quality. Speech recordings obtained from patients were analyzed using the Kay Elemetrics Multi-Dimensional Voice Processing (MDVP Advanced) software. Examination of the time-course changes after the operation revealed that certain speech parameters changed. These changes were mainly in F3 (formant center frequency) and B3 (formant bandwidth) for the vowel /o/, together with a slight decrease in B1 and B2 for the vowel /a/. The noise-to-harmonic ratio (NHR) also decreased slightly, suggesting less nasalized vowels. It was also observed that the fricative, glottal consonant /h/ was affected. The larger the tonsils had been, the more changes were seen in the speech spectrum. The changes in the speech characteristics (except F3 and B3 for the vowel /o/) tended to recover, suggesting an involvement of auditory feedback and/or the replacement of the removed tonsils by new soft tissue. Although the changes were minimal and therefore have little effect on the extracted acoustic parameters, they cannot be disregarded for those relying on their voice for professional reasons, that is, singers, professional speakers, and so forth.

7.
A significant body of evidence has accumulated indicating that vowel identification is influenced by spectral change patterns. For example, a large-scale study of vowel formant patterns showed substantial improvements in category separability when a pattern classifier was trained on multiple samples of the formant pattern rather than a single sample at steady state [J. Hillenbrand et al., J. Acoust. Soc. Am. 97, 3099-3111 (1995)]. However, in the earlier study all utterances were recorded in a constant /hVd/ environment. The main purpose of the present study was to determine whether a close relationship between vowel identity and spectral change patterns is maintained when the consonant environment is allowed to vary. Recordings were made of six men and six women producing eight vowels (see text) in isolation and in CVC syllables. The CVC utterances consisted of all combinations of seven initial consonants (/h,b,d,g,p,t,k/) and six final consonants (/b,d,g,p,t,k/). Formant frequencies for F1-F3 were measured every 5 ms during the vowel using an interactive editing tool. Results showed highly significant effects of phonetic environment. As with an earlier study of this type, particularly large shifts in formant patterns were seen for rounded vowels in alveolar environments [K. Stevens and A. House, J. Speech Hear. Res. 6, 111-128 (1963)]. Despite these context effects, substantial improvements in category separability were observed when a pattern classifier incorporated spectral change information. Modeling work showed that many aspects of listener behavior could be accounted for by a fairly simple pattern classifier incorporating F0, duration, and two discrete samples of the formant pattern.
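The feature set named at the end of the abstract (F0, duration, and two discrete samples of the formant pattern) can be illustrated with a deliberately simple stand-in classifier. The nearest-centroid rule below is an illustrative choice, not necessarily the classifier used in the study, and the token values are invented:

```python
import math

def train(samples):
    """Nearest-centroid classifier over feature vectors
    (F0, duration_ms, F1@20%, F2@20%, F1@80%, F2@80%).
    `samples` maps a vowel label to a list of feature vectors."""
    return {v: [sum(x) / len(x) for x in zip(*vecs)]
            for v, vecs in samples.items()}

def classify(centroids, x):
    """Label of the centroid nearest to feature vector x."""
    return min(centroids, key=lambda v: math.dist(centroids[v], x))

# Invented training tokens for two vowels
data = {
    "i":  [(120, 240, 300, 2200, 290, 2300), (130, 250, 310, 2250, 300, 2280)],
    "ae": [(118, 280, 680, 1750, 660, 1700), (125, 290, 700, 1800, 690, 1720)],
}
c = train(data)
print(classify(c, (122, 245, 305, 2230, 295, 2290)))  # prints i
```

Sampling the formant pattern at two time points (rather than one steady-state point) is what lets a classifier of this shape exploit spectral change information.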

8.
The formant hypothesis of vowel perception, under which the lowest two or three formant frequencies are the essential cues for vowel quality perception, is widely accepted. There has, however, been some controversy suggesting that formant frequencies are not sufficient and that the whole spectral shape is necessary for perception. Three psychophysical experiments were performed to study this question. In the first experiment, the first or second formant peak of the stimuli was suppressed as much as possible while still maintaining the original spectral shape. The responses to these stimuli were not radically different from those for the unsuppressed controls. In the second experiment, F2-suppressed stimuli were used whose amplitude ratios of high- to low-frequency components were systematically changed. The results indicate that the ratio changes can affect perceived vowel quality, especially its place of articulation. In the third experiment, full-formant stimuli were used whose amplitude ratios were changed from the original and whose F2's were kept constant. The results suggest that the amplitude ratio is at least as effective as F2 as a cue for place of articulation. We conclude that formant frequencies are not exclusive cues and that the whole spectral shape can be crucial for vowel perception.

10.
The purpose of this study was to examine the acoustic characteristics of children's speech and voices that account for listeners' ability to identify gender. In Experiment I, vocal recordings and gross physical measurements of 4-, 8-, 12-, and 16-year-olds were taken (10 girls and 10 boys per age group). The speech sample consisted of seven nondiphthongal vowels of American English (/æ/ "had," /ɛ/ "head," /i/ "heed," /ɪ/ "hid," /ɑ/ "hod," /ʌ/ "hud," and /u/ "who'd") produced in the carrier phrase, "Say /hVd/ again." Fundamental frequency (f0) and formant frequencies (F1, F2, F3) were measured from these syllables. In Experiment II, 20 adults rated the syllables produced by the children in Experiment I on a six-point gender rating scale. The results from these experiments indicate (1) vowel formant frequencies differentiate gender for children as young as four years of age, while formant frequencies and f0 differentiate gender after 12 years of age, (2) the relationship between gross measures of physical size and vocal characteristics is apparent for at least 12- and 16-year-olds, and (3) listeners can identify gender from the speech and voice of children as young as four years of age, and with respect to young children, listeners appear to base their gender ratings on vowel formant frequencies. The findings are discussed in relation to the development of gender identity and its perceptual representation in speech and voice.

11.
Previous studies [Lisker, J. Acoust. Soc. Am. 57, 1547-1551 (1975); Summerfield and Haggard, J. Acoust. Soc. Am. 62, 435-448 (1977)] have shown that voice onset time (VOT) and the onset frequency of the first formant are important perceptual cues of voicing in syllable-initial plosives. Most prior work, however, has focused on speech perception in quiet environments. The present study seeks to determine which cues are important for the perception of voicing in syllable-initial plosives in the presence of noise. Perceptual experiments were conducted using stimuli consisting of naturally spoken consonant-vowel syllables by four talkers in various levels of additive white Gaussian noise. Plosives sharing the same place of articulation and vowel context (e.g., /pa,ba/) were presented to subjects in two alternate forced choice identification tasks, and a threshold signal-to-noise-ratio (SNR) value (corresponding to the 79% correct classification score) was estimated for each voiced/voiceless pair. The threshold SNR values were then correlated with several acoustic measurements of the speech tokens. Results indicate that the onset frequency of the first formant is critical in perceiving voicing in syllable-initial plosives in additive white Gaussian noise, while the VOT duration is not.
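Stimuli of this kind, i.e., speech degraded by additive white Gaussian noise at a controlled SNR, are generated by scaling the noise to the measured signal power. A minimal sketch; the 1-kHz tone stands in for a recorded speech token:

```python
import math
import random

def add_white_noise(signal, snr_db, seed=0):
    """Return signal plus white Gaussian noise scaled so that the
    long-term SNR (signal power over noise power, in dB) equals snr_db."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    p_sig = sum(s * s for s in signal) / len(signal)
    sigma = math.sqrt(p_sig / 10 ** (snr_db / 10.0))
    return [s + rng.gauss(0.0, sigma) for s in signal]

# A 1-kHz tone at 16-kHz sampling stands in for a speech token;
# degrade it to 0 dB SNR (noise power equal to signal power).
fs = 16000
tone = [math.sin(2 * math.pi * 1000 * n / fs) for n in range(fs)]
noisy = add_white_noise(tone, snr_db=0.0)
```

Sweeping `snr_db` downward and scoring two-alternative forced-choice responses at each level is what yields the threshold SNR described in the abstract.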

12.
Thresholds of vowel formant discrimination for F1 and F2 of isolated vowels with full and partial vowel spectra were measured for normal-hearing listeners at fixed and roving speech levels. Performance of formant discrimination was significantly better for fixed levels than for roving levels with both full and partial spectra. The effect of vowel spectral range was present only for roving levels, but not for fixed levels. These results, consistent with studies of profile analysis, indicated different perceptual mechanisms for listeners to discriminate vowel formant frequency at fixed and roving levels.

13.
This study investigated, first, the effect of stimulus frequency on mismatch negativity (MMN), N1, and P2 components of the cortical auditory event-related potential (ERP) evoked during passive listening to an oddball sequence. The hypothesis was that these components would show frequency-related changes, reflected in their latency and magnitude. Second, the effect of stimulus complexity on those same ERPs was investigated using words and consonant-vowel tokens (CVs) discriminated on the basis of formant change. Twelve normally hearing listeners were tested with tone bursts in the speech frequency range (400/440, 1,500/1,650, and 3,000/3,300 Hz), words (/bæd/ vs /dæd/) and CVs (/bæ/ vs /dæ/). N1 amplitude and latency decreased as frequency increased. P2 amplitude, but not latency, decreased as frequency increased. Frequency-related changes in MMN were similar to those for N1, resulting in a larger MMN area to low-frequency contrasts. N1 amplitude and latency for speech sounds were similar to those found for low tones, but MMN had a smaller area. Overall, MMN was present in 46%-71% of tests for tone contrasts but in only 25%-32% of tests for speech contrasts. The magnitudes of N1 and MMN for tones appear to be closely related, and both reflect the tonotopicity of the auditory cortex.

14.
The present study investigated the relationship between functionally relevant compound gestures and single-articulator component movements of the jaw and the constrictors lower lip and tongue tip during rate-controlled syllable repetitions. In nine healthy speakers, the effects of speaking rate (3 vs 5 Hz), place of articulation, and vowel type during stop consonant-vowel repetitions (/pa/, /pi/, /ta/, /ti/) on the amplitude and peak velocity of differential jaw and constrictor opening-closing movements were measured by means of electromagnetic articulography. Rather than homogeneously scaled compound gestures, the results suggest distinct control mechanisms for the jaw and the constrictors. In particular, jaw amplitude was closely linked to vowel height during bilabial articulation, whereas the lower lip component amplitude turned out to be predominantly rate sensitive. However, the observed variability across subjects and conditions does not support the assumption that single-articulator gestures directly correspond to basic phonological units. The nonhomogeneous effects of speech rate on articulatory subsystem parameters indicate that single structures are differentially rate sensitive. On average, an increase in speech rate resulted in a more or less proportional increase of the steepness of peak velocity/amplitude scaling for jaw movements, whereas the constrictors were less rate sensitive in this respect. Negative covariation across repetitions between jaw and constrictor amplitudes has been considered an indicator of motor equivalence. Although significant in some cases, such a relationship was not consistently observed across subjects. Considering systematic sources of variability such as vowel height, speech rate, and subjects, jaw-constrictor amplitude correlations showed a nonhomogeneous pattern strongly depending on place of articulation.

15.
Vowel identity correlates well with the shape of the transfer function of the vocal tract, in particular the position of the first two or three formant peaks. However, in voiced speech the transfer function is sampled at multiples of the fundamental frequency (F0), and the short-term spectrum contains peaks at those frequencies, rather than at formants. It is not clear how the auditory system estimates the original spectral envelope from the vowel waveform. Cochlear excitation patterns, for example, resolve harmonics in the low-frequency region, and their shape varies strongly with F0. The problem cannot be cured by smoothing: lag-domain components of the spectral envelope are aliased and cause F0-dependent distortion. The problem is acute at high F0s, where the spectral envelope is severely undersampled. This paper treats vowel identification as a process of pattern recognition with missing data. Matching is restricted to available data, and missing data are ignored using an F0-dependent weighting function that emphasizes regions near harmonics. The model is presented in two versions: a frequency-domain version based on short-term spectra, or tonotopic excitation patterns, and a time-domain version based on autocorrelation functions. It accounts for the relative F0-independence observed in vowel identification.
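The missing-data idea can be sketched as a weighted spectral comparison in which the weight is near 1 at multiples of F0 and falls toward 0 between harmonics. The Gaussian bump shape and its bandwidth below are illustrative assumptions, not the published model's exact weighting:

```python
import math

def harmonic_weight(f, f0, rel_bw=0.15):
    """F0-dependent weight: a Gaussian bump on the distance (in units
    of f0) from f to the nearest harmonic of f0. rel_bw controls how
    quickly the weight decays between harmonics (assumed value)."""
    d = abs(f / f0 - round(f / f0))  # 0 at a harmonic, 0.5 midway
    return math.exp(-0.5 * (d / rel_bw) ** 2)

def weighted_distance(env_a, env_b, freqs, f0):
    """Compare two spectral envelopes only where a voice with
    fundamental f0 actually samples them (near its harmonics);
    regions between harmonics are treated as missing data."""
    w = [harmonic_weight(f, f0) for f in freqs]
    num = sum(wi * (a - b) ** 2 for wi, a, b in zip(w, env_a, env_b))
    return num / sum(w)
```

Because only the sampled regions enter the match, two envelopes that differ between harmonics (where the voiced spectrum carries no information) are still treated as similar, which is the mechanism behind the model's F0-independence.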

16.
The effects of age, sex, and vocal tract configuration on the glottal excitation signal in speech are only partially understood, yet understanding these effects is important for both recognition and synthesis of speech as well as for medical purposes. In this paper, three acoustic measures related to the voice source are analyzed for five vowels from 3145 CVC utterances spoken by 335 talkers (8-39 years old) from the CID database [Miller et al., Proceedings of ICASSP, 1996, Vol. 2, pp. 849-852]. The measures are: the fundamental frequency (F0), the difference between the "corrected" (denoted by an asterisk) first two spectral harmonic magnitudes, H1* - H2* (related to the open quotient), and the difference between the "corrected" magnitudes of the first spectral harmonic and the third formant peak, H1* - A3* (related to source spectral tilt). The correction refers to compensating for the influence of formant frequencies on spectral magnitude estimation. Experimental results show that the three acoustic measures are dependent to varying degrees on age and vowel. Age dependencies are more prominent for male talkers, while vowel dependencies are more prominent for female talkers, suggesting a greater vocal tract-source interaction. All talkers show a dependency of F0 on sex and on F3, and of H1* - A3* on vowel type. For low-pitched talkers (F0 ≤ 175 Hz), H1* - H2* is positively correlated with F0, while for high-pitched talkers, H1* - H2* is dependent on F1 or vowel height. For high-pitched talkers there were no significant sex dependencies of H1* - H2* and H1* - A3*. The statistical significance of these results is shown.
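The raw (uncorrected) H1 - H2 measure can be computed by projecting the waveform onto its first two harmonics and comparing magnitudes in dB; the formant correction that yields H1* - H2* in the study is not reproduced here. A sketch on a synthetic waveform with known harmonic amplitudes:

```python
import cmath
import math

def harmonic_db(signal, fs, freq):
    """Magnitude (dB re amplitude 1.0) of the component at `freq`,
    via a DFT projection over an integer number of periods."""
    n = len(signal)
    z = sum(s * cmath.exp(-2j * math.pi * freq * k / fs)
            for k, s in enumerate(signal))
    return 20 * math.log10(abs(z) * 2 / n)

def h1_minus_h2(signal, fs, f0):
    """Uncorrected H1 - H2 in dB (related to the open quotient)."""
    return harmonic_db(signal, fs, f0) - harmonic_db(signal, fs, 2 * f0)

# Synthetic glottal-like waveform: H2 at half the amplitude of H1,
# so the true H1 - H2 is 20*log10(2) ≈ 6 dB.
fs, f0 = 16000, 100
x = [math.sin(2 * math.pi * f0 * k / fs)
     + 0.5 * math.sin(2 * math.pi * 2 * f0 * k / fs)
     for k in range(fs)]  # exactly 100 periods
print(round(h1_minus_h2(x, fs, f0), 1))  # prints 6.0
```

On real speech, the measured harmonic magnitudes are boosted by nearby formants, which is why the study applies the formant-compensating correction before computing the differences.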

17.
The phonetic identification ability of an individual (SS) who exhibits the best, or equal to the best, speech understanding of patients using the Symbion four-channel cochlear implant is described. It has been found that SS: (1) can use aspects of signal duration to form categories that are isomorphic with the phonetic categories established by listeners with normal auditory function; (2) can combine temporal and spectral cues in a normal fashion to form categories; (3) can use aspects of fricative noises to form categories that correspond to normal phonetic categories; (4) uses information from both F1 and higher formants in vowel identification; and (5) appears to identify stop consonant place of articulation on the basis of information provided by the center frequency of the burst and by the abruptness of frequency change following signal onset. SS has difficulty identifying stop consonants from the information provided by formant transitions and cannot differentially identify signals that have identical F1's and relatively low-frequency F2's. SS's performance suggests that simple speech processing strategies (filtering of the signal into four bands) and monopolar electrode design are viable options in the design of cochlear prostheses.

18.
Previous experiments on the perception of initial stops differing in voice-onset time have used sounds where the boundary between aspiration and voicing is clearly marked by a variety of acoustic events. In natural speech these events do not always occur at the same moment, and there is disagreement among phoneticians as to which marks the onset of voicing. The three experiments described here examine how the phoneme boundary between syllable-initial, prestress /b/ and /p/ is influenced by the way in which voicing starts. In the first experiment the first 30 ms of buzz excitation is played at four different levels relative to the steady state of the vowel and with two different frequency distributions: in the F1-only conditions buzz is confined to the first formant, whereas in the F123 conditions all three formants are excited by buzz. The results reject the hypothesis that voicing is perceived to start when periodic excitation is present in the first formant. The results of the third experiment show also that buzz excitation confined to the fundamental frequency for 30 ms before the onset of full voicing (formant excitation) has little effect on the voicing boundary. The second experiment varies aspiration noise intensity and buzz onset intensity independently. Together with the first experiment it shows that: (1) at all buzz onset levels, a change in aspiration intensity moves the boundary by about the 0.43 ms/dB found by Repp [Lang. Speech 27, 173-189 (1979)]; (2) when buzz onsets at levels greater than -15 dB relative to the final vowel level, changes in buzz onset level again move the /b/-/p/ boundary by the same amount; (3) when buzz onsets at levels less than -15 dB relative to the vowel, decreasing the buzz onset level gives more /p/ percepts than Repp's ratio predicts. This last result, taken with the results of the first experiment, may reflect a decision based on overall intensity about when voicing has started.

19.
The effect of speaking rate variations on second formant (F2) trajectories was investigated for a continuum of rates. F2 trajectories for the schwa preceding a voiced bilabial stop, and one of three target vocalic nuclei following the stop, were generated for utterances of the form "Put a bV here," where V was /i/, /æ/, or /ɔɪ/. Discrete spectral measures at the vowel-consonant and consonant-vowel interfaces, as well as vowel target values, were examined as potential parameters of rate variation; several different whole-trajectory analyses were also explored. Results suggested that a discrete measure at the vowel-consonant (schwa-consonant) interface, the F2off value, was in many cases a good index of rate variation, provided the rates were not unusually slow (vowel durations less than 200 ms). The relationship of the spectral measure at the consonant-vowel interface, F2 onset, as well as that of the "target" for this vowel, was less clearly related to rate variation. Whole-trajectory analyses indicated that the rate effect cannot be captured by linear compressions and expansions of some prototype trajectory. Moreover, the effect of rate manipulation on formant trajectories interacts with speaker and vocalic nucleus type, making it difficult to specify general rules for these effects. However, there is evidence that a small number of speaker strategies may emerge from a careful qualitative and quantitative analysis of whole formant trajectories. Results are discussed in terms of models of speech production and a group of speech disorders that is usually associated with anomalies of speaking rate, and hence of formant frequency trajectories.

20.
Speech coding in the auditory nerve: III. Voiceless fricative consonants   (Cited 1 time: 0 self-citations, 1 by others)
Responses of auditory-nerve fibers in anesthetized cats were recorded for synthetic voiceless fricative consonants. The four stimuli (/x/, /ʃ/, /s/, and /f/) were presented at two levels corresponding to speech in which the levels of the vowels would be approximately 60 and 75 dB SPL, respectively. Discharge patterns were characterized in terms of PST histograms and their power spectra. For both stimulus levels, frequency regions in which the stimuli had considerable energy corresponded well with characteristic-frequency (CF) regions in which average discharge rates were the highest. At the higher level, the profiles of discharge rate against CF were more distinctive for the stimulus onset than for the central portion. Power spectra of PST histograms had large response components near fiber characteristic frequencies for CFs up to 3-4 kHz, as well as low-frequency components for all fibers. The relative amplitudes of these components varied for the different stimuli. In general, the formant frequencies of the fricatives did not correspond with the largest response components, except for formants below about 3 kHz. Processing schemes based on fine time patterns of discharge that were effective for vowel stimuli generally failed to extract the formant frequencies of fricatives.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号