Similar Documents
20 similar documents retrieved.
1.
In stuttered repetitions of a syllable, the vowel that occurs often sounds like schwa even when schwa is not intended. In this article, acoustic analyses are reported which show that the spectral properties of stuttered vowels are similar to those of the following fluent vowel, so it would appear that the stutterers are articulating the vowel appropriately. Though the spectral properties of the stuttered vowels are normal, other properties are unusual: the stuttered vowels are low in amplitude and short in duration. In two experiments, the effects of amplitude and duration on perception of these vowels are examined. It is shown that, if the amplitude of stuttered vowels is made normal and their duration is lengthened, they sound more like the intended vowels. These experiments lead to the conclusion that low amplitude and short duration are the factors that cause stuttered vowels to sound like schwa. This differs from the view of certain clinicians and theorists who contend that stutterers actually articulate schwas when these are heard in stuttered speech. Implications for stuttering therapy are considered.
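The amplitude and duration manipulations examined in these experiments can be pictured digitally. A minimal sketch, not the authors' procedure (the waveforms, sampling rate, and pitch-period value below are placeholder assumptions): scale a stuttered vowel to the RMS level of the following fluent vowel, and lengthen it crudely by tiling whole pitch periods.

    import numpy as np

    def match_rms(x, reference):
        # Scale x so its RMS amplitude equals that of the reference vowel.
        return x * np.sqrt(np.mean(reference ** 2) / np.mean(x ** 2))

    def lengthen(x, period_samples, target_len):
        # Crude lengthening: repeat one pitch period until target_len samples
        # (a stand-in for the editing used in the perceptual experiments).
        reps = int(np.ceil(target_len / period_samples))
        return np.tile(x[:period_samples], reps)[:target_len]

    fs = 16000
    stuttered = 0.01 * np.random.randn(800)    # placeholder: brief, quiet vowel
    fluent = 0.1 * np.random.randn(2400)       # placeholder: fluent vowel
    edited = lengthen(match_rms(stuttered, fluent),
                      period_samples=160,      # assumed 100-Hz pitch period
                      target_len=len(fluent))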

2.
The vowel in part-word repetitions in stuttered speech often sounds neutralized. In the present article, measurements of the excitatory source made during such episodes of dysfluency are reported. These measurements show that, compared with fluent utterances, the glottal volume velocities are lower in amplitude and shorter in duration and that the energy occurs more towards the low-frequency end of the spectrum. In a first perceptual experiment, the effects of varying the amplitude and duration of the glottal source were assessed. The glottal volume velocity recordings of the /æ/ vowels used in the analyses were employed as driving sources for an articulatory synthesizer so that judgments about the vowel quality could be made. With dysfluent glottal sources (either as spoken or by editing a fluent source so that it was low in amplitude and brief), the vowels sounded more neutralized than with fluent glottal sources (as spoken or by editing a dysfluent source to increase its amplitude and lengthen it). In a second perceptual experiment, synthetic glottal volume velocities were used to verify these findings and to assess the influence of the low-frequency emphasis in the dysfluent speech. This experiment showed that spectral bias and duration both cause stuttered vowels to sound neutralized.
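The idea of driving a synthesizer with a glottal source can be pictured with a basic source-filter model. A minimal sketch, not the articulatory synthesizer used in the study: an impulse train stands in for the recorded glottal volume velocity and excites a cascade of second-order formant resonators, with rough textbook /æ/ formant values as assumptions.

    import numpy as np
    from scipy.signal import lfilter

    def formant_resonator(x, f, bw, fs):
        # Second-order (Klatt-style) resonator at centre frequency f (Hz)
        # with bandwidth bw (Hz).
        T = 1.0 / fs
        C = -np.exp(-2.0 * np.pi * bw * T)
        B = 2.0 * np.exp(-np.pi * bw * T) * np.cos(2.0 * np.pi * f * T)
        A = 1.0 - B - C
        return lfilter([A], [1.0, -B, -C], x)

    fs = 10000
    n = int(0.3 * fs)
    # Impulse train at 100 Hz stands in for the glottal volume velocity;
    # attenuating and truncating it mimics the dysfluent sources above.
    source = (np.arange(n) % (fs // 100) == 0).astype(float)
    vowel = source
    for f, bw in [(660.0, 80.0), (1720.0, 100.0), (2410.0, 120.0)]:
        vowel = formant_resonator(vowel, f, bw, fs)   # rough /æ/ formants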

3.
Studies with adults have demonstrated that acoustic cues cohere in speech perception such that two stimuli cannot be discriminated if separate cues bias responses equally, but oppositely, in each. This study examined whether this kind of coherence exists for children's perception of speech signals, a test that first required that a contrast be found for which adults and children show similar cue weightings. Accordingly, experiment 1 demonstrated that adults, 7-, and 5-year-olds weight F2-onset frequency and gap duration similarly in "spa" versus "sa" decisions. In experiment 2, listeners of these same ages made "same" or "not-the-same" judgments for pairs of stimuli in an AX paradigm when only one cue differed, when the two cues were set within a stimulus to bias the phonetic percept towards the same category (relative to the other stimulus in the pair), and when the two cues were set within a stimulus to bias the phonetic percept towards different categories. Unexpectedly, adults' results contradicted earlier studies: They were able to discriminate stimuli when the two cues conflicted in how they biased phonetic percepts. Results for 7-year-olds replicated those of adults, but were not as strong. Only the results of 5-year-olds revealed the kind of perceptual coherence reported by earlier studies for adults. Thus, it is concluded that perceptual coherence for speech signals is present from an early age, and in fact listeners learn to overcome it under certain conditions.

4.
Speech can remain intelligible for listeners with normal hearing when processed by narrow bandpass filters that transmit only a small fraction of the audible spectrum. Two experiments investigated the basis for the high intelligibility of narrowband speech. Experiment 1 confirmed reports that everyday English sentences can be recognized accurately (82%-98% words correct) when filtered at center frequencies of 1500, 2100, and 3000 Hz. However, narrowband low predictability (LP) sentences were less accurately recognized than high predictability (HP) sentences (20% lower scores), and excised narrowband words were even less intelligible than LP sentences (a further 23% drop). While experiment 1 revealed similar levels of performance for narrowband and broadband sentences at conversational speech levels, experiment 2 showed that speech reception thresholds were substantially (>30 dB) poorer for narrowband sentences. One explanation for this increased disparity between narrowband and broadband speech at threshold (compared to conversational speech levels) is that spectral components in the sloping transition bands of the filters provide important cues for the recognition of narrowband speech, but these components become inaudible as the signal level is reduced. Experiment 2 also showed that performance was degraded by the introduction of a speech masker (a single competing talker). The elevation in threshold was similar for narrowband and broadband speech (11 dB, on average), but because the narrowband sentences required considerably higher sound levels to reach their thresholds in quiet compared to broadband sentences, their target-to-masker ratios were very different (+23 dB for narrowband sentences and -12 dB for broadband sentences). As in experiment 1, performance was better for HP than LP sentences. The LP-HP difference was larger for narrowband than broadband sentences, suggesting that context provides greater benefits when speech is distorted by narrow bandpass filtering.
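Narrowband speech of the kind used in experiment 1 can be approximated with a steep bandpass filter. A minimal sketch under assumed parameters (the fractional-octave bandwidth and filter order are placeholders, not the study's specification):

    import numpy as np
    from scipy.signal import butter, sosfiltfilt

    def narrowband(x, fc, fs, frac=1.0 / 3.0, order=8):
        # Bandpass speech around centre frequency fc. A high order gives
        # the steep, sloping transition bands discussed above.
        lo = fc * 2.0 ** (-frac / 2.0)
        hi = fc * 2.0 ** (frac / 2.0)
        sos = butter(order, [lo, hi], btype="bandpass", fs=fs, output="sos")
        return sosfiltfilt(sos, x)

    # e.g., the three centre frequencies of experiment 1:
    # for fc in (1500, 2100, 3000): filtered = narrowband(speech, fc, fs=16000)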

5.
The goals of the present study were to measure acoustic temporal modulation transfer functions (TMTFs) in cochlear implant listeners and examine the relationship between modulation detection and speech recognition abilities. The effects of automatic gain control, presentation level and number of channels on modulation detection thresholds (MDTs) were examined using the listeners' clinical sound processor. The general form of the TMTF was low-pass, consistent with previous studies. The operation of automatic gain control had no effect on MDTs when the stimuli were presented at 65 dBA. MDTs were not dependent on the presentation levels (ranging from 50 to 75 dBA) nor on the number of channels. Significant correlations were found between MDTs and speech recognition scores. The rates of decay of the TMTFs were predictive of speech recognition abilities. Spectral-ripple discrimination was evaluated to examine the relationship between temporal and spectral envelope sensitivities. No correlations were found between the two measures, and 56% of the variance in speech recognition was predicted jointly by the two tasks. The present study suggests that temporal modulation detection measured with the sound processor can serve as a useful measure of the ability of clinical sound processing strategies to deliver clinically pertinent temporal information.
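A TMTF of this kind is measured with sinusoidally amplitude-modulated stimuli whose modulation depth is varied until the listener can no longer distinguish them from an unmodulated stimulus. A minimal sketch of such a stimulus, assuming a white-noise carrier (the study's carrier and adaptive procedure are not specified here):

    import numpy as np

    def sam_noise(fm, depth_db, dur, fs, seed=0):
        # Sinusoidally amplitude-modulated noise; depth_db = 20*log10(m),
        # so 0 dB is full modulation and more negative values are shallower.
        m = 10 ** (depth_db / 20)
        t = np.arange(int(dur * fs)) / fs
        carrier = np.random.default_rng(seed).standard_normal(len(t))
        return (1 + m * np.sin(2 * np.pi * fm * t)) * carrier

    # A TMTF is traced out by finding, at each modulation rate fm, the
    # smallest depth the listener can tell apart from unmodulated noise.
    probe = sam_noise(fm=8.0, depth_db=-12.0, dur=1.0, fs=16000)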

6.
Auditory perception of vowels and consonants in speech
This paper reviews research on the auditory perception of vowels and consonants in speech. Authoritative experiments based on nonsense syllables, conducted more than 80 years ago, indicated that consonants are more important for human speech perception; owing to the academic standing and authority of the experimenters, this conclusion became received wisdom until, nearly 20 years ago, experiments based on natural sentences challenged it and triggered a new round of research. The paper systematically surveys the relative importance of vowels and consonants for speech perception, the influence on speech perception of steady-state information and dynamic boundary information in vowels and consonants, and potential applications of the related research, and closes with a summary and outlook.

7.
The perceptual mechanisms of assimilation and contrast in the phonetic perception of vowels were investigated. In experiment 1, 14 stimulus continua were generated using an /i/-/e/-/a/ vowel continuum. They ranged from a continuum with both ends belonging to the same phonemic category in Japanese, to a continuum with both ends belonging to different phonemic categories. The AXB method was employed and the temporal position of X was changed under three conditions. In each condition ten subjects were required to judge whether X was similar to A or to B. The results demonstrated that assimilation to the temporally closer sound occurs if the phonemic categories of A and B are the same and that contrast to the temporally closer sound occurs if A and B belong to different phonemic categories. It was observed that the transition from assimilation to contrast is continuous except in the /i'/-X-/e/ condition. In experiment 2, the total duration of t1 (between A and X) and t2 (between X and B) was changed under five conditions. One stimulus continuum consisted of the same phonemic category in Japanese and the other consisted of different phonemic categories. Six subjects were required to make similarity judgements of X. The results demonstrated that the occurrence of assimilation and contrast to the temporally closer sound seemed to be constant under each of the five conditions. The present findings suggest that assimilation and contrast are determined by three factors: the temporal position of the three stimuli, the acoustic distance between the three stimuli on the stimulus continuum, and the phonemic categories of the three stimuli.

8.
Japanese 5- to 13-yr-olds who used cochlear implants (CIs) and a comparison group of normally hearing (NH) Japanese children were tested on their perception and production of speech prosody. For the perception task, they were required to judge whether semantically neutral utterances that were normalized for amplitude were spoken in a happy, sad, or angry manner. The performance of NH children was error-free. By contrast, child CI users performed well below ceiling but above chance levels on happy- and sad-sounding utterances but not on angry-sounding utterances. For the production task, children were required to imitate stereotyped Japanese utterances expressing disappointment and surprise as well as culturally typical representations of crow and cat sounds. NH 5- and 6-year-olds produced significantly poorer imitations than older hearing children, but age was unrelated to the imitation quality of child CI users. Overall, child CI users' imitations were significantly poorer than those of NH children, but they did not differ significantly from the imitations of the youngest NH group. Moreover, there was a robust correlation between the performance of child CI users on the perception and production tasks; this implies that difficulties with prosodic perception underlie their difficulties with prosodic imitation.

9.
The signals of running speech and sustained vowels of normals and subjects suffering from dysphonia were analyzed statistically with respect to the signal-to-noise ratio (SNR). The distribution of the SNR measured in multiple overlapping frames in the speech signal was described by a linear combination of the distribution frequencies for SNR = 0 dB, 0 dB < SNR < 15 dB, and SNR ≥ 15 dB. The values of the linear combination, the SNR of the vowels, and clinical assignment of the voices to normal and pathologic populations based on laryngoscopic and stroboscopic investigation parameters were used to compare the different evaluations of the voices. The SNR distribution in speech remained stable over signal lengths of more than 30 s. The correlation coefficient between the SNR measure for running speech and the SNR of sustained vowels amounted to only 0.63. The error rate in the discrimination between normal and dysphonic voices amounted to 22.6% in application to sustained vowels and 5.6% when the SNR distribution was used. Possible reasons for the observed discrepancies are discussed, and the results are compared to those of other studies.
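The three-component description of the frame-wise SNR distribution amounts to binning per-frame SNR values. A minimal sketch, assuming the per-frame SNRs (in dB) from the overlapping frames are already available:

    import numpy as np

    def snr_distribution(frame_snr_db):
        # Relative frequencies of frames at SNR = 0 dB (treated here as the
        # floor bin, an assumption), 0 dB < SNR < 15 dB, and SNR >= 15 dB.
        s = np.asarray(frame_snr_db, dtype=float)
        p_floor = np.mean(s <= 0.0)
        p_mid = np.mean((s > 0.0) & (s < 15.0))
        p_high = np.mean(s >= 15.0)
        return p_floor, p_mid, p_high

    # A linear combination w0*p_floor + w1*p_mid + w2*p_high then serves as
    # the running-speech voice measure compared with the sustained-vowel SNR.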

10.
11.
The extent to which it is necessary to model the dynamic behavior of vowel formants to enable vowel separation has been the subject of debate in recent years. To investigate this issue, a study has been made on the vowels of 132 Australian English speakers (male and female). The degree of vowel separation from the formant values at the target was contrasted to that from modeling the formant contour with discrete cosine transform coefficients. The findings are that, although it is necessary to model the formant contour to separate out the diphthongs, the formant values at the target, plus vowel duration are sufficient to separate out the monophthongs. However, further analysis revealed that there are formant contour differences which benefit the within-class separation of the tense/lax monophthong pairs.
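Modeling a formant contour with discrete cosine transform coefficients reduces each track to a few numbers: the zeroth coefficient reflects the mean, the first the tilt, the second the curvature. A minimal sketch, assuming an F2 track sampled at equal intervals across the vowel (the values are illustrative):

    import numpy as np
    from scipy.fft import dct

    def formant_dct(track, k=3):
        # First k DCT-II coefficients of a formant track: coefficient 0
        # carries the mean, 1 the tilt, 2 the curvature of the contour.
        return dct(np.asarray(track, dtype=float), type=2, norm="ortho")[:k]

    # Illustrative rising F2 contour (Hz) of a diphthong-like token:
    coeffs = formant_dct([1100.0, 1250.0, 1450.0, 1700.0, 1900.0])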

12.
Most investigators agree that the acoustic information for American English vowels includes dynamic (time-varying) parameters as well as static "target" information contained in a single cross section of the syllable. Using the silent-center (SC) paradigm, the present experiment examined the case in which the initial and final portions of stop consonant-vowel-stop consonant (CVC) syllables containing the same vowel but different consonants were recombined into mixed-consonant SC syllables and presented to listeners for vowel identification. Ten vowels were spoken in six different syllables, /bVb, bVd, bVt, dVb, dVd, dVt/, embedded in a carrier sentence. Initial and final transitional portions of these syllables were cross-matched in: (1) silent-center syllables with original syllable durations (silences) preserved (mixed-consonant SC condition) and (2) mixed-consonant SC syllables with syllable duration equated across the ten vowels (fixed duration mixed-consonant SC condition). Vowel-identification accuracy in these two mixed-consonant SC conditions was compared with performance on the original SC and fixed duration SC stimuli, and in initial and final control conditions in which initial and final transitional portions were each presented alone. Vowels were identified highly accurately in both mixed-consonant SC and original syllable SC conditions (only 7%-8% overall errors). Neutralizing duration information led to small, but significant, increases in identification errors in both mixed-consonant and original fixed-duration SC conditions (14%-15% errors), but performance was still much more accurate than for initial and final control conditions (35% and 52% errors, respectively). Acoustical analysis confirmed that direction and extent of formant change from initial to final portions of mixed-consonant stimuli differed from that of original syllables, arguing against a target + offglide explanation of the perceptual results. Results do support the hypothesis that temporal trajectories specifying "style of movement" provide information for the differentiation of American English tense and lax vowels, and that this information is invariant over the place of articulation and voicing of the surrounding stop consonants.

13.
The purpose of this study was to use vocal tract simulation and synthesis as means to determine the acoustic and perceptual effects of changing both the cross-sectional area and location of vocal tract constrictions for six different vowels. Area functions at and near vocal tract constrictions are considered critical to the acoustic output and are also the central point of hypotheses concerning speech targets. Area functions for the six vowels, [symbol: see text], were perturbed by changing the cross-sectional area of the constriction (Ac) and the location of the constriction (Xc). Perturbations for Ac were performed for different values of Xc, producing several series of acoustic continua for the different vowels. Acoustic simulations for the different area functions were made using a frequency domain model of the vocal tract. Each simulated vowel was then synthesized as a 1-s duration steady-state segment. The phoneme boundaries of the perturbed synthesized vowels were determined by formal perception tests. Results of the perturbation analyses showed that formants for each of the vowels were more sensitive to changes in constriction cross-sectional area than changes in constriction location. Vowel perception, however, was highly resistant to both types of changes. Results are discussed in terms of articulatory precision and constriction-related speech production strategies.

14.
If two vowels with different fundamental frequencies (f0's) are presented simultaneously and monaurally, listeners often hear two talkers producing different vowels on different pitches. This paper describes the evaluation of four computational models of the auditory and perceptual processes which may underlie this ability. Each model involves four stages: (i) frequency analysis using an "auditory" filter bank, (ii) determination of the pitches present in the stimulus, (iii) segregation of the competing speech sources by grouping energy associated with each pitch to create two derived spectral patterns, and (iv) classification of the derived spectral patterns to predict the probabilities of listeners' vowel-identification responses. The "place" models carry out the operations of pitch determination and spectral segregation by analyzing the distribution of rms levels across the channels of the filter bank. The "place-time" models carry out these operations by analyzing the periodicities in the waveforms in each channel. In their "linear" versions, the place and place-time models operate directly on the waveforms emerging from the filters. In their "nonlinear" versions, analogous operations are applied to the output of an additional stage which applied a compressive nonlinearity to the filtered waveforms. Compared to the other three models, the nonlinear place-time model provides the most accurate estimates of the f0's of pairs of concurrent synthetic vowels and comes closest to predicting the identification responses of listeners to such stimuli. Although the model has several limitations, the results are compatible with the idea that a place-time analysis is used to segregate competing sound sources.
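The contrast between the "place" and "place-time" stages can be sketched on the outputs of any auditory filter bank. A minimal illustration, assuming channel waveforms are already available; the gammatone filtering and the full two-f0 search over vowel pairs are omitted:

    import numpy as np

    def place_profile(channels):
        # "Place" analysis: one RMS level per filter-bank channel.
        return [np.sqrt(np.mean(ch ** 2)) for ch in channels]

    def place_time_f0(channels, fs, fmin=80.0, fmax=300.0):
        # "Place-time" analysis: pool autocorrelations across channels and
        # take the lag of the largest summary peak as the dominant f0.
        # (A second vowel's f0 would be sought after cancelling the first
        # periodicity, which is omitted here.)
        lags = np.arange(int(fs / fmax), int(fs / fmin))
        summary = np.zeros(len(lags))
        for ch in channels:
            ac = np.correlate(ch, ch, mode="full")[len(ch) - 1:]
            summary += ac[lags]
        return fs / lags[np.argmax(summary)]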

15.
Telecommunications fraud is a frequent form of new-style crime, and a method that can automatically and reliably distinguish genuine from fake speech is urgently needed. To further strengthen the ability of current deep-learning systems to identify synthetic speech and to provide technical support for protecting the security of speech information, this study exploits the ways in which synthetic speech differs acoustically from genuine speech, analyzing and comparing the acoustic properties of the two. An acoustic feature, the root-mean-square angle, was designed to quantify the degree of variation in speech intensity; fused with the fundamental-frequency change rate and narrowband-spectrogram features, it quantifies these acoustic differences and focuses on the key acoustic information in synthetic speech. Feeding the fused acoustic features into a neural network model yielded an equal error rate of 0.6% on the validation set of the FoR dataset and a best equal error rate of 10.8% on the test set. The study thus achieves identification of synthetic speech, confirming the effectiveness of the acoustic features and the feasibility of the approach, and to some extent broadens the research avenues for designing synthetic-speech features.
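The equal error rate quoted above is the operating point at which the rate of rejecting genuine speech equals the rate of accepting synthetic speech. A minimal sketch of computing it from detection scores (the score arrays below are placeholders, not outputs of the described network):

    import numpy as np

    def equal_error_rate(genuine_scores, spoof_scores):
        # Sweep thresholds; EER is where the false-rejection rate on genuine
        # speech meets the false-acceptance rate on synthetic speech.
        thresholds = np.sort(np.concatenate([genuine_scores, spoof_scores]))
        frr = np.array([np.mean(genuine_scores < t) for t in thresholds])
        far = np.array([np.mean(spoof_scores >= t) for t in thresholds])
        i = np.argmin(np.abs(frr - far))
        return (frr[i] + far[i]) / 2

    # Placeholder scores (higher = more likely genuine):
    rng = np.random.default_rng(0)
    eer = equal_error_rate(rng.normal(2.0, 1.0, 1000), rng.normal(0.0, 1.0, 1000))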

16.
17.
The ability of listeners to identify pairs of simultaneous synthetic vowels has been investigated in the first of a series of studies on the extraction of phonetic information from multiple-talker waveforms. Both members of the vowel pair had the same onset and offset times and a constant fundamental frequency of 100 Hz. Listeners identified both vowels with an accuracy significantly greater than chance. The pattern of correct responses and confusions was similar for vowels generated by (a) cascade formant synthesis and (b) additive harmonic synthesis that replaced each of the lowest three formants with a single pair of harmonics of equal amplitude. In order to choose an appropriate model for describing listeners' performance, four pattern-matching procedures were evaluated. Each predicted the probability that (i) any individual vowel would be selected as one of the two responses, and (ii) any pair of vowels would be selected. These probabilities were estimated from measures of the similarities of the auditory excitation patterns of the double vowels to those of single-vowel reference patterns. Up to 88% of the variance in individual responses and up to 67% of the variance in pairwise responses could be accounted for by procedures that highlighted spectral peaks and shoulders in the excitation pattern. Procedures that assigned uniform weight to all regions of the excitation pattern gave poorer predictions. These findings support the hypothesis that the auditory system pays particular attention to the frequencies of spectral peaks, and possibly also of shoulders, when identifying vowels. One virtue of this strategy is that the spectral peaks and shoulders can indicate the frequencies of formants when other aspects of spectral shape are obscured by competing sounds.
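The peak-highlighting pattern-matching procedures can be pictured as a weighted distance between excitation patterns. A minimal sketch of one such measure (the weighting scheme is an illustrative assumption, not the paper's exact procedure):

    import numpy as np

    def peak_weighted_distance(test_pattern, ref_pattern, peak_weight=4.0):
        # Distance between two excitation patterns (dB per channel) that
        # up-weights channels at local maxima of the test pattern, standing
        # in for the peak/shoulder-highlighting procedures described above.
        t = np.asarray(test_pattern, dtype=float)
        r = np.asarray(ref_pattern, dtype=float)
        w = np.ones(len(t))
        peaks = (t[1:-1] >= t[:-2]) & (t[1:-1] >= t[2:])   # local maxima
        w[1:-1][peaks] = peak_weight
        return np.sum(w * (t - r) ** 2) / np.sum(w)

    # Response probabilities can then be modeled as decreasing functions of
    # this distance to each single-vowel reference pattern.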

18.
This study considers an operation of an auditory spectral integration process which may be involved in perceiving dynamic time-varying changes in speech found in diphthongs and glide-type transitions. Does the auditory system need explicit vowel formants to track the dynamic changes over time? Listeners classified diphthongs on the basis of a moving center of gravity (COG) brought about by changing intensity ratio of static spectral components instead of changing an F2. Listeners were unable to detect COG movement only when the F2 change was small (160 Hz) or when the separation between the static components was large (4.95 bark).
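The moving center of gravity is a weighted mean frequency of the static components. A minimal worked sketch (component frequencies and ratios are illustrative, not the study's values): shifting level from the lower to the upper component sweeps the COG upward without any component changing frequency.

    import numpy as np

    def cog(freqs, amps):
        # Amplitude-weighted mean frequency of static spectral components.
        a = np.asarray(amps, dtype=float)
        return float(np.sum(np.asarray(freqs) * a) / np.sum(a))

    # Mimics an F2 glide while the components stay fixed in frequency:
    for ratio in (4.0, 1.0, 0.25):            # lower-to-upper amplitude ratio
        print(cog([1200.0, 2000.0], [ratio, 1.0]))   # 1360, 1600, 1840 Hz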

19.
20.
Native Italian speakers' perception and production of English vowels
This study examined the production and perception of English vowels by highly experienced native Italian speakers of English. The subjects were selected on the basis of the age at which they arrived in Canada and began to learn English, and how much they continued to use Italian. Vowel production accuracy was assessed through an intelligibility test in which native English-speaking listeners attempted to identify vowels spoken by the native Italian subjects. Vowel perception was assessed using a categorial discrimination test. The later in life the native Italian subjects began to learn English, the less accurately they produced and perceived English vowels. Neither of two groups of early Italian/English bilinguals differed significantly from native speakers of English either for production or perception. This finding is consistent with the hypothesis of the speech learning model [Flege, in Speech Perception and Linguistic Experience: Theoretical and Methodological Issues (York Press, Timonium, MD, 1995)] that early bilinguals establish new categories for vowels found in the second language (L2). The significant correlation observed to exist between the measures of L2 vowel production and perception is consistent with another hypothesis of the speech learning model, viz., that the accuracy with which L2 vowels are produced is limited by how accurately they are perceived.
