Similar Articles
20 similar articles found (search time: 839 ms)
1.
The experiment reported here explores the ability of 4- to 5-day-old neonates to discriminate consonantal place of articulation and vowel quality using shortened CV syllables similar to those used by Blumstein and Stevens [J. Acoust. Soc. Am. 67, 648-662 (1980)], without vowel steady-state information. The results show that the initial 34-44 ms of CV stimuli provide infants with sufficient information to discriminate place of articulation differences in stop consonants ([ba] vs [da], [ba] vs [ga], [bi] vs [di], and [bi] vs [gi]) and following vowel quality ([ba] vs [bi], [da] vs [di], and [ga] vs [gi]). These results suggest that infants can discriminate syllables on the basis of the onset properties of CV signals. Furthermore, this experiment indicates that neonates require little or no exposure to speech to succeed in such a discrimination task.

2.
Previous studies [Lisker, J. Acoust. Soc. Am. 57, 1547-1551 (1975); Summerfield and Haggard, J. Acoust. Soc. Am. 62, 435-448 (1977)] have shown that voice onset time (VOT) and the onset frequency of the first formant are important perceptual cues of voicing in syllable-initial plosives. Most prior work, however, has focused on speech perception in quiet environments. The present study seeks to determine which cues are important for the perception of voicing in syllable-initial plosives in the presence of noise. Perceptual experiments were conducted using stimuli consisting of naturally spoken consonant-vowel syllables by four talkers in various levels of additive white Gaussian noise. Plosives sharing the same place of articulation and vowel context (e.g., /pa,ba/) were presented to subjects in two-alternative forced-choice identification tasks, and a threshold signal-to-noise ratio (SNR) value (corresponding to the 79% correct classification score) was estimated for each voiced/voiceless pair. The threshold SNR values were then correlated with several acoustic measurements of the speech tokens. Results indicate that the onset frequency of the first formant is critical in perceiving voicing in syllable-initial plosives in additive white Gaussian noise, while the VOT duration is not.
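The threshold estimation step described above lends itself to a short worked example. The sketch below (Python; purely illustrative, with made-up identification scores and a logistic psychometric function that the cited study may not have used) shows one way a threshold SNR corresponding to 79% correct could be read off a fit to two-alternative forced-choice data.

    import numpy as np
    from scipy.optimize import curve_fit

    # Illustrative sketch only: fit a logistic psychometric function to
    # two-alternative forced-choice identification scores and read off the SNR
    # that yields 79% correct. All data values are hypothetical.

    def psychometric(snr_db, midpoint, slope):
        """Logistic function rising from chance (50% in 2AFC) toward 100%."""
        return 0.5 + 0.5 / (1.0 + np.exp(-slope * (snr_db - midpoint)))

    snr_db = np.array([-15.0, -10.0, -5.0, 0.0, 5.0, 10.0])     # hypothetical SNRs (dB)
    p_correct = np.array([0.52, 0.58, 0.71, 0.86, 0.95, 0.99])  # hypothetical scores

    (midpoint, slope), _ = curve_fit(psychometric, snr_db, p_correct, p0=[0.0, 0.5])

    # Invert the fitted function at 79% correct to obtain the threshold SNR.
    target = 0.79
    threshold_snr = midpoint - np.log(0.5 / (target - 0.5) - 1.0) / slope
    print(f"Threshold SNR at {target:.0%} correct: {threshold_snr:.1f} dB")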

3.
Amplitude change at consonantal release has been proposed as an invariant acoustic property distinguishing between the classes of stops and glides [Mack and Blumstein, J. Acoust. Soc. Am. 73, 1739-1750 (1983)]. Following procedures of Mack and Blumstein, we measured the amplitude change in the vicinity of the consonantal release for two speakers. The results for one speaker matched those of Mack and Blumstein, while those for the second speaker showed some differences. In a subsequent experiment, we tested the hypothesis that a difference in amplitude change serves as an invariant perceptual cue for distinguishing between continuants and noncontinuants, and more specifically, as a critical cue for identifying stops and glides [Shinn and Blumstein, J. Acoust. Soc. Am. 75, 1243-1252 (1984)]. Interchanging the amplitude envelopes of natural /bV/ and /wV/ syllables containing the same vowel had little effect on perception: 97% of all syllables were identified as originally produced. Thus, although amplitude change in the vicinity of consonantal release may distinguish acoustically between stops and glides with some consistency, the change is not fully invariant, and certainly does not seem to be a critical perceptual cue in natural speech.

4.
The contribution of the nasal murmur and the vocalic formant transitions to perception of the [m]-[n] distinction in utterance-initial position preceding [i,a,u] was investigated, extending the recent work of Kurowski and Blumstein [J. Acoust. Soc. Am. 76, 383-390 (1984)]. A variety of waveform-editing procedures were applied to syllables produced by six different talkers. Listeners' judgments of the edited stimuli confirmed that the nasal murmur makes a significant contribution to place of articulation perception. Murmur and transition information appeared to be integrated at a genuinely perceptual, not an abstract cognitive, level. This was particularly evident in [-i] context, where only the simultaneous presence of murmur and transition components permitted accurate place of articulation identification. The perceptual information seemed to be purely relational in this case. It also seemed to be context specific, since the spectral change from the murmur to the vowel onset did not follow an invariant pattern across front and back vowels.

5.
It has been suggested [e.g., Strange et al., J. Acoust. Soc. Am. 74, 695-705 (1983); Verbrugge and Rakerd, Language and Speech 29, 39-57 (1986)] that the temporal margins of vowels in consonantal contexts, consisting mainly of the rapid CV and VC transitions of CVC's, contain dynamic cues to vowel identity that are not available in isolated vowels and that may be perceptually superior in some circumstances to cues which are inherent to the vowels proper. However, this study shows that vowel-inherent formant targets and cues to vowel-inherent spectral change (measured from nucleus to offglide sections of the vowel itself) persist in the margins of /bVb/ syllables, confirming a hypothesis of Nearey and Assmann [J. Acoust. Soc. Am. 80, 1297-1308 (1986)]. Experiments were conducted to test whether listeners might be using such vowel-inherent, rather than coarticulatory information to identify the vowels. In the first experiment, perceptual tests using "hybrid silent center" syllables (i.e., syllables which contain only brief initial and final portions of the original syllable, and in which speaker identity changes from the initial to the final portion) show that listeners' error rates and confusion matrices for vowels in /bVb/ syllables are very similar to those for isolated vowels. These results suggest that listeners are using essentially the same type of information in essentially the same way to identify both kinds of stimuli. Statistical pattern recognition models confirm the relative robustness of nucleus and vocalic offglide cues and can predict reasonably well listeners' error patterns in all experimental conditions, though performance for /bVb/ syllables is somewhat worse than for isolated vowels. The second experiment involves the use of simplified synthetic stimuli, lacking consonantal transitions, which are shown to provide information that is nearly equivalent phonetically to that of the natural silent center /bVb/ syllables (from which the target measurements were extracted). Although no conclusions are drawn about other contexts, for speakers of Western Canadian English coarticulatory cues appear to play at best a minor role in the perception of vowels in /bVb/ context, while vowel-inherent factors dominate listeners' perception.

6.
This study complements earlier experiments on the perception of the [m]-[n] distinction in CV syllables [B. H. Repp, J. Acoust. Soc. Am. 79, 1987-1999 (1986); B. H. Repp, J. Acoust. Soc. Am. 82, 1525-1538 (1987)]. Six talkers produced VC syllables consisting of [m] or [n] preceded by [i, a, u]. In listening experiments, these syllables were truncated from the beginning and/or from the end, or waveform portions surrounding the point of closure were replaced with noise, so as to map out the distribution of the place of articulation information for consonant perception. These manipulations revealed that the vocalic formant transitions alone conveyed about as much place of articulation information as did the nasal murmur alone, and both signal portions were about as informative in VC as in CV syllables. Nevertheless, full VC syllables were less accurately identified than full CV syllables, especially in female speech. The reason for this was hypothesized to be the relative absence of a salient spectral change between the vowel and the murmur in VC syllables. This hypothesis was supported by the relative ineffectiveness of two additional manipulations meant to disrupt the perception of relational spectral information (channel separation or temporal separation of vowel and murmur) and by subjects' poor identification scores for brief excerpts including the point of maximal spectral change. While, in CV syllables, the abrupt spectral change from the murmur to the vowel provides important additional place of articulation information, for VC syllables it seems as if the formant transitions in the vowel and the murmur spectrum functioned as independent cues.

7.
Listeners' ability to understand speech in adverse listening conditions is partially due to the redundant nature of speech. Natural redundancies are often lost or altered when speech is filtered, as is done in AI/SII experiments. It is important to study how listeners recognize speech when the speech signal is unfiltered and the entire broadband spectrum is present. A correlational method [R. A. Lutfi, J. Acoust. Soc. Am. 97, 1333-1334 (1995); V. M. Richards and S. Zhu, J. Acoust. Soc. Am. 95, 423-424 (1994)] has been used to determine how listeners use spectral cues to perceive nonsense syllables when the full speech spectrum is present [K. A. Doherty and C. W. Turner, J. Acoust. Soc. Am. 100, 3769-3773 (1996); C. W. Turner et al., J. Acoust. Soc. Am. 104, 1580-1585 (1998)]. The experiments in this study measured spectral-weighting strategies for more naturally occurring speech stimuli, specifically sentences, using a correlational method for normal-hearing listeners. Results indicate that listeners placed the greatest weight on spectral information within bands 2 and 5 (562-1113 and 2807-11,000 Hz), respectively. Spectral-weighting strategies for sentences were also compared to weighting strategies for nonsense syllables measured in a previous study (C. W. Turner et al., 1998). Spectral-weighting strategies for sentences were different from those reported for nonsense syllables.
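As a concrete illustration of the correlational weighting idea, the sketch below (Python; simulated listener and hypothetical data, not the analysis pipeline of the cited studies) correlates trial-by-trial per-band SNR perturbations with a binary correct/incorrect response and normalizes the correlations into relative band weights.

    import numpy as np

    # Illustrative sketch of a correlational weighting analysis (all data are
    # simulated): the SNR in each spectral band is perturbed independently from
    # trial to trial, and the correlation between a band's SNR and the listener's
    # correct/incorrect response indexes the weight placed on that band.

    rng = np.random.default_rng(0)
    n_trials, n_bands = 500, 6

    # Per-trial, per-band SNR perturbations (dB), independent across bands.
    band_snr = rng.normal(0.0, 3.0, size=(n_trials, n_bands))

    # Simulated listener who relies mostly on the 2nd and 5th bands.
    true_weights = np.array([0.1, 0.8, 0.2, 0.1, 0.7, 0.1])
    decision_var = band_snr @ true_weights + rng.normal(0.0, 3.0, n_trials)
    correct = (decision_var > 0).astype(float)   # 1 = correct, 0 = incorrect

    # Point-biserial correlation of each band's SNR with the binary response,
    # normalized to relative weights.
    raw = np.array([np.corrcoef(band_snr[:, b], correct)[0, 1] for b in range(n_bands)])
    weights = raw / raw.sum()
    print(np.round(weights, 2))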

8.
At a cocktail party, listeners must attend selectively to a target speaker and segregate that speaker's speech from distracting speech sounds uttered by other speakers. To solve this task, listeners can draw on a variety of vocal, spatial, and temporal cues. Recently, Vestergaard et al. [J. Acoust. Soc. Am. 125, 1114-1124 (2009)] developed a concurrent-syllable task to control temporal glimpsing within segments of concurrent speech, and this allowed them to measure the interaction of glottal pulse rate and vocal tract length and reveal how the auditory system integrates information from independent acoustic modalities to enhance recognition. The current paper shows how the interaction of these acoustic cues evolves as the temporal overlap of syllables is varied. Temporal glimpses as short as 25 ms are observed to improve syllable recognition substantially when the target and distracter have similar vocal characteristics, but not when they are dissimilar. The effect of temporal glimpsing on recognition performance is strongly affected by the form of the syllable (consonant-vowel versus vowel-consonant), but it is independent of other phonetic features such as place and manner of articulation.

9.
Kewley-Port, Watson, and Foyle [J. Acoust. Soc. Am. 83, 1133-1145 (1988)] describe a study that uses several different procedures to measure thresholds for stimuli whose components differ in temporal onset. Unfortunately, misunderstandings and misconceptions (shared with other recent publications) resulted in conclusions that are both unnecessary and unwarranted. The Kewley-Port et al. article is discussed in terms of often replicated published findings on temporal order thresholds, and current misconceptions of perceptual concepts and models.

10.
The perception of breathiness in vowels is signaled by multiple acoustic cues, including changes in aspiration noise (AH) and the open quotient (OQ) [Klatt and Klatt, J. Acoust. Soc. Am. 87(2), 820-857 (1990)]. A loudness model can be used to determine the extent to which AH masks the harmonic components in voice. The resulting "partial loudness" (PL) and loudness of AH ["noise loudness" (NL)] have been shown to be good predictors of perceived breathiness [Shrivastav and Sapienza, J. Acoust. Soc. Am. 114(1), 2217-2224 (2003)]. The levels of AH and OQ were systematically manipulated for ten synthetic vowels. Perceptual judgments of breathiness were obtained and regression functions to predict breathiness from the ratio of NL to PL (η) were derived. Results show that breathiness can be modeled as a power function of η. The power parameter of this function appears to be affected by the fundamental frequency of the vowel. A second experiment was conducted to determine if the resulting power function could estimate breathiness in a different set of voices. The breathiness of these stimuli, both natural and synthetic, was determined in a listening test. The model estimates of breathiness were highly correlated with perceptual data but the absolute predicted values showed some discrepancies.
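To make the power-function model concrete, the sketch below (Python; all numbers hypothetical) fits B = a * eta^b to breathiness ratings by linear regression in log-log coordinates, which is one straightforward way such a function could be estimated; it is not the cited study's procedure.

    import numpy as np

    # Illustrative sketch with hypothetical numbers: model breathiness B as a power
    # function of eta = NL / PL (loudness of the aspiration noise divided by the
    # partial loudness of the harmonic source), B = a * eta**b, fit as a straight
    # line in log-log coordinates.

    eta = np.array([0.05, 0.10, 0.20, 0.40, 0.80])            # hypothetical NL/PL ratios
    breathiness = np.array([12.0, 18.0, 27.0, 41.0, 62.0])    # hypothetical ratings

    b, log_a = np.polyfit(np.log(eta), np.log(breathiness), 1)   # log B = log a + b log eta
    a = np.exp(log_a)
    print(f"B ~= {a:.1f} * eta^{b:.2f}")

    # Predict breathiness for a new voice with eta = 0.3.
    print(f"Predicted breathiness at eta = 0.3: {a * 0.3 ** b:.1f}")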

11.
This note concerns the evaluation of the static acoustic radiation torque exerted by an acoustic field on a scatterer immersed in a nonviscous fluid based on far-field scattering. The radiation torque is expressed as the integral of the time-averaged flux of angular momentum over a spherical surface far removed from the scattering object with its center at the centroid of the object. That result was given previously [G. Maidanik, J. Acoust. Soc. Am. 30, 620-623 (1958)]. Another expression given recently [Z. W. Fan et al., J. Acoust. Soc. Am. 124, 2727-2732 (2008)] is simplified to this formula. Comments are made on obtaining it directly from the general theorem of angular momentum conservation in the integral form.
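For reference, a common far-field form of this statement can be written out explicitly; the notation below (LaTeX) is illustrative and is not necessarily the exact expression used in the cited papers.

    % Time-averaged radiation torque on the scatterer as minus the time-averaged
    % flux of angular momentum through a large sphere S centered on the object,
    % for a nonviscous fluid of ambient density \rho_0 with acoustic particle
    % velocity \mathbf{u}; angle brackets denote the time average and
    % \hat{\mathbf{n}} the outward normal.
    \begin{equation}
      \mathbf{T} \;=\; -\,\rho_0 \oint_{S}
        \bigl\langle (\mathbf{r} \times \mathbf{u})\,(\mathbf{u} \cdot \hat{\mathbf{n}}) \bigr\rangle \, dS .
    \end{equation}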

12.
According to recent theoretical accounts of place of articulation perception, global, invariant properties of the stop CV syllable onset spectrum serve as primary, innate cues to place of articulation, whereas contextually variable formant transitions constitute secondary, learned cues. By this view, one might expect that young infants would find the discrimination of place of articulation contrasts signaled by formant transition differences more difficult than those cued by gross spectral differences. Using an operant head-turning paradigm, we found that 6-month-old infants were able to discriminate two-formant stimuli contrasting in place of articulation as well as they did five-formant + burst stimuli. Apparently, neither the global properties of the onset spectrum nor simply the additional acoustic information contained in the five-formant + burst stimuli afford the infant any advantage in the discrimination task. Rather, formant transition information provides a sufficient basis for discriminating place of articulation differences.

13.
Pastore [J. Acoust. Soc. Am. 84, 2262-2266 (1988)] has written a lengthy response to Kewley-Port, Watson, and Foyle [J. Acoust. Soc. Am. 83, 1133-1145 (1988)]. In this reply to Pastore's letter, several of his arguments are addressed, and new data are reported which support the conclusion of the original article. That conclusion is, basically, that the temporal acuity of the auditory system does not appear to be the origin of categorical perception of speech or nonspeech sounds differing in temporal onsets.

14.
The goal of this study was to establish the ability of normal-hearing listeners to discriminate formant frequency in vowels in everyday speech. Vowel formant discrimination in syllables, phrases, and sentences was measured for high-fidelity (nearly natural) speech synthesized by STRAIGHT [Kawahara et al., Speech Commun. 27, 187-207 (1999)]. Thresholds were measured for changes in F1 and F2 for the vowels /ɪ, ɛ, æ, ʌ/ in /bVd/ syllables. Experimental factors manipulated included phonetic context (syllables, phrases, and sentences), sentence discrimination with the addition of an identification task, and word position. Results showed that neither longer phonetic context nor the addition of the identification task significantly affected thresholds, while thresholds for word final position showed significantly better performance than for either initial or middle position in sentences. Results suggest that an average of 0.37 barks is required for normal-hearing listeners to discriminate vowel formants in modest length sentences, elevated by 84% compared to isolated vowels. Vowel formant discrimination in several phonetic contexts was slightly elevated for STRAIGHT-synthesized speech compared to formant-synthesized speech stimuli reported in the study by Kewley-Port and Zheng [J. Acoust. Soc. Am. 106, 2945-2958 (1999)]. These elevated thresholds appeared related to greater spectral-temporal variability for high-fidelity speech produced by STRAIGHT than for formant-synthesized speech.
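As a worked example of the Bark measure used above, the sketch below (Python) converts formant frequencies to barks with one common approximation (Traunmüller, 1990); the cited study may have used a different conversion, so the numbers are purely illustrative.

    # Illustrative only: one common Hz-to-Bark conversion (Traunmueller, 1990);
    # the cited study may have used a different formula.

    def hz_to_bark(f_hz: float) -> float:
        """Traunmueller (1990) approximation to the Bark scale."""
        return 26.81 * f_hz / (1960.0 + f_hz) - 0.53

    def formant_delta_bark(f_ref_hz: float, f_test_hz: float) -> float:
        """Size of a formant shift expressed on the Bark scale."""
        return abs(hz_to_bark(f_test_hz) - hz_to_bark(f_ref_hz))

    # Example: how large a 100-Hz F2 shift around 1800 Hz is in barks.
    print(hz_to_bark(1800.0))                   # about 12.3 barks
    print(formant_delta_bark(1800.0, 1900.0))   # about 0.36 barks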

15.
The utility of phonetic features versus acoustic properties for describing perceptual relations among speech sounds was evaluated with a multidimensional scaling analysis of Miller and Nicely's [J. Acoust. Soc. Am. 27, 338-352 (1955)] consonant confusions data. The INDSCAL method and program were employed with the original data log transformed to enhance consistency with the linear INDSCAL model. A four-dimensional solution accounted for 69% of the variance and was best characterized in terms of acoustic properties of the speech signal, viz., temporal relationship of periodicity and burst onset, shape of voiced first formant transition, shape of voiced second formant transition, and amount of initial spectral dispersion, rather than in terms of phonetic features. The amplitude and spectral location of acoustic energy specifying each perceptual dimension were found to determine a dimension's perceptual effect as the signal was degraded by masking noise and bandpass filtering. Consequently, the perceptual bases of identification confusions between pairs of syllables were characterized in terms of the shared acoustic properties which remained salient in the degraded speech. Implications of these findings for feature-based accounts of perceptual relationships between phonemes are considered.
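The weighted Euclidean model underlying INDSCAL can be stated compactly; the notation below (LaTeX) is the standard textbook form rather than anything specific to the cited analysis.

    % Weighted Euclidean (INDSCAL) model: the distance between consonants i and j
    % under listening condition k, where x_{ia} is the coordinate of consonant i
    % on dimension a (here one of the four acoustic dimensions) and w_{ka} >= 0
    % is the weight condition k places on that dimension.
    \begin{equation}
      d_{ij}^{(k)} \;=\; \Biggl[ \sum_{a=1}^{A} w_{ka}\,\bigl(x_{ia} - x_{ja}\bigr)^{2} \Biggr]^{1/2} .
    \end{equation}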

16.
Wojtczak and Viemeister [J. Acoust. Soc. Am. 106, 1917-1924 (1999)] demonstrated a close relationship between intensity difference limens (DLs) and 4-Hz amplitude modulation (AM) detection thresholds in normal-hearing acoustic listeners. The present study demonstrates a similar relationship between intensity DLs and AM detection thresholds in cochlear-implant listeners, for gated stimuli. This suggests that acoustic and cochlear-implant listeners make use of a similar decision variable to perform intensity discrimination and modulation detection tasks. It can be shown that the absence of compression in electric hearing does not preclude this possibility.

17.
The perceptual representation of speech is generally assumed to be discrete rather than continuous, pointing to the need for general discrete analytic models to represent observed perceptual similarities among speech sounds. The INDCLUS (INdividual Differences CLUStering) model and algorithm [J. D. Carroll and P. Arabie, Psychometrika 48, 157-169 (1983)] can provide this generality, representing symmetric three-way similarity data (stimuli × stimuli × conditions) as an additive combination of overlapping, and generally not hierarchical, clusters whose weights (which are numerical values gauging the importance of the clusters) vary both as a function of the cluster and condition being considered. INDCLUS was used to obtain a discrete representation of underlying perceptual structure in the Miller and Nicely consonant confusion data [G. A. Miller and P. E. Nicely, J. Acoust. Soc. Am. 27, 338-352 (1955)]. A 14-cluster solution accounted for 82.9% of total variance across the 17 listening conditions. The cluster composition and the variations in cluster weights as a function of stimulus degradation were interpreted in terms of the common and unique perceptual attributes of the consonants within each cluster. Low-pass filtering and noise masking selectively degraded unique attributes, especially the cues for place of articulation, while high-pass filtering degraded both unique and common attributes. The clustering results revealed that perceptual similarities among consonants are accurately modeled by additive combinations of their specific and discrete acoustic attributes whose weights are determined by the nature of the stimulus degradation.
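For reference, the additive model fitted by INDCLUS can be written as below (LaTeX; the standard form of the model, with notation of my choosing rather than the article's).

    % INDCLUS model: the similarity of stimuli i and j in listening condition k is
    % approximated by an additive combination of the clusters containing both,
    % where p_{ir} in {0,1} indicates membership of stimulus i in cluster r,
    % w_{kr} >= 0 is the weight of cluster r in condition k, and c_k is an
    % additive constant for condition k.
    \begin{equation}
      s_{ij}^{(k)} \;\approx\; \sum_{r=1}^{R} w_{kr}\, p_{ir}\, p_{jr} \;+\; c_k .
    \end{equation}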

18.
In many experiments on comodulation masking release (CMR), both across- and within-channel cues may be available. This makes it difficult to determine the mechanisms underlying CMR. The present study compared CMR in a flanking-band (FB) paradigm for a situation in which only across-channel cues were likely to be available [FBs placed distally from the on-frequency band (OFB)] and a situation where both across- and within-channel cues might have been available (proximally spaced FBs, for which larger CMRs have previously been observed). The use of across-channel cues was selectively disrupted using a manipulation of auditory grouping factors, following Dau et al. [J. Acoust. Soc. Am. 125, 2182-2188 (2009)] and the use of within-channel cues was selectively disrupted using a manipulation called "OFB reversal," following Goldman et al. [J. Acoust. Soc. Am. 129, 3181-3193 (2011)]. The auditory grouping manipulation eliminated CMR for the distal-FB configuration and reduced CMR for the proximal-FB configuration. This may indicate that across-channel cues are available for proximal FB placement. CMR for the proximal-FB configuration persisted when both manipulations were used together, which suggests that OFB reversal does not entirely eliminate within-channel cues.

19.
20.
This study investigates the controversy, between Jerger et al. [Mono. Contemp. Audiol. 1 (1978)] and Jerger [J. Acoust. Soc. Am. 66 (1979)] on the one hand and Silman [J. Acoust. Soc. Am. 66 (1979)] and others on the other, regarding the influence of age on the acoustic reflex threshold for broadband noise (BBN), 500-, 1000-, 2000-, and 4000-Hz activators. The acoustic reflex thresholds for broadband noise, 500-, 1000-, 2000-, and 4000-Hz activators were evaluated under two measurement conditions. Seventy-two normal-hearing ears were drawn from 72 subjects ranging in age from 20-69 years. The results revealed that age was correlated with the acoustic reflex threshold for the BBN activator but not for any of the tonal activators; the correlation was stronger under the 1-dB than under the 5-dB measurement condition. Also, the mean acoustic reflex thresholds for the broadband noise activator were essentially similar to those reported by Jerger et al. (1978) but differed from those obtained in this study under the 1-dB measurement condition.
