Similar documents
 20 similar documents found (search time: 31 ms)
1.
Previous psychophysical studies have shown that the perceptual distinction between voiceless fricatives and affricates in consonant-vowel syllables depends primarily on frication duration, whereas amplitude rise slope was suggested as the cue in automatic classification experiments. The effects of both cues on the manner distinction between /ʃ/ and /tʃ/ were investigated. Subjects performed a forced-choice task (/ʃ/ or /tʃ/) in response to edited waveforms of the Japanese fricatives /ʃi/, /ʃu/, and /ʃa/. We found that frication duration, onset slope, and the interaction between duration and onset slope influenced the perceptual distinction. That is, the percentage of /ʃ/ responses increased with an increase in frication duration (experiments 1-3). The percentage of /ʃ/ responses also increased with a decrease in slope steepness (experiment 3), and the relative importance of the slope was not uniform across its portions but weighted toward the onset (experiments 1 and 2). The two cues of frication duration and steepness interacted: the relative importance of the slope cue was greatest at a frication duration of 150 ms (experiment 3). It is concluded that frication duration and the amplitude rise slope at frication onset are acoustic cues that discriminate between /ʃ/ and /tʃ/, and that the two cues interact with each other.
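The duration-by-slope trading relation reported above can be sketched as a toy logistic model. All coefficient values below are hypothetical, chosen only so that the signs match the reported effects (longer frication favors /ʃ/, steeper onsets favor /tʃ/, and the two cues interact):

```python
import math

def p_affricate(duration_ms, onset_slope_db_per_ms,
                b0=6.0, b_dur=-0.05, b_slope=0.4, b_int=0.001):
    """Toy logistic model of the probability of a /tʃ/ (affricate) response
    from frication duration, onset rise slope, and their interaction.
    All coefficients are hypothetical, not fitted to the study's data."""
    z = (b0 + b_dur * duration_ms + b_slope * onset_slope_db_per_ms
         + b_int * duration_ms * onset_slope_db_per_ms)
    return 1.0 / (1.0 + math.exp(-z))

# Longer frication lowers the affricate probability (more /ʃ/ responses)...
print(p_affricate(60, 5) > p_affricate(200, 5))   # → True
# ...and a steeper onset raises it at a fixed duration.
print(p_affricate(150, 5) > p_affricate(150, 0))  # → True
```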

2.
Natural speech consonant-vowel (CV) syllables ([f, s, θ, ʃ, v, z, ʒ] followed by [i, u, a]) were computer edited to include 20-70 ms of their frication noise in 10-ms steps, measured from frication onset, as well as the entire frication noise. These stimuli, and the entire syllables, were presented to 12 subjects for consonant identification. Results show that the listener does not require the entire fricative-vowel syllable in order to correctly perceive a fricative. The required frication duration depends on the particular fricative, ranging from approximately 30 ms for [s, z] to 50 ms for [f, ʃ, v], while [θ, ʒ] are identified with reasonable accuracy only in the full-frication and full-syllable conditions. Analysis in terms of the linguistic features of voicing, place, and manner of articulation revealed that identification of a fricative's place of articulation is much more affected by a decrease in frication duration than identification of its voicing or manner of articulation.
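The gating manipulation described above (keeping only the first 20-70 ms of frication in 10-ms steps) is straightforward to reproduce. The helper below is a minimal sketch, assuming the frication onset sits at sample 0:

```python
import numpy as np

def gate_frication(signal, sr, dur_ms):
    """Keep only the first dur_ms milliseconds of a frication noise,
    as in gating experiments (frication onset assumed at sample 0)."""
    n_samples = int(sr * dur_ms / 1000)
    return signal[:n_samples]

sr = 10000                            # 10-kHz sampling rate (illustrative)
noise = np.random.randn(sr)           # 1 s of noise standing in for a fricative
gates = [gate_frication(noise, sr, d) for d in range(20, 71, 10)]
print([len(g) for g in gates])        # → [200, 300, 400, 500, 600, 700]
```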

3.
What's in a whisper?
Whispering is a common, natural way of reducing speech perceptibility, but whether and how whispering affects consonant identification and the acoustic features presumed important for it in normal speech perception are unknown. In this experiment, untrained listeners identified 18 different whispered initial consonants significantly better than chance in nonsense syllables. The phonetic features of place and manner of articulation and, to a lesser extent, voicing, were correctly identified. Confusion matrix and acoustic analyses indicated preservation of resonance characteristics for place and manner of articulation and suggested the use of burst, aspiration, or frication duration and intensity, and/or first-formant cutback for voicing decisions.

4.
Log-linear models, in conjunction with the G² statistic, were developed and applied to several existing sets of consonant confusion data. Significant interactions of consonant error patterns were found with signal-to-noise ratio (S/N), presentation level, vowel context, and low-pass and high-pass filtering. These variables also showed significant interactions with error patterns when categorized on the basis of feature classifications. Patterns of errors were significantly altered by S/N for place of articulation (front, middle, back), voicing, frication, and nasality. Low-pass filtering significantly affected error patterns when categorized by place of articulation, duration, or nasality, whereas high-pass filtering affected only voicing and frication error patterns. This paper also demonstrates the utility of log-linear modeling techniques in applications to confusion matrix analysis: specific effects can be tested; variant cells in a matrix can be isolated with respect to a particular model of interest; diagonal cells can be eliminated from the analysis; and the matrix can be collapsed across levels of variables, with no violation of independence. Finally, log-linear techniques are suggested for development of parsimonious and predictive models of speech perception.
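For readers unfamiliar with the G² statistic, a minimal computation for the simplest log-linear model (independence) of a two-way confusion table looks like this; the paper fits richer models, so this is only a sketch of the statistic itself:

```python
import numpy as np

def g2_independence(table):
    """Likelihood-ratio statistic G² = 2 * sum(obs * ln(obs / exp)) for a
    two-way table, with expected counts from the independence model.
    Zero cells are skipped, since 0 * ln(0) is taken as 0."""
    obs = np.asarray(table, dtype=float)
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / obs.sum()
    mask = obs > 0
    return 2.0 * np.sum(obs[mask] * np.log(obs[mask] / exp[mask]))

# Toy 2x2 stimulus-by-response confusion table
print(round(g2_independence([[40, 10], [12, 38]]), 2))  # → 33.32
```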

5.
6.
Several types of measurements were made to determine the acoustic characteristics that distinguish between voiced and voiceless fricatives in various phonetic environments. The selection of measurements was based on a theoretical analysis that indicated the acoustic and aerodynamic attributes at the boundaries between fricatives and vowels. As expected, glottal vibration extended over a longer time in the obstruent interval for voiced fricatives than for voiceless fricatives, and there were more extensive transitions of the first formant adjacent to voiced fricatives than adjacent to their voiceless cognates. When two fricatives with different voicing were adjacent, there were substantial modifications of these acoustic attributes, particularly for the syllable-final fricative. In some cases, these modifications led to complete assimilation of the voicing feature. Several perceptual studies with synthetic vowel-consonant-vowel stimuli and with edited natural stimuli examined the role of consonant duration, extent and location of glottal vibration, and extent of formant transitions on the identification of the voicing characteristics of fricatives. The perceptual results were in general consistent with the acoustic observations and with expectations based on the theoretical model. The results suggest that listeners base their voicing judgments of intervocalic fricatives on an assessment of the time interval in the fricative during which there is no glottal vibration. This time interval must exceed about 60 ms if the fricative is to be judged as voiceless, except that a small correction to this threshold is applied depending on the extent to which the first-formant transitions are truncated at the consonant boundaries.
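The suggested listener criterion can be written as a one-line decision rule. The correction term below is a hypothetical placeholder for the F1-truncation adjustment mentioned in the abstract, not a fitted value:

```python
def judged_voiceless(devoiced_interval_ms, f1_truncation_correction_ms=0.0,
                     threshold_ms=60.0):
    """Judge an intervocalic fricative voiceless when the interval without
    glottal vibration exceeds ~60 ms, minus a small correction when the
    first-formant transitions are truncated (correction value hypothetical)."""
    return devoiced_interval_ms > threshold_ms - f1_truncation_correction_ms

print(judged_voiceless(75))   # → True
print(judged_voiceless(40))   # → False
```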

7.
A decomposition algorithm that uses a pitch-scaled harmonic filter was evaluated using synthetic signals and applied to mixed-source speech, spoken by three subjects, to separate the voiced and unvoiced parts. Pulsing of the noise component was observed in voiced frication, which was analyzed by complex demodulation of the signal envelope. The timing of the pulsation, represented by the phase of the anharmonic modulation coefficient, showed a step change during a vowel-fricative transition corresponding to the change in location of the noise source within the vocal tract. Analysis of fricatives [see text] demonstrated a relationship between steady-state phase and place, and f0 glides confirmed that the main cause was a place-dependent delay.
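Complex demodulation of an envelope, as used here to time the noise pulsing, amounts to multiplying by a complex exponential at the modulation frequency and smoothing. The sketch below uses plain averaging over the whole signal rather than the paper's exact filtering, so it recovers only a single steady-state amplitude and phase:

```python
import numpy as np

def complex_demodulate(envelope, fm_hz, sr):
    """Return the amplitude and phase of the fm_hz component of an envelope
    by complex demodulation (rectangular averaging over the whole signal;
    a sketch, not the paper's exact analysis)."""
    t = np.arange(len(envelope)) / sr
    coeff = np.mean(envelope * np.exp(-2j * np.pi * fm_hz * t))
    return np.abs(coeff), np.angle(coeff)

sr = 8000
t = np.arange(sr) / sr                               # 1 s of samples
env = 1.0 + 0.5 * np.cos(2 * np.pi * 100 * t - 0.8)  # 100-Hz pulsing, phase -0.8 rad
amp, phase = complex_demodulate(env, 100, sr)
print(round(2 * amp, 3), round(phase, 3))            # → 0.5 -0.8
```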

8.
This study investigates cross-speaker differences in the factors that predict voicing thresholds during abduction-adduction gestures in six normal women. Measures of baseline airflow, pulse amplitude, subglottal pressure, and fundamental frequency were made at voicing offset and onset during intervocalic /h/, produced in varying vowel environments and at different loudness levels, and subjected to relational analyses to determine which factors were most strongly related to the timing of voicing cessation or initiation. The data indicate that (a) all speakers showed differences between voicing offsets and onsets, but the degree of this effect varied across speakers; (b) loudness and vowel environment have speaker-specific effects on the likelihood of devoicing during /h/; and (c) baseline flow measures significantly predicted times of voicing offset and onset in all participants, but other variables contributing to voice timing differed across speakers. Overall, the results suggest that individual speakers have unique methods of achieving phonatory goals during running speech. These data contribute to the literature on individual differences in laryngeal function, and serve as a means of evaluating how well laryngeal models can reproduce the range of voicing behavior used by speakers during running speech tasks.

9.
This study focuses on the extraction of robust acoustic cues of labial and alveolar voiceless obstruents in German and their acoustic differences in the speech signal, in order to distinguish them in place and manner of articulation. The investigated obstruents include the affricates [pf] and [ts], the fricatives [f] and [s], and the stops [p] and [t]. The target sounds were analyzed in word-initial and word-medial positions. The speech data for the analysis were recorded in a natural environment, deliberately containing background noise so that only robust cues would be extracted. Three methods of acoustic analysis were chosen: (1) temporal measurements to distinguish the respective obstruents in manner of articulation, (2) static spectral characteristics in terms of a logarithmic distance measure to distinguish place of articulation, and (3) amplitude analysis of discrete frequency bands as a dynamic approach to place distinction. The results reveal that the duration of the target phonemes distinguishes them in manner of articulation. The logarithmic distance measure, as well as relative amplitude analysis of discrete frequency bands, identifies place of articulation. The present results contribute to the question of which properties are robust with respect to variation in the speech signal.
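As an illustration of a logarithmic spectral distance of the general kind mentioned, the RMS difference between two magnitude spectra on a dB scale can be computed as below. The paper's exact definition may differ; this is a generic form:

```python
import numpy as np

def log_spectral_distance(spec_a, spec_b):
    """RMS distance between two magnitude spectra on a dB scale; a generic
    logarithmic distance measure (the paper's definition may differ)."""
    diff_db = 10 * np.log10(spec_a) - 10 * np.log10(spec_b)
    return float(np.sqrt(np.mean(diff_db ** 2)))

a = np.array([1.0, 2.0, 4.0])
print(round(log_spectral_distance(a, a), 2))      # identical spectra → 0.0
print(round(log_spectral_distance(a, 2 * a), 2))  # uniform factor-2 gain → 3.01
```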

10.
We have examined the effects of the relative amplitude of the release burst on perception of the place of articulation of utterance-initial voiceless and voiced stop consonants. The amplitude of the burst, which occurs within the first 10-15 ms following consonant release, was systematically varied in 5-dB steps from -10 to +10 dB relative to a "normal" burst amplitude for two labial-to-alveolar synthetic speech continua--one comprising voiceless stops and the other, voiced stops. The distribution of spectral energy in the bursts for the labial and alveolar stops at the ends of the continuum was consistent with the spectrum shapes observed in natural utterances, and intermediate shapes were used for intermediate stimuli on the continuum. The results of identification tests with these stimuli showed that the relative amplitude of the burst significantly affected the perception of the place of articulation of both voiceless and voiced stops, but the effect was greater for the former than the latter. The results are consistent with a view that two basic properties contribute to the labial-alveolar distinction in English. One of these is determined by the time course of the change in amplitude in the high-frequency range (above 2500 Hz) in the few tens of ms following consonantal release, and the other is determined by the frequencies of spectral peaks associated with the second and third formants in relation to the first formant.
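Varying a burst by a given number of dB, as in the -10 to +10 dB manipulation described, is a simple linear gain on the waveform segment. A minimal sketch:

```python
import numpy as np

def scale_db(segment, gain_db):
    """Apply a gain in dB to a waveform segment (amplitude scaling:
    a 20-dB step corresponds to a factor of 10 in amplitude)."""
    return segment * 10 ** (gain_db / 20)

burst = np.ones(8)                      # stand-in for a 10-15 ms burst segment
for gain in range(-10, 11, 5):          # -10 to +10 dB in 5-dB steps
    peak = float(np.max(np.abs(scale_db(burst, gain))))
    print(gain, round(peak, 3))
```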

11.
This study explored the claim that invariant acoustic properties corresponding to phonetic features generalize across languages. Experiment I examined whether the same invariant properties can characterize diffuse stop consonants in Malayalam, French, and English. Results showed that, contrary to theoretical predictions, we could not distinguish labials from dentals, nor could we classify dentals and alveolars together in terms of the same invariant properties. We developed an alternative metric based on the change in the distribution of spectral energy from the burst onset to the onset of voicing. This metric classified over 91% of the stops in Malayalam, French, and English. In experiment II, we investigated whether the invariant properties defined by the metric are used by English-speaking listeners in making phonetic decisions for place of articulation. Prototype CV syllables--[b d] in the context of [i e a o u]--were synthesized. The gross shape of the spectrum was manipulated first at the burst onset, then at the onset of voicing, such that the stimulus configuration had the spectral properties prescribed by our metric for labial and dental consonants, while the formant frequencies and transitions were appropriate to the contrasting place of articulation. Results of identification tests showed that listeners were able to perceive place of articulation as a function of the relative distribution of spectral energy specified by the metric.

12.
Closants, or consonantlike sounds in infant vocalizations, were described acoustically using 16-kHz spectrograms and LPC or FFT analyses based on waveforms sampled at 20 or 40 kHz. The two major closant types studied were fricatives and trills. Compared to similar fricative sounds in adult speech, the fricative sounds of the 3-, 6-, 9-, and 12-month-old infants had primary spectral components at higher frequencies, i.e., up to and above 14 kHz. Trill rate varied from 16-180 Hz with a mean of about 100, approximately four times the mean trill rate reported for adult talkers. Acoustic features are described for various places of articulation for fricatives and trills. The discussion of the data emphasizes dimensions of acoustic contrast that appear in infant vocalizations during the first year of life, and implications of the spectral data for auditory and motor self-stimulation by normal-hearing and hearing-impaired infants.

13.
This study focuses on the initial component of the stop consonant release burst, the release transient. In theory, the transient, because of its impulselike source, should contain much information about the vocal tract configuration at release, but it is usually weak in intensity and difficult to isolate from the accompanying frication in natural speech. For this investigation, a human talker produced isolated release transients of /b,d,g/ in nine vocalic contexts by whispering these syllables very quietly. He also produced the corresponding CV syllables with regular phonation for comparison. Spectral analyses showed the isolated transients to have a clearly defined formant structure, which was not seen in natural release bursts, whose spectra were dominated by the frication noise. The formant frequencies varied systematically with both consonant place of articulation and vocalic context. Perceptual experiments showed that listeners can identify both consonants and vowels from isolated transients, though not very accurately. Knowing one of the two segments in advance did not help, but when the transients were followed by a compatible synthetic, steady-state vowel, consonant identification improved somewhat. On the whole, isolated transients, despite their clear formant structure, provided only partial information for consonant identification, but no less so, it seems, than excerpted natural release bursts. The information conveyed by artificially isolated transients and by natural (frication-dominated) release bursts appears to be perceptually equivalent.

14.
This study tested the relationship between frequency selectivity and the minimum spacing between harmonics necessary for accurate f0 discrimination. Fundamental frequency difference limens (f0 DLs) were measured for ten listeners with moderate sensorineural hearing loss (SNHL) and three normal-hearing listeners for sine- and random-phase harmonic complexes, bandpass filtered between 1500 and 3500 Hz, with f0's ranging from 75 to 500 Hz (or higher). All listeners showed a transition between small (good) f0 DLs at high f0's and large (poor) f0 DLs at low f0's, although the f0 at which this transition occurred (f0,tr) varied across listeners. Three measures thought to reflect frequency selectivity were significantly correlated with both f0,tr and the minimum f0 DL achieved at high f0's: (1) the maximum f0 for which f0 DLs were phase dependent, (2) the maximum modulation frequency for which amplitude modulation and quasi-frequency modulation were discriminable, and (3) the equivalent rectangular bandwidth of the auditory filter, estimated using the notched-noise method. These results provide evidence of a relationship between f0 discrimination performance and frequency selectivity in listeners with SNHL, supporting "spectral" and "spectro-temporal" theories of pitch perception that rely on sharp tuning in the auditory periphery to accurately extract f0 information.
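The equivalent rectangular bandwidth estimated by the notched-noise method is conventionally compared against the normal-hearing value given by the Glasberg and Moore (1990) formula:

```python
def erb_hz(f_hz):
    """Equivalent rectangular bandwidth (Hz) of the normal auditory filter
    at center frequency f_hz: ERB = 24.7 * (4.37 * f/1000 + 1)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

print(round(erb_hz(1000), 1))   # → 132.6 (Hz, at 1 kHz)
print(round(erb_hz(2500), 1))   # → 294.5 (mid-band of the 1500-3500 Hz stimuli)
```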

15.
Acoustic analyses were undertaken to explore the durational characteristics of the fricatives [f, θ, s, v, ð, z] as cues to initial consonant voicing in English. Based on reports on the perception of voiced-voiceless fricatives, it was expected that there would be clear-cut duration differences distinguishing voiced and voiceless fricatives. Preliminary results for three speakers indicate that, although differences emerged in the overall mean duration of voiced and voiceless fricatives, contrary to expectations, there was a great deal of overlap in the duration distribution of voiced and voiceless fricative tokens. Further research is needed to examine the role of duration as a cue to syllable-initial fricative consonant voicing in English.

16.
The present study investigated the relationship between functionally relevant compound gestures and single-articulator component movements of the jaw and the constrictors lower lip and tongue tip during rate-controlled syllable repetitions. In nine healthy speakers, the effects of speaking rate (3 vs 5 Hz), place of articulation, and vowel type during stop consonant-vowel repetitions (/pa/, /pi/, /ta/, /ti/) on the amplitude and peak velocity of differential jaw and constrictor opening-closing movements were measured by means of electromagnetic articulography. Rather than homogeneously scaled compound gestures, the results suggest distinct control mechanisms for the jaw and the constrictors. In particular, jaw amplitude was closely linked to vowel height during bilabial articulation, whereas the lower lip component amplitude turned out to be predominantly rate sensitive. However, the observed variability across subjects and conditions does not support the assumption that single-articulator gestures directly correspond to basic phonological units. The nonhomogeneous effects of speech rate on articulatory subsystem parameters indicate that single structures are differentially rate sensitive. On average, an increase in speech rate resulted in a more or less proportional increase of the steepness of peak velocity/amplitude scaling for jaw movements, whereas the constrictors were less rate sensitive in this respect. Negative covariation across repetitions between jaw and constrictor amplitudes has been considered an indicator of motor equivalence. Although significant in some cases, such a relationship was not consistently observed across subjects. Considering systematic sources of variability such as vowel height, speech rate, and subjects, jaw-constrictor amplitude correlations showed a nonhomogeneous pattern strongly depending on place of articulation.

17.
The speech of a postlingually deafened preadolescent was recorded and analyzed while a single-electrode cochlear implant (3M/House) was in operation, on two occasions after it failed (1 day and 18 days) and on three occasions after stimulation of a multichannel cochlear implant (Nucleus 22) (1 day, 6 months, and 1 year). Listeners judged 3M/House tokens to be the most normal until the subject had one year's experience with the Nucleus device. Spectrograms showed less aspiration, better formant definition and longer final frication and closure duration post-Nucleus stimulation (6 MO. NUCLEUS and 1 YEAR NUCLEUS) relative to the 3M/House and no auditory feedback conditions. Acoustic measurements after loss of auditory feedback (1 DAY FAIL and 18 DAYS FAIL) indicated a constriction of vowel space. Appropriately higher fundamental frequency for stressed than unstressed syllables, an expansion of vowel space and improvement in some aspects of production of voicing, manner and place of articulation were noted one year post-Nucleus stimulation. Loss of auditory feedback results are related to the literature on the effects of postlingual deafness on speech. Nucleus and 3M/House effects on speech are discussed in terms of speech production studies of single-electrode and multichannel patients.

18.
The current study explores the role of the amplitude of the fricative noise in the perception of place of articulation in voiceless fricative consonants. The amplitude of the fricative noise in naturally produced fricative-vowel utterances was varied relative to the vowel, and potential changes in perceptual responses were investigated. The amplitude of the fricative noise for [s] and [ʃ] was reduced such that the amplitude of the noise relative to the vowel was similar to that of [f] and [θ], and, conversely, the amplitude of the fricative noise of [f] and [θ] was increased such that it was similar to that of [s] and [ʃ]. The fricative noise was presented to listeners both in its vowel context and in isolation. Results indicated that, when the spectral properties of the fricative noise and formant transitions are compatible, manipulating the amplitude of the noise had only a small effect on the overall identification of place of articulation, and when effects emerged, they varied across the different fricative stimuli. Moreover, although decreasing the amplitude of [s] and [ʃ] resulted in an increase in [f] and [θ] responses, increasing the amplitude of [f] and [θ] did not result in an increase in [s] and [ʃ] responses. Implications of these findings for phonetic feature theories are considered.

19.
The classic [MN55] confusion matrix experiment (16 consonants, white-noise masker) was repeated using computerized procedures similar to those of Phatak and Allen (2007) ["Consonant and vowel confusions in speech-weighted noise," J. Acoust. Soc. Am. 121, 2312-2316]. The consonant scores in white noise fall into three sets: a low-error set [/m/, /n/], an average-error set [/p/, /t/, /k/, /s/, /[please see text]/, /d/, /g/, /z/, /ʒ/], and a high-error set [/f/, /θ/, /b/, /v/, /ð/]. The consonant confusions match those from MN55, except for the highly asymmetric voicing confusions of fricatives, biased in favor of voiced consonants. Masking noise can not only reduce the recognition of a consonant, but also perceptually morph it into another consonant. There is significant and systematic variability in the scores and confusion patterns of different utterances of the same consonant, which can be characterized as (a) confusion heterogeneity, where the competitors in the confusion groups of a consonant vary, and (b) threshold variability, where the confusion threshold [i.e., the signal-to-noise ratio (SNR) and score at which the confusion group is formed] varies. The average consonant error, and the errors for most individual consonants and consonant sets, can be approximated as exponential functions of the articulation index (AI). An AI based on the peak-to-rms ratio of speech can explain the SNR differences across experiments.
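The exponential dependence of error on the articulation index can be sketched in one common form, e(AI) = e_chance * (e_min / e_chance)**AI, which gives chance error at AI = 0 and a small error floor at AI = 1. The parameter values below are illustrative, not fitted to these data:

```python
def consonant_error(ai, e_chance=15/16, e_min=0.015):
    """Exponential error model e(AI) = e_chance * (e_min / e_chance)**AI.
    e_chance is chance error for 16 alternatives; e_min is an illustrative
    error floor at AI = 1 (both values hypothetical, not fitted)."""
    return e_chance * (e_min / e_chance) ** ai

for ai in (0.0, 0.5, 1.0):
    print(ai, round(consonant_error(ai), 3))
```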

20.
Speechreading supplemented with auditorily presented speech parameters
Results are reported from two experiments in which the benefit of supplementing speechreading with auditorily presented information about the speech signal was investigated. In experiment I, speechreading was supplemented with information about the prosody of the speech signal. For ten normal-hearing subjects with no experience in speechreading, the intelligibility score for sentences increased significantly when speechreading was supplemented with information about the overall amplitude of the speech signal, information about the fundamental frequency, or both. Binary information about voicing appeared not to be a significant supplement. In experiment II, the best-scoring supplements of experiment I were compared with two supplementary signals from our previous studies, i.e., information about the sound-pressure levels in two 1-oct filter bands centered at 500 and 3160 Hz, or information about the frequencies of the first and second formants from voiced speech segments. Sentence-intelligibility scores were measured for 24 normal-hearing subjects with no experience in speechreading, and for 12 normal-hearing experienced speechreaders. For the inexperienced speechreaders, the sound-pressure levels appeared to be the best supplement (87.1% correct syllables). For the experienced speechreaders, the formant-frequency information (88.6% correct) and the fundamental-frequency plus amplitude information (86.0% correct) were supplements as efficient as the sound-pressure information (86.1% correct). Discrimination of phonemes (both consonants and vowels) was measured for the group of 24 inexperienced speechreaders. Percentage of correct responses, confusions among phonemes, and the percentage of transmitted information about different types of manner and place of articulation and about the feature voicing are presented.
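The "percentage of transmitted information" reported for features is the Miller-Nicely measure: mutual information between stimulus and response computed from a confusion matrix, normalized by the stimulus entropy. A minimal version of the core computation:

```python
import math

def transmitted_info_bits(conf):
    """Mutual information T(stimulus; response) in bits, computed from a
    confusion matrix of counts; dividing by the stimulus entropy gives the
    proportion of transmitted information (Miller and Nicely, 1955)."""
    total = sum(sum(row) for row in conf)
    p_row = [sum(row) / total for row in conf]
    p_col = [sum(conf[i][j] for i in range(len(conf))) / total
             for j in range(len(conf[0]))]
    t = 0.0
    for i, row in enumerate(conf):
        for j, n in enumerate(row):
            if n > 0:
                p = n / total
                t += p * math.log2(p / (p_row[i] * p_col[j]))
    return t

# Toy voiced/voiceless confusion table with 90% correct responses
print(round(transmitted_info_bits([[45, 5], [5, 45]]), 3))  # → 0.531
```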

