Similar Documents
20 similar documents found (search time: 187 ms)
1.
To examine the role of perceived gender on fricative identification, a study was conducted in which listeners identified /s/-/ʃ/ and /s/-/θ/ continua combined with vowels produced by a man and a woman. These were acoustically modified to be consistent with different-sized vocal tracts (VT), and were presented with pictures of men or women. Listeners identified more tokens of /s/ in the /s/-/ʃ/ and more tokens of /θ/ in the /s/-/θ/ continuum when these sounds were combined with men's vowels, with vowels consistent with a 17 cm VT, and with pictures of men. Results support the hypothesis that listeners incorporate information about talker gender during fricative perception.

2.
Ten American English vowels were sung in a /b/-vowel-/d/ consonantal context by a professional countertenor in full voice (at F0 = 130, 165, 220, 260, and 330 Hz) and in head voice (at F0 = 220, 260, 330, 440, and 520 Hz). Four identification tests were prepared using the entire syllable or the center 200-ms portion of either the full-voice tokens or the head-voice tokens. Listeners attempted to identify each vowel by circling the appropriate word on their answer sheets. Errors were more frequent when the vowels were sung at higher F0. In addition, removal of the consonantal context markedly increased identification errors for both the head-voice and full-voice conditions. Back vowels were misidentified significantly more often than front vowels. For equal F0 values, listeners were significantly more accurate in identifying the head-voice stimuli. Acoustical analysis suggests that the difference in intelligibility between head and full voice may have been due to the head voice having more energy in the first harmonic than the full voice.

3.
Previous research has shown that familiarity with a talker's voice can improve linguistic processing (herein, "Familiar Talker Advantage"), but this benefit is constrained by the context in which the talker's voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers' voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correct for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers' voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers.

4.
Traditional interval or ordinal rating scale protocols appear to be poorly suited to measuring vocal quality. To investigate why this might be so, listeners were asked to classify pathological voices as having or not having different voice qualities. It was reasoned that this simple task would allow listeners to focus on the kind of quality a voice had, rather than how much of a quality it possessed, and thus might provide evidence for the validity of traditional vocal qualities. In experiment 1, listeners judged whether natural pathological voice samples were or were not primarily breathy and rough. Listener agreement in both tasks was above chance, but listeners agreed poorly that individual voices belonged in particular perceptual classes. To determine whether these results reflect listeners' difficulty agreeing about single perceptual attributes of complex stimuli, listeners in experiment 2 classified natural pathological voices and synthetic stimuli (varying in f0 only) as low pitched or not low pitched. If disagreements derive from difficulties dividing an auditory continuum consistently, then patterns of agreement should be similar for both kinds of stimuli. In fact, listener agreement was significantly better for the synthetic stimuli than for the natural voices. Difficulty isolating single perceptual dimensions of complex stimuli thus appears to be one reason why traditional unidimensional rating protocols are unsuited to measuring pathological voice quality. Listeners did agree that a few aphonic voices were breathy, and that a few voices with prominent vocal fry and/or interharmonics were rough. These few cases of agreement may have occurred because the acoustic characteristics of the voices in question corresponded to the limiting case of the quality being judged. Values of f0 that generated listener agreement in experiment 2 were more extreme for natural than for synthetic stimuli, consistent with this interpretation.

5.
This study investigated the extent to which language familiarity affects the perception of the indexical properties of speech by testing listeners' identification and discrimination of bilingual talkers across two different languages. In one experiment, listeners were trained to identify bilingual talkers speaking in only one language and were then tested on their ability to identify the same talkers speaking in another language. In the second experiment, listeners discriminated between bilingual talkers across languages in an AX discrimination paradigm. The results of these experiments indicate that there is sufficient language-independent indexical information in speech for listeners to generalize knowledge of talkers' voices across languages and to successfully discriminate between bilingual talkers regardless of the language they are speaking. However, the results of these studies also revealed that listeners do not solely rely on language-independent information when performing these tasks. Listeners use language-dependent indexical cues to identify talkers who are speaking a familiar language. Moreover, the tendency to perceive two talkers as the "same" or "different" depends on whether the talkers are speaking in the same language. The combined results of these experiments thus suggest that indexical processing relies on both language-dependent and language-independent information in the speech signal.

6.
Little is known about the perceptual importance of changes in the shape of the source spectrum, although many measures have been proposed and correlations with different vocal qualities (breathiness, roughness, nasality, strain, etc.) have frequently been reported. This study investigated just-noticeable differences in the relative amplitudes of the first two harmonics (H1-H2) for speakers of Mandarin and English. Listeners heard pairs of vowels that differed only in the amplitude of the first harmonic and judged whether or not the voice tokens were identical in voice quality. Across voices and listeners, just-noticeable differences averaged 3.18 dB. This value is small relative to the range of values across voices, indicating that H1-H2 is a perceptually valid acoustic measure of vocal quality. For both groups of listeners, differences in the amplitude of the first harmonic were easier to detect when the source spectral slope was steeply falling so that F0 dominated the spectrum. Mandarin speakers were significantly more sensitive (by about 1 dB) to differences in first harmonic amplitudes than were English speakers. Two explanations for these results are possible: Mandarin speakers may have learned to hear changes in harmonic amplitudes due to changes in voice quality that are correlated with the tones of Mandarin; or Mandarin speakers' experience with tonal contrasts may increase their sensitivity to small differences in the amplitude of F0 (which is also the first harmonic).
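The H1-H2 measure itself is straightforward to compute. Below is a minimal Python sketch (the function name and the ±20% harmonic search band are illustrative choices, not taken from the study); it estimates H1-H2 as the dB difference between the first two harmonic peaks of an FFT magnitude spectrum. Published analyses often additionally correct for formant influence, which this sketch omits.

```python
import numpy as np

def h1_h2_db(signal, fs, f0):
    """Estimate H1-H2: level of the first harmonic minus the second,
    in dB, read off an FFT magnitude spectrum."""
    win = np.hanning(len(signal))
    spec = np.abs(np.fft.rfft(signal * win))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)

    def peak_db(f_target):
        # search within +/-20% of the expected harmonic frequency
        band = (freqs > 0.8 * f_target) & (freqs < 1.2 * f_target)
        return 20 * np.log10(spec[band].max())

    return peak_db(f0) - peak_db(2 * f0)

# Toy check: a two-harmonic tone whose second harmonic is half the
# amplitude of the first should give about 20*log10(2) = 6.02 dB.
fs, f0 = 16000, 200.0
t = np.arange(fs) / fs
x = 1.0 * np.sin(2 * np.pi * f0 * t) + 0.5 * np.sin(2 * np.pi * 2 * f0 * t)
print(round(h1_h2_db(x, fs, f0), 1))  # close to 6.0
```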

7.
Eight monolingual Japanese listeners were trained to identify English /r/ and /l/ by using 560 training tokens produced by ten talkers in three positions (200 word initial, 200 consonant cluster, and 160 intervocalic tokens). Their baseline performance and transfer of learning were measured using 200 word initial and 200 consonant cluster tokens produced by ten additional talkers. Long-term training (15 days) with feedback indeed increased sensitivity to the nontraining tokens, but tremendous individual differences were found in terms of initial and final sensitivity and response bias. Even after training, however, there remained some tokens for each subject that were misidentified at a level significantly below chance, suggesting that truly nativelike identification of /r/ and /l/ may never be achieved by adult Japanese learners of English.

8.
The purpose of this study was to examine the effect of reduced vowel working space on dysarthric talkers' speech intelligibility using both acoustic and perceptual approaches. In experiment 1, the acoustic-perceptual relationship between vowel working space area and speech intelligibility was examined in Mandarin-speaking young adults with cerebral palsy. Subjects read aloud 18 bisyllabic words containing the vowels /i/, /a/, and /u/ using their normal speaking rate. Each talker's words were identified by three normal listeners. The percentages of correct vowel and word identification were calculated as vowel intelligibility and word intelligibility, respectively. Results revealed that talkers with cerebral palsy exhibited smaller vowel working space areas compared to ten age-matched controls. The vowel working space area was significantly correlated with vowel intelligibility (r=0.632, p<0.005) and with word intelligibility (r=0.684, p<0.005). Experiment 2 examined whether tokens of expanded vowel working spaces were perceived as better vowel exemplars and represented with greater perceptual spaces than tokens of reduced vowel working spaces. The results of the perceptual experiment support this prediction. The distorted vowels of talkers with cerebral palsy compose a smaller acoustic space that results in shrunken intervowel perceptual distances for listeners.
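The vowel working space area for the corner vowels /i/, /a/, and /u/ is the area of the triangle those vowels span in the F1-F2 plane, which the shoelace formula gives directly. A minimal Python sketch; the formant values below are rough textbook-style adult-male estimates for illustration, not data from the study.

```python
def vowel_space_area(corners):
    """Area (shoelace formula) of the polygon spanned by corner vowels
    in the F1-F2 plane; `corners` is a list of (F1, F2) pairs in Hz."""
    s = 0.0
    n = len(corners)
    for i in range(n):
        x1, y1 = corners[i]
        x2, y2 = corners[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

# Illustrative (F1, F2) values for /i/, /a/, /u/ (not data from the study)
area = vowel_space_area([(270, 2290), (730, 1090), (300, 870)])
print(f"{area:.0f} Hz^2")  # 308600 Hz^2 for these values
```

A reduced working space, as reported for the talkers with cerebral palsy, simply corresponds to corner vowels that sit closer together and hence a smaller triangle.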

9.
The goal of this study was to determine if there are acoustical differences between male and female voices and, if there are, where exactly these differences lie. Extended speech samples were used. The recorded readings of a text by 31 women and by 24 men were analyzed by means of the long-term average spectrum (LTAS), extracting the amplitude values (in decibels) at intervals of 160 Hz over a range of 8 kHz. The results showed a significant difference between genders, as well as an interaction of gender and frequency. The female voice showed greater levels of aspiration noise, located in the spectral regions corresponding to the third formant, which causes the female voice to have a more "breathy" quality than the male voice. The lower spectral tilt in the women's voices is another consequence of this greater aspiration noise.
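An LTAS of the kind described, with amplitudes pooled into 160 Hz bands over 8 kHz, can be sketched as follows in Python. Frame length, hop, and window are illustrative choices not specified by the study.

```python
import numpy as np

def ltas_db(signal, fs, band_hz=160.0, fmax=8000.0, frame=1024):
    """Long-term average spectrum: average the power spectra of
    overlapping frames, then pool FFT bins into fixed-width bands
    (160 Hz up to 8 kHz, as in the study) and return band levels in dB."""
    hop = frame // 2
    win = np.hanning(frame)
    acc = np.zeros(frame // 2 + 1)
    count = 0
    for start in range(0, len(signal) - frame + 1, hop):
        acc += np.abs(np.fft.rfft(signal[start:start + frame] * win)) ** 2
        count += 1
    mean_power = acc / max(count, 1)
    freqs = np.fft.rfftfreq(frame, 1.0 / fs)
    edges = np.arange(0.0, fmax + band_hz, band_hz)
    levels = [10 * np.log10(mean_power[(freqs >= lo) & (freqs < hi)].mean()
                            + 1e-12)
              for lo, hi in zip(edges[:-1], edges[1:])]
    return np.array(levels)

# 50 bands of 160 Hz cover the 0-8 kHz range analyzed in the study.
np.random.seed(0)
ltas = ltas_db(np.random.randn(16000), 16000)
```

Spectral tilt, mentioned in the abstract, can then be read off as the slope of these band levels against frequency.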

10.
In recent years, the multiparametric approach for evaluating perceptual rating of voice quality has been advocated. This study evaluates the accuracy of predicting perceived overall severity of voice quality with a minimal set of aerodynamic, voice range profile (phonetogram), and acoustic perturbation measures. One hundred and twelve dysphonic persons (93 women and 19 men) with laryngeal pathologies and 41 normal controls (35 women and six men) with normal voices participated in this study. Perceptual severity judgement was carried out by four listeners rating the G (overall grade) parameter of the GRBAS scale. The minimal set of instrumental measures was selected based on the ability of the measure to discriminate between dysphonic and normal voices, and to attain at least a moderate correlation with perceived overall severity. Results indicated that perceived overall severity was best described by maximum phonation time of sustained /a/, peak intraoral pressure of the consonant-vowel /pi/ strings production, voice range profile area, and acoustic jitter. Direct-entry discriminant function analysis revealed that these four voice measures in combination correctly predicted 67.3% of perceived overall severity levels.
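Of the four selected measures, acoustic jitter is the simplest to illustrate. A hedged Python sketch of one common definition of local jitter (a Praat-style definition; the study's exact extraction settings are not given, and the period values below are invented for illustration):

```python
def jitter_local_percent(periods):
    """Local jitter (%): mean absolute difference between consecutive
    pitch periods divided by the mean period (one common definition)."""
    diffs = [abs(a - b) for a, b in zip(periods[:-1], periods[1:])]
    return 100.0 * (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

# Invented pitch periods near 5 ms (F0 about 200 Hz) with slight wobble
p = [0.0050, 0.0051, 0.0049, 0.0050, 0.0052]
print(round(jitter_local_percent(p), 2))  # 2.98
```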

11.
Two sounds with the same pitch may vary from each other based on the saliency of their pitch sensation. This perceptual attribute is called "pitch strength." The study of voice pitch strength may be important in quantifying normal and pathological voice qualities. The present study investigated how pitch strength varies across normal and dysphonic voices. A set of voices (vowel /a/) selected from the Kay Elemetrics Disordered Voice Database served as the stimuli. These stimuli demonstrated a wide range of voice quality. Ten listeners judged the pitch strength of these stimuli in an anchored magnitude estimation task. On a given trial, listeners heard three different stimuli: the first stimulus represented very low pitch strength (wide-band noise), the second consisted of the target voice, and the third represented very high pitch strength (pure tone). Listeners estimated the pitch strength of the target voice by positioning a continuous slider labeled with values between 0 and 1, reflecting the two anchor stimuli. Results revealed that listeners can judge pitch strength reliably in dysphonic voices. Moderate to high correlations with perceptual judgments of voice quality suggest that pitch strength may contribute to voice quality judgments.

12.
Recent findings in the domains of word and talker recognition reveal that listeners use previous experience with an individual talker's voice to facilitate subsequent perceptual processing of that talker's speech. These findings raise the possibility that listeners are sensitive to talker-specific acoustic-phonetic properties. The present study tested this possibility directly by examining listeners' sensitivity to talker differences in the voice-onset-time (VOT) associated with a word-initial voiceless stop consonant. Listeners were trained on the speech of two talkers. Speech synthesis was used to manipulate the VOTs of these talkers so that one had short VOTs and the other had long VOTs (counterbalanced across listeners). The results of two experiments using a paired-comparison task revealed that, when presented with a short- versus long-VOT variant of a given talker's speech, listeners could select the variant consistent with their experience of that talker's speech during training. This was true when listeners were tested on the same word heard during training and when they were tested on a different word spoken by the same talker, indicating that listeners generalized talker-specific VOT information to a novel word. Such sensitivity to talker-specific acoustic-phonetic properties may subserve at least in part listeners' capacity to benefit from talker-specific experience.

13.
Using only three measures of the waveform, the zero-crossing rate, the logarithm of the root-mean-square (rms) energy, and the derivative of the log rms energy with respect to time [termed rate of rise (ROR)], voiceless plosives (including affricates) can be distinguished from voiceless fricatives in word-initial, medial, and final positions. Peaks in the ROR contour are considered for significance to the plosive/fricative distinction by examining the log rms energy and zero-crossing rate. Then, the magnitude of the first significant peak in the ROR contour is used as the primary classifier. The algorithm was tested on 1364 tokens (720 word-initial tokens produced by four female and four male speakers; 360 word-medial tokens produced by two males and two females; 320 word-final tokens produced by two males and two females). Data from two male and two female speakers (360 word-initial tokens) were used as a training set, and the remaining data were used as a test set. The overall rate of correct classification was 96.8%. Implications of this result are discussed.
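The three waveform measures can be sketched in Python. This is only an illustration of the quantities themselves (the frame size and the demo signal are my own choices); the published algorithm's criteria for deciding which ROR peaks are "significant" are not reproduced here.

```python
import numpy as np

def ror_features(signal, fs, frame_ms=10.0):
    """Frame-wise zero-crossing rate, log-RMS energy (dB), and rate of
    rise (ROR): the frame-to-frame derivative of log-RMS, in dB/s."""
    n = int(fs * frame_ms / 1000.0)
    frames = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    zcr = np.array([np.mean(np.abs(np.diff(np.sign(f))) > 0) for f in frames])
    log_rms = np.array([20 * np.log10(np.sqrt(np.mean(f ** 2)) + 1e-12)
                        for f in frames])
    ror = np.diff(log_rms) / (frame_ms / 1000.0)  # dB per second
    return zcr, log_rms, ror

# A plosive-like abrupt burst after silence produces a very large ROR
# peak; a fricative's gradual onset produces a much smaller one.
np.random.seed(0)
fs = 16000
burst = np.concatenate([np.zeros(800), 0.5 * np.random.randn(800)])
zcr, log_rms, ror = ror_features(burst, fs)
```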

14.
The effect of talker and token variability on speech perception has engendered a great deal of research. However, most of this research has compared listener performance in multiple-talker (or variable) situations to performance in single-talker conditions. It remains unclear to what extent listeners are affected by the degree of variability within a talker, rather than simply the existence of variability (being in a multitalker environment). The present study has two goals: First, the degree of variability among speakers in their /s/ and /ʃ/ productions was measured. Even among a relatively small pool of talkers, there was a range of speech variability: some talkers had /s/ and /ʃ/ categories that were quite distinct from one another in terms of frication centroid and skewness, while other speakers had categories that actually overlapped one another. The second goal was to examine whether this degree of variability within a talker influenced perception. Listeners were presented with natural /s/ and /ʃ/ tokens for identification, under ideal listening conditions, and slower response times were found for speakers whose productions were more variable than for speakers with more internal consistency in their speech. This suggests that the degree of variability, not just the existence of it, may be the more critical factor in perception.
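Frication centroid and skewness are the first and (normalized) third spectral moments. A minimal Python sketch that treats the normalized power spectrum as a probability mass function; the function name and windowing are illustrative assumptions, not the study's exact procedure.

```python
import numpy as np

def spectral_moments(frication, fs):
    """Frication centroid (first spectral moment, Hz) and skewness
    (normalized third moment), treating the normalized power spectrum
    as a probability mass function over frequency."""
    spec = np.abs(np.fft.rfft(frication * np.hanning(len(frication)))) ** 2
    freqs = np.fft.rfftfreq(len(frication), 1.0 / fs)
    p = spec / spec.sum()
    centroid = (freqs * p).sum()
    sd = np.sqrt(((freqs - centroid) ** 2 * p).sum())
    skew = ((freqs - centroid) ** 3 * p).sum() / sd ** 3
    return centroid, skew

# White noise has a flat spectrum, so its centroid sits near fs/4 and
# its skewness near zero; a real /s/ has a much higher centroid than /ʃ/.
np.random.seed(0)
centroid, skew = spectral_moments(np.random.randn(32768), 16000)
```

Overlap between a talker's /s/ and /ʃ/ categories can then be assessed by comparing the distributions of these two moments across that talker's tokens.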

15.
Vowel identification was tested in quiet, noise, and reverberation with 20 normal-hearing subjects and 20 hearing-impaired subjects. Stimuli were 15 English vowels spoken in a /b-t/ context by six male talkers. Each talker produced five tokens of each vowel. In quiet, all stimuli were identified by two judges as the intended targets. The stimuli were degraded by reverberation or speech-spectrum noise. Vowel identification scores depended upon talker, listening condition, and subject type. The relationship between identification errors and spectral details of the vowels is discussed.

16.
The indirect auditory feedback from one's own voice arises from sound reflections at the room boundaries or from sound reinforcement systems. The relative variations of indirect auditory feedback are quantified through room acoustic parameters such as the room gain and the voice support, rather than the reverberation time. Fourteen subjects matched the loudness level of their own voice (the autophonic level) to that of a constant and external reference sound, under different synthesized room acoustics conditions. The matching voice levels are used to build a set of equal autophonic level curves. These curves give an indication of the amount of variation in voice level induced by the acoustic environment as a consequence of the sidetone compensation or Lombard effect. In the range of typical rooms for speech, the variations in overall voice level that result in a constant autophonic level are on the order of 2 dB, and more than 3 dB in the 4 kHz octave band. By comparison of these curves with previous studies, it is shown that talkers use acoustic cues other than loudness to adjust their voices when speaking in different rooms.

17.
The current investigation studied whether adults, children with normally developing language aged 4-5 years, and children with specific language impairment, aged 5-6 years, identified vowels on the basis of steady-state or transitional formant frequencies. Four types of synthetic tokens, created with a female voice, served as stimuli: (1) steady-state centers for the vowels [i] and [ae]; (2) vowelless tokens with transitions appropriate for [bib] and [baeb]; (3) "congruent" tokens that combined the first two types of stimuli into [bib] and [baeb]; and (4) "conflicting" tokens that combined the transitions from [bib] with the vowel from [baeb] and vice versa. Results showed that children with language impairment identified the [i] vowel more poorly than other subjects for both the vowelless and congruent tokens. Overall, children identified vowels most accurately in steady-state centers and congruent stimuli (ranging between 94% and 96%). They identified the vowels on the basis of transitions only from "vowelless" tokens with 89% and 83.5% accuracy for the normally developing and language impaired groups, respectively. Children with normally developing language used steady-state cues to identify vowels in 87% of the conflicting stimuli, whereas children with language impairment did so for 79% of the stimuli. Adults were equally accurate for vowelless, steady-state, and congruent tokens (ranging between 99% and 100% accuracy) and used both steady-state and transition cues for vowel identification. Results suggest that most listeners prefer the steady state for vowel identification but are capable of using the onglide/offglide transitions for vowel identification. Results were discussed with regard to Nittrouer's developmental weighting shift hypothesis and Strange and Jenkins's dynamic specification theory.

18.
Temporal information provided by cochlear implants enables successful speech perception in quiet, but limited spectral information precludes comparable success in voice perception. Talker identification and speech decoding by young hearing children (5-7 yr), older hearing children (10-12 yr), and hearing adults were examined by means of vocoder simulations of cochlear implant processing. In Experiment 1, listeners heard vocoder simulations of sentences from a man, woman, and girl and were required to identify the talker from a closed set. Younger children identified talkers more poorly than older listeners, but all age groups showed similar benefit from increased spectral information. In Experiment 2, children and adults provided verbatim repetition of vocoded sentences from the same talkers. The youngest children had more difficulty than older listeners, but all age groups showed comparable benefit from increasing spectral resolution. At comparable levels of spectral degradation, performance on the open-set task of speech decoding was considerably more accurate than on the closed-set task of talker identification. Hearing children's ability to identify talkers and decode speech from spectrally degraded material sheds light on the difficulty of these domains for child implant users.

19.
Mojmír Lejska, Journal of Voice, 2004, 18(2): 209-215
There are various methods to evaluate voice parameters. Original software developed by the staff of the AUDIO-Fon centr Brno, Czech Republic, was used to assess voice quality in a group of persons with hereditary deafness. Deaf persons have all of the biological prerequisites for voice production except the possibility of acoustic feedback. We examined the voices of 35 persons (20 men and 15 women) with hereditary profound hearing impairments and compared their voice parameters with those of persons with intact hearing. For measurement we used special software called voice field measurement (VFM), which graphically records voice frequency and intensity. VFM is an objective method that enables the assessment of basic physical voice characteristics and is suitable for examining both intact and disturbed voices. The voice of persons with hearing impairments has a higher basic voice frequency regardless of sex. This type of voice production, a childlike voice fixed only by a motor stereotype, is much more demanding for a mature larynx; phonation by deaf speakers therefore requires greater effort. Hearing influences both voice development and speech production. Deaf people, despite an intact phonatory apparatus, cannot produce more than one type of voice: they cannot modulate their voices in frequency or dynamics, and they cannot change their voices continuously. If a deaf person wants to change a voice characteristic, it is possible only through discontinuous changes ("skipping").

20.
This study investigated acoustic-phonetic correlates of intelligibility for adult and child talkers, and whether the relative intelligibility of different talkers was dependent on listener characteristics. In experiment 1, word intelligibility was measured for 45 talkers (18 women, 15 men, 6 boys, 6 girls) from a homogeneous accent group. The material consisted of 124 words familiar to 7-year-olds that adequately covered all frequent consonant confusions; stimuli were presented to 135 adult and child listeners in low-level background noise. Seven-to-eight-year-old listeners made significantly more errors than 12-year-olds or adults, but the relative intelligibility of individual talkers was highly consistent across groups. In experiment 2, listener ratings on a number of voice dimensions were obtained for the adult talkers identified in experiment 1 as having the highest and lowest intelligibility. Intelligibility was significantly correlated with subjective dimensions reflecting articulation, voice dynamics, and general quality. Finally, in experiment 3, measures of fundamental frequency, long-term average spectrum, word duration, consonant-vowel intensity ratio, and vowel space size were obtained for all talkers. Overall, word intelligibility was significantly correlated with the total energy in the 1- to 3-kHz region and word duration; these measures predicted 61% of the variability in intelligibility. The fact that the relative intelligibility of individual talkers was remarkably consistent across listener age groups suggests that the acoustic-phonetic characteristics of a talker's utterance are the primary factor in determining talker intelligibility. Although some acoustic-phonetic correlates of intelligibility were identified, variability in the profiles of the "best" talkers suggests that high intelligibility can be achieved through a combination of different acoustic-phonetic characteristics.


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号