Similar literature
 20 similar articles found (search time: 46 ms)
1.
Listeners' auditory discrimination of vowel sounds depends in part on the order in which stimuli are presented. Such presentation order effects have been argued to be language independent, and to result from psychophysical (not speech- or language-specific) factors such as the decay of memory traces over time or increased weighting of later-occurring stimuli. In the present study, native Cantonese speakers' discrimination of a linguistic tone continuum is shown to exhibit order of presentation effects similar to those shown for vowels in previous studies. When presented with two successive syllables differing in fundamental frequency by approximately 4 Hz, listeners were significantly more sensitive to this difference when the first syllable was higher in frequency than the second. However, American English-speaking listeners with no experience listening to Cantonese showed no such contrast effect when tested in the same manner using the same stimuli. Neither English nor Cantonese listeners showed any order of presentation effects in the discrimination of a nonspeech continuum in which tokens had the same fundamental frequencies as the Cantonese speech tokens but had a qualitatively non-speech-like timbre. These results suggest that tone presentation order effects, unlike vowel effects, may be language specific, possibly resulting from the need to compensate for utterance-related pitch declination when evaluating fundamental frequency for tone identification.

2.
This investigation aims at describing voice function of four nonclassical styles of singing, Rock, Pop, Soul, and Swedish Dance Band. A male singer, professionally experienced in performing in these genres, sang representative tunes, both with their original lyrics and on the syllable /pae/. In addition, he sang tones in a triad pattern ranging from the pitch Bb2 to the pitch C4 on the syllable /pae/ in pressed and neutral phonation. An expert panel was successful in classifying the samples, thus suggesting that the samples were representative of the various styles. Subglottal pressure was estimated from oral pressure during the occlusion for the consonant [p]. Flow glottograms were obtained from inverse filtering. The four lowest formant frequencies differed between the styles. The mean of the subglottal pressure and the mean of the normalized amplitude quotient (NAQ), that is, the ratio between the flow pulse amplitude and the product of period and maximum flow declination rate, were plotted against the mean of fundamental frequency. In these graphs, Rock and Swedish Dance Band assumed opposite extreme positions with respect to subglottal pressure and mean phonation frequency, whereas the mean NAQ values differed less between the styles.
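
The verbal definition of NAQ in the abstract above can be written as a formula. The symbols are assumptions introduced here for illustration: $A_{\mathrm{ac}}$ for the flow pulse amplitude, $T$ for the period, and $\mathrm{MFDR}$ for the maximum flow declination rate.

```latex
\[
  \mathrm{NAQ} \;=\; \frac{A_{\mathrm{ac}}}{T \cdot \mathrm{MFDR}}
\]
```

With flow in L/s, period in s, and declination rate in L/s$^2$, the units cancel, so the quotient is dimensionless.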

3.
Listeners without absolute (or "perfect") pitch have difficulty identifying or producing isolated musical pitches from memory. Instead, they process the relative pattern of pitches, which remains invariant across pitch transposition. Musically untrained non-absolute pitch possessors demonstrated absolute pitch memory for the telephone dial tone, a stimulus that is always heard at the same absolute frequency. Listeners accurately classified pitch-shifted versions of the dial tone as "normal," "higher than normal" or "lower than normal." However, the role of relative pitch processing was also evident, in that listeners' pitch judgments were also sensitive to the frequency range of stimuli.

4.
An acoustic model of a multiple-channel cochlear implant   (Cited by: 1 in total; self-citations: 0; citations by others: 1)
A set of bandpass filtered, pulsed noise stimuli presented to three normally hearing subjects was found to have psychophysical properties similar to those of a set of pulsed electrical stimuli presented to two cochlear implant patients. Identical procedures were used to compare the performances of the two groups of subjects in the following tasks: (a) pulse rate difference limen measurements, (b) pitch scaling for stimuli differing in pulse rate, (c) pitch scaling and categorization of stimuli differing in filter frequency or electrode position, and (d) similarity judgments of stimuli differing in pulse rate and filter frequency or electrode position. By choosing the parameters of the acoustic stimuli appropriately, a high level of agreement between the two sets of results was achieved. Electrical stimuli on electrodes at different sites in the cochlea were matched with pulsed noise passed through bandpass filters with different center frequencies. Matching was achieved for equal electrical and acoustic pulse rates.

5.
The relationship of lung pressure, fundamental frequency, peak airflow, open quotient, and maximal flow declination rate to vocal intensity for a normal speaking, young male control group and an elderly male group was investigated. The control group consisted of 17 healthy male subjects with a mean age of 30 years and the elderly group consisted of 11 healthy male subjects with a mean age of 77 years. Data were collected at three levels of vocal intensity: soft, comfortable, and loud, corresponding to 25%, 50%, and 75% of dynamic range, respectively. Phonational threshold pressure and lung pressure were obtained using the intraoral technique. The oral airflow waveform was inverse filtered to provide an approximation to the glottal airflow waveform from which measures of fundamental frequency, peak airflow, open quotient, and maximal flow declination rate were determined. Excess lung pressure was calculated as lung pressure minus estimated phonational threshold pressure. The results show for both groups an increase in sound pressure level across the conditions, with corresponding increases in lung pressure, excess lung pressure, fundamental frequency, peak airflow, and maximal flow declination rate. Open quotient decreased with increasing vocal intensity. Lung pressure, sound pressure level, and peak airflow were all found to be significantly greater for the control group than for the elderly group at each condition. Open quotient was found to be significantly lower in the control group than in the elderly group at each condition. No significant difference was observed for excess lung pressure, phonational threshold pressure, fundamental frequency, or maximal flow declination rate between the two groups. These results show that a difference in vocal intensity does exist between young and elderly voices and that this difference is the result of differences in lung pressure, peak airflow, and open quotient.

6.
Both in normal speech voice and in some types of pathological voice, adjacent vocal cycles may alternate in amplitude or period, or both. When this occurs, the determination of voice fundamental frequency (defined as number of vocal cycles per second) becomes difficult. The present study attempts to address this issue by investigating how human listeners perceive the pitch of alternate cycles. As stimuli, vowels /a/ and /i/ were synthesized with fundamental frequencies at 140 Hz and 220 Hz, and the effect of alternate cycles was simulated with both amplitude- and frequency-modulation of the glottal volume velocity waveform. Subjects were asked to judge the pitch of the modulated vowels in reference to vowels without modulation. The results showed that (a) perceived pitch became lower as the amount of modulation increased, and the effect seems to be more dramatic than would be predicted by existing hypotheses, (b) perceived pitch differed across vowels, fundamental frequencies, and modulation types, that is, amplitude versus frequency modulation, and (c) the prediction of perceived pitch was best made in the frequency domain in terms of subharmonic-to-harmonic ratio. These findings provide useful information on how we should assess the pitch of alternate cycles. They may also be helpful in developing more robust pitch determination algorithms.

7.
The present study explores the use of extrinsic context in perceptual normalization for the purpose of identifying lexical tones in Cantonese. In each of four experiments, listeners were presented with a target word embedded in a semantically neutral sentential context. The target word was produced with a mid level tone and it was never modified throughout the study, but on any given trial the fundamental frequency of part or all of the context sentence was raised or lowered to varying degrees. The effect of perceptual normalization of tone was quantified as the proportion of non-mid level responses given in F0-shifted contexts. Results showed that listeners' tonal judgments (i) were proportional to the degree of frequency shift, (ii) were not affected by non-pitch-related differences in talker, and (iii) were affected by the frequency of both the preceding and following context, although (iv) following context affected tonal decisions more strongly than did preceding context. These findings suggest that perceptual normalization of lexical tone may involve a "moving window" or "running average" type of mechanism that selectively weights more recent pitch information over older information, but does not depend on the perception of a single voice.

8.
Acoustic correlates of contrastive stress, i.e., fundamental frequency (F0), duration, and intensity, and listener perceptions of stress, were investigated in a profoundly deaf subject (RS) pre/post single-channel cochlear implant and longitudinally, and compared to the overall patterns of age-peer profoundly deaf (JM) and normally hearing subjects (DL). The stimuli were a group of general American English words in which a change of function from noun to verb is associated with a shift of stress from initial to final syllable, e.g., CON'trast versus conTRAST'. Precochlear implant, RS was unable to produce contrastive stress correctly. Hearing one day post-stimulation resulted in significantly higher F0 for initial and final stressed versus unstressed syllables. Four months post-stimulation, RS maintained significantly higher F0 on stressed syllables, as well as generalization of significantly increased intensity and longer syllable duration differences for all stressed versus unstressed syllables. Perceptually, listeners judged RS's contrastive stress placement as incorrect precochlear implant and as always correct post-cochlear implant. JM's contrastive stress was judged as 96% correct, and DL's contrastive stress placement was 100% correct. It was concluded that RS reacquired all acoustic correlates needed for appropriate differentiation of contrastive stress with longitudinal use of the cochlear implant.

9.
Measurements on the inverse filtered airflow waveform and of estimated average transglottal pressure and glottal airflow were made from syllable sequences in low, normal, and high pitch for 25 male and 20 female speakers. Correlation analyses indicated that several of the airflow measurements were more directly related to voice intensity than to fundamental frequency (F0). Results suggested that pressure may have different influences in low and high pitch in this speech task. It is suggested that unexpected results of increased pressure in low pitch were related to maintaining voice quality, that is, avoiding vocal fry. In high pitch, the increased pressure may serve to maintain vocal fold vibration. The findings suggested different underlying laryngeal mechanisms and vocal adjustments for increasing and decreasing F0 from normal pitch.

10.
The dependency of the brightness dimension of timbre on fundamental frequency (F0) was examined experimentally. Subjects compared the timbres of 24 synthetic stimuli, produced by the combination of six values of spectral centroid to obtain different values of expected brightness, and four F0s, ranging over 18 semitones. Subjects were instructed to ignore pitch differences. Dissimilarity scores were analyzed by both ANOVA and multidimensional scaling (MDS). Results show that timbres can be compared between stimuli with different F0s over the range tested, and that differences in F0 affect timbre dissimilarity in two ways. First, dissimilarity scores reveal a term proportional to F0 difference that shows up in the MDS solution as a dimension correlated with F0 and orthogonal to other timbre dimensions. Second, F0 systematically affects the timbre dimension (brightness) correlated with spectral centroid. Interestingly, both terms covaried with differences in F0 rather than chroma or consonance. The first term probably corresponds to pitch. The second can be eliminated if the formula for spectral centroid is modified by introducing a corrective factor dependent on F0.
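
For readers unfamiliar with the spectral centroid mentioned above, a minimal sketch follows. It assumes the common amplitude-weighted definition; the abstract does not state which weighting the study used, and the F0-dependent corrective factor it mentions is not reproduced here. The function names and the power-law rolloff spectrum are hypothetical, chosen only for illustration.

```python
def spectral_centroid(freqs_hz, amplitudes):
    """Amplitude-weighted mean frequency of a set of spectral components.

    Assumption: amplitude weighting; other definitions weight by power.
    """
    total = sum(amplitudes)
    if total == 0:
        raise ValueError("spectrum has no energy")
    return sum(f * a for f, a in zip(freqs_hz, amplitudes)) / total


def harmonic_spectrum(f0_hz, n_harmonics, rolloff):
    """Hypothetical harmonic spectrum: harmonic k has amplitude k ** -rolloff."""
    ks = range(1, n_harmonics + 1)
    freqs = [k * f0_hz for k in ks]
    amps = [k ** -rolloff for k in ks]
    return freqs, amps


if __name__ == "__main__":
    for f0 in (100.0, 200.0):
        freqs, amps = harmonic_spectrum(f0, n_harmonics=20, rolloff=1.0)
        print(f0, spectral_centroid(freqs, amps))
```

In this toy spectrum the amplitude pattern is fixed per harmonic number, so the centroid in Hz scales directly with F0. That is a property of the sketch, not a result taken from the study.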

11.
Three different waveforms were generated from the same component frequencies by setting the phase of the components so they were either homophasic (all component sinusoids start at 0 degrees), diphasic (sinusoids alternate between -45 degrees and +45 degrees), or heterophasic (starting phase randomly selected). Listeners were asked to rate the saliency of all periodicity pitches they could detect in stimuli which contained 12 or more components at frequencies above the region where pitches were perceived. A major finding was that the highest ratings of fundamental frequency (f1) pitch "strength" were always obtained for homophasic waveforms, which among the test stimuli have the most abrupt envelope fluctuations. In contrast, diphasic and heterophasic waveforms, which have smoother envelopes, yielded lower pitch strength estimates at f1 and higher ratings two octaves above the fundamental. These data indicate that information concerning the stimulus waveform envelope influences the relative prominence of competing pitches evoked by periodicity pitch stimuli. However, no one-to-one correspondence between pitch and waveform periodicity is apparent.
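
The three phase conditions described above can be sketched as follows. The fundamental (100 Hz), the harmonic numbers (12 to 23), the sampling rate, and all function names are assumptions for illustration; the abstract states only that the stimuli contained 12 or more high-frequency components.

```python
import math
import random


def phases_deg(kind, n, seed=0):
    """Starting phases in degrees for n components under one phase condition."""
    if kind == "homophasic":
        return [0.0] * n  # every sinusoid starts at 0 degrees
    if kind == "diphasic":
        return [-45.0 if i % 2 == 0 else 45.0 for i in range(n)]  # alternate
    if kind == "heterophasic":
        rng = random.Random(seed)  # seeded so the sketch is repeatable
        return [rng.uniform(0.0, 360.0) for _ in range(n)]
    raise ValueError(f"unknown phase condition: {kind}")


def harmonic_complex(f1_hz, harmonics, phases, fs_hz=48000, n_samples=960):
    """Sum of equal-amplitude sinusoids at the given harmonics of f1_hz."""
    return [
        sum(
            math.sin(2.0 * math.pi * k * f1_hz * t / fs_hz + math.radians(p))
            for k, p in zip(harmonics, phases)
        )
        for t in range(n_samples)
    ]


def peak(signal):
    """Largest absolute sample value."""
    return max(abs(v) for v in signal)


if __name__ == "__main__":
    harmonics = list(range(12, 24))  # hypothetical choice of 12 components
    for kind in ("homophasic", "diphasic", "heterophasic"):
        wave = harmonic_complex(100.0, harmonics, phases_deg(kind, len(harmonics)))
        print(kind, round(peak(wave), 2))
```

All three waveforms share the same magnitude spectrum and differ only in phase. In this sketch the homophasic waveform reaches a higher peak than the diphasic one, which fits the abstract's description of homophasic stimuli as having the most abrupt envelope fluctuations.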

12.
Psychophysical tests were carried out to investigate the perception of electrocutaneous stimuli delivered to the digital nerve bundles. The tests provided data for defining the operating range of a tactile aid for patients with profound-to-total hearing loss, as well as the individual differences between subjects and the information that could be transmitted. Monopolar biphasic constant current pulses with variable pulse widths were used. Threshold pulse widths varied widely between subjects and between fingers for the same subject. Thresholds were reasonably stable, but maximum comfortable levels increased with time. Perceived intensity was weakly dependent on pulse rate. Absolute identification of stimuli differing in pulse width gave information transmissions from 1.3-2.1 bits, limited by the dynamic ranges of the stimuli (3-17 dB). Stimuli from electrodes placed on either side of each finger were identified easily by all subjects. Absolute identification of stimuli differing in pulse rate gave information transmissions from 0.5-2.0 bits. Difference limens for pulse rate varied between subjects and were generally poor above 100 pps. On the basis of the results, an electrotactile speech processor is proposed, which codes the speech amplitude as pulse width, the fundamental frequency as pulse rate, and the second formant frequency as electrode position. Variable performances on tasks relying on amplitude and fundamental frequency cues are expected to arise from the intersubject differences in dynamic range and pulse rate discrimination. The psychophysical results for electrotactile stimulation are compared with previously published results for electroauditory stimulation with a multiple-channel cochlear implant.

13.
In this study the perception of the fundamental frequency (F0) of periodic stimuli by cochlear implant users is investigated. A widely used speech processor is the Continuous Interleaved Sampling (CIS) processor, for which the fundamental frequency appears as temporal fluctuations in the envelopes at the output. Three experiments with four users of the LAURA (Registered trade mark of Philips Hearing Implants, now Cochlear Technology Centre Europe) cochlear implant were carried out to examine the influence of the modulation depth of these envelope fluctuations on pitch discrimination. In the first experiment, the subjects were asked to discriminate between two SAM (sinusoidally amplitude modulated) pulse trains on a single electrode channel differing in modulation frequency (Δf = 20%). As expected, the results showed a decrease in the performance for smaller modulation depths. Optimal performance was reached for modulation depths between 20% and 99%, depending on subject, electrode channel, and modulation frequency. In the second experiment, the smallest noticeable difference in F0 of synthetic vowels was measured for three algorithms that differed in the obtained modulation depth at the output: the default CIS strategy, the CIS strategy in which the F0 fluctuations in the envelope were removed (FLAT CIS), and a third CIS strategy, which was especially designed to control and increase the depth of these fluctuations (F0 CIS). In general, performance was poorest for the FLAT CIS strategy, where changes in F0 are only apparent as changes of the average amplitude in the channel outputs. This emphasizes the importance of temporal coding of F0 in the speech envelope for pitch perception. No significantly better results were obtained for the F0 CIS strategy compared to the default CIS strategy, although the latter results in envelope modulation depths at which sub-optimal scores were obtained in some cases of the first experiment. This indicates that less modulation is needed if all channels are stimulated with synchronous F0 fluctuations. This hypothesis is confirmed in a third experiment where subjects performed significantly better in a pitch discrimination task with SAM pulse trains, if three channels were stimulated concurrently, as opposed to only one.
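
To make the SAM pulse train stimuli concrete, here is a minimal sketch of how the per-pulse amplitudes might be computed. The pulse rate, the modulation frequencies, and the linear definition of modulation depth are assumptions for illustration; the abstract does not specify them, and implant studies sometimes define depth relative to the electrical dynamic range instead.

```python
import math


def sam_pulse_amplitudes(pulse_rate_pps, mod_freq_hz, mod_depth, n_pulses,
                         base_amplitude=1.0):
    """Amplitudes of a sinusoidally amplitude modulated (SAM) pulse train.

    Pulse i occurs at t = i / pulse_rate_pps and has amplitude
    base_amplitude * (1 + mod_depth * sin(2*pi*mod_freq_hz*t)),
    with mod_depth in [0, 1] (hypothetical linear definition).
    """
    amps = []
    for i in range(n_pulses):
        t = i / pulse_rate_pps
        modulation = math.sin(2.0 * math.pi * mod_freq_hz * t)
        amps.append(base_amplitude * (1.0 + mod_depth * modulation))
    return amps


if __name__ == "__main__":
    # A discrimination pair whose modulation frequencies differ by 20%.
    reference = sam_pulse_amplitudes(1000.0, 100.0, 0.5, n_pulses=50)
    comparison = sam_pulse_amplitudes(1000.0, 120.0, 0.5, n_pulses=50)
    print(min(reference), max(reference))
    print(min(comparison), max(comparison))
```

Setting `mod_depth` to 0 gives a flat pulse train, which in this toy form loosely corresponds to removing the F0 fluctuation from a channel envelope.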

14.
This study investigated the integration of place- and temporal-pitch cues in pitch contour identification (PCI), in which cochlear implant (CI) users were asked to judge the overall pitch-change direction of stimuli. Falling and rising pitch contours were created either by continuously steering current between adjacent electrodes (place pitch), by continuously changing amplitude modulation (AM) frequency (temporal pitch), or both. The percentage of rising responses was recorded as a function of current steering or AM frequency change, with single or combined pitch cues. A significant correlation was found between subjects' sensitivity to current steering and AM frequency change. The integration of place- and temporal-pitch cues was most effective when the two cues were similarly discriminable in isolation. Adding the other (place or temporal) pitch cues shifted the temporal- or place-pitch psychometric functions horizontally without changing the slopes. PCI was significantly better with consistent place- and temporal-pitch cues than with inconsistent cues. PCI with single cues and integration of pitch cues were similar on different electrodes. The results suggest that CI users effectively integrate place- and temporal-pitch cues in relative pitch perception tasks. Current steering and AM frequency change should be coordinated to better transmit dynamic pitch information to CI users.

15.
This study investigated the perceptual and acoustical characteristics of vocal presentation in both the masculine and the feminine modes by the same group of male subjects. Listeners (N = 88) evaluated 22 voice samples by using 18 semantic differential scales and 57 adjectives. The 22 voice samples were provided by 11 biologically male speakers, who described themselves as heterosexual crossdressers. Each speaker read a standard passage under controlled conditions. In one reading, they demonstrated their typical masculine voice and in the other they spoke in their feminine voice. Acoustical analyses included mean fundamental frequency, frequency range, overall passage duration, and duration of a sample of stressed vowels. Results indicated that listeners heard significant differences between masculine and feminine presentations across the 11 speakers and the 18 semantic differential scales. Masculine-feminine and high-low pitch were the most salient scales in the perceptual judgments. Acoustical analyses indicated wide variation according to speaker and condition. Clinical applications are provided.

16.
Two sounds with the same pitch may differ in the saliency of their pitch sensation. This perceptual attribute is called "pitch strength." The study of voice pitch strength may be important in quantifying normal and pathological voice qualities. The present study investigated how pitch strength varies across normal and dysphonic voices. A set of voices (vowel /a/) selected from the Kay Elemetrics Disordered Voice Database served as the stimuli. These stimuli demonstrated a wide range of voice quality. Ten listeners judged the pitch strength of these stimuli in an anchored magnitude estimation task. On a given trial, listeners heard three different stimuli. The first stimulus represented very low pitch strength (wide-band noise), the second stimulus consisted of the target voice, and the third stimulus represented very high pitch strength (pure tone). Listeners estimated pitch strength of the target voice by positioning a continuous slider labeled with values between 0 and 1, reflecting the two anchor stimuli. Results revealed that listeners can judge pitch strength reliably in dysphonic voices. Moderate to high correlations with perceptual judgments of voice quality suggest that pitch strength may contribute to voice quality judgments.

17.
Previous studies have demonstrated that perturbations in voice pitch or loudness feedback lead to compensatory changes in voice F0 or amplitude during production of sustained vowels. Responses to pitch-shifted auditory feedback have also been observed during English and Mandarin speech. The present study investigated whether Mandarin speakers would respond to amplitude-shifted feedback during meaningful speech production. Native speakers of Mandarin produced two-syllable utterances with focus on the first syllable, the second syllable, or none of the syllables, as prompted by corresponding questions. Their acoustic speech signal was fed back to them with loudness shifted by ±3 dB for 200 ms durations. The responses to the feedback perturbations had mean latencies of approximately 142 ms and magnitudes of approximately 0.86 dB. Response magnitudes were greater and latencies were longer when emphasis was placed on the first syllable than when there was no emphasis. Since amplitude is not known for being highly effective in encoding linguistic contrasts, the fact that subjects reacted to amplitude perturbation just as fast as they reacted to F0 perturbations in previous studies provides clear evidence that a highly automatic feedback mechanism is active in controlling both F0 and amplitude of speech production.

18.
Although in a number of experiments noise-band vocoders have been shown to provide acoustic models for speech perception in cochlear implants (CI), the present study assesses in four experiments whether and under what limitations noise-band vocoders can be used as an acoustic model for pitch perception in CI. The first two experiments examine the effect of spectral smearing on simulated electrode discrimination and fundamental frequency (F0) discrimination. The third experiment assesses the effect of spectral mismatch in an F0-discrimination task with two different vocoders. The fourth experiment investigates the effect of amplitude compression on modulation rate discrimination. For each experiment, the results obtained from normal-hearing subjects presented with vocoded stimuli are compared to results obtained directly from CI recipients. The results show that place pitch sensitivity drops with increased spectral smearing and that place pitch cues for multi-channel stimuli can adequately be mimicked when the discriminability of adjacent channels is adjusted by varying the spectral slopes to match that of CI subjects. The results also indicate that temporal pitch sensitivity is limited for noise-band carriers with low center frequencies and that the absence of a compression function in the vocoder might alter the saliency of the temporal pitch cues.

19.
The purpose of this study was to investigate univariate relationships between perceived dysphonia and variation in pitch perturbation, amplitude perturbation, and additive noise. A time-domain, pitch-synchronous synthesis technique was used to generate sustained vowels varying in each of the three acoustic dimensions. A panel of trained listeners provided direct magnitude estimates of roughness in the case of the stimuli varying in pitch and amplitude perturbation, and breathiness in the case of the stimuli varying in additive noise. Very strong relationships were found between perceived roughness and either pitch or amplitude perturbation. However, unlike results reported previously for nonspeech stimuli, the subjective quality associated with pitch perturbation was quite different from that associated with amplitude perturbation. Results also showed that perceived roughness was affected not only by the amount of perturbation, but also by the degree of correlation between adjacent pitch or amplitude values. A strong relationship was found between perceived breathiness and signal-to-noise ratio. Contrary to previous findings, there was no interaction between signal-to-noise ratio and the amount of high-frequency energy in the periodic component of the stimulus: Stimuli with similar signal-to-noise ratios received similar ratings, regardless of differences in the spectral slope of the periodic component.

20.
Subglottal pressure is one of the main voice control factors, controlling vocal loudness. In this investigation the effects of subglottal pressure variation on the voice source in untrained female and male voices phonating at a low, a middle, and a high fundamental frequency are analyzed. The subjects produced a series of /pae/ syllables at varied degrees of vocal loudness, attempting to keep pitch constant. Subglottal pressure was estimated from the oral pressure during the /p/ occlusion. Ten subglottal pressure values, approximately equidistantly spaced within the pressure range used, were identified, and the voice source of the vowels following these pressure values was analyzed by inverse filtering the airflow signal as captured by a Rothenberg mask. The maximum flow declination rate (MFDR) was found to increase linearly with subglottal pressure, but a given subglottal pressure produced lower values for female than for male voices. The closed quotient increased quickly with subglottal pressure at low pressures and slowly at high pressures, such that the relationship can be approximated by a power function. For a given subglottal pressure value, female voices reached lower values of closed quotient than male voices.
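
The two relationships reported above can be summarized in formula form. The symbols are assumptions introduced here: $P_s$ for subglottal pressure, $Q_c$ for closed quotient, and $a$, $b$, $c$, $d$ for fitted constants whose values the abstract does not give.

```latex
\[
  \mathrm{MFDR} \approx a\,P_s + b,
  \qquad
  Q_c \approx c\,P_s^{\,d}
\]
```

An exponent in the range $0 < d < 1$ would match the described shape (fast growth at low pressure, slow growth at high pressure). That range is an inference from the wording, not a value stated in the abstract.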
