Similar articles (20 results)
1.
Although most recent multitalker research has emphasized the importance of binaural cues, monaural cues can play an equally important role in the perception of multiple simultaneous speech signals. In this experiment, the intelligibility of a target phrase masked by a single competing masker phrase was measured as a function of signal-to-noise ratio (SNR) with same-talker, same-sex, and different-sex target and masker voices. The results indicate that informational masking, rather than energetic masking, dominated performance in this experiment. The amount of masking was highly dependent on the similarity of the target and masker voices: performance was best when different-sex talkers were used and worst when the same talker was used for target and masker. Performance did not, however, improve monotonically with increasing SNR. Intelligibility generally plateaued at SNRs below 0 dB and, in some cases, intensity differences between the target and masking voices produced substantial improvements in performance with decreasing SNR. The results indicate that informational and energetic masking play substantially different roles in the perception of competing speech messages.
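
The SNR manipulation described in this abstract can be illustrated with a minimal sketch. It assumes that SNR is defined as the ratio of the RMS levels of target and masker, which the abstract does not state; the function names are hypothetical and are not taken from the study.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def measure_snr_db(target, masker):
    """SNR in dB, assumed here to be the target-to-masker RMS ratio."""
    return 20.0 * math.log10(rms(target) / rms(masker))

def masker_gain_for_snr(target, masker, snr_db):
    """Linear gain for the masker so that the target-to-masker
    RMS ratio equals snr_db (assumed definition, see above)."""
    return rms(target) / (rms(masker) * 10.0 ** (snr_db / 20.0))
```

Under this definition, a negative SNR simply means the masker is scaled to a higher RMS level than the target, which is the region where the abstract reports the plateau in intelligibility.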

2.
Speech reception thresholds were measured in virtual rooms to investigate the influence of reverberation on speech intelligibility for spatially separated targets and interferers. The measurements were realized under headphones, using target sentences and noise or two-voice interferers. The room simulation allowed variation of the absorption coefficient of the room surfaces independently for target and interferer. The direct-to-reverberant ratio and interaural coherence of sources were also varied independently by considering binaural and diotic listening. The main effect of reverberation on the interferer was binaural and mediated by the coherence, in agreement with binaural unmasking theories. It appeared at lower reverberation levels than the effect of reverberation on the target, which was mainly monaural and associated with the direct-to-reverberant ratio, and could be explained by the loss of amplitude modulation in the reverberant speech signals. This effect was slightly smaller when listening binaurally. Reverberation might also be responsible for a disruption of the mechanism by which the auditory system exploits fundamental frequency differences to segregate competing voices, and a disruption of the "listening in the gaps" associated with speech interferers. These disruptions may explain an interaction observed between the effects of reverberation on the targets and two-voice interferers.

3.
Speech reception thresholds were measured to investigate the influence of a room on speech segregation between a spatially separated target and interferer. The listening tests were realized under headphones. A room simulation allowed selected positioning of the interferer and target, as well as varying the absorption coefficient of the room internal surfaces. The measurements involved target sentences and speech-shaped noise or 2-voice interferers. Four experiments revealed that speech segregation in rooms was not only dependent on the azimuth separation of sound sources, but also on their direct-to-reverberant energy ratio at the listening position. This parameter was varied for interferer and target independently. Speech intelligibility decreased as the direct-to-reverberant ratio of sources was degraded by sound reflections in the room. The influence of the direct-to-reverberant ratio of the interferer was in agreement with binaural unmasking theories, through its effect on interaural coherence. The effect on the target occurred at higher levels of reverberation and was explained by the intrinsic degradation of speech intelligibility in reverberation.

4.
The influence of duration on the virtual pitch of complex tones was measured using an absolute identification paradigm. If performance with two-tone complexes is expressed in terms of a single central frequency-coding noise function, this function is found to depend on duration in about the same way as the pure-tone difference limen function. The function is further found to be a reasonably good predictor of pitch identification performance with multitone complexes. Another experimental finding was that subjects tend to switch to the analytic mode of pitch perception when complex tones are shortened (i.e., they tend to hear the spectral pitches instead of the virtual ones). A third finding was that with simultaneous complex tones the degradation of each pitch percept depends not only on duration and harmonic order of the tone but also on the harmonic order of the other tone.

5.
Voice quality variations include a set of voicing sound source modifications ranging from laryngealized to normal to breathy phonation. Analysis of reiterant imitations of two sentences by ten female and six male talkers has shown that the potential acoustic cues to this type of voice quality variation include: (1) increases to the relative amplitude of the fundamental frequency component as open quotient increases; (2) increases to the amount of aspiration noise that replaces higher frequency harmonics as the arytenoids become more separated; (3) increases to lower formant bandwidths; and (4) introduction of extra pole zeros in the vocal-tract transfer function associated with tracheal coupling. Perceptual validation of the relative importance of these cues for signaling a breathy voice quality has been accomplished using a new voicing source model for synthesis of more natural male and female voices. The new formant synthesizer, KLSYN88, is fully documented here. Results of the perception study indicate that, contrary to previous research which emphasizes the importance of increased amplitude of the fundamental component, aspiration noise is perceptually most important. Without its presence, increases to the fundamental component may induce the sensation of nasality in a high-pitched voice. Further results of the acoustic analysis include the observations that: (1) over the course of a sentence, the acoustic manifestations of breathiness vary considerably--tending to increase for unstressed syllables, in utterance-final syllables, and at the margins of voiceless consonants; (2) on average, females are more breathy than males, but there are very large differences between subjects within each gender; (3) many utterances appear to end in a "breathy-laryngealized" type of vibration; and (4) diplophonic irregularities in the timing of glottal periods occur frequently, especially at the end of an utterance. 
Diplophonia and other deviations from perfect periodicity may be important aspects of naturalness in synthesis.

6.
The goals of this study were to measure sensitivity to the direct-to-reverberant energy ratio (D/R) across a wide range of D/R values and to gain insight into which cues are used in the discrimination process. The main finding is that changes in D/R are discriminated primarily based on spectral cues. Temporal cues may be used but only when spectral cues are diminished or not available, while sensitivity to interaural cross-correlation is too low to be useful in any of the conditions tested. These findings are based on an acoustic analysis of these variables and the results of two psychophysical experiments. The first experiment employs wideband noise with two values for onset and offset times to determine the D/R just-noticeable difference at -10, 0, 10, and 20 dB D/R. This yielded substantially higher sensitivity to D/R at 0 and 10 dB D/R (2-3 dB) than has been reported previously, while sensitivity is much lower at -10 and 20 dB D/R. The second experiment consists of three parts where specific cues to D/R are reduced or removed, which enabled the specified rank ordering of the cues. The acoustic analysis and psychophysical experiments also provide an explanation for the "auditory horizon effect."

7.
Five bilateral cochlear implant users were tested for their localization abilities and speech understanding in noise, for both monaural and binaural listening conditions. They also participated in lateralization tasks to assess the impact of variations in interaural time delays (ITDs) and interaural level differences (ILDs) for electrical pulse trains under direct computer control. The localization task used pink noise bursts presented from an eight-loudspeaker array spanning an arc of approximately 108 degrees in front of the listeners at ear level (0-degree elevation). Subjects showed large benefits from bilateral device use compared to either side alone. Typical root-mean-square (rms) averaged errors across all eight loudspeakers in the array were about 10 degrees for bilateral device use and ranged from 20 degrees to 60 degrees using either ear alone. Speech reception thresholds (SRTs) were measured for sentences presented from directly in front of the listeners (0 degrees) in spectrally matching speech-weighted noise at either 0 degrees, +90 degrees or -90 degrees for four subjects out of five tested who could perform the task. For noise to either side, bilateral device use showed a substantial benefit over unilateral device use when noise was ipsilateral to the unilateral device. This was primarily because of monaural head-shadow effects, which resulted in robust SRT improvements (P < 0.001) of about 4 to 5 dB when ipsilateral and contralateral noise positions were compared. The additional benefit of using both ears compared to the shadowed ear (i.e., binaural unmasking) was only 1 or 2 dB and less robust (P = 0.04). Results from the lateralization studies showed consistently good sensitivity to ILDs; better than the smallest level adjustment available in the implants (0.17 dB) for some subjects. Sensitivity to ITDs was moderate on the other hand, typically of the order of 100 µs. ITD sensitivity deteriorated rapidly when stimulation rates for unmodulated pulse-trains increased above a few hundred Hz but at 800 pps showed sensitivity comparable to 50-pps pulse-trains when a 50-Hz modulation was applied. In our opinion, these results clearly demonstrate important benefits are available from bilateral implantation, both for localizing sounds (in quiet) and for listening in noise when signal and noise sources are spatially separated. The data do indicate, however, that effects of interaural timing cues are weaker than those from interaural level cues and according to our psychophysical findings rely on the availability of low-rate information below a few hundred Hz.
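
The rms localization error reported in this abstract can be sketched as follows. The sketch assumes the eight loudspeakers are equally spaced and centered on 0 degrees, which the abstract does not specify; the function names are hypothetical.

```python
import math

def speaker_azimuths(n_speakers=8, arc_deg=108.0):
    """Azimuths in degrees for an array assumed to be equally
    spaced and symmetric about straight ahead (0 degrees)."""
    step = arc_deg / (n_speakers - 1)
    return [-arc_deg / 2.0 + i * step for i in range(n_speakers)]

def rms_error_deg(target_deg, response_deg):
    """Root-mean-square difference between target and response azimuths."""
    squared = [(r - t) ** 2 for t, r in zip(target_deg, response_deg)]
    return math.sqrt(sum(squared) / len(squared))
```

With these assumptions, adjacent loudspeakers are about 15.4 degrees apart, so an rms error of about 10 degrees is smaller than one loudspeaker spacing.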

8.
Three experiments tested the hypothesis that fundamental frequency (f0) discrimination depends on the resolvability of harmonics within a tone complex. Fundamental frequency difference limens (f0 DLs) were measured for random-phase harmonic complexes with eight f0s between 75 and 400 Hz, bandpass filtered between 1.5 and 3.5 kHz, and presented at 12.5-dB/component average sensation level in threshold equalizing noise with levels of 10, 40, and 65 dB SPL per equivalent rectangular auditory filter bandwidth. With increasing level, the transition from large (poor) to small (good) f0 DLs shifted to a higher f0. This shift corresponded to a decrease in harmonic resolvability, as estimated in the same listeners with excitation patterns derived from measures of auditory filter shape and with a more direct measure that involved hearing out individual harmonics. The results are consistent with the idea that resolved harmonics are necessary for good f0 discrimination. Additionally, f0 DLs for high f0s increased with stimulus level in the same way as pure-tone frequency DLs, suggesting that for this frequency range, the frequencies of harmonics are more poorly encoded at higher levels, even when harmonics are well resolved.

9.
Perception of a target voice in the presence of a competing talker, of same or different gender as the target, was investigated in cochlear implant users, in implant-alone and bimodal (acoustic hearing in the non-implanted ear) conditions. Recordings of two male and two female talkers acted as targets and maskers, to investigate whether bimodal benefit increased for different compared to same gender target/maskers due to increased ability to perceive and utilize fundamental frequency and spectral-shape differences. In both listening conditions participants showed benefit of target/masker gender difference. There was an overall bimodal benefit, which was independent of target/masker gender difference.

10.
To examine the role of perceived gender on fricative identification, a study was conducted in which listeners identified /s/-/ʃ/ and /s/-/θ/ continua combined with vowels produced by a man and a woman. These were acoustically modified to be consistent with different-sized vocal tracts (VT), and were presented with pictures of men or women. Listeners identified more tokens of /s/ in the /s/-/ʃ/ and more tokens of /θ/ in the /s/-/θ/ continuum when these sounds were combined with men's vowels, with vowels consistent with a 17 cm VT, and with pictures of men. Results support the hypothesis that listeners incorporate information about talker gender during fricative perception.

11.
Three experiments studied the effect of pulse rate on temporal pitch perception by cochlear implant users. Experiment 1 measured rate discrimination for pulse trains presented in bipolar mode to either an apical, middle, or basal electrode and for standard rates of 100 and 200 pps. In each block of trials the signals could have a level of -0.35, 0, or +0.35 dB re the standard, and performance for each signal level was recorded separately. Signal level affected performance for just over half of the combinations of subject, electrode, and standard rate studied. Performance was usually, but not always, better at the higher signal level. Experiment 2 showed that, for a given subject and condition, the direction of the effect was similar in monopolar and bipolar mode. Experiment 3 employed a pitch comparison procedure without feedback, and showed that the signal levels in experiment 1 that produced the best performance for a given subject and condition also led to the signal having a higher pitch. It is concluded that small level differences can have a robust and substantial effect on pitch judgments, and it is argued that these effects are not entirely due to response biases or to co-variation of place-of-excitation with level.

12.
Temporal effects in simultaneous masking were measured as a function of masker level for an on-frequency broadband masker and an off-frequency narrow-band masker for signal frequencies of 750, 1730, and 4000 Hz. The on-frequency masker was 10 equivalent rectangular bandwidths (ERBs) wide and centered at the signal frequency; the off-frequency masker was 500 Hz wide and its lower frequency edge was 1.038 ERBs higher in frequency than the signal. The primary goal of the study was to determine whether previously observed differences regarding the effects of signal frequency and masker level on the temporal effect for these two different types of masker might be due to considerably different signal levels at threshold. Despite similar masked thresholds, the effects of signal frequency and masker level in the present study were different for the two masker types. The temporal effect was significant for the two highest frequencies and absent for the lowest frequency in the presence of the broadband masker, but was more or less independent of frequency for the narrow-band masker. The temporal effect increased but then decreased as a function of level for the broadband masker (at the two higher signal frequencies, where there was a temporal effect), but increased and reached an asymptote for the narrow-band masker. Despite the different effects of signal frequency and masker level, the temporal effects for both types of masker can be understood in terms of a basilar-membrane input-output function that becomes more linear during the course of masker stimulation.
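
The ERB-based masker placement in this abstract can be illustrated with a short sketch. It assumes the widely used Glasberg and Moore (1990) ERB formulas; the abstract does not say which ERB scale the study used, and the function names are hypothetical.

```python
import math

def erb_hz(f_hz):
    """Equivalent rectangular bandwidth in Hz at centre frequency f_hz
    (Glasberg and Moore 1990 formula, assumed here)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def hz_to_erb_number(f_hz):
    """Position of f_hz on the ERB-number scale (same assumption)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def erb_number_to_hz(e):
    """Inverse of hz_to_erb_number."""
    return (10.0 ** (e / 21.4) - 1.0) * 1000.0 / 4.37

def off_frequency_masker_edges(signal_hz, offset_erbs=1.038, width_hz=500.0):
    """Lower and upper edges of a masker whose lower edge lies
    offset_erbs above the signal and which is width_hz wide."""
    lower = erb_number_to_hz(hz_to_erb_number(signal_hz) + offset_erbs)
    return lower, lower + width_hz
```

Calling `off_frequency_masker_edges` with 750, 1730, or 4000 gives the masker band for each signal frequency under the assumed scale; because the ERB grows with frequency, the same 1.038-ERB offset corresponds to a larger gap in Hz at 4000 Hz than at 750 Hz.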

13.
Three experiments investigated how the onset asynchrony and ear of presentation of a single mistuned frequency component influence its contribution to the pitch of an otherwise harmonic complex tone. Subjects matched the pitch of the target complex by adjusting the pitch of a second similar but strictly periodic complex tone. When the mistuned component (the 4th harmonic of a 155 Hz fundamental) started 160 ms or more before the remaining harmonics but stopped simultaneously with them, it made a reduced contribution to the pitch of the complex. It made no contribution if it started more than 300 ms before. Pitch shifts and their reduction with onset time were larger for short (90 ms) sounds than for long (410 ms). Pitch shifts were slightly larger when the mistuned component was presented to the same ear as the remaining 11 in-tune harmonics than to the opposite ear. Adding a "captor" complex tone with a fundamental of 200 Hz and a missing 3rd harmonic to the contralateral ear did not augment the effect of onset time, even though the captor was synchronous with the mistuned harmonic, the mistuned component was equal in frequency to the missing 3rd harmonic of the captor complex tone and it was played to the same ear as the captor. The results show that a difference in onset time can prevent a resolved frequency component from contributing to the pitch of a complex tone even though it is present throughout that complex tone.
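
The frequencies implied by the captor condition can be worked out with a small sketch. The abstract does not list the mistuning amounts used; the figure below only follows from its statement that, in the captor condition, the mistuned component equalled the captor's missing 3rd harmonic. The function names are hypothetical.

```python
def harmonic_hz(f0_hz, n):
    """Frequency of the n-th harmonic of fundamental f0_hz."""
    return f0_hz * n

def mistuning_percent(component_hz, f0_hz, n):
    """Deviation of a component from the in-tune n-th harmonic, in percent."""
    nominal = harmonic_hz(f0_hz, n)
    return 100.0 * (component_hz - nominal) / nominal

in_tune_4th = harmonic_hz(155.0, 4)   # in-tune 4th harmonic of 155 Hz
captor_3rd = harmonic_hz(200.0, 3)    # missing 3rd harmonic of the 200 Hz captor
captor_condition_mistuning = mistuning_percent(captor_3rd, 155.0, 4)
```

Here the in-tune 4th harmonic is 620 Hz and the captor's missing 3rd harmonic is 600 Hz, so a component at 600 Hz is mistuned downward by roughly 3.2% relative to the 155 Hz complex.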

14.
The effect of variations in pitch, loudness, and timbre on the perception of the dynamics of isolated instrumental tones is investigated. A full factorial design was used in a listening experiment. The subjects were asked to indicate the perceived dynamics of each stimulus on a scale from pianissimo to fortissimo. Statistical analysis showed that for the instruments included (i.e., clarinet, flute, piano, trumpet, and violin) timbre and loudness had equally large effects, while pitch was relevant mostly for the first three. The results confirmed our hypothesis that loudness alone is not a reliable estimate of the dynamics of musical tones.

15.
This study investigates how pitch and duration affect the perception of the neutral tone in Standard Chinese and which of the two factors is more important. A psychoacoustic experiment was conducted in which the stimuli consisted of 15 groups of disyllabic words, with the pitch and duration of both syllables in each word artificially controlled. Thirty-three native speakers of Standard Chinese performed a forced-choice task, judging whether the second syllable of each word carried the neutral tone or a normal tone. The results indicated that (1) both pitch and duration have significant effects on the perception of the neutral tone; (2) the effect of pitch is larger than that of duration; and (3) both the F0 value of the high point of the pitch contour and the pitch contour pattern influence perception. These results correspond to some extent to those of previous acoustic analyses of the neutral tone. The differences between the results of the present perceptual study and those of the acoustic studies imply that acoustic features of the neutral tone that exist in natural speech but do not affect perception may be phonologically redundant.

16.
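The role of pitch and duration in the perception of the neutral tone in Standard Chinese   Total citations: 4 (self-citations: 2, citations by others: 2)
王韫佳 (Wang Yunjia). 声学学报 (Acta Acustica), 2004, 29(5): 453-461
This study explores how pitch and duration contribute to the perception of the neutral tone in Standard Chinese and compares the size of their effects. A psychoacoustic method was used: the stimuli were 15 groups of synthesized disyllabic words with controlled pitch and duration, and 33 native speakers of Standard Chinese made a forced-choice judgment of the stress pattern of each stimulus as "strong-strong" or "strong-weak". The results show that (1) both pitch and duration have significant effects on the perception of the neutral tone; (2) the effect of pitch is clearly larger than that of duration; and (3) the starting point, the high point, and the contour shape of the pitch curve all affect the perception of the neutral tone. These results largely correspond to the acoustic features of the neutral tone in natural speech, although some differences remain. These differences suggest that some acoustic features of the neutral tone in natural speech are redundant features rather than phonological ones.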

17.
This article presents the results of two experiments investigating performance on a monaural envelope correlation discrimination task. Subjects were asked to discriminate pairs of noise bands that had identical envelopes (referred to as correlated stimuli) from pairs of noise bands that had envelopes which were independent (uncorrelated stimuli). In the first experiment, a number of stimulus parameters were varied: the center frequency of the lower frequency noise band in a pair, f1; the frequency separation between component noise bands; the duration of the stimuli; and the bandwidth of the component noise bands. For a long stimulus duration (500 ms) and a relatively wide bandwidth (100 Hz), subjects could easily discriminate correlated from uncorrelated stimuli for a wide range of frequency separations between the component noise bands. This was true both when f1 was 350 Hz, and when f1 was 2500 Hz. In each case, narrowing the bandwidth to 25 Hz, or shortening the duration to 100 ms, or both, made the task more difficult, but not impossible. In the second experiment, the level of the higher frequency noise band in a pair was varied. Performance did not decrease monotonically as the level of this band was decreased below the level of the other band, and only showed marked impairment when the level of the higher frequency band was at least 60 dB below that of the lower frequency band. The pattern of results in these two experiments is different from that which is obtained when the same stimulus parameters are varied in experiments investigating comodulation masking release (CMR). This suggests that the mechanisms underlying CMR and those underlying the discrimination of envelope correlation are not identical.

18.
Growth-of-masking (GOM) functions were obtained in three groups of normal-hearing subjects using a simultaneous-masking paradigm. In all cases, the signal frequency (fs) was higher than the masker frequency (fm), either by a certain ratio (1.44) or by a certain distance [3 equivalent rectangular bandwidths (ERBs)]. The purpose was to evaluate the effect of overall frequency on the slope of the steep portion of the GOM function, and to evaluate the change in slope that can occur at high levels. Signal frequency ranged from 400 to 5000 Hz, and masker level ranged from 40 to 95 dB SPL. On average, the slope of the steep portion of the GOM function was about 1.4 for signal frequencies from 400 to 750 Hz, and 2.0 for signal frequencies from 1944 to 5000 Hz. This is consistent with the possibility that the cochlea may behave more linearly at the apical (low-frequency) region than at the basal (high-frequency) region. In addition, for signal frequencies at and above 750 Hz, the slope of the masking function changed from a value much greater than 1.0 to a value of 1.0 at high levels. The change in slope was better correlated with signal sensation level (i.e., amount of masking) than with either signal or masker SPL: the lack of a change at the lower signal frequencies may reflect the smaller amounts of masking there. The change to a linear growth of masking may represent a change in the response to the signal from compressive to linear at high levels.

19.
The perception of fundamental pitch for two-harmonic complex tones was examined in musically experienced listeners with cochlear-based high-frequency hearing loss. Performance in a musical interval identification task was measured as a function of the average rank of the lowest harmonic for both monotic and dichotic presentation of the harmonics at 14 dB Sensation Level. Listeners with hearing loss demonstrated excellent musical interval identification at low fundamental frequencies and low harmonic numbers, but abnormally poor identification at higher fundamental frequencies and higher average ranks. The upper frequency limit of performance in the listeners with hearing loss was similar in both monotic and dichotic conditions. These results suggest that something other than frequency resolution per se limits complex-tone pitch perception in listeners with hearing loss.

20.
Although both perceived vocal effort and intensity are known to influence the perceived distance of speech, little is known about the processes listeners use to integrate these two parameters into a single estimate of talker distance. In this series of experiments, listeners judged the distances of prerecorded speech samples presented over headphones in a large open field. In the first experiment, virtual synthesis techniques were used to simulate speech signals produced by a live talker at distances ranging from 0.25 to 64 m. In the second experiment, listeners judged the apparent distances of speech stimuli produced over a 60-dB range of different vocal effort levels (production levels) and presented over a 34-dB range of different intensities (presentation levels). In the third experiment, the listeners judged the distances of time-reversed speech samples. The results indicate that production level and presentation level influence distance perception differently for each of three distinct categories of speech. When the stimulus was high-level voiced speech (produced above 66 dB SPL 1 m from the talker's mouth), the distance judgments doubled with each 8-dB increase in production level and each 12-dB decrease in presentation level. When the stimulus was low-level voiced speech (produced at or below 66 dB SPL at 1 m), the distance judgments doubled with each 15-dB increase in production level but were relatively insensitive to changes in presentation level at all but the highest intensity levels tested. When the stimulus was whispered speech, the distance judgments were unaffected by changes in production level and only decreased with increasing presentation level when the intensity of the stimulus exceeded 66 dB SPL. 
The distance judgments obtained in these experiments were consistent across a range of different talkers, listeners, and utterances, suggesting that voice-based distance cueing could provide a robust way to control the apparent distances of speech sounds in virtual audio displays.
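
The doubling rates reported for high-level voiced speech can be written as a simple scaling rule. This is only a sketch: it assumes the production-level and presentation-level effects combine multiplicatively and independently, which the abstract does not state, and the function name is hypothetical.

```python
def distance_scale_high_level_voiced(delta_production_db, delta_presentation_db):
    """Factor by which judged distance changes for high-level voiced speech.

    Judged distance doubles per 8-dB increase in production level and
    per 12-dB decrease in presentation level (rates from the abstract);
    combining the two as a product is an assumption of this sketch.
    """
    production_factor = 2.0 ** (delta_production_db / 8.0)
    presentation_factor = 2.0 ** (-delta_presentation_db / 12.0)
    return production_factor * presentation_factor
```

Under this assumed rule, raising the production level by 8 dB while also raising the presentation level by 12 dB would leave the judged distance unchanged, since the two factors cancel.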
