Similar literature
20 similar articles found (search time: 437 ms)
1.
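Current supervised speech enhancement ignores how the similarity among the magnitude spectra of clean speech, noise, and noisy speech affects enhancement performance. To address this, a speech enhancement method combining an accurate ratio mask (ARM) with a deep neural network (DNN) is proposed. Using the normalized cross-correlation coefficients of the magnitude spectra between clean speech and noisy speech, and between noise and noisy speech, the method designs an accurate ratio mask, based on the time-frequency-domain ideal ratio mask, as the target mask. A DNN whose training targets are the clean-speech and noise magnitude spectra serves as the baseline; the target mask is estimated from the output of this DNN, the baseline DNN and the target mask are jointly optimized, and the enhanced speech is estimated from the noisy speech via the target mask. In addition, to account for the discriminative information between clean speech and noise, a discriminative training function replaces the mean squared error (MSE) function as the objective of the baseline DNN so that the network output is more accurate. Experiments show that the discriminative training function improves the enhancement performance of both the baseline DNN and the whole jointly optimized network. Under both matched and mismatched noise, the proposed method achieves higher average perceptual evaluation of speech quality (PESQ) and short-time objective intelligibility (STOI) scores than other common DNN methods; the enhanced speech retains more speech components while suppressing noise more markedly.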
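As a rough illustration of the two building blocks this abstract names, the sketch below computes a normalized cross-correlation coefficient between two magnitude spectra and a conventional per-bin ideal ratio mask. The abstract does not give the ARM formula, so this is not the authors' mask; the function names and the square-root form of the ideal ratio mask are assumptions made for the example.

```python
import math

def normalized_cross_correlation(a, b):
    """Normalized cross-correlation coefficient of two magnitude spectra (one frame)."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den if den > 0 else 0.0

def ideal_ratio_mask(speech_mag, noise_mag):
    """Per-bin ideal ratio mask, assumed here as sqrt(S^2 / (S^2 + N^2))."""
    mask = []
    for s, n in zip(speech_mag, noise_mag):
        total = s * s + n * n
        mask.append(math.sqrt(s * s / total) if total > 0 else 0.0)
    return mask

def apply_mask(mask, noisy_mag):
    """Estimate the enhanced magnitude spectrum by scaling each noisy bin."""
    return [m * y for m, y in zip(mask, noisy_mag)]
```

In a trained system the mask would be predicted by the network from noisy features; here it is computed from known speech and noise magnitudes only to show what the training target looks like.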

2.
Auditory enhancement of changes in spectral amplitude (total citations: 1; self-citations: 0; citations by others: 1)
An auditory enhancement effect occurs when one component of a harmonic series is omitted for a few hundred milliseconds and then reintroduced: The reintroduced harmonic stands out perceptually. Three experiments are reported that studied a version of this effect in which several components of a harmonic series are enhanced to define the formants of a vowel. Using the accuracy of vowel identification to measure the prominence of the formant peaks in the effective auditory representation, forms of the effect were identified that are qualitatively similar to the incremental and decremental responses seen in primary auditory-nerve fibers. These results are compatible with an origin for the enhancement effect in peripheral auditory adaptation. However, an additional mechanism is required to account for the demonstration [Viemeister and Bacon, J. Acoust. Soc. Am. 71, 1502-1507 (1982)] that enhancement can involve a true gain in the frequency region of the reintroduced component. These effects demonstrate one way in which the auditory system may attenuate the prominence of background noises while preserving the ability to represent changes in spectral amplitude produced by newly arriving signals.

3.
张明慧, 黄廉卿. 《光学技术》 (Optical Technique), 2006, 32(4): 610-611
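Digital CR (computed radiography) medical radiographic images have a wide dynamic range and rich detail but poor contrast, so they must be enhanced before they meet the needs of clinical diagnosis. Because the CR image enhancement algorithms in common use over-enhance contrast and noise and lose detail, a nonlinear unsharp masking algorithm is proposed to enhance edge detail in CR images. The algorithm uses a blurred (unsharp) image to increase the response at selected spatial frequencies, thereby enhancing the structural edges and details of the CR image. It adjusts the weighting factor K, which controls the degree of enhancement, according to the gray-level characteristics of the CR image, so that edge detail is enhanced nonlinearly. Experiments show that CR images processed by the algorithm are rich in detail and have a high signal-to-noise ratio; the ratio of detail variance to background variance is 9.6 times that of the common algorithm. The enhanced CR images have good visual quality, making this an effective method for enhancing edge detail in CR medical radiographic images.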
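The basic unsharp masking step that this abstract builds on can be sketched for a single row of pixel values as follows. This is the classic linear form with a fixed weighting factor; the abstract does not say how K is adapted to the gray-level characteristics, so that nonlinear part is left out. The box blur, the function names, and the default parameters are assumptions for illustration.

```python
def box_blur(row, radius=1):
    """Blur a row of pixel values with a moving average (window shrinks at the ends)."""
    n = len(row)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = row[lo:hi]
        out.append(sum(window) / len(window))
    return out

def unsharp_mask(row, k=1.0, radius=1):
    """Add k times the detail signal (original minus blurred) back to the original."""
    blurred = box_blur(row, radius)
    return [x + k * (x - b) for x, b in zip(row, blurred)]
```

Flat regions pass through unchanged because the detail signal is zero there, while values on either side of an edge are pushed apart, which is what makes edges look sharper. A gray-level-dependent K would replace the constant `k` with a per-pixel value.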

4.
Although the electrolarynx (EL) serves as an important method of phonation for laryngectomees, the resulting speech is of poor intelligibility due to the presence of a steady background noise caused by the instrument, and it is even worse in the presence of additive noise. This paper investigates the problem of EL speech enhancement by taking into account the frequency-domain masking properties of the human auditory system. One approach incorporates an auditory masking threshold (AMT) for parametric adaptation in a subtractive-type enhancement process. The other is the supplementary AMT (SAMT) algorithm, which applies a cross-correlation spectral subtraction (CCSS) approach as a post-processing scheme to enhance EL speech already processed by the AMT method. The performance of these two algorithms was evaluated against the power spectral subtraction (PSS) algorithm. The best EL speech enhancement performance was obtained with the SAMT algorithm, followed by the AMT algorithm and then the PSS algorithm. Acoustic and perceptual analyses indicated that the AMT and SAMT algorithms achieved better noise reduction and that the enhanced EL speech was more pleasant to human listeners than speech processed by the PSS algorithm.
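For orientation, the power spectral subtraction baseline mentioned in this abstract is commonly written as a per-bin subtraction of an estimated noise power with a spectral floor. The sketch below follows that textbook form; the over-subtraction factor `alpha`, the `floor` parameter, and the function name are assumptions, and the AMT and SAMT algorithms themselves are not reproduced because the abstract does not specify them.

```python
def power_spectral_subtraction(noisy_power, noise_power, alpha=1.0, floor=0.01):
    """Subtract alpha * noise power from each bin, never going below floor * noise power."""
    enhanced = []
    for y, n in zip(noisy_power, noise_power):
        enhanced.append(max(y - alpha * n, floor * n))
    return enhanced
```

A masking-threshold-driven variant would, in principle, choose `alpha` and `floor` per bin according to how audible the residual noise is expected to be, rather than using fixed values.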

5.
Although informational masking is thought to reflect central mechanisms, the effects are generally much stronger when the target and masker are presented to the same ear than when they are presented to different ears. However, the results of a recent study by Brungart and Simpson [J. Acoust. Soc. Am. 112, 2985-2995 (2002)] indicated that a speech masker that is presented contralateral to a speech signal can produce substantial amounts of informational masking when a second speech masker is played simultaneously in the same ear as the signal. In this study, we conducted a series of experiments that paralleled those of Brungart and Simpson but used a pure-tone signal and multitone informational maskers in a detection task. Both the signal and the maskers were played as sequences of short bursts in each observation interval. The maskers were arranged in two types of spectrotemporal patterns. One type of pattern, called "multiple-bursts same" (MBS), has previously been shown to produce very large amounts of informational masking while the other type of pattern, called "multiple-bursts different" (MBD), has been shown to produce very small amounts of informational masking. Several conditions of ipsilateral, contralateral, and combined presentation of these maskers were tested. The results showed that presentation of the MBS masker in the contralateral ear produced a substantial amount of informational masking when the MBD masker was simultaneously presented to the ipsilateral ear. The results supported the earlier findings of Brungart and Simpson indicating that listeners are unable to selectively focus their attention on a single ear in some complex dichotic listening conditions. These results suggest that this contralateral masking effect is not restricted to speech and may reflect more general limitations on processing capacity. Further, it was concluded that the magnitude of the contralateral masking effect was related both to the informational masking value of the contralateral masker and the complexity of the stimulus and/or task in the ear in which the signal was presented.

6.
When normal-hearing adults and children are required to detect a 1000-Hz tone in a random-frequency multitone masker, masking is often observed in excess of that predicted by traditional auditory filter models. The excess masking is called informational masking. Though individual differences in the effect are large, the amount of informational masking is typically much greater in young children than in adults [Oh et al., J. Acoust. Soc. Am. 109, 2888-2895 (2001)]. One factor that reduces informational masking in adults is spatial separation of the target tone and masker. The present study was undertaken to determine whether or not a similar effect of spatial separation is observed in children. An extreme case of spatial separation was used in which the target tone was presented to one ear and the random multitone masker to the other ear. This condition resulted in nearly complete elimination of masking in adults. In young children, however, presenting the masker to the nontarget ear typically produced only a slight decrease in overall masking and no change in informational masking. The results for children are interpreted in terms of a model that gives equal weight to the auditory filter outputs from each ear.

7.
Thresholds for 10-ms sinusoids simultaneously masked by bursts of bandpass noise centered on the signal frequency were measured for a wide range of signal frequencies and noise levels. Thresholds were defined as the signal power relative to the masker power at the output of an auditory filter centered on the signal frequency. It was found that the presentation of a continuous random noise, with a spectral notch centered on the signal frequency, produced a reduction in signal thresholds of up to 11 dB. A notched noise spectrum level of 0-5 dB above that of the masker proved most effective in producing a masking release, as measured by a reduction in masked threshold. A release from masking of up to 7 dB could be obtained with a continuous bandpass noise. The most effective spectrum level of this noise was 5 dB below that of the masker. The effect of the continuous notched noise was to reduce signal-to-masker ratios at threshold to about 0 dB, regardless of the threshold in the absence of continuous noise. Thus the greatest release from masking occurred when "unreleased" thresholds were highest. The release from masking is almost complete within 320 ms of notched noise onset, and persists for about 160 ms after notched noise offset, regardless of notched noise level. The phenomenon is similar in many ways to the "overshoot" effect reported by Zwicker [J. Acoust. Soc. Am. 37, 653-663 (1965)]. It is argued that both effects can be largely attributed to peripheral short-term adaptation, a mechanism which is also believed to be involved in forward masking.

8.
The effects of cochlear hypothermia on compound action potential tuning (total citations: 3; self-citations: 0; citations by others: 3)
The effects of lowered cochlear temperature on eighth-nerve tuning were assessed by using forward masking of whole nerve action potential (AP) responses to generate AP tuning curves (APTCs) at cochlear temperatures ranging from 38.5 °C to 30 °C for probe frequencies from 8 to 36 kHz. The data indicate that subnormal cochlear temperatures result in: broadened APTCs for probe frequencies above 10 kHz which are interpreted as resulting from reduced hair-cell frequency selectivity, lowered or more sensitive APTC tips where tone-burst thresholds are unchanged, and raised or less sensitive tips where thresholds to tone bursts were elevated. Increased tip sensitivity is explained in terms of enhanced eighth-nerve adaptation which occurred during hypothermia. Experiments directly addressing adaptation were performed, in which the masker-probe interval (Δt) was systematically lengthened. The normalized AP decrement versus Δt functions indicate an enhancement of both the amount and duration of adaptation during hypothermia. Functions relating the growth of response to the masker (AP decrement versus masker intensity functions) were reduced at low temperatures.

9.
Similarity between the target and masking voices is known to have a strong influence on performance in monaural and binaural selective attention tasks, but little is known about the role it might play in dichotic listening tasks with a target signal and one masking voice in the one ear and a second independent masking voice in the opposite ear. This experiment examined performance in a dichotic listening task with a target talker in one ear and same-talker, same-sex, or different-sex maskers in both the target and the unattended ears. The results indicate that listeners were most susceptible to across-ear interference with a different-sex within-ear masker and least susceptible with a same-talker within-ear masker, suggesting that the amount of across-ear interference cannot be predicted from the difficulty of selectively attending to the within-ear masking voice. The results also show that the amount of across-ear interference consistently increases when the across-ear masking voice is more similar to the target speech than the within-ear masking voice is, but that no corresponding decline in across-ear interference occurs when the across-ear voice is less similar to the target than the within-ear voice. These results are consistent with an "integrated strategy" model of speech perception where the listener chooses a segregation strategy based on the characteristics of the masker present in the target ear and the amount of across-ear interference is determined by the extent to which this strategy can also effectively be used to suppress the masker in the unattended ear.

10.
In a test sound consisting of a burst of pink noise, an arbitrarily selected target frequency band can be "enhanced" by the previous presentation of a similar noise with a spectral notch in the target frequency region. As a result of the enhancement, the test sound evokes a pitch sensation corresponding to the pitch of the target band. Here, a pitch comparison task was used to assess enhancement. In the first experiment, a stronger enhancement effect was found when the test sound and its precursor had the same interaural time difference (ITD) than when they had opposite ITDs. Two subsequent experiments were concerned with the audibility of an instance of dichotic pitch in binaural test sounds preceded by precursors. They showed that it is possible to enhance a frequency region on the sole basis of ITD manipulations, using spectrally identical test sounds and precursors. However, the observed effects were small. A major goal of this study was to test the hypothesis that enhancement originates at least in part from neural adaptation processes taking place at a central level of the auditory system. The data failed to provide strong support for this hypothesis.

11.
When a masking sound is spatially separated from a target speech signal, substantial releases from masking typically occur both for speech and noise maskers. However, when a delayed copy of the masker is also presented at the location of the target speech (a condition that has been referred to as the front target, right-front masker or F-RF configuration), the advantages of spatial separation vanish for noise maskers but remain substantial for speech maskers. This effect has been attributed to precedence, which introduces an apparent spatial separation between the target and masker in the F-RF configuration that helps the listener to segregate the target from a masking voice but not from a masking noise. In this study, virtual synthesis techniques were used to examine variations of the F-RF configuration in an attempt to more fully understand the stimulus parameters that influence the release from masking obtained in that condition. The results show that the release from speech-on-speech masking caused by the addition of the delayed copy of the masker is robust across a wide variety of source locations, masker locations, and masker delay values. This suggests that the speech unmasking that occurs in the F-RF configuration is not dependent on any single perceptual cue and may indicate that F-RF speech segregation is only partially based on the apparent left-right location of the RF masker.

12.
The ability to detect the existence of amplitude modulation at a target frequency is reduced when amplitude modulation exists at a flanking frequency. This effect has been termed modulation detection interference (MDI) [Yost and Sheft, J. Acoust. Soc. Am. 85, 848-857 (1989)]. One explanation for MDI holds that the masking and target frequencies are grouped together by the auditory system such that it is difficult to analyze the modulation at each frequency separately. The present study investigated conditions where the asynchrony of temporal gating of the target and flanking frequencies was manipulated in order to make the frequencies more or less likely to be grouped together by the auditory system and perceived as originating from a single putative source. A second experimental manipulation attempted to perceptually segregate the masking and target frequencies on the basis of harmonicity or spectral proximity. The results of the experiments indicated that manipulations that were intended to enhance the segregation of the masking and target frequencies reduced the magnitude of MDI effects. This generally supported an interpretation that MDI is related in some way to auditory grouping. A final experiment was performed in which the subject had to detect the presence of amplitude modulation, but also had to identify which of two frequency components carried the modulation. Subjects were often poor in discriminating which of two frequencies was amplitude modulated, even when the modulation itself was clearly audible. It was concluded that part of the MDI effect might be due to the poor ability of the auditory system to associate modulation with the carrier of the modulation. (ABSTRACT TRUNCATED AT 250 WORDS)

13.
Resonant, active coated nano-particles (CNPs) have been advocated as effective nano-amplifiers and nano-lasers. When it is properly designed to be in its super-resonant state, an electrically small active CNP captures significantly more of the incident field energy than its physical size suggests is possible. The corresponding enhancement of its extinction cross section is correlated with the concentration of the local field energy into its gain region. This energy localization can be visualized with the behavior of the flow lines of the Poynting vector field in the neighborhood of the CNP. Strong expulsion of the optical power generated from the interaction of the captured incident field energy with the gain medium creates an intense scattered field. As the interactions between the scattered field and the exciting plane wave increase, optical vortices form in the neighborhood of the active CNP. Gain depletion eventually occurs when the increase in the effective gain sufficiently detunes the resonance. A simple model for the gain enhancement effects observed in active CNPs is proposed that relates the enhanced effective size of the CNP caused by the field localization to the required gain necessary to achieve its super-resonant state. A comparison of the metal-covered, gain core, active CNP studied previously to the experimentally realized gain-impregnated silica-covered metal “SPASER” suggests that the active CNP design would require significantly less gain while offering a much larger enhancement of the incident field. Proposed modifications of both geometries that augment the field localization suggest further reductions in the gain values needed to achieve significant amplification of the output signal.

14.
In this study we demonstrate an effect for amplitude modulation (AM) that is analogous to forward masking of audio frequencies, i.e., the modulation threshold for detection of AM (signal) is raised by preceding AM (masker). In the study we focused on the basic characteristics of the forward-masking effect. Functions representing recovery from AM forward masking, measured with a 150-ms, 40-Hz masker AM and a 50-ms signal AM of the same rate imposed on the same broadband-noise carrier, showed an exponential decay of forward masking with increasing delay from masker offset. Thresholds remained elevated by more than 2 dB over an interval of at least 150 ms following the masker. Masked-threshold patterns, measured with a fixed signal rate (20, 40, and 80 Hz) and a variable masker rate, showed tuning of the AM forward-masking effect. The tuning was approximately constant across signal modulation rates used and consistent with the idea of modulation-rate selective channels. Combining two equally effective forward maskers of different frequencies did not lead to an increase in forward masking relative to that produced by either component alone. Overall, the results are consistent with modulation-rate selective neural channels that adapt and recover from the adaptation relatively quickly.

15.
The purpose of this report is to present new data that provide a novel perspective on temporal masking, different from that found in the classical auditory literature on this topic. Specifically, measurement conditions are presented that minimize rather than maximize temporal spread of masking for a gated (200-ms) narrow-band (405-Hz-wide) noise masker logarithmically centered at 2500 Hz. Masked detection thresholds were measured for brief sinusoids in a two-interval, forced-choice (2IFC) task. Detection was measured at each of 43 temporal positions within the signal observation interval for the sinusoidal signal presented either preceding, during, or following the gating of the masker, which was centered temporally within each 500-ms observation interval. Results are presented for three listeners; first, for detection of a 1900-Hz signal across a range of masker component levels (0-70 dB SPL) and, second, for masked detection as a function of signal frequency (fs = 500-5000 Hz) for a fixed masker component level (40 dB SPL). For signals presented off-frequency from the masker, and at low-to-moderate masker levels, the resulting temporal masking functions are characterized by sharp temporal edges. The sharpness of the edges is accentuated by complex patterns of temporal overshoot and undershoot, corresponding with diminished and enhanced detection, respectively, at both masker onset and offset. This information about the onset and offset timing of the gated masker is faithfully represented in the temporal masking functions over the full decade range of signal frequencies (except for fs = 2500 Hz presented at the center frequency of the masker). The precise representation of the timing information is remarkable considering that the temporal envelope characteristics of the gated masker are evident in the remote masking response at least two octaves below the frequencies of the masker at a cochlear place where little or no masker activity would be expected. This general enhancement of the temporal edges of the masking response is reminiscent of spectral edge enhancement by lateral suppression/inhibition.

16.
王玥, 李平, 崔杰. 《声学学报》 (Acta Acustica), 2013, 38(4): 501-508
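To find the best balance between noise suppression and speech distortion, an adaptive β-order Bayesian perceptual estimation speech enhancement algorithm based on the auditory frequency-domain masking effect is proposed, with the aim of improving overall enhancement performance. The algorithm exploits the auditory masking effect of the human ear: the value of β in the β-order Bayesian perceptual estimator is adapted according to the computed frequency-domain masking threshold, so that noise is suppressed only to just below the masking threshold, more speech information is retained, and speech distortion is reduced. The performance of the proposed algorithm was evaluated with both objective and subjective measures and compared with the original SNR-based adaptive β-order Bayesian perceptual estimation algorithm. The results show that, for signal-to-noise ratios from -10 dB to 5 dB, the composite objective scores of the frequency-domain-masking method are consistently higher than those of the SNR-based algorithm. Subjective evaluation likewise indicates that the frequency-domain-masking method suppresses background noise well while retaining as much speech information as possible.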

17.
Using a closed-set speech recognition paradigm thought to be heavily influenced by informational masking, auditory selective attention was measured in 38 children (ages 4-16 years) and 8 adults (ages 20-30 years). The task required attention to a monaural target speech message that was presented with a time-synchronized distracter message in the same ear. In some conditions a second distracter message or a speech-shaped noise was presented to the other ear. Compared to adults, children required higher target/distracter ratios to reach comparable performance levels, reflecting more informational masking in these listeners. Informational masking in most conditions was confirmed by the fact that a large proportion of the errors made by the listeners were contained in the distracter message(s). There was a monotonic age effect, such that even the children in the oldest age group (13.6-16 years) demonstrated poorer performance than adults. For both children and adults, presentation of an additional distracter in the contralateral ear significantly reduced performance, even when the distracter messages were produced by a talker of different sex than the target talker. The results are consistent with earlier reports from pure-tone masking studies that informational masking effects are much larger in children than in adults.

18.
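Underwater acoustic target recognition has long been one of the key research problems in underwater acoustics. Deep learning can solve target recognition problems effectively, but the scarcity of underwater acoustic samples limits its application. This paper proposes a deep-learning target recognition method for underwater acoustic signals based on data augmentation. The method uses the Mel power spectrum as the network's input feature and applies transformations such as stretching and masking to the original signal in the time domain and the time-frequency domain, in order to expand the data and improve generalization. Finally, an improved VGG network model performs the target classification. Experimental results show that the proposed method's underwater target recognition accuracy (95.2%) is better than that of the four comparison methods, demonstrating that both the proposed network model and the data augmentation method help improve target classification performance.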
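The time-frequency masking part of such an augmentation scheme can be sketched as blanking one random band of frequency rows and one random band of time columns in a spectrogram. The abstract does not give the mask widths, fill value, or exact procedure, so every parameter and name below is an assumption, and the spectrogram is represented simply as a list of frequency rows.

```python
import random

def mask_spectrogram(spec, max_freq_width=2, max_time_width=2, fill=0.0, rng=None):
    """Return a copy of spec (frequency rows x time columns) with one random
    frequency band and one random time band set to fill."""
    rng = rng or random.Random()
    n_freq = len(spec)
    n_time = len(spec[0]) if spec else 0
    out = [row[:] for row in spec]  # copy so the input is left untouched

    # Frequency mask: blank f_width consecutive rows.
    f_width = rng.randint(0, min(max_freq_width, n_freq))
    f_start = rng.randint(0, n_freq - f_width)
    for f in range(f_start, f_start + f_width):
        out[f] = [fill] * n_time

    # Time mask: blank t_width consecutive columns in every row.
    t_width = rng.randint(0, min(max_time_width, n_time))
    t_start = rng.randint(0, n_time - t_width)
    for row in out:
        for t in range(t_start, t_start + t_width):
            row[t] = fill
    return out
```

Calling the function several times on the same Mel spectrogram yields differently masked copies, which is how a small set of recordings can be expanded into a larger training set. Passing a seeded `random.Random` makes the augmentation reproducible.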

19.
This study investigated whether speech-like maskers without linguistic content produce informational masking of speech. The target stimuli were nonsense Chinese Mandarin sentences. In experiment I, the masker contained harmonics the fundamental frequency (F0) of which was sinusoidally modulated and the mean F0 of which was varied. The magnitude of informational masking was evaluated by measuring the change in intelligibility (releasing effect) produced by inducing a perceived spatial separation of the target speech and masker via the precedence effect. The releasing effect was small and was only clear when the target and masker had the same mean F0, suggesting that informational masking was small. Performance with the harmonic maskers was better than with a steady speech-shaped noise (SSN) masker. In experiments II and III, the maskers were speech-like synthesized signals, alternating between segments with harmonic structure and segments composed of SSN. Performance was much worse than for experiment I, and worse than when an SSN masker was used, suggesting that substantial informational masking occurred. The similarity of the F0 contours of the target and masker had little effect. The informational masking effect was not influenced by whether or not the noise-like segments of the masker were synchronous with the unvoiced segments of the target speech.

20.
Coherence masking protection (CMP) refers to the phenomenon in which a target formant is labeled at lower signal-to-noise levels when presented with a stable cosignal consisting of two other formants than when presented alone. This effect has been reported primarily for adults with first-formant (F1) targets and F2/F3 cosignals, but has also been found for children, in fact in greater magnitude. In this experiment, F2 was the target and F1/F3 was the cosignal. Results showed similar effects for each age group as had been found for F1 targets. Implications for auditory prostheses for listeners with hearing loss are discussed.

Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.). 京ICP备09084417号