1.
Peng Jianxin 《Applied Acoustics》2010,71(4):386-390
Speech intelligibility in classrooms can be influenced by background-noise level, speech sound pressure level (SSPL), reverberation time and signal-to-noise ratio (SNR). The relationship between SSPL and subjective Chinese Mandarin speech intelligibility, and the effect of different SNRs on Chinese Mandarin speech intelligibility in a simulated classroom, were investigated through room acoustical simulation, auralisation and subjective evaluation. Chinese speech intelligibility test signals recorded in an anechoic chamber were convolved with the simulated binaural room impulse responses and then reproduced through headphones at different SSPLs and SNRs. The results show that Chinese Mandarin speech intelligibility scores increase with increasing SSPL and SNR within a certain range in simulated classrooms. The scores show no significant difference for SNRs of 15 dBA or more under the same reverberation time condition.
2.
João Candido Fernandes 《Applied Acoustics》2003,64(6):581-590
The purpose of this study was to determine the influence of hearing protection devices (HPDs) on the understanding of speech in young adults with normal hearing, both in a silent situation and in the presence of ambient noise. The experimental research was carried out with the following variables: five different conditions of HPD use (without protectors, with two earplugs and with two earmuffs); a type of noise (pink noise); 4 test levels (60, 70, 80 and 90 dB[A]); 6 signal/noise ratios (without noise, +5, +10, zero, −5 and −10 dB); 5 repetitions for each case, totalling 600 tests with 10 monosyllables in each one. The measured variable was the percentage of correctly heard monosyllabic words in the test. The results revealed that, at the lowest levels (60 and 70 dB), the protectors reduced the intelligibility of speech (compared to the tests without protectors) while, in the presence of ambient noise levels of 80 and 90 dB and unfavourable signal/noise ratios (0, −5 and −10 dB), the HPDs improved the intelligibility. A comparison of the effectiveness of earplugs versus earmuffs showed that the former offer greater efficiency with respect to the recognition of speech, providing a 30% improvement over situations in which no protection is used. As might be expected, this study confirmed that the protectors' influence on speech intelligibility is related directly to the spectral curve of the protector's attenuation.
3.
Speech intelligibility in these places of worship has been assessed through a study of the spatial distribution of the RASTI and the energy-based acoustic parameters clarity for speech, C50, and definition, D50. Parameters have been obtained by intensity-modulated stationary noise (IMN) signals and by maximum length sequence (MLS) signals in order to obtain the impulse responses. The 12 churches analysed correspond to the same highly characteristic architectural typology of southern Spain, and measurements were taken without occupancy. A full correlation between the RASTI, C50 and D50 parameters produced by the different experimental techniques has been carried out. This correlation has led to a more profound characterization of these churches from this functional point of view, together with an analysis of the capability of each parameter to take into account the different aspects of the degradation of the signal from speaker to listener in a room, and has also led to a study of the subjective range of qualification of the energy-based acoustic parameters. In addition, the values of those variables have been compared with those expected from a semi-empirical model deduced for these religious spaces.
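The two energy-based parameters named in this abstract have standard definitions: D50 is the fraction of impulse-response energy arriving in the first 50 ms, and C50 is the early-to-late energy ratio in decibels. The following is a minimal sketch of those definitions only, not the authors' measurement chain. It assumes the impulse response `h` is already trimmed so that index 0 is the arrival of the direct sound; the function name `d50_c50` is hypothetical.

```python
import numpy as np

def d50_c50(h, fs):
    """Return (D50, C50 in dB) for an impulse response h sampled at fs Hz.

    Assumption: h[0] is the direct-sound arrival, so the 50 ms window
    starts at the first sample.
    """
    energy = np.asarray(h, dtype=float) ** 2
    n50 = int(round(0.050 * fs))        # number of samples in the first 50 ms
    early = energy[:n50].sum()          # energy arriving within 50 ms
    late = energy[n50:].sum()           # energy arriving after 50 ms
    d50 = early / (early + late)        # definition (early energy fraction)
    c50 = 10.0 * np.log10(early / late) # clarity for speech, in dB
    return d50, c50

# Toy response: one unit pulse at 0 ms and one at 100 ms (fs = 1 kHz).
h = np.zeros(200)
h[0] = 1.0
h[100] = 1.0
d50, c50 = d50_c50(h, fs=1000)
```

The two quantities carry the same information, since C50 = 10 log10(D50 / (1 − D50)); a response with no energy after 50 ms would make the late term zero and C50 unbounded.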
4.
Jianxin Peng 《Applied Acoustics》2005,66(5):591-601
Subjective speech intelligibility can be assessed using speech recorded in an anechoic chamber and then convolved with room impulse responses created by acoustic simulation. The speech intelligibility (SI) assessment based on auralization was validated in three rooms. The articulation scores obtained from the simulated sound field were compared with those from the measured sound field and from direct listening in the rooms. Results show that the speech intelligibility prediction based on auralization with simulated binaural room impulse responses (BRIRs) agrees with reality and with results from measured BRIRs. When the technique is used with simulated and measured monaural room impulse responses (MRIRs), the predicted results underestimate reality. Auralization with simulated BRIRs is thus shown to be capable of assessing subjective speech intelligibility at listening positions in the room.
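The core signal operation in this kind of auralization is a convolution of the dry (anechoic) speech with each ear's impulse response. A minimal sketch follows, assuming a mono input and a BRIR stored as an array of shape `(n_taps, 2)`; the function name `auralize` and the array layout are illustrative assumptions, not details from the paper.

```python
import numpy as np

def auralize(dry, brir):
    """Convolve a mono anechoic signal with a two-channel BRIR.

    dry  : 1-D array, anechoic speech samples
    brir : array of shape (n_taps, 2); column 0 = left ear, column 1 = right ear
    Returns an array of shape (len(dry) + n_taps - 1, 2).
    """
    dry = np.asarray(dry, dtype=float)
    brir = np.asarray(brir, dtype=float)
    left = np.convolve(dry, brir[:, 0])    # full linear convolution, left ear
    right = np.convolve(dry, brir[:, 1])   # full linear convolution, right ear
    return np.stack([left, right], axis=1)

# Toy example: left ear passes the signal unchanged,
# right ear delays it by one sample and halves it.
out = auralize([1.0, 2.0, 3.0], [[1.0, 0.0], [0.0, 0.5]])
```

For long recordings an FFT-based convolution would be faster, but the result is the same; playback level and headphone equalization are outside this sketch.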
5.
Ning Han 《Applied Acoustics》2008,69(11):945-950
Optimal classroom acoustical design can directly enhance students’ learning efficiency. Effective acoustical designs are important and necessary to achieve a high degree of speech intelligibility for listeners. A speech intelligibility metric, U50, at different receiver positions in a classroom of 10 m × 8 m × 6 m was obtained by numerical simulations based on the mirror image model, with and without the uniform surface absorption coefficient. Comparisons show that increasing the absorption coefficient at the back wall can increase the speech intelligibility metric U50 to the largest extent in the classroom. A numerical case study was then conducted in a typical classroom of 10 m × 10 m × 3.5 m, and the speech intelligibility was assessed through a third-order polynomial of Wonyoung and Murray [Wonyoung Y, Murray H. Auralization study of optimal reverberation times for speech intelligibility for normal and hearing-impaired listeners in classrooms with diffuse sound field. J Acoust Soc Am 2006;120(2):801-7].
6.
This paper examines the impact of room acoustic conditions on the speech intelligibility of four languages (English, Polish, Arabic and Mandarin). Listening test scores (diagnostic rhyme tests, phonemically balanced word tests and phonemically balanced sentence tests) of the four languages were compared under four room acoustic conditions defined by their speech transmission index (STI = 0.2, 0.4, 0.6 and 0.8). The results obtained indicated that there was a statistically significant difference between the word intelligibility scores of languages under all room acoustic conditions, apart from the STI = 0.8 condition. English was the most intelligible language under all conditions, and differences with other languages were larger when conditions were poor (maximum difference of 29% at STI = 0.2, 33% at STI = 0.4 and 14% at STI = 0.6). Results also showed that Arabic and Polish were particularly sensitive to background noise, and that Mandarin was significantly more intelligible than those languages at STI = 0.4. Consonant-to-vowel ratios and languages’ distinctive features and acoustical properties explained some of the scores obtained. Sentence intelligibility scores confirmed variations between languages, but these variations were statistically significant only at the STI = 0.4 condition (sentence tests being less sensitive to very good and very poor room acoustic conditions). Overall, the results indicate that large variations between the speech intelligibility of different languages can occur, especially for spaces that are expected to be challenging in terms of room acoustic conditions. Recommendations solely based on room acoustic parameters (e.g. STI) might then prove to be insufficient for designing a multilingual environment.
7.
This paper addresses speech intelligibility enhancement by adaptive filtering algorithms combined with subband techniques. The forward and backward blind source separation (BSS) structures are widely used in speech enhancement and source separation, and have been studied extensively in the literature for convolutive and non-convolutive mixtures. Both structures use two microphones to capture the convolutive or non-convolutive mixture and provide the target and jammer signal components at their outputs. This paper focuses on the backward structure applied to enhancing a speech signal from a convolutive mixture, and proposes a subband implementation of this structure to improve its behavior with speech signals. The proposed subband backward BSS (SBBSS) structure substantially improves the convergence speed of the adaptive filtering algorithms when a high number of subbands is selected. To improve the robustness of the proposed subband structure, a new criterion that combines System Mismatch and Mean-Errors criterion minimization was adapted and applied. Combined with this criterion, the proposed subband backward structure enhances the output speech signal by reducing distortion and noise components. Its performance is validated through several objective criteria, which are given and described in the paper.
8.
Lavandier M Jelfs S Culling JF Watkins AJ Raimond AP Makin SJ 《The Journal of the Acoustical Society of America》2012,131(1):218-231
When speech is in competition with interfering sources in rooms, monaural indicators of intelligibility fail to take account of the listener's abilities to separate target speech from interfering sounds using the binaural system. In order to incorporate these segregation abilities and their susceptibility to reverberation, Lavandier and Culling [J. Acoust. Soc. Am. 127, 387-399 (2010)] proposed a model which combines effects of better-ear listening and binaural unmasking. A computationally efficient version of this model is evaluated here under more realistic conditions that include head shadow, multiple stationary noise sources, and real-room acoustics. Three experiments are presented in which speech reception thresholds were measured in the presence of one to three interferers using real-room listening over headphones, simulated by convolving anechoic stimuli with binaural room impulse-responses measured with dummy-head transducers in five rooms. Without fitting any parameter of the model, there was close correspondence between measured and predicted differences in threshold across all tested conditions. The model's components of better-ear listening and binaural unmasking were validated both in isolation and in combination. The computational efficiency of this prediction method allows the generation of complex "intelligibility maps" from room designs.
9.
10.
A number of objective evaluation methods are currently used to quantify the speech intelligibility in a built environment, including the speech transmission index (STI), rapid speech transmission index (RASTI), articulation index (AI), and the percent articulation loss of consonants (%ALCons). Certain software programs can quickly evaluate STI, RASTI, and %ALCons from a measured room impulse response. In this project, two impulse-response-based software packages (WinMLS and SIA-Smaart Acoustic Tools) were evaluated for their ability to determine intelligibility accurately. In four different spaces with background noise levels less than NC 45, speech intelligibility was measured via three methods: (1) with WinMLS 2000; (2) with SIA-Smaart Acoustic Tools (v4.0.2); and (3) from listening tests with humans. The study found that WinMLS measurements of speech intelligibility based on STI, RASTI, and %ALCons corresponded well with performance on the listening tests. SIA-Smaart results were correlated to human responses, but tended to under-predict intelligibility based on STI and RASTI, and over-predict intelligibility based on %ALCons.
11.
We proposed and evaluated an estimation method for the forced selection speech intelligibility tests. Our proposal takes into account the forced selection manner of the Diagnostic Rhyme Test (DRT), which forces selection from a pair of rhyming words. A distance measure is calculated between the test word and the two candidate words, respectively, and the distance is compared to select the most likely word. We compared two distance measures. The first objective distance measure used here was based on the Articulation index Band Correlation (ABC). The ABC is the correlation of time–frequency (T–F) patterns between the test word and the template word speech of the two words in the candidate word pair. The word with the higher correlation was decided to be the likely candidate word. The T–F pattern was calculated in the Articulation Index (AI) bands, and the correlation was calculated between the corresponding bands of the test and candidate word sample. In order to estimate the intelligibility, we calculate the ratio of the number of bands in which higher correlation is seen for the correct word vs. the total number of bands (named ABC-est). This ratio quantifies how well the test word matches the correct word in the word pair. For the second objective distance, we used a measure based on the frequency-weighted segmental SNR (fwSNRseg). Segmental SNR (SNRseg) was calculated in AI bands, and compared among the candidate word templates. We then calculated the frequency-weighted ratio of the number of bands in which higher SNRseg was observed for the correct word vs. the total number of bands (named fwSNRseg-est), again to quantify how well the test word matches the selected candidate word in the pair. We estimated a logistic mapping function from the above two ratios to intelligibility scores using speech mixed with known noise. The mapping functions were then used to estimate the intelligibility of speech mixed with unknown noise. 
This estimation was compared to another measure that we previously evaluated, the conventional fwSNRseg, which directly maps the measure to intelligibility. Both proposed measures were proven to be significantly more accurate than conventional fwSNRseg. For most cases, the accuracy was comparable between the two proposed distance measures, ABC-est and fwSNRseg-est, with the latter showing correlation between the subjective and estimated intelligibility as high as 0.97, and root mean square as low as 0.11 for one of the test sets, but not as accurate for other sets. The ABC-est showed more stable accuracy for all sets. However, both measures show practical accuracies in all conditions tested. Thus, it should be possible to “screen” the intelligibility in many of the noise conditions to be tested, and cut down on the scale of the subjective test needed.
12.
13.
In several auditoria, it has been observed that the reverberation time is longer than expected and that the cause is a horizontal reverberant field established in the region near the ceiling, a field which is remote from the sound absorbing audience. This has been observed in the Boston Symphony Hall, Massachusetts, and the Stadthalle Göttingen, Germany. Subjective remarks on their acoustics suggest that there are no unfavourable comments linked to the secondary sound field. Two acoustic scale models are considered here. In a generic rectangular concert hall model, the walls and ceiling contained openings in which either plane or scattering panels could be placed. With plane panels, the model reverberation time (RT) was measured as 53% higher than the Sabine prediction (frequency 500/1000 Hz), compared with 8% higher with scattering panels. The second model, of a 300 seat lecture theatre with a 6 m or 8 m high ceiling, had raked seating. In this case, the amount of absorption in the model was increased until the point was reached where speech had acceptable intelligibility, with the early energy fraction D ≥ 0.5. For this acceptable speech condition with the 6 m ceiling, the measured mid-frequency T15 was 1.47 s, whereas the Sabine predicted RT was 1.06 s. The sound decay was basically non-linear, with T30 > T15 > EDT. Exploiting a high-level horizontal reverberant field offers the possibility of acoustics that are better suited to both speech and unamplified music, without any physical change in the auditorium. Using secondary reverberation in an auditorium for a wide variety of music might also be beneficial.
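The Sabine prediction referred to in this abstract is the classical formula RT = 0.161 V / A, with V the room volume in m³ and A the total absorption area in m². The sketch below shows that formula and the percentage excess of a measured value over a prediction, applied to the two lecture-theatre figures quoted above. The function names are hypothetical, and the room volume and absorption used in the example are invented for illustration, not taken from the paper.

```python
def sabine_rt(volume_m3, absorption_m2):
    """Sabine reverberation time in seconds: 0.161 * V / A (metric units)."""
    return 0.161 * volume_m3 / absorption_m2

def percent_excess(measured_s, predicted_s):
    """How much longer the measured time is than the prediction, in percent."""
    return 100.0 * (measured_s / predicted_s - 1.0)

# Illustrative room (hypothetical numbers): 1000 m^3 with 161 m^2 of absorption.
rt = sabine_rt(1000.0, 161.0)

# Figures quoted in the abstract: measured T15 = 1.47 s, Sabine RT = 1.06 s.
excess = percent_excess(1.47, 1.06)
```

With the quoted figures the measured T15 comes out roughly 39% above the Sabine value, which sits between the 8% (scattering panels) and 53% (plane panels) excesses reported for the concert hall model.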
14.
This article reports on the performance of an adaptive subband noise cancellation scheme, which performs binaural preprocessing of speech signals for a hearing-aid application. The multi-microphone subband adaptive (MMSBA) signal processing scheme uses the least mean squares (LMS) algorithm in frequency-limited subbands. The use of subbands enables a diverse processing mechanism to be employed, splitting the two-channel wide-band signal into smaller frequency-limited subbands, which can be processed according to their individual signal characteristics. The frequency delimiting used a linear- or cochlear-spaced subband distribution. The effect of the processing scheme on speech intelligibility was assessed in a trial involving 15 hearing-impaired volunteers with moderate sensorineural hearing loss. The acoustic material consisted of speech and speech-shaped noise signals, generated using simulated and real-room acoustic environments, at signal-to-noise ratios (SNRs) in the range -6 to +3 dB. The results show that the MMSBA scheme delivered average speech intelligibility improvements of 11.5%, with a maximum of 37.25%, in noisy reverberant conditions. There was no significant reduction in mean speech intelligibility due to processing, in any of the test conditions.
15.
This paper examines the accuracy of the speech transmission index (STI) calculated from the reverberation time (T) and signal-to-noise ratio (LSN) of enclosed spaces. Differences between measured and predicted STIs have been analysed in two rooms (reverberant vs. absorbent), for a wide range of absorption conditions and signal-to-noise ratios (sixteen tests). The STI was measured using maximum length sequence analysis and predictions were calculated using either measured or predicted values of T and LSN, the latter assuming diffuse sound field conditions. The results obtained for all the conditions tested showed that STI predictions based on T and LSN tend to underestimate the STI, with differences between measured and predicted STIs always lower than 0.1 (on a 0.0–1.0 scale), and on average lower than 0.06. According to previous research, these differences are noticeable and therefore non-negligible, as 0.03 is the just noticeable difference in STI. The use of either measured or predicted values of T and LSN provided similar STI predictions (i.e. non-noticeable changes), with differences between predictions that are on average lower than 0.03 for the absorbent room, and lower than 0.01 for the reverberant room.
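A prediction of STI from T and LSN is commonly based on the Houtgast-Steeneken modulation transfer function for an exponentially decaying diffuse field with stationary noise: m(F) = [1 + (2πFT/13.8)²]^(-1/2) · [1 + 10^(-LSN/10)]^(-1). The sketch below is a deliberately simplified, single-band version under that assumption: it averages over 14 modulation frequencies and omits the octave-band weighting, auditory masking and threshold corrections of the full standardised method. It should not be read as the procedure used in the paper, and the function name `sti_estimate` is hypothetical.

```python
import numpy as np

# 14 modulation frequencies in one-third-octave steps, 0.63 Hz to 12.5 Hz.
MOD_FREQS = np.array([0.63, 0.8, 1.0, 1.25, 1.6, 2.0, 2.5,
                      3.15, 4.0, 5.0, 6.3, 8.0, 10.0, 12.5])

def sti_estimate(rt_s, snr_db):
    """Simplified single-band STI from reverberation time (s) and SNR (dB)."""
    # Modulation reduction caused by reverberation (diffuse-field assumption).
    m_rev = 1.0 / np.sqrt(1.0 + (2.0 * np.pi * MOD_FREQS * rt_s / 13.8) ** 2)
    # Modulation reduction caused by stationary background noise.
    m_noise = 1.0 / (1.0 + 10.0 ** (-snr_db / 10.0))
    # Keep m strictly inside (0, 1) so the logarithm below stays finite.
    m = np.clip(m_rev * m_noise, 1e-6, 1.0 - 1e-6)
    # Apparent signal-to-noise ratio, limited to the range -15 dB to +15 dB.
    snr_app = np.clip(10.0 * np.log10(m / (1.0 - m)), -15.0, 15.0)
    # Transmission index per modulation frequency, then the plain average.
    return float(np.mean((snr_app + 15.0) / 30.0))

# No reverberation and 0 dB SNR gives m = 0.5 at every frequency, so STI = 0.5.
sti_mid = sti_estimate(0.0, 0.0)
```

Comparing such an estimate with a value measured from an impulse response is the kind of measured-versus-predicted difference the abstract reports; the sizes of those differences are results of the paper and are not reproduced by this sketch.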
16.
Unattended background speech is a known source of cognitive and subjective distraction in open-plan offices. This study investigated whether the deleterious effects of background speech can be affected by room acoustic design that decreases speech intelligibility, as measured by the Speech Transmission Index (STI). The experiment was conducted in an open-plan office laboratory (84 m²) in which four acoustic conditions were physically built. Three conditions contained background speech. A quiet condition was included for comparison. The speech conditions differed in terms of the degree of absorption, screen height, desk isolation, and the level of masking sound. The speech sounds simulated an environment where phone conversations are heard from different locations varying in distance. Ninety-eight volunteers were tested. The presence of background speech had detrimental effects on the subjective perceptions of noise effects and on cognitive performance in short-term memory and working memory tasks. These effects were neither attenuated nor amplified within a three-hour working period. The reduction of the STI by room acoustic means decreased subjective disturbance, whereas the effects on cognitive performance were somewhat smaller than expected. The effects of room acoustic design on subjective distraction were stronger among noise-sensitive subjects, suggesting that they benefited more from acoustic improvements than non-sensitive subjects. The results imply that reducing the STI is beneficial for performance and acoustic satisfaction especially regarding speech coming from more distant desks. However, acoustic design does not sufficiently decrease the distraction caused by speech from adjacent desks.
17.
Speech transmission index (STI) is an objective measure of the acoustic properties of office environments and is used to specify norms for acceptable acoustic work conditions. Yet, the tasks used to evaluate the effects of varying STIs on work performance have often been focusing on memory (as memory of visually presented words) and reading tasks and may not give a complete view of the severity even of low STI values (i.e., when speech intelligibility is low). Against this background, we used a more typical office-work task in the present study. The participants were asked to write short essays (5 min per essay) in 5 different STI conditions (0.08; 0.23; 0.34; 0.50; and 0.71). Writing fluency dropped drastically and the number of pauses longer than 5 s increased at STI values above 0.23. This study shows that realistic work-related performance drops even at low STI values and has implications for how to evaluate acoustic conditions in school and office environments.
18.
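Existing bone-conducted speech enhancement algorithms focus on enhancing the speech magnitude spectrum, and phase mismatch during waveform synthesis then degrades speech quality. To address this problem, a WaveNet model that fuses phase information is proposed for generating the enhanced bone-conducted speech waveform. The method builds on a bandwidth-extension WaveNet and fuses the phase spectrum information of the bone-conducted speech with the enhanced speech magnitude spectrum as the model's conditioning features; the enhanced speech waveform is generated from the fused features, so that phase information is used effectively. Simulation experiments compared two phase features, the group delay spectrum and the instantaneous frequency deviation spectrum. Subjective and objective results show that, with either concatenation fusion or convolutional fusion, the phase information of the bone-conducted speech effectively complements the original magnitude-spectrum conditioning features and improves the enhancement. Fusing the group delay spectrum feature by concatenation gives the best result: compared with the original bone-conducted speech, the mean opinion score (MOS) improves by about 54.3%.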
19.
I. Introduction The F0 patterns of speech are important not only for the prosodic features but also for voice source characteristics. Now more and more speech scientists recognized that voice excitation source in text-to-speech systems plays an important role in both intelligibility and naturalness of synthetic speech. Especially, for Chinese, a tone language with multi-tone system, the tonal patterns which are mainly demonstrated in the F0 contours carry lexical meaning. Some comparative studies of the F0 patterns in between tone language (Chinese) and stress language (En…
20.
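Objective speech quality assessment can replace costly human rating, but current objective metrics usually require a clean reference speech signal, which is hard to obtain in many practical acoustic systems. A non-intrusive speech quality assessment algorithm is therefore proposed that combines auxiliary-target learning with a convolutional recurrent network (CRN). To reduce complexity, the algorithm uses Bark-frequency cepstral coefficients (BFCCs), based on filters that imitate the auditory characteristics of the human ear, as the CRN input. The algorithm first builds a convolutional neural network (CNN) to extract frame-level features from the BFCCs. It then builds a bidirectional long short-term memory network to model long-term temporal dependencies and sequence characteristics in the frame-level features. Finally, a self-attention mechanism adaptively selects useful information from the frame-level features and integrates it into utterance-level features, which are mapped to an objective score. To improve the effectiveness of the quality assessment, the algorithm adopts a multi-task training strategy with voice activity detection (VAD) as the auxiliary learning target. Experiments on open-source databases show that the proposed algorithm correlates better with the mean opinion score (MOS) than other non-intrusive algorithms. Moreover, the algorithm has a small parameter count and generalizes well to the distorted-speech database with subjective MOS released with ITU-T P.808, approaching the accuracy of the Perceptual Evaluation of Speech Quality (PESQ) metric.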