Similar literature
 20 similar documents found (search time: 31 ms)
1.
Panel speakers are investigated in terms of structural vibration and acoustic radiation. A panel speaker primarily consists of a panel and an inertia exciter. Contrary to conventional speakers, flexural resonance is encouraged such that the panel vibrates as randomly as possible. Simulation tools are developed to facilitate system integration of panel speakers. In particular, electro-mechanical analogy, finite element analysis, and fast Fourier transform are employed to predict panel vibration and the acoustic radiation. Design procedures are also summarized. In order to compare the panel speakers with the conventional speakers, experimental investigations were undertaken to evaluate frequency response, directional response, sensitivity, efficiency, and harmonic distortion of both speakers. The results revealed that the panel speakers suffered from a problem of sensitivity and efficiency. To alleviate the problem, a woofer using electronic compensation based on H2 model matching principle is utilized to supplement the bass response. As indicated in the result, significant improvement over the panel speaker alone was achieved by using the combined panel-woofer system.

2.
A hypophonic voice, characterized perceptually as weak and breathy, is associated with voice disorders such as vocal fold atrophy and unilateral vocal fold paralysis. Although voice therapy programs for hypophonia typically address the vocal folds or the sound source, twang voice quality was examined in this study as an alternative technique for increasing vocal power by altering the epilarynx or the sound filter. OBJECTIVE: This study investigated the effect of twang production on physiologic, acoustic, and perceived voice handicap measures in speakers with hypophonia. DESIGN/METHODS: This prospective pilot study compared the vocal outcomes of six participants with hypophonia at pre- and posttreatment time points. Outcome measures included mean airflow rate, intensity in dB sound pressure level (SPL), maximum phonation time, and self-report of voice handicap. RESULTS: All subjects improved in at least three of the four vocal outcome measures. Wilcoxon signed-rank test of paired differences revealed significant differences between pre- and posttherapy group means for airflow rate, SPL, and Voice Handicap Index scores. CONCLUSION: The twang voice quality as a manipulation of the sound filter offers a clinical complement to traditional voice therapies that primarily address the sound source.

3.
How are listeners able to identify whether the pitch of a brief isolated sample of an unknown voice is high or low in the overall pitch range of that speaker? Does the speaker's voice quality convey crucial information about pitch level? Results and statistical models of two experiments that provide answers to these questions are presented. First, listeners rated the pitch levels of vowels taken over the full pitch ranges of male and female speakers. The absolute f0 of the samples was by far the most important determinant of listeners' ratings, but with some effect of the sex of the speaker. Acoustic measures of voice quality had only a very small effect on these ratings. This result suggests that listeners have expectations about f0s for average speakers of each sex, and judge voice samples against such expectations. Second, listeners judged speaker sex for the same speech samples. Again, absolute f0 was the most important determinant of listeners' judgments, but now voice quality measures also played a role. Thus it seems that pitch level judgments depend on voice quality mostly indirectly, through its information about sex. Absolute f0 is the most important information for deciding both pitch level and speaker sex.

4.
The voice conversion (VC) technique has recently emerged as a new branch of speech synthesis dealing with speaker identity. In this work, a linear prediction (LP) analysis is carried out on speech signals to obtain acoustical parameters related to speaker identity: the speech fundamental frequency (pitch), the voicing decision, the signal energy, and the vocal tract parameters. Once these parameters are established for two different speakers, designated the source and target speakers, statistical mapping functions can be applied to modify them. The mapping functions are derived from these parameters in such a way that the source parameters come to resemble those of the target. Finally, the modified parameters are used to produce the new speech signal. To illustrate the feasibility of the proposed approach, simple-to-use voice conversion software has been developed. This VC technique has shown satisfactory results, with the synthesized speech signal virtually matching that of the target speaker.

5.
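Liu Li, Cai Yefeng, Wu Ming, Yang Jun. Applied Acoustics (《应用声学》), 2015, 34(1): 7-16
To address the uneven sound field distribution encountered in sound reinforcement for outdoor and very large indoor spaces, this paper proposes a sound reinforcement technique based on least-squares sound field reconstruction. The technique computes the input parameters of each channel of a linear loudspeaker array by approximating a target sound field, achieving a uniform sound pressure level distribution within the reinforcement zone while constraining the acoustic energy in the non-reinforcement zone to obtain good directivity. Simulations examine how changes in the algorithm's parameters affect the control result, and the choice of parameters for different reinforcement zones and targets is discussed. Simulations and experiments compare the algorithm with the uncontrolled sound field and with phase-shift beam steering, showing that the method achieves better sound field uniformity.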
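The least-squares idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the free-field monopole model, the geometry, the 500 Hz frequency, the penalty weight, and the regularization value are all assumptions chosen for illustration. The weights minimize the error against a uniform target in the reinforcement ("bright") zone plus a penalty on energy in the non-reinforcement ("dark") zone.

```python
import cmath
import math

def transfer(src_x, pt, k):
    """Free-field monopole response from a source at (src_x, 0) to point pt."""
    r = math.hypot(pt[0] - src_x, pt[1])
    return cmath.exp(-1j * k * r) / (4 * math.pi * r)

def solve(A, b):
    """Solve the complex system A x = b by Gaussian elimination with pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0j] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def cost(w, G_b, target, G_d, penalty, reg):
    """Bright-zone error + weighted dark-zone energy + weight regularization."""
    err = sum(abs(sum(g * wi for g, wi in zip(row, w)) - t) ** 2
              for row, t in zip(G_b, target))
    dark = sum(abs(sum(g * wi for g, wi in zip(row, w))) ** 2 for row in G_d)
    return err + penalty * dark + reg * sum(abs(wi) ** 2 for wi in w)

def ls_weights(G_b, target, G_d, penalty, reg):
    """Minimizer of cost(), obtained from the regularized normal equations."""
    n = len(G_b[0])
    A = [[sum(row[i].conjugate() * row[j] for row in G_b)
          + penalty * sum(row[i].conjugate() * row[j] for row in G_d)
          + (reg if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(row[i].conjugate() * t for row, t in zip(G_b, target))
         for i in range(n)]
    return solve(A, b)

# Hypothetical setup: 8 sources on the x axis, listeners on the line y = 5 m.
k = 2 * math.pi * 500 / 343.0                      # wavenumber at 500 Hz
sources = [0.2 * i - 0.7 for i in range(8)]        # 0.2 m spacing
bright = [(-2 + 0.25 * i, 5.0) for i in range(17)]  # reinforcement zone
dark = [(4 + 0.25 * i, 5.0) for i in range(17)]     # non-reinforcement zone
G_b = [[transfer(s, p, k) for s in sources] for p in bright]
G_d = [[transfer(s, p, k) for s in sources] for p in dark]
target = [0.02 + 0j] * len(bright)                 # uniform target pressure
w = ls_weights(G_b, target, G_d, penalty=1.0, reg=1e-6)
```

Raising `penalty` trades bright-zone uniformity for lower dark-zone energy, which is the kind of parameter dependence the abstract says was studied by simulation.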

6.
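Yu Yibiao, Zeng Daojian, Jiang Ying. Acta Acustica (《声学学报》), 2012, 37(3): 346-352
A voice conversion method based on fully independent speaker models is proposed. Each speaker's structured Gaussian mixture model (SGMM) is first trained on that speaker's own corpus. The models of the source and target speakers are then matched, and their Gaussian components aligned, using the acoustical universal structure (AUS), which yields the conversion function used for voice conversion. ABX and MOS experiments show conversion performance close to that of conventional joint training on parallel corpora, and the converted speech is recognized as the target speaker with an accuracy of 94.5%. The results indicate that the proposed method combines good conversion performance with a small training load and good system scalability.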

7.
SUMMARY: The present study investigated the effect of tonal changes on voice onset time (VOT) between normal laryngeal (NL) and superior esophageal (SE) speakers of Mandarin Chinese. VOT values were measured from the syllables /pha/, /tha/, and /kha/ produced at four tone levels by eight NL and seven SE speakers who were native speakers of Mandarin. Results indicated that Mandarin tones were associated with significantly different VOT values for NL speakers, in which high-falling tone was associated with significantly shorter VOT values than mid-rising tone and falling-rising tone. Regarding speaker group, SE speakers showed significantly shorter VOT values than NL speakers across all tone levels. This may be related to their use of pharyngoesophageal (PE) segment as another sound source. SE speakers appear to take a shorter time to start PE segment vibration compared to NL speakers using the vocal folds for vibration.

8.
An efficient digital equalization method is applied successfully to the problem of spectral equalization of multi-exciter distributed mode loudspeakers (DML). It is based on a chain of second-order sections of infinite impulse response parametric filters with very low computational cost. The method compensates for the measured multi-exciter DML response in order to achieve a desired frequency response. The sound radiation of these flat loudspeakers is a complex superposition of excited modes that vary strongly with frequency. Therefore, the characteristic multi-exciter DML spectrum is very irregular and is equalized with the method presented here for a natural, uncolored response. In multichannel systems, such as wave field synthesis (WFS), the use of efficient filters to equalize a large number of drivers is an advantageous approach. The equalization process has been applied to two multi-exciter DML prototypes, comprising three and five exciters per panel. Both panel and exciter equalization have been addressed, and their consequences for the filtered responses are discussed. Finally, some subjective assessments are carried out to optimize the order of the filter while maintaining the perceived quality of the equalization.
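A chain of second-order parametric IIR sections can be sketched as follows. This is a minimal illustration, not the paper's design procedure: the coefficient formulas are the widely used peaking-filter formulas from the Audio EQ Cookbook, and the centre frequencies, gains, Q values, and sample rate are hypothetical.

```python
import cmath
import math

def peaking_biquad(f0, gain_db, q, fs):
    """Normalized (b, a) coefficients of one peaking-EQ second-order section."""
    a_lin = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    b = [1 + alpha * a_lin, -2 * math.cos(w0), 1 - alpha * a_lin]
    a = [1 + alpha / a_lin, -2 * math.cos(w0), 1 - alpha / a_lin]
    return [x / a[0] for x in b], [x / a[0] for x in a]

def cascade_gain_db(sections, f, fs):
    """Magnitude response in dB of the whole chain at frequency f."""
    z1 = cmath.exp(-1j * 2 * math.pi * f / fs)      # z^-1 on the unit circle
    h = 1 + 0j
    for b, a in sections:
        h *= (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20 * math.log10(abs(h))

def filter_signal(sections, x):
    """Run samples through each section in turn (direct form I)."""
    for b, a in sections:
        y, x1, x2, y1, y2 = [], 0.0, 0.0, 0.0, 0.0
        for xn in x:
            yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
            x2, x1 = x1, xn
            y2, y1 = y1, yn
            y.append(yn)
        x = y
    return x

# Hypothetical correction: cut a 6 dB resonance at 1 kHz, lift a dip at 5 kHz.
fs = 48000
chain = [peaking_biquad(1000, -6.0, 2.0, fs), peaking_biquad(5000, 4.0, 3.0, fs)]
print(round(cascade_gain_db(chain, 1000, fs), 2), "dB at 1 kHz")
```

Each section costs five multiplications per sample, which is why a chain of this kind remains cheap even when many drivers need individual equalization.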

9.
The acoustic effects of the adjustment in vocal effort that is required when the distance between speaker and addressee is varied over a large range (0.3-187.5 m) were investigated in phonated and, at shorter distances, also in whispered speech. Several characteristics were studied in the same sentence produced by men, women, and 7-year-old boys and girls: duration of vowels and consonants, pausing and occurrence of creaky voice, mean and range of F0, certain formant frequencies (F1 in [a] and F3), sound-pressure level (SPL) of voiced segments and [s], and spectral emphasis. In addition to levels and emphasis, vowel duration, F0, and F1 were substantially affected. "Vocal effort" was defined as the communication distance estimated by a group of listeners for each utterance. Most of the observed effects correlated better with this measure than with the actual distance, since some additional factors affected the speakers' choice. Differences between speaker groups emerged in segment durations, pausing behavior, and in the extent to which the SPL of [s] was affected. The whispered versions are compared with the phonated versions produced by the same speakers at the same distance. Several effects of whispering are found to be similar to those of increasing vocal effort.

10.
To handle the case of a limited target-speaker corpus, this paper proposes a voice conversion algorithm that uses a unified tensor dictionary. First, parallel speech from N speakers is selected randomly from the speech corpus to build the base of the tensor dictionary. Multi-series dynamic time warping is then applied to the chosen speech, generating N two-dimensional basic dictionaries that together constitute the unified tensor dictionary. During the conversion stage, the dictionaries of the source and target speakers are each established as a linear combination of the N basic dictionaries, using the two speakers' speech. The experimental results show that, with 14 basic speakers, the algorithm achieves performance comparable to that of the traditional NMF-based method using only a small target-speaker corpus, which greatly facilitates the application of voice conversion systems.

11.
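Hui Lin, Yu Yibiao. Acta Acustica (《声学学报》), 2017, 42(6): 762-768
A method for age-related voice conversion is proposed that combines a group of short-time-spectrum universal background models with prosodic parameters. For spectral conversion, short-time spectral coefficients are extracted for each speaker within an age group and modeled with a Gaussian mixture model; the speakers are then clustered by the similarity of their speech features and one universal background model is trained per cluster, giving a group of universal background models and a set of short-time spectral conversion functions. After spectral conversion, the formants are fine-tuned. For prosodic conversion, a single-Gaussian model of the fundamental frequency and an average-duration-ratio model of the speaking rate are built to derive the conversion functions. Experiments show a clear advantage over the traditional bilinear method on ABX and MOS measures, and a 4% improvement in the rate of change of log-likelihood relative to the single universal background model method. The converted speech therefore leans clearly toward the target while retaining good quality, a marked improvement over traditional methods.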
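A single-Gaussian fundamental-frequency conversion of the kind mentioned in this abstract is commonly realized as a mean and variance transform. The sketch below is an assumption-laden illustration rather than the authors' function: it works in the log-F0 domain (the abstract does not say which domain was used), treats F0 = 0 as an unvoiced frame, and uses made-up F0 values.

```python
import math
import statistics

def fit_log_f0(f0_values):
    """Mean and standard deviation of log-F0 over voiced frames (F0 > 0)."""
    logs = [math.log(f) for f in f0_values if f > 0]
    return statistics.mean(logs), statistics.pstdev(logs)

def convert_f0(f0, src, tgt):
    """Map one F0 value so that source log-F0 statistics match the target's."""
    if f0 <= 0:
        return 0.0                      # unvoiced frames stay unvoiced
    mu_s, sd_s = src
    mu_t, sd_t = tgt
    return math.exp(mu_t + (sd_t / sd_s) * (math.log(f0) - mu_s))

# Hypothetical F0 tracks in Hz; 0 marks an unvoiced frame.
src_track = [110, 120, 0, 130, 125]
tgt_track = [200, 230, 260, 0, 215]
src_stats = fit_log_f0(src_track)
tgt_stats = fit_log_f0(tgt_track)
converted = [convert_f0(f, src_stats, tgt_stats) for f in src_track]
```

After conversion, the voiced frames of `converted` have the same log-F0 mean and standard deviation as the target track, while the contour shape of the source is preserved.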

12.
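Gu Dong, Jian Zhihua. Acta Acustica (《声学学报》), 2018, 43(5): 864-872
To handle cases in which the target speaker's corpus may be insufficient, this paper proposes a unified tensor dictionary voice conversion algorithm for limited corpora. N speakers are selected from the corpus as the basic speakers of the speech tensor dictionary, and the parallel speech segments of these N speakers are aligned with a multi-series dynamic time warping algorithm, building a tensor dictionary composed of N two-dimensional basic dictionaries. In the conversion stage, the speech dictionaries of the source and target speakers are each constructed as a linear combination of the basic dictionaries in the tensor dictionary, which makes the conversion possible. Experimental results show that once the number of basic speakers reaches 14, only a very small target-speaker corpus is needed to obtain conversion quality comparable to that of the traditional conversion algorithm based on non-negative matrix factorization, which greatly facilitates the application of voice conversion systems.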

13.
A phonetogram is a graph showing the sound pressure level (SPL) of softest and loudest phonation over the entire fundamental frequency range of a voice. A physiological interpretation of a phonetogram is facilitated if the SPL is measured with a flat frequency curve and if the vowel /a/ is used. It was found that in soft phonation, the SPL is mainly dependent on the amplitude of the fundamental, while in loud phonation, the SPL is mainly determined by overtones. The short-term SPL variation, i.e., the level variation within a tone, was about 5 dB in soft phonation and close to 2 dB in loud phonation. For two normal voices the long-term SPL variation, calculated as the mean standard deviation of SPL for day-to-day variation, was found to be between 2.4 and 3.4 dB in soft and loud phonation. Speakers who raise their loudness of phonation also tend to raise their mean voice fundamental frequency. Measures obtained from speaking at various voice levels were combined so that typical pathways could be introduced into the phonetogram. The average slope of these pathways was 0.3–0.5 st/dB for healthy subjects. Averaged phonetograms for male singers and male nonsingers did not differ significantly, but averaged phonetograms for female singers and female nonsingers did, in that the upper contour was higher for the female singers. Averaged phonetograms for female patients with non-organic dysphonia showed significantly lower SPL values in loudest phonation as compared to healthy female subjects, while no corresponding difference was seen for males in this regard. With respect to the SPL values for softest phonation, male dysphonic patients showed significantly higher SPL values than healthy male subjects, while no corresponding difference was seen in female subjects. The subglottal pressure mirrored these phonetogram differences between healthy and pathological voices. The averaged phonetograms of female patients after voice therapy showed an increased similarity with those of normal voices. For the male patients the averaged phonetogram did not change significantly after therapy.
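The pathway slope quoted in st/dB can be made concrete with a short worked sketch. It assumes the slope is the least-squares regression of mean fundamental frequency, expressed in semitones, on SPL in dB; the abstract does not state how the slope was computed, and the data points below are synthetic.

```python
import math

def semitones(f_hz, ref_hz):
    """Interval between f_hz and ref_hz in semitones (12 per octave)."""
    return 12 * math.log2(f_hz / ref_hz)

def slope_st_per_db(spl_db, f0_hz):
    """Least-squares slope of F0 (in semitones re the first value) against SPL."""
    st = [semitones(f, f0_hz[0]) for f in f0_hz]
    mx = sum(spl_db) / len(spl_db)
    my = sum(st) / len(st)
    sxy = sum((x - mx) * (y - my) for x, y in zip(spl_db, st))
    sxx = sum((x - mx) ** 2 for x in spl_db)
    return sxy / sxx

# Synthetic speaker whose mean F0 rises 0.4 st for every dB of added level.
spl = [60, 65, 70, 75, 80]
f0 = [110 * 2 ** (0.4 * (level - 60) / 12) for level in spl]
slope = slope_st_per_db(spl, f0)
```

With a slope of 0.4 st/dB, a 20 dB increase in voice level corresponds to a rise of 8 semitones in mean fundamental frequency.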

14.
By speaking loudly for extended periods, teachers are vulnerable to laryngeal and voice changes associated with vocal fold “vibration overdose.” Voice clinicians frequently recommend voice amplification ostensibly designed to reduce vibration dose and improve voice. However, there are few data regarding the degree of vocal loudness attenuation achieved by specific amplification devices. The purpose of this investigation was to examine the effectiveness of the ChatterVox™ Portable Voice Amplification System (Siemens Hearing Instruments) for reducing the sound pressure level (SPL) of a speaker's voice during a simulated classroom lecture. Ten participants were instructed to continuously read one of two phonetically balanced passages while amplified and unamplified. Voice intensity measurements were obtained at three inches from the mouth (i.e., mouth level) and at the back of a classroom in both amplified and unamplified conditions. When amplified with the ChatterVox™, speakers experienced an average decrease in vocal intensity at mouth-level of 6.03 dB SPL (p < 0.002). Furthermore, an average increase of 2.55 dB SPL (p < 0.038) at the back of the classroom was observed. Collectively, these results indicate that the ChatterVox™ amplification device reduced the speaker's vocal intensity level at the microphone, while it augmented the voice heard at the back of the classroom. By inference, this degree of vocal attenuation at mouth level should contribute to a desirable reduction in vibration dose, thus lowering the risk of vibration overdose.

15.
16.
Journal of Voice, 2020, 34(5): 806.e7-806.e18
There is a high prevalence of dysphonia among professional voice users and the impact of the disordered voice on the speaker is well documented. However, there is minimal research on the impact of the disordered voice on the listener. Considering that professional voice users include teachers and air-traffic controllers, among others, it is imperative to determine the impact of a disordered voice on the listener. To address this, the objectives of the current study included: (1) determine whether there are differences in speech intelligibility between individuals with healthy voices and those with dysphonia; (2) understand whether cognitive-perceptual strategies increase speech intelligibility for dysphonic speakers; and (3) determine the relationship between subjective voice quality ratings and speech intelligibility. Sentence stimuli were recorded from 12 speakers with dysphonia and four age- and gender-matched typical, healthy speakers and presented to 129 healthy listeners divided into one of three strategy groups (i.e., control, acknowledgement, and listener strategies). Four expert raters also completed a perceptual voice assessment using the Consensus Assessment Perceptual Evaluation of Voice for each speaker. Results indicated that dysphonic voices were significantly less intelligible than healthy voices (P < 0.001) and the use of cognitive-perceptual strategies provided to the listener did not significantly improve speech intelligibility scores (P = 0.602). Using the subjective voice quality ratings, regression analysis found that breathiness was able to predict 41% of the variance associated with number of errors (P = 0.008). Overall results of the study suggest that speakers with dysphonia demonstrate reduced speech intelligibility and that providing the listener with specific strategies may not result in improved intelligibility.

17.
This study investigated the perceptual and acoustical characteristics of vocal presentation in both the masculine and the feminine modes by the same group of male subjects. Listeners (N = 88) evaluated 22 voice samples by using 18 semantic differential scales and 57 adjectives. The 22 voice samples were provided by 11 biologically male speakers, who described themselves as heterosexual crossdressers. Each speaker read a standard passage under controlled conditions. In one reading, they demonstrated their typical masculine voice and in the other they spoke in their feminine voice. Acoustical analyses included mean fundamental frequency, frequency range, overall passage duration, and duration of a sample of stressed vowels. Results indicated that listeners heard significant differences between masculine and feminine presentations across the 11 speakers and the 18 semantic differential scales. Masculine-feminine and high-low pitch were the most salient scales in the perceptual judgments. Acoustical analyses indicated wide variation according to speaker and condition. Clinical applications are provided.

18.
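Mao Yanrong, Shen Yong. Applied Acoustics (《应用声学》), 2019, 38(2): 217-222
Compared with conventional electrodynamic loudspeaker units, micro-speaker units lack components such as the spider and are therefore more susceptible to rocking modes. Using a laser sensor to measure the vibration displacement at points across the diaphragm of a micro-speaker unit, together with theoretical calculation, this paper examines how the rocking modes of a rectangular micro-speaker unit affect its radiated sound field in terms of radiated sound pressure level and radiation directivity. The study finds that rocking modes cause dips in the unit's frequency response and give rise to directivity at low and mid frequencies, with a clear effect on the radiated sound field.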

19.
The social context of noise exposure is a codeterminant of noise annoyance. The present study shows that fairness of the exposure procedure (sound management) can be used as an instrument to reduce noise annoyance. In a laboratory experiment (N = 117), participants are exposed to aircraft sound at different sound pressure levels (SPL: 50 vs 70 dB(A)), which is experienced as noise, while they work on a reading task. The exposure procedure (fair versus neutral) is modeled in line with findings from social justice theory. In the fair condition, participants can voice their preference for a certain sound sample, although they cannot deduce whether their preference is granted. In the neutral condition, participants are not asked to voice their preference. Results show the predicted interaction effect of sound pressure level and procedure on annoyance: annoyance ratings are significantly lower in the fair condition than in the neutral condition, but this effect is found only in the 70 dB condition. When the SPL is considerably disturbing, fair procedures reduce noise annoyance. Consequences of the reported findings for both theory and practice are discussed.

20.
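Text-independent speaker recognition using a neural array network   Total citations: 9, self-citations: 1, citations by others: 8
A neural array network for speaker recognition is proposed. Its subnetworks are small networks that each discriminate between only two classes, and the individual subnetworks are combined into an array to discriminate among multiple classes. The construction of the array network and its search algorithm are given, and text-independent speaker recognition is studied using a radial basis function (RBF) array network. Experiments show that, for 20 speakers with 5 seconds of training speech and 2 seconds of test speech, the method achieves a correct recognition rate of 98%.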
