Similar Documents
20 similar documents found.
1.
A mathematical speech production model is considered that describes the propagation of acoustic oscillations in a vocal tract with mobile walls. The wave field function satisfies the Helmholtz equation with boundary conditions of the third kind (impedance type). The impedance model corresponds to a three-parameter pendulum oscillation model. Experiments demonstrate that the mobility of the vocal tract walls influences the spectral envelope of a speech signal nonlinearly.
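A minimal sketch of what a third-kind (impedance) boundary condition looks like, assuming the "three-parameter pendulum" is read as a wall with mass M, resistance R, and stiffness K per unit area, and an e^{jωt} time convention. The function names and parameter values are hypothetical illustrations, not the paper's.

```python
import math


def wall_impedance(omega: float, mass: float, resistance: float, stiffness: float) -> complex:
    """Wall impedance Z = R + j(omega*M - K/omega) of an assumed mass-damper-spring wall."""
    if omega <= 0:
        raise ValueError("omega must be positive")
    return complex(resistance, omega * mass - stiffness / omega)


def robin_coefficient(omega: float, rho: float, mass: float,
                      resistance: float, stiffness: float) -> complex:
    """Coefficient beta in the third-kind condition dp/dn + beta * p = 0.

    Euler's equation gives dp/dn = -j*omega*rho*v_n, and the wall
    admits v_n = p / Z, so beta = j*omega*rho / Z.
    """
    z = wall_impedance(omega, mass, resistance, stiffness)
    return 1j * omega * rho / z


# Illustrative parameters only (not measured vocal-tract wall values)
M, R, K = 1.5, 1000.0, 3.0e5
omega0 = math.sqrt(K / M)  # wall resonance: mass and stiffness reactances cancel
print(wall_impedance(omega0, M, R, K))
print(robin_coefficient(2 * math.pi * 500.0, 1.2, M, R, K))
```

Below the wall resonance sqrt(K/M) the reactance is negative (stiffness-controlled); above it, positive (mass-controlled).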

2.
Modeling the peripheral speech motor system can advance the understanding of speech motor control and audiovisual speech perception. A 3-D physical model of the human face is presented. The model represents soft-tissue biomechanics with a multilayer deformable mesh. The mesh is controlled by a set of modeled facial muscles that use a standard Hill-type representation of muscle dynamics. In a test of the model, recorded intramuscular electromyography (EMG) was used to activate the modeled muscles, and the kinematics of the mesh were compared with 3-D kinematics recorded with OPTOTRAK. Overall, the model's movements matched the recorded data well. Animations of the model are provided as MPEG movies.
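A Hill-type muscle multiplies activation by normalized force-length and force-velocity factors and adds a passive elastic term. The sketch below is a minimal illustration of that structure only; the Gaussian force-length curve, the lengthening branch, and all default constants are assumptions chosen for simplicity, not the parameters of the face model described above, and the EMG-to-activation step is not shown.

```python
import math


def force_length(norm_length: float, width: float = 0.45) -> float:
    """Active force-length factor: Gaussian peaking at optimal length (1.0)."""
    return math.exp(-((norm_length - 1.0) / width) ** 2)


def force_velocity(norm_velocity: float, k: float = 0.25) -> float:
    """Force-velocity factor; norm_velocity > 0 means shortening (units of v_max).

    Shortening follows Hill's hyperbola (1 - v) / (1 + v / k); lengthening
    uses a simple saturating curve that rises from 1.0 toward 1.5.
    """
    v = norm_velocity
    if v >= 1.0:
        return 0.0
    if v >= 0.0:
        return (1.0 - v) / (1.0 + v / k)
    return 1.5 - 0.5 / (1.0 - v / k)


def passive_force(norm_length: float, k_pe: float = 5.0, strain: float = 0.6) -> float:
    """Exponential passive (parallel elastic) force, zero at or below optimal length."""
    if norm_length <= 1.0:
        return 0.0
    return (math.exp(k_pe * (norm_length - 1.0) / strain) - 1.0) / (math.exp(k_pe) - 1.0)


def hill_muscle_force(activation: float, norm_length: float,
                      norm_velocity: float, f_max: float) -> float:
    """Total muscle force for activation in [0, 1] (e.g. derived from EMG)."""
    active = activation * force_length(norm_length) * force_velocity(norm_velocity)
    return f_max * (active + passive_force(norm_length))


print(hill_muscle_force(1.0, 1.0, 0.0, 100.0))  # isometric, optimal length, full activation
```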

3.
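An Improved DNN-HMM Speech Recognition Method
To address the limited modeling capability of acoustic models that combine deep neural networks with hidden Markov models (DNN-HMM) in speech recognition, an improved DNN-HMM speech recognition algorithm is proposed. First, a deep neural network acoustic model is built by combining a deep belief network (DBN) with a deep Boltzmann machine (DBM). Mel-frequency cepstral coefficients (MFCC) and log-domain Mel filter-bank coefficients (Fbank) are then extracted as acoustic features, and experiments are carried out on the TIMIT speech corpus. The results show that the DNN-HMM model incorporating the DBM outperforms the baseline DNN-HMM model: with MFCC features, the word error rate and sentence error rate decrease by 1.26% and 0.20%, respectively. With Fbank features using the default filter bank, the word error rate and sentence error rate decrease by 0.48% and 0.82%, respectively, and a moderate increase in the number of filters lowers the error rates further. Overall, the sentence error rate and word error rate are reduced to 21.06% and 3.12%, respectively.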
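Word and sentence error rates like those quoted above are conventionally computed from a Levenshtein alignment between reference and hypothesis. A minimal sketch under that standard definition (the paper's exact scoring setup is not specified, so this is an assumption; the function names are hypothetical):

```python
from typing import Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,             # deletion of r
                          curr[j - 1] + 1,         # insertion of h
                          prev[j - 1] + (r != h))  # substitution (or match)
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def sentence_error_rate(pairs: Sequence[Tuple[str, str]]) -> float:
    """Fraction of sentences whose hypothesis differs from the reference at all."""
    return sum(r.split() != h.split() for r, h in pairs) / len(pairs)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))  # 1 sub + 1 del over 6 words
```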

4.
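Yu Zhenli, Cheng Bozhong. Acta Acustica (声学学报), 2000, 25(5): 455-462
A new method is proposed for speech synthesis that targets formant trajectories, based on a speech production model and the RTLA synthesis scheme of an articulatory model. The method implements the speech synthesizer with a reflection-type transmission-line model grounded in articulatory-acoustic principles. The vocal-tract area-function parameters that control the synthesizer are obtained by an inverse solution of speech production that takes three formant trajectories as targets. The method not only yields synthetic speech with good dynamic transitions and naturalness but also allows the voice timbre to be controlled or changed conveniently and flexibly; in addition, the synthesizer needs few input control parameters and a low parameter update rate.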
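A reflection-type transmission-line (Kelly-Lochbaum) synthesizer is parameterized by an area function through the reflection coefficients at the junctions between tube sections. The sketch below shows only that mapping and its inverse, assuming a lossless concatenated-tube model and the volume-velocity sign convention (other texts flip the sign); it does not reproduce RTLA or the paper's inverse solution from formant trajectories.

```python
from typing import List, Sequence


def reflection_coefficients(areas: Sequence[float]) -> List[float]:
    """r_k = (A_{k+1} - A_k) / (A_{k+1} + A_k) at each junction of a lossless tube."""
    if any(a <= 0 for a in areas):
        raise ValueError("areas must be positive")
    return [(a1 - a0) / (a1 + a0) for a0, a1 in zip(areas, areas[1:])]


def areas_from_reflections(first_area: float, reflections: Sequence[float]) -> List[float]:
    """Inverse mapping: A_{k+1} = A_k * (1 + r_k) / (1 - r_k)."""
    areas = [first_area]
    for r in reflections:
        if not -1.0 < r < 1.0:
            raise ValueError("|r| must be < 1")
        areas.append(areas[-1] * (1.0 + r) / (1.0 - r))
    return areas


# Toy 4-section area function (cm^2), not a measured vocal tract
rs = reflection_coefficients([1.0, 2.0, 2.0, 1.0])
print(rs)
print(areas_from_reflections(1.0, rs))
```

Only area ratios matter for the reflection coefficients, which is why the inverse mapping needs one reference area.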

5.
According to Wyke and Kirchner (Wyke B, Kirchner J. Neurology of the larynx. In: Hinchcliffe R, Harrison D, eds. Scientific foundation of otolaryngology. London: William Heinemann Medical Books, 1976:546–66), mechanoreceptors in the subglottal mucosa play a significant role in controlling laryngeal muscle activity in response to changes in subglottal pressure during phonation. In singers, this pressure is adapted not only to phonatory loudness but also to fundamental frequency. In three singers, the subglottal mucosa was anesthetized by spraying Xylocaine solution through a needle inserted into the trachea via the anterior gap between the cricoid and thyroid cartilages. The effects of this anesthesia on subglottal pressure and fundamental frequency were examined. The pressure effects varied between subjects, whereas fundamental frequency accuracy was adversely affected in all three. The implications of these findings are discussed.

6.
An acoustic system for the individual recognition of insects
Research into acoustic recognition systems for insects has focused on species identification rather than individual identification. In this paper, the feasibility of applying pattern recognition techniques to build an acoustic system capable of automatic individual recognition of insects is investigated analytically and experimentally across two species of Orthoptera. Mel-frequency cepstral coefficients serve as the acoustic feature, and α-Gaussian mixture models were selected as the classification models. The proposed acoustic system performs promisingly, with high accuracy. The results suggest that the acoustic feature and classifier method developed here have potential for individual animal recognition and can be applied to other species of interest.
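Individual recognition with mixture models typically scores every frame's feature vector (here, MFCCs) against one model per individual and picks the model with the highest total log-likelihood. The sketch below uses ordinary diagonal-covariance Gaussian mixtures, not the α-GMM variant named above; the model parameters and names are toy values rather than trained ones, and MFCC extraction is assumed to happen elsewhere.

```python
import math
from typing import Dict, List, Sequence, Tuple

Component = Tuple[float, Sequence[float], Sequence[float]]  # (weight, mean, variance)


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(x: Sequence[float], gmm: List[Component]) -> float:
    """log sum_k w_k N(x; mu_k, diag(var_k)), via log-sum-exp for stability."""
    logs = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(logs)
    return top + math.log(sum(math.exp(lp - top) for lp in logs))


def identify_individual(frames: Sequence[Sequence[float]],
                        models: Dict[str, List[Component]]) -> str:
    """Pick the individual whose GMM gives the highest total log-likelihood."""
    scores = {name: sum(gmm_log_likelihood(f, gmm) for f in frames)
              for name, gmm in models.items()}
    return max(scores, key=scores.get)


# Toy 2-D "MFCC" models for two hypothetical individuals
models = {
    "insect_A": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "insect_B": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
print(identify_individual([[0.2, -0.1], [0.9, 1.1]], models))  # insect_A
```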

7.
Effects of noise on speech production: acoustic and perceptual analyses
Acoustical analyses were carried out on a set of utterances produced by two male speakers talking in quiet and in 80, 90, and 100 dB SPL of masking noise. In addition to replicating previous studies demonstrating increases in amplitude, duration, and vocal pitch while talking in noise, these analyses also found reliable differences in the formant frequencies and short-term spectra of vowels. Perceptual experiments were also conducted to assess the intelligibility of utterances produced in quiet and in noise when they were presented for identification at equal S/N ratios. In each experiment, utterances originally produced in noise were found to be more intelligible than utterances produced in quiet. The results of the acoustic analyses showed clear and consistent differences in the acoustic-phonetic characteristics of speech produced in quiet versus noisy environments. Moreover, these acoustic differences produced reliable effects on intelligibility. The findings are discussed in terms of: (1) the nature of the acoustic changes that take place when speakers produce speech under adverse conditions such as noise, psychological stress, or high cognitive load; (2) the role of training and feedback in controlling and modifying a talker's speech to improve the performance of current speech recognizers; and (3) the development of robust algorithms for recognizing speech in noise.
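Presenting utterances at equal S/N ratios means scaling the masker relative to each utterance. A minimal sketch, assuming mean-power SNR computed over the whole utterance (the study's exact level-equalization procedure is not given); the synthetic tone and Gaussian noise are stand-ins for real recordings, and the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def mean_power(x: Sequence[float]) -> float:
    return sum(s * s for s in x) / len(x)


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> Tuple[List[float], float]:
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have equal length")
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)], gain


rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]  # 1 s test tone
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]
mixed, g = mix_at_snr(speech, noise, snr_db=0.0)
```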

8.
Public engagement with science, technology, and engineering is seen as increasingly important because the number of school leavers choosing to read for degrees in these areas is typically dropping. Engagement with pupils during their school years is seen as a key element in influencing their career choices, the seeds of which are sown from the primary years. Acoustics is an excellent vehicle for public engagement, since the demonstrations can be appreciated directly through the sense of hearing and the underlying principles also apply in many branches of physics and engineering. This paper describes a number of demonstrations, based on the principles of acoustics and human speech production, that have been used during science engagement events for schools and the general public. The apparatus used, some of it purpose-built, is described along with the activities themselves. In addition, a way to quantify the success of the process is proposed that involves a single button press on entry to and exit from an event.

9.
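Shao Jian, Zhao Qingwei, Yan Yonghong. Acta Acustica (声学学报), 2010, 35(5): 587-592
This paper studies the choice of modeling units for recognizing spontaneous Mandarin speech. In three-state HMMs, initial/final units and phoneme units are the two most popular modeling units, and each has advantages and disadvantages. On the one hand, the severe phonetic variation of spontaneous speech favors coarse-grained initial/final units, which can subsume the various pronunciation variants; on the other hand, the concern that a three-state topology may not adequately describe complex units favors fine-grained phoneme units. Building on findings from experimental phonetics and on a duration analysis of initials and finals, this paper advocates selectively splitting extended initial/final units and proposes a splitting method based on separating nasal codas. Experimental results show that, compared with extended initial/final units and with phoneme units, the proposed method clearly improves recognition performance, reducing the character error rate by 2.23% and 9.45%, respectively.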
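The nasal-coda separation idea can be illustrated on pinyin spellings. This hypothetical sketch splits a syllable into an initial, a final, and a separate -n/-ng coda unit; unlike the paper, which splits extended initial/final units selectively on the basis of duration analysis, it splits every nasal final, and it works on spelling rather than on a phonetic unit inventory.

```python
from typing import List

# Hypothetical inventory for illustration; the paper's extended unit set is not reproduced.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s")
NASAL_CODAS = ("ng", "n")  # test "ng" before "n"


def split_final(final: str) -> List[str]:
    """Separate a nasal coda from a final, e.g. 'uang' -> ['ua', 'ng']."""
    for coda in NASAL_CODAS:
        if final.endswith(coda) and len(final) > len(coda):
            return [final[: -len(coda)], coda]
    return [final]


def syllable_units(syllable: str) -> List[str]:
    """Initial (if any) followed by the final, with any nasal coda split off."""
    for ini in INITIALS:
        if syllable.startswith(ini) and len(syllable) > len(ini):
            return [ini] + split_final(syllable[len(ini):])
    return split_final(syllable)  # zero-initial syllable


print(syllable_units("zhuang"), syllable_units("ming"), syllable_units("ai"))
```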

10.
11.
This paper proposes a model of the acoustic radiation impedance of the mouth in the form of an equivalent electrical network. Five known models of radiation impedance are compared: a circular piston set in a spherical baffle; a circular piston set in an infinite baffle; the Flanagan model; the Wakita and Fant model; and the Stevens, Kasowski and Fant model. The proposed model most accurately approximates the radiation impedance of a circular piston set in a spherical baffle. In the kr < 2 region, the differences between the acoustic resistance and reactance calculated with the proposed model and those of a piston set in a spherical baffle of 9 cm radius are relatively small: the deviations are within ±0.023 × ρc/Am for the resistance and ±0.008 × ρc/Am for the reactance, where Am denotes the area of the mouth aperture. The accuracy of the proposed model is demonstrated by vowel formant frequency calculations: formant frequencies calculated with the proposed model and with the piston-in-a-spherical-baffle model differ by less than 0.3%.
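For reference, the infinite-baffle piston (one of the five models compared above) has a closed-form radiation impedance: with x = 2ka, Z = (ρc/Am)[1 − 2J1(x)/x + j·2H1(x)/x], where J1 is the Bessel function and H1 the Struve function of order one. The sketch below evaluates the normalized form Z·Am/(ρc) with truncated power series, an assumption adequate for the moderate ka values of interest (kr < 2) but not for large arguments. It implements neither the spherical-baffle model nor the proposed network.

```python
import math


def bessel_j1(x: float, terms: int = 40) -> float:
    """Bessel function J1 via its power series (adequate for moderate x)."""
    return sum((-1) ** m * (x / 2) ** (2 * m + 1)
               / (math.factorial(m) * math.factorial(m + 1)) for m in range(terms))


def struve_h1(x: float, terms: int = 40) -> float:
    """Struve function H1 via its power series (adequate for moderate x)."""
    return sum((-1) ** m * (x / 2) ** (2 * m + 2)
               / (math.gamma(m + 1.5) * math.gamma(m + 2.5)) for m in range(terms))


def piston_infinite_baffle(ka: float) -> complex:
    """Normalized radiation impedance Z * Am / (rho * c) = R1(2ka) + j X1(2ka)."""
    if ka <= 0:
        raise ValueError("ka must be positive")
    x = 2.0 * ka
    return complex(1.0 - 2.0 * bessel_j1(x) / x, 2.0 * struve_h1(x) / x)


k = 2 * math.pi * 1000.0 / 343.0  # wavenumber at 1 kHz, c = 343 m/s
print(piston_infinite_baffle(k * 0.01))  # mouth radius a = 1 cm
```

For a = 1 cm at 1 kHz, ka ≈ 0.18, where the small-ka approximations R1 ≈ (ka)²/2 and X1 ≈ 8ka/(3π) still hold roughly.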

12.
A new method, "simultaneous inverse filtering and model matching" (SIM), is proposed that allows voice source measures to be calculated without any user interaction. It is based on the discrete all-pole modeling (DAP) technique for inverse filtering (IF), modified to include a model of the glottal flow as an integral part [LF model, Fant et al., STL-QPSR (Stockholm) 4/1985, 1-13 (1986)]. Because the correct LF parameters are initially unknown, they are estimated in an iterative procedure using multidimensional optimization techniques that are initialized from the results of an exhaustive search. The error criteria applied reflect how well the IF is performed after the spectral contribution of the glottal flow has been removed. The resulting optimal LF parameter constellation serves as the basis for calculating 11 voice source measures. Performance was evaluated using synthesized signals and recordings of natural utterances. For the synthesized signals, the original parameters were reproduced with high accuracy (correlations exceeding 0.88) for measures in which the starting point of the glottal cycle did not enter explicitly, and errors were smaller than with conventional methods that estimate the measures from the IF signal. The analysis of natural utterances indicates that problems with robustness remain, but that under favorable conditions the open quotient, the speed quotient, the closing quotient, the parabolic spectral parameter, and the negative peak amplitude of the glottal flow derivative can indeed be determined automatically by the SIM method.
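Several of the measures listed above are ratios of LF timing points. A minimal sketch using one common set of definitions (open phase ending at Te, return phase Ta ignored); definitions vary across the literature, the class and function names are hypothetical, and this is not the SIM optimization itself.

```python
from dataclasses import dataclass


@dataclass
class GlottalTiming:
    t0: float  # fundamental period
    tp: float  # instant of maximum glottal flow (LF parameter Tp)
    te: float  # instant of the negative peak of the flow derivative (LF parameter Te)


def timing_measures(g: GlottalTiming) -> dict:
    """Open, speed, and closing quotients from LF-style timing points.

    Uses Te as the end of the open phase, i.e. the return phase Ta is ignored.
    """
    if not 0.0 < g.tp < g.te <= g.t0:
        raise ValueError("require 0 < tp < te <= t0")
    return {
        "open_quotient": g.te / g.t0,
        "speed_quotient": g.tp / (g.te - g.tp),
        "closing_quotient": (g.te - g.tp) / g.t0,
    }


print(timing_measures(GlottalTiming(t0=10.0, tp=4.0, te=6.0)))  # times in ms
```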

13.
Magnetic resonance imaging (MRI) has served as a valuable tool for studying static postures in speech production. Now, recent improvements in temporal resolution are making it possible to examine the dynamics of vocal-tract shaping during fluent speech using MRI. The present study uses spiral k-space acquisitions with a low flip-angle gradient echo pulse sequence on a conventional GE Signa 1.5-T CV/i scanner. This strategy allows for acquisition rates of 8-9 images per second and reconstruction rates of 20-24 images per second, making veridical movies of speech production now possible. Segmental durations, positions, and interarticulator timing can all be quantitatively evaluated. Data show clear real-time movements of the lips, tongue, and velum. Sample movies and data analysis strategies are presented.

14.
On the basis of the two-level density matrix concept and taking the tunneling effect into account, a system of simultaneous differential equations relating all of its components is derived. An equation describing the relaxation of the polarization vector in an electret is obtained, and the relaxation parameter corresponding to the total relaxation time of the polarization vector is calculated microscopically. A system of differential equations accounting for the effect of a strongly oscillating classical electric field on the atoms is proposed. An analytical formula for the thermal current in an electret is obtained, and the behavior of two-level systems in a high-frequency electric field is analyzed.

15.
The finite-difference time-domain method is a simple but powerful numerical method for simulating full-wave acoustic propagation and scattering. However, the method can demand a large amount of computational resources. Traditionally, continuously curved boundaries are represented in a stair-step fashion and thus accurately modeling scattering from a boundary will require a finer discretization than would otherwise be necessary for modeling propagation in a homogeneous medium. However, a fine discretization might not be practical due to limited computational resources. A locally conformal technique is presented here for modeling acoustic scattering from continuously curved rigid boundaries. This technique is low cost, simple to implement, and gives better results for the same grid discretization than the traditional stair-step representation. These improvements can be traded for a coarser discretization which reduces the computational burden. The improved accuracy of this technique is demonstrated for a spherical scatterer.
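For orientation, the basic FDTD update that boundary treatments such as the locally conformal technique build on can be sketched in one dimension: pressure and particle velocity on staggered grids, leapfrogged in time from the linearized momentum and continuity equations. This minimal sketch has only rigid (zero-velocity) end walls and no curved boundaries, so it shows neither stair-stepping nor the conformal correction; the grid size, step count, and Courant number are arbitrary choices.

```python
from typing import List, Tuple


def fdtd_1d(n_cells: int, steps: int, dx: float, c: float = 343.0,
            rho: float = 1.21, courant: float = 0.5,
            source_cell: int = -1) -> Tuple[List[float], List[float]]:
    """Staggered-grid 1-D acoustic FDTD with rigid (zero-velocity) ends.

    Pressure p lives at cell centres, particle velocity u at cell faces.
    """
    if not 0.0 < courant <= 1.0:
        raise ValueError("1-D stability requires 0 < c*dt/dx <= 1")
    dt = courant * dx / c
    kappa = rho * c * c               # bulk modulus
    p = [0.0] * n_cells
    u = [0.0] * (n_cells + 1)         # u[0] and u[n_cells] stay 0: rigid walls
    src = n_cells // 2 if source_cell < 0 else source_cell
    p[src] = 1.0                      # initial pressure impulse
    for _ in range(steps):
        for i in range(1, n_cells):   # momentum: du/dt = -(1/rho) dp/dx
            u[i] -= dt / (rho * dx) * (p[i] - p[i - 1])
        for i in range(n_cells):      # continuity: dp/dt = -kappa du/dx
            p[i] -= dt * kappa / dx * (u[i + 1] - u[i])
    return p, u


p, u = fdtd_1d(n_cells=101, steps=200, dx=0.01)
```

Because the walls are rigid, the velocity differences telescope and the sum of the cell pressures is conserved up to rounding, which makes a convenient sanity check.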

16.
This paper presents an accurate speech detection algorithm for improving the performance of speech recognition systems working in noisy environments. The proposed method is based on a hard-decision clustering approach in which a set of prototypes characterizes the noisy channel. Speech presence is detected by a decision rule formulated in terms of an averaged distance between the observation vector and a cluster-based noise model. The algorithm benefits from contextual information: it considers not only a single speech frame but also a neighborhood of frames, which smooths the decision function and improves the robustness of speech detection. The proposed scheme has a reduced computational cost, making it suitable for real-time applications such as automatic speech recognition systems. An exhaustive analysis is conducted on the AURORA 2 and AURORA 3 databases to assess the performance of the algorithm and compare it with existing standard voice activity detection (VAD) methods. The results show significant improvements in detection accuracy and speech recognition rate over standard VADs such as ITU-T G.729, ETSI GSM AMR, and ETSI AFE for distributed speech recognition, and over a representative set of recently reported VAD algorithms.
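The decision rule described above can be sketched as follows, assuming noise-only training frames are available, plain k-means supplies the hard-decision prototypes, and Euclidean distance is used. The threshold, context size, and toy feature vectors are hypothetical, and the paper's exact distance measure and averaging are not reproduced.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def kmeans(points: Sequence[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Hard-decision clustering of noise-only frames into k prototypes."""
    centroids = [list(pt) for pt in points[:k]]  # naive deterministic initialization
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for pt in points:
            j = min(range(k), key=lambda idx: math.dist(pt, centroids[idx]))
            clusters[j].append(pt)
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def contextual_vad(frames: Sequence[Vector], prototypes: Sequence[Vector],
                   threshold: float, context: int = 2) -> List[int]:
    """1 = speech, 0 = non-speech, from the distance to the nearest noise
    prototype averaged over a window of +/- `context` frames."""
    dist = [min(math.dist(f, c) for c in prototypes) for f in frames]
    out = []
    for t in range(len(frames)):
        lo, hi = max(0, t - context), min(len(frames), t + context + 1)
        out.append(1 if sum(dist[lo:hi]) / (hi - lo) > threshold else 0)
    return out


noise_train = [[0.0, 0.1], [0.1, 0.0], [1.0, 0.1], [0.9, 0.0], [0.0, 0.0], [1.0, 0.0]]
protos = kmeans(noise_train, 2)
frames = [[0.05, 0.05]] * 5 + [[10.0, 10.0]] * 5 + [[0.95, 0.05]] * 5
decisions = contextual_vad(frames, protos, threshold=3.0, context=1)
print(decisions)
```

The averaging window also marks the noise frames immediately adjacent to speech as speech, a smoothing effect similar to a hangover scheme.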

17.
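Estimating the speech presence probability is one of the core techniques of speech enhancement. Traditional estimators are heuristic: they do not place presence-probability estimation within a unified theoretical framework and cannot guarantee an optimal estimate. To address this, a presence-probability estimator based on a sequential hidden Markov model (SHMM) is proposed. In each subband, an SHMM describes the time series of the log power-spectral envelope, treating the envelope sequence as a dynamic first-order Markov chain that switches between a speech state and a noise state. Each state's probability model is a single Gaussian, and the posterior probability of the speech state is taken as the speech presence probability. To meet real-time requirements, SHMM parameter estimation is simplified to a first-order recursive process in which the model parameters are updated frame by frame according to the maximum-likelihood criterion. Experiments show that the temporal correlation captured by the SHMM plays a key role in presence-probability estimation, making the method superior to general heuristic estimators, and that the SHMM algorithm outperforms the classic improved minima-controlled recursive averaging (IMCRA) algorithm in the segmental SNR (SegSNR) and log-spectral distortion (LSD) of the enhanced speech.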
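The core of a two-state (speech/noise) HMM presence estimator is a forward recursion: predict the state probability through the transition matrix, then update it with each state's Gaussian likelihood of the current log power. This minimal single-subband sketch keeps the Gaussian parameters and the transition probability fixed at hypothetical values, whereas the paper updates its parameters frame by frame with a first-order recursive maximum-likelihood scheme, which is omitted here.

```python
import math
from typing import List, Sequence, Tuple


def _log_gauss(x: float, mean: float, var: float) -> float:
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def _sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def speech_presence(log_power: Sequence[float],
                    noise: Tuple[float, float], speech: Tuple[float, float],
                    p_stay: float = 0.9, prior: float = 0.5) -> List[float]:
    """Forward-filtered posterior P(speech | frames so far) for one subband.

    noise/speech are (mean, variance) of the log power in each state;
    p_stay is the probability of remaining in the same state between frames.
    """
    if not 0.0 < p_stay < 1.0:
        raise ValueError("p_stay must lie strictly between 0 and 1")
    p_speech = prior
    posteriors = []
    for x in log_power:
        # prediction through the 2x2 transition matrix
        pred_s = p_stay * p_speech + (1.0 - p_stay) * (1.0 - p_speech)
        # Bayes update in the log-odds domain
        z = (math.log(pred_s) + _log_gauss(x, *speech)
             - math.log(1.0 - pred_s) - _log_gauss(x, *noise))
        p_speech = _sigmoid(z)
        posteriors.append(p_speech)
    return posteriors


probs = speech_presence([0.0] * 5 + [10.0] * 5, noise=(0.0, 1.0), speech=(10.0, 4.0))
print([round(p, 3) for p in probs])
```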

18.
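Research on a Trainable Prosody Model for Chinese Text-to-Speech Systems
Because the prosodic features of Chinese exhibit a hierarchical character when influenced by contextual parameters, this paper describes an artificial neural network with special weighting factors and an output-optimization function, and uses it to build the prosody model of a Chinese TTS system. Extensive tests show that the topology of this network reflects the prosodic characteristics of Chinese better than conventional artificial neural network models. It improves the convergence speed and computational accuracy of the model itself and thereby improves the quality of the overall prosody model. The paper also normalizes the fundamental-frequency (F0) contours of Chinese syllables and analyzes in some detail the role and mode of operation of the syllable F0 normalization parameter SPiS in F0 adjustment. The SPiS parameter reflects the tonal characteristics of Chinese and facilitates both the construction of the network model and the control of Chinese prosody.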
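The definition of SPiS is not given here, so the sketch below shows only a generic form of syllable F0 normalization: each syllable contour is resampled to a fixed number of points and converted to log-F0 z-scores against speaker statistics. The function names and the five-point default are hypothetical, and this is not the SPiS parameter.

```python
import math
from typing import List, Sequence


def resample_contour(contour: Sequence[float], n_points: int) -> List[float]:
    """Linearly interpolate a contour onto n_points equally spaced positions."""
    if n_points < 2 or not contour:
        raise ValueError("need n_points >= 2 and a non-empty contour")
    if len(contour) == 1:
        return [float(contour[0])] * n_points
    out = []
    for k in range(n_points):
        pos = k * (len(contour) - 1) / (n_points - 1)
        i = min(int(pos), len(contour) - 2)
        frac = pos - i
        out.append(contour[i] * (1.0 - frac) + contour[i + 1] * frac)
    return out


def normalize_syllable_f0(contour_hz: Sequence[float], speaker_mean_log: float,
                          speaker_std_log: float, n_points: int = 5) -> List[float]:
    """Fixed-length, speaker-normalized log-F0 contour (z-scores)."""
    return [(math.log(f) - speaker_mean_log) / speaker_std_log
            for f in resample_contour(contour_hz, n_points)]


# A rising (tone-2-like) toy contour in Hz, speaker statistics assumed known
print(normalize_syllable_f0([180.0, 190.0, 210.0, 240.0], math.log(200.0), 0.2))
```

A fixed-length, speaker-independent representation of this kind is what makes contours comparable as network targets.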

19.
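Zhu Ming, Wang Shu, Wang Shutao, Xia Donghai. Acta Physica Sinica (物理学报), 2008, 57(9): 5749-5755
The dependence of composite relaxational sound absorption and sound speed on the concentrations of the gas components and on sound frequency is studied for sound waves propagating through a gas mixture. Taking a mixture of carbon monoxide, water vapor, nitrogen, and oxygen as an example, three-dimensional models of relaxational sound absorption and sound speed versus gas concentration, and a two-dimensional model of relaxational sound absorption versus sound frequency, are established. An algorithm for computing the carbon monoxide concentration from measured relaxational absorption and sound speed is derived, and a simplified algorithm for detecting gas concentration from relaxational absorption and sound speed is proposed. Simulation results not only demonstrate the theoretical feasibility of the algorithm but also give its optimal applicable frequency range; the sources of error in applying the algorithm experimentally are also estimated, demonstrating its practical feasibility. Keywords: acoustic detection of gas concentration; carbon monoxide concentration detection; composite relaxational sound absorption; sound speed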
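Textbook single-relaxation forms give the shape of the absorption and dispersion curves that such models combine: absorption per wavelength peaks at the relaxation frequency, and the sound speed rises from its low-frequency value to its high-frequency value around it. The sketch assumes the same relaxation frequency for both curves (a good approximation only for weak dispersion) and treats the composite spectrum as a plain sum of independent processes, which simplifies the coupled CO/H2O/N2/O2 relaxation studied here; relaxation frequencies and peak values would have to come from a molecular model and are hypothetical inputs.

```python
import math
from typing import Sequence, Tuple


def single_relaxation_absorption(f: float, f_relax: float, peak: float) -> float:
    """Absorption per wavelength (alpha * lambda) of one relaxation process.

    Peaks at f = f_relax with value `peak`.
    """
    r = f / f_relax
    return 2.0 * peak * r / (1.0 + r * r)


def composite_absorption(f: float, processes: Sequence[Tuple[float, float]]) -> float:
    """Sum of independent single-process contributions, given (f_relax, peak) pairs."""
    return sum(single_relaxation_absorption(f, fr, pk) for fr, pk in processes)


def single_relaxation_speed(f: float, f_relax: float, c0: float, c_inf: float) -> float:
    """Sound speed rising from c0 (low frequency) to c_inf (high frequency)."""
    r2 = (f / f_relax) ** 2
    return math.sqrt(c0 ** 2 + (c_inf ** 2 - c0 ** 2) * r2 / (1.0 + r2))


for f in (1e2, 1e3, 1e4):
    print(f, single_relaxation_absorption(f, 1e3, 0.01), single_relaxation_speed(f, 1e3, 340.0, 341.0))
```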

20.
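To address the non-conservation of energy in two-way sound fields caused by the approximate treatment of boundary conditions at non-horizontal interfaces, a two-way coupled normal-mode model that satisfies energy conservation is proposed. In building the model, the boundary conditions at non-horizontal interfaces are treated rigorously, and coupled differential equations suited to numerical computation are derived from the equation of motion and the continuity equation of the medium. The accuracy of the model is then examined using the energy-conservation equation and numerical calculations for different seabed parameters. Theoretical analysis and simulation results show that rigorous treatment of the boundary conditions gives rise to boundary-correction terms, and that introducing these terms makes the coupling coefficients symmetric and the computed sound field energy-conserving. The model can therefore fully account for the effect of horizontal (range-dependent) boundary variation on the sound field and accurately compute two-way fields in non-horizontally stratified waveguides.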
