Similar literature
Found 20 similar documents; search took 921 ms
1.
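Lecture 4: The current state and prospects of speech signal processing   Cited by: 1 (self-citations: 0, citations by others: 1)
李昌立, 《物理》 (Physics), 2005, 34(4): 300-306
This article briefly reviews how speech signal processing took shape and developed as a branch discipline, and points out its place and role in modern information science and technology. It introduces several important applied topics in speech signal processing, such as low-bit-rate speech coding, rule-based speech synthesis and text-to-speech conversion systems, speech recognition, and human-machine spoken dialogue, all of which remain active research areas. The article closes with an outlook on the field's development, noting that many difficult problems still await study.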

2.
A novel method based on a statistical model for the fundamental-frequency (F0) synthesis in Mandarin text-to-speech is proposed. Specifically, a statistical model is employed to determine the relationship between F0 contour patterns of syllables and linguistic features representing the context. Parameters of the model were empirically estimated from a large training set of sentential utterances. Phonologic rules are then automatically deduced through the training process and implicitly memorized in the model. In the synthesis process, contextual features are extracted from a given input text, and the best estimates of F0 contour patterns of syllables are then found by a Viterbi algorithm using the well-trained model. This method can be regarded as employing a stochastic grammar to reduce the number of candidates of F0 contour pattern at each decision point of synthesis. Although linguistic features on various levels of input text can be incorporated into the model, only some relevant contextual features extracted from neighboring syllables were used in this study. Performance of this method was examined by simulation using a database composed of nine repetitions of 112 declarative sentential utterances of the same text, all spoken by a single speaker. By closely examining the well-trained model, some evidence was found to show that the declination effect as well as several sandhi rules are implicitly contained in the model. Experimental results show that 77.56% of synthesized F0 contours coincide with the VQ-quantized counterpart of the original natural speech. Naturalness of the synthesized speech was confirmed by an informal listening test.
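
The abstract above relies on a Viterbi search to pick the best sequence of F0 contour patterns. The following is only a minimal sketch of a generic Viterbi decoder, assuming a toy hidden-Markov-style model in which hidden states stand in for contour patterns and observations stand in for contextual features. The function name `viterbi`, the table layout (`init`, `trans`, `emit`), and the numbers in the usage note are illustrative assumptions, not the authors' model.

```python
import math


def viterbi(obs, init, trans, emit):
    """Return the most probable hidden-state sequence for a non-empty `obs`.

    init[s]     : probability of starting in state s
    trans[r][s] : probability of moving from state r to state s
    emit[s][o]  : probability that state s produces observation o
    (All three tables are hypothetical inputs chosen for illustration.)
    """
    n_states = len(init)

    def log(p):
        # Work in log space so long sequences do not underflow.
        return math.log(p) if p > 0 else float("-inf")

    # Best log score of any path ending in each state after the first observation.
    score = [log(init[s]) + log(emit[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1

    for o in obs[1:]:
        prev = score
        pointers, score = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log(trans[r][s]))
            pointers.append(best)
            score.append(prev[best] + log(trans[best][s]) + log(emit[s][o]))
        back.append(pointers)

    # Trace the best final state back to the start.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With two states that each strongly prefer their own observation symbol and tend to persist (for example `init = [0.5, 0.5]`, `trans = [[0.9, 0.1], [0.1, 0.9]]`, `emit = [[0.9, 0.1], [0.1, 0.9]]`), calling `viterbi([0, 0, 1, 1], init, trans, emit)` returns `[0, 0, 1, 1]`. Dynamic programming keeps the search linear in the number of syllables rather than exponential, which is presumably why it suits the per-syllable decision points the abstract mentions.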

3.
The well-known approximate separation of low molecular vibrational frequencies from high frequencies has in the past been combined with the Redlich-Teller isotopic product rule to split the latter into separate rules for the high and for the low frequencies (reduced isotopic product rules). A number of variants and properties of these rules are discussed, along with some modifications applicable when redundant coordinates are involved. A discussion is given of the use of nonbonded distances instead of the related bond angles, and the employment of computer programs which calculate normal frequencies from postulated force constants. A comparison is made of the relative advantages of using these reduced product rules for testing assignments of observed vibration frequencies as opposed to the use of frequencies or frequency product ratios calculated from transferred force constants. Finally, benzene, ethylene, methyl isocyanide, and cyclohexane are worked out as examples, and some suggestions are made of possible remaining uncertainties in the assigned fundamental frequencies.

4.
This paper reports the results of an investigation that employed the modified rhyme test (MRT) to measure the segmental intelligibility of synthetic speech generated automatically by rule. Synthetic speech produced by ten text-to-speech systems was studied and compared to natural speech. A variation of the standard MRT was also used to study the effects of response set size on perceptual confusions. Results indicated that the segmental intelligibility scores formed a continuum. Several systems displayed very high levels of performance that were close to or equal to scores obtained with natural speech; other systems displayed substantially worse performance compared to natural speech. The overall performance of the best system, DECtalk--Paul, was equivalent to the data obtained with natural speech for consonants in syllable-initial position. The findings from this study are discussed in terms of the use of a set of standardized procedures for measuring intelligibility of synthetic speech under controlled laboratory conditions. Recent work investigating the perception of synthetic speech under more severe conditions in which greater demands are made on the listener's processing resources is also considered. The wide range of intelligibility scores obtained in the present study demonstrates important differences in perception and suggests that not all synthetic speech is perceptually equivalent to the listener.

5.
In the past 10 years, a Chinese text-to-speech system including a phonetic library, a static tone model, and basic synthesis rules has been established at IAAS. Chinese synthesis with an unrestricted vocabulary has been achieved, but further steps must be taken to improve the naturalness of synthesized Chinese. The effects of segmental and suprasegmental features of synthetic speech on naturalness have been studied using a subjective assessment method. The results show that time-domain rhythm and coarticulation are fundamental to improving the naturalness of synthetic speech, and that the fundamental frequency curve determined by the tone model alone is suitable only for synthesizing short Chinese sentences. If the synthesis of linguistic units larger than a simple sentence is considered, the fundamental frequency curve should be carefully manipulated. This paper presents the experimental method and results, and discusses how to improve the naturalness of synthetic Chinese.

6.
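拉巴顿珠, 珠杰, 欧珠, 尼玛, 《应用声学》 (Applied Acoustics), 2023, 42(2): 324-332
In recent years, thanks to growing computing power and the steady accumulation of speech data, many new machine-learning-based speech processing techniques have emerged. Among them, the end-to-end Tacotron2 speech synthesis framework, based on deep neural networks, has been widely adopted in industry. It is open source, simple to use, and has been applied successfully to speech synthesis in many languages and voice timbres. This paper studies the application of Tacotron2 to Tibetan and reports good experimental results. First, a medium-sized (5,500-sentence) speech corpus of the Ü-Tsang dialect of Tibetan was built through natural speech recording, automatic annotation, and acoustic analysis; it includes Tibetan phoneme transcriptions, special-symbol handling, Mel spectra, and other data. Second, Tibetan speech synthesis experiments were carried out with the open-source Tacotron2 program and this corpus. Finally, an analysis of the deviation between synthesized and natural speech, together with a subjective evaluation of the naturalness of the synthesized speech, shows that the end-to-end approach to Tibetan speech synthesis effectively reduces spectral degradation in the synthesized speech and improves its naturalness. The end-to-end Tacotron2 framework therefore has significant application value for Tibetan speech synthesis and merits further study and wider use.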

7.
On the basis of the algebraic version of the resonating-group method (RGM) and within the framework of the discrete representation in the Fock-Bargmann space, a microscopic theory of nuclear reactions with due regard for a coexistence of different cluster configurations in a compound nucleus is realized. Fundamental tenets of the algebraic version of the RGM are stated both for a single binary cluster configuration and for a compound system, where several cluster configurations coexist. Several examples of norm kernels, their eigenvalues, phase shifts, and effective cross sections are given for a number of binary cluster systems. The text was submitted by the authors in English.

8.
A technique to synthesize laughter based on time-domain behavior of real instances of human laughter is presented. In the speech synthesis community, interest in improving the expressive quality of synthetic speech has grown considerably. While the focus has been on the linguistic aspects, such as precise control of speech intonation to achieve desired expressiveness, inclusion of nonlinguistic cues could further enhance the expressive quality of synthetic speech. Laughter is one such cue used for communicating, say, a happy or amusing context. It can be generated in many varieties and qualities: from a short exhalation to a long full-blown episode. Laughter is modeled at two levels, the overall episode level and the local call level. At the episode level, a parametric model is presented that attempts to capture the overall temporal behavior, based on the equations that govern the simple harmonic motion of a mass-spring system. By changing a set of easily available parameters, the authors are able to synthesize a variety of laughter. At the call level, the authors relied on a standard linear prediction based analysis-synthesis model. Results of subjective tests to assess the acceptability and naturalness of the synthetic laughter relative to real human laughter samples are presented.

9.
A large number of single-channel noise-reduction algorithms have been proposed based largely on mathematical principles. Most of these algorithms, however, have been evaluated with English speech. Given the different perceptual cues used by native listeners of different languages including tonal languages, it is of interest to examine whether there are any language effects when the same noise-reduction algorithm is used to process noisy speech in different languages. A comparative evaluation and investigation is undertaken in this study of various single-channel noise-reduction algorithms applied to noisy speech taken from three languages: Chinese, Japanese, and English. Clean speech signals (Chinese words and Japanese words) were first corrupted by three types of noise at two signal-to-noise ratios and then processed by five single-channel noise-reduction algorithms. The processed signals were finally presented to normal-hearing listeners for recognition. Intelligibility evaluation showed that the majority of noise-reduction algorithms did not improve speech intelligibility. Consistent with a previous study with the English language, the Wiener filtering algorithm produced small, but statistically significant, improvements in intelligibility for car and white noise conditions. Significant differences between the performances of noise-reduction algorithms across the three languages were observed.

10.
Printed English is highly redundant as demonstrated by readers' facility at guessing which letter comes next in text. However, such findings have been generalized to perception of connected speech without any direct assessment of phonemic redundancy. Here, participants guessed which phoneme or printed character came next throughout each of four unrelated sentences. Phonemes displayed significantly lower redundancy than letters, and possible contributing factors (task difficulty, experience, context) are discussed. Of three models tested, phonemic guessing was best approximated by word-initial and transitional probabilities between phonemes. Implications for information-theoretic accounts of speech perception are considered.
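
The transitional probabilities mentioned in the abstract above can be illustrated with a small sketch. It assumes the simplest possible estimator, relative bigram frequency over symbol sequences, and covers only the transitional part (word-initial probabilities are left out). The function name `transitional_probabilities` and the toy phoneme strings in the usage note are hypothetical and do not come from the study.

```python
from collections import Counter, defaultdict


def transitional_probabilities(sequences):
    """Estimate P(next symbol | current symbol) from bigram counts.

    `sequences` is an iterable of symbol lists (for example, one list of
    phoneme labels per word). Returns a nested dict: probs[current][next].
    """
    pair_counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent (current, next) pair within a sequence.
        for current, nxt in zip(seq, seq[1:]):
            pair_counts[current][nxt] += 1

    # Normalize each row of counts so it sums to 1.
    return {
        current: {
            nxt: n / sum(followers.values()) for nxt, n in followers.items()
        }
        for current, followers in pair_counts.items()
    }
```

For the toy input `[["k", "ae", "t"], ["k", "ae", "b"], ["k", "i", "t"]]`, the estimate of P("ae" | "k") is 2/3 and P("t" | "ae") is 1/2. A guesser that always picks the most probable follower would choose "ae" after "k", which is the kind of prediction the abstract compares against human guessing.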

11.
A systematic study of nontrivial cubic extensions of the Poincaré algebra in four dimensions is undertaken. Explicit examples are given with various techniques (Young tableau, characters, etc.). The text was submitted by the author in English.

12.
A recently introduced set of N-dimensional quasi-maximally superintegrable Hamiltonian systems describing geodesic motions that can be used to generate “dynamically” a large family of curved spaces is revisited. From an algebraic viewpoint, such spaces are obtained through kinetic energy Hamiltonians defined on either the sl(2) Poisson coalgebra or a quantum deformation of it. Certain potentials on these spaces and endowed with the same underlying coalgebra symmetry have also been introduced in such a way that the superintegrability properties of the full system are preserved. Several new N = 2 examples of this construction are explicitly given, and specific Hamiltonians leading to spaces of nonconstant curvature are emphasized. The text was submitted by the authors in English.

13.
Edge-detection algorithm based on DCT continuous extension technique   Cited by: 1 (self-citations: 0, citations by others: 1)
A new computational approach to the edge-detection problem, based on the continuous extension of discrete cosine transform (CEDCT) technique, is proposed. This technique has some attractive properties, and other things being equal, it gives more precise results than the usual discrete Fourier or discrete cosine transforms, especially at the intermediate points. That is why this technique allows one to estimate numerically a finite number of derivatives of a discrete set of multidimensional points, using some specified properties of CEDCT. Because it uses the spectrum of a given set of points, this approach is applicable to a wide area of signal- and image-processing problems. The results obtained by the proposed approach are compared with the well-known and widely used Canny algorithm. Some 1D and 2D numerical examples are given. The text was submitted by the authors in English.

14.
We consider the exactly solvable model of interaction of zero-duration electromagnetic pulses with an atom. The model has a number of peculiar properties which are outlined in the cases of a single pulse and two opposite pulses. In perspective, it can be useful in different fields of physics involving interaction of attosecond laser pulses with quantum systems. The text was submitted by the authors in English.

15.
During the last decade, the field of quantum computation has attracted a lot of interest and motivated many theoretical and experimental studies of n-qubit quantum systems. But apart from the promise of more efficient quantum algorithms, these investigations also revealed a number of obstacles which still have to be overcome in practice. In this context, the use of simulation programs has proved to be an appropriate method. In order to facilitate the simulation of n-qubit quantum systems, we present the Feynman software program to provide the necessary tools to define and to deal with quantum registers as well as the operators acting on them. Using an interactive design within the framework of the computer algebra system Maple, we hope that the Feynman software program will be useful not only for teaching the basic elements of quantum computing but also for studying their physical realization in the future. The text was submitted by the authors in English.

16.
I. Introduction. Researches on Chinese synthesis disclose that only when both the segmental and suprasegmental features of the synthetic speech are similar to those of the natural one, the synthetic speech will sound intelligible and natural [1]. Among existing synthetic techniques, the approach based on acoustic parameters can adjust both the segmental and suprasegmental features of synthetic units flexibly and can be considered as the most reasonable synthetic technique in theory. However, the parameter based synthesizer is over-dependent on the developments of paramet…

17.
Recently a method has been developed by Jen to enumerate limit cycles in cellular automata (CA) with periodic boundary conditions. This involves operations on a connectivity matrix whose elements are related to the invariance of a site in a particular neighborhood to application of the CA rule. We extend this method to the case of fixed boundary conditions, of interest in simulations. In this case, translational invariance is lost, and the enumeration procedure is much more tedious than with periodic boundary conditions. We show examples for a fixed-point, a period-two, and a period-three enumeration in considerable detail, and give results, in agreement with simulations, for the number of fixed points and period-two cycles in selected two-state, nearest-neighbor CA rules.
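
The kind of simulation check the abstract above refers to can be sketched by brute force. This is not Jen's connectivity-matrix method or the authors' extension of it; it is only an exhaustive enumeration, assuming Wolfram's standard numbering for two-state, nearest-neighbor rules and boundary cells held at a constant value. The names `step` and `count_periodic_states` are illustrative.

```python
from itertools import product


def step(state, rule, left=0, right=0):
    """Apply a two-state, nearest-neighbor CA rule once.

    `rule` is the rule number in Wolfram's convention (0..255). The cells
    just outside both ends are held fixed at `left` and `right`.
    """
    padded = (left,) + tuple(state) + (right,)
    return tuple(
        # The neighborhood (l, c, r) selects bit 4*l + 2*c + r of the rule number.
        (rule >> (4 * padded[i - 1] + 2 * padded[i] + padded[i + 1])) & 1
        for i in range(1, len(padded) - 1)
    )


def count_periodic_states(rule, n_cells, period=1):
    """Count states whose first return to themselves takes exactly `period` steps.

    period=1 counts fixed points. For period p, the number of distinct
    cycles is this count divided by p. Cost grows as 2**n_cells.
    """
    total = 0
    for state in product((0, 1), repeat=n_cells):
        current = state
        for t in range(1, period + 1):
            current = step(current, rule)
            if current == state:
                break  # t is now the first return time
        if current == state and t == period:
            total += 1
    return total
```

As a sanity check, rule 204 copies every cell unchanged, so `count_periodic_states(204, 4)` returns 16 (every state is a fixed point), while rule 51 flips every cell, so it has no fixed points and all 16 four-cell states lie on period-two cycles. Exhaustive search is only practical for small lattices, which is the motivation for an analytic enumeration like the one described in the abstract.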

18.
A procedure for taking the irreducible representations of subperiodic rod groups from tables of irreducible representations of three-periodical space groups is derived. Examples demonstrating the use of this procedure and derivation of selection rules for direct and phonon assisted electrical dipole transitions are presented. The text was submitted by the authors in English.

19.
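Using psychostatistical methods to analyze a medium-sized corpus, this work explores the relationships among syntax, prosody, and their acoustic correlates. Based on the regularities of normal stress distribution in spoken Chinese, it studies the rules governing normal stress distribution in Mandarin (Putonghua) and the order in which they apply in actual utterances, and finally establishes a system of normal-stress distribution rules suitable for Chinese text-to-speech systems.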

20.
The approach is based on a paradigm of self-organized criticality proposed for experimental investigation and theoretical modeling of software evolution. The dynamics of modifications is studied for three free, open source programs, Mozilla, Free-BSD, and Emacs, using the data from version control systems. Scaling laws typical of self-organized criticality are found. A model of software evolution embodying the natural selection principle is proposed. The results of numerical and analytical investigation of the model are presented. They are in good agreement with the data collected for the real-world software. The text was submitted by the authors in English.
