Similar literature
 20 similar documents found
1.
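A study of effective parameters in speaker identification   Cited by: 2 (self-citations: 0, by others: 2)
Speaker identification is an important application of speech recognition. In the system we studied, not all LPC parameters are equally effective. We applied a variance-ratio test to 63 parameters in total: 12 prediction coefficients, 12 partial correlation coefficients, 12 log area ratio coefficients, 12 cepstral coefficients, 12 correlation coefficients, short-time energy, short-time average zero-crossing rate, and pitch. The test material was 97 speech samples of three vowels spoken by 10 young men, collected over half a year. The 15 parameters with the largest variance ratios were selected as recognition parameters, giving a recognition rate of 89.19%; with a sample-refreshing technique, the recognition rate reached 97.3%.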
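The variance-ratio selection described in this abstract can be sketched as follows. This is only a minimal illustration under an assumed definition (variance of the speaker means divided by the mean within-speaker variance); the abstract does not give the exact statistic used, and the function names and data layout are hypothetical.

```python
from statistics import mean, pvariance


def variance_ratio(samples_by_speaker):
    """Assumed variance ratio of one parameter: variance of the
    per-speaker means divided by the mean within-speaker variance.
    A within-speaker variance of zero is not handled here."""
    speaker_means = [mean(values) for values in samples_by_speaker]
    between = pvariance(speaker_means)
    within = mean([pvariance(values) for values in samples_by_speaker])
    return between / within


def select_parameters(data, keep):
    """data maps a parameter name to a list of per-speaker sample lists.
    Returns the `keep` names with the largest variance ratio."""
    ranked = sorted(data, key=lambda name: variance_ratio(data[name]), reverse=True)
    return ranked[:keep]
```

A parameter whose mean differs strongly between speakers but varies little within each speaker gets a large ratio, which is why such parameters would be favoured for identification.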

2.
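张全 (Zhang Quan). 《应用声学》 (Applied Acoustics), 2002, 21(1): 35-39
This paper covers phonetic research in speech acoustics, text-to-speech conversion, speech recognition, and natural language processing. It briefly reviews the progress and development directions for Chinese in the first three areas, and focuses on the main content and progress of the Hierarchical Network of Concepts (HNC) theory, a new theory aimed at natural language understanding as a whole, in an attempt to outline HNC theory at the theoretical level.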

3.
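A comparative study of fundamental frequency patterns in Chinese and foreign languages   Cited by: 3 (self-citations: 1, by others: 2)
Voice fundamental frequency was extracted accurately with a Laryngograph (a glottal electrical impedance instrument), and the fundamental frequency patterns of Chinese were compared with those of English, German, and Japanese at two levels: macroscopic (discourse) and microscopic (pitch period). The speakers were 14 Chinese and 14 foreigners. The results show that: 1. the 90% pitch range does not differ greatly between Chinese and the foreign languages; 2. the positive and negative jitter of fundamental frequency in continuous speech is about the same across languages and across male and female speakers; 3. in continuous speech the negative jitter factor is larger than the positive jitter factor; 4. speaking rate, in syllables per second, is lower for the Chinese speakers than for the foreign speakers.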

4.
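This paper discusses the characteristics and recent research status of reversed-shear configurations. Based on a trade-off between MHD stability and the bootstrap current fraction, the relationship between the two is derived and an operating-space diagram for advanced tokamaks is obtained. A reasonable operating range for the FEB reversed-shear configuration (FEB-RS) and the corresponding temperature and density profiles are found, and the core physics parameters of FEB-RS are calculated. The study shows that, for the same size, core plasma performance is greatly improved.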

5.
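Improvement of the BQ-mode coding algorithm for MPEG-4 still texture   Cited by: 1 (self-citations: 0, by others: 1)
张亚妮 (Zhang Yani), 苗润才 (Miao Runcai). 《光子学报》 (Acta Photonica Sinica), 2005, 34(10): 1593-1596
Following the decoding process defined in the MPEG-4 still texture coding tool, a standard-compatible still texture encoding and decoding framework was built. Then, starting from the information content of the zerotree symbols and their consistency with the target image of each coding layer, the improvement of bit-plane mode coding in the MPEG-4 VTC tool was studied. A predictive embedded zerotree wavelet coding method based on symbol decomposition (SP-PEZW) is proposed, and compression and decoding experiments were carried out with SP-PEZW on the color reference image Lena. The experiments show that SP-PEZW compresses more efficiently than PEZW at low-bit-rate coding layers, has no large effect on the compression ratio at high-resolution coding layers, and even slightly improves image quality when high-resolution spatial layers are decoded at a specified bit rate.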

6.
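Coarticulation in Mandarin zero-initial syllables with nasal codas   Cited by: 1 (self-citations: 0, by others: 1)
Mandarin has two nasal codas, /n/ and /ŋ/. Zero-initial syllables with a nasal coda are written (V)VN; there are 8 types of (V)Vn and 8 types of (V)Vŋ. The experimental material consisted of all (V)VN syllables, with tones, read by 15 male speakers. The experiment shows that the tongue position at the end of the vowel portion of (V)VN is subject not only to anticipatory coarticulation from the different places of articulation of /n/ and /ŋ/, but also to carryover coarticulation from the different main vowels in (V)V. In Mandarin, as in English and other languages, the formant transition of the vowel preceding the nasal consonant is the most important cue for distinguishing /n/ from /ŋ/. Based on the formant frequencies at the end point and/or starting point of (V)V in (V)VN, the (V)V can to some extent be distinguished from monophthongs or diphthongs with similar formant patterns. The experiment also shows that the durations of both (V)V and N are affected by tone. (V)V is longest in the third tone (shangsheng), next longest in the second tone (yangping), and shortest in the first tone (yinping) and fourth tone (qusheng). The durations of /n/ and /ŋ/ are roughly the same in the first, second, and third tones, and all are longer than in the fourth tone. (V)VN is longest in the third tone and shortest in the fourth, and longer in the second tone than in the first. Finally, nasal coda duration and the (V)V/N duration ratio are related to whether the preceding main vowel is low or non-low.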

7.
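M dwarfs are the most common stars in the Milky Way, and their motion provides clues to the evolution of the Galaxy; radial velocity (RV) is one of the key parameters describing that motion. The LAMOST survey, one of China's major science projects, has obtained spectra of several hundred thousand M-type stars, and measuring their radial velocities requires automatic, efficient software. The usual method is to cross-correlate the observed spectrum with a template spectrum. In practice, however, because of intrinsic differences or noise, some observed spectra are matched to the wrong template, which biases the measured radial velocity. To reduce the influence of noise and similar factors, for spectra with a relatively high signal-to-noise ratio but strong local noise, a method combining statistical and empirical features is used to select the effective feature segments of the spectrum and to avoid noise-contaminated bands when computing the radial velocity of M-type stars. The method was applied to a subset of observed spectra from the LAMOST DR3 M-type star catalog, and the results were compared with the corresponding radial velocities in the APOGEE catalog. The comparison shows that the method effectively reduces the effect of local noise and improves the accuracy of the radial velocity measurement.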
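The cross-correlation step mentioned in this abstract can be sketched roughly as below. This is not the authors' pipeline: it assumes both spectra are already resampled onto the same grid, uniform in ln(wavelength), searches integer pixel lags only, and omits the feature-segment selection the abstract describes. All names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def best_lag(observed, template, max_lag):
    """Return the integer pixel lag that maximizes the cross-correlation
    of two mean-subtracted spectra of equal length."""
    n = len(observed)
    obs_mean = sum(observed) / n
    tpl_mean = sum(template) / n
    obs = [x - obs_mean for x in observed]
    tpl = [x - tpl_mean for x in template]

    def score(lag):
        # Only pixels where the shifted template overlaps the observation.
        return sum(obs[i] * tpl[i - lag] for i in range(max(0, lag), min(n, n + lag)))

    return max(range(-max_lag, max_lag + 1), key=score)


def radial_velocity(lag, dlnlam):
    """Convert a lag on a grid with step dlnlam in ln(wavelength)
    to a velocity in km/s (positive means redshift)."""
    return C_KM_S * (math.exp(lag * dlnlam) - 1.0)
```

Restricting the calculation to a clean band would, in this sketch, amount to slicing both lists to the chosen pixel range before calling `best_lag`.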

8.
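With its low cost, high precision, and wide coverage, Global Navigation Satellite System (GNSS) time transfer is widely used in high-precision time and frequency applications. Traditional satellite common-view relies on Common GNSS Generic Time Transfer Standard (CGGTTS) common-view files to achieve high-precision time transfer in post-processing, which makes real-time time transfer difficult. To meet the demand for real-time, high-precision time transfer in digital converter stations, the power Internet of Things, mobile communications, and similar applications, a real-time satellite common-view technique based on pseudorange observations from the BeiDou-3 global navigation satellite system (BDS-3) was studied. BeiDou real-time common-view time transfer experiments were carried out over a short baseline and over the long Xi'an-Sanya baseline to evaluate its performance. The results show that the precision of BeiDou real-time common-view time transfer is better than 1 ns, so it can provide nanosecond-level time synchronization and nanosecond-level time traceability for time-frequency systems, digital converter stations, and other applications.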
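The common-view principle behind this abstract can be illustrated with a small sketch: each station measures its local clock against GNSS time through the same satellite, and differencing the two measurements cancels the satellite clock. The data layout and function name below are assumptions for illustration; real processing would also apply the delay corrections that this sketch leaves out.

```python
def common_view_difference(station_a, station_b):
    """Each argument maps a satellite ID to (local clock minus GNSS time)
    in nanoseconds for one epoch. Returns the mean clock difference
    A minus B over the satellites seen by both stations, or None when
    the two stations share no satellite."""
    common = sorted(set(station_a) & set(station_b))
    if not common:
        return None
    diffs = [station_a[sat] - station_b[sat] for sat in common]
    return sum(diffs) / len(diffs)
```

For example, with hypothetical readings `{"C20": 15.0, "C21": 17.0}` at station A and `{"C20": 5.0, "C21": 6.0}` at station B, the per-satellite differences are 10.0 ns and 11.0 ns, and their mean is the estimated offset between the two clocks.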

9.
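Using an intuitive analytical method, and starting from the visual appearance of a thin rod moving at high speed, this paper explains the distortion of the visual appearance of moving objects (such as cubes and spheres) in the relativistic sense. The results show that the appearance of a fast-moving object generally does not simply undergo an "apparent rotation" but deforms in different ways depending on the specific conditions; under special conditions, the Lorentz contraction is also visible.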

10.
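狄慧鸽 (Di Huige), 钱芸生 (Qian Yunsheng), 赵爽 (Zhao Shuang). 《光学学报》 (Acta Optica Sinica), 2005, 25(10): 357-1360
The maximum operating range of a low-light-level night vision device is its main technical specification and the main basis for system design. Building on earlier research, the viewing-range formulas for low-light-level night vision systems are systematically summarized. To overcome the susceptibility of such systems to external conditions, a laser-assisted viewing system is generally added. Once it is added, the type of light source changes and the viewing-range formula changes with it: the most important change is in the spectral matching coefficient, and illuminance, contrast, reflectance, and other factors change as well. The change in each factor of the viewing-range formula after the laser is added is analyzed in detail, providing a theoretical reference for choosing a suitable laser type. The viewing range of a helmet-mounted low-light-level night driving device (whose photocathode is a super second-generation tube) under laser assistance is also estimated, which confirms that the chosen laser type is reasonable.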

11.
Speech waveform envelope cues for consonant recognition   Cited by: 4 (self-citations: 0, by others: 4)
This study investigated the cues for consonant recognition that are available in the time-intensity envelope of speech. Twelve normal-hearing subjects listened to three sets of spectrally identical noise stimuli created by multiplying noise with the speech envelopes of 19 /aCa/ natural-speech nonsense syllables. The speech envelope for each of the three noise conditions was derived using a different low-pass filter cutoff (20, 200, and 2000 Hz). Average consonant identification performance was above chance for the three noise conditions and improved significantly with the increase in envelope bandwidth from 20 to 200 Hz. SINDSCAL multidimensional scaling analysis of the consonant confusions data identified three speech envelope features that divided the 19 consonants into four envelope feature groups ("envemes"). The enveme groups in combination with visually distinctive speech feature groupings ("visemes") can distinguish most of the 19 consonants. These results suggest that near-perfect consonant identification performance could be attained by subjects who receive only enveme and viseme information and no spectral information.

12.
Although many audio-visual speech experiments have focused on situations where the presence of an incongruent visual speech signal influences the perceived utterance heard by an observer, there are also documented examples of a related effect in which the presence of an incongruent audio speech signal influences the perceived utterance seen by an observer. This study examined the effects that different distracting audio signals had on performance in a color and number keyword speechreading task. When the distracting sound was noise, time-reversed speech, or continuous speech, it had no effect on speechreading. However, when the distracting audio signal consisted of speech that started at the same time as the visual stimulus, speechreading performance was substantially degraded. This degradation did not depend on the semantic similarity between the target and masker speech, but it was substantially reduced when the onset of the audio speech was shifted relative to that of the visual stimulus. Overall, these results suggest that visual speech perception is impaired by the presence of a simultaneous mismatched audio speech signal, but that other types of audio distracters have little effect on speechreading performance.

13.
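As an image compression coding standard suited to a wide range of multimedia applications, MPEG-4 has been widely adopted. This paper introduces moving-picture coding with I, P, and B frames and discusses several factors that affect the quality of the reconstructed image: the discrete cosine transform (DCT), the quantization parameter (QP), the search range (SR) used in motion estimation, and half-pixel searching. The influence of each factor is compared using extensive experimental data on compression ratio, peak signal-to-noise ratio, and encoding and decoding time. The experiments show that the quantization parameter has the greatest effect on reconstructed image quality, while the other factors also have some effect.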
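Two of the metrics named in this abstract, peak signal-to-noise ratio and compression ratio, have standard definitions that can be sketched briefly. The function names are hypothetical, images are assumed to be flat lists of 8-bit sample values, and nothing here reflects the paper's actual test setup.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length
    sequences of sample values: 10 * log10(peak^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(peak ** 2 / mse)


def compression_ratio(original_bytes, compressed_bytes):
    """Size of the uncompressed data divided by the size of the coded data."""
    return original_bytes / compressed_bytes
```

A coarser quantization parameter typically raises the compression ratio while lowering PSNR, which is the kind of trade-off the comparison in the abstract is measuring.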

14.
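Effects of single-channel speech enhancement algorithms on Mandarin speech intelligibility   Cited by: 1 (self-citations: 0, by others: 1)
This study examined how several commonly used single-channel speech enhancement algorithms affect the intelligibility of Mandarin speech. Speech corrupted by different types of noise was processed by five single-channel enhancement algorithms and played to normal-hearing listeners for identification, in order to assess the intelligibility of the enhanced speech. The results show that the enhancement algorithms do not improve intelligibility. Analysis of the specific errors shows that identification errors come mainly from phoneme errors and have little to do with tone. Moreover, compared with identification results for English, some enhancement algorithms affect Chinese and English intelligibility in significantly different ways.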

15.
Both dyslexics and auditory neuropathy (AN) subjects show inferior consonant-vowel (CV) perception in noise, relative to controls. To better understand these impairments, natural acoustic speech stimuli that were masked in speech-shaped noise at various intensities were presented to dyslexic, AN, and control subjects either in isolation or accompanied by visual articulatory cues. AN subjects were expected to benefit from the pairing of visual articulatory cues and auditory CV stimuli, provided that their speech perception impairment reflects a relatively peripheral auditory disorder. Assuming that dyslexia reflects a general impairment of speech processing rather than a disorder of audition, dyslexics were not expected to similarly benefit from an introduction of visual articulatory cues. The results revealed an increased effect of noise masking on the perception of isolated acoustic stimuli by both dyslexic and AN subjects. More importantly, dyslexics showed less effective use of visual articulatory cues in identifying masked speech stimuli and lower visual baseline performance relative to AN subjects and controls. Last, a significant positive correlation was found between reading ability and the ameliorating effect of visual articulatory cues on speech perception in noise. These results suggest that some reading impairments may stem from a central deficit of speech processing.

16.
Visual information from a speaker's face profoundly influences auditory perception of speech. However, relatively little is known about the extent to which visual influences may depend on experience, and the extent to which new sources of visual speech information can be incorporated in speech perception. In the current study, participants were trained on completely novel visual cues for phonetic categories. Participants learned to accurately identify phonetic categories based on novel visual cues. These newly-learned visual cues influenced identification responses to auditory speech stimuli, but not to the same extent as visual cues from a speaker's face. The novel methods and results of the current study raise theoretical questions about the nature of information integration in speech perception, and open up possibilities for further research on learning in multimodal perception, which may have applications in improving speech comprehension among the hearing-impaired.

17.
To determine whether expert fluency ratings of read speech can be predicted on the basis of automatically calculated temporal measures of speech quality, an experiment was conducted with read speech of 20 native and 60 non-native speakers of Dutch. The speech material was scored for fluency by nine experts and was then analyzed by means of an automatic speech recognizer in terms of quantitative measures such as speech rate, articulation rate, number and length of pauses, number of dysfluencies, mean length of runs, and phonation/time ratio. The results show that expert ratings of fluency in read speech are reliable (Cronbach's alpha varies between 0.90 and 0.96) and that these ratings can be predicted on the basis of quantitative measures: for six automatic measures the magnitude of the correlations with the fluency scores varies between 0.81 and 0.93. Rate of speech appears to be the best predictor: correlations vary between 0.90 and 0.93. Two other important determinants of reading fluency are the rate at which speakers articulate the sounds and the number of pauses they make. Apparently, rate of speech is such a good predictor of perceived fluency because it incorporates these two aspects.
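The temporal measures listed in this abstract can be illustrated with a short sketch. The definitions below are common ones and are assumed here, since the abstract does not spell them out: rates are counted in speech units (for example phones) per second, and a "run" is a stretch of speech between pauses. The function name and inputs are hypothetical.

```python
def temporal_measures(n_units, pauses, total_time):
    """n_units: number of speech units produced;
    pauses: list of pause durations in seconds;
    total_time: duration of the recording in seconds, pauses included."""
    pause_time = sum(pauses)
    speaking_time = total_time - pause_time
    return {
        "speech_rate": n_units / total_time,           # pauses included
        "articulation_rate": n_units / speaking_time,  # pauses excluded
        "phonation_time_ratio": speaking_time / total_time,
        "n_pauses": len(pauses),
        "mean_run_length": n_units / (len(pauses) + 1),
    }
```

Under these assumed definitions, speech rate equals articulation rate multiplied by the phonation/time ratio, which is consistent with the abstract's remark that rate of speech incorporates both how fast speakers articulate and how much they pause.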

18.
In this paper, we propose a novel approach to enhance the compression rate of integral images by combined use of the residual images generated from the sub-images and the MPEG-4 algorithm. In the proposed method, elemental images picked up from a three-dimensional object are transformed into sub-images, these sub-images are sequentially rearranged with a spiral scanning topology, and the first sub-image is assigned as the reference image. Then, by sequentially computing the differences between the reference image and the other consecutive sub-images, a sequence of residual images is generated. Here, the residual images together with the reference image are modeled as consecutive video frames, just like a conventional moving picture. Finally, these residual images are compressed with the MPEG-4 algorithm. Experimental results show that, on average, the compression efficiency of the proposed method is improved by up to 61.56% compared to that of the JPEG-based compression scheme and by up to 151.54% compared to that of the conventional method.

19.
The question of whether visual information can affect ongoing speech production arises from numerous studies demonstrating an interaction between auditory and visual information during speech perception. In a preliminary study, the effect of delayed visual feedback on speech production was examined. Two of the 13 subjects demonstrated speech errors that were directly related to the delayed visual signal. However, in the main experiment, providing immediate visual feedback of the articulators did not diminish the effects of delayed auditory feedback for 11 speakers.

20.
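Objective: Telecom fraud is frequent among today's new types of crime, and a method that can automatically and effectively distinguish genuine from fake speech is urgently needed. This work aims to strengthen the ability of current deep learning methods to detect synthetic speech and to provide technical support for speech information security. Methods: Because synthetic speech differs acoustically from real speech, the acoustic properties of the two were analyzed and compared, and an acoustic feature, RMSA, was designed to quantify the degree of variation in speech intensity. RMSA was fused with the FFV and SNS acoustic features to quantify the acoustic differences and to focus on the key acoustic information in synthetic speech. Results: With the fused acoustic features as input to a neural network model, an equal error rate of 0.6% was obtained on the validation set of the FoR dataset, and the best result on the test set was an equal error rate of 10.8%. Conclusion: Synthetic speech was successfully detected, which confirms the effectiveness of the acoustic features and the feasibility of the approach, and to some extent broadens the research ideas for designing synthetic speech features.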
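The equal error rate reported in this abstract is the operating point at which the false acceptance rate and the false rejection rate coincide. A rough way to estimate it from detector scores is sketched below; the function name is hypothetical, higher scores are assumed to mean "more likely genuine", and only observed scores are tried as thresholds, so the result is an approximation rather than an interpolated value.

```python
def equal_error_rate(genuine_scores, spoof_scores):
    """Sweep every observed score as a threshold and return the mean of
    the false acceptance rate (spoof accepted) and the false rejection
    rate (genuine rejected) at the threshold where they are closest."""
    best = None
    for threshold in sorted(set(genuine_scores) | set(spoof_scores)):
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= threshold for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]
```

With perfectly separated scores this estimate is 0; the gap between the 0.6% validation figure and the 10.8% test figure in the abstract corresponds to much more overlap between the two score distributions on the test set.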
