1.

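孙金城, 倪宏, 莫福源, 李昌立. 《应用声学》 (Applied Acoustics), 1995, 14(3): 35-41

This paper presents a statistical analysis of the dynamic and static distribution characteristics of initials and finals in written Putonghua, and of the differences between the two. The results show that the relative distributions among initials, and among finals, are consistent between the dynamic and static cases; the relative distribution among speech sounds depends mainly on the vocal production system and is not affected by frequency of use. The differences between the dynamic and static occurrence rates of initials and finals are related to the manner of articulation of the initials and the combinatorial structure of the finals, to the pairing between the place of articulation of the initials and the four classes (sihu) of finals, to the proportion of syllables that form characters, and to character frequency. The dynamic-static difference in occurrence rate is largest for the aspirated and unaspirated initials and for the finals; the dynamic [distribution] of finals in polysyllabic words …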
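The static/dynamic distinction can be illustrated with a small sketch. It assumes a reading that the abstract does not spell out: the static distribution counts each distinct syllable once, while the dynamic distribution weights each syllable by how often it occurs in running text. The toy lexicon, its frequencies and the function name `occurrence_rates` are hypothetical.

```python
from collections import Counter

def occurrence_rates(syllable_freq):
    """Return (static, dynamic) occurrence rates of initials.

    syllable_freq maps (initial, final) pairs to a text frequency.
    Static: every distinct syllable counts once (assumed definition).
    Dynamic: every syllable counts by its text frequency (assumed definition).
    """
    static, dynamic = Counter(), Counter()
    for (initial, _final), freq in syllable_freq.items():
        static[initial] += 1
        dynamic[initial] += freq
    n_static = sum(static.values())
    n_dynamic = sum(dynamic.values())
    static_rate = {k: v / n_static for k, v in static.items()}
    dynamic_rate = {k: v / n_dynamic for k, v in dynamic.items()}
    return static_rate, dynamic_rate

# Hypothetical toy data: (initial, final) -> occurrences in a text sample.
toy = {("d", "e"): 60, ("d", "a"): 10, ("p", "a"): 5, ("p", "o"): 5, ("sh", "i"): 20}
static_rate, dynamic_rate = occurrence_rates(toy)
```

In this toy data the initial `d` covers 2 of 5 distinct syllables but 70 of 100 running-text occurrences, which is the kind of gap between static and dynamic rates that the abstract analyses.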

2.

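A distinctive feature system for Standard Chinese (Putonghua) (total citations: 3; self-citations: 2; citations by others: 1)

张家騄. 《声学学报》 (Acta Acustica), 2005, (6)

Distinctive features are the basic elements that make up the speech signal. They are an important subject of study in phonetics and phonology, and an important object of processing in speech signal processing. This paper first determines the initial-final-tone phoneme system of Chinese from a cluster analysis of perceptual confusions among Putonghua speech sounds. Then, following the binary principle of distinctive feature classification of Jakobson, Fant and Halle, and working from acoustic parameters while taking the characteristics of Putonghua into account, it establishes a distinctive feature system for initials, finals and tones. Acoustic definitions and articulatory-physiological explanations of the features are given. To make the relation between initial and final phonemes and their distinctive features easier to understand and to apply in speech processing, the appendix lists the IPA symbols and the computer-readable SAMPA symbols for the Putonghua initials (including variants) and finals, together with SAMPROSA symbols for the tones.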

3.

A method for segmenting initials and finals using a loss function and acoustic features

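李皓, 唐朝京. 《声学学报》 (Acta Acustica), 2012, 37(3): 339-345

To achieve robust segmentation of initials and finals for large-vocabulary continuous speech recognition systems, a method is proposed that builds a loss function and exploits the quasi-periodicity of voiced sounds together with initial durations. The autocorrelation function of the speech is computed first. A cost (loss) function is then constructed and dynamic programming is applied to the autocorrelation results to detect voiced segments. Next, the search range for the initial is set according to the distribution of initial durations. Finally, within that range, auditory event detection is applied around the onset of the voiced segment to separate the initial from the final. Experiments show that dynamic programming detects voiced segments better than thresholding, that segmenting on the basis of voiced segments raises segmentation accuracy and reduces the influence of noise and of Chinese sound-change phenomena, and that segmentation performance depends little on the manner of articulation of the initial.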
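The first step of the abstract, using autocorrelation to expose the quasi-periodicity of voiced speech, can be sketched as follows. This is only the simple threshold baseline that the abstract compares against, not the authors' loss function or dynamic programming detector. The function names, the lag range and the 0.5 threshold are assumptions chosen for illustration.

```python
import math

def normalized_autocorr_peak(frame, min_lag, max_lag):
    """Return (best_lag, peak) of the energy-normalized autocorrelation."""
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return min_lag, 0.0  # silent frame: no periodicity to measure
    best_lag, best = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        r /= energy
        if r > best:
            best_lag, best = lag, r
    return best_lag, best

def is_voiced(frame, min_lag=20, max_lag=160, threshold=0.5):
    """Threshold decision on the autocorrelation peak (hypothetical settings)."""
    _, peak = normalized_autocorr_peak(frame, min_lag, max_lag)
    return peak >= threshold

# A pure tone with a 50-sample period stands in for a voiced frame.
tone = [math.sin(2 * math.pi * n / 50) for n in range(400)]
lag, peak = normalized_autocorr_peak(tone, 20, 160)
```

For the synthetic tone the strongest peak falls at the true period of 50 samples. A per-frame threshold like this is sensitive to noise, which is the weakness the dynamic programming step in the abstract is meant to address.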

4.

Dereverberation combining a convolutional neural network with a reverberation-time attention mechanism

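孙兴伟, 李军锋, 颜永红. 《声学学报》 (Acta Acustica), 2021, 46(6): 1234-1241

A dereverberation algorithm is proposed that combines a convolutional encoder-decoder model with a reverberation-time attention mechanism. The encoder-decoder model performs the dereverberation, and the reverberation-time attention mechanism counters the effect of changes in the reverberant environment. In the encoder, convolution kernels of different sizes process the magnitude spectrum of the reverberant speech, yielding encoded features that carry multi-scale context. An attention module then weights the encoded features differently for different reverberation times to produce weighted features. Finally, the decoder uses the weighted features to reconstruct the magnitude spectrum of the dereverberated speech. In simulated and real reverberant environments, the algorithm improves the speech-to-reverberation modulation energy ratio by 0.36 dB and 0.66 dB, respectively, over the baseline system. The results show that the algorithm adapts to changing reverberant environments and is more robust than the baseline in real reverberant environments.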

5.

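Tree diagrams of the distinctive feature system of Standard Chinese (Putonghua) (total citations: 9; self-citations: 2; citations by others: 7)

张家騄. 《声学学报》 (Acta Acustica), 2006, 31(3): 193-198

This paper continues and develops “汉语普通话区别特征系统” (A distinctive feature system for Standard Chinese). It further explains the definitions of the Chinese distinctive features and their concrete physical and psychological manifestations in speech. The distinctive feature table for Putonghua is revised. The simple final /er/, whose articulation may involve movement, is discussed separately. To show the feature relations among phonemes more intuitively, and to make them easier to use in speech engineering, tree diagrams are drawn for the Putonghua initials and for the finals. The assignment of distinctive features to phonemes and the relations among them are explained.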

6.

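A study of acoustic models for an all-syllable Chinese speech recognition system

关存太, 陈永彬, 吴伯修. 《声学学报》 (Acta Acustica), 1994, (5)

Starting from an analysis of the syllable structure of Chinese monosyllables, this paper defines speech recognition units based on classes of initials and finals, determines from an acoustic standpoint the acoustic units for an all-syllable Chinese recognition system, and studies the detection consistency of these units and the robustness of a recognition system built on them. The durations of initial-class sounds from a large number of speakers are also analysed statistically, and a pre-classification method based on initial duration is given for the segmentation algorithm and the initial units defined in the paper. Application to a speaker-independent all-syllable Chinese speech recognition system shows that the defined units have high detection consistency and that the recognition system built on them is highly robust. In the best case the preselection method cuts the computation by more than half while the preselection accuracy is almost 100%.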

7.

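A speech database for Standard Chinese (Putonghua)

孙金城, 陈希清, 李昌立, 莫福源, 倪宏, 李彤. 《声学学报》 (Acta Acustica), 1991, (6)

The Institute of Acoustics, Chinese Academy of Sciences, has built a Putonghua speech database comprising initials, finals, 1282 monosyllables, several hundred disyllabic and trisyllabic words, speech test sentences, short passages and the digits 0-9. The database has six speakers (three male, three female), who are teachers at the Broadcasting Institute and professional announcers and speak standard Putonghua. The material was recorded on high-quality magnetic tape, and part of it has been digitized. Many Chinese speech research groups already use the database.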

8.

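A study of the fractal features of Chinese speech based on grid dimension (total citations: 4; self-citations: 1; citations by others: 3)

陈国, 胡修林, 曹鹏, 朱耀庭. 《声学学报》 (Acta Acustica), 2001, (1)

Fractal theory is applied to study the fractal features of Chinese speech signals. Building on the traditional box dimension, the paper first proposes a grid-dimension algorithm with arithmetically spaced scales for fast computation of the fractal dimension of speech signals. Using this algorithm, the fractal dimensions of the 21 initials and 38 finals of Chinese, spoken by male and female voices, are computed and analysed statistically, giving the statistical distribution of the fractal dimension of Chinese speech. The experimental results show that Chinese speech signals exhibit fractal scale invariance and that the grid dimension reflects the complexity of the speech waveform.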
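As a rough illustration of what a grid (box-counting) dimension measures, the sketch below covers a waveform with square cells at several scales and fits the slope of log(cell count) against log(scale). It is a generic box-counting estimate with dyadic scales, not the arithmetically spaced algorithm proposed in the paper. The function names and the scale set are assumptions.

```python
import math

def box_count(samples, k):
    """Count cells of side 1/k touched by a waveform scaled to the unit square."""
    n = len(samples)
    lo, hi = min(samples), max(samples)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant signal
    cols = {}
    for i, y in enumerate(samples):
        col = min(i * k // n, k - 1)
        row = min(int((y - lo) / span * k), k - 1)
        a, b = cols.get(col, (row, row))
        cols[col] = (min(a, row), max(b, row))
    # Within each column, every cell between the lowest and highest row is counted.
    return sum(b - a + 1 for a, b in cols.values())

def grid_dimension(samples, scales=(4, 8, 16, 32, 64)):
    """Least-squares slope of log(count) versus log(k) over the given scales."""
    xs = [math.log(k) for k in scales]
    ys = [math.log(box_count(samples, k)) for k in scales]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

ramp = [i / 1024 for i in range(1024)]   # smooth line
jagged = [i % 2 for i in range(1024)]    # flips on every sample
```

With this estimator the smooth ramp comes out at a dimension of about 1 and the rapidly alternating signal at about 2, so more complex waveforms score higher, in line with the claim that the dimension reflects waveform complexity.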

9.

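A pronunciation quality evaluation method based on the sensitivity (感知度) of phone models (total citations: 1; self-citations: 1; citations by others: 0)

张茹, 韩纪庆. 《声学学报》 (Acta Acustica), 2013, 38(2): 201-207

To improve the accuracy of pronunciation quality assessment, a pronunciation quality evaluation method based on the sensitivity of phone models is proposed. The difference between the expected log posterior probabilities of the acoustic features of samples drawn from different speech sample sets is taken as a phone model's sensitivity to variant pronunciations, and on this basis a candidate set of recognition models is generated for each phone. Experiments show that the method shrinks the candidate phone model set of the recognition network by about 95%. On a non-native speech database, its scores correlate with human expert scores at 0.828, the detection rate for initial and final errors is 70.8%, and the detection rate for tone errors is 42.5%, all better than other methods.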
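The sensitivity measure named in the abstract, a difference of expected log posterior probabilities between two sample sets, can be written down in a few lines. The sketch assumes the posteriors for one phone model have already been computed for each sample; which two sample sets are compared is not stated in the abstract, so the "canonical" and "variant" sets and all numbers here are hypothetical.

```python
import math

def expected_log_posterior(posteriors):
    """Mean log posterior of one phone model over a set of samples."""
    return sum(math.log(p) for p in posteriors) / len(posteriors)

def model_sensitivity(posteriors_a, posteriors_b):
    """Difference of expected log posteriors between two sample sets."""
    return expected_log_posterior(posteriors_a) - expected_log_posterior(posteriors_b)

# Hypothetical posteriors of one phone model on two sample sets.
canonical = [0.5, 0.5]
variant = [0.25, 0.25]
sensitivity = model_sensitivity(canonical, variant)
```

A large positive value means the model scores the first set much higher than the second, that is, it reacts strongly to the variant pronunciations. How the paper turns such values into candidate model sets is not described in the abstract.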

10.

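Recognition of the six initials P, T, K, Z, ZH and H based on the time intervals between zero crossings

欧贵文. 《声学学报》 (Acta Acustica), 1994, (3)

The zero-crossing rate of the speech waveform is an important speech feature, and is commonly a basic element of endpoint detection and coarse phoneme classification. It is also unstable, however, so it is rarely used for fine classification or recognition. This paper proposes using a sequence of time intervals between zero crossings as the feature for identifying Putonghua initials. With these intervals as the input symbols of an HMM, identification experiments were run on six initials: the aspirated stops P, T and K, the unaspirated affricates Z and ZH, and the laryngeal fricative H. In an online setting, 78% of the initials were identified correctly. The feature has the advantages of being simple to compute and easy to implement. It can also serve directly, without conversion, as the input symbol of a discrete HMM. The paper describes how the intervals are computed and the HMM recognition experiments on the six initials.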
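A minimal sketch of the feature described above: the gaps, in samples, between successive sign changes of the waveform. Treating a sample value of exactly zero as non-negative is an assumption, as is the function name; the abstract does not give the paper's exact procedure.

```python
def zero_crossing_intervals(samples):
    """Return the gaps, in samples, between successive zero crossings."""
    crossings = []
    prev = samples[0]
    for i in range(1, len(samples)):
        cur = samples[i]
        # A crossing is a change between negative and non-negative values.
        if (prev < 0 <= cur) or (prev >= 0 > cur):
            crossings.append(i)
        prev = cur
    return [b - a for a, b in zip(crossings, crossings[1:])]

wave = [1, 1, -1, -1, -1, 1, 1, -1]
intervals = zero_crossing_intervals(wave)  # [3, 2]
```

Because the intervals are already small integers, they can be used as discrete observation symbols as they are; dividing by the sampling rate converts them to seconds if a time value is needed.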

11.

The distinctive features for standard Chinese (Putonghua)

ZHANG Jialu. 《声学学报：英文版》 (Chinese Journal of Acoustics), 2006, 25(3): 193-203

Distinctive features, which are one of the important research subjects in phonetics and phonology as well as in speech technology, are the ultimate units of speech. First, a phoneme system of Standard Chinese (Putonghua) was determined from the results of a cluster analysis of perceptual confusions among Putonghua speech sounds. Then, following the principle of choice between two opposites proposed by Jakobson, Fant and Halle, and considering the characteristics of Putonghua, the distinctive feature values for initials, finals and tones were determined. The features are formulated at both the acoustic level and the genetic level. In addition to the feature tables, distinctive feature trees of the Chinese initials and finals were drawn so that the distinctive features of individual phonemes can be understood easily.

12.

Recognition of Putonghua voiceless stop like initials based on speech main periods

OU Guiwen. 《声学学报：英文版》 (Chinese Journal of Acoustics), 1994, (1)

I. Introduction. Nowadays, there is much advancement in the research into speech recognition. Many researchers have been interested in the implementation of a reliable real-time recognition system of unlimited vocabulary. There are a few products converting syllables into Chinese characters in the market. However, the implementation of a robust real-time recognition system of unlimited vocabulary is very difficult, and it is the great aim of our research. We have a TMS320-C25 signal processing board attached to an IBM-PC/AT 80386 computer. We hope that our speech recognit…

13.

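Initial/final segmentation of Chinese speech based on auditory event detection (total citations: 2; self-citations: 0; citations by others: 2)

张宝奇, 张连海, 屈丹. 《声学学报》 (Acta Acustica), 2010, 35(6): 701-707

A method for segmenting Chinese initials and finals based on auditory event detection is proposed. The speech is first filtered with a cochlear filterbank. Auditory events corresponding to abrupt energy changes are then detected in each frequency band. Finally, the events are fused across frequency ranges to determine the initial/final boundaries. Experiments show a segmentation accuracy of up to 88.9% on clean speech sampled at 8 kHz, and above 82.9% for speech at a signal-to-noise ratio of 10 dB.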

14.

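Acoustic modelling of Standard Chinese (Putonghua) speech based on articulatory features (total citations: 3; self-citations: 0; citations by others: 3)

张晴晴, 潘接林, 颜永红. 《声学学报》 (Acta Acustica), 2010, 35(2): 254-260

Articulatory features that characterise Putonghua speech are introduced into acoustic modelling for Putonghua speech recognition. Based on the characteristics of Putonghua pronunciation, nine articulatory features are defined to distinguish vowels, consonants and tone information. Neural networks are trained with these features as targets to give the posterior probability that the speech signal belongs to each feature class, and these probabilities are used as the input features for building the acoustic model. The approach was tested on a speaker-independent, large-vocabulary, spontaneous conversational Putonghua recognition system and compared with an acoustic model based on spectral features. At the same decoding speed, the character error rate of the resulting acoustic model fell by 6.8% relative. When articulatory and spectral features were fused, the character error rate fell by 10.1% relative to the spectral-feature system. These results indicate that the acoustic model based on articulatory features represents speech characteristics more effectively, and that exploiting the complementarity of articulatory and spectral features improves recognition performance further.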

15.

Initial/final boundary detection in continuous Chinese speech based on Seneff auditory spectrum features

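陈斌, 张连海, 王波, 屈丹. 《声学学报》 (Acta Acustica), 2012, 37(1): 104-112

A method is proposed for detecting initial/final boundaries in continuous Chinese speech based on the energy distribution and formant structure of initials and finals. The speech is first passed through the Seneff auditory perception model to obtain an auditory spectrum. From this spectrum, feature parameters such as full-band energy, low-band energy, spectral centroid, high-to-low band energy ratio and mid-to-high band energy are selected to describe the energy distribution and formant structure of each class of initial and final. Boundaries are then placed at the points where the feature parameters change sharply, and are refined using the first-order difference of the envelope and a sample-based Kullback-Leibler distance. Experiments show a boundary detection accuracy of up to 93.7% for speech sampled at 8 kHz, above 85.3% for speech at a signal-to-noise ratio of 10 dB, and above 86.7% for speech after parametric coding.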
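Three of the quantities named in the abstract have standard textbook forms, sketched below for a single spectral frame. The sketch works on a plain list of band energies rather than a Seneff auditory spectrum, uses a symmetric Kullback-Leibler distance between normalized frames, and picks an arbitrary split frequency. All of these are assumptions, since the abstract does not give the paper's exact definitions.

```python
import math

def spectral_centroid(spectrum, freqs):
    """Energy-weighted mean frequency of one frame."""
    total = sum(spectrum)
    if total == 0:
        return 0.0
    return sum(f * s for f, s in zip(freqs, spectrum)) / total

def high_low_energy_ratio(spectrum, freqs, split_hz):
    """Energy at or above split_hz divided by energy below it."""
    low = sum(s for f, s in zip(freqs, spectrum) if f < split_hz)
    high = sum(s for f, s in zip(freqs, spectrum) if f >= split_hz)
    return high / low if low else math.inf

def symmetric_kl(p, q, eps=1e-12):
    """KL(p||q) + KL(q||p) between two frames normalized to sum to 1."""
    ps = [x + eps for x in p]  # eps keeps the logarithm finite for empty bands
    qs = [x + eps for x in q]
    sp, sq = sum(ps), sum(qs)
    return sum((a / sp - b / sq) * math.log((a / sp) / (b / sq))
               for a, b in zip(ps, qs))

freqs = [500, 1500, 2500, 3500]   # hypothetical band centres in Hz
flat = [1.0, 1.0, 1.0, 1.0]
high_only = [0.0, 0.0, 0.0, 4.0]
```

A boundary detector in this spirit would track such values frame by frame and look for sharp changes, for example a jump in the centroid or a large distance between neighbouring frames.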

16.

The role of perceived spatial separation in the unmasking of speech (total citations: 12; self-citations: 0; citations by others: 12)

Freyman RL, Helfer KS, McCall DD, Clifton RK. 《The Journal of the Acoustical Society of America》, 1999, 106(6): 3578-3588

Spatial separation of speech and noise in an anechoic space creates a release from masking that often improves speech intelligibility. However, the masking release is severely reduced in reverberant spaces. This study investigated whether the distinct and separate localization of speech and interference provides any perceptual advantage that, due to the precedence effect, is not degraded by reflections. Listeners' identification of nonsense sentences spoken by a female talker was measured in the presence of either speech-spectrum noise or other sentences spoken by a second female talker. Target and interference stimuli were presented in an anechoic chamber from loudspeakers directly in front and 60 degrees to the right in single-source and precedence-effect (lead-lag) conditions. For speech-spectrum noise, the spatial separation advantage for speech recognition (8 dB) was predictable from articulation index computations based on measured release from masking for narrow-band stimuli. The spatial separation advantage was only 1 dB in the lead-lag condition, despite the fact that a large perceptual separation was produced by the precedence effect. For the female talker interference, a much larger advantage occurred, apparently because informational masking was reduced by differences in perceived locations of target and interference.

17.

Generation of articulatory movements by using a kinematic triphone model

Okadome T, Honda M. 《The Journal of the Acoustical Society of America》, 2001, 110(1): 453-463

The method described here predicts the trajectories of articulatory movements for continuous speech by using a kinematic triphone model and the minimum-acceleration model. The kinematic triphone model, which is constructed from articulatory data obtained from experiments using an electro-magnetic articulographic system, is characterized by three kinematic features of a triphone and by the intervals between two successive phonemes in the triphone. After a kinematic feature of a phoneme in a given sentence is extracted, the minimum-acceleration trajectory that coincides with the extremum of the time integral of the squared magnitude of the articulator acceleration is formulated. The calculation of the minimum acceleration requires only linear computation. The method predicts both the qualitative features and the quantitative details of experimentally observed articulation.

18.

Acoustical Analysis of Speech in Progressive Supranuclear Palsy

Sabine Skodda, Wenke Visser, Uwe Schlegel. 《Journal of Voice》, 2011, 25(6): 725-731

Background

Dysarthria often is an early and prominent clinical feature of progressive supranuclear palsy (PSP). Based on perceptual analyses, speech impairment in PSP reportedly consists of prominent hypokinetic and spastic components with occasional ataxic features.

Objective

To measure objectively and quantitatively different speech parameters in PSP as compared with Parkinson's disease (PD) by acoustical analysis and to correlate these parameters with disease duration, global motor, and speech impairment and with the subtype of disease (Richardson's syndrome [RS] vs parkinsonian type of PSP [PSP-P]).

Patients and Methods

Twenty-six patients with clinical diagnosis of PSP (n = 14 classified as RS and n = 12 classified as PSP-P) and 30 age- and gender-matched patients with clinical diagnosis of PD were tested. Speech examination was based on the acoustical analysis of a standardized four-sentence reading task. Several speech variables were measured to assess phonation, intonation variability, speech velocity, and articulatory precision. All participants were tested according to Unified Parkinson's Disease Rating Scale/Motor Score (UPDRS-III) and staged according to Hoehn and Yahr stages. Global speech intelligibility was evaluated on the basis of the UPDRS-III speech item.

Results

In the PSP group, speech velocity, intonation variability, and the fraction of intraword pauses as a measure of articulatory precision were significantly reduced, whereas the percentage of speech pauses was prolonged as compared with the PD group. Only in the male PSP patients, vowel articulation was found to be impaired. Global speech performance was worse in the PSP group in comparison with the PD group and showed a correlation to some distinct speech dimensions. No differences of speech variables were seen between RS and PSP-P patients.

Conclusions

PSP patients feature a mixed type of dysarthria with hypokinetic and spastic components that differ significantly from the speech performance of PD speakers. This probably reflects the widespread neuropathological changes in PSP comprising basal ganglia as well as pontine and further brainstem regions.