Similar articles
 20 similar articles found (search time: 31 ms)
1.
The purpose of this exploratory study was to examine the relationship between undergraduate vocal music majors' diction acquisition abilities for singing in a nonnative language (as rated both by themselves and by their studio voice teachers) and their scores on an objective test of phonemic and stress perception. Ten students with varying levels of university voice training served as participants. The results showed significant negative correlations between each of the teachers' four ratings and the students' scores on the phonemic awareness subtest. In addition, 20% of the students demonstrated evidence of underdeveloped phonemic awareness skills, as indicated by their below average test performance. Considerable individual differences were also observed in the students' abilities to track phonemes within a sequence of phonemes, count and track syllables within a sequence of syllables, and track combinations of phoneme and syllable changes in sequence, as evidenced by subtest performance scores. These findings corroborate existing reports which indicate that approximately 30% of the population does not fully develop phonemic awareness skills in the absence of special training. The findings support the utility of this objective test of phonemic and stress perception as a means of identifying students who will have difficulty with diction acquisition, and point to possibilities for pretraining to improve their response to diction instruction.  相似文献   

2.
The search for the acoustic properties useful to the listener in extracting the linguistic message from a speech signal is often construed as the task of matching invariant physical properties to invariant phonological percepts; the discovery of the former will explain the latter. These phonological percepts are essentially the phonemes of pregenerative phonology, and they are more or less faithfully reflected in standard alphabetic writing. Thus English deep and doom are supposed to be perceptually identical in their initial /d/s; the orthographic similarity is in agreement with the linguist's "representation" of these forms. The partial identity in spelling is only weak evidence for perceptual invariance, however. First, while some phonemes may comprise a single "sound," others are said by linguists to include phonetically distinct ones. Thus English /p/ includes both aspirated and unaspirated voiceless labial stops. The view that it is not the phoneme, but rather the phonetic feature, to which an acoustic invariant might be attributed, raises two questions: (1) Since segments sharing a feature are rarely judged to constitute a single sound, the search for a feature-specific invariant, whose function is to explain perceptual constancy, is deprived of its essential motivation, and (2) there is no more reason to expect the acoustic cues to a feature to be context-independent than is the case with the phoneme. What seems more likely is to find that some phonemes, and some features, are more invariantly marked in the speech signal than others.

3.
Some features of emotional prosody in human speech may be traced back to affect cues in mammalian vocalizations. The present study addresses the question whether affect intensity, as expressed by the intensity of behavioral displays, is encoded in vocal cues, i.e., changes in the structure of associated calls, in bats, a group evolutionarily remote from primates. A frame-by-frame video analysis of 109 dyadic agonistic interactions recorded in approach situations was performed to categorize displays into two intensity levels based on a cost-benefit estimate. M. lyra showed graded visual displays accompanied by specific calls and response calls of the second bat. A sound analysis revealed systematic changes of call sequence parameters with display level. At the high intensity level, total call duration, number of syllables within a call, and the number of calls within a sequence were increased, while intervals between call syllables were decreased for both call types. In addition, the latency of the response call was shorter, and its main syllable-type durations and fundamental frequency were increased. These systematic changes of vocal parameters with affect intensity correspond to prosodic changes in human speech, suggesting that emotion-related acoustic cues are a common feature of vocal communication in mammals.  相似文献   

4.
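Gu Dong (谷东), Jian Zhihua (简志华). Acta Acustica (声学学报), 2018, 43(5): 864-872
To address cases where the target speaker's corpus may be insufficient, this paper proposes a voice conversion algorithm based on a unified tensor dictionary under limited-corpus conditions. N speakers are selected from the corpus as the base speakers of the speech tensor dictionary, and their parallel speech segments are aligned with a multi-sequence dynamic time warping algorithm, yielding a tensor dictionary composed of N two-dimensional basic dictionaries. In the conversion stage, the speech dictionaries of the source and target speakers are each constructed as a linear combination of the basic dictionaries in the tensor dictionary, which realizes voice conversion. Experimental results show that when the number of base speakers reaches 14, only a very small amount of target-speaker data is needed to obtain conversion quality comparable to that of the traditional conversion algorithm based on non-negative matrix factorization, which greatly facilitates the application of voice conversion systems.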

5.
For the case of a limited target-speaker corpus, this paper proposes a voice conversion algorithm based on a unified tensor dictionary. First, parallel speech of N speakers is selected randomly from the speech corpus to form the base of the tensor dictionary. Then, after multi-series dynamic time warping is applied to the chosen speech, N two-dimensional basic dictionaries are generated, which constitute the unified tensor dictionary. During the conversion stage, the dictionaries of the source and target speakers are established as linear combinations of the N basic dictionaries using the two speakers' speech. The experimental results show that when the number of basic speakers is 14, the algorithm achieves performance comparable to that of the traditional NMF-based method with very little target-speaker data, which greatly facilitates the application of voice conversion systems.

6.
A number of neural networks, with direct lateral couplings, have been simulated digitally. The operation of each of these networks has been investigated with regard to the form of interconnection between neurons and the shape of input excitation front to the network. One multilayer network investigated had properties which may duplicate the frequency and intensity coding properties of the mammalian auditory system. As such this network forms the basis of a conceptual model of the processing of sound signals in the auditory system. It is a source of great controversy as to whether the organisation of the auditory system can be copied and understood. However, the neural networks investigated could produce output excitation patterns in which frequency and intensity information was coded with position. Such a system of speech coding, if not a true analogue of the auditory system, has immense potential in the field of speech recognition.

7.
In normal speech, coordinated activity of the intrinsic laryngeal muscles suspends the glottal sound during the utterance of voiceless consonants, automatically realizing voicing control. In electrolaryngeal speech, however, the lack of voicing control is one of the causes of unclear voice, as voiceless consonants tend to be misheard as the corresponding voiced consonants. In the present work, we developed an intra-oral vibrator with an intra-oral pressure sensor that detected the utterance of voiceless phonemes during intra-oral electrolaryngeal speech, and demonstrated that intra-oral pressure-based voicing control could improve the intelligibility of the speech. The test voices were obtained from one electrolaryngeal speaker and one normal speaker. We first used speech analysis software to investigate how the voice onset time (VOT) and first formant (F1) transition of the test consonant-vowel syllables contributed to voiceless/voiced contrasts, and developed an adequate voicing control strategy. We then compared the intelligibility of consonant-vowel syllables in intra-oral electrolaryngeal speech with and without online voicing control. The increase in intra-oral pressure, typically with a peak ranging from 10 to 50 gf/cm², could reliably identify the utterance of voiceless consonants. The speech analysis and intelligibility test then demonstrated that a short VOT caused the misidentification of the voiced consonants due to a clear F1 transition. Finally, taking these results together, the online voicing control, which suspended the prosthetic tone while the intra-oral pressure exceeded 2.5 gf/cm² and during the 35 milliseconds that followed, proved effective in improving the voiceless/voiced contrast.
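
The voicing-control rule stated at the end of this abstract can be illustrated with a short sketch. It assumes a uniformly sampled pressure signal; the function name `tone_enabled`, the 1 kHz sample rate, and the sample values are hypothetical and do not come from the study. Only the 2.5 gf/cm² threshold and the 35 ms hold period are taken from the abstract.

```python
def tone_enabled(pressure, sample_rate_hz=1000, threshold=2.5, hold_ms=35):
    """Return one bool per sample: True when the prosthetic tone should sound.

    The tone is suspended while pressure exceeds the threshold and for
    hold_ms afterwards. Threshold and hold come from the abstract; the
    sampling scheme is an assumption made for illustration.
    """
    hold_samples = int(round(hold_ms * sample_rate_hz / 1000))
    remaining = 0  # samples left in the hold period after pressure drops
    out = []
    for p in pressure:
        if p > threshold:
            remaining = hold_samples  # restart the hold; tone is off now
            out.append(False)
        elif remaining > 0:
            remaining -= 1            # still inside the hold period
            out.append(False)
        else:
            out.append(True)
    return out


# Hypothetical trace: 5 ms of rest, a 3 ms pressure burst, then 40 ms of rest.
trace = [0.0] * 5 + [10.0] * 3 + [0.0] * 40
state = tone_enabled(trace)
```

With this trace the tone stays off for the 3 burst samples plus the following 35 samples, and resumes for the last 5. A real device would likely need smoothing or hysteresis on the sensor signal, which the abstract does not describe.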

8.
The relative importance of different parts of the auditory spectrum to recognition of the Diagnostic Rhyme Test (DRT) and its six speech feature subtests was determined. Three normal hearing subjects were tested twice in each of 70 experimental conditions. The analytical procedures of French and Steinberg [J. Acoust. Soc. Am. 19, 90-119 (1947)] were applied to the data to derive frequency importance functions for each of the DRT subtests and the test as a whole over the frequency range 178-8912 Hz. For the DRT as a whole, the low frequencies were found to be more important than is the case for nonsense syllables. Importance functions for the feature subtests also differed from those for nonsense syllables and from each other as well. These results suggest that test materials loaded with different proportions of particular phonemes have different frequency importance functions. Comparison of the results with those from other studies suggests that importance functions depend to a degree on the available response options as well.  相似文献   

9.
Neural network-based image processing algorithms present numerous advantages due to their supervised adjustable properties. Among various neural network architectures, dynamic neural networks, such as Hopfield and cellular networks, have been found inherently suitable for filtering applications. Combining the supervised and filtering features of dynamic neural networks, this paper presents a dynamic neural filtering technique based on the Hopfield neural network architecture. The filtering technique has also been implemented by using phase-only joint transform correlation (POJTC) for optical image processing applications. The filtering structure is basically similar to the Hopfield neural network structure, except that an adjustable filter mask and a 2D convolution operation replace the weight matrix operations. The dynamic neural filtering architecture is trainable with the back-propagation learning algorithm. POJTC presents significant advantages in summing the cross-correlation of bipolar data by phase-encoding the bipolar data in parallel. The image feature extraction performance of the proposed optical system is reported for various image processing applications using a simulation program.

10.

Background

Statistical learning is a candidate for one of the basic prerequisites underlying the expeditious acquisition of spoken language. Infants from 8 months of age exhibit this form of learning to segment fluent speech into distinct words. To test the statistical learning skills at birth, we recorded event-related brain responses of sleeping neonates while they were listening to a stream of syllables containing statistical cues to word boundaries.

Results

We found evidence that sleeping neonates are able to automatically extract statistical properties of the speech input and thus detect the word boundaries in a continuous stream of syllables containing no morphological cues. Syllable-specific event-related brain responses found in two separate studies demonstrated that the neonatal brain treated the syllables differently according to their position within pseudowords.

Conclusion

These results demonstrate that neonates can efficiently learn transitional probabilities or frequencies of co-occurrence between different syllables, enabling them to detect word boundaries and in this way isolate single words out of fluent natural speech. The ability to adopt statistical structures from speech may play a fundamental role as one of the earliest prerequisites of language acquisition.  相似文献   
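
The idea of detecting word boundaries from transitional probabilities can be made concrete with a small sketch. The pseudowords, the syllable stream, and the function names below are hypothetical illustrations, not the stimuli or analysis used in the study. The cutoff of 1.0 only works because this toy stream is perfectly regular inside words; real input would need a softer criterion such as local minima.

```python
from collections import Counter


def transitional_probabilities(syllables):
    """Estimate P(next | current) for each adjacent syllable pair."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


def segment(syllables, tp, boundary_below=1.0):
    """Split a non-empty stream wherever the transitional probability is below the cutoff."""
    words, current = [], [syllables[0]]
    for a, b in zip(syllables, syllables[1:]):
        if tp[(a, b)] < boundary_below:
            words.append("".join(current))
            current = []
        current.append(b)
    words.append("".join(current))
    return words


# Hypothetical pseudowords "tupiro", "golabu", "bidaku" in a continuous stream.
stream = ("tu pi ro go la bu bi da ku go la bu "
          "tu pi ro bi da ku tu pi ro go la bu").split()
tp = transitional_probabilities(stream)
words = segment(stream, tp)
```

Within a pseudoword each syllable always predicts the next one, so those probabilities are 1.0, while transitions across word boundaries are lower (for example, "ro" is followed by "go" in two of its three occurrences).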

11.
Speechreading supplemented with auditorily presented speech parameters   Cited by: 2 (self-citations: 0, citations by others: 2)
Results are reported from two experiments in which the benefit of supplementing speechreading with auditorily presented information about the speech signal was investigated. In experiment I, speechreading was supplemented with information about the prosody of the speech signal. For ten normal-hearing subjects with no experience in speechreading, the intelligibility score for sentences increased significantly when speechreading was supplemented with information about the overall amplitude of the speech signal, information about the fundamental frequency, or both. Binary information about voicing appeared not to be a significant supplement. In experiment II, the best-scoring supplements of experiment I were compared with two supplementary signals from our previous studies, i.e., information about the sound-pressure levels in two 1-oct filter bands centered at 500 and 3160 Hz, or information about the frequencies of the first and second formants from voiced speech segments. Sentence-intelligibility scores were measured for 24 normal-hearing subjects with no experience in speechreading, and for 12 normal-hearing experienced speechreaders. For the inexperienced speechreaders, the sound-pressure levels appeared to be the best supplement (87.1% correct syllables). For the experienced speechreaders, the formant-frequency information (88.6% correct), and the fundamental-frequency plus amplitude information (86.0% correct), were equally efficient supplements as the sound-pressure information (86.1% correct). Discrimination of phonemes (both consonants and vowels) was measured for the group of 24 inexperienced speechreaders. Percentage correct responses, confusion among phonemes, and the percentage of transmitted information about different types of manner and place of articulation and about the feature voicing are presented.  相似文献   

12.
Traditional accounts of speech perception generally hold that listeners use isolable acoustic "cues" to label phonemes. For syllable-final stops, duration of the preceding vocalic portion and formant transitions at syllable's end have been considered the primary cues to voicing decisions. The current experiment tried to extend traditional accounts by asking two questions concerning voicing decisions by adults and children: (1) What weight is given to vocalic duration versus spectral structure, both at syllable's end and across the syllable? (2) Does the naturalness of stimuli affect labeling? Adults and children (4, 6, and 8 years old) labeled synthetic stimuli that varied in vocalic duration and spectral structure, either at syllable's end or earlier in the syllable. Results showed that all listeners weighted dynamic spectral structure, both at syllable's end and earlier in the syllable, more than vocalic duration, and listeners performed with these synthetic stimuli as listeners had performed previously with natural stimuli. The conclusion for accounts of human speech perception is that rather than simply gathering acoustic cues and summing them to derive strings of phonemic segments, listeners are able to attend to global spectral structure, and use it to help recover explicitly phonetic structure.  相似文献   

13.
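Stress is an important intonational feature, and stress synthesis can improve the naturalness and expressiveness of speech. To capture the local prominence of stress, this paper proposes a representation of the prominence of acoustic features and analyzes the acoustic-feature prominence of stressed syllables at different prosodic positions (initial, medial, and final in the prosodic word; initial, medial, and final in the prosodic phrase; and so on). Stress at the end of a prosodic unit (the final syllable of a prosodic word and the final prosodic word of a prosodic phrase) is found to have a lower prominence of the maximum fundamental frequency than stress elsewhere. A nonlinear algorithm for generating stress acoustic parameters based on acoustic-feature prominence is proposed, which solves the problem that the traditional linear modification of stress acoustic parameters changes them too little or too much. A speech synthesis system supporting stress synthesis, based on hidden Markov models, was built with this algorithm. Experiments show that the system can effectively synthesize stressed speech and improves the naturalness and expressiveness of the synthesized speech.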

14.
The utility of phonetic features versus acoustic properties for describing perceptual relations among speech sounds was evaluated with a multidimensional scaling analysis of Miller and Nicely's [J. Acoust. Soc. Am. 27, 338-352 (1955)] consonant confusions data. The INDSCAL method and program were employed with the original data log transformed to enhance consistency with the linear INDSCAL model. A four-dimensional solution accounted for 69% of the variance and was best characterized in terms of acoustic properties of the speech signal, viz., temporal relationship of periodicity and burst onset, shape of voiced first formant transition, shape of voiced second formant transition, and amount of initial spectral dispersion, rather than in terms of phonetic features. The amplitude and spectral location of acoustic energy specifying each perceptual dimension were found to determine a dimension's perceptual effect as the signal was degraded by masking noise and bandpass filtering. Consequently, the perceptual bases of identification confusions between pairs of syllables were characterized in terms of the shared acoustic properties which remained salient in the degraded speech. Implications of these findings for feature-based accounts of perceptual relationships between phonemes are considered.

15.
Empirical analysis of a scale-free railway network in China   Cited by: 1 (self-citations: 0, citations by others: 1)
W. Li, X. Cai. Physica A, 2007, 382(2): 693-703
We present a detailed, empirical analysis of the statistical properties of the China Railway Network (CRN) consisting of 3915 nodes (train stations) and 22 259 edges (railways). Based on this, CRN displays two explicit features already observed in numerous real-world and artificial networks. One feature, the small-world property, has the fingerprint of a small characteristic shortest-path length, 3.5, accompanied by a high degree of clustering, 0.835. Another feature is characterized by the scale-free distributions of both degrees and weighted degrees, namely strengths. Correlations between strength and degree, degree and degree, and clustering coefficient and degree have been studied and the forms of such behaviors have been identified. In addition, we investigate distributions of clustering coefficients, topological distances, and spatial distances.  相似文献   
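
The two small-world indicators named in this abstract, characteristic shortest-path length and clustering coefficient, can be computed as in the sketch below. The four-node graph and the function names are hypothetical; the code uses the standard unweighted definitions and does not reproduce how the paper constructs the CRN or its weighted strengths.

```python
from collections import deque


def clustering_coefficient(adj, node):
    """Fraction of a node's neighbour pairs that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2.0 * links / (k * (k - 1))


def average_shortest_path(adj):
    """Mean hop count over all ordered pairs of distinct, reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from this source
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Toy undirected graph (hypothetical): triangle A-B-C plus node D attached to C.
adj = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
mean_clustering = sum(clustering_coefficient(adj, n) for n in adj) / len(adj)
mean_path = average_shortest_path(adj)
```

For this toy graph the mean clustering coefficient is 7/12 and the mean path length is 4/3. A network is usually called small-world when the path length stays short while clustering is much higher than in a random graph of the same size.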

16.
Haitao Liu. Physica A, 2008, 387(12): 3048-3058
This paper proposes how to build a syntactic network based on syntactic theory and presents some statistical properties of Chinese syntactic dependency networks based on two Chinese treebanks with different genres. The results show that the two syntactic networks are small-world networks, and their degree distributions obey a power law. The finding that the two syntactic networks have the same diameter but different average degrees, path lengths, clustering coefficients and power exponents can be seen as an indicator that complexity theory can work as a means of stylistic study. The paper links the degree of a vertex with the valency of a word, and the small-world property with the minimized average distance of a language, which reinforces the explanations of the findings from linguistics.

17.
A dynamic model of articulatory movements is introduced. The research presented herein focuses on the method of representing the phonemic tasks, i.e., phoneme-specific articulatory targets. Phonemic tasks in our model are formally defined using invariant features of articulatory posture. The invariant features used in the model are characterized by the linear transformation of articulatory variables and found using a statistical analysis of measured articulatory movements, in which the articulatory features with minimum variability are taken to be the invariant features. Articulatory movements making vocal-tract constrictions or relative movements among articulators reflecting task-sharing structures are typical examples of the features found to have low variability. In the trajectory formation of articulatory movements, the dimension number of the phonemic task is set at a smaller value than that of articulatory variables. Consequently, the kinematic states of the articulators are partly constrained at given time instants by a sequence of phonemic tasks, and there remain unconstrained degrees of freedom of articulatory variables. Articulatory movements are determined so that they simultaneously satisfy given phonemic tasks and dynamic smoothness constraints. The dynamic smoothness constraints coupled with the underspecified phonemic targets allow our model to explain contextual articulatory variability using context-independent phonemic tasks. Finally, the capability of the model for predicting actual articulatory movements is quantitatively investigated using empirical articulatory data.  相似文献   

18.
A controversial issue in neurolinguistics is whether basic neural auditory representations found in many animals can account for human perception of speech. This question was addressed by examining how a population of neurons in the primary auditory cortex (A1) of the naive awake ferret encodes phonemes and whether this representation could account for the human ability to discriminate them. When neural responses were characterized and ordered by spectral tuning and dynamics, perceptually significant features including formant patterns in vowels and place and manner of articulation in consonants, were readily visualized by activity in distinct neural subpopulations. Furthermore, these responses faithfully encoded the similarity between the acoustic features of these phonemes. A simple classifier trained on the neural representation was able to simulate human phoneme confusion when tested with novel exemplars. These results suggest that A1 responses are sufficiently rich to encode and discriminate phoneme classes and that humans and animals may build upon the same general acoustic representations to learn boundaries for categorical and robust sound classification.  相似文献   

19.
Review of text-to-speech conversion for English   Cited by: 7 (self-citations: 0, citations by others: 7)
The automatic conversion of English text to synthetic speech is presently being performed, remarkably well, by a number of laboratory systems and commercial devices. Progress in this area has been made possible by advances in linguistic theory, acoustic-phonetic characterization of English sound patterns, perceptual psychology, mathematical modeling of speech production, structured programming, and computer hardware design. This review traces the early work on the development of speech synthesizers, discovery of minimal acoustic cues for phonetic contrasts, evolution of phonemic rule programs, incorporation of prosodic rules, and formulation of techniques for text analysis. Examples of rules are used liberally to illustrate the state of the art. Many of the examples are taken from Klattalk, a text-to-speech system developed by the author. A number of scientific problems are identified that prevent current systems from achieving the goal of completely human-sounding speech. While the emphasis is on rule programs that drive a formant synthesizer, alternatives such as articulatory synthesis and waveform concatenation are also reviewed. An extensive bibliography has been assembled to show both the breadth of synthesis activity and the wealth of phenomena covered by rules in the best of these programs. A recording of selected examples of the historical development of synthetic speech, enclosed as a 33 1/3-rpm record, is described in the Appendix.

20.
Systems designed to recognize continuous speech must be able to adapt to many types of acoustic variation, including variations in stress. A speaker-dependent recognition study was conducted on a group of stressed and destressed syllables. These syllables, some containing the short vowel /I/ and others the long vowel /ae/, were excised from continuous speech and transformed into arrays of cepstral coefficients at two levels of precision. From these data, four types of template dictionaries varying in size and stress composition were formed by a time-warping procedure. Recognition performance data were gathered from listeners and from a computer recognition algorithm that also employed warping. It was found that for a significant portion of the data base, stressed and destressed versions of the same syllable are sufficiently different from one another as to justify the use of separate dictionary templates. Second, destressed syllables exhibit roughly the same acoustic variance as their stressed counterparts. Third, long vowels tend to be involved in proportionally fewer cross-vowel errors but tend to diminish the warping algorithm's ability to discriminate consonantal information. Finally, the pattern of consonant errors that listeners make as a function of vowel length shows significant differences from that produced by the computer.  相似文献   
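
Template matching by time warping, as mentioned in this abstract, can be sketched with textbook dynamic time warping (DTW). The abstract does not specify its warping procedure, so the unconstrained step pattern, the Euclidean frame cost, the one-dimensional "cepstral" frames, and the names `dtw_distance` and `recognize` are all assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Textbook DTW cost between two sequences of feature vectors."""
    def frame_cost(x, y):
        # Euclidean distance between two frames of equal dimension.
        return sum((xi - yi) ** 2 for xi, yi in zip(x, y)) ** 0.5

    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_cost(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(token, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(token, templates[label]))


# Hypothetical dictionary with separate stressed and destressed templates.
templates = {
    "stressed": [[0.0], [1.0], [2.0], [1.0], [0.0]],
    "destressed": [[0.0], [0.5], [0.0]],
}
token = [[0.0], [1.0], [1.0], [2.0], [1.0], [0.0]]  # time-stretched input
label = recognize(token, templates)
```

The token repeats one frame of the stressed template, so warping absorbs the difference in duration and the cost to that template is zero. Keeping separate stressed and destressed templates mirrors the abstract's first finding, although the numbers here are invented.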
