Similar documents
20 similar documents retrieved (search time: 15 ms)
1.
Zipf’s law of abbreviation, which posits a negative correlation between word frequency and length, is one of the most famous and robust cross-linguistic generalizations. At the same time, it has been shown that contextual informativity (average surprisal given previous context) is more strongly correlated with word length, although this tendency is not observed consistently and depends on several methodological choices. The present study examines a more diverse sample of languages than previous studies (Arabic, Finnish, Hungarian, Indonesian, Russian, Spanish and Turkish). I use large web-based corpora from the Leipzig Corpora Collection to estimate word lengths in UTF-8 characters and in phonemes (for some of the languages), as well as word frequency, informativity given the previous word and informativity given the next word, applying different methods of bigram processing. The results show different correlations between word length and the corpus-based measures across languages. I argue that these differences can be explained by the properties of noun phrases in a language, most importantly by the order of heads and modifiers and their relative morphological complexity, as well as by orthographic conventions.
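Studies of this kind typically quantify the law of abbreviation as a rank correlation between corpus frequency and word length. The following pure-Python sketch is illustrative only; the token list, and the choices of log frequency and character length, are assumptions for the example, not the paper's pipeline:

```python
import math
from collections import Counter

def rank(values):
    """Assign average ranks to values (ties share the mean rank)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation, computed as Pearson on the ranks."""
    rx, ry = rank(x), rank(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def abbreviation_correlation(tokens):
    """Correlate log type frequency with word length in characters."""
    counts = Counter(tokens)
    freqs = [math.log(c) for c in counts.values()]
    lengths = [len(w) for w in counts]
    return spearman(freqs, lengths)
```

On a corpus obeying the law of abbreviation, the returned correlation is negative.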

2.
Mathematical treatment of context effects in phoneme and word recognition
Percent recognition of phonemes and whole syllables, measured in both consonant-vowel-consonant (CVC) words and CVC nonsense syllables, is reported for normal young adults listening at four signal-to-noise (S/N) ratios. Similar data are reported for the recognition of words and whole sentences in three types of sentence: high predictability (HP) sentences, with both semantic and syntactic constraints; low predictability (LP) sentences, with primarily syntactic constraints; and zero predictability (ZP) sentences, with neither semantic nor syntactic constraints. The probability of recognition of speech units in context (pc) is shown to be related to the probability of recognition without context (pi) by the equation pc = 1 - (1 - pi)^k, where k is a constant. The factor k is interpreted as the amount by which the channels of statistically independent information are effectively multiplied when contextual constraints are added. Empirical values of k are approximately 1.3 and 2.7 for word and sentence context, respectively. In a second analysis, the probability of recognition of wholes (pw) is shown to be related to the probability of recognition of the constituent parts (pp) by the equation pw = pp^j, where j represents the effective number of statistically independent parts within a whole. The empirically determined mean values of j for nonsense materials are not significantly different from the number of parts in a whole, as predicted by the underlying theory. In CVC words, the value of j is constant at approximately 2.5. In the four-word HP sentences, it falls from approximately 2.5 to approximately 1.6 as the inherent recognition probability for words falls from 100% to 0%, demonstrating an increasing tendency to perceive HP sentences either as wholes, or not at all, as the S/N ratio deteriorates.
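The two context relations are simple enough to state directly in code. A minimal sketch, with symbol names following the abstract (nothing beyond the two formulas is taken from the paper):

```python
def prob_in_context(p_isolated, k):
    """pc = 1 - (1 - pi)^k: probability of recognition in context,
    given isolated-unit probability pi and context factor k."""
    return 1.0 - (1.0 - p_isolated) ** k

def prob_whole(p_part, j):
    """pw = pp^j: probability a whole is recognized when it behaves
    as j statistically independent parts."""
    return p_part ** j
```

With the reported word-context factor k ≈ 1.3, a word recognized in isolation 50% of the time is recognized in context with probability 1 - 0.5^1.3 ≈ 0.59; with the sentence-context factor k ≈ 2.7 this rises to about 0.85.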

3.
4.
Semantic memory is the cognitive system devoted to the storage and retrieval of conceptual knowledge. Empirical data indicate that semantic memory is organized in a network structure. Everyday experience shows that word search and retrieval processes produce fluent and coherent speech, i.e. they are efficient. This implies either that semantic memory encodes, besides thousands of words, different kinds of links for different relationships (introducing greater complexity and storage costs), or that the structure evolves so as to facilitate the differentiation of long-lasting semantic relations from incidental, phenomenological ones. Assuming the latter possibility, we explore a mechanism to disentangle the underlying semantic backbone comprising the conceptual structure (the extraction of categorical relations between pairs of words) from the rest of the information present in the structure. To this end, we first present and characterize an empirical data set modeled as a network; we then simulate stochastic cognitive navigation on this topology. We schematize the latter process as uncorrelated random walks from node to node, which converge to a network of feature vectors. By doing so we both introduce a novel mechanism for information retrieval and point at the problem of category formation in close connection to linguistic and non-linguistic experience.
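The navigation step can be schematized as plain random walks that accumulate, for each start node, a vector of visit counts over the whole network. This is a toy stand-in for the paper's procedure; the adjacency dictionary and parameters below are hypothetical:

```python
import random

def random_walks(adjacency, steps, n_walks, seed=0):
    """Uncorrelated random walks: from each start node, repeatedly hop
    to a uniformly chosen neighbor, counting visits to every node.
    Returns one visit-count feature vector per start node."""
    rng = random.Random(seed)
    nodes = sorted(adjacency)
    vectors = {}
    for start in nodes:
        visits = {n: 0 for n in nodes}
        for _ in range(n_walks):
            node = start
            for _ in range(steps):
                node = rng.choice(adjacency[node])  # uniform hop
                visits[node] += 1
        vectors[start] = [visits[n] for n in nodes]
    return vectors
```

Nodes whose feature vectors are similar under this navigation tend to belong to the same category, which is the intuition behind extracting the semantic backbone.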

5.
This study investigated the relative contributions of consonants and vowels to the perceptual intelligibility of monosyllabic consonant-vowel-consonant (CVC) words. A noise replacement paradigm presented CVCs with only consonants or only vowels preserved. Results demonstrated no difference between overall word accuracy in these conditions; however, different error patterns were observed. A significant effect of lexical difficulty was demonstrated for both types of replacement, whereas the noise level used during replacement did not influence results. The contribution of consonant and vowel transitional information present at the consonant-vowel boundary was also explored. The proportion of speech presented, regardless of the segmental condition, overwhelmingly predicted performance. Comparisons were made with previous segment replacement results using sentences [Fogerty and Kewley-Port (2009). J. Acoust. Soc. Am. 126, 847-857]. Results demonstrated that consonants contribute to intelligibility equally in both isolated CVC words and sentences. However, vowel contributions were mediated by context, with greater contributions to intelligibility in sentence contexts. Therefore, it appears that vowels in sentences carry unique speech cues that greatly facilitate intelligibility and that are not informative and/or present in isolated word contexts. Consonants appear to provide speech cues that are equally available and informative during sentence and isolated word presentations.

6.
Zhang Xiaodan, Wang Zhen, Zheng Feifei, Yang Miao, Chinese Physics B, 2012, 21(3): 030205
In this paper, we introduce into the naming game a notion of word diversity that reflects the inhomogeneity of words in a communication. Diversity is realized by assigning a weight factor to each word, drawn from one of three distributions (uniform, exponential, and power-law). During communication, the probability that a word is selected from the speaker's memory depends on this word diversity. Interestingly, we find that word diversity under all three distributions remarkably promotes final convergence, which is of high importance in a self-organized system. In particular, across the whole range of distribution amplitudes, the power-law distribution enables the fastest consensus, while the uniform distribution gives the slowest. We explain this effect in terms of both the number of different names and the total number of names, and find that a wide spread of names induced by the segregation of words is the main promoting factor. Other quantities, including the evolution of the average success rate of negotiation and the scaling behavior of the consensus time, are also studied. These results are helpful for better understanding the dynamics of the naming game with word diversity.
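A minimal naming game with word weights can be sketched as follows. This is a simplified stand-in, not the authors' model: the population is fully connected, and invention draws from a fixed weighted vocabulary rather than an open-ended name space.

```python
import random

def weighted_choice(rng, words, weight):
    """Pick a word from memory with probability proportional to its weight."""
    total = sum(weight[w] for w in words)
    r = rng.uniform(0, total)
    for w in words:
        r -= weight[w]
        if r <= 0:
            return w
    return words[-1]

def naming_game(n_agents, weight, max_rounds=100000, seed=0):
    """Naming game on a fully connected population; `weight` maps each
    possible word to its diversity factor. Returns the number of
    interactions until global consensus (or max_rounds)."""
    rng = random.Random(seed)
    vocab = list(weight)
    memories = [[] for _ in range(n_agents)]
    for t in range(max_rounds):
        s, h = rng.sample(range(n_agents), 2)      # speaker, hearer
        if not memories[s]:
            memories[s].append(rng.choice(vocab))  # invent a name
        word = weighted_choice(rng, memories[s], weight)
        if word in memories[h]:                    # success: both collapse
            memories[s] = [word]
            memories[h] = [word]
        else:                                      # failure: hearer learns it
            memories[h].append(word)
        if all(m == [word] for m in memories):     # global consensus
            return t + 1
    return max_rounds
```

Comparing runs with uniform, exponential, and power-law weights over the same vocabulary illustrates the convergence effect the paper studies.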

8.
We propose and evaluate an estimation method for forced-selection speech intelligibility tests. Our proposal takes into account the forced-selection manner of the Diagnostic Rhyme Test (DRT), which forces a choice between a pair of rhyming words. A distance measure is calculated between the test word and each of the two candidate words, and the distances are compared to select the more likely word. We compared two distance measures. The first objective distance measure was based on the Articulation-index Band Correlation (ABC). The ABC is the correlation of time–frequency (T–F) patterns between the test word and the template speech of the two words in the candidate pair; the word with the higher correlation is taken as the likely candidate. The T–F pattern was calculated in Articulation Index (AI) bands, and the correlation was calculated between the corresponding bands of the test and candidate word samples. To estimate intelligibility, we calculate the ratio of the number of bands in which the higher correlation is seen for the correct word to the total number of bands (named ABC-est). This ratio quantifies how well the test word matches the correct word in the word pair. For the second objective distance, we used a measure based on the frequency-weighted segmental SNR (fwSNRseg). Segmental SNR (SNRseg) was calculated in AI bands and compared among the candidate word templates. We then calculated the frequency-weighted ratio of the number of bands in which the higher SNRseg was observed for the correct word to the total number of bands (named fwSNRseg-est), again quantifying how well the test word matches the selected candidate word in the pair. We estimated a logistic mapping function from these two ratios to intelligibility scores using speech mixed with known noise. The mapping functions were then used to estimate the intelligibility of speech mixed with unknown noise.
This estimation was compared to a measure we evaluated previously, the conventional fwSNRseg, which maps the measure directly to intelligibility. Both proposed measures proved significantly more accurate than the conventional fwSNRseg. In most cases, accuracy was comparable between the two proposed distance measures, ABC-est and fwSNRseg-est, with the latter showing a correlation between subjective and estimated intelligibility as high as 0.97 and a root-mean-square error as low as 0.11 for one of the test sets, but lower accuracy for the other sets. ABC-est showed more stable accuracy across all sets. Both measures, however, show practical accuracy in all conditions tested. Thus, it should be possible to “screen” intelligibility in many of the noise conditions to be tested and cut down on the scale of the subjective test needed.
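The core of ABC-est is a "wins ratio" over articulation-index bands. The sketch below operates on generic per-band envelope sequences; the band decomposition, templates, and input data are assumptions for illustration, not taken from the paper:

```python
def correlation(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a) ** 0.5
    vb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (va * vb) if va and vb else 0.0

def abc_est(test_bands, correct_bands, rival_bands):
    """Fraction of AI bands in which the test signal's pattern
    correlates more strongly with the correct word's template than
    with the rival word's (one list per band)."""
    wins = sum(
        1
        for t, c, r in zip(test_bands, correct_bands, rival_bands)
        if correlation(t, c) > correlation(t, r)
    )
    return wins / len(test_bands)
```

A logistic function fitted on known-noise data would then map this ratio to an intelligibility score, as the abstract describes.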

9.
Three experiments were conducted to examine the effects of trial-to-trial variations in speaking style, fundamental frequency, and speaking rate on identification of spoken words. In addition, the experiments investigated whether any effects of stimulus variability would be modulated by phonetic confusability (i.e., lexical difficulty). In Experiment 1, trial-to-trial variations in speaking style reduced the overall identification performance compared with conditions containing no speaking-style variability. In addition, the effects of variability were greater for phonetically confusable words than for phonetically distinct words. In Experiment 2, variations in fundamental frequency were found to have no significant effects on spoken word identification and did not interact with lexical difficulty. In Experiment 3, two different methods for varying speaking rate were found to have equivalent negative effects on spoken word recognition and similar interactions with lexical difficulty. Overall, the findings are consistent with a phonetic-relevance hypothesis, in which accommodating sources of acoustic-phonetic variability that affect phonetically relevant properties of speech signals can impair spoken word identification. In contrast, variability in parameters of the speech signal that do not affect phonetically relevant properties are not expected to affect overall identification performance. Implications of these findings for the nature and development of lexical representations are discussed.

10.
Previous research has shown that speech recognition differences between native and proficient non-native listeners emerge under suboptimal conditions. Current evidence has suggested that the key deficit underlying this disproportionate effect of unfavorable listening conditions for non-native listeners is their less effective use of compensatory information at higher levels of processing to recover from information loss at the phoneme identification level. The present study investigated whether this non-native disadvantage could be overcome if enhancements at various levels of processing were presented in combination. Native and non-native listeners were presented with English sentences in which the final word varied in predictability and which were produced in either plain or clear speech. Results showed that, relative to the low-predictability-plain-speech baseline condition, non-native listener final word recognition improved only when both semantic and acoustic enhancements were available (high-predictability-clear-speech). In contrast, the native listeners benefited from each source of enhancement separately and in combination. These results suggest that native and non-native listeners apply similar strategies for speech-in-noise perception: the crucial difference lies in the signal clarity required for contextual information to be effective, rather than in an inability of non-native listeners to take advantage of this contextual information per se.

11.
Function words, especially frequently occurring ones such as the, that, and, and of, vary widely in pronunciation. Understanding this variation is essential both for cognitive modeling of lexical production and for computer speech recognition and synthesis. This study investigates which factors affect the forms of function words, especially whether they have a fuller pronunciation (e.g., [ði], [ðæt], [ænd], [ʌv]) or a more reduced or lenited pronunciation (e.g., [ðə], [ðɨt], [n̩], [ə]). It is based on over 8000 occurrences of the ten most frequent English function words in a 4-h sample of conversations from the Switchboard corpus. Ordinary linear and logistic regression models were used to examine variation in the length of the words, in the form of their vowel (basic, full, or reduced), and in whether final obstruents were present or not. For all these measures, after controlling for segmental context, rate of speech, and other important factors, there are strong independent effects that make high-frequency monosyllabic function words more likely to be longer or have a fuller form (1) when neighboring disfluencies (such as the filled pauses uh and um) indicate that the speaker was encountering problems in planning the utterance; (2) when the word is unexpected, i.e., less predictable in context; (3) when the word is either utterance-initial or utterance-final. Looking at the phenomenon the other way around, frequent function words are more likely to be shorter and to have less-full forms in fluent speech, in predictable positions or multiword collocations, and utterance-internally. Also considered are other factors such as sex (women are more likely to use fuller forms, even after controlling for rate of speech, for example), and some of the differences among the ten function words in their response to these factors.

12.
Semantic communication is a promising technology for overcoming the large bandwidth and power requirements caused by the data explosion. Semantic representation is an important issue in semantic communication. The knowledge graph, powered by deep learning, can improve the accuracy of semantic representation while removing semantic ambiguity. We therefore propose a semantic communication system based on the knowledge graph. Specifically, in our system the transmitted sentences are converted into triplets using the knowledge graph. Triplets can be viewed as basic semantic symbols for semantic extraction and restoration, and can be sorted by semantic importance. Moreover, the proposed system adaptively adjusts the transmitted content according to channel quality and allocates more transmission resources to important triplets to enhance communication reliability. Simulation results show that the proposed system significantly enhances the reliability of communication in the low signal-to-noise regime compared to traditional schemes.
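The importance-ordered, channel-adaptive transmission of triplets can be sketched as below. The importance scores, budget sizes, and the 0.5 channel-quality threshold are all illustrative assumptions; a real system would derive triplets from sentences with a knowledge-graph extraction model:

```python
def extract_triplets(facts):
    """Stand-in extraction step: `facts` are already
    (head, relation, tail, importance) tuples."""
    return [((h, r, t), imp) for h, r, t, imp in facts]

def schedule_triplets(triplets, channel_quality, budget_good=10, budget_bad=3):
    """Send the most important triplets first; shrink the budget when
    the channel is poor (thresholds here are purely illustrative)."""
    ranked = sorted(triplets, key=lambda x: x[1], reverse=True)
    budget = budget_good if channel_quality >= 0.5 else budget_bad
    return [trip for trip, _ in ranked[:budget]]
```

Under a poor channel, only the highest-importance triplets are transmitted, which is the adaptive-allocation idea the abstract describes.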

13.
Word frequency in a document has often been utilized in text searching and summarization. Similarly, identifying frequent words or phrases in a speech data set for searching and summarization would also be meaningful. However, obtaining word frequency in a speech data set is difficult, because frequent words are often special terms in the speech and cannot be recognized by a general speech recognizer. This paper proposes another approach that is effective for automatic extraction of such frequent word sections in a speech data set. The proposed method is applicable to any domain of monologue speech, because no language models or specific terms are required in advance. The extracted sections can be regarded as speech labels of some kind or a digest of the speech presentation. The frequent word sections are determined by detecting similar sections, which are sections of audio data that represent the same word or phrase. The similar sections are detected by an efficient algorithm, called Shift Continuous Dynamic Programming (Shift CDP), which realizes fast matching between arbitrary sections in the reference speech pattern and those in the input speech, and enables frame-synchronous extraction of similar sections. In experiments, the algorithm is applied to extract the repeated sections in oral presentation speeches recorded in academic conferences in Japan. The results show that Shift CDP successfully detects similar sections and identifies the frequent word sections in individual presentation speeches, without prior domain knowledge, such as language models and terms.
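Shift CDP itself matches arbitrary sections frame-synchronously with dynamic programming and allows time warping. As a conceptual stand-in only, the brute-force sketch below slides a rigid fixed-length window over two feature sequences and reports low-distance pairs; the feature vectors, window length, and threshold are hypothetical:

```python
def frame_distance(a, b):
    """Euclidean distance between two feature frames."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def find_similar_sections(reference, query, length, threshold):
    """Report (ref_start, query_start, mean_distance) for every pair of
    fixed-length windows whose average frame distance is below the
    threshold. Unlike Shift CDP, alignment here is rigid (no warping)."""
    matches = []
    for i in range(len(reference) - length + 1):
        for j in range(len(query) - length + 1):
            d = sum(
                frame_distance(reference[i + k], query[j + k])
                for k in range(length)
            ) / length
            if d < threshold:
                matches.append((i, j, d))
    return matches
```

Sections of one recording that match sections elsewhere in the data set are candidates for frequent word sections.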

14.
Recent work at Bell Laboratories has demonstrated the utility of applying sophisticated pattern recognition techniques to obtain a set of speaker-independent word templates for an isolated word recognition system [Levinson et al., IEEE Trans. Acoust. Speech Signal Process. ASSP-27 (2), 134-141 (1979); Rabiner et al., IEEE Trans. Acoust. Speech Signal Process. (in press)]. In these studies, it was shown that a careful experimenter could guide the clustering algorithms to choose a small set of templates that were representative of a large number of replications for each word in the vocabulary. Subsequent word recognition tests verified that the templates chosen were indeed representative of a fairly large population of talkers. Given the success of this approach, the next important step is to investigate fully automatic techniques for clustering multiple versions of a single word into a set of speaker-independent word templates. Two such techniques are described in this paper. The first method uses distance data (between replications of a word) to segment the population into stable clusters. The word template is obtained as either the cluster minimax, or as an averaged version of all the elements in the cluster. The second method is a variation of the one described by Rabiner [IEEE Trans. Acoust. Speech Signal Process. ASSP-26 (3), 34-42 (1978)] in which averaging techniques are directly combined with the nearest neighbor rule to simultaneously define both the word template (i.e., the cluster center) and the elements in the cluster. Experimental data show the first method to be superior to the second method when three or more clusters per word are used in the recognition task.
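The first method's choice of a cluster minimax as the word template can be sketched generically; the distance function (in practice something like a DTW distance between utterances) is left abstract here:

```python
def minimax_center(cluster, distance):
    """Pick the cluster element whose worst-case distance to every
    other member is smallest (the cluster 'minimax' template)."""
    best, best_radius = None, float("inf")
    for cand in cluster:
        radius = max(distance(cand, other) for other in cluster)
        if radius < best_radius:
            best, best_radius = cand, radius
    return best
```

The alternative mentioned in the abstract, an averaged template, would instead combine all cluster members into a single representative pattern.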

15.
Older adults are known to benefit from supportive context in order to compensate for age-related reductions in perceptual and cognitive processing, including when comprehending spoken language in adverse listening conditions. In the present study, we examine how younger and older adults benefit from two types of contextual support, predictability from sentence context and priming, when identifying target words in noise-vocoded sentences. In the first part of the experiment, benefit from context based on primarily semantic knowledge was evaluated by comparing the accuracy of identification of sentence-final target words that were either highly predictable or not predictable from the sentence context. In the second part of the experiment, benefit from priming was evaluated by comparing the accuracy of identification of target words when noise-vocoded sentences were either primed or not by the presentation of the sentence context without noise vocoding and with the target word replaced with white noise. Younger and older adults benefited from each type of supportive context, with the most benefit realized when both types were combined. Supportive context reduced the number of noise-vocoded bands needed for 50% word identification more for older adults than their younger counterparts.

16.
Some effects of talker variability on spoken word recognition
The perceptual consequences of trial-to-trial changes in the voice of the talker on spoken word recognition were examined. The results from a series of experiments using perceptual identification and naming tasks demonstrated that perceptual performance decreases when the voice of the talker changes from trial to trial compared to performance when the voice on each trial remains the same. In addition, the effects of talker variability on word recognition appeared to be more robust and less dependent on task than the effects of word frequency and lexical structure. Possible hypotheses regarding the nature of the processes giving rise to these effects are discussed, with particular attention to the idea that the processing of information about the talker's voice is intimately related to early perceptual processes that extract acoustic-phonetic information from the speech signal.

17.
Isolated-word recognition of whispered Chinese speech
Yang Lili, Lin Wei, Xu Boling, 《应用声学》 (Applied Acoustics), 2006, 25(3): 187-192
Whispered speech recognition has broad application prospects and is a relatively new research topic, but the characteristics of whispered speech, such as its low sound level and the absence of a fundamental frequency, make recognition difficult. Based on a production model of the whispered speech signal and its acoustic properties, this paper builds an isolated-word recognition system for whispered Chinese. Because whispered speech has a low signal-to-noise ratio, speech enhancement must first be applied; tone information is also exploited in the recognition system to improve performance. Experimental results show that MFCCs combined with the amplitude envelope can serve as feature parameters for automatic recognition of whispered Chinese; with HMM-based recognition on a small vocabulary, a recognition rate of 90.4% was obtained.

18.
Much research has explored how spoken word recognition is influenced by the architecture and dynamics of the mental lexicon (e.g., Luce and Pisoni, 1998; McClelland and Elman, 1986). A more recent question is whether the processes underlying word recognition are unique to the auditory domain, or whether visually perceived (lipread) speech may also be sensitive to the structure of the mental lexicon (Auer, 2002; Mattys, Bernstein, and Auer, 2002). The current research was designed to test the hypothesis that both aurally and visually perceived spoken words are isolated in the mental lexicon as a function of their modality-specific perceptual similarity to other words. Lexical competition (the extent to which perceptually similar words influence recognition of a stimulus word) was quantified using metrics that are well-established in the literature, as well as a statistical method for calculating perceptual confusability based on the phi-square statistic. Both auditory and visual spoken word recognition were influenced by modality-specific lexical competition as well as stimulus word frequency. These findings extend the scope of activation-competition models of spoken word recognition and reinforce the hypothesis (Auer, 2002; Mattys et al., 2002) that perceptual and cognitive properties underlying spoken word recognition are not specific to the auditory domain. In addition, the results support the use of the phi-square statistic as a better predictor of lexical competition than metrics currently used in models of spoken word recognition.
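One way to realize a phi-square confusability measure is as a chi-square statistic, normalized by the number of observations, between the response distributions (confusion-matrix rows) of two stimulus words. The exact formulation in the cited work may differ; this is a sketch under that assumption:

```python
def phi_square(dist_a, dist_b):
    """Phi-square distance between the response distributions of two
    stimulus words: chi-square over their 2-row contingency table,
    divided by the total observation count N.
    dist_a, dist_b: response counts over the same categories."""
    n_a, n_b = sum(dist_a), sum(dist_b)
    n = n_a + n_b
    total = 0.0
    for a, b in zip(dist_a, dist_b):
        col = a + b
        if col == 0:
            continue  # category never used; contributes nothing
        exp_a = n_a * col / n
        exp_b = n_b * col / n
        total += (a - exp_a) ** 2 / exp_a + (b - exp_b) ** 2 / exp_b
    return total / n
```

Identical response distributions give 0 (perceptually indistinguishable from the confusion data); fully disjoint distributions give the maximum of 1.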

19.
The influence of the precedence effect on word identification was investigated binaurally and monaurally with normal-hearing and hearing-impaired subjects. The Modified Rhyme Test was processed through a PDP-12 computer to produce delay times of 0, 5, 10, 20, 40, 80, or 160 ms. The sounds were reproduced in a room by two loudspeakers positioned at +/-30 degrees azimuth in front of the subject, at 50 dB SPL for the normal-hearing subjects and at the most comfortable level for the hearing-impaired subjects. A babble of eight voices was added to reduce scores about 15% below the best values measured in quiet. Binaural and monaural word identification remained constant over a range of delays from 0 to 20 ms and declined at longer delays for both groups of subjects. The shapes of the word-identification curves were explained by self-masking (an overlap of consonants with their own repetitions) and masking (an overlap of consonants with preceding vowels, or with preceding and following words in a sentence). Binaural responses for ten selected initial and final consonants showed various patterns of perception with delay. Some hearing-impaired subjects showed more deterioration in word identification than others, which might indicate that they experience more perceptual difficulty than normal listeners in places with reverberation or sound amplification.

20.
In mammals, individual distinctiveness in vocalizations provides the basis for individual recognition and thus plays an important role in social behavior. This study provides the first evidence, for a nocturnal primate, that variation in individual distinctiveness across the vocal repertoire is to some extent determined by the context and the acoustic structure of the call types. Individual distinctiveness was investigated across call types in the gray mouse lemur, a nocturnal primate living in a dispersed multi-male, multi-female social system. To explore to what degree context and acoustic structure predict variation in individual distinctiveness, four major call types were examined (grunts, tsaks, short whistles, and trills). The call types differed in context and acoustic structure and were recorded under controlled experimental conditions. A discriminant function analysis revealed that all call types are individually distinct, but not to the same degree. The findings suggest that variation in individual distinctiveness can to some extent be explained by the context and the acoustic structure of the call types.
