Similar Documents
20 similar documents retrieved.
1.
The Headturn Preference Paradigm was used to examine infants' use of prosodically conditioned acoustic-phonetic cues to find words in speech. Twelve-month-olds were familiarized to one passage containing an intended target (e.g., toga from toga#lore) and one passage containing an unintended target (e.g., dogma from dog#maligns). Infants were tested on the familiarized intended word (e.g., toga), familiarized unintended word (e.g., dogma), and two unfamiliar words. Infants listened longer to familiar intended words than to familiar unintended or unfamiliar words, demonstrating their use of word-level prosodically conditioned cues to segment words from speech. Implications for models of developmental speech perception are discussed.

2.
Detectability of words and nonwords in two kinds of noise
Recent models of speech perception emphasize the possibility of interactions among different processing levels. There is evidence that the lexical status of an utterance (i.e., whether it is a meaningful word or not) may influence earlier stages of perceptual analysis. To test how far down such "top-down" influences might penetrate, an investigation was conducted to determine whether there is a difference in detectability of words and nonwords masked by amplitude-modulated or unmodulated broadband noise. The results were negative, suggesting either that the stages of perceptual analysis engaged in the detection task are impermeable to lexical top-down effects, or that the lexical level was not sufficiently activated to have any facilitative effect on perception.

3.
4.
It has been suggested that pauses between words could act as indices of processes such as selection, retrieval, or planning that are required before an utterance is articulated. For normal meaningful phrase utterances, there is hardly any information regarding the relationship between articulation and pause duration and their subsequent relation to the final phrase duration. Such associations could provide insights into the mechanisms underlying the planning and execution of a vocal utterance. To execute a fluent vocal utterance, children might adopt different strategies over the course of development. We investigate this hypothesis by examining the roles of articulation time and pause duration in meaningful phrase utterances in 46 children between the ages of 4 and 8 years who are learning English as a second language. Our results indicate a significant reduction in phrase, word, and interword pause duration with increasing age. A comparison of pause, word, and phrase duration for individual subjects belonging to different age groups indicates a changing relationship between pause and word duration in the production of fluent speech. For the youngest children, a strong correlation between pause and word duration indicates local planning at the word level for speech production, and thus a greater dependence of pauses on the immediate word utterance. In contrast, for the oldest children, we find a significant drop in the correlation between word and pause duration, indicating the emergence of articulation and pause planning as two independent processes directed at producing a fluent utterance. Strong correlations between other temporal parameters indicate a more holistic approach being adopted by the older children for language production.

5.
Three investigations were conducted to determine the application of the articulation index (AI) to the prediction of speech performance of hearing-impaired subjects as well as of normal-hearing listeners. Speech performance was measured in quiet and in the presence of two interfering signals for items from the Speech Perception in Noise test in which target words are either highly predictable from contextual cues in the sentence or essentially contextually neutral. As expected, transfer functions relating the AI to speech performance were different depending on the type of contextual speech material. The AI transfer function for probability-high items rises steeply, much as for sentence materials, while the function for probability-low items rises more slowly, as for monosyllabic words. Different transfer functions were also found for tests conducted in quiet or white noise rather than in a babble background. A majority of the AI predictions for ten individuals with moderate sensorineural loss fell within +/- 2 standard deviations of normal listener performance for both quiet and babble conditions.
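As a rough illustration of the machinery this abstract invokes, the Python sketch below computes a simplified articulation index as an importance-weighted sum of per-band audibility and maps it through steeper and shallower transfer functions, mimicking the probability-high versus probability-low materials. The band-importance weights, the 30-dB audibility scaling, and the transfer-function slopes are illustrative assumptions, not values from the study.

```python
import numpy as np

def articulation_index(speech_db, noise_db, importance):
    """Simplified AI: importance-weighted sum of per-band audibility,
    where audibility is the band SNR clipped to [0, 30] dB and scaled
    to [0, 1] (a common textbook simplification)."""
    snr = np.asarray(speech_db, float) - np.asarray(noise_db, float)
    audibility = np.clip(snr, 0.0, 30.0) / 30.0
    return float(np.dot(importance, audibility))

def transfer_function(ai, slope):
    """Map AI to predicted proportion correct; a steeper slope mimics
    high-context (probability-high) materials."""
    return 1.0 - 10.0 ** (-slope * ai)

# Toy example: five frequency bands with equal importance weights.
importance = np.full(5, 1 / 5)
speech = [60, 62, 58, 55, 50]   # band levels in dB
babble = [50, 55, 57, 54, 49]
ai = articulation_index(speech, babble, importance)
print(f"AI = {ai:.2f}")
print(f"predicted score, high-context items: {transfer_function(ai, 6.0):.2f}")
print(f"predicted score, low-context items:  {transfer_function(ai, 2.0):.2f}")
```

With the same AI, the steep function predicts much higher scores than the shallow one, which is the qualitative pattern the abstract reports for probability-high versus probability-low items.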

6.
Much research has explored how spoken word recognition is influenced by the architecture and dynamics of the mental lexicon (e.g., Luce and Pisoni, 1998; McClelland and Elman, 1986). A more recent question is whether the processes underlying word recognition are unique to the auditory domain, or whether visually perceived (lipread) speech may also be sensitive to the structure of the mental lexicon (Auer, 2002; Mattys, Bernstein, and Auer, 2002). The current research was designed to test the hypothesis that both aurally and visually perceived spoken words are isolated in the mental lexicon as a function of their modality-specific perceptual similarity to other words. Lexical competition (the extent to which perceptually similar words influence recognition of a stimulus word) was quantified using metrics that are well-established in the literature, as well as a statistical method for calculating perceptual confusability based on the phi-square statistic. Both auditory and visual spoken word recognition were influenced by modality-specific lexical competition as well as stimulus word frequency. These findings extend the scope of activation-competition models of spoken word recognition and reinforce the hypothesis (Auer, 2002; Mattys et al., 2002) that perceptual and cognitive properties underlying spoken word recognition are not specific to the auditory domain. In addition, the results support the use of the phi-square statistic as a better predictor of lexical competition than metrics currently used in models of spoken word recognition.
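Since the abstract leans on the phi-square statistic as a confusability measure, here is a minimal sketch of one plausible formulation: phi-square computed over two rows of a phoneme confusion matrix (chi-square divided by the total count). The confusion counts are invented, and the study's full pipeline, which aggregates segment-level similarity into word-level competition metrics, is not reproduced.

```python
import numpy as np

def phi_square(row_a, row_b):
    """Phi-square between two response distributions: the chi-square
    statistic of the 2 x k contingency table, normalized by N.
    0 = identical distributions (maximally confusable stimuli);
    larger values = more distinct responses."""
    table = np.vstack([row_a, row_b]).astype(float)
    n = table.sum()
    expected = table.sum(axis=1, keepdims=True) * table.sum(axis=0, keepdims=True) / n
    mask = expected > 0
    chi2 = (((table - expected) ** 2)[mask] / expected[mask]).sum()
    return chi2 / n

# Toy confusion counts: responses to /b/ and /p/ over four response categories.
b_row = np.array([80, 15, 3, 2])
p_row = np.array([20, 70, 6, 4])
print(f"phi-square(/b/, /p/) = {phi_square(b_row, p_row):.3f}")
```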

7.
Word frequency in a document has often been utilized in text searching and summarization. Similarly, identifying frequent words or phrases in a speech data set for searching and summarization would also be meaningful. However, obtaining word frequency in a speech data set is difficult, because frequent words are often special terms in the speech and cannot be recognized by a general speech recognizer. This paper proposes another approach that is effective for automatic extraction of such frequent word sections in a speech data set. The proposed method is applicable to any domain of monologue speech, because no language models or specific terms are required in advance. The extracted sections can be regarded as speech labels of some kind or a digest of the speech presentation. The frequent word sections are determined by detecting similar sections, which are sections of audio data that represent the same word or phrase. The similar sections are detected by an efficient algorithm, called Shift Continuous Dynamic Programming (Shift CDP), which realizes fast matching between arbitrary sections in the reference speech pattern and those in the input speech, and enables frame-synchronous extraction of similar sections. In experiments, the algorithm is applied to extract the repeated sections in oral presentation speeches recorded in academic conferences in Japan. The results show that Shift CDP successfully detects similar sections and identifies the frequent word sections in individual presentation speeches, without prior domain knowledge, such as language models and terms.
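To give a concrete feel for the matching step, the sketch below implements plain continuous dynamic programming word spotting: it scores how well a reference feature section matches a warped path ending at each frame of a longer input. This is a textbook simplification, not the authors' Shift CDP, which additionally shifts the reference start frame so that arbitrary pairs of sections can be compared frame-synchronously; the features and sizes are made up.

```python
import numpy as np

def cdp_spot(reference, stream):
    """Normalized cost of matching `reference` to a warped path ending at
    each frame of `stream` (lower = better); the match may start anywhere."""
    R, T = len(reference), len(stream)
    # Local frame-to-frame Euclidean distances, shape (R, T).
    local = np.linalg.norm(reference[:, None, :] - stream[None, :, :], axis=2)
    D = np.full((R, T), np.inf)
    D[0] = local[0]                      # free start: begin at any input frame
    for i in range(1, R):
        for t in range(1, T):
            D[i, t] = local[i, t] + min(D[i - 1, t - 1], D[i - 1, t], D[i, t - 1])
    return D[-1] / R                     # cost normalized by reference length

rng = np.random.default_rng(0)
word = rng.normal(size=(20, 13))         # stand-in for 13-dim MFCC frames
stream = rng.normal(size=(200, 13))
stream[120:140] = word                   # plant a repetition of the "word"
cost = cdp_spot(word, stream)
print("best match ends at frame", int(np.argmin(cost)))   # expect ~139
```

Thresholding the per-frame cost, rather than taking a single argmin, is what lets this family of algorithms report every similar section in the input.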

8.
The speech signal contains many acoustic properties that may contribute differently to spoken word recognition. Previous studies have demonstrated that the importance of properties present during consonants or vowels depends on the linguistic context (i.e., words versus sentences). The current study investigated three potentially informative acoustic properties that are present during consonants and vowels in monosyllabic words and sentences. Natural variations in fundamental frequency were either flattened or removed. The speech envelope and temporal fine structure were also investigated by limiting the availability of these cues via noisy signal extraction. Thus, this study investigated the contribution of these acoustic properties, present during either consonants or vowels, to overall word and sentence intelligibility. Results demonstrated that all processing conditions displayed better performance for vowel-only sentences, and this vowel advantage persisted even when dynamic fundamental-frequency cues were removed. Comparisons between words and sentences suggest that the speech information transmitted by the envelope is responsible, in part, for the greater vowel contributions in sentences, but is not predictive for isolated words.
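The envelope/fine-structure distinction invoked here is conventionally made with the Hilbert transform; a minimal sketch follows. The study's noisy-signal-extraction procedure differs in detail, so treat this as the textbook decomposition rather than the authors' exact method.

```python
import numpy as np
from scipy.signal import hilbert

fs = 16000
t = np.arange(fs) / fs
# A 1-kHz carrier with a slow 4-Hz amplitude modulation, standing in
# for one auditory filter band of a speech signal.
band = np.sin(2 * np.pi * 1000 * t) * (1 + 0.8 * np.sin(2 * np.pi * 4 * t))

analytic = hilbert(band)
envelope = np.abs(analytic)          # slow amplitude modulation (ENV)
tfs = np.cos(np.angle(analytic))     # unit-amplitude carrier (TFS)
recombined = envelope * tfs          # reconstructs the original band exactly
print(f"max reconstruction error: {np.max(np.abs(recombined - band)):.2e}")
```

Replacing `tfs` with filtered noise while keeping `envelope` yields envelope-only speech; keeping `tfs` and flattening `envelope` isolates the fine structure.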

9.
This paper presents a statistical analysis of the dynamic and static distributional properties of initials (shengmu) and finals (yunmu) in written Mandarin and of the differences between them. The results show that the relative dynamic and static distributions among Mandarin initials, and likewise among finals, are consistent with each other; the relative distribution of speech sounds is determined mainly by the articulatory system and is not affected by frequency of use. The differences between the dynamic and static occurrence rates of Mandarin initials and finals are related to the manner of articulation of the initials and the combinatorial structure of the finals, to the co-occurrence constraints between the place of articulation of initials and the four classes (sihu) of finals, and to the syllable-to-character rate and character frequency; the difference is greatest between aspirated and unaspirated initials, and the dynamic occurrence of finals in polysyllabic words…

10.
In VCV nonsense forms (such as /ɛdɛ/), while both the CV transition and the VC transition are perceptible in isolation, the CV transition dominates identification of the stop consonant. Thus, the question arises: what role, if any, do VC transitions play in word perception? Stimuli were two-syllable English words in which the medial consonant was either a stop or a fricative (e.g., "feeding" and "gravy"). Each word was constructed in three ways: (1) the VC transition was incompatible with the CV in place, manner of articulation, or both; (2) the VC transition was eliminated and the steady-state portion of the first vowel was substituted in its place; and (3) the original word. All versions of a particular word were identical with respect to duration, pitch contour, and amplitude envelope. While an intelligibility test revealed no differences among the three conditions, data from a paired-comparison preference task and an unspeeded lexical decision task indicated that incompatible VC transitions hindered word perception, but lack of VC transitions did not. However, there were clear differences among the three conditions in the speeded lexical decision task for word stimuli, but not for nonword stimuli constructed in an analogous fashion. We discuss the use of lexical tasks for speech quality assessment and possible processes by which listeners recognize spoken words.

11.
This study investigates whether the mora is used in controlling timing in Japanese speech, or is instead a structural unit in the language not involved in timing. Unlike most previous studies of mora-timing in Japanese, this article investigates timing in spontaneous speech. Predictability of word duration from number of moras is found to be much weaker than in careful speech. Furthermore, the number of moras predicts word duration only slightly better than number of segments. Syllable structure also has a significant effect on word duration. Finally, comparison of the predictability of whole words and arbitrarily truncated words shows better predictability for truncated words, which would not be possible if the truncated portion were compensating for remaining moras. The results support an accumulative model of variance with a final lengthening effect, and do not indicate the presence of any compensation related to mora-timing. It is suggested that the rhythm of Japanese derives from several factors about the structure of the language, not from durational compensation.
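As a hedged illustration of the comparison the abstract describes, the sketch below asks how much variance in word duration mora count explains relative to segment count, using ordinary least squares on synthetic data (the study used corpus measurements of spontaneous speech).

```python
import numpy as np

def r_squared(x, y):
    """R^2 of a simple least-squares fit y ~ a*x + b."""
    A = np.column_stack([x, np.ones(len(x))])
    coef, *_ = np.linalg.lstsq(A.astype(float), y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(1)
n = 300
moras = rng.integers(1, 7, n)
segments = moras + rng.integers(0, 3, n)      # CVC, CyV, etc. add segments
# Synthetic durations (ms) in which both units contribute, plus noise.
duration = 80 * segments + 40 * moras + rng.normal(0, 60, n)

print(f"R^2 from mora count:    {r_squared(moras, duration):.3f}")
print(f"R^2 from segment count: {r_squared(segments, duration):.3f}")
```

A strictly mora-timed language would put nearly all the predictive power in the mora count; the abstract reports only a slight advantage over segment count, which is the pattern this kind of comparison is designed to expose.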

12.
This study examined whether cochlear implant users must perceive differences along phonetic continua in the same way as do normal hearing listeners (i.e., sharp identification functions, poor within-category sensitivity, high between-category sensitivity) in order to recognize speech accurately. Adult postlingually deafened cochlear implant users, who were heterogeneous in terms of their implants and processing strategies, were tested on two phonetic perception tasks using a synthetic /da/-/ta/ continuum (phoneme identification and discrimination) and two speech recognition tasks using natural recordings from ten talkers (open-set word recognition and forced-choice /d/-/t/ recognition). Cochlear implant users tended to have identification boundaries and sensitivity peaks at voice onset times (VOT) that were longer than found for normal-hearing individuals. Sensitivity peak locations were significantly correlated with individual differences in cochlear implant performance; individuals who had a /d/-/t/ sensitivity peak near normal-hearing peak locations were most accurate at recognizing natural recordings of words and syllables. However, speech recognition was not strongly related to identification boundary locations or to overall levels of discrimination performance. The results suggest that perceptual sensitivity affects speech recognition accuracy, but that many cochlear implant users are able to accurately recognize speech without having typical normal-hearing patterns of phonetic perception.
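For readers unfamiliar with how identification boundaries are estimated, here is a minimal sketch: fit a logistic (logit-linear) identification function to proportion-/ta/ responses along a synthetic VOT continuum and read off the 50% boundary. The response proportions are invented.

```python
import numpy as np

vot = np.arange(0, 65, 5)                        # VOT steps in ms
p_ta = np.array([.02, .03, .05, .08, .15, .30,   # proportion of "ta" responses
                 .55, .78, .90, .95, .97, .98, .99])

# Logit-linear fit: log(p / (1 - p)) = slope * VOT + intercept.
logit = np.log(p_ta / (1 - p_ta))
slope, intercept = np.polyfit(vot, logit, 1)
boundary = -intercept / slope                    # VOT where p = 0.5
print(f"slope = {slope:.3f} logits/ms, boundary = {boundary:.1f} ms VOT")
```

In the abstract's terms, many implant users would show a boundary (and a discrimination peak) at longer VOTs than normal-hearing listeners; it was the peak location, not the boundary location or overall discrimination level, that correlated with word recognition.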

13.
Noise vocoding was used to investigate the ability of younger and older adults with normal audiometric thresholds in the speech range to use amplitude envelope cues to identify words. In Experiment 1, four 50-word lists were tested, with each word presented initially with one frequency band and the number of bands incremented until the listener identified the word correctly. Both age groups required an average of 5.25 bands for 50% correct word identification, and performance improved across the four lists. In Experiment 2, the same participants who completed Experiment 1 identified words in four blocked noise-vocoded conditions (16, 8, 4, 2 bands). Compared to Experiment 1, both age groups required more bands to reach the 50% correct word identification threshold in Experiment 2 (6.13 and 8.55 bands, respectively), with younger adults outperforming older adults. Experiment 3 was identical to Experiment 2 except that the participants had no prior experience with noise-vocoded speech. Again, younger adults outperformed older adults, with thresholds of 6.67 and 8.97 bands, respectively. The finding of age effects in Experiments 2 and 3, but not in Experiment 1, seems more likely to be related to differences in the presentation methods than to experience with noise vocoding.
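Because noise vocoding is the central manipulation in all three experiments, a compact sketch of a standard N-band noise vocoder may help: split the signal into log-spaced bands, extract each band's amplitude envelope, and use it to modulate band-limited noise. The corner frequencies, filter orders, and 30-Hz envelope cutoff are illustrative choices, not the study's parameters.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(signal, fs, n_bands, f_lo=100.0, f_hi=7000.0, env_cut=30.0):
    """N-band noise vocoder: per-band envelope times band-limited noise."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)       # log-spaced band edges
    env_sos = butter(2, env_cut, btype="low", fs=fs, output="sos")
    noise = np.random.default_rng(0).standard_normal(len(signal))
    out = np.zeros(len(signal))
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, signal)            # analysis band
        env = sosfiltfilt(env_sos, np.abs(band))        # rectify + low-pass
        carrier = sosfiltfilt(band_sos, noise)          # band-limited noise
        out += np.clip(env, 0.0, None) * carrier
    return out

fs = 16000
t = np.arange(fs) / fs
demo = np.sin(2 * np.pi * 220 * t) * (0.5 + 0.5 * np.sin(2 * np.pi * 3 * t))
vocoded = noise_vocode(demo, fs, n_bands=8)   # cf. the 2-16 band conditions
print(vocoded.shape)
```

Fewer bands preserve less spectral detail while keeping the temporal envelope, which is why the band count serves as the difficulty knob in these experiments.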

14.
Pitch accent in spoken-word recognition in Japanese
Three experiments addressed the question of whether pitch-accent information may be exploited in the process of recognizing spoken words in Tokyo Japanese. In a two-choice classification task, listeners judged from which of two words, differing in accentual structure, isolated syllables had been extracted (e.g., ka from baka HL or gaka LH); most judgments were correct, and listeners' decisions were correlated with the fundamental frequency characteristics of the syllables. In a gating experiment, listeners heard initial fragments of words and guessed what the words were; their guesses overwhelmingly had the same initial accent structure as the gated word even when only the beginning CV of the stimulus (e.g., na- from nagasa HLL or nagashi LHH) was presented. In addition, listeners were more confident in guesses with the same initial accent structure as the stimulus than in guesses with different accent. In a lexical decision experiment, responses to spoken words (e.g., ame HL) were speeded by previous presentation of the same word (e.g., ame HL) but not by previous presentation of a word differing only in accent (e.g., ame LH). Together these findings provide strong evidence that accentual information constrains the activation and selection of candidates for spoken-word recognition.

15.
Previous research shows that listeners are sensitive to talker differences in phonetic properties of speech, including voice-onset-time (VOT) in word-initial voiceless stop consonants, and that learning how a talker produces one voiceless stop transfers to another word with the same voiceless stop [Allen, J. S., and Miller, J. L. (2004). J. Acoust. Soc. Am. 115, 3171-3183]. The present experiments examined whether transfer extends to words that begin with different voiceless stops. During training, listeners heard two talkers produce a given voiceless-initial word (e.g., pain). VOTs were manipulated such that one talker produced the voiceless stop with relatively short VOTs and the other with relatively long VOTs. At test, listeners heard a short- and long-VOT variant of the same word (e.g., pain) or a word beginning with a different voiceless stop (e.g., cane or coal), and were asked to select which of the two VOT variants was most representative of a given talker. In all conditions, which variant was selected at test was in line with listeners' exposure during training, and the effect was equally strong for the novel word and the training word. These findings suggest that accommodating talker-specific phonetic detail does not require exposure to each individual phonetic segment.

16.
Three experiments were conducted to examine the effects of trial-to-trial variations in speaking style, fundamental frequency, and speaking rate on identification of spoken words. In addition, the experiments investigated whether any effects of stimulus variability would be modulated by phonetic confusability (i.e., lexical difficulty). In Experiment 1, trial-to-trial variations in speaking style reduced the overall identification performance compared with conditions containing no speaking-style variability. In addition, the effects of variability were greater for phonetically confusable words than for phonetically distinct words. In Experiment 2, variations in fundamental frequency were found to have no significant effects on spoken word identification and did not interact with lexical difficulty. In Experiment 3, two different methods for varying speaking rate were found to have equivalent negative effects on spoken word recognition and similar interactions with lexical difficulty. Overall, the findings are consistent with a phonetic-relevance hypothesis, in which accommodating sources of acoustic-phonetic variability that affect phonetically relevant properties of speech signals can impair spoken word identification. In contrast, variability in parameters of the speech signal that do not affect phonetically relevant properties are not expected to affect overall identification performance. Implications of these findings for the nature and development of lexical representations are discussed.

17.
This study investigates the effects of lexical frequency on the durational reduction of morphologically complex words in spoken Dutch. The hypothesis that high-frequency words are more reduced than low-frequency words was tested by comparing the durations of affixes occurring in different carrier words. Four Dutch affixes were investigated, each occurring in a large number of words with different frequencies. The materials came from a large database of face-to-face conversations. For each word containing a target affix, one token was randomly selected for acoustic analysis. Measurements were made of the duration of the affix as a whole and the durations of the individual segments in the affix. For three of the four affixes, a higher frequency of the carrier word led to shorter realizations of the affix as a whole, individual segments in the affix, or both. Other relevant factors were the sex and age of the speaker, segmental context, and speech rate. To accommodate for these findings, models of speech production should allow word frequency to affect the acoustic realizations of lower-level units, such as individual speech sounds occurring in affixes.
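A hedged sketch of the kind of effect estimation described here: regress affix duration on log carrier-word frequency while controlling for speech rate, via ordinary least squares. The data are synthetic; the study measured one randomly selected conversational token per carrier word and also considered speaker sex, age, and segmental context.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
log_freq = rng.uniform(0, 12, n)                  # log carrier-word frequency
rate = rng.normal(5.0, 1.0, n)                    # syllables per second
# Synthetic affix durations (ms): shorter for frequent words and fast speech.
affix_ms = 180 - 4.0 * log_freq - 8.0 * rate + rng.normal(0, 15, n)

X = np.column_stack([np.ones(n), log_freq, rate])
beta, *_ = np.linalg.lstsq(X, affix_ms, rcond=None)
print(f"intercept: {beta[0]:.1f} ms")
print(f"frequency effect: {beta[1]:.2f} ms per log unit")
print(f"rate effect: {beta[2]:.2f} ms per syllable/s")
```

A reliably negative frequency coefficient, after the covariates are accounted for, is the signature the study reports for three of the four affixes.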

18.
This article describes a model in which the acoustic speech signal is processed to yield a discrete representation of the speech stream in terms of a sequence of segments, each of which is described by a set (or bundle) of binary distinctive features. These distinctive features specify the phonemic contrasts that are used in the language, such that a change in the value of a feature can potentially generate a new word. This model is a part of a more general model that derives a word sequence from this feature representation, the words being represented in a lexicon by sequences of feature bundles. The processing of the signal proceeds in three steps: (1) Detection of peaks, valleys, and discontinuities in particular frequency ranges of the signal leads to identification of acoustic landmarks. The type of landmark provides evidence for a subset of distinctive features called articulator-free features (e.g., [vowel], [consonant], [continuant]). (2) Acoustic parameters are derived from the signal near the landmarks to provide evidence for the actions of particular articulators, and acoustic cues are extracted by sampling selected attributes of these parameters in these regions. The selection of cues that are extracted depends on the type of landmark and on the environment in which it occurs. (3) The cues obtained in step (2) are combined, taking context into account, to provide estimates of "articulator-bound" features associated with each landmark (e.g., [lips], [high], [nasal]). These articulator-bound features, combined with the articulator-free features in (1), constitute the sequence of feature bundles that forms the output of the model. Examples of cues that are used, and justification for this selection, are given, as well as examples of the process of inferring the underlying features for a segment when there is variability in the signal due to enhancement gestures (recruited by a speaker to make a contrast more salient) or due to overlap of gestures from neighboring segments.
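Step (1) of the model is the most mechanical, so a minimal sketch may help: find candidate landmarks as abrupt rises and falls in frame-level log energy within one band. The thresholds, frame size, and single-band simplification are illustrative; the model tracks several frequency ranges and distinguishes landmark types.

```python
import numpy as np

def energy_landmarks(signal, fs, frame_ms=10.0, jump_db=9.0):
    """Frame indices where log energy jumps by more than jump_db between
    adjacent frames: abrupt onsets and offsets, the raw material for
    consonantal landmark detection."""
    hop = int(fs * frame_ms / 1000)
    n = len(signal) // hop
    frames = signal[: n * hop].reshape(n, hop)
    log_e = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-10)
    delta = np.diff(log_e)
    onsets = np.where(delta > jump_db)[0] + 1
    offsets = np.where(delta < -jump_db)[0] + 1
    return onsets, offsets

fs = 16000
silence = np.zeros(fs // 4)
vowel = 0.3 * np.sin(2 * np.pi * 150 * np.arange(fs // 2) / fs)
signal = np.concatenate([silence, vowel, silence])    # silence-vowel-silence
onsets, offsets = energy_landmarks(signal, fs)
print("onset frames:", onsets, "offset frames:", offsets)
```

In the full model, each detected landmark then gates which acoustic cues are sampled in step (2) and which articulator-bound features are estimated in step (3).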

19.
Can native listeners rapidly adapt to suprasegmental mispronunciations in foreign-accented speech? To address this question, an exposure-test paradigm was used to test whether Dutch listeners can improve their understanding of non-canonical lexical stress in Hungarian-accented Dutch. During exposure, one group of listeners heard a Dutch story with only initially stressed words, whereas another group also heard 28 words with canonical second-syllable stress (e.g., EEKhorn "squirrel" was replaced by koNIJN "rabbit"; capitals indicate stress). The 28 words, however, were non-canonically marked by the Hungarian speaker with high pitch and amplitude on the initial syllable, both of which are stress cues in Dutch. After exposure, listeners' eye movements were tracked to Dutch target-competitor pairs with segmental overlap but different stress patterns, while they listened to new words from the same Hungarian speaker (e.g., HERsens "brain," herSTEL "recovery"). Listeners who had previously heard non-canonically produced words distinguished target-competitor pairs better than listeners who had only been exposed to Hungarian accent with canonical forms of lexical stress. Even a short exposure thus allows listeners to tune into speaker-specific realizations of words' suprasegmental make-up, and to use this information for word recognition.

20.
An eye-tracking experiment examined contextual flexibility in speech processing in response to distortions in spoken input. Dutch participants heard Dutch sentences containing critical words and saw four-picture displays. The name of one picture either had the same onset phonemes as the critical word or had a different first phoneme and rhymed. Participants fixated on onset-overlap more than rhyme-overlap pictures, but this tendency varied with speech quality. Relative to a baseline with noise-free sentences, participants looked less at onset-overlap and more at rhyme-overlap pictures when phonemes in the sentences (but not in the critical words) were replaced by noises like those heard on a badly tuned AM radio. The position of the noises (word-initial or word-medial) had no effect. Noises elsewhere in the sentences apparently made evidence about the critical word less reliable: Listeners became less confident of having heard the onset-overlap name but also less sure of having not heard the rhyme-overlap name. The same acoustic information has different effects on spoken-word recognition as the probability of distortion changes.
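The closing claim, that the same acoustic information is weighted differently as the probability of distortion changes, can be made concrete with a toy Bayesian sketch. All probabilities below are invented for illustration; the abstract reports the behavioral pattern, not this model.

```python
def posterior_onset(p_distort, p_match_intended=0.95, p_match_other=0.05):
    """P(onset-overlap word | heard onset) for two equiprobable candidates.
    With probability p_distort the onset was replaced by noise, in which
    case the evidence is uninformative (likelihood 0.5 for either word)."""
    like_onset = (1 - p_distort) * p_match_intended + p_distort * 0.5
    like_rhyme = (1 - p_distort) * p_match_other + p_distort * 0.5
    return like_onset / (like_onset + like_rhyme)

for p in (0.0, 0.2, 0.5):
    print(f"P(distortion) = {p:.1f} -> "
          f"P(onset word | evidence) = {posterior_onset(p):.2f}")
```

As the assumed distortion probability rises, confidence in the onset-overlap word falls and the rhyme-overlap word is no longer ruled out, mirroring the fixation pattern the experiment reports.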
