期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Listener sensitivity to individual talker differences in voice-onset-time

Allen JS Miller JL 《The Journal of the Acoustical Society of America》2004,115(6):3171-3183

Recent findings in the domains of word and talker recognition reveal that listeners use previous experience with an individual talker's voice to facilitate subsequent perceptual processing of that talker's speech. These findings raise the possibility that listeners are sensitive to talker-specific acoustic-phonetic properties. The present study tested this possibility directly by examining listeners' sensitivity to talker differences in the voice-onset-time (VOT) associated with a word-initial voiceless stop consonant. Listeners were trained on the speech of two talkers. Speech synthesis was used to manipulate the VOTs of these talkers so that one had short VOTs and the other had long VOTs (counterbalanced across listeners). The results of two experiments using a paired-comparison task revealed that, when presented with a short- versus long-VOT variant of a given talker's speech, listeners could select the variant consistent with their experience of that talker's speech during training. This was true when listeners were tested on the same word heard during training and when they were tested on a different word spoken by the same talker, indicating that listeners generalized talker-specific VOT information to a novel word. Such sensitivity to talker-specific acoustic-phonetic properties may subserve at least in part listeners' capacity to benefit from talker-specific experience. 相似文献

2.

Effects of cross-language voice training on speech perception: whose familiar voices are more intelligible?

Levi SV Winters SJ Pisoni DB 《The Journal of the Acoustical Society of America》2011,130(6):4053-4062

Previous research has shown that familiarity with a talker's voice can improve linguistic processing (herein, "Familiar Talker Advantage"), but this benefit is constrained by the context in which the talker's voice is familiar. The current study examined how familiarity affects intelligibility by manipulating the type of talker information available to listeners. One group of listeners learned to identify bilingual talkers' voices from English words, where they learned language-specific talker information. A second group of listeners learned the same talkers from German words, and thus only learned language-independent talker information. After voice training, both groups of listeners completed a word recognition task with English words produced by both familiar and unfamiliar talkers. Results revealed that English-trained listeners perceived more phonemes correct for familiar than unfamiliar talkers, while German-trained listeners did not show improved intelligibility for familiar talkers. The absence of a processing advantage in speech intelligibility for the German-trained listeners demonstrates limitations on the Familiar Talker Advantage, which crucially depends on the language context in which the talkers' voices were learned; knowledge of how a talker produces linguistically relevant contrasts in a particular language is necessary to increase speech intelligibility for words produced by familiar talkers. 相似文献

3.

Extrinsic context affects perceptual normalization of lexical tone

Francis AL Ciocca V Wong NK Leung WH Chu PC 《The Journal of the Acoustical Society of America》2006,119(3):1712-1726

The present study explores the use of extrinsic context in perceptual normalization for the purpose of identifying lexical tones in Cantonese. In each of four experiments, listeners were presented with a target word embedded in a semantically neutral sentential context. The target word was produced with a mid level tone and it was never modified throughout the study, but on any given trial the fundamental frequency of part or all of the context sentence was raised or lowered to varying degrees. The effect of perceptual normalization of tone was quantified as the proportion of non-mid level responses given in F0-shifted contexts. Results showed that listeners' tonal judgments (i) were proportional to the degree of frequency shift, (ii) were not affected by non-pitch-related differences in talker, (iii) and were affected by the frequency of both the preceding and following context, although (iv) following context affected tonal decisions more strongly than did preceding context. These findings suggest that perceptual normalization of lexical tone may involve a "moving window" or "running average" type of mechanism, that selectively weights more recent pitch information over older information, but does not depend on the perception of a single voice. 相似文献

4.

Effects of fundamental frequency and vocal-tract length changes on attention to one of two simultaneous talkers

Darwin CJ Brungart DS Simpson BD 《The Journal of the Acoustical Society of America》2003,114(5):2913-2922

Three experiments used the Coordinated Response Measure task to examine the roles that differences in F0 and differences in vocal-tract length have on the ability to attend to one of two simultaneous speech signals. The first experiment asked how increases in the natural F0 difference between two sentences (originally spoken by the same talker) affected listeners' ability to attend to one of the sentences. The second experiment used differences in vocal-tract length, and the third used both F0 and vocal-tract length differences. Differences in F0 greater than 2 semitones produced systematic improvements in performance. Differences in vocal-tract length produced systematic improvements in performance when the ratio of lengths was 1.08 or greater, particularly when the shorter vocal tract belonged to the target talker. Neither of these manipulations produced improvements in performance as great as those produced by a different-sex talker. Systematic changes in both F0 and vocal-tract length that simulated an incremental shift in gender produced substantially larger improvements in performance than did differences in F0 or vocal-tract length alone. In general, shifting one of two utterances spoken by a female voice towards a male voice produces a greater improvement in performance than shifting male towards female. The increase in performance varied with the intonation patterns of individual talkers, being smallest for those talkers who showed most variability in their intonation patterns between different utterances. 相似文献

5.

Specification of cross-modal source information in isolated kinematic displays of speech

Lachs L Pisoni DB 《The Journal of the Acoustical Society of America》2004,116(1):507-518

Information about the acoustic properties of a talker's voice is available in optical displays of speech, and vice versa, as evidenced by perceivers' ability to match faces and voices based on vocal identity. The present investigation used point-light displays (PLDs) of visual speech and sinewave replicas of auditory speech in a cross-modal matching task to assess perceivers' ability to match faces and voices under conditions when only isolated kinematic information about vocal tract articulation was available. These stimuli were also used in a word recognition experiment under auditory-alone and audiovisual conditions. The results showed that isolated kinematic displays provide enough information to match the source of an utterance across sensory modalities. Furthermore, isolated kinematic displays can be integrated to yield better word recognition performance under audiovisual conditions than under auditory-alone conditions. The results are discussed in terms of their implications for describing the nature of speech information and current theories of speech perception and spoken word recognition. 相似文献

6.

Temporal scales of auditory objects underlying birdsong vocal recognition

Gentner TQ 《The Journal of the Acoustical Society of America》2008,124(2):1350-1359

Vocal recognition is common among songbirds, and provides an excellent model system to study the perceptual and neurobiological mechanisms for processing natural vocal communication signals. Male European starlings, a species of songbird, learn to recognize the songs of multiple conspecific males by attending to stereotyped acoustic patterns, and these learned patterns elicit selective neuronal responses in auditory forebrain neurons. The present study investigates the perceptual grouping of spectrotemporal acoustic patterns in starling song at multiple temporal scales. The results show that permutations in sequencing of submotif acoustic features have significant effects on song recognition, and that these effects are specific to songs that comprise learned motifs. The observations suggest that (1) motifs form auditory objects embedded in a hierarchy of acoustic patterns, (2) that object-based song perception emerges without explicit reinforcement, and (3) that multiple temporal scales within the acoustic pattern hierarchy convey information about the individual identity of the singer. The authors discuss the results in the context of auditory object formation and talker recognition. 相似文献

7.

Spatial release from energetic and informational masking in a divided speech identification task

Ihlefeld A Shinn-Cunningham B 《The Journal of the Acoustical Society of America》2008,123(6):4380-4392

When listening selectively to one talker in a two-talker environment, performance generally improves with spatial separation of the sources. The current study explores the role of spatial separation in divided listening, when listeners reported both of two simultaneous messages processed to have little spectral overlap (limiting "energetic masking" between the messages). One message was presented at a fixed level, while the other message level varied from equal to 40 dB less than that of the fixed-level message. Results demonstrate that spatial separation of the competing messages improved divided-listening performance. Most errors occurred because listeners failed to report the content of the less-intense talker. Moreover, performance generally improved as the broadband energy ratio of the variable-level to the fixed-level talker increased. The error patterns suggest that spatial separation improves the intelligibility of the less-intense talker by improving the ability to (1) hear portions of the signal that would otherwise be masked, (2) segregate the two talkers properly into separate perceptual streams, and (3) selectively focus attention on the less-intense talker. Spatial configuration did not noticeably affect the ability to report the more-intense talker, suggesting that it was processed differently than the less-intense talker, which was actively attended. 相似文献

8.

Effect of stimulus bandwidth on the perception of /s/ in normal- and hearing-impaired children and adults.

P G Stelmachowicz A L Pittman B M Hoover D E Lewis 《The Journal of the Acoustical Society of America》2001,110(4):2183-2190

Recent studies with adults have suggested that amplification at 4 kHz and above fails to improve speech recognition and may even degrade performance when high-frequency thresholds exceed 50-60 dB HL. This study examined the extent to which high frequencies can provide useful information for fricative perception for normal-hearing and hearing-impaired children and adults. Eighty subjects (20 per group) participated. Nonsense syllables containing the phonemes /s/, /f/, and /O/, produced by a male, female, and child talker, were low-pass filtered at 2, 3, 4, 5, 6, and 9 kHz. Frequency shaping was provided for the hearing-impaired subjects only. Results revealed significant differences in recognition between the four groups of subjects. Specifically, both groups of children performed more poorly than their adult counterparts at similar bandwidths. Likewise, both hearing-impaired groups performed more poorly than their normal-hearing counterparts. In addition, significant talker effects for /s/ were observed. For the male talker, optimum performance was reached at a bandwidth of approximately 4-5 kHz, whereas optimum performance for the female and child talkers did not occur until a bandwidth of 9 kHz. 相似文献

9.

Sizing up the competition: quantifying the influence of the mental lexicon on auditory and visual spoken word recognition

Strand JF Sommers MS 《The Journal of the Acoustical Society of America》2011,130(3):1663-1672

Much research has explored how spoken word recognition is influenced by the architecture and dynamics of the mental lexicon (e.g., Luce and Pisoni, 1998; McClelland and Elman, 1986). A more recent question is whether the processes underlying word recognition are unique to the auditory domain, or whether visually perceived (lipread) speech may also be sensitive to the structure of the mental lexicon (Auer, 2002; Mattys, Bernstein, and Auer, 2002). The current research was designed to test the hypothesis that both aurally and visually perceived spoken words are isolated in the mental lexicon as a function of their modality-specific perceptual similarity to other words. Lexical competition (the extent to which perceptually similar words influence recognition of a stimulus word) was quantified using metrics that are well-established in the literature, as well as a statistical method for calculating perceptual confusability based on the phi-square statistic. Both auditory and visual spoken word recognition were influenced by modality-specific lexical competition as well as stimulus word frequency. These findings extend the scope of activation-competition models of spoken word recognition and reinforce the hypothesis (Auer, 2002; Mattys et al., 2002) that perceptual and cognitive properties underlying spoken word recognition are not specific to the auditory domain. In addition, the results support the use of the phi-square statistic as a better predictor of lexical competition than metrics currently used in models of spoken word recognition. 相似文献

10.

The role of perceived spatial separation in the unmasking of speech 总被引：12，自引：0，他引：12

Freyman RL Helfer KS McCall DD Clifton RK 《The Journal of the Acoustical Society of America》1999,106(6):3578-3588

Spatial separation of speech and noise in an anechoic space creates a release from masking that often improves speech intelligibility. However, the masking release is severely reduced in reverberant spaces. This study investigated whether the distinct and separate localization of speech and interference provides any perceptual advantage that, due to the precedence effect, is not degraded by reflections. Listeners' identification of nonsense sentences spoken by a female talker was measured in the presence of either speech-spectrum noise or other sentences spoken by a second female talker. Target and interference stimuli were presented in an anechoic chamber from loudspeakers directly in front and 60 degrees to the right in single-source and precedence-effect (lead-lag) conditions. For speech-spectrum noise, the spatial separation advantage for speech recognition (8 dB) was predictable from articulation index computations based on measured release from masking for narrow-band stimuli. The spatial separation advantage was only 1 dB in the lead-lag condition, despite the fact that a large perceptual separation was produced by the precedence effect. For the female talker interference, a much larger advantage occurred, apparently because informational masking was reduced by differences in perceived locations of target and interference. 相似文献

11.

Cochlear implant speech recognition with speech maskers

Stickney GS Zeng FG Litovsky R Assmann P 《The Journal of the Acoustical Society of America》2004,116(2):1081-1091

Speech recognition performance was measured in normal-hearing and cochlear-implant listeners with maskers consisting of either steady-state speech-spectrum-shaped noise or a competing sentence. Target sentences from a male talker were presented in the presence of one of three competing talkers (same male, different male, or female) or speech-spectrum-shaped noise generated from this talker at several target-to-masker ratios. For the normal-hearing listeners, target-masker combinations were processed through a noise-excited vocoder designed to simulate a cochlear implant. With unprocessed stimuli, a normal-hearing control group maintained high levels of intelligibility down to target-to-masker ratios as low as 0 dB and showed a release from masking, producing better performance with single-talker maskers than with steady-state noise. In contrast, no masking release was observed in either implant or normal-hearing subjects listening through an implant simulation. The performance of the simulation and implant groups did not improve when the single-talker masker was a different talker compared to the same talker as the target speech, as was found in the normal-hearing control. These results are interpreted as evidence for a significant role of informational masking and modulation interference in cochlear implant speech recognition with fluctuating maskers. This informational masking may originate from increased target-masker similarity when spectral resolution is reduced. 相似文献

12.

The relative roles of vowels and consonants in discriminating talker identity versus word meaning

Owren MJ Cardillo GC 《The Journal of the Acoustical Society of America》2006,119(3):1727-1739

Three experiments tested the hypothesis that vowels play a disproportionate role in hearing talker identity, while consonants are more important in perceiving word meaning. In each study, listeners heard 128 stimuli consisting of two different words. Stimuli were balanced for same/different meaning, same/different talker, and male/female talker. The first word in each was intact, while the second was either intact (Experiment 1), or had vowels ("Consonants-Only") or consonants wels-Only") replaced by silence (Experiments 2, 3). Different listeners performed a same/ different judgment of either talker identity (Talker) or word meaning (Meaning). Baseline testing in Experiment 1 showed above-chance performance in both, with greater accuracy for Meaning. In Experiment 2, Talker identity was more accurately judged from Vowels-Only stimuli, with modestly better overall Meaning performance with Consonants-Only stimuli. However, performance with vowel-initial Vowels-Only stimuli in particular was most accurate of all. Editing Vowels-Only stimuli further in Experiment 3 had no effect on Talker discrimination, while dramatically reducing accuracy in the Meaning condition, including both vowel-initial and consonant-initial Vowels-Only stimuli. Overall, results confirmed a priori predictions, but are largely inconsistent with recent tests of vowels and consonants in sentence comprehension. These discrepancies and possible implications for the evolutionary origins of speech are discussed. 相似文献

13.

Evaluating the function of phonetic perceptual phenomena within speech recognition: an examination of the perception of /d/-/t/ by adult cochlear implant users

Iverson P 《The Journal of the Acoustical Society of America》2003,113(2):1056-1064

This study examined whether cochlear implant users must perceive differences along phonetic continua in the same way as do normal hearing listeners (i.e., sharp identification functions, poor within-category sensitivity, high between-category sensitivity) in order to recognize speech accurately. Adult postlingually deafened cochlear implant users, who were heterogeneous in terms of their implants and processing strategies, were tested on two phonetic perception tasks using a synthetic /da/-/ta/ continuum (phoneme identification and discrimination) and two speech recognition tasks using natural recordings from ten talkers (open-set word recognition and forced-choice /d/-/t/ recognition). Cochlear implant users tended to have identification boundaries and sensitivity peaks at voice onset times (VOT) that were longer than found for normal-hearing individuals. Sensitivity peak locations were significantly correlated with individual differences in cochlear implant performance; individuals who had a /d/-/t/ sensitivity peak near normal-hearing peak locations were most accurate at recognizing natural recordings of words and syllables. However, speech recognition was not strongly related to identification boundary locations or to overall levels of discrimination performance. The results suggest that perceptual sensitivity affects speech recognition accuracy, but that many cochlear implant users are able to accurately recognize speech without having typical normal-hearing patterns of phonetic perception. 相似文献

14.

Multiple routes to the perceptual learning of speech

Loebach JL Bent T Pisoni DB 《The Journal of the Acoustical Society of America》2008,124(1):552-561

相似文献

15.

Sine-wave speech recognition in a tonal language

Feng YM Xu L Zhou N Yang G Yin SK 《The Journal of the Acoustical Society of America》2012,131(2):EL133-EL138

It is hypothesized that in sine-wave replicas of natural speech, lexical tone recognition would be severely impaired due to the loss of F0 information, but the linguistic information at the sentence level could be retrieved even with limited tone information. Forty-one native Mandarin-Chinese-speaking listeners participated in the experiments. Results showed that sine-wave tone-recognition performance was on average only 32.7% correct. However, sine-wave sentence-recognition performance was very accurate, approximately 92% correct on average. Therefore the functional load of lexical tones on sentence recognition is limited, and the high-level recognition of sine-wave sentences is likely attributed to the perceptual organization that is influenced by top-down processes. 相似文献

16.

Effect of target-masker similarity on across-ear interference in a dichotic cocktail-party listening task

Brungart DS Simpson BD 《The Journal of the Acoustical Society of America》2007,122(3):1724

Similarity between the target and masking voices is known to have a strong influence on performance in monaural and binaural selective attention tasks, but little is known about the role it might play in dichotic listening tasks with a target signal and one masking voice in the one ear and a second independent masking voice in the opposite ear. This experiment examined performance in a dichotic listening task with a target talker in one ear and same-talker, same-sex, or different-sex maskers in both the target and the unattended ears. The results indicate that listeners were most susceptible to across-ear interference with a different-sex within-ear masker and least susceptible with a same-talker within-ear masker, suggesting that the amount of across-ear interference cannot be predicted from the difficulty of selectively attending to the within-ear masking voice. The results also show that the amount of across-ear interference consistently increases when the across-ear masking voice is more similar to the target speech than the within-ear masking voice is, but that no corresponding decline in across-ear interference occurs when the across-ear voice is less similar to the target than the within-ear voice. These results are consistent with an "integrated strategy" model of speech perception where the listener chooses a segregation strategy based on the characteristics of the masker present in the target ear and the amount of across-ear interference is determined by the extent to which this strategy can also effectively be used to suppress the masker in the unattended ear. 相似文献

17.

Vongpaisal T Trehub SE Glenn Schellenberg E van Lieshout P 《The Journal of the Acoustical Society of America》2012,131(1):501-508

Temporal information provided by cochlear implants enables successful speech perception in quiet, but limited spectral information precludes comparable success in voice perception. Talker identification and speech decoding by young hearing children (5-7 yr), older hearing children (10-12 yr), and hearing adults were examined by means of vocoder simulations of cochlear implant processing. In Experiment 1, listeners heard vocoder simulations of sentences from a man, woman, and girl and were required to identify the talker from a closed set. Younger children identified talkers more poorly than older listeners, but all age groups showed similar benefit from increased spectral information. In Experiment 2, children and adults provided verbatim repetition of vocoded sentences from the same talkers. The youngest children had more difficulty than older listeners, but all age groups showed comparable benefit from increasing spectral resolution. At comparable levels of spectral degradation, performance on the open-set task of speech decoding was considerably more accurate than on the closed-set task of talker identification. Hearing children's ability to identify talkers and decode speech from spectrally degraded material sheds light on the difficulty of these domains for child implant users. 相似文献

18.

Vocal aging and the impact on daily life: a longitudinal study

Irma M. Verdonck-de Leeuw Hans F. Mahieu 《Journal of voice》2004,18(2):193-202

Longitudinal studies on vocal aging are scarce, and information on the impact of age-related voice changes on daily life is lacking. This longitudinal study reports on age-related voice changes and the impact on daily life over a time period of 5 years on 11 healthy male speakers, age ranging from 50 to 81 years. All males completed a questionnaire on vocal performance in daily life, and perceptual and acoustical analyses of vocal quality and analyses of maximum performance tasks of vocal function (voice range profile) were performed. Results showed a significant deterioration of the acoustic voice signal as well as increased ratings on vocal roughness judged by experts after the time period of 5 years. An increase of self-reported voice instability and the tendency to avoid social parties supported these findings. Smoking males had a lower speaking fundamental frequency compared with nonsmoking males, and this seemed reversible for males who stop smoking. This study suggests a normal gradual vocal aging process with clear consequences in daily life, which should be taken into consideration in clinical practice as well as in studies concerning communication in social life. 相似文献

19.

Acoustic Measures of Symptoms in Abductor Spasmodic Dysphonia

Julia D. Edgar Christine M. Sapienza Kimberly Bidus Christy L. Ludlow 《Journal of voice》2001,15(3):362

Speech of patients with abductor spasmodic dysphonia (ABSD) was analyzed using acoustic analyses to determine: (1) which acoustic measures differed from controls and were independent factors representing patients' voice control difficulties, and (2) whether acoustic measures related to blinded perceptual counts of the symptom frequency in the same patients. Patients' voice onset time for voiceless consonants in speech were significantly longer than the controls (p = 0.015). A principle components analysis identified three factors that accounted for 95% of the variance: the first factor included sentence and word duration, frequency shifts, and aperiodic instances; the second was phonatory breaks; and the third was voice onset time. Significant relationships with perceptual counts of symptoms were found for the measures of acoustic disruptions in sentences and sentence duration. Finally, a multiple regression demonstrated that the acoustic measures related well with the perceptual counts (r² = 0.84) with word duration most highly related and none of the other measures contributing once the effect of word duration was partialed out. The results indicate that some of the voice motor control deficits, namely aperiodicity, phonatory breaks, and frequency shifts, which occur in patients with ABSD, are similar to those previously found in adductor spasmodic dysphonia. Results also indicate that acoustic measures of intermittent disruptions in speech, voice onset time, and speech duration are closely related to the perception of symptom frequency in the disorder. 相似文献

20.

On the perception of similarity among talkers

Remez RE Fellowes JM Nagel DS 《The Journal of the Acoustical Society of America》2007,122(6):3688-3696

A listener who recognizes a talker notices characteristic attributes of the talker's speech despite the novelty of each utterance. Accounts of talker perception have often presumed that consistent aspects of an individual's speech, termed indexical properties, are ascribable to a talker's unique anatomy or consistent vocal posture distinct from acoustic correlates of phonetic contrasts. Accordingly, the perception of a talker is acknowledged to occur independently of the perception of a linguistic message. Alternatively, some studies suggest that attention to attributes of a talker includes indexical linguistic attributes conveyed in the articulation of consonants and vowels. This investigation sought direct evidence of attention to phonetic attributes of speech in perceiving talkers. Natural samples and sinewave replicas derived from them were used in three experiments assessing the perceptual properties of natural and sine-wave sentences; of temporally veridical and reversed natural and sine-wave sentences; and of an acoustic correlate of vocal tract scale to judgments of sine-wave talker similarity. The results revealed that the subjective similarity of individual talkers is preserved in the absence of natural vocal quality; and that local phonetic segmental attributes as well as global characteristics of speech can be exploited when listeners notice characteristics of talkers. 相似文献