Similar documents
20 similar documents retrieved.
1.
There is information in speech sounds about the length of the vocal tract; specifically, as a child grows, the resonators in the vocal tract grow and the formant frequencies of the vowels decrease. It has been hypothesized that the auditory system applies a scale transform to all sounds to segregate size information from resonator shape information, and thereby enhance both size perception and speech recognition [Irino and Patterson, Speech Commun. 36, 181-203 (2002)]. This paper describes size discrimination experiments and vowel recognition experiments designed to provide evidence for an auditory scaling mechanism. Vowels were scaled to represent people with vocal tracts much longer and shorter than normal, and with pitches much higher and lower than normal. The results of the discrimination experiments show that listeners can make fine judgments about the relative size of speakers, and they can do so for vowels scaled well beyond the normal range. Similarly, the recognition experiments show good performance for vowels in the normal range, and for vowels scaled well beyond the normal range of experience. Together, the experiments support the hypothesis that the auditory system automatically normalizes for the size information in communication sounds.
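The vowel-scaling manipulation described above amounts to multiplying all formant frequencies by a single factor. A minimal sketch under the uniform-tube approximation (the study itself used vocoder-based resynthesis; the function name and formant values here are illustrative):

```python
def scale_formants(formants_hz, length_ratio):
    """Formant frequencies scale roughly in inverse proportion to
    vocal tract length (uniform-tube approximation)."""
    return [f / length_ratio for f in formants_hz]

# Illustrative adult-male formant frequencies for /a/
f_a = [730.0, 1090.0, 2440.0]

# A vocal tract 20% shorter than normal raises every formant
print(scale_formants(f_a, 0.8))  # [912.5, 1362.5, 3050.0]
```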

2.
The length of the vocal tract is correlated with speaker size and, so, speech sounds carry information about the size of the speaker in a form that is interpretable by the listener. A wide range of vocal tract lengths exists in the population, and humans are able to distinguish speaker size from the speech. Smith et al. [J. Acoust. Soc. Am. 117, 305-318 (2005)] presented vowel sounds to listeners and showed that the ability to discriminate speaker size extends beyond the normal range of speaker sizes, which suggests that information about the size and shape of the vocal tract is segregated automatically at an early stage of processing. This paper reports an extension of the size discrimination research using a much larger set of speech sounds, namely, 180 consonant-vowel and vowel-consonant syllables. Despite the pronounced increase in stimulus variability, there was actually an improvement in discrimination performance over that supported by vowel sounds alone. Performance with vowel-consonant syllables was slightly better than with consonant-vowel syllables. These results support the hypothesis that information about the length of the vocal tract is segregated at an early stage in auditory processing.

3.
A listener who recognizes a talker notices characteristic attributes of the talker's speech despite the novelty of each utterance. Accounts of talker perception have often presumed that consistent aspects of an individual's speech, termed indexical properties, are ascribable to a talker's unique anatomy or consistent vocal posture, distinct from the acoustic correlates of phonetic contrasts. Accordingly, the perception of a talker is acknowledged to occur independently of the perception of a linguistic message. Alternatively, some studies suggest that attention to attributes of a talker includes indexical linguistic attributes conveyed in the articulation of consonants and vowels. This investigation sought direct evidence of attention to phonetic attributes of speech in perceiving talkers. Natural samples, and sine-wave replicas derived from them, were used in three experiments assessing the perceptual properties of natural and sine-wave sentences; of temporally veridical and reversed natural and sine-wave sentences; and of the contribution of an acoustic correlate of vocal tract scale to judgments of sine-wave talker similarity. The results revealed that the subjective similarity of individual talkers is preserved in the absence of natural vocal quality, and that local phonetic segmental attributes as well as global characteristics of speech can be exploited when listeners notice characteristics of talkers.

4.
At a cocktail party, listeners must attend selectively to a target speaker and segregate that speaker's speech from distracting speech sounds uttered by other speakers. To solve this task, listeners can draw on a variety of vocal, spatial, and temporal cues. Recently, Vestergaard et al. [J. Acoust. Soc. Am. 125, 1114-1124 (2009)] developed a concurrent-syllable task to control temporal glimpsing within segments of concurrent speech; this allowed them to measure the interaction of glottal pulse rate and vocal tract length and reveal how the auditory system integrates information from independent acoustic modalities to enhance recognition. The current paper shows how the interaction of these acoustic cues evolves as the temporal overlap of syllables is varied. Temporal glimpses as short as 25 ms are observed to improve syllable recognition substantially when the target and distracter have similar vocal characteristics, but not when they are dissimilar. The effect of temporal glimpsing on recognition performance is strongly affected by the form of the syllable (consonant-vowel versus vowel-consonant), but it is independent of other phonetic features such as place and manner of articulation.

5.
Although both perceived vocal effort and intensity are known to influence the perceived distance of speech, little is known about the processes listeners use to integrate these two parameters into a single estimate of talker distance. In this series of experiments, listeners judged the distances of prerecorded speech samples presented over headphones in a large open field. In the first experiment, virtual synthesis techniques were used to simulate speech signals produced by a live talker at distances ranging from 0.25 to 64 m. In the second experiment, listeners judged the apparent distances of speech stimuli produced over a 60-dB range of different vocal effort levels (production levels) and presented over a 34-dB range of different intensities (presentation levels). In the third experiment, the listeners judged the distances of time-reversed speech samples. The results indicate that production level and presentation level influence distance perception differently for each of three distinct categories of speech. When the stimulus was high-level voiced speech (produced above 66 dB SPL 1 m from the talker's mouth), the distance judgments doubled with each 8-dB increase in production level and each 12-dB decrease in presentation level. When the stimulus was low-level voiced speech (produced at or below 66 dB SPL at 1 m), the distance judgments doubled with each 15-dB increase in production level but were relatively insensitive to changes in presentation level at all but the highest intensity levels tested. When the stimulus was whispered speech, the distance judgments were unaffected by changes in production level and only decreased with increasing presentation level when the intensity of the stimulus exceeded 66 dB SPL. The distance judgments obtained in these experiments were consistent across a range of different talkers, listeners, and utterances, suggesting that voice-based distance cueing could provide a robust way to control the apparent distances of speech sounds in virtual audio displays.
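The doubling rules reported for high-level voiced speech can be turned into a back-of-the-envelope predictor. This sketch assumes the production-level and presentation-level effects combine multiplicatively, which the abstract does not state:

```python
def judged_distance(ref_m, d_production_db, d_presentation_db):
    """Perceived distance for high-level voiced speech: doubles per
    +8 dB of production level and per -12 dB of presentation level
    (combination rule assumed multiplicative, for illustration)."""
    return ref_m * 2.0 ** (d_production_db / 8.0) * 2.0 ** (-d_presentation_db / 12.0)

# Raising production level by 8 dB at fixed presentation level
print(judged_distance(1.0, 8.0, 0.0))   # 2.0 (judged distance doubles)
# Raising presentation level by 12 dB halves the judged distance
print(judged_distance(1.0, 0.0, 12.0))  # 0.5
```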

6.
Speech intelligibility is known to be relatively unaffected by certain deformations of the acoustic spectrum. These include translations, stretching or contracting dilations, and shearing of the spectrum (represented along the logarithmic frequency axis). It is argued here that such robustness reflects a synergy between vocal production and auditory perception. Thus, on the one hand, it is shown that these spectral distortions are produced by common and unavoidable variations among different speakers pertaining to the length, cross-sectional profile, and losses of their vocal tracts. On the other hand, it is argued that these spectral changes leave the auditory cortical representation of the spectrum largely unchanged except for translations along one of its representational axes. These assertions are supported by analyses of production and perception models. On the production side, a simplified sinusoidal model of the vocal tract is developed which analytically relates a few "articulatory" parameters, such as the extent and location of the vocal tract constriction, to the spectral peaks of the acoustic spectra synthesized from it. The model is evaluated by comparing the identification of synthesized sustained vowels to labeled natural vowels extracted from the TIMIT corpus. On the perception side a "multiscale" model of sound processing is utilized to elucidate the effects of the deformations on the representation of the acoustic spectrum in the primary auditory cortex. Finally, the implications of these results for the perception of generally identifiable classes of sound sources beyond the specific case of speech and the vocal tract are discussed.
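The claim that a uniform spectral scaling becomes a pure translation on a logarithmic frequency axis is easy to verify numerically (the formant values below are illustrative):

```python
import math

formants = [500.0, 1500.0, 2500.0]
alpha = 1.25                      # e.g., a uniformly shorter vocal tract
scaled = [alpha * f for f in formants]

# On a log2 axis, every spectral component moves by the same amount,
# log2(alpha), so the pattern is translated rather than distorted.
shifts = [math.log2(s) - math.log2(f) for f, s in zip(formants, scaled)]
print(shifts)  # every entry ≈ log2(1.25) ≈ 0.3219 octaves
```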

7.
In this paper, the acoustic-phonetic characteristics of steady apical trills (trill sounds produced by periodic vibration of the apex of the tongue) are studied. Signal processing methods, namely zero-frequency filtering and zero-time liftering of speech signals, are used to analyze the excitation source and the resonance characteristics of the vocal tract system, respectively. Although it is natural to expect an effect of trilling on the resonances of the vocal tract system, it is interesting to note that trilling influences the glottal source of excitation as well. The excitation characteristics derived using zero-frequency filtering of speech signals are the glottal epochs, the strength of impulses at the glottal epochs, and the instantaneous fundamental frequency of glottal vibration. Analysis based on zero-time liftering of speech signals is used to study the dynamic resonance characteristics of the vocal tract system during the production of trill sounds. A qualitative analysis of trill sounds in different vowel contexts, and the acoustic cues that may help in spotting trills in continuous speech, are discussed.
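Zero-frequency filtering is not spelled out in the abstract; a rough sketch of the standard recipe is given below (two ideal 0 Hz resonators implemented as cumulative sums, followed by local-mean removal). The fixed window length and the single trend-removal pass are simplifications; the published method ties the window to the average pitch period and removes the trend more than once.

```python
import numpy as np

def zero_frequency_filter(x, fs, win_ms=5.0):
    """Sketch: difference the signal, pass it through two ideal 0 Hz
    resonators (cumulative sums), then subtract a moving average to
    remove the polynomial drift the integrators introduce."""
    x = np.asarray(x, dtype=float)
    d = np.diff(x, prepend=x[:1])            # difference signal
    y = np.cumsum(np.cumsum(d))              # two 0 Hz resonators
    w = max(3, int(fs * win_ms / 1000) | 1)  # odd-length window
    trend = np.convolve(y, np.ones(w) / w, mode="same")
    return y - trend
```

Glottal epochs are then read off as the positive-going zero crossings of the returned signal, and the impulse strength at each epoch as the local slope there.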

8.
In normal-hearing listeners, musical background has been observed to change the sound representation in the auditory system and to produce enhanced performance on some speech perception tests. Based on these observations, it has been hypothesized that musical background can influence sound and speech perception by cochlear-implant users, and by extension their quality of life. To test this hypothesis, this study explored musical background [using the Dutch Musical Background Questionnaire (DMBQ)] and self-perceived sound and speech perception and quality of life [using the Nijmegen Cochlear Implant Questionnaire (NCIQ) and the Speech Spatial and Qualities of Hearing Scale (SSQ)] in 98 postlingually deafened adult cochlear-implant recipients. In addition to these self-perceived measures, speech perception scores (percentage of phonemes recognized in words presented in quiet) were obtained from patient records. The self-perceived hearing performance was associated with the objective speech perception. Forty-one respondents (44% of 94 respondents) indicated some form of formal musical training. Fifteen respondents (18% of 83 respondents) judged themselves as having musical training, experience, and knowledge. No association was observed between musical background (quantified by the DMBQ) and self-perceived hearing-related performance or quality of life (quantified by the NCIQ and SSQ), or speech perception in quiet.

9.
Field studies indicate that Japanese macaque (Macaca fuscata) communication signals vary with the social situation in which they occur [S. Green, "Variation of vocal pattern with social situation in the Japanese monkey (Macaca fuscata): A field study," in Primate Behavior, edited by L. A. Rosenblum (Academic, New York, 1975), Vol. 4]. A significant acoustic property of the contact calls produced by these primates is the temporal position of a frequency peak within the vocalization, that is, an inflection from rising to falling frequency [May et al., "Significant features of Japanese macaque communication sounds: A psychophysical study," Anim. Behav. 36, 1432-1444 (1988)]. The experiments reported here are based on the hypothesis that Japanese macaques derive meaning from this temporally graded feature by parceling the acoustic variation inherent in natural contact calls into two functional categories, and thus exhibit behavior that is analogous to the categorical perception of speech sounds by humans. To test this hypothesis, Japanese macaques were trained to classify natural contact calls by performing operant responses that signified either an early or late frequency peak position. Then, the subjects were tested in a series of experiments that required them to generalize this behavior to synthetic calls representing a continuum of peak positions. Demonstration of the classical perceptual effects noted for human listeners suggests that categorical perception reflects a principle of auditory information processing that influences the perception of sounds in the communication systems not only of humans, but of animals as well.

10.
The time-varying spectra of eight musical instrument sounds were randomly altered by a time-invariant process to determine how detection of spectral alteration varies with degree of alteration, instrument, musical experience, and spectral variation. Sounds were resynthesized with centroids equalized to the original sounds, with frequencies harmonically flattened, and with average spectral error levels of 8%, 16%, 24%, 32%, and 48%. Listeners were asked to discriminate the randomly altered sounds from reference sounds resynthesized from the original data. For all eight instruments, discrimination was very good for the 32% and 48% error levels, moderate for the 16% and 24% error levels, and poor for the 8% error level. When the error levels were 16%, 24%, and 32%, the scores of musically experienced listeners were significantly better than the scores of listeners with no musical experience. Also, in this same error-level range, discrimination was significantly affected by the instrument tested. For error levels of 16% and 24%, discrimination scores were significantly but negatively correlated with measures of spectral incoherence and normalized centroid deviation on unaltered instrument spectra, suggesting that the presence of dynamic spectral variations tends to increase the difficulty of detecting spectral alterations. The correlation between discrimination and a measure of spectral irregularity was comparatively low.
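An "average spectral error level" is a relative distance between the original and altered time-varying spectra; the sketch below is a hypothetical stand-in for such a measure (the paper's exact formula may differ):

```python
import numpy as np

def relative_spectral_error(orig, altered):
    """Percent RMS error between two harmonic-amplitude matrices
    (rows = analysis frames, columns = harmonics). Hypothetical
    stand-in for the error metric used in the study."""
    orig = np.asarray(orig, dtype=float)
    altered = np.asarray(altered, dtype=float)
    return 100.0 * np.linalg.norm(orig - altered) / np.linalg.norm(orig)

a = np.ones((4, 8))
print(relative_spectral_error(a, 1.08 * a))  # ≈ 8.0, an "8% error level"
```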

11.
Dynamic-range compression acting independently at each ear in a bilateral hearing-aid or cochlear-implant fitting can alter interaural level differences (ILDs), potentially affecting spatial perception. The influence of compression on the lateral position of sounds was studied in normal-hearing listeners using virtual acoustic stimuli. In a lateralization task, listeners indicated the leftmost and rightmost extents of the auditory event and reported whether they heard (1) a single, stationary image, (2) a moving/gradually broadening image, or (3) a split image. Fast-acting compression significantly affected the perceived position of high-pass sounds. For sounds with abrupt onsets and offsets, compression shifted the entire image to a more central position. For sounds containing gradual onsets and offsets, including speech, compression increased the occurrence of moving and split images by up to 57 percentage points and increased the perceived lateral extent of the auditory event. The severity of the effects was reduced when undisturbed low-frequency binaural cues were made available. At high frequencies, listeners gave increased weight to ILDs relative to interaural time differences carried in the envelope when compression caused ILDs to change dynamically at low rates, although individual differences were apparent. Specific conditions are identified in which compression is likely to affect spatial perception.
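Why independent compression at the two ears alters ILDs can be seen from a static compressor input-output curve; the threshold and ratio below are illustrative, and real hearing-aid compressors are of course dynamic rather than static:

```python
def compress_db(level_db, threshold_db=45.0, ratio=3.0):
    """Static input-output curve of a simple compressor: levels above
    the threshold are attenuated by the compression ratio."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

left, right = 70.0, 60.0                 # a 10 dB ILD at the input
ild_in = left - right
ild_out = compress_db(left) - compress_db(right)
print(ild_in, round(ild_out, 2))  # 10.0 3.33: the ILD is shrunk
```

When the two levels fluctuate independently (as with speech onsets and offsets), the instantaneous reduction varies over time, which is the dynamic ILD distortion studied above.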

12.
To examine the role of perceived gender in fricative identification, a study was conducted in which listeners identified /s/-/ʃ/ and /s/-/θ/ continua combined with vowels produced by a man and a woman. These were acoustically modified to be consistent with different-sized vocal tracts (VTs) and were presented with pictures of men or women. Listeners identified more tokens of /s/ in the /s/-/ʃ/ continuum and more tokens of /θ/ in the /s/-/θ/ continuum when these sounds were combined with men's vowels, with vowels consistent with a 17 cm VT, and with pictures of men. The results support the hypothesis that listeners incorporate information about talker gender during fricative perception.

13.
Speech perception in the presence of another competing voice is one of the most challenging tasks for cochlear implant users. Several studies have shown that (1) the fundamental frequency (F0) is a useful cue for segregating competing speech sounds and (2) the F0 is better represented by the temporal fine structure than by the temporal envelope. However, current cochlear implant speech processing algorithms emphasize temporal envelope information and discard the temporal fine structure. In this study, speech recognition was measured as a function of the F0 separation of the target and competing sentence in normal-hearing and cochlear implant listeners. For the normal-hearing listeners, the combined sentences were processed through either a standard implant simulation or a new algorithm which additionally extracts a slowed-down version of the temporal fine structure (called Frequency-Amplitude-Modulation-Encoding). The results showed no benefit of increasing F0 separation for the cochlear implant or simulation groups. In contrast, the new algorithm resulted in gradual improvements with increasing F0 separation, similar to that found with unprocessed sentences. These results emphasize the importance of temporal fine structure for speech perception and demonstrate a potential remedy for difficulty in the perceptual segregation of competing speech sounds.

14.
To address the problem of identifying the physical properties of struck plates, this study investigated the extraction of invariant acoustic cues for size identification and the role these cues play in auditory perception. Three subjective evaluation experiments were designed and carried out. Experiment 1 analyzed the effect of using recordings versus synthesized sounds as stimuli on size identification. Experiments 2 and 3 used dissimilarity-rating experiments on the impact sounds of aluminum and wooden plates, respectively, to obtain the perceptual space for size identification and the corresponding dimensions of the mechanical space, and computed the information accuracy of different acoustic features. Finally, comparing the results of Experiments 2 and 3 yielded the invariant acoustic cues for size identification, and listeners' size-perception strategies were analyzed from the correlation between the acoustic cues and the perceptual results. The results show that listeners achieved good size identification with both recorded and synthesized sounds, and that listeners tend to rely on size-related invariant acoustic cues to perform the perceptual task while ignoring acoustic information that is easily affected by other source properties.

15.
Phonemic and phonetic factors in adult cross-language speech perception
Previous research has indicated that young infants can discriminate speech sounds across phonetic boundaries regardless of specific relevant experience, and that there is a modification in this ability during ontogeny such that adults often have difficulty discriminating phonetic contrasts which are not used contrastively in their native language. This pattern of findings has often been interpreted as suggesting that humans are endowed with innate auditory sensitivities which enable them to discriminate speech sounds according to universal phonetic boundaries, and that there is a decline or loss in this ability after exposure to a language which contrasts only a subset of those distinctions. The present experiments were designed to determine whether this modification represents a loss of sensorineural response capabilities or whether it reflects a shift in attentional focus and/or processing strategies. In experiment 1, adult English-speaking subjects were tested on their ability to discriminate two non-English speech contrasts in a category-change discrimination task after first being predisposed to adopt one of four perceptual sets. In experiments 2, 3, and 4, subjects were tested in an AX (same/different) procedure, and the effects of both limited training and the duration of the interstimulus interval were assessed. Results suggest that the previously observed ontogenetic modification in the perception of non-native phonetic contrasts involves a change in processing strategies rather than a sensorineural loss. Adult listeners can discriminate sounds across non-native phonetic categories in some testing conditions, but are not able to use that ability in testing conditions which have demands similar to those required in natural language processing.

16.
Several studies have demonstrated that when talkers are instructed to speak clearly, the resulting speech is significantly more intelligible than speech produced in ordinary conversation. These speech intelligibility improvements are accompanied by a wide variety of acoustic changes. The current study explored the relationship between acoustic properties of vowels and their identification in clear and conversational speech, for young normal-hearing (YNH) and elderly hearing-impaired (EHI) listeners. Monosyllabic words excised from sentences spoken either clearly or conversationally by a male talker were presented in 12-talker babble for vowel identification. While vowel intelligibility was significantly higher in clear speech than in conversational speech for the YNH listeners, no clear speech advantage was found for the EHI group. Regression analyses were used to assess the relative importance of spectral target, dynamic formant movement, and duration information for perception of individual vowels. For both listener groups, all three types of information emerged as primary cues to vowel identity. However, the relative importance of the three cues for individual vowels differed greatly for the YNH and EHI listeners. This suggests that hearing loss alters the way acoustic cues are used for identifying vowels.

17.
Different patterns of performance across vowels and consonants in tests of categorization and discrimination indicate that vowels tend to be perceived more continuously, or less categorically, than consonants. The present experiments examined whether analogous differences in perception would arise in nonspeech sounds that share critical transient acoustic cues of consonants and steady-state spectral cues of simplified synthetic vowels. Listeners were trained to categorize novel nonspeech sounds varying along a continuum defined by a steady-state cue, a rapidly-changing cue, or both cues. Listeners' categorization of stimuli varying on the rapidly changing cue showed a sharp category boundary, and posttraining discrimination was well predicted from the assumption of categorical perception. Listeners more accurately discriminated but less accurately categorized steady-state nonspeech stimuli. When listeners categorized stimuli defined by both rapidly-changing and steady-state cues, discrimination performance was accurate and the categorization function exhibited a sharp boundary. These data are similar to those found in experiments with dynamic vowels, which are defined by both steady-state and rapidly-changing acoustic cues. A general account for the speech and nonspeech patterns is proposed based on the supposition that the perceptual trace of rapidly-changing sounds decays faster than the trace of steady-state sounds.

18.
Currently there are few standardized speech testing materials for Mandarin-speaking cochlear implant (CI) listeners. In this study, Mandarin speech perception (MSP) sentence test materials were developed and validated in normal-hearing subjects listening to acoustic simulations of CI processing. The percent distribution of vowels, consonants, and tones within each MSP sentence list was similar to that observed across commonly used Chinese characters. There was no significant difference in sentence recognition across sentence lists. Given the phonetic balancing within lists and the validation with spectrally degraded speech, the present MSP test materials may be useful for assessing the speech performance of Mandarin-speaking CI listeners.

19.
Although listeners routinely perceive both the sex and the individual identity of talkers from their speech, explanations of these abilities are incomplete. Here, variation in vocal production-related anatomy was assumed to affect vowel acoustics thought to be critical for indexical cueing. Integrating this approach with source-filter theory, patterns of acoustic parameters that should represent sex and identity were identified. Due to sexual dimorphism, the combination of fundamental frequency (F0, reflecting larynx size) and vocal tract length cues (VTL, reflecting body size) was predicted to provide the strongest acoustic correlates of talker sex. Acoustic measures associated with presumed variations in supralaryngeal vocal tract-related anatomy occurring within sex were expected to be prominent in individual talker identity. These predictions were supported by results of analyses of 2500 tokens of the /ɛ/ phoneme, extracted from the naturally produced speech of 125 subjects. Classification by talker sex was virtually perfect when F0 and VTL were used together, whereas talker classification depended primarily on the various acoustic parameters associated with vocal-tract filtering.
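A toy illustration of the point that F0 and VTL jointly separate the sexes; the thresholds below are invented for the sketch and are not taken from the study:

```python
def classify_sex(f0_hz, vtl_cm):
    """Hypothetical rule combining a larynx-size cue (F0) and a
    body-size cue (vocal tract length); thresholds are illustrative."""
    votes = (1 if f0_hz < 160.0 else -1) + (1 if vtl_cm > 15.5 else -1)
    return "male" if votes > 0 else "female"

print(classify_sex(110.0, 17.0))  # male
print(classify_sex(210.0, 14.0))  # female
```

Using either cue alone misclassifies talkers whose F0 and VTL point in opposite directions, which is why the combined classifier performs best.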

20.
The purpose of this study was to examine the acoustic characteristics of children's speech and voices that account for listeners' ability to identify gender. In Experiment I, vocal recordings and gross physical measurements of 4-, 8-, 12-, and 16-year olds were taken (10 girls and 10 boys per age group). The speech sample consisted of seven nondiphthongal vowels of American English (/æ/ "had," /ɛ/ "head," /i/ "heed," /ɪ/ "hid," /ɑ/ "hod," /ʌ/ "hud," and /u/ "who'd") produced in the carrier phrase, "Say /hVd/ again." Fundamental frequency (f0) and formant frequencies (F1, F2, F3) were measured from these syllables. In Experiment II, 20 adults rated the syllables produced by the children in Experiment I on a six-point gender rating scale. The results from these experiments indicate (1) vowel formant frequencies differentiate gender for children as young as four years of age, while formant frequencies and f0 differentiate gender after 12 years of age, (2) the relationship between gross measures of physical size and vocal characteristics is apparent for at least 12- and 16-year olds, and (3) listeners can identify gender from the speech and voice of children as young as four years of age, and with respect to young children, listeners appear to base their gender ratings on vowel formant frequencies. The findings are discussed in relation to the development of gender identity and its perceptual representation in speech and voice.

