Similar Documents
20 similar documents found.
1.
Visual information from a speaker's face profoundly influences auditory perception of speech. However, relatively little is known about the extent to which visual influences may depend on experience, and the extent to which new sources of visual speech information can be incorporated in speech perception. In the current study, participants were trained on completely novel visual cues for phonetic categories. Participants learned to accurately identify phonetic categories based on novel visual cues. These newly-learned visual cues influenced identification responses to auditory speech stimuli, but not to the same extent as visual cues from a speaker's face. The novel methods and results of the current study raise theoretical questions about the nature of information integration in speech perception, and open up possibilities for further research on learning in multimodal perception, which may have applications in improving speech comprehension among the hearing-impaired.

2.
The present study evaluated auditory-visual speech perception in cochlear-implant users as well as normal-hearing and simulated-implant controls to delineate relative contributions of sensory experience and cues. Auditory-only, visual-only, or auditory-visual speech perception was examined in the context of categorical perception, in which an animated face mouthing ba, da, or ga was paired with synthesized phonemes from an 11-token auditory continuum. A three-alternative, forced-choice method was used to yield percent identification scores. Normal-hearing listeners showed sharp phoneme boundaries and strong reliance on the auditory cue, whereas actual and simulated implant listeners showed much weaker categorical perception but stronger dependence on the visual cue. The implant users were able to integrate both congruent and incongruent acoustic and optical cues to derive relatively weak but significant auditory-visual integration. This auditory-visual integration was correlated with the duration of the implant experience but not the duration of deafness. Compared with the actual implant performance, acoustic simulations of the cochlear implant could predict the auditory-only performance but not the auditory-visual integration. These results suggest that both altered sensory experience and impoverished acoustic cues contribute to the auditory-visual speech perception in cochlear-implant users.

3.
This study investigated which acoustic cues within the speech signal are responsible for bimodal speech perception benefit. Seven cochlear implant (CI) users with usable residual hearing at low frequencies in the non-implanted ear participated. Sentence tests were performed in near-quiet (some noise on the CI side to reduce scores from ceiling) and in a modulated noise background, with the implant alone and with the addition, in the hearing ear, of one of four types of acoustic signals derived from the same sentences: (1) a complex tone modulated by the fundamental frequency (F0) and amplitude envelope contours; (2) a pure tone modulated by the F0 and amplitude contours; (3) a noise-vocoded signal; (4) unprocessed speech. The modulated tones provided F0 information without spectral shape information, whilst the vocoded signal presented spectral shape information without F0 information. For the group as a whole, only the unprocessed speech condition provided significant benefit over implant-alone scores, in both near-quiet and noise. This suggests that, on average, F0 or spectral cues in isolation provided limited benefit for these subjects in the tested listening conditions, and that the significant benefit observed in the full-signal condition was derived from implantees' use of a combination of these cues.
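Condition (2) above, a pure tone carrying only the F0 and amplitude contours of a sentence, can be sketched with phase-accumulation synthesis. This is an illustrative reconstruction, not the study's code: the sample rate and the synthetic contours below are stand-ins for contours that would be extracted from real speech.

```python
import numpy as np

fs = 16000                    # sample rate in Hz (assumed)
t = np.arange(fs) / fs        # 1 s of signal

# Illustrative F0 contour (Hz) and amplitude envelope -- stand-ins for
# contours that would be extracted from a real sentence.
f0 = 120 + 20 * np.sin(2 * np.pi * 2 * t)    # F0 drifting around 120 Hz
env = 0.5 * (1 + np.sin(2 * np.pi * 4 * t))  # slow amplitude modulation

# Phase accumulation converts the time-varying F0 into instantaneous
# phase, so the tone tracks the F0 contour sample by sample.
phase = 2 * np.pi * np.cumsum(f0) / fs
tone = env * np.sin(phase)
```

The complex-tone condition (1) would replace the single `np.sin(phase)` carrier with a sum of harmonics of the same phase track.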

4.
Understanding how the human speech production system is related to the human auditory system has been a perennial subject of inquiry. To investigate the production-perception link, in this paper, a computational analysis has been performed using the articulatory movement data obtained during speech production with concurrently recorded acoustic speech signals from multiple subjects in three different languages: English, Cantonese, and Georgian. The form of articulatory gestures during speech production varies across languages, and this variation is considered to be reflected in the articulatory position and kinematics. The auditory processing of the acoustic speech signal is modeled by a parametric representation of the cochlear filterbank which allows for realizing various candidate filterbank structures by changing the parameter value. Using mathematical communication theory, it is found that the uncertainty about the articulatory gestures in each language is maximally reduced when the acoustic speech signal is represented using the output of a filterbank similar to the empirically established cochlear filterbank in the human auditory system. Possible interpretations of this finding are discussed.
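The information-theoretic measure used here, how much an acoustic representation reduces uncertainty about articulatory variables, can be illustrated with a histogram estimate of mutual information. Everything below is a toy stand-in (synthetic "articulatory" and "acoustic" variables, 8 histogram bins), not the authors' filterbank analysis.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Histogram (plug-in) estimate of I(X;Y) in bits for two 1-D arrays."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0)
    return float(np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])))

rng = np.random.default_rng(0)
# Toy stand-ins: an "articulatory" trajectory plus an informative acoustic
# feature (a noisy copy) and an uninformative one (independent noise).
artic = rng.normal(size=5000)
feat_informative = artic + 0.3 * rng.normal(size=5000)
feat_independent = rng.normal(size=5000)

mi_informative = mutual_information(artic, feat_informative)
mi_independent = mutual_information(artic, feat_independent)
```

In the paper's terms, the candidate filterbank whose output maximizes this mutual information is the one that "maximally reduces uncertainty" about the gestures.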

5.
Two experiments compared the effect of supplying visual speech information (e.g., lipreading cues) on the ability to hear one female talker's voice in the presence of steady-state noise or a masking complex consisting of two other female voices. In the first experiment, intelligibility of sentences was measured in the presence of the two types of maskers with and without perceived spatial separation of target and masker. The second study tested detection of sentences in the same experimental conditions. Results showed that visual cues provided more benefit for both recognition and detection of speech when the masker consisted of other voices (versus steady-state noise). Moreover, visual cues provided greater benefit when the target speech and masker were spatially coincident versus when they appeared to arise from different spatial locations. The data obtained here are consistent with the hypothesis that lipreading cues help to segregate a target voice from competing voices, in addition to the established benefit of supplementing masked phonetic information.

6.
This paper presents a bimodal (audio-visual) study of speech loudness. The same acoustic stimuli (three sustained vowels of the articulatory qualities "effort" and "noneffort") are first presented in isolation, and then simultaneously together with an appropriate optical stimulus (the speaker's face on a video screen, synchronously producing the vowels). By the method of paired comparisons (law of comparative judgment) subjective loudness differences could be represented by different intervals between scale values. By this method previous results of effort-dependent speech loudness could be verified. In the bimodal study the optical cues have a measurable effect, but the acoustic cues are still dominant. Visual cues act most effectively if they are presented naturally, i.e., if acoustic and optical effort cues vary in the same direction. The experiments provide some evidence that speech loudness can be influenced by other than acoustic variables.

7.
A single pool of untrained subjects was tested for interactions across two bimodal perception conditions: audio-tactile, in which subjects heard and felt speech, and visual-tactile, in which subjects saw and felt speech. Identifications of English obstruent consonants were compared in bimodal and no-tactile baseline conditions. Results indicate that tactile information enhances speech perception by about 10 percent, regardless of which other mode (auditory or visual) is active. However, within-subject analysis indicates that individual subjects who benefit more from tactile information in one cross-modal condition tend to benefit less from tactile information in the other.

8.
This paper investigates the functional relationship between articulatory variability and stability of acoustic cues during American English /r/ production. The analysis of articulatory movement data on seven subjects shows that the extent of intrasubject articulatory variability along any given articulatory direction is strongly and inversely related to a measure of acoustic stability (the extent of acoustic variation that displacing the articulators in this direction would produce). The presence and direction of this relationship is consistent with a speech motor control mechanism that uses a third formant frequency (F3) target; i.e., the final articulatory variability is lower for those articulatory directions most relevant to determining the F3 value. In contrast, no consistent relationship across speakers and phonetic contexts was found between hypothesized vocal-tract target variables and articulatory variability. Furthermore, simulations of two speakers' productions using the DIVA model of speech production, in conjunction with a novel speaker-specific vocal-tract model derived from magnetic resonance imaging data, mimic the observed range of articulatory gestures for each subject, while exhibiting the same articulatory/acoustic relations as those observed experimentally. Overall these results provide evidence for a common control scheme that utilizes an acoustic, rather than articulatory, target specification for American English /r/.

9.
The addition of low-passed (LP) speech or even a tone following the fundamental frequency (F0) of speech has been shown to benefit speech recognition for cochlear implant (CI) users with residual acoustic hearing. The mechanisms underlying this benefit are still unclear. In this study, eight bimodal subjects (CI users with acoustic hearing in the non-implanted ear) and eight simulated bimodal subjects (using vocoded and LP speech) were tested on vowel and consonant recognition to determine the relative contributions of acoustic and phonetic cues, including F0, to the bimodal benefit. Several listening conditions were tested (CI/Vocoder, LP, T(F0-env), CI/Vocoder + LP, CI/Vocoder + T(F0-env)). Compared with CI/Vocoder performance, LP significantly enhanced both consonant and vowel perception, whereas a tone following the F0 contour of target speech and modulated with an amplitude envelope of the maximum frequency of the F0 contour (T(F0-env)) enhanced only consonant perception. Information transfer analysis revealed a dual mechanism in the bimodal benefit: The tone representing F0 provided voicing and manner information, whereas LP provided additional manner, place, and vowel formant information. The data in actual bimodal subjects also showed that the degree of the bimodal benefit depended on the cutoff and slope of residual acoustic hearing.
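The "information transfer analysis" mentioned here is typically a Miller-and-Nicely-style computation on a stimulus-by-response confusion matrix: the transmitted information I(S;R) normalized by the stimulus entropy H(S). A minimal sketch, using hypothetical two-category (voiced/voiceless) confusion counts rather than the study's data:

```python
import numpy as np

def relative_info_transfer(confusions):
    """Miller & Nicely-style relative information transfer from a
    stimulus-by-response confusion matrix of counts: I(S;R) / H(S)."""
    p = confusions / confusions.sum()
    ps = p.sum(axis=1, keepdims=True)     # stimulus probabilities
    pr = p.sum(axis=0, keepdims=True)     # response probabilities
    nz = p > 0
    transmitted = np.sum(p[nz] * np.log2(p[nz] / (ps @ pr)[nz]))
    h_s = -np.sum(ps[ps > 0] * np.log2(ps[ps > 0]))
    return float(transmitted / h_s)

# Hypothetical voiced/voiceless confusion counts: one condition transmits
# the voicing feature well, the other is near chance.
good_condition = np.array([[48.0, 2.0], [3.0, 47.0]])
poor_condition = np.array([[27.0, 23.0], [24.0, 26.0]])

t_good = relative_info_transfer(good_condition)
t_poor = relative_info_transfer(poor_condition)
```

Applied per feature (voicing, manner, place), this is how a condition such as T(F0-env) can be shown to transmit voicing information while contributing little place information.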

10.
The cortical mechanisms of perceptual segregation of concurrent sound sources were examined, based on binaural detection of interaural timing differences. Auditory event-related potentials were measured from 11 healthy subjects. Binaural stimuli were created by introducing a dichotic delay of 500-ms duration to a narrow frequency region within a broadband noise, and resulted in a perception of a centrally located noise and a right-lateralized pitch (dichotic pitch). In separate listening conditions, subjects actively discriminated and responded to randomly interleaved binaural and control stimuli, or ignored random stimuli while watching silent cartoons. In a third listening condition subjects ignored stimuli presented in homogeneous blocks. For all listening conditions, the dichotic pitch stimulus elicited an object-related negativity (ORN) at a latency of about 150-250 ms after stimulus onset. When subjects were required to actively respond to stimuli, the ORN was followed by a P400 wave with a latency of about 320-420 ms. These results support and extend a two-stage model of auditory scene analysis in which acoustic streams are automatically parsed into component sound sources based on source-relevant cues, followed by a controlled process involving identification and generation of a behavioral response.
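A dichotic-pitch stimulus of the kind described, diotic broadband noise except in one narrow frequency region where the two ears are interaurally shifted, can be sketched in the frequency domain. Here a 180-degree interaural phase flip (as in a Huggins-pitch stimulus) stands in for the study's dichotic delay; the 600-Hz band and the sample rate are illustrative, not the study's parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 16000
noise = rng.normal(size=fs)               # 1 s of broadband noise

# Flip the phase of a narrow band around 600 Hz in one ear only; the ears
# stay identical everywhere else.  The decorrelated band is heard as a
# faint pitch at that frequency region (a Huggins-type dichotic pitch).
spec = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(fs, 1 / fs)
band = (freqs > 570) & (freqs < 630)
spec_right = spec.copy()
spec_right[band] *= -1                    # 180-degree interaural phase shift

left = noise
right = np.fft.irfft(spec_right, n=fs)

# Band-limited views of each ear, for inspecting the manipulation.
band_left = np.fft.irfft(np.where(band, spec, 0), n=fs)
band_right = np.fft.irfft(np.where(band, spec_right, 0), n=fs)
```

The two ear signals remain almost perfectly correlated overall while the manipulated band is fully anticorrelated, which is what lets the auditory system segregate the band as a separate "object".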

11.
Speech intelligibility is known to be relatively unaffected by certain deformations of the acoustic spectrum. These include translations, stretching or contracting dilations, and shearing of the spectrum (represented along the logarithmic frequency axis). It is argued here that such robustness reflects a synergy between vocal production and auditory perception. Thus, on the one hand, it is shown that these spectral distortions are produced by common and unavoidable variations among different speakers pertaining to the length, cross-sectional profile, and losses of their vocal tracts. On the other hand, it is argued that these spectral changes leave the auditory cortical representation of the spectrum largely unchanged except for translations along one of its representational axes. These assertions are supported by analyses of production and perception models. On the production side, a simplified sinusoidal model of the vocal tract is developed which analytically relates a few "articulatory" parameters, such as the extent and location of the vocal tract constriction, to the spectral peaks of the acoustic spectra synthesized from it. The model is evaluated by comparing the identification of synthesized sustained vowels to labeled natural vowels extracted from the TIMIT corpus. On the perception side a "multiscale" model of sound processing is utilized to elucidate the effects of the deformations on the representation of the acoustic spectrum in the primary auditory cortex. Finally, the implications of these results for the perception of generally identifiable classes of sound sources beyond the specific case of speech and the vocal tract are discussed.
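The claim that vocal-tract length variation produces a pure translation along a logarithmic frequency axis follows from the resonances of a uniform tube, F_k = (2k - 1)c/4L: scaling L multiplies every formant by the same factor, i.e., shifts log-frequency by a constant. A minimal check (the tube lengths are illustrative):

```python
import numpy as np

C = 35000.0  # approximate speed of sound in warm air, cm/s

def tube_formants(length_cm, n=4):
    """Resonances of a uniform tube closed at the glottis and open at the
    lips: F_k = (2k - 1) * c / (4 * L)."""
    k = np.arange(1, n + 1)
    return (2 * k - 1) * C / (4.0 * length_cm)

f_long = tube_formants(17.5)    # e.g., a longer adult vocal tract (cm)
f_short = tube_formants(14.5)   # a shorter vocal tract

# On a log-frequency axis the two formant patterns differ by a pure
# translation: the shift is the same constant for every formant.
shift = np.log(f_short) - np.log(f_long)
```

The 17.5-cm tube gives the textbook 500/1500/2500-Hz pattern, and the shorter tube gives the same pattern translated up by log(17.5/14.5) on the log-frequency axis.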

12.
杜衣杭, 方卫宁. 《声学学报》 (Acta Acustica), 2019, 44(5): 945-950
Auditory training can improve speech recognition performance in noisy environments. An auditory tracking task using stable sound sources as stimuli was first designed. After 20 training units, the effectiveness of the training method was verified with a speech recognition test under speech-masking noise, using a 3 × 5 design crossing two factors: the type of interfering speech and the signal-to-noise ratio. The results show that the speech recognition rate of the training group was significantly higher than that of the control group, demonstrating that auditory attention can be improved by training on a sound-source tracking task. The experimental results indicate that sound-source tracking training can stabilize the level of auditory attention under speech-masking noise.

13.
The speech perception of two multiple-channel cochlear implant patients was compared with that of three normally hearing listeners using an acoustic model of the implant for 22 different speech tests. The tests used included a minimal auditory capabilities battery, both closed-set and open-set word and sentence tests, speech tracking and a 12-consonant confusion study using nonsense syllables. The acoustic model represented electrical current pulses by bursts of noise and the effects of different electrodes were represented by using bandpass filters with different center frequencies. All subjects used a speech processor that coded the fundamental voicing frequency of speech as a pulse rate and the second formant frequency of speech as the electrode position in the cochlea, or the center frequency of the bandpass filter. Very good agreement was found for the two groups of subjects, indicating that the acoustic model is a useful tool for the development and evaluation of alternative cochlear implant speech processing strategies.
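The acoustic model described, noise bursts at a pulse rate equal to F0, band-pass filtered around the second formant to represent electrode position, can be sketched as follows. The F0 and F2 values, the band edges, and the FFT-masking "filter" are simplifying assumptions for illustration, not the original model's parameters.

```python
import numpy as np

rng = np.random.default_rng(2)
fs = 16000
n = int(fs * 0.5)                 # 0.5 s stimulus
t = np.arange(n) / fs

f0 = 100.0    # voicing fundamental, presented as the pulse rate (assumed)
f2 = 1500.0   # second formant, mapped to band-pass center frequency (assumed)

# Band-limited noise centered on F2 stands in for stimulation at one
# electrode site; a crude FFT mask serves as the band-pass filter.
noise = rng.normal(size=n)
spec = np.fft.rfft(noise)
freqs = np.fft.rfftfreq(n, 1 / fs)
spec[(freqs < f2 - 150) | (freqs > f2 + 150)] = 0
band_noise = np.fft.irfft(spec, n=n)

# Gate the band-limited noise into bursts at the F0 pulse rate.
gate = (np.sin(2 * np.pi * f0 * t) > 0).astype(float)
model_out = band_noise * gate

# Locate the spectral peak of the modeled signal.
peak_freq = freqs[np.argmax(np.abs(np.fft.rfft(model_out)))]
```

In a running F0/F2 processor, both the pulse rate and the band center would be updated frame by frame from the analyzed speech rather than held fixed.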

14.
Previous studies [Lisker, J. Acoust. Soc. Am. 57, 1547-1551 (1975); Summerfield and Haggard, J. Acoust. Soc. Am. 62, 435-448 (1977)] have shown that voice onset time (VOT) and the onset frequency of the first formant are important perceptual cues of voicing in syllable-initial plosives. Most prior work, however, has focused on speech perception in quiet environments. The present study seeks to determine which cues are important for the perception of voicing in syllable-initial plosives in the presence of noise. Perceptual experiments were conducted using stimuli consisting of consonant-vowel syllables naturally spoken by four talkers, presented in various levels of additive white Gaussian noise. Plosives sharing the same place of articulation and vowel context (e.g., /pa,ba/) were presented to subjects in two alternate forced choice identification tasks, and a threshold signal-to-noise-ratio (SNR) value (corresponding to the 79% correct classification score) was estimated for each voiced/voiceless pair. The threshold SNR values were then correlated with several acoustic measurements of the speech tokens. Results indicate that the onset frequency of the first formant is critical in perceiving voicing in syllable-initial plosives in additive white Gaussian noise, while the VOT duration is not.
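Presenting tokens "in various levels of additive white Gaussian noise" amounts to scaling the noise so that the speech-to-noise power ratio hits a target SNR in dB. A small self-contained sketch (the "speech" here is just a sine tone, not a spoken syllable):

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that the speech-to-noise power ratio equals
    snr_db, then return speech + scaled noise."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + scale * noise

rng = np.random.default_rng(3)
speech = np.sin(2 * np.pi * 220 * np.arange(8000) / 8000)  # toy "speech"
noise = rng.normal(size=8000)
mixture = mix_at_snr(speech, noise, snr_db=0.0)

# Recover the SNR that was actually realized in the mixture.
achieved_db = 10 * np.log10(np.mean(speech ** 2) /
                            np.mean((mixture - speech) ** 2))
```

Sweeping `snr_db` over a range and scoring the 2AFC responses at each level is what yields the 79%-correct threshold SNR described in the abstract.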

15.
Integral processing of phonemes: evidence for a phonetic mode of perception
To investigate the extent and locus of integral processing in speech perception, a speeded classification task was utilized with a set of noise-tone analogs of the fricative-vowel syllables /fæ/, /ʃæ/, /fu/, and /ʃu/. Unlike the stimuli used in previous studies of selective perception of syllables, these stimuli did not contain consonant-vowel transitions. Subjects were asked to classify on the basis of one of the two syllable components. Some subjects were told that the stimuli were computer-generated noise-tone sequences. These subjects processed the noise and tone separably. Irrelevant variation of the noise did not affect reaction times (RTs) for the classification of the tone, and vice versa. Other subjects were instructed to treat the stimuli as speech. For these subjects, irrelevant variation of the fricative increased RTs for the classification of the vowel, and vice versa. A second experiment employed naturally spoken fricative-vowel syllables with the same task. Classification RTs showed a pattern of integrality in that irrelevant variation of either component increased RTs to the other. These results indicate that knowledge of coarticulation (or its acoustic consequences) is a basic element of speech perception. Furthermore, the use of this knowledge in phonetic coding is mandatory, even in situations where the stimuli do not contain coarticulatory information.

16.
The study focuses on the effect of auditory target tracking training on selective attention in a continuous speech-shaped noise environment. First, a short-term, simplified training method was designed, which used tracking of stable stimuli to train the participants. After twenty trials, the validity of the training method was verified by a speech perception test in speech-shaped noise under a 3 × 5 design composed of two factors: the type of speech interference and the signal-to-noise ratio (SNR). The experimental results show that the speech perception performance of the training group is significantly better than that of the control group, which proves that speech perception performance and auditory attention in a speech-shaped noise environment can be improved through auditory target tracking training. This indicates that auditory target tracking training can stabilize the auditory attention level under speech-shaped noise. The work provides an effective method for promoting auditory selective attention in the real world and can be extended to specialized vocational training.

17.
Cochlear implant (CI) users have been shown to benefit from residual low-frequency hearing, specifically in pitch related tasks. It remains unclear whether this benefit is dependent on fundamental frequency (F0) or other acoustic cues. Three experiments were conducted to determine the role of F0, as well as its frequency modulated (FM) and amplitude modulated (AM) components, in speech recognition with a competing voice. In simulated CI listeners, the signal-to-noise ratio was varied to estimate the 50% correct response. Simulation results showed that the F0 cue contributes to a significant proportion of the benefit seen with combined acoustic and electric hearing, and additionally that this benefit is due to the FM rather than the AM component. In actual CI users, sentence recognition scores were collected with either the full F0 cue containing both the FM and AM components or the 500-Hz low-pass speech cue containing the F0 and additional harmonics. The F0 cue provided a benefit similar to the low-pass cue for speech in noise, but not in quiet. Poorer CI users benefited more from the F0 cue than better users. These findings suggest that F0 is critical to improving speech perception in noise in combined acoustic and electric hearing.

18.
Two recent accounts of the acoustic cues which specify place of articulation in syllable-initial stop consonants claim that they are located in the initial portions of the CV waveform and are context-free. Stevens and Blumstein [J. Acoust. Soc. Am. 64, 1358-1368 (1978)] have described the perceptually relevant spectral properties of these cues as static, while Kewley-Port [J. Acoust. Soc. Am. 73, 322-335 (1983)] describes these cues as dynamic. Three perceptual experiments were conducted to test predictions derived from these accounts. Experiment 1 confirmed that acoustic cues for place of articulation are located in the initial 20-40 ms of natural stop-vowel syllables. Next, short synthetic CV's modeled after natural syllables were generated using either a digital, parallel-resonance synthesizer in experiment 2 or linear prediction synthesis in experiment 3. One set of synthetic stimuli preserved the static spectral properties proposed by Stevens and Blumstein. Another set of synthetic stimuli preserved the dynamic properties suggested by Kewley-Port. Listeners in both experiments identified place of articulation significantly better from stimuli which preserved dynamic acoustic properties than from those based on static onset spectra. Evidently, the dynamic structure of the initial stop-vowel articulatory gesture can be preserved in context-free acoustic cues which listeners use to identify place of articulation.

19.
Speech reception thresholds (SRTs) for sentences were determined in stationary and modulated background noise for two age-matched groups of normal-hearing (N = 13) and hearing-impaired listeners (N = 21). Correlations were studied between the SRT in noise and measures of auditory and nonauditory performance, after which stepwise regression analyses were performed within both groups separately. Auditory measures included the pure-tone audiogram and tests of spectral and temporal acuity. Nonauditory factors were assessed by measuring the text reception threshold (TRT), a visual analogue of the SRT, in which partially masked sentences were adaptively presented. Results indicate that, for the normal-hearing group, the variance in speech reception is mainly associated with nonauditory factors, both in stationary and in modulated noise. For the hearing-impaired group, speech reception in stationary noise is mainly related to the audiogram, even when audibility effects are accounted for. In modulated noise, both auditory (temporal acuity) and nonauditory factors (TRT) contribute to explaining interindividual differences in speech reception. Age was not a significant factor in the results. It is concluded that, under some conditions, nonauditory factors are relevant for the perception of speech in noise. Further evaluation of nonauditory factors might enable adapting the expectations from auditory rehabilitation in clinical settings.
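An SRT is conventionally measured with an adaptive (1-up/1-down) procedure that converges on the 50% point of the psychometric function; the TRT applies the same logic to partially masked text. The sketch below simulates such a staircase against a toy logistic listener. The true threshold, slope, step size, and trial count are invented for illustration, not taken from the study.

```python
import random

def simulated_listener(snr_db, srt_true=-6.0, slope=1.0):
    """Toy logistic psychometric function around an assumed true SRT --
    purely illustrative, not the listeners or materials of the study."""
    p_correct = 1.0 / (1.0 + 10.0 ** (-slope * (snr_db - srt_true)))
    return random.random() < p_correct

def adaptive_srt(n_trials=200, start_snr=0.0, step=2.0, seed=4):
    """1-up/1-down staircase: a correct response lowers the SNR and an
    incorrect one raises it, so the track converges on the 50% point."""
    random.seed(seed)
    snr, track = start_snr, []
    for _ in range(n_trials):
        track.append(snr)
        snr += -step if simulated_listener(snr) else step
    # Average the second half of the track as the SRT estimate.
    half = track[len(track) // 2:]
    return sum(half) / len(half)

srt_estimate = adaptive_srt()
```

With enough trials the estimate settles near the simulated listener's true 50% point, which is exactly the quantity the SRT and TRT procedures report.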

20.
Cochlear implant users report difficulty understanding speech in both noisy and reverberant environments. Electric-acoustic stimulation (EAS) is known to improve speech intelligibility in noise. However, little is known about the potential benefits of EAS in reverberation, or about how such benefits relate to those observed in noise. The present study used EAS simulations to examine these questions. Sentences were convolved with impulse responses from a model of a room whose estimated reverberation times were varied from 0 to 1 sec. These reverberated stimuli were then vocoded to simulate electric stimulation, or presented as a combination of vocoder plus low-pass filtered speech to simulate EAS. Monaural sentence recognition scores were measured in two conditions: reverberated speech and speech in a reverberated noise. The long-term spectrum and amplitude modulations of the noise were equated to the reverberant energy, allowing a comparison of the effects of the interferer (speech vs noise). Results indicate that, at least in simulation, (1) EAS provides significant benefit in reverberation; (2) the benefits of EAS in reverberation may be underestimated by those in a comparable noise; and (3) the EAS benefit in reverberation likely arises from partially preserved cues in this background accessible via the low-frequency acoustic component.
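The stimulus chain described, convolving sentences with a room impulse response and then presenting a low-pass "acoustic" component alongside the vocoded part, can be sketched with a synthetic exponentially decaying impulse response. The RT60 value, the 500-Hz cutoff, and the FFT-mask low-pass are assumptions for illustration (the study used a room model, and its cutoff is not stated here); the "sentence" is windowed noise.

```python
import numpy as np

rng = np.random.default_rng(5)
fs = 16000

def synthetic_rir(rt60, fs, length_s=1.0):
    """Exponentially decaying noise as a crude room impulse response; the
    decay constant makes the energy drop 60 dB over rt60 seconds."""
    t = np.arange(int(fs * length_s)) / fs
    decay = 10.0 ** (-3.0 * t / rt60)     # amplitude envelope
    return rng.normal(size=t.size) * decay

speech = rng.normal(size=fs) * np.hanning(fs)  # toy stand-in for a sentence
rir = synthetic_rir(rt60=0.5, fs=fs)
reverberant = np.convolve(speech, rir)[:fs]

# EAS-style "acoustic" component: keep only the low frequencies of the
# reverberant signal (FFT mask below an assumed 500-Hz cutoff).
spec = np.fft.rfft(reverberant)
freqs = np.fft.rfftfreq(fs, 1 / fs)
spec[freqs > 500] = 0
acoustic_lp = np.fft.irfft(spec, n=fs)

# Residual energy above the cutoff, relative to the spectral peak.
lp_spec = np.abs(np.fft.rfft(acoustic_lp))
hf_leak = lp_spec[freqs > 600].max() / lp_spec.max()
```

In the full simulation this low-pass component would be summed with a noise-vocoded version of the same reverberant signal to form the EAS condition.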

