Attentive Tracking of Sound Sources.

Kevin J P Woods, Josh H McDermott
Author Information
  1. Kevin J P Woods: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138, USA. Electronic address: kwoods@mit.edu.
  2. Josh H McDermott: Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, USA; Program in Speech and Hearing Bioscience and Technology, Harvard University, Cambridge, MA 02138, USA.

Abstract

Auditory scenes often contain concurrent sound sources, but listeners are typically interested in just one of these and must somehow select it for further processing. One challenge is that real-world sounds such as speech vary over time and as a consequence often cannot be separated or selected based on particular values of their features (e.g., high pitch). Here we show that human listeners can circumvent this challenge by tracking sounds with a movable focus of attention. We synthesized pairs of voices that changed in pitch and timbre over random, intertwined trajectories, lacking distinguishing features or linguistic information. Listeners were cued beforehand to attend to one of the voices. We measured their ability to extract this cued voice from the mixture by subsequently presenting the ending portion of one voice and asking whether it came from the cued voice. We found that listeners could perform this task but that performance was mediated by attention: listeners who performed best were also more sensitive to perturbations in the cued voice than in the uncued voice. Moreover, the task was impossible if the source trajectories did not maintain sufficient separation in feature space. The results suggest a locus of attention that can follow a sound's trajectory through a feature space, likely aiding selection and segregation amid similar distractors.

MeSH Terms

Adult
Attention
Cues
Female
Humans
Male
Perceptual Masking
Sound Spectrography
Speech Acoustics
Speech Perception
Young Adult
