Visual speech segmentation: using facial cues to locate word boundaries in continuous speech.

Aaron D Mitchel, Daniel J Weiss
Author Information
  1. Aaron D Mitchel: Department of Psychology, Bucknell University, Lewisburg, PA 17837, USA.
  2. Daniel J Weiss: Department of Psychology and Program in Linguistics, The Pennsylvania State University, 643 Moore Building, University Park, PA 16802, USA.

Abstract

Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. We therefore created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative about word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition.

Keywords

multisensory integration

Grants

  1. R01 HD067250/NICHD NIH HHS
  2. R03 HD048996/NICHD NIH HHS

