Author ORCID Identifier



Open Access Dissertation

Document Type


Degree Name

Doctor of Philosophy (PhD)

Degree Program

Neuroscience and Behavior

Year Degree Awarded


Month Degree Awarded


First Advisor

Alexandra Jesse

Second Advisor

Kyle R. Cave

Third Advisor

Lisa D. Sanders

Fourth Advisor

John Kingston

Subject Categories

Biological Psychology | Cognitive Neuroscience | Cognitive Psychology


In face-to-face conversations, listeners process and combine speech information obtained from hearing and seeing the speaker talk. Audiovisual speech typically leads to more robust speech recognition, both because it provides more information for recognition and because it helps listeners adjust to speaker idiosyncrasies. The goal of this thesis was to examine how certain perceptual and cognitive factors modulate listeners’ use of visual speech to facilitate momentary speech perception and to adjust to a speaker’s idiosyncrasies. Results showed that (older) listeners’ sensitivity to cross-modal synchrony is related to the size of the audiovisual interactions during early perceptual processing. Furthermore, when experiencing asynchrony, younger listeners adapted their speech perception to the current situation such that early neural interactions emerged. Higher-level mechanisms also modulated audiovisual speech processing. We provide evidence that when listeners fixate the speaker’s eyes, as they typically do, the gathered visual speech information can successfully facilitate early auditory processing; allocating covert attention to the mouth area is not needed. We also demonstrated that the availability of working memory resources determines how quickly and thoroughly listeners can disambiguate visual speech to recalibrate phonetic categories to accommodate a speaker’s idiosyncrasy. In summary, the studies in this thesis provide insight into factors affecting the mechanisms involved in the processing of audiovisual speech.