Multi-talker speech intelligibility requires successful separation of target speech from background speech. This segregation relies on the fidelity of bottom-up neural coding of sensory information and on top-down effortful listening. Here, we studied the interaction between temporal processing, measured using Envelope Following Responses (EFRs) to amplitude-modulated tones, and pupil-indexed listening effort, as it related to performance on the Quick Speech-in-Noise (QuickSIN) test in normal-hearing adults. Listening effort increased at the more difficult signal-to-noise ratios, but speech intelligibility decreased only at the hardest signal-to-noise ratio. Neither pupil-indexed listening effort nor EFRs independently related to QuickSIN performance. However, the combined effects of EFRs and listening effort explained significant variance in QuickSIN performance. Our results suggest a synergistic interaction between sensory coding and listening effort as it relates to multi-talker speech intelligibility. These findings can inform the development of next-generation, multi-dimensional approaches for testing speech intelligibility deficits in listeners with normal hearing.