Ensemble coding, the rapid extraction of a perceptual average, has been proposed as a potential mechanism underlying face learning. We tested this proposal across five pre-registered experiments in which four ambient images of an identity were presented in the study phase. In Experiments 1 and 2a-c, participants were asked whether a test image was in the study array; these experiments examined the robustness of ensemble coding. Experiment 1 replicated ensemble coding in an online sample: participants recognized both images from the study array and the average of those images. Experiments 2a-c provide evidence that ensemble coding meets several criteria of a possible learning mechanism: it is robust to changes in head orientation (±60°), survives a short (30 s) delay, and persists when images of two identities are interleaved during the study phase. Experiment 3 examined whether ensemble coding is sufficient for face learning (i.e., whether it facilitates recognition of novel images of a target identity). Each study array comprised four ambient images (variability + average), a single image, or an average of four images (average only). Participants were asked whether a novel test image showed the identity from a study array. Performance was best in the four-image condition, with no difference between the single-image and average-only conditions. We conclude that ensemble coding of facial identity is robust but that the perceptual average per se is not sufficient for face learning.