Auditory short-term memory (STM) is fundamental to making sense of auditory information as it unfolds over time. Whether separate STM systems exist for different types of auditory information (music and speech, in particular) is a matter of debate. The present paper reviews studies that have investigated both musical and verbal STM in healthy individuals and in participants with neurodevelopmental and neurological disorders. Overall, the results favor only partly shared networks for musical and verbal STM. Evidence for a distinction between STM for the two materials stems from (1) behavioral studies in healthy participants, in particular comparisons between nonmusicians and musicians; (2) behavioral studies in congenital amusia, where a selective pitch STM deficit is observed; and (3) studies in brain-damaged patients, including cases of double dissociation. In this review, we highlight the need for future studies comparing STM for the same perceptual dimension (e.g., pitch) in different materials (e.g., music and speech), as well as for studies that characterize more precisely the shared and distinct mechanisms for speech and music in the different components of STM, namely encoding, retention, and retrieval.