Mothers across languages change the timbre of their voice in similar ways when they speak to babies, Princeton University neuroscientists report today in the journal Current Biology. This finding will help researchers understand what kind of speech keeps a baby’s attention, which could improve how we teach children.
Timbre is the flavor of music and speech. It’s not a distinct pitch or loudness, but rather the unique collection of frequencies produced by a person or instrument. Timbre is what makes sound distinct: It’s why you can tell a violin from a guitar even if they are playing the same note, or Bob Dylan from Jimi Hendrix even if they are both singing “All Along the Watchtower.”
Timbre is tied to the physical structure of the object producing the sound. Certain tones resonate more fully on a violin than on a guitar, and that resonance allows overtones to color the sound in different ways. You can see the different resonances due to the shape of objects in this video of a classic experiment called a Chladni plate (Mind your ears!):
Each person’s voice box is also an instrument with a unique timbre, though it is malleable and can shift slightly. To imitate the distinct, nasal voice of Donald Duck, says lead author Dr. Elise Piazza, “I might draw back my lips and tighten the back of my throat to create a different tone color.”
Mothers speaking many languages are known to raise their pitch, slow their speech and repeat phrases more often when they are trying to attract a baby’s attention. This style is called infant-directed speech, and Piazza and her colleagues wondered whether it might involve shifts in timbre as well.
To test this, the team collected snippets of adult-directed and infant-directed speech from 24 mothers as they either talked to an adult interviewer or interacted with their baby. They chose only mothers in order to minimize the range of audio frequencies they had to deal with (though the team believes the results extend to fathers as well).
“We usually Skype with my parents,” was one phrase spoken to an adult interviewer, while another phrase spoken to an infant was, “Let’s not eat the kitty cat.” You can almost hear the difference just by reading those phrases.
LISTEN: Phrases of adult-directed and infant-directed speech from a participant of the study.
To quantify the change in timbre, Piazza and her fellow researchers converted the recorded sound into spectrograms, which chart the strength of audio frequencies over time. These spectrograms were then processed into a compact set of numbers called Mel-frequency cepstral coefficients (MFCCs).
An MFCC profile is like a vocal version of a fingerprint. It summarizes the strength of audio frequencies while taking into account how the human ear hears sounds. For example, our ears cannot distinguish frequencies that are very close together, so we perceive them as one tone. We also perceive very high-pitched and very low-pitched tones as quieter than middle pitches played at the same strength. As a result, MFCCs can reveal how the makeup of vocal frequencies, or timbre, is perceived by others.
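For readers who want to see the idea in code, here is a minimal Python sketch of how MFCCs can be computed for one short frame of audio: a power spectrum, a bank of triangular filters spaced on the mel scale, a logarithm, and a discrete cosine transform. It is a textbook-style illustration rather than the researchers' actual pipeline. The frame length, sample rate, filter count, and the synthetic `tone` helper are all assumptions made for this example.

```python
import cmath
import math

def hz_to_mel(hz):
    # Common mel-scale formula: roughly linear below 1 kHz, logarithmic above.
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def power_spectrum(frame):
    """Power in each non-negative frequency bin of one Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):  # naive DFT; fine for a short illustrative frame
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        spectrum.append(abs(total) ** 2 / n)
    return spectrum

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centers are evenly spaced on the mel scale."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    mel_points = [low + (high - low) * i / (n_filters + 1)
                  for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):    # rising edge of the triangle
            row[k] = (k - left) / (center - left)
        for k in range(center, right):   # falling edge of the triangle
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank

def dct_ii(values, n_coeffs):
    """First n_coeffs terms of an unnormalized type-II discrete cosine transform."""
    n = len(values)
    return [sum(v * math.cos(math.pi * c * (i + 0.5) / n)
                for i, v in enumerate(values))
            for c in range(n_coeffs)]

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=12):
    power = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # The small constant avoids log(0) if a filter happens to capture no energy.
    log_energies = [math.log(sum(w * p for w, p in zip(row, power)) + 1e-10)
                    for row in bank]
    return dct_ii(log_energies, n_coeffs)

def tone(harmonic_weights, f0=220.0, sample_rate=8000, n=256):
    """Hypothetical test signal: a fundamental plus weighted overtones."""
    return [sum(w * math.sin(2 * math.pi * f0 * (h + 1) * t / sample_rate)
                for h, w in enumerate(harmonic_weights))
            for t in range(n)]

mellow = mfcc(tone([1.0, 0.3, 0.1]), 8000)
bright = mfcc(tone([1.0, 0.8, 0.7, 0.6, 0.5]), 8000)
distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(mellow, bright)))
print(f"MFCC distance between the two tones: {distance:.2f}")
```

The two synthetic tones share the same 220 Hz fundamental, so they have the same pitch. Only the balance of overtones differs, and that difference is what separates their MFCC vectors.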
Comparing the MFCCs of infant-directed phrases to those of adult-directed phrases, the researchers found a shift in timbre across ten different languages. Piazza says it’s tough to characterize, but “it likely combines several features, such as brightness, breathiness, purity, or nasality.”
Using this data, the team wrote a machine learning algorithm and trained it to use timbre to classify whether a particular phrase was infant-directed or adult-directed.
“We were most surprised that this timbre shift between adult-directed and infant-directed speech exhibited such a consistent pattern across such diverse languages,” Piazza said. “In addition to English, we included Spanish, Russian, Polish, Hungarian, German, French, Hebrew, Mandarin and Cantonese.”
This consistent pattern across languages was picked up by their algorithm even when the training data set only had English phrases. The reverse was true too. When they trained the algorithm with other languages, it was able to use timbre shifts to classify speech in English.
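The train-on-one-language, test-on-the-others check described above can be sketched in a few lines of Python. The article does not say which algorithm the team used, so this example swaps in a simple nearest-centroid classifier as a stand-in. The data are synthetic: the 12-number vectors, the per-language offsets, and the shared infant-directed shift are all assumptions built into the example, so its accuracy illustrates the evaluation procedure and says nothing about the study's results.

```python
import random

N_COEFFS = 12  # length of each hypothetical timbre (MFCC-like) vector

def make_phrases(rng, n_per_class, baseline, shift):
    """Synthetic (vector, label) pairs; 1 = infant-directed, 0 = adult-directed."""
    phrases = []
    for label in (0, 1):
        for _ in range(n_per_class):
            vec = [rng.gauss(b + label * s, 1.0) for b, s in zip(baseline, shift)]
            phrases.append((vec, label))
    return phrases

def centroid(vectors):
    return [sum(column) / len(column) for column in zip(*vectors)]

def train(phrases):
    """Nearest-centroid 'training': store the mean vector of each class."""
    return {label: centroid([vec for vec, lab in phrases if lab == label])
            for label in (0, 1)}

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(model, vec):
    """Assign the label whose class centroid is closest to the vector."""
    return min(model, key=lambda label: squared_distance(vec, model[label]))

def accuracy(model, phrases):
    return sum(predict(model, vec) == lab for vec, lab in phrases) / len(phrases)

rng = random.Random(0)
shift = [1.0] * N_COEFFS  # assumed: the same timbre shift in every language
languages = ["English", "Spanish", "Mandarin", "Hungarian"]
data = {}
for name in languages:
    # Assumed: each language has its own small baseline offset.
    baseline = [rng.gauss(0.0, 0.2) for _ in range(N_COEFFS)]
    data[name] = make_phrases(rng, 50, baseline, shift)

others = [p for name in languages if name != "English" for p in data[name]]

english_model = train(data["English"])
others_model = train(others)
english_to_others = accuracy(english_model, others)
others_to_english = accuracy(others_model, data["English"])
print(f"train on English, test on others: {english_to_others:.2f}")
print(f"train on others, test on English: {others_to_english:.2f}")
```

Holding out whole languages, rather than random phrases, is what makes this a test of generalization: the classifier never sees a single phrase from the languages it is scored on.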
“That classifier is really effective,” said Dr. Anne-Michelle Tessier, a linguist with the University of Michigan’s Center for Human Growth and Development who was not involved with the study. But it is hard to tell if infants are as good as machines at picking up on these patterns, she said.
Piazza thinks it is likely that they are. “Previous studies have shown that babies can perceive timbre differences between musical instruments,” she said. “Future work will be needed to determine exactly how babies perceive and use this information, and whether babies can pick up on this shift even in foreign languages.”
Tessier agreed that this research clearly indicates that our brains are tuned to discern language, even from an early age: “Babies are really focused on attending to speech around them, and noticing and storing patterns and distributions in that speech.”
While the researchers intend to continue exploring this newfound phenomenon, Piazza thinks the finding might prove useful for educational purposes. She envisions “having virtual teachers or cartoon characters imitate infant-directed timbre to optimally engage with babies.”
“Our work also invites future explorations of how speakers adjust their timbre to accommodate a wide variety of audiences, such as superiors, political constituents, students, and romantic partners,” Piazza said.
A version of this story appeared on Miles O’Brien Productions.