This Machine Learning System Thinks About Music Like You Do

If you’ve ever let Spotify DJ your party and then found yourself asking, half an hour in, “Spotify, what are you thinking?”, it may actually be thinking a lot like you. In a new study, scientists at the Massachusetts Institute of Technology report that they’ve created a machine-learning system that processes sound just as humans do, whether it’s discerning the meaning of a word or classifying music by genre.

It’s the first artificial system to mimic the way the brain interprets sounds—and it rivals humans in its accuracy. The research, published today in the journal Neuron, offers a tantalizing new way to study the brain.

The researchers’ model was based on what’s called a deep neural network, a system whose structure is loosely inspired by neurons, or brain cells. It processes information in layers, with the deepest layers doing the most complex work. Scientists can train systems like these to “learn” human tasks, such as, in this case, interpreting sounds.

The researchers gave their model two tasks. In one, they played a two-second snippet of speech and tested its ability to identify the middle word. In another, they played two seconds of music and tested how well the model could name the genre. They put background noise under each clip to complicate the job.

It took thousands of examples to train it, but by the end, the model performed as well as humans. It knew dozens of genres of music—it could tell dubstep from ska or gothic rock. It even made errors in the same places that tripped up humans, struggling the most with the clips played over city sounds.

But the researchers still weren’t sure if the model was processing these signals the same way a brain does—or if it had found its own way to solve the same problem. To settle that, they needed to look at some brains.

Lead author Alex Kell, of MIT, examined data from an fMRI scanner to see which regions of the brain were hardest at work as subjects listened to an array of natural sounds. Then he played those same sounds to the model. He found that when the model was processing relatively basic information, such as the frequency of a sound or pattern, its activity corresponded with one region of the brain. More complex processing, such as discerning meaning, lined up with a different region.

This suggests that the brain, like the model, processes information in a hierarchy that goes from simplest to most complex.

This ability to connect the inner workings of a deep neural network to the brain is exciting, said Andrew Pfalz, a Ph.D. candidate in Experimental Music and Digital Media at Louisiana State University whose research applies neural networks to sound. Despite the ubiquity of machine-learning systems (in the software that gives you music recommendations, for example, or that obeys spoken orders to help you find someplace to eat), even the engineers who design these systems often don’t know how they “think,” or how human-like their inner workings are.

“It’s sort of a black box,” Pfalz said. “The sort of funny part about neural networks is we train them and we see that they produce the right answers, but we don’t necessarily know what’s going on inside.”

But through many queries, the MIT researchers were able to shed light on which layers of the system were engaged when—and how that aligned with activity in brains processing the same sounds.

Pfalz appreciates the irony of a machine-learning system originally inspired by the brain (hence the name “neural network”) now helping scientists understand it.

Yet computer scientist Ching-Hua Chuan of the University of North Florida, whose research focuses on using machine-learning systems to generate music, emphasizes the immensity of this claim. “[Neural networks] were never intended to model how our brain works,” she said, adding that the difficulty of peering into the “black box” suggests to her that it would take more research to prove that the model truly mimics the brain.

The MIT team believes it comes close. If they’re right, it could help scientists understand and simulate how the brain processes sound and other sensory signals, said MIT’s Josh McDermott, senior author on the study. And since running tests on a simulation tends to be quicker, safer, and cheaper than experimenting on a real brain, this could put some neuroscience research on the fast track.

Computing power and neural network technology haven’t always been up to the task of modeling parts of the human brain, but these last five years mark the beginning of a new era, Kell said. “A lot of these tasks that have historically been insurmountably difficult for machine systems have actually become solvable.”