Artificial intelligence program teaches itself to play Atari games — and it can beat your high score


Artificial intelligence program deep Q-network teaches itself to play classic Atari games like Space Invaders. Video courtesy Google DeepMind with permission from Square Enix Ltd.

A new artificial intelligence program from Google DeepMind has taught itself how to play classic Atari 2600 games. And it can probably beat your high score.

Deep Q-network, or DQN, can play 49 Atari games “right out of the box,” says Demis Hassabis, world-renowned gamer and co-founder of DeepMind. Overall, it performed as well as a professional human video game tester, according to a study published this week in Nature. On more than half of the games, it scored more than 75 percent of the human score.

This isn’t the first game-playing A.I. program. IBM supercomputer Deep Blue defeated world chess champion Garry Kasparov in 1997. In 2011, an artificial intelligence computer system named Watson won a game of Jeopardy! against champions Ken Jennings and Brad Rutter.

Watson and Deep Blue were great achievements, but those computers were loaded with all the chess moves and trivia knowledge they could handle, Hassabis said in a news conference Tuesday. Essentially, they were pre-programmed, he explained.

But in this experiment, designers didn’t tell DQN how to win the games. They didn’t even tell it how to play or what the rules were, Hassabis said.

“(Deep Q-network) learns how to play from the ground up,” Hassabis said. “The idea is that these types of systems are more human-like in the way they learn. Our brains make models that allow us to learn and navigate the world. That’s exactly the type of system we’re trying to design here.”

To test DQN’s ability to learn and adapt, Hassabis and his team at DeepMind tried Atari 2600 games from the late 1970s and early 1980s. Atari games had the right level of complexity for the DQN software, Hassabis said. The software agent had access to the last four images on the screen and its score.
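Showing the agent several recent images at once, rather than a single still, is what lets it see motion, such as which way a ball is traveling. As a rough illustration only (the frame count matches the article, but the helper name and frame labels are hypothetical, not DeepMind’s code), a rolling stack of the last four frames can be kept like this:

```python
from collections import deque

# Keep only the 4 most recent screen frames, as the article describes.
frames = deque(maxlen=4)

def observe(new_frame):
    """Add the newest frame; return all four stacked as one observation."""
    if not frames:
        # At the start of a game, repeat the first frame to fill the stack.
        frames.extend([new_frame] * 4)
    else:
        frames.append(new_frame)  # oldest frame falls off automatically
    return list(frames)

first = observe("frame-1")
second = observe("frame-2")
print(second)  # ['frame-1', 'frame-1', 'frame-1', 'frame-2']
```

Because the deque has a fixed maximum length, each new frame silently evicts the oldest one, so the observation always covers the same short window of time.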

By “looking” at the pixels on the screen and moving the controls, DQN taught itself to play over the course of several weeks, said Vlad Mnih, one of the authors on the paper, at Tuesday’s conference. It’s a process called “deep reinforcement learning,” Mnih said, where the computer learns through trial and error — the same way humans and other animals learn.
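DQN itself pairs this trial-and-error loop with a deep neural network reading raw pixels, which is far beyond a news sketch. But the underlying idea, trying actions, noticing which ones lead to higher scores, and gradually preferring those, can be shown with a minimal tabular Q-learning toy. Everything here (the corridor “game,” the constants, the variable names) is an illustrative assumption, not the paper’s method:

```python
import random

# Toy "game": states 0..4 in a corridor; reaching state 4 scores a point.
# Actions: 0 = left, 1 = right. A table stands in for DQN's neural network.
N_STATES, ACTIONS = 5, (0, 1)
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Environment: move left/right; reward 1 only at the rightmost state."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0)

random.seed(0)
for episode in range(200):             # trial and error over many games
    state = 0
    while state != N_STATES - 1:
        if random.random() < EPSILON:  # occasionally press keys at random
            action = random.choice(ACTIONS)
        else:                          # otherwise exploit what scored well
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward = step(state, action)
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        # Nudge the estimate toward reward plus discounted future value.
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

# After training, the greedy policy heads right from every state.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```

No rules are spelled out for the agent: it only ever sees states, its own actions, and the score, yet the reward signal is enough for it to work out the winning behavior.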

“We are trying to explore the space of algorithms for intelligence. We have one example of (intelligence) — the human brain,” Hassabis said. “We can be certain that reinforcement learning is something that works and something humans and animals use to learn.”

Sometimes it learned to beat the games in ways the researchers didn’t expect. In Breakout, DQN figured out how to tunnel through the wall, a strategy the research team hadn’t thought of.

Video courtesy Google DeepMind with permission from Atari Interactive Inc.

But DQN failed at other games, particularly ones that required planning and foresight, like Ms. Pac-Man, Mnih said. And DQN can’t transfer what it learned from one situation to the next, Hassabis said. That’s something even toddlers can do, he added.

“One of the issues is that it learns to play by pressing keys randomly, then figuring out high scores and what leads to that. In some games that strategy doesn’t work,” Mnih said.

Google DeepMind is sticking with video game training for DQN for now, moving up to Nintendo games from the 1990s, Hassabis said. Eventually he would love for the software agent to crack more complicated games like StarCraft and Civilization.

Video games may be the testing ground, but this technology has real-world applications, Hassabis said. For example, if it masters driving a car in Grand Theft Auto, it could be used in self-driving cars, he said. Or it could learn how to make better predictions for the weather and financial markets. Hassabis and his team are already tinkering with parts of DQN’s algorithm to improve Google’s search function and mobile applications.

“The ultimate goal is to build smart, general purpose machines,” Hassabis said. “I think the demonstration shows that this is possible. It’s a first baby step.”