DeepMind’s Go-playing AI has mastered two more games

DeepMind’s AI Go master has taught itself new tricks. The latest version of the machine learning software, dubbed AlphaZero, can now also beat the world’s best at chess and shogi – a Japanese game that is similar to chess but played on a bigger board with more pieces.

DeepMind, a sister company to Google, claims that AlphaZero is the first machine-learning system that can learn to do more than one task with superhuman ability.

AlphaGo made headlines in 2016 when it beat the world’s best players at a game long thought too hard for computers to crack. Then came AlphaGo Zero, which not only out-played AlphaGo but taught itself to do so without ever having seen a human play the game.

Instead of learning how to play by analyzing millions of games played by humans, as AlphaGo had, AlphaGo Zero learned without human input. It was essentially given the rules of the game and shut up in a box until it was the best Go player in the world. That took three days.

Chess- and shogi-playing programs are typically given the rules of the game and use a brute-force search to find the best possible next move. AlphaZero instead generalises the AlphaGo Zero approach. Using the same algorithm, it taught itself to master Go, chess and shogi from scratch.

In addition to beating AlphaGo Zero at Go, it can also best leading chess and shogi software that already outperforms humans. The DeepMind team believes that AlphaZero brings us a step closer to an AI that can teach itself to play any kind of game.

“This is very good work,” says Julian Togelius at New York University, who works on game-playing AI. “It’s a clever algorithm.” Yet he thinks we need to be careful about what we mean when we talk about a general AI.

AlphaZero is general in the sense that the same software can learn different games. But once trained, the system cannot take what it has learned and apply it elsewhere.

“A network trained to play chess cannot play Go, and vice versa,” says Togelius. “To play another game, the system has to be retrained from scratch.” In other words, a single instance of AlphaZero can’t play all three games, the way a human could.

Togelius also notes that this superhuman AI requires 5000 TPUs – or tensor processing units – a type of chip specially designed by Google for machine learning. “That’s an absolutely insane amount of computing power,” says Togelius. “Only major tech companies have access to that, so nobody has been able to replicate the work properly.”