AI: why it’s so hard for computers to beat humans at the game of “Go”



Editor’s note: the author of this article works at a startup that provides tl;drs as a service, so he also provided this summary of key points.

The game of Go is very interesting from an AI programmer’s point of view, both because of how difficult it is to make a computer compete against a strong human and because of the way researchers approached the problem. Paradoxically, the use of randomly played games has helped computers get much stronger.

Game of Go: a (very) brief overview

Go is wildly popular in Asia but little known in the Western world. Like Chess, it is a strategy game in which two players take turns placing stones on a board. The key differences between the two are:

  • In Go, the board starts empty and you add a stone each turn, whereas in Chess you start with an army and try to destroy the other one
  • The rules are much simpler in Go than in Chess: in particular, there is only one type of stone, while Chess uses six different types of pieces

I really encourage you to check it out: you can learn to play here and play here.

Complexity: comparison to Chess

Even though the rules are simple, it has proven very hard to build good Go-playing software. In fact, while top Chess programs can compete with the best humans (remember how Deep Blue beat Garry Kasparov in 1997), even strong amateurs can beat the top Go-playing programs. And strong amateurs are much weaker than top pros. There are three main reasons for this:

  • The branching factor is much larger: a Go position typically offers around 250 legal moves, against roughly 35 in Chess, so the game tree explodes much faster.
  • Games are longer: a typical Go game lasts about 150 moves, compared to around 80 plies in Chess.
  • There is no simple way to evaluate a position: in Chess, counting material gives a decent estimate, but in Go the value of a group of stones depends on subtle whole-board considerations.
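
To get a feel for the first two points, here is a rough back-of-the-envelope calculation (a Python sketch using commonly cited averages rather than exact figures): the number of possible game continuations is roughly the average number of legal moves raised to the typical game length.

```python
# Rough game-tree sizes, using commonly cited averages:
#   Chess: ~35 legal moves per position, games of ~80 plies
#   Go:    ~250 legal moves per position, games of ~150 moves
chess_tree = 35 ** 80
go_tree = 250 ** 150

print(f"Chess game tree: about 10^{len(str(chess_tree)) - 1} positions")  # ~10^123
print(f"Go game tree:    about 10^{len(str(go_tree)) - 1} positions")     # ~10^359
```

Brute-force search is hopeless in both games, but a gap of more than two hundred orders of magnitude is one way to see why the techniques that work for Chess do not carry over to Go.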

Key ideas behind current Go-playing AI

This complexity is what makes the study of Go algorithms so interesting.

A few years ago we saw a spectacular improvement, led by a program called MoGo. The key ideas behind MoGo are:

  • Position evaluation using random games: paradoxically, it was found that the best way to evaluate a position is to play, from this position, a lot of games where each player puts stones at random on the board. A position’s score is simply the percentage of won random games.
  • Use of multi-armed bandit algorithms to reduce the branching factor (the number of moves the program has to consider at each turn). The goal is to optimize the trade-off between exploration (trying new moves) and exploitation (further testing the most promising moves to select the best one). The computer starts with a prior probability distribution over every possible move. It picks one according to this distribution, plays a random game from the corresponding position and uses the result to update the probability for this move. It then selects another move according to the updated distribution, and so on (a minimal sketch combining this loop with the random playouts above is given after this list).
  • Use of a little expert knowledge: the prior distribution used in the multi-armed bandit is calculated so as to avoid trivially stupid moves. For the opening of the game, classic starting positions are also favored. And the random games are not completely random, since they avoid certain moves according to simple patterns. Nonetheless, expert knowledge is used sparingly, as too much of it hurts the exploration part of the algorithm.
  • Parallelization: MoGo and its kind can become very strong on computer clusters, since the random games can be run on multiple CPUs. The speedup factor is about 8x.
  • Reinforcement learning: during the game and the random playouts, MoGo learns that certain responses to its opponent’s moves are likely to lead to a bad result, and assigns them a lower probability.
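
To make the first two ideas more concrete, here is a minimal Python sketch (not MoGo’s actual code): it scores each candidate move by the win rate of random playouts from the resulting position, and spends the playout budget according to the UCB1 bandit formula. The Position class and its methods are hypothetical placeholders for a real Go board implementation.

```python
import math
import random

# Minimal sketch of bandit-guided Monte Carlo evaluation (not MoGo's code).
# `Position` is a hypothetical game-state class assumed to expose:
#   legal_moves() -> list of moves, play(move, player) -> new Position,
#   is_terminal() -> bool, winner() -> +1 or -1.  Players are +1 and -1.

def random_playout(position, to_move, player):
    """Finish the game with uniformly random moves; return True if `player` wins."""
    while not position.is_terminal():
        move = random.choice(position.legal_moves())
        position = position.play(move, to_move)
        to_move = -to_move
    return position.winner() == player

def choose_move(position, player, n_playouts=1000, c=1.4):
    """Pick a move for `player` by Monte Carlo evaluation under a UCB1 bandit."""
    moves = position.legal_moves()
    wins = {m: 0 for m in moves}
    visits = {m: 0 for m in moves}

    for t in range(1, n_playouts + 1):
        def ucb1(m):
            if visits[m] == 0:
                return float("inf")                 # sample every move at least once
            win_rate = wins[m] / visits[m]          # exploitation: current win rate
            bonus = c * math.sqrt(math.log(t) / visits[m])  # exploration bonus
            return win_rate + bonus

        move = max(moves, key=ucb1)                 # most promising move to sample next
        after = position.play(move, player)
        if random_playout(after, -player, player):  # opponent moves first in the playout
            wins[move] += 1
        visits[move] += 1

    # Return the most-visited move: UCB1 concentrates the budget on the best candidate.
    return max(moves, key=lambda m: visits[m])
```

Real programs such as MoGo grow a full search tree out of these statistics (the UCT algorithm) and bias both the bandit priors and the playouts with the expert knowledge mentioned above, but the exploration/exploitation loop is the same.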

You can learn more about MoGo in this article.

Current strength of computers

Today, Go-playing programs are able to compete with professional players on small (9×9) boards, and with strong amateurs (5d KGS) on normal (19×19) boards. This is quite an achievement, but there is still a long way to go before they can face professional players on normal boards!

I am a Go player myself, though not a very good one! During an internship in a computer science laboratory, I worked on MoGo for two months in 2008.
Special thanks to Arpad Rimmel for reviewing this article. He spent three years working on MoGo as a PhD student.


About the author

Louis Chatriot

I like to build cool stuff on the web, and I don't like having to spend 5 minutes figuring out whether a blog post interests me or not, which is why I cofounded tldr.io, a startup that wants to battle information overload by summarizing the Web. Work aside, I enjoy squash (a lot), skiing (even more) and occasionally climbing. If you play Go but are not too strong, I'm your man.