
Reinforcement Learning vs. Expectimax for a 2048 AI


I stumbled across a post on Stack Overflow regarding a 2048-playing AI, which seems to have made the rounds on reddit. This answer, in particular, I found interesting: the accepted answer was decent, but the answer linked above used expectimax, an efficient 64-bit board representation, lookup tables, a transposition table, and a simpler heuristic, and achieved a far higher score (routinely reaching at least one 8192 tile).
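
To make the representation concrete, here's roughly how I understand the 64-bit board (a minimal sketch of the idea as in my Java port, not the linked answer's exact code): each cell stores a 4-bit exponent, so a board is one long, a row is 16 bits, and row moves can be precomputed into a 65536-entry table.

    // Each of the 16 cells stores a 4-bit exponent (0 = empty, 1 = 2,
    // 2 = 4, ..., 15 = 32768), so a whole board fits in a single long
    // and a whole row fits in 16 bits.
    final class Bitboard {
        // Read the exponent of the cell at (row, col).
        static int cell(long board, int row, int col) {
            return (int) (board >>> (16 * row + 4 * col)) & 0xF;
        }

        // Extract one 16-bit row; since there are only 2^16 possible rows,
        // the result of sliding any row can be precomputed once at startup.
        static int row(long board, int r) {
            return (int) (board >>> (16 * r)) & 0xFFFF;
        }

        // moveTable[r] holds the result of sliding row r; a full move is
        // then just four table lookups plus some shifting.
        static final int[] moveTable = new int[1 << 16];
    }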

I recently ported this solution to Java, and with a few small modifications, it seems to make it to 16384 fairly frequently, though it runs at about half the speed of the original C++ implementation.

So, the two glaring flaws of expectimax are the gigantic branching factor and the need for a human-programmed heuristic to evaluate any given board state. A speed of 3 moves per second isn't very good: it maxes out my CPU for an hour to play through a single game. And though my port has managed to reach 16384, the 64-bit board representation caps out at 32768, since each tile's 4-bit exponent tops out at 15 (and technically it is possible, however unlikely, for a real game to reach 65536 and 131072, which this representation cannot store).
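
For reference, the recursion itself is simple; the cost is all in the chance nodes. Here's a minimal sketch, where execute() and heuristic() are hypothetical stand-ins for the real move and evaluation functions:

    // A sketch of where the branching factor comes from: player (max) nodes
    // try up to 4 moves, but every chance node branches once per empty cell,
    // times two tile values (a 2 with probability 0.9, a 4 with 0.1).
    // execute() and heuristic() are hypothetical stand-ins for the real
    // move and evaluation functions.
    static double expectimax(long board, int depth, boolean playerTurn) {
        if (depth == 0) return heuristic(board);   // hand-coded evaluation
        if (playerTurn) {                          // max node: up to 4 children
            double best = 0;
            for (int move = 0; move < 4; move++) {
                long next = execute(move, board);  // hypothetical move function
                if (next != board)                 // skip moves that do nothing
                    best = Math.max(best, expectimax(next, depth - 1, false));
            }
            return best;
        }
        double expected = 0;                       // chance node: average over spawns
        int empty = 0;
        for (int i = 0; i < 16; i++) {
            if (((board >>> (4 * i)) & 0xF) != 0) continue; // cell occupied
            empty++;
            expected += 0.9 * expectimax(board | (1L << (4 * i)), depth - 1, true); // spawn a 2
            expected += 0.1 * expectimax(board | (2L << (4 * i)), depth - 1, true); // spawn a 4
        }
        return empty == 0 ? heuristic(board) : expected / empty;
    }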

It seems that reinforcement learning would be a better solution, if I understand correctly. The AI would train itself, learning an approximate evaluation function by playing the game over and over, using only the game's official score as feedback, and it would then compute moves very quickly. The downside is the training time required.

A few questions:

  • Is my analysis of the advantages/disadvantages of these two approaches reasonably accurate?
  • How might I go about utilizing reinforcement learning? Q-learning with a neural network as a function approximator keeps coming up in my Googling (see the sketch after this list): how exactly does that work, and would it produce a better-than-expectimax AI in a reasonable amount of time?
  • Is there some other, far superior approach that I'm overlooking?
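
On the second question, here's the core of what I've found so far: a minimal tabular Q-learning sketch, where the reward is the score a move earns and the alpha/gamma values are my assumptions. The neural-network variant would replace the hash map with a trained approximator producing the same targets.

    import java.util.HashMap;
    import java.util.Map;

    // A minimal tabular Q-learning sketch; alpha and gamma are assumed
    // values, and the reward is the score a move earns under the game's
    // official scoring. For 2048's huge state space the HashMap would be
    // replaced by a function approximator (e.g. a neural network).
    final class QLearner {
        final Map<Long, double[]> q = new HashMap<>(); // state -> value of each of the 4 moves
        final double alpha = 0.1;  // learning rate (assumed)
        final double gamma = 0.99; // discount factor (assumed)

        // One update after observing (state, action, reward, nextState):
        // Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        void update(long state, int action, double reward, long nextState) {
            double[] qs = q.computeIfAbsent(state, s -> new double[4]);
            double[] nx = q.computeIfAbsent(nextState, s -> new double[4]);
            double maxNext = Math.max(Math.max(nx[0], nx[1]), Math.max(nx[2], nx[3]));
            qs[action] += alpha * (reward + gamma * maxNext - qs[action]);
        }
    }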

Also, any insights on the expectimax approach and 2048 in general would be awesome. Monotonicity, smoothness, and free tiles are really the only heuristics that have popped up; one way to keep them cheap is to bake per-row scores into a lookup table, as sketched below. Other than nneonneo's various optimizations, I threw in some concurrency, but it seems that's still not enough to overcome the disadvantage of using Java. Since the game has (in this representation) nearly 2^64 valid states, and reaching 16384 takes around 10,000 moves, simplifications based on symmetry and similarity (in terms of things like monotonicity) would be great.
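
Here's the heuristic-table idea: score all 2^16 possible rows once at startup, so evaluating a board is four table hits on its rows (plus four on its columns after a transpose). The weights here are placeholders I made up, not tuned values.

    // Bake the row heuristics into a lookup table: score every possible
    // 16-bit row once, then a board evaluation is four table hits on its
    // rows (plus four on its columns after a transpose). The weights below
    // are placeholder assumptions, not tuned values.
    static final double[] rowScore = new double[1 << 16];

    static void buildRowScores() {
        for (int row = 0; row < (1 << 16); row++) {
            int[] t = { row & 0xF, (row >>> 4) & 0xF,
                        (row >>> 8) & 0xF, (row >>> 12) & 0xF };
            double score = 0;
            boolean increasing = true, decreasing = true;
            for (int i = 0; i < 4; i++) {
                if (t[i] == 0) score += 10;            // free-tile bonus (assumed weight)
                if (i > 0 && t[i] > t[i - 1]) decreasing = false;
                if (i > 0 && t[i] < t[i - 1]) increasing = false;
            }
            if (increasing || decreasing) score += 20; // monotonic-row bonus (assumed weight)
            rowScore[row] = score;
        }
    }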

submitted by epicwisdom
