
Reinforcement Learning vs. Expectimax for a 2048 AI


I stumbled across a post on Stack Overflow regarding a 2048-playing AI, which seems to have made the rounds on reddit. This answer, in particular, I found interesting: the accepted answer was decent, but the answer linked above used expectimax, an efficient 64-bit board representation, lookup tables, a transposition table, and a simpler heuristic, and achieved a far higher score (routinely reaching at least one 8192 tile).
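
To make the representation concrete, here's roughly how I understand the 64-bit board (a minimal sketch of the idea as in my Java port, not the linked answer's exact code): each cell stores a 4-bit exponent, so a board is one long, a row is 16 bits, and row moves can be precomputed into a 65536-entry table.

    // Each of the 16 cells stores a 4-bit exponent (0 = empty, 1 = 2,
    // 2 = 4, ..., 15 = 32768), so a whole board fits in a single long
    // and a whole row fits in 16 bits.
    final class Bitboard {
        // Read the exponent of the cell at (row, col).
        static int cell(long board, int row, int col) {
            return (int) (board >>> (16 * row + 4 * col)) & 0xF;
        }

        // Extract one 16-bit row; since there are only 2^16 possible rows,
        // the result of sliding any row can be precomputed once at startup.
        static int row(long board, int r) {
            return (int) (board >>> (16 * r)) & 0xFFFF;
        }

        // moveTable[r] holds the result of sliding row r; a full move is
        // then just four table lookups plus some shifting.
        static final int[] moveTable = new int[1 << 16];
    }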

I recently ported this solution to Java, and with a few small modifications, it seems to make it to 16384 fairly frequently, though it runs at about half the speed of the original C++ implementation.

So, the two glaring flaws of expectimax are the gigantic branching factor and the need for a human-programmed heuristic to evaluate any given board state. A speed of 3 moves per second isn't very good: it maxes out my CPU for an hour to play through a single game. And though my port has managed to reach 16384, the 64-bit board representation caps out at 32768, since each tile's 4-bit exponent tops out at 15 (and technically it is possible, however unlikely, for a real game to reach 65536 and 131072, which this representation cannot store).
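
For reference, the recursion itself is simple; the cost is all in the chance nodes. Here's a minimal sketch, where execute() and heuristic() are hypothetical stand-ins for the real move and evaluation functions:

    // A sketch of where the branching factor comes from: player (max) nodes
    // try up to 4 moves, but every chance node branches once per empty cell,
    // times two tile values (a 2 with probability 0.9, a 4 with 0.1).
    // execute() and heuristic() are hypothetical stand-ins for the real
    // move and evaluation functions.
    static double expectimax(long board, int depth, boolean playerTurn) {
        if (depth == 0) return heuristic(board);   // hand-coded evaluation
        if (playerTurn) {                          // max node: up to 4 children
            double best = 0;
            for (int move = 0; move < 4; move++) {
                long next = execute(move, board);  // hypothetical move function
                if (next != board)                 // skip moves that do nothing
                    best = Math.max(best, expectimax(next, depth - 1, false));
            }
            return best;
        }
        double expected = 0;                       // chance node: average over spawns
        int empty = 0;
        for (int i = 0; i < 16; i++) {
            if (((board >>> (4 * i)) & 0xF) != 0) continue; // cell occupied
            empty++;
            expected += 0.9 * expectimax(board | (1L << (4 * i)), depth - 1, true); // spawn a 2
            expected += 0.1 * expectimax(board | (2L << (4 * i)), depth - 1, true); // spawn a 4
        }
        return empty == 0 ? heuristic(board) : expected / empty;
    }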

It seems that reinforcement learning would be a better solution, if I understand correctly. The AI would train itself, learning an approximate evaluation function by playing the game over and over, using only the game's official score as feedback, and it would then compute moves very quickly. The downside is the training time required.

A few questions:

  • Is my analysis of the advantages/disadvantages of these two approaches reasonably accurate?
  • How might I go about utilizing reinforcement learning? Q-learning with a neural network as a function approximator keeps coming up in my Googling (see the sketch after this list): how exactly does that work, and would it produce a better-than-expectimax AI in a reasonable amount of time?
  • Is there some other, far superior approach that I'm overlooking?
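
On the second question, here's the core of what I've found so far: a minimal tabular Q-learning sketch, where the reward is the score a move earns and the alpha/gamma values are my assumptions. The neural-network variant would replace the hash map with a trained approximator producing the same targets.

    import java.util.HashMap;
    import java.util.Map;

    // A minimal tabular Q-learning sketch; alpha and gamma are assumed
    // values, and the reward is the score a move earns under the game's
    // official scoring. For 2048's huge state space the HashMap would be
    // replaced by a function approximator (e.g. a neural network).
    final class QLearner {
        final Map<Long, double[]> q = new HashMap<>(); // state -> value of each of the 4 moves
        final double alpha = 0.1;  // learning rate (assumed)
        final double gamma = 0.99; // discount factor (assumed)

        // One update after observing (state, action, reward, nextState):
        // Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        void update(long state, int action, double reward, long nextState) {
            double[] qs = q.computeIfAbsent(state, s -> new double[4]);
            double[] nx = q.computeIfAbsent(nextState, s -> new double[4]);
            double maxNext = Math.max(Math.max(nx[0], nx[1]), Math.max(nx[2], nx[3]));
            qs[action] += alpha * (reward + gamma * maxNext - qs[action]);
        }
    }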

Also, any insights on the expectimax approach and 2048 in general would be awesome. Monotonicity, smoothness, and free tiles are really the only heuristics that have popped up; one way to keep them cheap is to bake per-row scores into a lookup table, as sketched below. Other than nneonneo's various optimizations, I threw in some concurrency, but it seems that's still not enough to overcome the disadvantage of using Java. Since the game has (in this representation) nearly 2^64 valid states, and reaching 16384 takes around 10,000 moves, simplifications based on symmetry and similarity (in terms of things like monotonicity) would be great.
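
Here's the heuristic-table idea: score all 2^16 possible rows once at startup, so evaluating a board is four table hits on its rows (plus four on its columns after a transpose). The weights here are placeholders I made up, not tuned values.

    // Bake the row heuristics into a lookup table: score every possible
    // 16-bit row once, then a board evaluation is four table hits on its
    // rows (plus four on its columns after a transpose). The weights below
    // are placeholder assumptions, not tuned values.
    static final double[] rowScore = new double[1 << 16];

    static void buildRowScores() {
        for (int row = 0; row < (1 << 16); row++) {
            int[] t = { row & 0xF, (row >>> 4) & 0xF,
                        (row >>> 8) & 0xF, (row >>> 12) & 0xF };
            double score = 0;
            boolean increasing = true, decreasing = true;
            for (int i = 0; i < 4; i++) {
                if (t[i] == 0) score += 10;            // free-tile bonus (assumed weight)
                if (i > 0 && t[i] > t[i - 1]) decreasing = false;
                if (i > 0 && t[i] < t[i - 1]) increasing = false;
            }
            if (increasing || decreasing) score += 20; // monotonic-row bonus (assumed weight)
            rowScore[row] = score;
        }
    }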

submitted by epicwisdom
