Following this tutorial, I implemented temporal-difference learning for Connect Four, because the rules of Connect Four are easy to implement. But my neural net is not training properly. The original paper by Tesauro can be found here.
Here's my setup and the problem:
The input is a vector of length 43: the status of the 6×7 board plus the next player to move. The output layer has 3 neurons representing the chance that "the first player will win", "this game will be a draw", and "the second player will win", respectively. After randomly initializing the weights, the net plays a game against itself and then learns from that game.
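In case it helps, here is a minimal sketch of how I set this up (numpy, a single hidden layer; the hidden size, the sigmoid activations, and the encoding details are my own choices and may differ from the tutorial):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def encode(board, next_player):
        # board: 6x7 array with 0 = empty, 1 = first player, -1 = second player
        # next_player: 1 or -1, appended as the 43rd input
        return np.append(board.flatten(), next_player).astype(float)

    class ValueNet:
        def __init__(self, n_in=43, n_hidden=40, n_out=3, seed=0):
            rng = np.random.default_rng(seed)
            self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
            self.W2 = rng.normal(0.0, 0.1, (n_out, n_hidden))

        def forward(self, x):
            # cache the hidden activations for the backward pass
            self.h = sigmoid(self.W1 @ x)
            # three sigmoid outputs: [P(first wins), P(draw), P(second wins)]
            return sigmoid(self.W2 @ self.h)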
Following the original learning algorithm, during the game we only reduce the difference between the predictions at consecutive plies, and only at the very end do we reduce the difference between the final prediction and the true outcome of the game. That last step is what points the weights in the right direction. However, after about 100 games I found that the output values look like [0.99, 0.05, 0.99]; in other words, to minimize the difference between consecutive plies, the net simply fixes its output and ignores the final outcome. I thought this was caused by insufficient training, but the output stays stuck at [1, 0, 1] even after thousands of games.
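For reference, this is roughly the update I run after each self-play game, building on the ValueNet sketch above (plain TD(0) with squared error; the learning rate, the one-hot outcome encoding, and the absence of eligibility traces are my assumptions, not necessarily what the tutorial or Tesauro prescribe):

    def train_on_game(net, positions, outcome, lr=0.05):
        """positions: encoded inputs (length-43 vectors) for one self-play game.
        outcome: one-hot target, e.g. [1, 0, 0] if the first player won."""
        outcome = np.asarray(outcome, dtype=float)
        for t in range(len(positions)):
            x = positions[t]
            if t + 1 < len(positions):
                # intermediate step: target is the next prediction V(s_{t+1})
                target = net.forward(positions[t + 1])
            else:
                # last step: target is the true result of the game
                target = outcome
            # re-run the forward pass on s_t so the cached hidden layer matches x
            y = net.forward(x)
            # backpropagate the squared error between prediction and target
            delta_out = (y - target) * y * (1.0 - y)
            delta_hid = (net.W2.T @ delta_out) * net.h * (1.0 - net.h)
            net.W2 -= lr * np.outer(delta_out, net.h)
            net.W1 -= lr * np.outer(delta_hid, x)

One design choice here: I treat the next prediction as a fixed target (no gradient flows through it), which I believe matches the TD formulation in the tutorial, but I may be wrong about that.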
I can think of several possible reasons:
- I am doing it in a totally wrong way
- I chose the wrong game to play (Connect Four)
- I need to tune parameters
- The tutorial is misleading
- I need to increase the penalty of the last step
Or, as Jordan B. Pollack & Alan D. Blair put it in the paper Why did TD-Gammon Work?:
> It (Tesauro's TD-Gammon) has not led to similar impressive breakthroughs in temporal difference learning for other applications or even other games.
Does anybody have experience with this topic? Thanks.