Hey all, I've been struggling to learn how to apply Q-learning to ANNs. I understand that this is mostly done with feed-forward MLPs trained by gradient-descent backpropagation. My problem is understanding the right way to use the Q-values I get to update the neural network.
Take for instance the mountain car problem: it has a continuous state space and 3 discrete actions.
Car_position = [-1.2, 0.6], Car_velocity = [-0.07, 0.07], Possible actions = [Rev, Neutral (do nothing), Fwd]. The car starts every episode at position -0.5 with velocity 0.0.
Now the idea is to create a neural network to replace the Q-table that I would normally have, right? Therefore a neural network with 2 inputs (real numbers for position and velocity), one hidden layer (5-25 nodes or so) and 3 output nodes corresponding to the actions seems like a good idea.
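For reference, this is roughly how I'm creating it (10 hidden nodes is just a value I picked from that range, and I'm not 100% sure I have the newff arguments right, so correct me if the syntax is off):

    % Input ranges: position in [-1.2, 0.6], velocity in [-0.07, 0.07]
    inputRanges = [-1.2 0.6; -0.07 0.07];
    % 10 sigmoid hidden nodes, 3 linear outputs (one Q-value per action)
    net = newff(inputRanges, [10 3], {'logsig', 'purelin'}, 'traingdm');
    net = init(net);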
Is this the right process now:
Run the network (feed forward the state -0.5, 0.0) to get 3 Q-values, one for each action. These are the Q-values for state s (the current state)
Choose an action a using epsilon-greedy: either pick the action with the highest Q-value or a random one
Simulate the Mountain Car one step and obtain a reward and new state s' from the executed action
Run the network with state s' to get 3 new Q-values for s'
Calculate QTarget = reward + gamma * (max Q-value for s')
The target pattern for the weight update is then either [0; 0; QTarget], [0; QTarget; 0] or [QTarget; 0; 0], since we don't know how good the Q-values of the actions we did not take are, and we only want to move the Q-value of s corresponding to the action taken
Set s = s' and repeat the process until the number of learning episodes has elapsed (rough code sketch of the whole loop below)
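In code, this is roughly what I mean (gamma, epsilon and numEpisodes are just example values, mountainCarStep is only a placeholder for my own simulator step, not a toolbox function, and setting epochs to 1 so train does a single update per step is my assumption):

    gamma = 0.99;
    epsilon = 0.1;
    numEpisodes = 1000;
    net.trainParam.epochs = 1;          % one gradient step per Q update

    for ep = 1:numEpisodes
        s = [-0.5; 0.0];                % start state: position, velocity
        done = false;
        while ~done
            Q = sim(net, s);            % 3 Q-values for current state s
            if rand < epsilon
                a = randi(3);           % explore: random action
            else
                [~, a] = max(Q);        % exploit: best action
            end
            [sPrime, reward, done] = mountainCarStep(s, a);   % placeholder
            Qprime = sim(net, sPrime);  % 3 Q-values for s'
            QTarget = reward + gamma * max(Qprime);
            target = zeros(3,1);        % other actions get 0 (the part I'm unsure about)
            target(a) = QTarget;
            net = train(net, s, target);
            s = sPrime;
        end
    end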
I'm using Matlab with the NN toolbox to create, init and update the weights. So I use newff with sigmoid in the hidden layer and linear in the output layer.
Is updating done with the net = train(net, s, Targets) function? The parameter s is a matrix like [-0.5; 0.0]. I selected traingdm as the training function.
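In case it matters, these are the traingdm settings I'm planning to use on that net (the lr and mc values are just examples I picked, not anything tuned):

    net.trainFcn = 'traingdm';
    net.trainParam.epochs = 1;   % one gradient step per Q update
    net.trainParam.lr = 0.05;    % learning rate (example value)
    net.trainParam.mc = 0.9;     % momentum (example value)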
Thanks