My professor gave me a copy of a journal paper to help me with a personal project requiring Reinforcement Learning, and I'm having trouble understanding a small part of one of the algorithms. While he is very knowledgeable about supervised learning, he has told me he has never worked with Reinforcement Learning, so the extent of his help would mostly be theoretical explanations. Unfortunately the journal in question is behind a paywall, so it's difficult to share, but I can point to an alternative (free) paper that deals with essentially the same equation.
(From now on, anything in {} is subscript.) The part I'm having trouble understanding is the term (P{t+1} - P{t}). How exactly is P{t+1} calculated? My assumption is that P{t} is the network's output for the current state (i.e. before the action is taken), and P{t+1} is the output for the next time step if the weights were to stay the same (i.e. the action is taken, the resulting state is fed as input to the network, and P{t+1} is the resulting output). Is my assumption correct?
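To make my assumption concrete, here is a rough sketch of what I think is happening. None of these names or functions are from the paper; predict() is just a stand-in linear approximator, and the update is a plain TD(0)-style step of the kind described in Sutton & Barto:

```python
import numpy as np

# Sketch of my assumption only, not the paper's actual algorithm.
# predict(w, s) stands in for whatever the network computes for state s
# with weights w; a trivial linear form is used just as a placeholder.
def predict(w, s):
    return np.dot(w, s)

def td_error(w, s_t, s_next):
    """Compute (P{t+1} - P{t}) using the SAME weights w for both predictions."""
    P_t = predict(w, s_t)        # output for the current state, before the action
    P_next = predict(w, s_next)  # output for the state that results from the action
    return P_next - P_t

def td_update(w, s_t, s_next, alpha=0.1):
    """Nudge the weights along the gradient of P{t}, scaled by the TD error."""
    delta = td_error(w, s_t, s_next)
    grad_P_t = s_t               # gradient of the linear predict() w.r.t. w
    return w + alpha * delta * grad_P_t
```

Is that roughly the right picture, or does computing P{t+1} involve something more than re-running the unchanged network on the next state?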
I've read both papers (the actual journal article I'm talking about is http://dl.acm.org/citation.cfm?id=2298811, and the formulas in question are (5) and (6), in case some of you have access to journals through academic institutions), and I've read the TD (and several other) sections in http://webdocs.cs.ualberta.ca/~sutton/book/ebook/the-book.html. However, it seems none of them actually explain how the prediction for {t+1} is calculated (I assume it's common knowledge in RL circles, hence the lack of explanation).
Any help would be greatly appreciated, and if this is the wrong sub, I apologize; I don't visit this sub often.