I'm working on a general planning system. It observes a sliding window of world states and its own past actions. I apply a stacked de-noising auto-encoder to the whole input to decompose it into a few variables, then add a final set of weights on top leading to a single node that codes for desirable/undesirable. I fine-tune from this node whenever information about the desirability of a situation is available (e.g. at the end of a game).
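Roughly, it looks like this (a throwaway sketch with made-up layer sizes and random weights just so it runs, not my actual code; the real encoder weights come from the usual layer-wise de-noising pretraining, and the top weights get fine-tuned whenever a desirability label shows up):

```python
import numpy as np

# Placeholder shapes: the last K world states + my own past actions,
# flattened into one input vector.
WINDOW_DIM = 200             # flattened (states + actions) window
HIDDEN_DIMS = [128, 64, 16]  # the "few variables" the SDA decomposes into

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class DesirabilityNet:
    """Stacked encoder (pretrained layer-wise as de-noising auto-encoders)
    plus a single desirability node on top."""

    def __init__(self, rng=np.random.default_rng(0)):
        dims = [WINDOW_DIM] + HIDDEN_DIMS
        # In the real system these come from DAE pretraining;
        # random init here just to make the sketch runnable.
        self.W = [rng.normal(0, 0.1, (d_in, d_out))
                  for d_in, d_out in zip(dims[:-1], dims[1:])]
        self.b = [np.zeros(d_out) for d_out in dims[1:]]
        # Final set of weights leading to the single desirable/undesirable node.
        self.w_out = rng.normal(0, 0.1, HIDDEN_DIMS[-1])
        self.b_out = 0.0

    def encode(self, window_vec):
        h = window_vec
        for W, b in zip(self.W, self.b):
            h = sigmoid(h @ W + b)   # stacked encoder layers
        return h

    def desirability(self, window_vec):
        """Scalar in (0, 1): estimated desirability of this window."""
        return sigmoid(self.encode(window_vec) @ self.w_out + self.b_out)
```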
To decide an action, I take the sliding window, fill in random choices of action, and use the SDA to estimate the desirability of each completed window. Then I pick the action with the highest estimate.
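The decision step, continuing the sketch above (the one-hot action encoding and sample count are placeholders):

```python
ACTION_DIM = 8  # one-hot action slot at the end of the window (placeholder)

def choose_action(net, partial_window, n_samples=64,
                  rng=np.random.default_rng(1)):
    """Fill the open action slot with random choices, score each completed
    window with the network, and return the highest-scoring action."""
    best_action, best_score = None, -np.inf
    for _ in range(n_samples):
        action = np.zeros(ACTION_DIM)
        action[rng.integers(ACTION_DIM)] = 1.0   # random one-hot action
        window_vec = np.concatenate([partial_window, action])
        score = net.desirability(window_vec)
        if score > best_score:
            best_action, best_score = action, score
    return best_action

# Usage: partial_window is the flattened window minus the action slot.
net = DesirabilityNet()
partial_window = np.zeros(WINDOW_DIM - ACTION_DIM)
print(choose_action(net, partial_window))
```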
I need a simple experiment to determine whether this system forms a plan, or is engaging in something like "planning". For example, in a situation where a short-term sacrifice leads to a greater long-term payoff, and the delay is shorter than the sliding window, I would expect the system to make that sacrifice.
What kind of task can I use here?
Any other criticism you have is also welcome. Thanks, ML.