I'm having trouble finding a good resource for this type of problem, so I thought I'd ask you guys. I'm already familiar with vanilla value-function learning, where the agent is told exactly what state it is in at every time step.
Here is the setup I'm interested in. At each time step, the agent receives a noisy, incomplete reading from its environment, along with a real-valued reward. It is up to the agent to construct an internal representation of the environment so that it can select actions that maximize the rate at which it receives reward. To build a useful model of the environment, the agent must be able to combine information it received many time steps in the past.
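For concreteness, here is a rough Python sketch of the kind of interaction loop I mean. The environment, the recurrent memory update, and the action rule are just placeholders I made up to illustrate the setup (hidden state, noisy observations, memory carried across steps); it doesn't do any learning, and it isn't any particular algorithm.

```python
import numpy as np

class NoisyEnv:
    """Toy environment: the true state is hidden; the agent only ever sees a
    noisy, partial observation of it, plus a scalar reward."""
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.state = 0  # hidden from the agent

    def reset(self):
        self.state = int(self.rng.integers(0, 4))
        return self._observe()

    def _observe(self):
        # Noisy, incomplete reading: a one-hot encoding of the hidden state,
        # corrupted by Gaussian noise.
        obs = np.zeros(4)
        obs[self.state] = 1.0
        return obs + self.rng.normal(scale=0.5, size=4)

    def step(self, action):
        reward = 1.0 if action == self.state else 0.0
        self.state = (self.state + 1) % 4  # hidden dynamics
        return self._observe(), reward


class MemoryAgent:
    """Agent that folds each observation into a recurrent internal memory
    and acts on that memory rather than on the raw observation."""
    def __init__(self, obs_dim=4, n_actions=4, mem_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W_obs = rng.normal(scale=0.1, size=(mem_dim, obs_dim))
        self.W_mem = rng.normal(scale=0.1, size=(mem_dim, mem_dim))
        self.W_act = rng.normal(scale=0.1, size=(n_actions, mem_dim))
        self.memory = np.zeros(mem_dim)

    def update_memory(self, obs):
        # New memory depends on the old memory and the current observation,
        # so information from many steps ago can persist.
        self.memory = np.tanh(self.W_mem @ self.memory + self.W_obs @ obs)

    def act(self):
        return int(np.argmax(self.W_act @ self.memory))


env, agent = NoisyEnv(), MemoryAgent()
obs, total = env.reset(), 0.0
for _ in range(100):
    agent.update_memory(obs)
    obs, reward = env.step(agent.act())
    total += reward
print("reward over 100 steps:", total)
```

The point is just the structure: the agent never sees the hidden state, only the noisy observation stream, and its action choice has to go through whatever internal memory it maintains.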
This seems like a problem that would come up in most real-life reinforcement learning settings (not all the information relevant to your decisions is a current percept; that's why you have explicit memories), so I'm sure there is plenty of research on it; I just don't know the terminology.