I'm trying to understand how to model a learner that can recognize objects and their pose (this is an idea for a research problem). The outputs I'd want would be things like 'a cup on its side' (tilted 90 degrees), 'a cup upright' (normal position), 'a cup inverted' (tilted 180 degrees), 'a knife lying flat' (on a table), or 'a knife horizontal' (while being used by someone). Output like 'a knife - 180 degrees' would also be fine; I'm just using natural language to get the point across.
I think the subtlety here is that the algorithm would learn by looking at objects in their natural poses. How can you guide a learner (maybe a deep net using sparse autoencoders as an unsupervised learning method?) to learn pose as well as the objects themselves?
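For concreteness, here's a minimal sketch of the kind of sparse autoencoder I have in mind (PyTorch; the image size, hidden width, and sparsity weight are all placeholder assumptions, not a worked-out design). It only learns a sparse feature code by reconstruction; the open question in my post is how to get pose out of a code like this, e.g. by comparing codes of the same object seen at different orientations:

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Single-layer sparse autoencoder: reconstruct the input while an
    L1 penalty keeps most hidden units inactive for any given image."""
    def __init__(self, input_dim=64 * 64, hidden_dim=256):
        super().__init__()
        self.encoder = nn.Linear(input_dim, hidden_dim)
        self.decoder = nn.Linear(hidden_dim, input_dim)

    def forward(self, x):
        code = torch.relu(self.encoder(x))      # sparse feature code
        recon = torch.sigmoid(self.decoder(code))
        return recon, code

def train_step(model, optimizer, batch, sparsity_weight=1e-3):
    recon, code = model(batch)
    recon_loss = nn.functional.mse_loss(recon, batch)
    sparsity_loss = code.abs().mean()           # L1 sparsity penalty
    loss = recon_loss + sparsity_weight * sparsity_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage on random "images"; real training would use crops of
# objects photographed at many orientations.
model = SparseAutoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.rand(32, 64 * 64)
print(train_step(model, optimizer, batch))
```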