I'd like to take a dataset and answer the question: "What's the least amount of data I need to reproduce the whole dataset with, say, 95% accuracy?" Then I'd like to know how, and how much, each variable contributes to that reconstruction. This seems a little like what I think eye movements/visual attention are doing, and I assume it could be useful for data reduction and/or for dealing with missing values. Does anyone know of such a technique?
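The post doesn't name a technique, but one common candidate is principal component analysis (PCA). Below is a minimal sketch, assuming "95% accuracy" means "the reconstruction keeps at least 95% of the total variance" and that the data is a purely numeric matrix with no missing values. It keeps the fewest components that meet the target and reports a rough per-variable contribution score. The function name `pca_reduce` and the contribution measure (squared loadings weighted by component variance) are illustrative choices, not a standard API.

```python
import numpy as np


def pca_reduce(X, target=0.95):
    """Keep the fewest principal components that retain `target` of the variance.

    Assumes X is a numeric (n_samples, n_variables) array with no missing values
    and nonzero total variance. Returns (k, X_hat, retained, contributions).
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    Xc = X - mean                                   # center each variable (column)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2                                    # variance per component (up to a 1/(n-1) factor)
    cum = np.cumsum(var) / var.sum()                # cumulative fraction of total variance
    # first k whose cumulative fraction reaches the target (clipped for rounding)
    k = min(int(np.searchsorted(cum, target)) + 1, len(s))
    # rank-k reconstruction: mean + U_k diag(s_k) V_k^T
    X_hat = mean + (U[:, :k] * s[:k]) @ Vt[:k]
    # hypothetical contribution score: squared loadings weighted by each
    # kept component's variance, normalized so the scores sum to 1
    weights = (Vt[:k] ** 2 * var[:k, None]).sum(axis=0)
    contributions = weights / weights.sum()
    return k, X_hat, float(cum[k - 1]), contributions


# usage on synthetic data: 5 variables driven by 2 hidden factors plus a little noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))
k, X_hat, retained, contrib = pca_reduce(X)
print(k, round(retained, 3), contrib.round(2))
```

Here the "least amount of data" is `k` component scores per row plus `k` loading vectors, instead of all the original columns. PCA is linear, so if the variables are related nonlinearly, an autoencoder or another nonlinear method may need fewer dimensions for the same accuracy.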