So basically it'd be like an autoencoder, except the desired output is not the input itself, but the input mapped onto another distribution.
So for problems with non-stationary data sources, where the distribution may change between applications, we try to map the new data onto the old data so that the old classifier still works fairly well without retraining the entire model.
Basically, we add a layer to the network that maps new data as close as possible to the old data the rest of the network was trained on. That way we only have to train that one layer, rather than retrain the whole model, whenever we get new data.
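A minimal sketch of the idea, using a single linear (affine) adaptation layer fit by least squares instead of a trained neural layer, and assuming you have paired samples from the old and new distributions (all the data, the classifier, and the shift here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: "old" data the frozen classifier was trained on,
# and "new" data from a drifted source (here an exact affine shift).
X_old = rng.normal(0.0, 1.0, size=(200, 3))
X_new = X_old * 2.0 + 0.5

# Frozen "classifier": a fixed linear rule from the old distribution.
w = np.array([1.0, -1.0, 0.5])
def classify(X):
    return (X @ w > 0).astype(int)

# Adaptation layer: one affine map fit to pull new data back onto old.
# Solve min_W ||[X_new, 1] W - X_old||^2 -- only this map is "trained",
# the classifier above is left untouched.
X_aug = np.hstack([X_new, np.ones((len(X_new), 1))])
W, *_ = np.linalg.lstsq(X_aug, X_old, rcond=None)
X_mapped = X_aug @ W

# The frozen classifier now sees data that looks like its training data.
agreement = (classify(X_mapped) == classify(X_old)).mean()
```

Here the drift is exactly affine, so the mapped data matches the old data and `agreement` comes out at 1.0; with a messier real-world shift you'd swap the least-squares fit for a trainable layer (and a distribution-matching loss if you don't have paired samples), but the structure is the same: train the front layer, freeze the rest.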
Does this even make sense?