I am trying to train a deep neural network whose hidden layers are shared between multiple tasks. The network is generatively pre-trained with RBMs, using data from all the tasks.
Though I do get reasonable error rates on my validation set, the model overfits quickly, and this overfitting behaviour is consistent across all of the tasks.
I tried a single blanket L2 regularization applied to the softmax layer only, shared across all the tasks. Now the validation errors behave differently for each task: they do not necessarily overfit anymore, but on one task the classification performance is comparatively poor. Is there any literature that discusses L2 regularization for multi-task learning with neural networks?
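For concreteness, here is a minimal sketch of the kind of setup I mean: a shared trunk with one softmax head per task, where L2 (weight decay) is applied only to the heads, and with a separate strength per task rather than one blanket value. This is just an illustration in PyTorch, not my actual code; the layer sizes, task count, and lambda values are placeholders.

```python
import torch
import torch.nn as nn

class SharedMultiTaskNet(nn.Module):
    def __init__(self, in_dim=784, hidden=512, n_classes_per_task=(10, 10, 5)):
        super().__init__()
        # Shared hidden layers (in my setup these would be initialised
        # from the RBM generative pre-training).
        self.trunk = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Sigmoid(),
            nn.Linear(hidden, hidden), nn.Sigmoid(),
        )
        # One linear + softmax head per task.
        self.heads = nn.ModuleList(
            nn.Linear(hidden, c) for c in n_classes_per_task
        )

    def forward(self, x, task_id):
        return self.heads[task_id](self.trunk(x))

model = SharedMultiTaskNet()

# L2 only on the softmax heads, with a (hypothetical) per-task strength;
# the shared trunk gets no weight decay at all.
head_lambdas = [1e-4, 1e-4, 1e-3]  # placeholder values, tuned per task
param_groups = [{"params": model.trunk.parameters(), "weight_decay": 0.0}]
param_groups += [
    {"params": head.parameters(), "weight_decay": lam}
    for head, lam in zip(model.heads, head_lambdas)
]
optimizer = torch.optim.SGD(param_groups, lr=0.1)
```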