I know that for image recognition and other visual learning tasks, dropout (combined with a max-norm constraint on each unit's incoming weights) is a very effective form of regularization, with the extra benefit of being simple to implement and understand.
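For concreteness, here's a rough sketch of the setup I mean, following Srivastava et al. (2014): dropout during training, plus rescaling each unit's incoming weight vector after every optimizer step so its norm stays below a cap. This assumes PyTorch; the layer sizes and the constraint value are placeholders, not recommendations.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(), nn.Dropout(p=0.5),
    nn.Linear(128, 1),
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
MAX_NORM = 3.0  # illustrative cap on each unit's incoming weight norm

def apply_max_norm(model, c):
    # Each row of a Linear weight matrix is one unit's incoming weight
    # vector; renorm rescales any row whose L2 norm exceeds c back to c.
    with torch.no_grad():
        for module in model.modules():
            if isinstance(module, nn.Linear):
                module.weight.copy_(
                    torch.renorm(module.weight, p=2, dim=0, maxnorm=c)
                )

# One training step with dummy data; the constraint is applied after the step.
x, y = torch.randn(32, 64), torch.randn(32, 1)
loss = nn.functional.mse_loss(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
apply_max_norm(model, MAX_NORM)
```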
I am interested in economic time series, where the data are much noisier. Is dropout still the most competitive form of regularization there, or do L1/L2 penalties perform better? Or do even more complicated higher-order methods, which penalize curvature to induce smoothness in the function learned by the network, win out?
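To make the alternatives concrete, here's a sketch (again assuming PyTorch; the penalty coefficients are made-up placeholders) of explicit L1/L2 weight penalties, plus a penalty on the input gradient ("double backprop", Drucker & LeCun, 1992) as one smoothness-inducing regularizer; a true curvature penalty would target second derivatives at extra cost.

```python
import torch
import torch.nn.functional as F

def regularized_loss(model, x, y, l1=1e-5, l2=1e-4, smooth=1e-3):
    # Coefficients are illustrative, not tuned values.
    x = x.requires_grad_(True)
    pred = model(x)
    loss = F.mse_loss(pred, y)
    # L1/L2 penalties on the weights.
    for p in model.parameters():
        loss = loss + l1 * p.abs().sum() + l2 * p.pow(2).sum()
    # Smoothness: penalize the norm of df/dx; create_graph=True lets the
    # penalty itself be differentiated during backprop.
    grad_x, = torch.autograd.grad(pred.sum(), x, create_graph=True)
    return loss + smooth * grad_x.pow(2).mean()
```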