I've been building my own NN library, simply because I find that's the easiest way to learn about things. My approach has been to add every feature under the sun (dropout, momentum, adaptive learning rates, regularization, GPU, etc.). I've been using/testing it primarily on a Kaggle competition with a relatively small amount of labelled data (~15k examples).
So far things work quite well for standard training but I've never successfully trained a network using dropout. With the latter I either get:
1 - Stuck at around 60-70% training/validation error.
2 - Numerical instabilities (NaNs)
3 - Stuck in a bias-dominated regime.
By (3) I mean the network always selects one particular output (from a softmax layer) for all validation cases. The actual choice of output changes each epoch, but the network somehow gets stuck in a state where the biases dominate.
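To make (3) concrete, here is a rough NumPy sketch (hypothetical names, not code from my library) of the kind of check that flags it: the fraction of validation cases assigned to the single most popular class sits near 1.0 whenever the output has collapsed.

```
import numpy as np

def collapse_fraction(probs):
    """probs: (n_examples, n_classes) array of softmax outputs."""
    preds = probs.argmax(axis=1)
    counts = np.bincount(preds, minlength=probs.shape[1])
    # fraction of validation cases assigned to the most popular class;
    # values near 1.0 mean the output is effectively constant
    return counts.max() / float(len(preds))
```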
I've tried a whole bunch of things:
- Increasing the learning rate: leads to either (2) or (3). This happens already at smallish rates like 0.5.
- ReLU or channelout layers: the problem happens with both.
- Momentum: usually makes things worse
- Fan-in regularizer: makes some weights vanishingly small (~1e-100) and tends to lead to bias domination (but does avoid the numerical instabilities); see the max-norm sketch after this list.
- L2 regularizer: doesn't do much.
- Adaptive per-weight learning rates: tend to lead to more numerical instabilities. This really speeds up normal learning, but with dropout it seems to exacerbate (2) and (3).
- Random dropout rate: drawing it between e.g. 0.3 and 0.5 still gives all the same problems.
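For reference, the standard max-norm version of a fan-in constraint (the one used in the dropout paper) only rescales an incoming weight vector when its norm exceeds a cap c, rather than pushing everything towards zero. A rough NumPy sketch of that idea, assuming an (n_in, n_out) weight layout (names and layout are just for illustration):

```
import numpy as np

def max_norm_constraint(W, c=3.0):
    """Rescale each unit's incoming weight vector to have norm <= c.

    W is assumed to be (n_in, n_out), i.e. column j holds the weights
    feeding into hidden unit j. Applied after every weight update.
    """
    norms = np.sqrt((W ** 2).sum(axis=0, keepdims=True))
    scale = np.minimum(1.0, c / (norms + 1e-8))  # only shrink, never grow
    return W * scale
```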
I realize that the best thing to do would probably be to retreat to a more standard dataset like MNIST (which I will probably do), but the thing is that without dropout I can achieve 80% accuracy on the validation set (and 100% on the training set), so it's clear that there's enough data to do better.
To give more details: I'm using a relatively large network (4-5 layers of size 200, with inputs of size ~50 and outputs of size ~10), with either ReLU or channelout units and without any pretraining.
I refresh the choice of dropped units every minibatch (size 10-150), and I also apply 0.2 dropout to the input layer.
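For concreteness, here's a rough sketch (NumPy, illustrative names only, not my actual code) of the inverted-dropout forward pass I'm trying to match: the mask is redrawn for every minibatch, and the kept activations are scaled by 1/(1-p) so that nothing has to be rescaled at test time.

```
import numpy as np

def dropout_forward(acts, p_drop, rng, train=True):
    """acts: (batch, n_units) activations; p_drop: probability of dropping a unit."""
    if not train or p_drop <= 0.0:
        return acts
    keep = 1.0 - p_drop
    # fresh mask per minibatch; scaling keeps the expected activation unchanged
    mask = (rng.uniform(size=acts.shape) < keep) / keep
    return acts * mask

# e.g. p_drop = 0.2 on the inputs, 0.5 on the hidden layers
```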
I do intend to go back to MNIST and compare to some published results but if anyone can provide some thoughts or inspirations I would really appreciate it!