Backpropagation - how much training data do I need?

Hello,

For the last few weeks I've been working on a backprop network and posting a few questions to this forum - thanks for all the help so far. I've gone from concept, to buggy implementation, to something that works.

As a quick recap: my network takes input/feature vectors of length 43, has 25 nodes in the hidden layer (an arbitrary choice I can change), and has a single output node. I want to train it to take the 43 features and output a single value between 0 and 100.
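For concreteness, the whole forward pass boils down to something like this (a simplified numpy sketch of that topology; sigmoid activations and small random initial weights are assumed here for illustration):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    W1 = rng.normal(0.0, 0.1, size=(25, 43))  # input -> hidden weights
    b1 = np.zeros(25)
    W2 = rng.normal(0.0, 0.1, size=(1, 25))   # hidden -> output weights
    b2 = np.zeros(1)

    def forward(x):
        """Forward pass for one length-43 feature vector."""
        h = sigmoid(W1 @ x + b1)     # 25 hidden activations
        return sigmoid(W2 @ h + b2)  # single output in (0, 1)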

Unfortunately, I currently only have a very small pool of training data - 162 feature vectors with corresponding scores out of 100 (I have to label these manually, lol! Obviously working on creating more data). So I take this limited training set, and here's a snapshot of how well my network adapts to it (all values are scaled down by 100, so 0.9 really means 90):

Output value: 0.90406 | Test value: 0.9
Output value: 0.21558 | Test value: 0.2
Output value: 0.60394 | Test value: 0.6
Output value: 0.79604 | Test value: 0.8
Output value: 0.99846 | Test value: 0.85
Output value: 0.23444 | Test value: 0.2
Output value: 0.19609 | Test value: 0.2
Output value: 0.88889 | Test value: 0.9
Output value: 0.19178 | Test value: 0.2
Output value: 0.20549 | Test value: 0.2
Output value: 0.63248 | Test value: 0.64
Output value: 0.74367 | Test value: 0.74
Output value: 0.15477 | Test value: 0.17
Output value: 0.17084 | Test value: 0.18
Output value: 0.21143 | Test value: 0.19
Output value: 0.16179 | Test value: 0.17
Output value: 0.081413 | Test value: 0.18
Output value: 0.18287 | Test value: 0.19
Output value: 0.19118 | Test value: 0.17
Output value: 0.20018 | Test value: 0.18
Output value: 0.19222 | Test value: 0.19
Output value: 0.20719 | Test value: 0.2
Output value: 0.18718 | Test value: 0.2
Output value: 0.18064 | Test value: 0.2
Output value: 0.20925 | Test value: 0.2
Output value: 0.20731 | Test value: 0.2
Output value: 0.19914 | Test value: 0.2
Output value: 0.6033 | Test value: 0.6
Output value: 0.63723 | Test value: 0.64
Output value: 0.77831 | Test value: 0.78
Output value: 0.23468 | Test value: 0.2
Output value: 0.87713 | Test value: 0.9
Output value: 0.23822 | Test value: 0.2
Output value: 0.18954 | Test value: 0.15
Output value: 0.19912 | Test value: 0.2

At first I'm like, "wow this is sick!" The results are much, much better than when I originally tried gradient descent on its own. Like, this is too good to be true. Hmm, maybe it is. So I decide to try something - use the same test/target values, but create 162 completely random feature vectors.
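Concretely, the swap was just something like this (a sketch - the targets stay exactly the same 162 labels as before, only the inputs change):

    import numpy as np

    rng = np.random.default_rng()
    # 162 completely random length-43 feature vectors
    X_random = rng.uniform(0.0, 1.0, size=(162, 43))
    # ...then train the same 43-25-1 network on X_random with the original targets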

Uh oh - my network was able to fit the random training data even better than my actual training data! In fact, it fit the random data perfectly. Shit:

Output value: 0.92 | Test value: 0.92
Output value: 0.2 | Test value: 0.2
Output value: 0.2 | Test value: 0.2
Output value: 0.2 | Test value: 0.2
Output value: 0.2 | Test value: 0.2
Output value: 0.2 | Test value: 0.2
Output value: 0.2 | Test value: 0.2
Output value: 0.2 | Test value: 0.2
Output value: 0.2 | Test value: 0.2
Output value: 0.2 | Test value: 0.2
Output value: 0.62 | Test value: 0.62
Output value: 0.7 | Test value: 0.7
Output value: 0.77 | Test value: 0.77

Now I'm thinking one of two possibilities:

1) Because I have so few training samples (only 162), my 3-layer 43->25->1 network has more than enough weights to over-fit the data (see the quick parameter count after this list).

2) My original feature vectors are absolutely worthless - no better than inputting plain garbage. I hand-coded these features based on what my research suggested would be appropriate for my problem domain.
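For reference, here's the quick parameter count behind possibility 1 (assuming one bias per node):

    n_hidden_params = 43 * 25 + 25  # weights + biases into the hidden layer = 1100
    n_output_params = 25 * 1 + 1    # weights + bias into the output node    = 26
    n_total = n_hidden_params + n_output_params  # 1126 parameters vs. 162 samples

So the network has roughly 7x more free parameters than I have training samples, which would make memorizing the training set pretty easy.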

What do you guys think is going on, and will I only know once I have more training data? Given the topology of my network, any idea how much data I'll actually need?
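One sanity check I figure I can run even before collecting more data: hold out part of the 162 samples, train only on the rest, and compare errors (a sketch - X and y are hypothetical names for my feature matrix and scaled targets):

    import numpy as np

    rng = np.random.default_rng(0)
    idx = rng.permutation(162)
    train_idx, val_idx = idx[:130], idx[130:]  # rough 80/20 split

    # X: (162, 43) feature matrix, y: 162 targets scaled into [0, 1].
    # Train on X[train_idx], y[train_idx], then compare errors:
    #   low train error but high validation error -> over-fitting (possibility 1)
    #   high error on both                        -> useless features (possibility 2)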

Cheers.

submitted by bvcxxcvb