Channel: Machine Learning

Data preparation for classification using Neural Networks

I am trying to solve the Kaggle digit recognizer problem using a neural network.

I am planning to use a very simple 3-layer neural network (1 input, 1 hidden and 1 output layer). The input layer will have 784 neurons (one for each pixel) and the output layer will have 10 neurons (one for each digit).
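To get a feel for the size of such a network, here is a minimal sketch that counts the weights and biases of a fully connected 784-hidden-10 network. The hidden layer size is not stated above, so `N_HIDDEN = 25` is purely an assumption for illustration, and `count_parameters` is a hypothetical helper name.

```python
def count_parameters(n_inputs, n_hidden, n_outputs):
    """Count weights and biases in a fully connected input-hidden-output network."""
    # Each hidden neuron has one weight per input plus one bias.
    hidden_params = n_inputs * n_hidden + n_hidden
    # Each output neuron has one weight per hidden neuron plus one bias.
    output_params = n_hidden * n_outputs + n_outputs
    return hidden_params + output_params


N_INPUTS = 784   # one neuron per pixel, as described in the post
N_OUTPUTS = 10   # one neuron per digit, as described in the post
N_HIDDEN = 25    # ASSUMPTION: the post does not give a hidden layer size

total = count_parameters(N_INPUTS, N_HIDDEN, N_OUTPUTS)
print(total)
```

Changing `N_HIDDEN` shows how quickly the parameter count grows relative to the number of training examples available.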

The training dataset has 42,000 labeled digits. I am thinking of randomly splitting it into 3 datasets:

  • training dataset (60%)
  • cross-validation set (20%)
  • test dataset (20%)
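The purely random 60/20/20 split listed above could be sketched as follows. This works on row indices only, so it makes no assumption about how the data is loaded; `random_split` and the fixed `seed` are hypothetical choices for illustration.

```python
import random


def random_split(n_samples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle row indices and cut them into train / validation / test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible

    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = random_split(42000)
print(len(train_idx), len(val_idx), len(test_idx))
```

The three index lists can then be used to select rows from the pixel matrix and the label vector.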

My question is: how important is the label distribution in this case? Can I just randomly split the data into (60%, 20%, 20%), or do I have to make sure that every label (10 digits in this case) is evenly distributed across the splits?
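For comparison, a split that keeps each digit's share roughly equal across the three sets (often called a stratified split) might look like the sketch below, together with a small check of the label shares per split. The labels here are randomly generated placeholders, not the Kaggle data, and `stratified_split` and `label_shares` are hypothetical helper names.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Split indices so each label keeps roughly the same share in every split."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    train, val, test = [], [], []
    for label in sorted(by_label):
        idxs = by_label[label]
        rng.shuffle(idxs)
        n_train = round(fractions[0] * len(idxs))
        n_val = round(fractions[1] * len(idxs))
        train += idxs[:n_train]
        val += idxs[n_train:n_train + n_val]
        test += idxs[n_train + n_val:]  # remainder of this label goes to test
    return train, val, test


def label_shares(labels, indices):
    """Return the fraction of each label among the given indices."""
    counts = Counter(labels[i] for i in indices)
    return {label: counts[label] / len(indices) for label in sorted(counts)}


# Placeholder labels (NOT the Kaggle data): 42,000 random digits 0-9.
label_rng = random.Random(1)
labels = [label_rng.randrange(10) for _ in range(42000)]

train_idx, val_idx, test_idx = stratified_split(labels)
for name, idx in (("train", train_idx), ("validation", val_idx), ("test", test_idx)):
    print(name, len(idx), label_shares(labels, idx))
```

Running `label_shares` on the output of a plain random split as well gives a direct way to see how far the two approaches actually differ on a given dataset. scikit-learn's `train_test_split` also accepts a `stratify` argument for the same purpose.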

Thanks.

submitted by sudarmuthu
