Channel: Machine Learning

Help Me Understand K-Fold Validation

So we have (let's say) 1 million rows of data to fit a model to. We're going to fit a Lasso regression model, so we need to set the regularization parameter, lambda. We do K-fold cross-validation with... 10(?) folds to choose lambda. Then we use the entire(?) dataset to fit the coefficients.

But then we want to report an MSE for this model. Should we have set aside 10% of the data at the start and kept it just for reporting? What's the name for that part of the dataset?
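To make the setup concrete, here's a rough sketch of the workflow I have in mind, assuming scikit-learn. The synthetic data, the 2,000-row size (instead of 1 million), and the variable names are all my own placeholders. Note that scikit-learn calls the Lasso penalty `alpha` rather than lambda.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption: 2,000 rows, 20 features).
X, y = make_regression(n_samples=2000, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Set aside 10% up front; it is never touched while choosing lambda.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.10, random_state=0)

# 10-fold CV over an automatically generated grid of penalties,
# run only on the remaining 90%.
model = LassoCV(cv=10, random_state=0).fit(X_train, y_train)

# After picking the penalty, LassoCV refits once on all of X_train,
# so coef_ comes from a single fit, not an average over the folds.
best_lambda = model.alpha_
coefs = model.coef_

# The MSE to report is computed on the untouched 10%.
test_mse = mean_squared_error(y_test, model.predict(X_test))
print(f"lambda={best_lambda:.4f}  held-out MSE={test_mse:.2f}")
```

In this sketch the folds are only used to pick lambda; whether that's the right way to do it is exactly what I'm asking.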

Or should I just stick to plain hold-out validation with a single 60/30/10 split into training/validation/reporting sets?
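For comparison, a rough sketch of the 60/30/10 version, again assuming scikit-learn and synthetic placeholder data. The grid of candidate penalties is hypothetical, picked only for illustration.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption: 2,000 rows, 20 features).
X, y = make_regression(n_samples=2000, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

# Peel off the 10% reporting set first, then split the rest 60/30.
X_rest, X_report, y_rest, y_report = train_test_split(
    X, y, test_size=0.10, random_state=0)
# 30% of the total is one third of the remaining 90%.
X_fit, X_val, y_fit, y_val = train_test_split(
    X_rest, y_rest, test_size=1 / 3, random_state=0)

# Hypothetical grid of penalties (scikit-learn's `alpha` is lambda).
candidate_lambdas = [0.01, 0.1, 1.0, 10.0]

# Fit on the 60%, score each candidate on the 30%.
val_mse = {}
for lam in candidate_lambdas:
    fitted = Lasso(alpha=lam).fit(X_fit, y_fit)
    val_mse[lam] = mean_squared_error(y_val, fitted.predict(X_val))

# Keep the penalty with the lowest validation MSE.
best_lambda = min(val_mse, key=val_mse.get)

# Report MSE on the 10% that played no part in choosing lambda.
final = Lasso(alpha=best_lambda).fit(X_fit, y_fit)
report_mse = mean_squared_error(y_report, final.predict(X_report))
print(f"lambda={best_lambda}  reporting MSE={report_mse:.2f}")
```

Here each candidate lambda is scored on one fixed validation set rather than averaged over k folds.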

TL;DR: 4 questions

1) How do you choose the number of folds (k)?

2) Do you fit the non-regularization parameters (the coefficients) on the full data, or do you average the coefficients from your k folds?

3) How do you report the MSE for your entire model? What do we call the data set aside for that part?

4) Should I just stick with a single 60/30/10 split instead of K-fold?

submitted by towerofterror
