Ooooook. This is embarrassingly trivial for most of you. I, however, need help parsing what's happening below:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=33)
This seems to pop up frequently in the scikit-learn documentation. Why would you do this? What I think is happening is that the train_test_split method is creating two training sets and two test sets. Is that correct? I'm familiar with the notion of having a training set, validation set and test set. But in this case it seems we're building two sets of training & validation sets. Why would you not just do bootstrapping or a k-fold run?
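To make the question concrete, here is a minimal sketch that runs the same call and prints the shapes of the four arrays it returns. The toy X and y are made up for illustration (20 samples, 3 features, with y set to the row index so rows can be traced), and the import path assumes a reasonably recent scikit-learn where train_test_split lives in sklearn.model_selection.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, assumed purely for illustration: 20 samples, 3 features.
# y is just the row index, so each label can be traced back to its row in X.
X = np.arange(60).reshape(20, 3)
y = np.arange(20)

# Same call as in the question.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=33
)

# Row counts of the feature arrays and the label arrays.
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

# Check whether each label still sits next to the row it came from.
print((X_train[:, 0] == 3 * y_train).all())
print((X_test[:, 0] == 3 * y_test).all())
```

On this toy data, X_train and y_train come back with the same number of rows, as do X_test and y_test, and the labels stay lined up with their rows. That looks like one training set and one test set, each returned as a features array plus a labels array, rather than two separate pairs of sets.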