I'm building a Random Forest classifier in R that makes predictions from a number of unordered categorical (factor) variables. The training set doesn't have any missing values (NAs).
Unfortunately, the real-time data the model sees once it is deployed in production will contain NAs from time to time. When testing in R, the model doesn't make a prediction if a row is passed with missing values in certain columns (it returns <NA> instead of a class).
My question is: how should I handle the missing values in production? Short of any sort of imputation, would it make sense to model them explicitly, i.e. to declare NA as an extra category for each variable?
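To make the extra-category idea concrete, here is a minimal sketch of what I have in mind. The helper `na_to_level`, the label "Missing", and the toy columns `colour`, `shape` and `y` are all made up for illustration; the recoding itself is base R, and the fit at the end only runs if the randomForest package is installed. One thing I'm unsure about: since my training set has no NAs, the "Missing" level is empty during training, so the forest returns a prediction for it but has learned nothing about it.

```r
# Hypothetical helper: recode NA as an explicit factor level.
# If levels_ref is supplied (e.g. the training levels), it is reused so that
# production data carries exactly the same levels as the training data.
na_to_level <- function(x, levels_ref = NULL, na_label = "Missing") {
  x <- as.character(x)
  x[is.na(x)] <- na_label
  if (is.null(levels_ref)) {
    levels_ref <- union(sort(unique(x)), na_label)
  }
  factor(x, levels = levels_ref)
}

# Toy training data with no NAs (illustrative only).
train <- data.frame(
  colour = factor(c("red", "blue", "green", "red", "blue", "green")),
  shape  = factor(c("sq", "ci", "sq", "ci", "sq", "ci")),
  y      = factor(c("a", "b", "a", "b", "a", "b"))
)
predictors <- c("colour", "shape")

# Add the extra level to every predictor and remember the training levels.
train_x <- train[predictors]
train_x[] <- lapply(train_x, na_to_level)
train_levels <- lapply(train_x, levels)

# A production row with a missing value, recoded with the training levels.
new_row <- data.frame(colour = NA, shape = "ci", stringsAsFactors = FALSE)
new_x <- new_row[predictors]
new_x[] <- Map(na_to_level, new_x, train_levels[predictors])

# Optional: fit and predict, only if randomForest is available.
if (requireNamespace("randomForest", quietly = TRUE)) {
  fit <- randomForest::randomForest(x = train_x, y = train$y, ntree = 50)
  # A class comes back instead of <NA>, but no training row ever had
  # colour == "Missing", so how that level is routed is not learned.
  print(predict(fit, newdata = new_x))
}
```

If this is a sensible direction, I assume the level would only become informative if some training rows actually carried it, for example by masking a fraction of the training values, but that is a guess on my part.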
EDIT: Spelling.