Handling missing values in a real-time production ML model?

I'm building a Random Forest classifier in R that makes predictions based on a number of unordered categorical (factor) variables. The training set doesn't have any missing values (NAs).

Unfortunately, once the model is deployed in production, the real-time data will contain NAs from time to time. When testing in R, the model doesn't make a prediction if a row is passed with missing values in certain columns (it returns <NA> instead).

My question is: how would I go about handling the missing values in production? Setting imputation aside, would modelling them explicitly, i.e. declaring them as an extra category for each variable, make any sort of sense?
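
To make the "extra category" idea concrete, here is a minimal sketch of the preprocessing step it implies. It is written in plain Python as a language-agnostic illustration, not R and not any Random Forest library's API; the label `(Missing)`, the function `add_missing_level`, and the example columns are all hypothetical. One caveat under this assumption: since the training set has no NAs, the new level would presumably also have to appear in the training data (for example by masking some values on purpose), otherwise the forest never sees it during fitting.

```python
# Hypothetical label used as the extra category for missing values.
MISSING = "(Missing)"


def add_missing_level(rows, columns):
    """Return copies of the rows with None in the given columns replaced by MISSING.

    The input rows are left unmodified.
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the caller's data is untouched
        for col in columns:
            if new_row.get(col) is None:
                new_row[col] = MISSING
        cleaned.append(new_row)
    return cleaned


# Hypothetical production rows with two categorical columns.
production_rows = [
    {"color": "red", "shape": None},
    {"color": None, "shape": "square"},
    {"color": "blue", "shape": "round"},
]

cleaned_rows = add_missing_level(production_rows, ["color", "shape"])
for row in cleaned_rows:
    print(row)
```

The same function would be applied to both the training data and each incoming production row, so that the model is fitted and queried on an identical set of category levels.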

EDIT: Spelling.

submitted by srkiboy83
