This may be a really simple question, but it has me pretty stumped.
I want to fit a Classification Tree (or maybe an SVM) on some data to predict income, but my worry is that some of the new variables I'm using (e.g. Height) may be redundant with variables that have already been established (e.g. Gender). If so, the new variables wouldn't be meaningful/interesting, because the only reason they predict income is a variable we already know is important and probably more effective.
What would be the best way to see if the new variables are giving extra predictive power above and beyond the old variables?
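To make this concrete, here is the kind of comparison I had in mind: fit the model once with only the established variables, once with the new variable added, and compare cross-validated scores. This is just a sketch with made-up synthetic data (the names `gender`/`height`/`income` are placeholders), using scikit-learn:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic data: income driven by gender; height is correlated with
# gender but carries no extra information about income.
rng = np.random.default_rng(0)
n = 1000
gender = rng.integers(0, 2, n)                    # established predictor
height = 170 + 8 * gender + rng.normal(0, 6, n)   # redundant new variable
income = 30000 + 5000 * gender + rng.normal(0, 3000, n)

X_old = gender.reshape(-1, 1)                     # old variables only
X_new = np.column_stack([gender, height])         # old + new variable

model = RandomForestRegressor(n_estimators=100, random_state=0)
score_old = cross_val_score(model, X_old, income, cv=5, scoring="r2").mean()
score_new = cross_val_score(model, X_new, income, cv=5, scoring="r2").mean()

print(f"CV R^2, old variables only: {score_old:.3f}")
print(f"CV R^2, with height added:  {score_new:.3f}")
```

If the score barely moves when the new variable is added, that would suggest it offers little predictive power beyond the old ones — is that a sound way to check, or is there a more principled test?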
Relatedly, is there any machine learning technique that fits predictors while controlling for another variable? For example, I want to predict Income, but in such a way that a person's gender (a known predictor) is controlled for, so the results are not driven by gender differences.
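One idea I've come across for this kind of "controlling for" is residualization: regress income on gender first, then fit the model on the leftover residuals, so the new variable can only explain variation that gender doesn't. I'm not sure this is the standard approach — a rough sketch on synthetic data (all names are placeholders):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Synthetic data: here height has a genuine effect beyond gender.
rng = np.random.default_rng(1)
n = 1000
gender = rng.integers(0, 2, n)
height = 170 + 8 * gender + rng.normal(0, 6, n)
income = 30000 + 5000 * gender + 100 * (height - 170) + rng.normal(0, 3000, n)

# Step 1: remove the part of income explained by gender.
lin = LinearRegression().fit(gender.reshape(-1, 1), income)
residual = income - lin.predict(gender.reshape(-1, 1))

# Step 2: fit the new variable on what's left over; any predictive
# power it shows here is above and beyond gender.
tree = DecisionTreeRegressor(max_depth=3, random_state=0)
tree.fit(height.reshape(-1, 1), residual)
print(f"Training R^2 on residualized income: "
      f"{tree.score(height.reshape(-1, 1), residual):.3f}")
```

Is something like this reasonable, or does residualizing before a tree/SVM cause problems I'm not seeing?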