Channel: Machine Learning

Strategies for automatic feature generation on a heterogeneous data set


I work with public records data to predict consumer credit risk and fraud. My datasets have roughly 1,000 features of mixed data types. Following the maxim that a standard learning algorithm applied to quality features will outperform a cutting-edge algorithm applied to mediocre features, I am trying to engineer some features automatically.

I am nearly always trying to predict a binary good/bad outcome. My features consist of binary flags and ordinal, nominal, discrete and continuous fields, in roughly equal proportions.

My question: what are some effective strategies for creating new features on such a dataset? Most of the methods I read about are applied to largely homogeneous datasets. Should I break my dataset into groups of related features?

Right now, I am playing around with naive Bayes classifiers as well as gradient boosting classifiers. My idea is to send small batches of related features through these algorithms to create compound features. Is this a sound approach?
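To make the compound-feature idea concrete, here is a minimal sketch, not a tested recipe. It assumes the batch is a handful of related binary flags, uses a small hand-rolled Bernoulli naive Bayes with Laplace smoothing, and scores every row with a model fitted on the other folds so the new feature does not leak that row's own label. The function names, the fold scheme and the toy data are all made up for illustration.

```python
import math
import random


def fit_bernoulli_nb(rows, labels, alpha=1.0):
    """Fit Bernoulli naive Bayes on 0/1 features with Laplace smoothing."""
    n_features = len(rows[0])
    counts = {0: [0] * n_features, 1: [0] * n_features}
    totals = {0: 0, 1: 0}
    for row, y in zip(rows, labels):
        totals[y] += 1
        for j, value in enumerate(row):
            counts[y][j] += value
    # P(flag_j = 1 | class), smoothed so it is never exactly 0 or 1.
    probs = {
        y: [(counts[y][j] + alpha) / (totals[y] + 2 * alpha)
            for j in range(n_features)]
        for y in (0, 1)
    }
    prior_log_odds = math.log((totals[1] + alpha) / (totals[0] + alpha))
    return probs, prior_log_odds


def log_odds(model, row):
    """Log-odds of the bad (1) class for one row of flags."""
    probs, score = model
    for j, value in enumerate(row):
        p1, p0 = probs[1][j], probs[0][j]
        if value:
            score += math.log(p1 / p0)
        else:
            score += math.log((1 - p1) / (1 - p0))
    return score


def out_of_fold_feature(rows, labels, n_folds=5, seed=0):
    """Compound feature: each row is scored by a model that never saw it."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    feature = [0.0] * len(rows)
    for fold in range(n_folds):
        # Every n_folds-th shuffled index forms the held-out fold.
        held_out = set(indices[fold::n_folds])
        train = [i for i in indices if i not in held_out]
        model = fit_bernoulli_nb([rows[i] for i in train],
                                 [labels[i] for i in train])
        for i in held_out:
            feature[i] = log_odds(model, rows[i])
    return feature


# Synthetic toy data (an assumption, not real credit data): three related
# flags that are set more often on bad (1) rows than on good (0) rows.
rng = random.Random(42)
labels = [rng.randint(0, 1) for _ in range(200)]
rows = [[int(rng.random() < (0.7 if y else 0.2)) for _ in range(3)]
        for y in labels]
compound = out_of_fold_feature(rows, labels)
```

The resulting `compound` list would be appended as one extra column for the downstream model. The out-of-fold step is the design choice that matters here: fitting the small model and scoring the same rows would make the compound feature look better in training than it is on new data.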

submitted by Zelazny7
