K-nearest neighbor question

I'm trying to predict how much insurance an uncovered employee might buy, based on employees who are already covered, using only age and salary. On a previous data set, I found that length of employment had the biggest predictive impact. I am using the k-NN algorithm in SPSS Modeler with the default settings (k between 3 and 5 neighbors). I have salary and age for both covered and uncovered employees, and I am using the covered group's salary and age to predict the amount of coverage the uncovered group would purchase. The problem is that the data is skewed toward the lower end of the pay scale, and thousands of people make more or less the same salary.
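For readers outside SPSS Modeler, the setup described above can be sketched in plain Python. This is only an illustration, not what Modeler does internally: the records are made up, the helper names (`features`, `fit_scaler`, `transform`, `knn_predict`) are hypothetical, and taking the log of salary before standardizing is an assumption added here as one common way to soften a skewed pay scale.

```python
import math

# Made-up covered employees: (age, salary, coverage purchased).
covered = [
    (22, 24000, 25000),
    (25, 26000, 25000),
    (31, 27000, 50000),
    (38, 41000, 100000),
    (45, 60000, 200000),
    (52, 85000, 300000),
]

def features(age, salary):
    # Assumption: log(salary) compresses the long right tail of the pay scale.
    return [float(age), math.log(salary)]

def fit_scaler(rows):
    # Per-column mean and standard deviation, so age and salary
    # contribute to the distance on a comparable scale.
    stats = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        stats.append((mean, std or 1.0))  # guard against a constant column
    return stats

def transform(row, stats):
    return [(v - mean) / std for v, (mean, std) in zip(row, stats)]

def knn_predict(train_X, train_y, query, k=3):
    # Average the target over the k nearest neighbors (Euclidean distance).
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_X, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)

raw_X = [features(age, salary) for age, salary, _ in covered]
stats = fit_scaler(raw_X)
train_X = [transform(row, stats) for row in raw_X]
train_y = [coverage for _, _, coverage in covered]

# Predict for an uncovered 19-year-old earning $25,000.
query = transform(features(19, 25000), stats)
prediction = knn_predict(train_X, train_y, query, k=3)
print(round(prediction, 2))
```

With thousands of near-identical salaries, many neighbors sit at almost the same distance, so the prediction is driven mostly by age and by whichever records happen to tie. That is one reason a third feature such as tenure can change the results so much.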

In another model I built on a different data set, I had tenure of employment, which gave much better results. Those predictions looked more accurate than what I am getting now, such as a 19-year-old who makes $25,000 a year being predicted to buy a policy covering 4x their income ($100K). In the other data, I took the predicted amounts that were less than the employee's salary and tagged them accordingly, since I feel these people are unlikely to buy insurance because of who they're 'closest' to, and I wanted to use the predicted amounts that were equal to or greater than the salary for targeting.
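The tagging rule described above could look something like the sketch below. The record layout, the label strings, and the `tag_prediction` helper are hypothetical, chosen only to illustrate the comparison between predicted coverage and salary.

```python
def tag_prediction(salary, predicted_coverage):
    # Predictions below the employee's salary are treated as unlikely buyers;
    # predictions equal to or above the salary are kept as targets.
    if predicted_coverage < salary:
        return "unlikely_buyer"
    return "target"

# Made-up uncovered employees with their predicted coverage amounts.
uncovered = [
    {"id": "A", "salary": 25000, "predicted": 100000},
    {"id": "B", "salary": 40000, "predicted": 30000},
    {"id": "C", "salary": 55000, "predicted": 55000},
]

tagged = {e["id"]: tag_prediction(e["salary"], e["predicted"]) for e in uncovered}

# Ratio of predicted coverage to salary, useful for spotting implausible
# multiples such as the 4x case mentioned above.
multiples = {e["id"]: e["predicted"] / e["salary"] for e in uncovered}
print(tagged)
print(multiples)
```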

Am I going about this the right way, or should I be using a different algorithm?

submitted by watersign