Channel: Machine Learning

How would you approach this problem?

Hey all,

Let's say I'm faced with this problem: a user has to choose among 10 options. Each option is a draw from a 5D random variable, let's say X = [is_black, is_sedan, is_cheap, is_fast, is_efficient] (pretty sure that's not standard notation, sorry). Each dimension is binary. If you are given the choices of 100 users, how would you model this problem in order to make good predictions about what future users would choose? This is what I've tried so far:

I combined the data from all of the observations (where one observation is one user selecting among 10 options) and unrolled each observation into 10 rows, one per option. Each row contains that option's outcome of X plus an additional output column y, where y = 1 if the user chose that car and 0 otherwise. I then split the data into a training and test set and fed the training set into a random forest, a logistic regression, and an SVM. The test results were pitiful.
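For concreteness, here is a minimal sketch of the unrolling step described above. The array layout, the `unroll` helper, and the random placeholder data are assumptions made for illustration, not the original code or data.

```python
import numpy as np

FEATURES = ["is_black", "is_sedan", "is_cheap", "is_fast", "is_efficient"]

def unroll(options, chosen):
    """Flatten choice observations into one row per (user, option).

    options: (n_users, n_options, n_features) binary array (assumed layout)
    chosen:  (n_users,) index of the option each user picked
    Returns X, y and a group id that remembers which user each row came from.
    """
    n_users, n_options, n_features = options.shape
    X = options.reshape(n_users * n_options, n_features)
    y = np.zeros((n_users, n_options), dtype=int)
    y[np.arange(n_users), chosen] = 1          # exactly one positive per user
    group = np.repeat(np.arange(n_users), n_options)
    return X, y.ravel(), group

# Placeholder data: 100 users, 10 options each, 5 binary features.
rng = np.random.default_rng(0)
options = rng.integers(0, 2, size=(100, 10, len(FEATURES)))
chosen = rng.integers(0, 10, size=100)

X, y, group = unroll(options, chosen)
print(X.shape, y.mean())
```

Two things follow directly from this layout: only 1 row in 10 is positive, and since 5 binary features give just 32 distinct values of X, identical rows can carry different labels. Keeping the `group` column around makes it possible to put the rows of one user back together later.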

I then figured that I was discarding important information by "unrolling" each user's choice into independent rows. So I processed the test data in chunks of 10 (since we know that a user must make a choice) and picked the highest raw-scoring row in each chunk as the predicted choice, using SVM and logreg. This too performed poorly. For fun I also tried using the raw counts of each possible outcome of X seen in the training data as a "probability" of each option being chosen, and selecting the highest. This too did poorly.
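The chunked prediction described above can be sketched as follows. The helper names `predict_choice` and `top1_accuracy` are hypothetical, and `scores` stands for whatever raw per-row score the classifier produces (a decision value or a predicted probability).

```python
import numpy as np

def predict_choice(scores, n_options=10):
    """Reshape per-row scores into (n_users, n_options) and take the best row per user."""
    per_user = np.asarray(scores, dtype=float).reshape(-1, n_options)
    return per_user.argmax(axis=1)

def top1_accuracy(scores, chosen, n_options=10):
    """Fraction of users whose highest-scoring option is the one they actually chose."""
    predicted = predict_choice(scores, n_options)
    return float((predicted == np.asarray(chosen)).mean())

# Toy example: 2 users with 3 options each.
scores = np.array([0.1, 0.9, 0.3,   # user 0
                   0.5, 0.2, 0.4])  # user 1
chosen = np.array([1, 2])
predicted = predict_choice(scores, n_options=3)
accuracy = top1_accuracy(scores, chosen, n_options=3)
print(predicted, accuracy)
```

When judging whether a result is poor, the natural reference point is random guessing: with 10 options per user, picking uniformly at random gets the choice right 10% of the time.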

I don't know how I would represent each observation without unrolling it, as I did to make it suitable for the algorithms I chose, i.e., how to train in a way that preserves the idea that a choice was made within each observation. My first thought is that, if that were done, you would be training multiple models on tiny batches of 10, and that doesn't seem like a good idea.
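One possible way to keep the "one choice per observation" structure without training a model per batch is a conditional logit (a standard discrete choice model): a single shared weight vector scores every option, and a softmax is taken within each group of 10. The sketch below is an illustration under assumptions, not something from the post: it uses plain gradient ascent in NumPy, and the data are synthetic, generated from a made-up `true_w`.

```python
import numpy as np

def choice_probabilities(options, w):
    """Softmax over the options of each observation, using one shared weight vector."""
    utilities = np.asarray(options, dtype=float) @ w          # (n_obs, n_options)
    utilities = utilities - utilities.max(axis=1, keepdims=True)  # numerical stability
    expu = np.exp(utilities)
    return expu / expu.sum(axis=1, keepdims=True)

def log_likelihood(options, chosen, w):
    """Mean log-probability assigned to the options that were actually chosen."""
    probs = choice_probabilities(options, w)
    return float(np.log(probs[np.arange(len(chosen)), chosen]).mean())

def fit_conditional_logit(options, chosen, lr=0.5, n_iter=500):
    """Gradient ascent on the mean log-likelihood; all observations share w."""
    options = np.asarray(options, dtype=float)
    n_obs, n_options, n_features = options.shape
    onehot = np.zeros((n_obs, n_options))
    onehot[np.arange(n_obs), chosen] = 1.0
    w = np.zeros(n_features)
    for _ in range(n_iter):
        probs = choice_probabilities(options, w)
        # Gradient: chosen option's features minus the probability-weighted average.
        grad = np.einsum("ij,ijk->k", onehot - probs, options) / n_obs
        w += lr * grad
    return w

# Synthetic data (assumption): choices follow a softmax with a made-up true_w.
rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.5, 0.5, 0.0, 1.0])
options = rng.integers(0, 2, size=(500, 10, 5))
noise = rng.gumbel(size=(500, 10))            # Gumbel noise yields softmax choice probabilities
chosen = (options @ true_w + noise).argmax(axis=1)

w_hat = fit_conditional_logit(options, chosen)
probs = choice_probabilities(options, w_hat)
predicted = probs.argmax(axis=1)
print(np.round(w_hat, 2), (predicted == chosen).mean())
```

Because the softmax is normalized within each observation, the 10 rows of a user compete with each other during training, while the model itself is still fit on all observations at once. There is no intercept term here, since a constant added to every option's score cancels inside the softmax.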

Am I on the wrong track here? Does anyone have any better ideas or suggestions?

Also, if this is the wrong subreddit for this type of question, I apologize. Thanks!

submitted by harfharf11
