
This seems too good to be true, but I think I've found a way to make the input attributes in a dataset statistically independent, making any dataset meet the assumptions of Naive Bayes, and it's much simpler than a Bayesian network learner - tell me I'm wrong...


I wanted to get some preliminary feedback about this before spending (and possibly wasting) a few days implementing it.

As you know, a Naive Bayes Classifier makes the assumption that all input attributes are statistically independent (meaning that given the value of one attribute, you can't predict anything about the values of the other attributes). This is almost never true, but in many situations Naive Bayes works reasonably well despite this.

The typical solution is to use a Bayesian network learner which captures the interdependencies between attributes, but this is far more complicated than Naive Bayes.

I think I've thought of an alternate approach using a technique from economics for removing "selection bias" from a dataset.

Let's say we have 4 nominal input attributes, A, B, C, and D, and an output attribute Z. We don't know the relationships between the input attributes, but they are probably somewhat dependent on each other.

My proposed approach is to effectively "filter" the interdependence out of the input attributes. How?

Let's take A and B first. If A and B were independent, then knowledge of A's value would not affect the probabilities of the various values that B might take.

By looking at the data we can see the impact that A has on B. For example, we might see that if A is "dog", then the likelihood of B being "house" is 0.3, but if A is "cat", then the likelihood of B being "house" is 0.4.
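To make this concrete, here's a rough sketch (in Python) of estimating P(B|A) by simple counting. The records, attribute names, and values are all made up for illustration:

    # Toy sketch: estimate P(B|A) from data by counting.
    # The records and attribute values here are made up for illustration.
    from collections import Counter

    records = [
        {"A": "dog", "B": "house"},
        {"A": "dog", "B": "yard"},
        {"A": "cat", "B": "house"},
        {"A": "cat", "B": "house"},
        {"A": "cat", "B": "yard"},
    ]

    pair_counts = Counter((r["A"], r["B"]) for r in records)
    a_counts = Counter(r["A"] for r in records)

    def p_b_given_a(b, a):
        """Empirical conditional probability P(B=b | A=a)."""
        return pair_counts[(a, b)] / a_counts[a]

    print(p_b_given_a("house", "dog"))  # 0.5 on this toy data
    print(p_b_given_a("house", "cat"))  # ~0.67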

We can view this as there being a selection bias for the value of B, and economics gives us a well-understood way to remove this bias called Heckman correction.

While the theory behind it is more complicated, applying this correction is very simple. We take the probability of B having its current value given A's current value - P(B|A) - and we weight that sample by 1/P(B|A) (setting a maximum weight of, say, 20, to guard against very small values of P(B|A) screwing things up).
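The weighting itself would be something like this minimal sketch - the cap of 20 is just the arbitrary maximum I mentioned above:

    # Clipped inverse-probability weight for one sample's B value.
    # p is P(B=b | A=a), estimated however you like; MAX_WEIGHT is the cap.
    MAX_WEIGHT = 20.0

    def attribute_weight(p):
        """Weight a sample by 1/p, capped to guard against tiny probabilities."""
        if p <= 0.0:
            return MAX_WEIGHT
        return min(1.0 / p, MAX_WEIGHT)

    print(attribute_weight(0.3))   # ~3.33, matching the P(B|A) example above
    print(attribute_weight(0.01))  # 20.0, capped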

Note that these weights only apply when calculating probabilities for the attribute B, so the weight is associated with this specific attribute, not with the entire sample (which is more common).

So now in our dataset we have weighted our attribute B such that it is independent of A. Next we want to do the same thing for the attribute C, but in this case we need to weight its samples by the inverse of the probability that C takes its value given A's and B's values, i.e. 1/P(C|A,B). Since A and our corrected B are now statistically independent, we can safely use a Naive Bayes classifier with A and B as inputs and C as the output to determine this weight for each sample.

And then once we've got weights for attribute C in every sample, we can repeat this for attribute D, assigning weights to each of its samples using Naive Bayes with A, B, and C as inputs and D as the output.
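Here's a rough sketch of how I imagine the whole chaining step looking, with a Laplace-smoothed Naive Bayes whose counts respect the per-attribute weights. Again, all the names and toy records are purely illustrative:

    # Sketch of the chaining step: for each attribute in turn, estimate
    # P(attr | earlier attrs) with a weighted Naive Bayes, then set that
    # sample's weight for attr to min(1/P, MAX_WEIGHT). Toy data only.
    from collections import defaultdict

    MAX_WEIGHT = 20.0

    records = [
        {"A": "dog", "B": "house", "C": "red",  "D": "big"},
        {"A": "dog", "B": "yard",  "C": "blue", "D": "small"},
        {"A": "cat", "B": "house", "C": "red",  "D": "small"},
        {"A": "cat", "B": "house", "C": "blue", "D": "big"},
    ]
    attrs = ["A", "B", "C", "D"]

    # attr_weights[i][a] = weight of record i when counting attribute a.
    attr_weights = [{a: 1.0 for a in attrs} for _ in records]

    def weighted_nb_prob(inputs, target, record):
        """P(record[target] | inputs) from a Laplace-smoothed Naive Bayes
        whose counts for each attribute use that attribute's weights."""
        class_w = defaultdict(float)       # weighted count of each target value
        attr_class_w = defaultdict(float)  # (a, t) -> total weight of a in class t
        cond_w = defaultdict(float)        # (a, v, t) -> weight of a=v in class t
        n_vals = {a: len({r[a] for r in records}) for a in inputs}
        for i, r in enumerate(records):
            t = r[target]
            class_w[t] += attr_weights[i][target]
            for a in inputs:
                w = attr_weights[i][a]
                attr_class_w[(a, t)] += w
                cond_w[(a, r[a], t)] += w
        total = sum(class_w.values())
        scores = {}
        for t in class_w:
            score = class_w[t] / total
            for a in inputs:
                score *= (cond_w[(a, record[a], t)] + 1.0) \
                         / (attr_class_w[(a, t)] + n_vals[a])
            scores[t] = score
        return scores[record[target]] / sum(scores.values())

    # B is weighted against A, C against {A, B}, D against {A, B, C}.
    # Probabilities are computed for all samples before any weights change.
    for i in range(1, len(attrs)):
        target, earlier = attrs[i], attrs[:i]
        probs = [weighted_nb_prob(earlier, target, r) for r in records]
        for j, p in enumerate(probs):
            attr_weights[j][target] = min(1.0 / p, MAX_WEIGHT) if p > 0 else MAX_WEIGHT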

Finally, we can use Naive Bayes to predict our output attribute Z using A, B, C, and D as inputs - and we are now statistically justified in doing so, because A, B, C, and D are independent. The result, I would hope, would be superior predictive performance.
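The final prediction step might look like this, continuing from the sketch above and assuming each training record also carries a "Z" key - each P(attribute | Z) is estimated with that attribute's per-sample weights, while Z's own prior stays unweighted:

    # Predict Z with a Naive Bayes whose estimate of each P(attr | Z)
    # uses that attribute's per-sample weights; Z's prior is unweighted.
    # Continues from the previous sketch (records, attrs, attr_weights),
    # and assumes each record also has a "Z" key.
    def predict_z(query):
        class_n = defaultdict(float)       # plain counts of each Z value
        attr_class_w = defaultdict(float)  # (a, z) -> total weight of a in class z
        cond_w = defaultdict(float)        # (a, v, z) -> weight of a=v in class z
        n_vals = {a: len({r[a] for r in records}) for a in attrs}
        for i, r in enumerate(records):
            z = r["Z"]
            class_n[z] += 1.0
            for a in attrs:
                w = attr_weights[i][a]
                attr_class_w[(a, z)] += w
                cond_w[(a, r[a], z)] += w
        scores = {}
        for z in class_n:
            score = class_n[z] / len(records)
            for a in attrs:
                score *= (cond_w[(a, query[a], z)] + 1.0) \
                         / (attr_class_w[(a, z)] + n_vals[a])
            scores[z] = score
        return max(scores, key=scores.get)

    # e.g. predict_z({"A": "dog", "B": "house", "C": "red", "D": "big"})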

This seems a bit too good to be true though, where have I screwed up?

submitted by sanity
