I'm putting together a big data mining project and it's time to break out the big guns instead of just using R. Cloudera's CDH looks pretty good because the little pieces of the Hadoop pie (Pig, Oozie, what-have-you) are already packaged and compatibility-tested with each other.
Thoughts? What do you guys use?