Hi everyone, I just got into Kaggle competitions this weekend, and like everyone else I started with the Titanic challenge. First I used scikit-learn's RandomForestClassifier (along with other classifiers that performed worse) for my first submissions. I did quite a lot of feature engineering with pandas beforehand, but I never managed to break the 77% accuracy wall. Then I switched to R (still wrangling the data with pandas), and by using the party package (and even rpart) I got much better scores (past the 80% bar). I even used the exact same number of trees in scikit-learn and R, but still got much better results in R (I repeated the experiment multiple times to be sure). How can that be?
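For reference, here is a minimal sketch of the scikit-learn side. The data is a synthetic stand-in for the engineered Titanic features, and the hyperparameter values are assumptions, not the ones from my actual runs. One thing worth noting: matching only the number of trees does not make the two libraries equivalent, since sklearn's RandomForestClassifier and R's party::cforest use different tree-building algorithms (CART-style impurity splits vs. conditional inference trees) and different defaults for things like max_features.

```python
# Minimal sketch: scikit-learn random forest with the tree count made explicit.
# The synthetic dataset below is a stand-in for the engineered Titanic features.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# ~891 rows, like the Titanic training set (feature count is an assumption).
X, y = make_classification(n_samples=891, n_features=8, random_state=0)

clf = RandomForestClassifier(
    n_estimators=500,      # the "number of trees" matched across both libraries
    max_features="sqrt",   # sklearn default for classification; party differs
    random_state=0,
)
scores = cross_val_score(clf, X, y, cv=5)
print(round(scores.mean(), 3))
```

Cross-validating locally like this, rather than relying only on leaderboard scores, also helps rule out submission noise when comparing the two implementations.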