Hi, I'm looking for an implementation of the decision tree algorithm that is:
- very scalable,
- supports classification and regression,
- customizable (for example, selection of the split measure: entropy-based / chi-square statistic / ...),
- written in C++ or Java,
- open source and free.
It will be used on a supercomputer with very large datasets.
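As an illustration of what a customizable split measure could look like, here is a hypothetical sketch that is not taken from any existing library. The criterion sits behind a small interface, so a tree builder could swap entropy-based information gain for a chi-square statistic without other changes. All names (`SplitCriterion`, `InformationGain`, `ChiSquare`, `ClassCounts`) are made up for this example, and it only scores one candidate binary split from per-class counts.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical type: counts[k] = number of training patterns of class k on one side of a split.
using ClassCounts = std::vector<double>;

// Hypothetical pluggable interface: a higher score means a better split.
class SplitCriterion {
public:
    virtual ~SplitCriterion() = default;
    virtual double score(const ClassCounts& left, const ClassCounts& right) const = 0;
};

// Shannon entropy (in bits) of a class distribution.
inline double entropy(const ClassCounts& counts) {
    double total = 0.0;
    for (double c : counts) total += c;
    if (total == 0.0) return 0.0;
    double h = 0.0;
    for (double c : counts) {
        if (c > 0.0) {
            double p = c / total;
            h -= p * std::log2(p);
        }
    }
    return h;
}

// Information gain: parent entropy minus the size-weighted entropies of the two children.
class InformationGain : public SplitCriterion {
public:
    double score(const ClassCounts& left, const ClassCounts& right) const override {
        assert(left.size() == right.size());  // both sides must cover the same classes
        ClassCounts parent(left.size());
        double nl = 0.0, nr = 0.0;
        for (std::size_t k = 0; k < left.size(); ++k) {
            parent[k] = left[k] + right[k];
            nl += left[k];
            nr += right[k];
        }
        double n = nl + nr;
        if (n == 0.0) return 0.0;
        return entropy(parent) - (nl / n) * entropy(left) - (nr / n) * entropy(right);
    }
};

// Pearson chi-square statistic of the 2 x K contingency table (split side x class).
class ChiSquare : public SplitCriterion {
public:
    double score(const ClassCounts& left, const ClassCounts& right) const override {
        assert(left.size() == right.size());  // both sides must cover the same classes
        double nl = 0.0, nr = 0.0;
        for (std::size_t k = 0; k < left.size(); ++k) {
            nl += left[k];
            nr += right[k];
        }
        double n = nl + nr;
        if (n == 0.0) return 0.0;
        double chi2 = 0.0;
        for (std::size_t k = 0; k < left.size(); ++k) {
            double classTotal = left[k] + right[k];
            // Expected counts if the split side were independent of the class.
            double expL = nl * classTotal / n;
            double expR = nr * classTotal / n;
            if (expL > 0.0) chi2 += (left[k] - expL) * (left[k] - expL) / expL;
            if (expR > 0.0) chi2 += (right[k] - expR) * (right[k] - expR) / expR;
        }
        return chi2;
    }
};
```

A tree builder would hold a `const SplitCriterion&` and call `score(left, right)` for each candidate threshold, keeping the highest-scoring one. For a perfectly separating split such as `{5, 0}` vs `{0, 5}`, information gain is 1 bit and the chi-square statistic is 10 (the number of patterns); for a split whose sides have identical class proportions, both scores are 0.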
Now I'm observing
Can you suggest any other implementations of a decision tree algorithm? (No Mahout/Hadoop, and ideally with some references / real-world use cases.)
EDIT:
From my supervisor, about the size of the data: "For massive datasets, we remark that our basic requirement is to efficiently handle datasets of at least hundreds of thousands of patterns in a feature space of more than 10 dimensions. In other words, we must be able to ingest data files larger than hundreds of MB (terabytes are the final goal, once survey projects start producing observed data)."