I'm implementing decision trees in Python, with the eventual goal of building Gradient Boosted Decision Trees.
My question is about pruning a decision tree using Weakest Link Pruning.
In Elements of Statistical Learning (ESL) p 308 (pdf p 326), it says: "we successively collapse the internal node that produces the smallest per-node increase in [error]." My understanding is that an internal node is any non-terminal node.
Then I open Programming Collective Intelligence (p 155), and while the text there also sounds like we should consider any non-terminal node, the function they define (`prune`) only tests entropy reduction on nodes that are direct parents of terminal nodes.
Did I understand ESL correctly, and I should be looking at every non-terminal node, not just those that are direct parents of terminal nodes?
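For reference, here's a minimal sketch of how I'd read ESL's weakest-link step: scan *every* internal node, compute the per-node increase in error from collapsing its subtree (normalized by the number of leaves removed, as in cost-complexity pruning), and collapse the node with the smallest increase. All names here (`Node`, `subtree_stats`, `weakest_link`) are my own illustration, not code from either book:

```python
class Node:
    def __init__(self, error, left=None, right=None):
        # error: training error at this node if it were collapsed to a leaf
        self.error = error
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None

def subtree_stats(node):
    """Return (total leaf error, number of leaves) of the subtree."""
    if node.is_leaf():
        return node.error, 1
    left_err, left_n = subtree_stats(node.left)
    right_err, right_n = subtree_stats(node.right)
    return left_err + right_err, left_n + right_n

def weakest_link(root):
    """Find the internal node whose collapse gives the smallest
    per-node increase in error -- scanning every internal node,
    not just direct parents of terminal nodes."""
    best, best_g = None, float("inf")
    stack = [root]
    while stack:
        node = stack.pop()
        if node.is_leaf():
            continue
        sub_err, n_leaves = subtree_stats(node)
        # g(t): increase in error per leaf removed if t is collapsed
        g = (node.error - sub_err) / (n_leaves - 1)
        if g < best_g:
            best, best_g = node, g
        stack.extend([node.left, node.right])
    return best, best_g

def collapse(node):
    """Turn an internal node into a leaf."""
    node.left = node.right = None
```

With this reading, the weakest link can be a node deep in the tree *or* high up: e.g. if collapsing the root's entire subtree costs less per leaf than collapsing a small subtree near the bottom, the root is pruned first, which a parents-of-leaves-only scan would never find.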