General Decision Tree learning algorithms:
Employ a top-down greedy search through the space of possible decision trees.
1. Perform a statistical test on each attribute to determine how well it classifies the training examples when considered alone.
2. Select the attribute that performs best and use it as the root of the tree.
3. To decide the descendant node down each branch of the root, sort the training examples according to their value on the branch's attribute, then repeat steps 1 and 2 on each subset.
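The three steps above can be sketched as a short recursive procedure. This is a minimal illustrative sketch, not the exact ID3 pseudocode from the notes: the dictionary-based dataset format and the function names are assumptions, and information gain (introduced below) is used as the statistical test.

```python
import math
from collections import Counter

def entropy(labels):
    # Impurity of a collection of class labels
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, labels, attr):
    # Expected reduction in entropy from splitting on attr
    n = len(labels)
    split = {}
    for row, y in zip(rows, labels):
        split.setdefault(row[attr], []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

def id3(rows, labels, attrs):
    # Leaf: all examples agree, or no attributes left -> majority class
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Steps 1-2: test every attribute, pick the best as the (sub)root
    best = max(attrs, key=lambda a: information_gain(rows, labels, a))
    tree = {best: {}}
    # Step 3: sort examples by their value on the chosen attribute, recurse
    for value in set(row[best] for row in rows):
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*sub)
        tree[best][value] = id3(list(sub_rows), list(sub_labels),
                                [a for a in attrs if a != best])
    return tree
```

On a toy dataset such as `id3([{'windy': 'yes'}, {'windy': 'no'}], ['+', '-'], ['windy'])`, the procedure returns a nested dict representing the tree, with leaves holding class labels.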
ID3 uses Information Gain to determine how informative an attribute is.
Information Gain is based on a measure called Entropy, which characterizes the impurity of a collection of examples S. (The larger the Entropy, the larger the impurity.)
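For a boolean classification, Entropy(S) = -p+ log2(p+) - p- log2(p-), where p+ and p- are the proportions of positive and negative examples in S. A quick numeric check (the 9-positive / 5-negative sample is a common textbook illustration, not from these notes) shows the extremes and a mixed case:

```python
import math

def entropy(pos, neg):
    # Entropy(S) = -p+ * log2(p+) - p- * log2(p-)
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # 0 * log2(0) is treated as 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(entropy(7, 7))            # evenly mixed, maximally impure: 1.0
print(entropy(14, 0))           # pure collection: 0.0
print(round(entropy(9, 5), 3))  # mixed collection: ~0.940
```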
Strengths of ID3:
1. Every discrete classification function can be represented by a decision tree.
2. Instead of making decisions based on individual training examples (e.g. Find-S), ID3 uses a statistical property of all the examples (Information Gain), and is therefore less sensitive to errors (compared to Find-S and Candidate-Elimination).
Limitations of ID3:
1. ID3 determines a single hypothesis, not a space of consistent hypotheses.
2. No backtracking in its search; therefore ID3 may overfit the training data and converge to a locally optimal solution that is not globally optimal.
How to stop Overfitting?
1. Stop the training process before the learner reaches the point where it perfectly classifies the training data.
2. Apply backtracking – post-pruning of the overfitted tree
3. Cross Validation