Eager Learning:

Eager learning methods construct a general, explicit description of the target function from the provided training examples.

Eager learning methods commit to a single approximation of the target function, which is learned from the training examples before any input query is observed.
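To make the eager idea concrete, here is a minimal sketch using a nearest-centroid classifier. This is a stand-in chosen for brevity, not a method named in these notes: all generalization happens once, up front, when each class is summarized by the mean of its feature vectors. Every later query is then answered with that same model. The function names and the toy 2-D data are hypothetical.

```python
from statistics import mean

def train_nearest_centroid(examples):
    """Eager step: summarize the training set as one centroid per class."""
    by_class = {}
    for features, label in examples:
        by_class.setdefault(label, []).append(features)
    # Average each feature column; after this the raw examples are no longer needed.
    return {label: tuple(mean(col) for col in zip(*rows))
            for label, rows in by_class.items()}

def predict(centroids, query):
    """Every query is answered with the same, already-built model."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], query))

# Hypothetical toy data: (features, label) pairs.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((5.0, 5.0), "B"), ((6.0, 5.5), "B")]
model = train_nearest_centroid(train)   # built once, before any query arrives
print(predict(model, (2.0, 1.5)))       # A
```

The learned model here is just two points, one per class, no matter how many training examples were used.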

Lazy Learning:

Lazy learning methods simply store the training data and postpone generalizing beyond it until an explicit query is made.

Lazy learning methods can construct a different approximation to the target function for each encountered query instance.

This makes lazy learning suitable for complex and incomplete problem domains, where a complex target function can be represented by a collection of less complex local approximations.
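A k-nearest-neighbor classifier is the standard example of a lazy learner. The sketch below assumes Euclidean distance, a simple majority vote among the k neighbors, and Python 3.8+ for `math.dist`. The class name and data are hypothetical. Note that `fit` only stores the examples, while `predict` builds a fresh local approximation from the neighbors of each query.

```python
import math
from collections import Counter

class KNearestNeighbors:
    """Lazy learner: fit() only stores the data; all work happens in predict()."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []

    def fit(self, examples):
        # No generalization at training time: keep every (features, label) pair.
        self.examples = list(examples)
        return self

    def predict(self, query):
        # Generalize only now, from the k stored examples closest to this query,
        # so each query is answered by its own local approximation.
        nearest = sorted(self.examples, key=lambda ex: math.dist(ex[0], query))[: self.k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical toy data: (features, label) pairs.
train = [((1.0, 1.0), "A"), ((1.5, 2.0), "A"), ((1.2, 0.8), "A"),
         ((5.0, 5.0), "B"), ((6.0, 5.5), "B"), ((5.5, 6.0), "B")]
knn = KNearestNeighbors(k=3).fit(train)
print(knn.predict((2.0, 1.5)))   # A
print(knn.predict((5.5, 5.0)))   # B
```

The stored "model" grows with every training example, which is why lazy learners usually need more memory than eager ones.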

Eager learning normally requires less space than lazy learning, because a lazy learner must keep all of its training examples.

General Decision Tree learning algorithms:

Employ a top-down, greedy search through the space of possible decision trees:

1. Perform a statistical test on each attribute to determine how well it classifies the training examples when considered alone.

2. Select the attribute that performs best and use it as the root of the tree.

3. Create a descendant node for each possible value of the root attribute, sort the training examples to the branch that matches their value, and repeat steps 1 and 2 on each descendant using only the examples sorted to it.
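The three steps above can be sketched as a short recursive function. The "statistical test" used here is an assumption chosen for simplicity: the accuracy of predicting the majority label for each value of one attribute on its own. ID3 replaces this test with Information Gain. Examples are hypothetical (dict of attribute values, label) pairs, and each internal node is stored as a hypothetical `(attribute, branches, default_label)` tuple.

```python
from collections import Counter

def majority(labels):
    """Most common label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]

def accuracy_alone(examples, attr):
    """Assumed statistical test for step 1: the fraction of examples classified
    correctly by predicting the majority label for each value of `attr`."""
    groups = {}
    for x, y in examples:
        groups.setdefault(x[attr], []).append(y)
    correct = sum(Counter(ys).most_common(1)[0][1] for ys in groups.values())
    return correct / len(examples)

def grow_tree(examples, attributes, score=accuracy_alone):
    """Top-down greedy induction. Returns a label (leaf) or (attr, branches, default)."""
    labels = [y for _, y in examples]
    # Leaf: all examples agree, or there is nothing left to test.
    if len(set(labels)) == 1 or not attributes:
        return majority(labels)
    # Steps 1-2: score every attribute alone and keep the best one (never revisited).
    best = max(attributes, key=lambda a: score(examples, a))
    remaining = [a for a in attributes if a != best]
    # Step 3: one branch per observed value, grown only from the examples sorted into it.
    branches = {}
    for value in {x[best] for x, _ in examples}:
        subset = [(x, y) for x, y in examples if x[best] == value]
        branches[value] = grow_tree(subset, remaining, score)
    return (best, branches, majority(labels))

def classify(tree, x):
    """Follow branches until a leaf; unseen values fall back to the node's majority label."""
    while isinstance(tree, tuple):
        attr, branches, default = tree
        tree = branches.get(x[attr], default)
    return tree

# Hypothetical toy data.
data = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "overcast", "windy": "no"}, "play"),
    ({"outlook": "overcast", "windy": "yes"}, "play"),
]
tree = grow_tree(data, ["outlook", "windy"])
print(tree[0])   # outlook
```

Because the best attribute is chosen once per node and never reconsidered, this is a greedy search with no backtracking.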

ID3:

ID3 uses Information Gain to determine how informative an attribute is.

Information Gain is based on a measure called Entropy, which characterizes the impurity of a collection of examples S (the larger the entropy, the greater the impurity).
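A minimal sketch of both measures, using the standard definitions Entropy(S) = -Σ p_i log2 p_i (over class proportions p_i) and Gain(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) Entropy(S_v) (over the values v of attribute A). The function names and the toy data (14 examples, 9 positive and 5 negative, split by a hypothetical "wind" attribute) are illustrative assumptions.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = sum_i p_i * log2(1 / p_i), equivalent to -sum_i p_i * log2(p_i)."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def information_gain(examples, attr):
    """Gain(S, A): expected reduction in entropy from splitting S on attribute A."""
    labels = [y for _, y in examples]
    remainder = 0.0
    for value in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder

# Hypothetical toy data: weak wind -> 6 positive, 2 negative; strong wind -> 3 and 3.
examples = ([({"wind": "weak"}, "+")] * 6 + [({"wind": "weak"}, "-")] * 2 +
            [({"wind": "strong"}, "+")] * 3 + [({"wind": "strong"}, "-")] * 3)

print(round(entropy([y for _, y in examples]), 3))      # 0.94
print(round(information_gain(examples, "wind"), 3))     # 0.048
```

A pure collection has entropy 0, and an evenly split two-class collection has entropy 1, matching "the larger the entropy, the greater the impurity".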

Advantages:

1. Every discrete-valued classification function can be represented by some decision tree, so ID3 searches a complete hypothesis space.

2. Instead of making decisions based on individual training examples (as Find-S and Candidate-Elimination do), ID3 uses a statistical property of all the examples (Information Gain), so it is less sensitive to errors in individual training examples.

Disadvantages:

1. ID3 maintains only a single current hypothesis, not the space of all consistent hypotheses.

2. ID3 performs no backtracking in its search, so it can converge to a locally optimal tree that is not globally optimal, and it may overfit the training data.

How to avoid overfitting?

1. Stop the training process before the learner reaches the point where it perfectly classifies the training data.

2. Let the tree overfit, then apply post-pruning (a form of backtracking) to the overfitted tree.

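3. Use cross-validation, evaluating candidate trees on data held out from training, to decide when to stop growing or how much to prune.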
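The sketch below combines early stopping with cross-validation. ID3 is given a hypothetical `max_depth` limit (a node becomes a majority-label leaf once the limit is reached), and k-fold cross-validation estimates accuracy on held-out folds for each depth. The dataset is synthetic and hypothetical: the target equals attribute "a", with two labels flipped to simulate noise. An unlimited tree memorizes those flipped labels, while a depth-1 tree does not. Post-pruning is not shown.

```python
import math
import random
from collections import Counter
from itertools import product

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def gain(examples, attr):
    rem = 0.0
    for v in {x[attr] for x, _ in examples}:
        sub = [y for x, y in examples if x[attr] == v]
        rem += len(sub) / len(examples) * entropy(sub)
    return entropy([y for _, y in examples]) - rem

def id3(examples, attributes, max_depth):
    """ID3 with early stopping: a node becomes a leaf once max_depth reaches 0."""
    labels = [y for _, y in examples]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes or max_depth == 0:
        return majority
    best = max(attributes, key=lambda a: gain(examples, a))
    rest = [a for a in attributes if a != best]
    branches = {}
    for v in {x[best] for x, _ in examples}:
        sub = [(x, y) for x, y in examples if x[best] == v]
        branches[v] = id3(sub, rest, max_depth - 1)
    return (best, branches, majority)

def classify(tree, x):
    while isinstance(tree, tuple):
        attr, branches, default = tree
        tree = branches.get(x[attr], default)
    return tree

def cv_accuracy(examples, attributes, max_depth, k=4, seed=0):
    """k-fold cross-validation: train on k-1 folds, test on the held-out fold."""
    data = examples[:]
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, test_fold in enumerate(folds):
        train = [ex for j, f in enumerate(folds) if j != i for ex in f]
        tree = id3(train, attributes, max_depth)
        correct += sum(classify(tree, x) == y for x, y in test_fold)
    return correct / len(data)

# Hypothetical synthetic data: the label depends only on "a".
attrs = ["a", "b", "c", "d"]
examples = [({"a": a, "b": b, "c": c, "d": d}, a)
            for a, b, c, d in product([0, 1], repeat=4)]
# Flip two labels to simulate noisy training examples.
for i in (3, 12):
    examples[i] = (examples[i][0], 1 - examples[i][1])

scores = {depth: cv_accuracy(examples, attrs, depth) for depth in range(5)}
best_depth = max(scores, key=scores.get)
print(scores, best_depth)
```

The depth limit is one simple stopping rule; the depth that scores best on held-out folds is the one to keep.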

Find-S:

Find-S is guaranteed to output the most specific hypothesis h in the hypothesis space that is consistent with the positive training examples.

The hypothesis h returned by Find-S will also be consistent with the negative examples, provided the training examples are correct and the target concept is contained in the hypothesis space.
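A minimal Find-S sketch, assuming conjunctive hypotheses represented as tuples where each position holds a required value, `"?"` (any value accepted), or `None` (no value accepted, the most specific constraint). Negative examples are skipped, as in Find-S. The attribute values below are an illustrative EnjoySport-style training set (Sky, AirTemp, Humidity, Wind, Water, Forecast). The helper `covers` is hypothetical and only used to check the negative example afterwards.

```python
def find_s(examples):
    """Start from the most specific hypothesis and minimally generalize it
    on each positive example; negative examples are ignored."""
    n = len(examples[0][0])
    h = [None] * n                      # None = no value acceptable
    for x, positive in examples:
        if not positive:
            continue
        for i, value in enumerate(x):
            if h[i] is None:
                h[i] = value            # first positive example: copy its values
            elif h[i] != value:
                h[i] = "?"              # conflicting values: accept any value
    return tuple(h)

def covers(h, x):
    """h classifies x as positive iff every constraint is '?' or equals x's value."""
    return all(c == "?" or c == v for c, v in zip(h, x))

training = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
h = find_s(training)
print(h)   # ('Sunny', 'Warm', '?', 'Strong', '?', '?')
```

Here the resulting h does not cover the negative example, which is what the consistency guarantee predicts for correct, noise-free data.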

Candidate-Elimination:

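Outputs a description of the set of all hypotheses consistent with the training examples (the version space), represented by its most specific boundary S and its most general boundary G.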
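A sketch of Candidate-Elimination for the same kind of conjunctive hypotheses (`"?"` for any value, `None` for no value). It assumes each attribute's domain is known so that G can be minimally specialized. Positive examples generalize S and prune G. Negative examples specialize G and prune S. With conjunctive hypotheses, S stays a single hypothesis, so the step that removes overly general members of S is omitted. All names and the EnjoySport-style data are illustrative.

```python
def covers(h, x):
    return all(c == "?" or c == v for c, v in zip(h, x))

def more_general_or_equal(h1, h2):
    """True if h1 covers every instance that h2 covers."""
    return all(a == "?" or a == b or b is None for a, b in zip(h1, h2))

def min_generalization(s, x):
    """The unique minimal generalization of conjunctive s that covers x."""
    return tuple(v if c is None else (c if c == v else "?") for c, v in zip(s, x))

def min_specializations(g, x, domains):
    """Minimal specializations of g that exclude x: fix one '?' to a value other than x's."""
    return [g[:i] + (value,) + g[i + 1:]
            for i, c in enumerate(g) if c == "?"
            for value in domains[i] if value != x[i]]

def candidate_elimination(examples, domains):
    n = len(domains)
    S = {(None,) * n}                   # most specific boundary
    G = {("?",) * n}                    # most general boundary
    for x, positive in examples:
        if positive:
            G = {g for g in G if covers(g, x)}
            new_S = set()
            for s in S:
                h = s if covers(s, x) else min_generalization(s, x)
                # Keep h only if some member of G is at least as general.
                if any(more_general_or_equal(g, h) for g in G):
                    new_S.add(h)
            S = new_S
        else:
            S = {s for s in S if not covers(s, x)}
            new_G = set()
            for g in G:
                if not covers(g, x):
                    new_G.add(g)
                    continue
                for h in min_specializations(g, x, domains):
                    # Keep h only if it is still more general than some member of S.
                    if any(more_general_or_equal(h, s) for s in S):
                        new_G.add(h)
            # Drop members that are strictly less general than another member of G.
            G = {g for g in new_G
                 if not any(o != g and more_general_or_equal(o, g) for o in new_G)}
    return S, G

domains = [("Sunny", "Cloudy", "Rainy"), ("Warm", "Cold"), ("Normal", "High"),
           ("Strong", "Weak"), ("Warm", "Cool"), ("Same", "Change")]
training = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(training, domains)
print(S)
print(G)
```

Every hypothesis lying between S and G (more general than or equal to S, and less general than or equal to some member of G) is consistent with the training examples, so the two boundaries describe the whole version space.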