Data Mining

Coursework

Suppose that the following table of instances (cases) were recorded for an insurance company's promotions for its life assurance product. The attributes are self-explanatory. The values of the two product promotion attributes should be read as follows: Yes means that the individual was offered that particular promotion only on condition that s/he took out the insurance, and No means that s/he was not offered the promotion.

ID  Income Range  Gender  Age Range  Holiday Promotion  Wine Promotion  Life Insurance Take Up
1   40-50K        Male    30-40      No                 Yes             Yes
2   30-40K        Female  30-40      No                 Yes             No
3   40-50K        Male    30-40      No                 No              No
4   30-40K        Male    30-40      Yes                Yes             Yes
5   50-60K        Female  20-30      No                 No              No
6   20-30K        Female  40-50      No                 No              No
7   30-40K        Male    20-30      Yes                No              No
8   20-30K        Male    20-30      No                 Yes             Yes
9   30-40K        Male    30-40      No                 Yes             Yes
10  30-40K        Female  30-40      No                 No              Yes
11  40-50K        Female  30-40      No                 No              No
12  20-30K        Male    20-30      No                 Yes             Yes
13  50-60K        Female  20-30      No                 No              No
14  40-50K        Male    40-50      No                 Yes             No
15  20-30K        Female  20-30      Yes                Yes             No
16  40-50K        Female  30-40      No                 No              No
17  50-60K        Male    40-50      Yes                Yes             Yes
18  20-30K        Female  30-40      No                 Yes             No
19  20-30K        Male    40-50      Yes                Yes             Yes
20  30-40K        Female  20-30      Yes                Yes             No

Questions:

1) Use the ID3 decision tree induction method available in the Weka package (with the default settings) to derive a classifier (a decision tree) from this data set. The class attribute is Life Insurance Take Up.
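Weka's Id3 classifier only handles nominal attributes, so the table first has to be loaded as an ARFF file in which every attribute is declared nominal. The ID column should be left out. As a nominal attribute with 20 distinct values, it would have the highest possible information gain and would produce a useless one-level tree. The sketch below is one way to generate the file. It makes three assumptions of its own: the relation name, the space-free attribute names (`IncomeRange` and so on), and the hypothetical helper `to_arff`. The class attribute is written last, which is where the Weka Explorer looks for it by default.

```python
# Hypothetical helper: prints the table above as a Weka ARFF file.
# Assumptions: ID column dropped, attribute names rewritten without spaces,
# every attribute declared nominal.
ROWS = """\
40-50K Male 30-40 No Yes Yes
30-40K Female 30-40 No Yes No
40-50K Male 30-40 No No No
30-40K Male 30-40 Yes Yes Yes
50-60K Female 20-30 No No No
20-30K Female 40-50 No No No
30-40K Male 20-30 Yes No No
20-30K Male 20-30 No Yes Yes
30-40K Male 30-40 No Yes Yes
30-40K Female 30-40 No No Yes
40-50K Female 30-40 No No No
20-30K Male 20-30 No Yes Yes
50-60K Female 20-30 No No No
40-50K Male 40-50 No Yes No
20-30K Female 20-30 Yes Yes No
40-50K Female 30-40 No No No
50-60K Male 40-50 Yes Yes Yes
20-30K Female 30-40 No Yes No
20-30K Male 40-50 Yes Yes Yes
30-40K Female 20-30 Yes Yes No"""

ATTRIBUTES = ["IncomeRange", "Gender", "AgeRange",
              "HolidayPromotion", "WinePromotion", "LifeInsuranceTakeUp"]


def to_arff(relation: str) -> str:
    """Return the ARFF text for the table, with the class attribute last."""
    data = [line.split() for line in ROWS.splitlines()]
    lines = [f"@relation {relation}", ""]
    for i, name in enumerate(ATTRIBUTES):
        # Declare each attribute as nominal, listing its observed values in sorted order.
        values = sorted({row[i] for row in data})
        lines.append(f"@attribute {name} {{{','.join(values)}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in data]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_arff("insurance"), end="")
```

Redirect the output to a file such as `insurance.arff` and open it in the Explorer. If Weka rejects the hyphenated values, quoting them (for example '40-50K') is a safe fallback.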

2) What should the class value be for the following unseen case, based on the derived tree? Justify your answer.

Income Range  Gender  Age Range  Holiday Promotion  Wine Promotion  Life Insurance Take Up
40-50K        Male    20-30      No                 Yes             ?
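To see why this case is awkward, it helps to reproduce the tree by hand. Below is a from-scratch sketch of textbook ID3 in Python, using information gain and no pruning. It is not Weka's code, so tie-breaking and value ordering may differ from Weka's output, which should remain the reference. The sketch makes two assumptions. First, the ID column is excluded. Second, every attribute value seen anywhere in the training data gets a branch. A branch that no training instance reaches therefore becomes a leaf with no class (`None`), which corresponds to the null class value in the algorithm template. The helper `trace` follows the unseen case down the tree and reports the path it takes and the leaf it ends at.

```python
import math
from collections import Counter

# Training rows from the table (ID column dropped); the last field is the class.
ROWS = """\
40-50K Male 30-40 No Yes Yes
30-40K Female 30-40 No Yes No
40-50K Male 30-40 No No No
30-40K Male 30-40 Yes Yes Yes
50-60K Female 20-30 No No No
20-30K Female 40-50 No No No
30-40K Male 20-30 Yes No No
20-30K Male 20-30 No Yes Yes
30-40K Male 30-40 No Yes Yes
30-40K Female 30-40 No No Yes
40-50K Female 30-40 No No No
20-30K Male 20-30 No Yes Yes
50-60K Female 20-30 No No No
40-50K Male 40-50 No Yes No
20-30K Female 20-30 Yes Yes No
40-50K Female 30-40 No No No
50-60K Male 40-50 Yes Yes Yes
20-30K Female 30-40 No Yes No
20-30K Male 40-50 Yes Yes Yes
30-40K Female 20-30 Yes Yes No"""
ATTRS = ["Income Range", "Gender", "Age Range", "Holiday Promotion", "Wine Promotion"]
DATA = [line.split() for line in ROWS.splitlines()]
# Every value of each attribute seen in the data; each one gets a branch at a split.
DOMAINS = [sorted({row[a] for row in DATA}) for a in range(len(ATTRS))]


def entropy(rows):
    n = len(rows)
    counts = Counter(row[-1] for row in rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def info_gain(rows, a):
    """Entropy reduction from splitting `rows` on attribute index `a`."""
    n = len(rows)
    remainder = 0.0
    for value in {row[a] for row in rows}:
        subset = [row for row in rows if row[a] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(rows) - remainder


def id3(rows, attrs):
    """Return a class label, None (null leaf), or (attribute index, {value: subtree})."""
    if not rows:
        return None                                   # no training case reaches this branch
    labels = Counter(row[-1] for row in rows)
    if len(labels) == 1 or not attrs:
        return labels.most_common(1)[0][0]            # pure node, or nothing left to split on
    best = max(attrs, key=lambda a: info_gain(rows, a))
    rest = [a for a in attrs if a != best]
    return best, {v: id3([row for row in rows if row[best] == v], rest)
                  for v in DOMAINS[best]}


def trace(tree, case):
    """Follow `case` (a list of attribute values) down the tree; return (path, leaf)."""
    path = []
    while isinstance(tree, tuple):
        a, children = tree
        path.append(f"{ATTRS[a]} = {case[a]}")
        tree = children[case[a]]
    return path, tree


tree = id3(DATA, list(range(len(ATTRS))))
unseen = ["40-50K", "Male", "20-30", "No", "Yes"]
path, leaf = trace(tree, unseen)
print(" -> ".join(path), "->", leaf)
```

The same `trace` call on any training row should return that row's own class, because the table contains no two identical rows with different classes.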

How would you deal with such cases in general? Outline your solution algorithmically using the structure given below:

algorithm DT-based Classification
    # traverse the tree to reach a leaf node N

    if N's class value is null then
        :
        : write your pseudocode to implement your solution here
        :
    else
        return the class value
    end
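One common way to fill in the null branch, also used by C4.5, is to fall back on the majority class of the training instances at the parent node. The parent can itself be tied, so this sketch keeps backing up towards the root until it finds an ancestor with a clear majority. It assumes each node stores the class counts of the training instances that reached it. `Node`, `unique_majority` and `classify` are hypothetical names, not part of Weka. The hand-built fragment at the end assumes that the derived tree routes the case through Gender, Wine Promotion, Income Range and then Age Range; verify this against your own tree. Its class counts are taken from the table.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Hypothetical tree node: an internal node tests `attribute`; a leaf has attribute None."""
    attribute: Optional[str] = None
    children: Dict[str, "Node"] = field(default_factory=dict)  # attribute value -> subtree
    label: Optional[str] = None       # class at a leaf; None marks a null (empty) leaf
    counts: Counter = field(default_factory=Counter)  # training class counts at this node


def unique_majority(counts: Counter) -> Optional[str]:
    """Most frequent class, or None if there is a tie or no data."""
    top = counts.most_common(2)
    if not top or (len(top) == 2 and top[0][1] == top[1][1]):
        return None
    return top[0][0]


def classify(root: Node, case: Dict[str, str]) -> Optional[str]:
    path: List[Node] = [root]
    # Traverse the tree to reach a leaf node N, remembering the nodes passed.
    while path[-1].attribute is not None:
        node = path[-1]
        path.append(node.children[case[node.attribute]])
    leaf = path[-1]
    if leaf.label is not None:
        return leaf.label
    # Null leaf: back up to the nearest ancestor with a clear majority class.
    for ancestor in reversed(path[:-1]):
        label = unique_majority(ancestor.counts)
        if label is not None:
            return label
    return None  # every ancestor is tied: leave the case unclassified


# Hand-built fragment (other branches omitted); counts are class counts from the table.
age_node = Node("Age Range", {
    "20-30": Node(),                                     # null leaf: no training case
    "30-40": Node(label="Yes", counts=Counter(Yes=1)),   # ID 1
    "40-50": Node(label="No", counts=Counter(No=1)),     # ID 14
}, counts=Counter(Yes=1, No=1))
income_node = Node("Income Range", {"40-50K": age_node}, counts=Counter(Yes=7, No=1))
wine_node = Node("Wine Promotion", {"Yes": income_node}, counts=Counter(Yes=7, No=3))
root = Node("Gender", {"Male": wine_node}, counts=Counter(Yes=8, No=12))

case = {"Income Range": "40-50K", "Gender": "Male", "Age Range": "20-30",
        "Holiday Promotion": "No", "Wine Promotion": "Yes"}
print(classify(root, case))
```

In this fragment the immediate parent (Income Range = 40-50K) is tied at one Yes and one No, so the fallback moves one level further up. Other reasonable choices are the overall majority class, or a vote weighted by the class distributions of the sibling branches. Whichever rule is used, it is worth reporting that the prediction came from a fallback rather than from a real leaf.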

3) A decision tree derived from data can be used not only to predict class values for unseen cases but also to summarize the data for analysis. Based on the tree derived in 1), comment on whether the company has conducted its promotions effectively.
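Reading the tree for business insight amounts to asking how take-up differs between customers who were and were not offered each promotion, possibly within subgroups such as gender. A small cross-tabulation makes these numbers explicit and is a useful check on what the tree seems to say. The sketch copies only the Gender, promotion and take-up columns from the table. `take_up_rates` is a hypothetical helper that returns, for each group, the number of customers who took up the insurance and the group size.

```python
from collections import defaultdict

# Gender, Holiday Promotion, Wine Promotion, Life Insurance Take Up for IDs 1-20
# (copied from the table; income and age are not needed here).
ROWS = """\
Male No Yes Yes
Female No Yes No
Male No No No
Male Yes Yes Yes
Female No No No
Female No No No
Male Yes No No
Male No Yes Yes
Male No Yes Yes
Female No No Yes
Female No No No
Male No Yes Yes
Female No No No
Male No Yes No
Female Yes Yes No
Female No No No
Male Yes Yes Yes
Female No Yes No
Male Yes Yes Yes
Female Yes Yes No"""
DATA = [line.split() for line in ROWS.splitlines()]
GENDER, HOLIDAY, WINE, TAKE_UP = range(4)


def take_up_rates(promotion, by_gender=False):
    """Map each group to (customers who took up insurance, group size).

    Groups are keyed by whether the promotion was offered, optionally split by gender.
    """
    totals = defaultdict(lambda: [0, 0])
    for row in DATA:
        key = (row[GENDER], row[promotion]) if by_gender else row[promotion]
        totals[key][0] += row[TAKE_UP] == "Yes"
        totals[key][1] += 1
    return {key: tuple(v) for key, v in totals.items()}


for name, column in [("Holiday", HOLIDAY), ("Wine", WINE)]:
    for (gender, offered), (took_up, size) in sorted(take_up_rates(column, True).items()):
        print(f"{name} promotion offered={offered:<3} {gender:<6}: {took_up}/{size} took up")
```

With only 20 cases, these rates are rough indications. They also say nothing about how the company chose whom to offer each promotion to.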

In Weka's default setting, the test options include "Cross-Validation Folds 10". Briefly explain how cross-validation tests a model derived from training data, and why we use it for testing.
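In k-fold cross-validation the data are split into k roughly equal folds. The model is trained k times, each time on k-1 folds, and tested on the one fold left out. Every instance is therefore tested exactly once by a model that never saw it, and accuracy is pooled over all k test folds. For a nominal class, Weka shuffles the data with a fixed seed and stratifies the folds, keeping the class proportions roughly equal in each fold. The Python sketch below also stratifies, but its shuffling will not reproduce Weka's folds. `fit` and `predict` are hypothetical placeholders for any learner. The demo uses a majority-class baseline (the idea behind Weka's ZeroR) on the 20 class labels from the table.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, k, seed=1):
    """Split indices 0..n-1 into k folds whose class proportions are roughly equal."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)              # shuffle within each class
        for i in indices:                 # then deal the class out round-robin
            folds[position % k].append(i)
            position += 1
    return folds


def cross_validate(X, y, k, fit, predict):
    """Pooled accuracy over k folds: each fold is tested once by a model trained on the rest."""
    correct = 0
    for test in stratified_folds(y, k):
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(model, X[i]) == y[i] for i in test)
    return correct / len(y)


def majority_fit(xs, ys):
    return Counter(ys).most_common(1)[0][0]   # the "model" is just the majority class


def majority_predict(model, x):
    return model


# Life Insurance Take Up for IDs 1-20; the baseline ignores the attributes.
LABELS = ["Yes", "No", "No", "Yes", "No", "No", "No", "Yes", "Yes", "Yes",
          "No", "Yes", "No", "No", "No", "No", "Yes", "No", "Yes", "No"]
FEATURES = [None] * len(LABELS)

for k in range(2, 11):
    print(k, cross_validate(FEATURES, LABELS, k, majority_fit, majority_predict))
```

With these stratified folds, every training set contains more No than Yes cases. The baseline therefore predicts No for everything and scores 12/20 = 60% for every k from 2 to 10. A useful classifier should beat that figure.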

Now perform the following tests. Vary the number of folds from 2 to 10, run ID3 and record the classification accuracy for each setting. Then change the test option to "Use training set", run ID3 again and record the classification accuracy. Present the results as a table or a bar chart. Comment on your results: which method (cross-validation or using the training set) is better for testing your derived tree, and why?
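The experiment loop can be sketched as follows. To stay short and self-contained, it uses a deliberately simple stand-in learner instead of ID3, with hypothetical `fit` and `predict` functions. The learner memorizes the majority label of every attribute vector it has seen and falls back to the overall majority class otherwise. Like an unpruned ID3 tree on this table, it makes no errors on its own training data, because no two identical rows disagree on the class. Only the held-out folds expose how well it generalizes. The folds here use a simple unshuffled stratification, not Weka's, so the numbers will not match Weka's exactly.

```python
from collections import Counter, defaultdict

# Training rows from the table (ID column dropped); the last field is the class.
ROWS = """\
40-50K Male 30-40 No Yes Yes
30-40K Female 30-40 No Yes No
40-50K Male 30-40 No No No
30-40K Male 30-40 Yes Yes Yes
50-60K Female 20-30 No No No
20-30K Female 40-50 No No No
30-40K Male 20-30 Yes No No
20-30K Male 20-30 No Yes Yes
30-40K Male 30-40 No Yes Yes
30-40K Female 30-40 No No Yes
40-50K Female 30-40 No No No
20-30K Male 20-30 No Yes Yes
50-60K Female 20-30 No No No
40-50K Male 40-50 No Yes No
20-30K Female 20-30 Yes Yes No
40-50K Female 30-40 No No No
50-60K Male 40-50 Yes Yes Yes
20-30K Female 30-40 No Yes No
20-30K Male 40-50 Yes Yes Yes
30-40K Female 20-30 Yes Yes No"""
DATA = [line.split() for line in ROWS.splitlines()]
X = [tuple(row[:-1]) for row in DATA]
Y = [row[-1] for row in DATA]


def fit(xs, ys):
    """Stand-in learner, NOT ID3: memorize the majority label of each attribute vector."""
    seen = defaultdict(Counter)
    for x, y in zip(xs, ys):
        seen[x][y] += 1
    table = {x: c.most_common(1)[0][0] for x, c in seen.items()}
    return table, Counter(ys).most_common(1)[0][0]


def predict(model, x):
    table, default = model
    return table.get(x, default)          # unseen vector: fall back to the majority class


def folds(labels, k):
    """Stratified folds without shuffling: deal each class out round-robin."""
    out, position = [[] for _ in range(k)], 0
    for cls in sorted(set(labels)):
        for i, label in enumerate(labels):
            if label == cls:
                out[position % k].append(i)
                position += 1
    return out


def n_correct(train, test):
    model = fit([X[i] for i in train], [Y[i] for i in train])
    return sum(predict(model, X[i]) == Y[i] for i in test)


def cv_accuracy(k):
    correct = 0
    for test in folds(Y, k):
        held_out = set(test)
        correct += n_correct([i for i in range(len(Y)) if i not in held_out], test)
    return correct / len(Y)


everything = list(range(len(Y)))
train_accuracy = n_correct(everything, everything) / len(Y)
print(f"Use training set: {train_accuracy:.0%}")
for k in range(2, 11):
    print(f"{k:2d}-fold CV:       {cv_accuracy(k):.0%}")
```

On this table the stand-in scores 100% on its own training set, while leave-one-out (`cv_accuracy(20)`) gives 70%. A held-out case is predicted correctly only when an identical row remains in the training folds, or when the majority class happens to be right.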

Use the JRip rule induction method available in the Weka package (with the default settings) to derive a classifier (a set of classification rules) from this data set.

What observations do you have on the two classifiers you have obtained in terms of using them for business analysis (as in 3) and for classification of an unseen case (as in 2)?
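JRip, Weka's implementation of the RIPPER algorithm, outputs an ordered list of rules followed by a default rule. Each rule is typically annotated with how many training instances it covers and how many of those it gets wrong. The sketch below shows how such a list classifies a case and how coverage and errors are counted. The single rule in `RULES` is hypothetical. It was picked because it is easy to check by hand against the table and is not claimed to be JRip's output, so substitute the rules Weka actually prints. `matches`, `classify` and `coverage` are illustrative helper names.

```python
# Training rows from the table (ID column dropped); the last field is the class.
ATTRS = ["Income Range", "Gender", "Age Range", "Holiday Promotion", "Wine Promotion"]
ROWS = """\
40-50K Male 30-40 No Yes Yes
30-40K Female 30-40 No Yes No
40-50K Male 30-40 No No No
30-40K Male 30-40 Yes Yes Yes
50-60K Female 20-30 No No No
20-30K Female 40-50 No No No
30-40K Male 20-30 Yes No No
20-30K Male 20-30 No Yes Yes
30-40K Male 30-40 No Yes Yes
30-40K Female 30-40 No No Yes
40-50K Female 30-40 No No No
20-30K Male 20-30 No Yes Yes
50-60K Female 20-30 No No No
40-50K Male 40-50 No Yes No
20-30K Female 20-30 Yes Yes No
40-50K Female 30-40 No No No
50-60K Male 40-50 Yes Yes Yes
20-30K Female 30-40 No Yes No
20-30K Male 40-50 Yes Yes Yes
30-40K Female 20-30 Yes Yes No"""
DATA = [dict(zip(ATTRS + ["Take Up"], line.split())) for line in ROWS.splitlines()]

# HYPOTHETICAL ordered rule list in the style of a RIPPER/JRip model: rules for
# one class, then a default rule for everything else. Not JRip's actual output.
RULES = [
    ({"Gender": "Male", "Wine Promotion": "Yes"}, "Yes"),
]
DEFAULT = "No"


def matches(conditions, case):
    return all(case[attr] == value for attr, value in conditions.items())


def classify(case):
    """Return the class of the first rule that fires, else the default class."""
    for conditions, label in RULES:
        if matches(conditions, case):
            return label
    return DEFAULT


def coverage(conditions, label):
    """(training cases the rule covers, how many of those it misclassifies)."""
    covered = [row for row in DATA if matches(conditions, row)]
    return len(covered), sum(row["Take Up"] != label for row in covered)


for conditions, label in RULES:
    print(conditions, "=>", label, coverage(conditions, label))
unseen = dict(zip(ATTRS, ["40-50K", "Male", "20-30", "No", "Yes"]))
print("unseen case ->", classify(unseen))
accuracy = sum(classify(row) == row["Take Up"] for row in DATA) / len(DATA)
print(f"training accuracy: {accuracy:.0%}")
```

Because the default rule catches every case that no other rule covers, a rule list of this form never leaves an unseen case unclassified. On the table, the hypothetical rule covers 8 cases and misclassifies one of them (ID 14). The whole list gets 18 of the 20 training cases right.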
