GWU MSBA Analyzing Data With SAS Visual Statistics Student Homework Exercises We

The document provides information about Exercise 1 using the PVA dataset to build a decision tree model to classify customers who donated. It describes the dataset variables and metrics to examine from the decision tree model including the top split predictor, node statistics, variable importance, and lift chart interpretation.

Uploaded by

dinesh kumar
Copyright © All Rights Reserved

Week 6 – Homework Exercise 1

In Exercise 1, you work with a data set named PVA. It contains data that represents charitable donations made to
an American veterans’ association. The data represent the results of a mail campaign to solicit donations. The data
set contains the following information:
• a flag to indicate respondents to the appeal and the dollar amount of their donations (Target Gift Flag and
Target Gift Amount)
• respondents' PVA promotion and giving history
• demographic data of the respondents

PVA Metadata Table

Name              Measurement Level  Description
DemAge            Interval           Age
DemCluster        Nominal            Demographic Cluster
DemGender         Nominal            Gender
DemHomeOwner      Binary             Home Owner
DemMedHomeValue   Interval           Median Home Value Region
DemMedIncome      Interval           Median Income Region
DemPctVeterans    Interval           Percent Veterans Region
GiftAvg36         Interval           Gift Amount Average 36 Months
GiftAvgAll        Interval           Gift Amount Average All Months
GiftAvgCard36     Interval           Gift Amount Average Card 36 Months
GiftAvgLast       Interval           Gift Amount Last
GiftCnt36         Interval           Gift Count 36 Months
GiftCntAll        Interval           Gift Count All Months
GiftCntCard36     Interval           Gift Count Card 36 Months
GiftCntCardAll    Interval           Gift Count Card All Months
GiftTimeFirst     Interval           Time Since First Gift
GiftTimeLast      Interval           Time Since Last Gift
ID                Nominal            Control Number
PromCnt12         Interval           Promotion Count 12 Months
PromCnt36         Interval           Promotion Count 36 Months
PromCntAll        Interval           Promotion Count All Months
PromCntCard12     Interval           Promotion Count Card 12 Months
PromCntCard36     Interval           Promotion Count Card 36 Months
PromCntCardAll    Interval           Promotion Count Card All Months
StatusCat96NK     Nominal            Status Category 96NK
StatusCatStarAll  Binary             Status Category Star All Months
TARGET_B          Binary             Target Gift Flag
TARGET_D          Interval           Target Gift Amount

In this exercise, you continue to use the PVA data set to build a decision tree to classify those customers who
donated.
1. Building a Decision Tree in SAS Visual Statistics
a. Start Visual Analytics or start a new report. Then select and open the PVA data source.
b. Add a decision tree to the canvas.
c. If you did not already do so, in the Measure column, edit Target Gift Flag and select Category to create a
binary target variable for donations.
d. Add Target Gift Flag as the response.
e. Under Predictors, click Add. In the Add Data Items window, select all 28 predictor variables except for these
four:
• Control Number
• Demographic Cluster
• Target Gift Amount
• Target Gift Amount with Zero
(You add 24 columns.)
f. Review the decision tree.
• How many customers made donations?
53,273 customers made donations.
• What proportion of individuals does that represent in the target node?
They represent 50% of the individuals in the target node.
g. Zoom in toward the root node.
• On what predictor does the top split occur?
The top split occurs on Gift Count 36 Months.
• What is the split point that determines to which branch a customer belongs?
The split point is 3.2.
• In which branch does the majority of customers fall at this split point?
The majority of customers fall into the branch with Gift Count 36 Months less than 3.2 (Node 2), in which most customers did not donate.
• How many customers were less than this value and belong to Node 2?
68,607 customers have a value less than 3.2 and belong to Node 2.
• What proportion of the customers in Node 2 made donations?
45.17% of the customers in Node 2 made donations.
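The step 1 answers can be cross-checked with a little arithmetic. This sketch assumes that the 53,273 donors are 50% of all customers at the root node and that every customer outside Node 2 falls into the other branch; calling that branch "Node 3" is an assumption made only for illustration.

```python
# Cross-check of the root split reported in step 1 of Exercise 1.
# Assumption: the 53,273 donors are 50% of all customers at the root node.
donors_total = 53_273
donor_share_root = 0.50

node2_customers = 68_607      # customers with Gift Count 36 Months < 3.2
node2_donor_share = 0.4517    # 45.17% of Node 2 made donations

total_customers = round(donors_total / donor_share_root)
# Hypothetical label: the sibling branch (Gift Count 36 Months >= 3.2) is called Node 3 here.
node3_customers = total_customers - node2_customers

node2_donors = round(node2_customers * node2_donor_share)
node3_donors = donors_total - node2_donors

print(f"Total customers:           {total_customers:,}")
print(f"Node 2 share of customers: {node2_customers / total_customers:.1%}")
print(f"Node 3 customers:          {node3_customers:,}")
print(f"Node 3 donor share:        {node3_donors / node3_customers:.1%}")
```

Under these assumptions Node 2 holds about 64% of all customers, which makes it the majority branch, while the other branch has the higher donor rate (about 59%).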
2. Examining Additional Decision Tree Results
a. Open the summary table to examine the node statistics.
1) Examine the last column to see whether there are any 100% donor nodes. If so, which node or nodes?
Yes. Nodes 35 and 24 are 100% donor nodes.
2) Click the Node Rules tab. Is Node 27 a class node or a leaf node?
Node 27 is a leaf node.
3) Use the node rules to describe the customers in Node 27 in simple business language?
Customers in Node 27 made more than 3.2 gifts in the past 36 months and received more than 10.55
promotions; their Time Since First Gift is greater than 64 and their Time Since Last Gift is greater
than 17.8. In simple business terms, they are long-time, frequent donors who have received many
promotions but have not given recently.
4) Examine the Variable Importance table to see whether home ownership appears to be
an important factor when you classify customers who make a donation. Is it important?
Home ownership is not an important factor for classifying customers who donate: its importance
value is 0 and it has no standard deviation.
b. Close the details table.
c. Maximize the Assessment window and change the variable importance plot to leaf statistics.
1) Other than the two 100% nodes, which node has the next highest percentage of donors?
Node 32 has the next highest percentage of donors.
2) Which node has the most customers?
Node 14 has the most customers.
3) Select the node with the most customers and change the chart back to percent
to determine the proportion of donors.
The proportion of donors in Node 14 is 40.28%.
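4) Examine the lift chart. What can you determine about the top 10% (percentile)
of the data? Explain.
According to the lift chart, the model's lift at the 10th percentile is about 1.46: the top 10% of
customers ranked by the model contain about 1.46 times as many donors as a random sample of the
same size (about 46% more). The best possible lift at this depth is 2, because donors make up 50%
of the data, so even a perfect model can at most double the donor rate.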
d. Save your report.
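The lift values above are read from the SAS Visual Statistics chart. The sketch below shows one common way cumulative lift at a given depth is computed: rank cases by model score, then divide the event rate among the top-ranked cases by the overall event rate. The function names and the ten-customer toy data are hypothetical, and SAS may bin percentiles differently. With a 50% event rate, a perfect model cannot exceed a lift of 1 / 0.5 = 2.

```python
def cumulative_lift(scores, actuals, depth):
    """Lift of the top `depth` fraction of cases ranked by model score.

    lift = (event rate among the top-ranked cases) / (overall event rate)
    """
    if not 0 < depth <= 1:
        raise ValueError("depth must be in (0, 1]")
    overall_rate = sum(actuals) / len(actuals)
    if overall_rate == 0:
        raise ValueError("no events in the data")
    ranked = sorted(zip(scores, actuals), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(len(ranked) * depth))
    top_rate = sum(actual for _, actual in ranked[:n_top]) / n_top
    return top_rate / overall_rate


def best_possible_lift(actuals, depth):
    """Lift of a perfect model, which ranks every event ahead of every non-event."""
    overall_rate = sum(actuals) / len(actuals)
    n_top = max(1, round(len(actuals) * depth))
    return (min(sum(actuals), n_top) / n_top) / overall_rate


# Hypothetical scored customers: 1 = donated, 0 = did not donate (50% donors).
scores = [0.91, 0.85, 0.77, 0.64, 0.58, 0.49, 0.40, 0.33, 0.21, 0.12]
donated = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]

for depth in (0.1, 0.2, 0.4, 1.0):
    model = cumulative_lift(scores, donated, depth)
    best = best_possible_lift(donated, depth)
    print(f"top {depth:.0%}: model lift {model:.2f}, best possible {best:.2f}")
```

At a depth of 100% every model has a lift of 1, because the "top" group is the whole data set.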

Week 6 – Homework Exercise 2

1. Which property setting can you set when modeling decision trees in SAS Visual Statistics?
a. Seed
b. Logworth criterion
c. Variable standardization
d. Reuse predictors
Answer: D Reuse predictors

(Simply copy & paste your answer from the options provided)

2. Which statement is true when creating decision trees in SAS Visual Statistics?
a. The predictor variables are restricted to measures only
b. The response variable can be a category or a measure
c. The predictor variables may include interaction terms
d. The response variable is limited to category variables only
Answer: B The response variable can be a category or a measure
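Trees can be grown for either kind of response (for example Target Gift Flag versus Target Gift Amount in the PVA data); what changes is how each leaf turns its training cases into a prediction. This sketch assumes the usual convention, the most frequent level for a category response and the mean for a measure response; the function name and its arguments are hypothetical.

```python
from collections import Counter
from statistics import mean


def leaf_prediction(response_values, response_type):
    """Predicted value for one leaf, given the training responses that fell into it.

    response_type "category": classification tree, predict the most frequent level.
    response_type "measure":  regression tree, predict the average value.
    """
    if response_type == "category":
        level, _count = Counter(response_values).most_common(1)[0]
        return level
    if response_type == "measure":
        return mean(response_values)
    raise ValueError(f"unknown response type: {response_type!r}")


print(leaf_prediction([1, 0, 1, 1], "category"))
print(leaf_prediction([10.0, 20.0, 30.0], "measure"))
```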

3. To select useful predictors, what type of algorithm do decision trees use?


a. K-means
b. Split-search
c. Principal components
d. Bootstrapping
Answer: B Split-search
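Split search can be pictured as an exhaustive scan: for each candidate threshold on a predictor, measure how pure the two resulting child nodes are and keep the best threshold. This sketch uses Gini impurity on a single interval predictor with a 0/1 response and tries midpoints between adjacent distinct values. SAS Visual Statistics bins interval predictors and has its own split criteria, so the midpoint thresholds and the Gini choice are simplifying assumptions, and the data are made up.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 1 - p^2 - (1 - p)^2."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(values, labels):
    """Exhaustive split search on one interval predictor.

    Tries a threshold midway between each pair of adjacent distinct values and
    returns (threshold, weighted_child_impurity) for the lowest impurity, or
    None if the predictor has fewer than two distinct values.
    """
    distinct = sorted(set(values))
    n = len(values)
    best = None
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x < threshold]
        right = [y for x, y in zip(values, labels) if x >= threshold]
        impurity = len(left) / n * gini(left) + len(right) / n * gini(right)
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best


# Hypothetical gift counts and donation flags.
gift_counts = [1, 2, 2, 3, 4, 5, 6, 7]
donated = [0, 0, 0, 1, 1, 1, 1, 1]
print(best_split(gift_counts, donated))
```

A full tree repeats this search over every predictor at every node and splits on the winner.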

4. How do classification decision trees identify a good predictor split?


a. Entropy
b. Method of least squares
c. Method of maximum likelihood
d. F-test
Answer: A Entropy
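Entropy measures how mixed a node is (1 bit for a 50/50 node, 0 for a pure node), and a split is scored by how much it lowers the size-weighted entropy of the children (information gain). The sketch below is an illustration, not SAS internal code, applied to the root split from Exercise 1. The figures for the sibling branch (labeled node3 here) are derived under the assumption that donors are 50% of all customers; they are not read from the report.

```python
from math import log2


def entropy(p):
    """Binary entropy in bits for an event proportion p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * log2(p) + (1 - p) * log2(1 - p))


def information_gain(parent_p, children):
    """Entropy reduction from a split.

    children: list of (n_customers, event_proportion) tuples, one per branch.
    """
    n_total = sum(n for n, _ in children)
    weighted = sum(n / n_total * entropy(p) for n, p in children)
    return entropy(parent_p) - weighted


# Root split from Exercise 1 (Gift Count 36 Months at 3.2).
# Assumption: 106,546 customers in total, so the other branch holds 37,939
# customers, 22,283 of whom donated (derived, not read from the report).
node2 = (68_607, 0.4517)
node3 = (37_939, 22_283 / 37_939)
gain = information_gain(0.50, [node2, node3])
print(f"Information gain at the root split: {gain:.4f} bits")
```

The gain is small (roughly 0.01 bits), yet it can still be the best available first split.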

5. How do decision trees formulate predictions?


a. Log-odds
b. If-Then-Else conditional rules
c. Sum of squared errors
d. Convergence criterion
Answer: B If-Then-Else conditional rules
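A decision tree routes each case from the root to a leaf through a chain of if-then-else tests. As an example, the Node 27 path from Exercise 1 can be written that way. This is a hand-written sketch, not SAS score code: `route`, the dictionary keys, and the `promotion_count` field are hypothetical (the exercise does not say which PVA promotion-count column appears in the rule), the split order follows the Node 27 answer, and branches the exercise does not describe are left unresolved.

```python
def route(customer):
    """Follow the if-then-else rules quoted for Node 27 and return where the customer ends up."""
    if customer["GiftCnt36"] < 3.2:
        return "Node 2 branch (further splits not shown)"
    if customer["promotion_count"] <= 10.55:     # hypothetical column name
        return "other branch (not described in the exercise)"
    if customer["GiftTimeFirst"] <= 64:
        return "other branch (not described in the exercise)"
    if customer["GiftTimeLast"] <= 17.8:
        return "other branch (not described in the exercise)"
    return "Node 27"


print(route({"GiftCnt36": 5, "promotion_count": 12, "GiftTimeFirst": 70, "GiftTimeLast": 20}))
```

In a fitted tree every leaf stores a prediction (such as its majority level or its donor proportion), so reaching a leaf is the prediction.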

6. In decision trees analysis, why is pruning useful?


a. Reduces model overfitting
b. Increases the complexity of the final classifier
c. Reduces predictive accuracy
d. Eliminates the need to perform cross validation
Answer: A Reduces model overfitting
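One widely used pruning rule is CART's cost-complexity criterion, which adds a penalty alpha per leaf to the training error, R_alpha(T) = R(T) + alpha * |leaves(T)|, and keeps the subtree with the lowest total. Whether SAS Visual Statistics uses this exact criterion is not stated in the exercise, so treat this as an illustration; the subtree names and error rates are made up.

```python
def cost_complexity(error_rate, n_leaves, alpha):
    """CART-style cost-complexity: R_alpha(T) = R(T) + alpha * number_of_leaves."""
    return error_rate + alpha * n_leaves


def choose_subtree(candidates, alpha):
    """Pick the candidate subtree with the lowest cost-complexity.

    candidates: dict mapping a subtree name to (training_error_rate, n_leaves).
    """
    return min(candidates, key=lambda name: cost_complexity(*candidates[name], alpha))


# Hypothetical pruning sequence: larger trees fit the training data better.
subtrees = {
    "full tree": (0.30, 40),
    "pruned to 12 leaves": (0.33, 12),
    "pruned to 4 leaves": (0.38, 4),
}
for alpha in (0.0, 0.002, 0.02):
    print(alpha, choose_subtree(subtrees, alpha))
```

As alpha grows, smaller subtrees win, which is how pruning curbs overfitting; in practice alpha is usually chosen on validation data.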

7. You have created a supervised segmentation using a decision tree. You would like to perform further analysis
by building a group-by predictive model using another algorithm (for example, logistic regression) for the
segments derived from your decision tree. How would you segment your data for this scenario?
a. Use the default number of bins
b. Turn on the rapid growth setting
c. Export score code
d. Derive a Leaf ID variable
Answer: D Derive a Leaf ID variable
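A derived Leaf ID assigns every record to the leaf it lands in, so the data can then be split by that ID and a separate model fit per segment. In this sketch, `leaf_id` is a hypothetical one-split stand-in for the real tree (it uses only the root split from Exercise 1, and the ID 3 for the second branch is an assumption), and the customer records are made up.

```python
from collections import defaultdict


def leaf_id(customer):
    """Hypothetical leaf assignment using only the root split from Exercise 1."""
    return 2 if customer["GiftCnt36"] < 3.2 else 3


def segment_by_leaf(customers):
    """Group customers by derived Leaf ID so each segment can get its own model."""
    segments = defaultdict(list)
    for customer in customers:
        segments[leaf_id(customer)].append(customer)
    return dict(segments)


customers = [
    {"GiftCnt36": 1, "TARGET_B": 0},
    {"GiftCnt36": 5, "TARGET_B": 1},
    {"GiftCnt36": 2, "TARGET_B": 1},
    {"GiftCnt36": 7, "TARGET_B": 0},
]
for leaf, members in sorted(segment_by_leaf(customers).items()):
    # A per-segment model (for example, logistic regression) would be fit on `members` here.
    donor_rate = sum(c["TARGET_B"] for c in members) / len(members)
    print(f"leaf {leaf}: {len(members)} customers, donor rate {donor_rate:.0%}")
```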

8. Refer to the exhibit below:

Using the Lift chart above to evaluate a decision tree model where the event level is Purchase, what is the
expected performance of the model for the best 20% of cases predicted?
a. About 4 times better than a random sample of the same size
b. About 1.5 times better than a random sample of the same size
c. About 1.25 times better than a random sample of the same size
d. About 1.1 times better than a random sample of the same size
Answer: B About 1.5 times better than a random sample of the same size

9. Refer to the exhibit below:

What is the correct interpretation of the decision tree nodes that are colored green (highlighted with the red
boxes)?
a. The probability that an observation is "yes" is 9.58%
b. The probability that an observation is "no" is 9.58%
c. The proportion of observations that are "no" is larger than the observations that are "yes"
d. The proportion of observations that are "yes" is larger than the observations that are "no"
Answer: D The proportion of observations that are "yes" is larger than the observations that are "no"

10. Refer to the exhibit below:

For this decision tree model with a binary response variable, which outcome highlights the model’s largest
number of predictive errors?
a. True positives
b. True negatives
c. False positives
d. False negatives
Answer: C False positives
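The exhibit for question 10 is not reproduced here, but the comparison it asks for is easy to compute: a false positive is a case predicted as the event that is actually a non-event, and a false negative is the reverse. The function names and the labels below are hypothetical.

```python
from collections import Counter


def confusion_counts(actual, predicted, event=1):
    """Count TP, TN, FP and FN for a binary response, treating `event` as the positive level."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == event:
            counts["TP" if a == event else "FP"] += 1
        else:
            counts["FN" if a == event else "TN"] += 1
    return counts


def largest_error_type(counts):
    """Return "FP" or "FN", whichever error count is larger (ties return "FP")."""
    return "FP" if counts["FP"] >= counts["FN"] else "FN"


# Made-up labels: 1 = event (for example, "yes"), 0 = non-event.
actual = [1, 0, 0, 1, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 1, 1, 0]
counts = confusion_counts(actual, predicted)
print(dict(counts), largest_error_type(counts))
```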
