Trees and Neural Networks in Salary Prediction

The document outlines an assignment focused on tree-based methods and neural networks using the Hitters dataset. It includes tasks for data preparation, fitting regression and classification trees, applying bagging, random forests, and boosting, as well as fitting a neural network and comparing its performance with previous models. The assignment emphasizes model evaluation through test MSE and encourages exploration of regularization effects on the neural network.


EE353 Assignment 4

Date: 1-Nov-2025
Coding Assignment: Trees and Neural Networks

Trees
1. The goal of this exercise is to explore a variety of tree-based regression and classification
methods using the Hitters dataset from the ISLP package. We will predict both the
quantitative and qualitative aspects of player salaries using regression trees, bagging,
random forests, and boosting.
1. Data Preparation.
(a) Remove all observations for which the Salary variable is missing.
(b) Create a new variable HighSalary, defined as Yes if Salary exceeds the median
salary and No otherwise.
(c) Split the data into appropriate training and test sets.
(d) For regression tasks, use log(Salary) as the response. For classification tasks,
use HighSalary as the response.
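The data-preparation steps above can be sketched in Python with pandas and scikit-learn. The small synthetic DataFrame below is only a stand-in so the snippet runs on its own; in the assignment you would instead load the real data with `from ISLP import load_data; Hitters = load_data('Hitters')`:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Hitters data (assumption: the real frame
# comes from ISLP's load_data('Hitters') and has many more columns).
rng = np.random.default_rng(0)
Hitters = pd.DataFrame({
    'Hits':   rng.integers(1, 240, size=100),
    'Years':  rng.integers(1, 20, size=100),
    'Salary': np.where(rng.random(100) < 0.1, np.nan,
                       rng.uniform(70, 2500, size=100)),
})

# (a) Remove observations with missing Salary.
Hitters = Hitters.dropna(subset=['Salary'])

# (b) HighSalary: Yes if Salary exceeds the median, No otherwise.
Hitters['HighSalary'] = np.where(
    Hitters['Salary'] > Hitters['Salary'].median(), 'Yes', 'No')

# (d) log(Salary) as the quantitative response.
Hitters['logSalary'] = np.log(Hitters['Salary'])

# (c) Split into training and test sets.
train, test = train_test_split(Hitters, test_size=0.3, random_state=0)
print(train.shape, test.shape)
```

The split proportion and random seed are illustrative choices, not requirements of the assignment.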
2. Regression Tree.
(a) Fit a regression tree predicting log(Salary) using the training data.
(b) Plot the tree and interpret the main splits.
(c) Compute and report the test MSE.
(d) Use cross-validation to determine the optimal tree size, and prune the tree
accordingly.
(e) Report and compare the test MSE before and after pruning.
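A minimal sketch of parts (a), (c), (d), and (e) using scikit-learn's cost-complexity pruning, with synthetic data standing in for the Hitters training set (assumption: `X` holds the numeric predictors and `y` is log(Salary)):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the predictors and log-salary response.
rng = np.random.default_rng(1)
X = rng.uniform(0, 20, size=(200, 3))
y = np.log(50 + 30 * X[:, 0] + 5 * X[:, 1] + rng.normal(0, 10, 200))
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# (a) Fit an unpruned regression tree; (c) test MSE.
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)
mse_full = mean_squared_error(y_te, tree.predict(X_te))

# (d) Choose the pruning level alpha by 5-fold cross-validation.
alphas = tree.cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas
best_alpha, best_cv = 0.0, -np.inf
for a in alphas[:-1]:                     # last alpha collapses to a stump
    cv = cross_val_score(DecisionTreeRegressor(ccp_alpha=a, random_state=0),
                         X_tr, y_tr, cv=5,
                         scoring='neg_mean_squared_error').mean()
    if cv > best_cv:
        best_alpha, best_cv = a, cv

# (e) Refit at the chosen alpha and compare test MSE before/after pruning.
pruned = DecisionTreeRegressor(ccp_alpha=best_alpha,
                               random_state=0).fit(X_tr, y_tr)
mse_pruned = mean_squared_error(y_te, pruned.predict(X_te))
print(f"test MSE full: {mse_full:.4f}, pruned: {mse_pruned:.4f}")
```

For part (b), `sklearn.tree.plot_tree(tree)` draws the fitted tree for interpretation.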
3. Classification Tree.
(a) Fit a classification tree predicting HighSalary.
(b) Report the training and test error rates, and display a confusion matrix for the
test data.
(c) Plot the tree and discuss the key predictors.
(d) Perform cross-validation to find the optimal tree size, and prune the tree if
appropriate.
(e) Compare the classification accuracy between the pruned and unpruned trees.
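The classification-tree error rates and confusion matrix in part (b) can be computed as below; again the data are a synthetic stand-in, with `y` playing the role of HighSalary:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score

# Synthetic stand-in: binary Yes/No labels in place of HighSalary.
rng = np.random.default_rng(2)
X = rng.uniform(0, 20, size=(200, 3))
y = np.where(X[:, 0] + X[:, 1] + rng.normal(0, 3, 200) > 20, 'Yes', 'No')
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# (a) Fit the classification tree (max_depth here is an arbitrary cap).
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)

# (b) Training and test error rates plus a test-set confusion matrix.
train_err = 1 - accuracy_score(y_tr, clf.predict(X_tr))
test_err = 1 - accuracy_score(y_te, clf.predict(X_te))
cm = confusion_matrix(y_te, clf.predict(X_te), labels=['No', 'Yes'])
print(f"train err {train_err:.3f}, test err {test_err:.3f}")
print(cm)
```

Pruning in part (d) follows the same `cost_complexity_pruning_path` pattern as the regression tree, with `DecisionTreeClassifier` in place of the regressor.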
4. Bagging and Random Forests.
(a) Apply bagging to predict log(Salary) using the training data. Report the test
MSE and display variable importance values.
(b) Fit a random forest model for the same prediction problem. Experiment with
different values of m, the number of variables considered at each split (e.g.,
m = p, m = p/2, and m = √p).
(c) Report the test MSE for each case and discuss how m affects performance.
(d) Plot and interpret the variable importance measures from the random forest.
5. Boosting.
(a) Perform boosting on the training set with 1000 trees for a range of shrinkage
parameters λ (e.g., from 0.001 to 0.5).
(b) Produce plots of training and test MSE versus λ.
(c) Report the test MSE for the best-performing model.
(d) Identify the most important predictors in the boosted model.
(e) Compare the boosting test MSE to those obtained from the regression tree,
bagging, and random forest models.
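Parts (a)-(c) of the boosting exercise amount to a grid search over the shrinkage parameter λ, which scikit-learn calls `learning_rate`. A sketch on synthetic stand-in data (the `max_depth=2` interaction depth is an assumption, not specified by the assignment):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic stand-in for the predictors and log-salary response.
rng = np.random.default_rng(4)
X = rng.uniform(0, 10, size=(300, 4))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.3, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# (a) 1000 trees for a grid of shrinkage values lambda.
lambdas = [0.001, 0.01, 0.05, 0.1, 0.5]
test_mse = {}
for lam in lambdas:
    gbr = GradientBoostingRegressor(n_estimators=1000, learning_rate=lam,
                                    max_depth=2,
                                    random_state=0).fit(X_tr, y_tr)
    test_mse[lam] = mean_squared_error(y_te, gbr.predict(X_te))

# (c) Test MSE of the best-performing model.
best = min(test_mse, key=test_mse.get)
print("best lambda:", best, "test MSE:", round(test_mse[best], 4))
```

For part (b), plot `test_mse` against `lambdas`; for part (d), inspect `gbr.feature_importances_` of the refitted best model.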
6. Comparison with Linear Methods.
(a) Fit a multiple linear regression model and a ridge or lasso regression model
(from Chapters 3 and 6) predicting log(Salary).
(b) Report their test MSE values.
(c) Compare the performance of all models — regression tree, pruned tree, bagging,
random forest, boosting, and linear models — and summarize your findings in
a short paragraph.
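The linear baselines in part (a) can be fit in a few lines; cross-validated ridge and lasso are shown here with standardized inputs, again on synthetic stand-in data (the penalty grids are illustrative assumptions):

```python
import numpy as np
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the predictors and log-salary response.
rng = np.random.default_rng(5)
X = rng.normal(size=(300, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(0, 1, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    'OLS': LinearRegression(),
    'ridge': make_pipeline(StandardScaler(),
                           RidgeCV(alphas=np.logspace(-3, 3, 25))),
    'lasso': make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0)),
}

# (b) Test MSE for each linear model.
mses = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mses[name] = mean_squared_error(y_te, model.predict(X_te))
    print(name, round(mses[name], 4))
```

Collecting these values alongside the tree-based test MSEs makes the summary comparison in part (c) straightforward.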

Neural Network
2. In this exercise, you will fit a single-layer neural network to the Hitters dataset and
compare its predictive performance with the models developed in the previous exercise.

1. Data Preparation.
(a) Use the same training and test sets created in the previous exercise.
(b) Remove all missing salary observations and use log(Salary) as the quantitative
response variable.
(c) Standardize all numeric predictors so that each has mean zero and standard
deviation one.
2. Neural Network Architecture.
(a) Construct a feed-forward neural network with a single hidden layer.
(b) Let the number of hidden units h take values in {1, 3, 5, 10, 20}.
(c) Use the ReLU activation function in the hidden layer and a linear activation in
the output layer.
(d) Train the network using the training data to predict log(Salary).
(e) Use stochastic gradient descent (SGD) with an appropriate learning rate and
either early stopping or a fixed number of epochs (e.g., 50).
3. Model Selection and Evaluation.
(a) Compute the training and test MSE for each value of h.
(b) Plot the test MSE as a function of the number of hidden units.
(c) Report the test MSE corresponding to the best-performing network.
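Items 2 and 3 above can be sketched with scikit-learn's `MLPRegressor`, which gives a single-hidden-layer network with ReLU units, a linear output, and SGD training; with `solver='sgd'`, `max_iter` counts epochs. The learning rate and synthetic data are assumptions for illustration (the ISLP labs use PyTorch, but any single-layer implementation satisfies the exercise):

```python
import warnings
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error

warnings.filterwarnings('ignore')   # short runs may raise convergence warnings

# Synthetic stand-in for the predictors and log-salary response.
rng = np.random.default_rng(6)
X = rng.normal(size=(300, 5))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.2, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Standardize predictors using training statistics only.
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

# Single hidden layer of h ReLU units, SGD, fixed 50 epochs.
test_mse = {}
for h in [1, 3, 5, 10, 20]:
    net = MLPRegressor(hidden_layer_sizes=(h,), activation='relu',
                       solver='sgd', learning_rate_init=0.01,
                       max_iter=50, random_state=0).fit(X_tr, y_tr)
    test_mse[h] = mean_squared_error(y_te, net.predict(X_te))
    print(f"h={h:2d}  test MSE {test_mse[h]:.4f}")

best_h = min(test_mse, key=test_mse.get)
print("best h:", best_h)
```

Plotting `test_mse` against h completes part 3(b); 3(c) is the value at `best_h`.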
4. Comparison with Previous Models.
(a) Compare the neural network’s test MSE with the results obtained from:
• the regression tree and pruned tree,
• bagging and random forest,
• boosting,
• and linear methods (OLS, ridge, or lasso).
(b) Summarize your observations in a short paragraph. Discuss whether the neural
network provides any improvement in predictive accuracy or captures nonlinear
patterns missed by the linear models.
5. Regularization.
(a) Explore the effect of adding an ℓ2 regularization term (weight decay) to the
network.
(b) Report how regularization affects the test MSE and the stability of the model.
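In `MLPRegressor`, the ℓ2 penalty (weight decay) is the `alpha` parameter, so the regularization study reduces to a loop over penalty strengths. The grid below is an illustrative assumption:

```python
import warnings
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

warnings.filterwarnings('ignore')   # short runs may raise convergence warnings

# Synthetic stand-in for the standardized predictors and response.
rng = np.random.default_rng(7)
X = rng.normal(size=(300, 5))
y = np.tanh(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.2, 300)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# (a) Sweep the l2 weight-decay strength; (b) compare test MSE.
mses = {}
for alpha in [0.0, 0.001, 0.01, 0.1, 1.0]:
    net = MLPRegressor(hidden_layer_sizes=(10,), activation='relu',
                       alpha=alpha, solver='sgd', learning_rate_init=0.01,
                       max_iter=200, random_state=0).fit(X_tr, y_tr)
    mses[alpha] = mean_squared_error(y_te, net.predict(X_te))
    print(f"alpha={alpha:<6} test MSE {mses[alpha]:.4f}")
```

Refitting at each `alpha` with several random seeds indicates how the penalty affects stability across runs.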
