Excel Guide For CAES9821 Part 1 - Data Analysis Tool and Correlation
Please note that the instructions below are for the Windows version of Microsoft Excel. If you have the Mac
version, the commands may be different and may be in different menus and sidebars.
3. Select Analysis ToolPak (and Analysis ToolPak VBA if desired). Then click OK.
Now you are ready to start work on the practice case. Follow the steps below to clean the data set.
1. Download the file Sample Regression Case Data File and open it.
2. Before doing any analysis, we need to clean the data set as some of the colleges have missing information. To do this we will first make a copy of the original data set.
1. Click on the column A header. This will highlight everything in column A. Now hold and drag across from column A to column M.
4. It should look like this when you have selected all the columns.
7. Click on the plus icon to create a new worksheet. The default name will be Sheet2.
3. We will now clean the data set. Some of the colleges have missing information (Median SAT, Median ACT, % with need who get grants) and these
cannot be used in the regression analysis. These colleges need to be deleted from the data set. Find all the universities with missing information (NA)
and delete them from the data set.
1. Find all the universities with missing information (NA)
and delete them from the data set.
3. When you have deleted the colleges with missing data (there should be 13 of them), the data should look like this. There should be 714 colleges in all. Note that deleting colleges does not change the values in the second column (rank).
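For readers who want to cross-check the manual clean-up outside Excel, the same idea can be sketched in Python. This is only an illustration under stated assumptions: the rows and column names below are hypothetical stand-ins for the real data file, and `drop_missing` is a helper written for this example, not part of the course materials.

```python
# Hypothetical sample rows; the real data set has 13 columns (A to M).
rows = [
    {"College": "Alpha College", "Rank": "1", "Median SAT": "1400", "Median ACT": "31"},
    {"College": "Beta College", "Rank": "2", "Median SAT": "NA", "Median ACT": "29"},
    {"College": "Gamma College", "Rank": "3", "Median SAT": "1250", "Median ACT": "27"},
]


def drop_missing(records, marker="NA"):
    """Return only the records in which no field equals the missing-data marker."""
    return [r for r in records if marker not in r.values()]


cleaned = drop_missing(rows)

# Report how many rows were removed, mirroring the count check in Excel.
print(len(rows) - len(cleaned), "row(s) removed")

# The rank values are carried over unchanged, so gaps appear where rows were deleted.
print([r["Rank"] for r in cleaned])
```

As in the Excel steps, the remaining rows keep their original rank values, so the ranks are no longer consecutive after deletion.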
4. We will now delete the non-numerical data as this cannot be used in regression. It’s good practice to make another copy of the data in case we wish to
go back to the full, cleaned data set later. Therefore, we will create a new worksheet.
2. Click on the column A header. This will highlight everything in column A. Now hold and drag across from column A to column M.
3. It should look like this when you have selected all the columns.
4. Right click and select copy (copy the whole dataset).
7. Rename the new worksheet as “Cleaned Data Set Numeric.”
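The removal of non-numerical columns can also be sketched in Python as a cross-check. This is a minimal illustration, assuming every remaining cell is stored as text; the column names `College` and `State` and the helper functions are hypothetical and are not taken from the data file.

```python
def is_number(text):
    """Return True if the text can be read as a number."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def numeric_column_names(records):
    """Return the names of columns whose values are numeric in every record."""
    names = list(records[0])
    return [n for n in names if all(is_number(r[n]) for r in records)]


def keep_numeric(records):
    """Drop the non-numerical columns and convert the rest to floats."""
    names = numeric_column_names(records)
    return [{n: float(r[n]) for n in names} for r in records]


# Hypothetical cleaned rows (no missing values left at this stage).
rows = [
    {"College": "Alpha College", "State": "CA", "Rank": "1", "Median SAT": "1400"},
    {"College": "Gamma College", "State": "NY", "Rank": "3", "Median SAT": "1250"},
]

numeric_rows = keep_numeric(rows)
print(list(numeric_rows[0]))  # only the numeric column names remain
```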
We will do a Correlation Test first to see how the variables relate to each other and whether this relationship may cause any problems.
1. Click on the Data tab on the menu bar at the top of the screen, then click on Data Analysis (top right of the screen).
2. The Data Analysis window will appear. Scroll down, select Correlation and click OK.
3. In the Correlation window select Labels in First Row and make sure Grouped By Columns and New Worksheet Ply are selected.
6. Quick select columns A to H (click on the column A header and drag across to column H) and then click on the arrow icon in the Correlation box.
7. The Correlation window should look like this. Click on OK to perform the calculation.
8. You will be directed to the new worksheet which should look like this:
9. First rename the new worksheet as shown below and drag it to the right (click, hold, drag).
10. Tidy up the data. Double click on the border between columns to widen the columns to more easily view the data.
2. The data should now look like this.
11. Now the data is ready for interpretation. Think about the questions below to help you interpret the data:
Note: You may wish to refer to Statistics in Plain English, Chapter 13 Regression (Urdan, 2016) to help you answer the above questions.
When you have interpreted the data, you can watch the video Part 1 Correlation Test to check your answers.
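If you would like to see what the Correlation tool is computing, the sketch below calculates Pearson correlation coefficients in plain Python and arranges them as a lower-triangular table, similar in layout to the Excel output. It is an illustration only: the values are hypothetical, only two of the column names (Median SAT, Median ACT) come from the data set, and `pearson` and `correlation_matrix` are helpers written for this example.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Return the lower triangle of the correlation matrix as nested dicts."""
    names = list(columns)
    return {
        row: {col: pearson(columns[row], columns[col]) for col in names[: i + 1]}
        for i, row in enumerate(names)
    }


# Hypothetical values for three columns; "Rank" falls as the scores rise.
data = {
    "Median SAT": [1400, 1250, 1100, 1320],
    "Median ACT": [31, 27, 23, 29],
    "Rank": [1, 3, 4, 2],
}

matrix = correlation_matrix(data)
for row, values in matrix.items():
    print(row, [round(v, 3) for v in values.values()])
```

Values close to 1 or -1 between two predictor columns indicate that the columns carry very similar information, which is one of the things to look for when interpreting the table.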