Excel Handbook
Data Analysis
Introduction
Throughout this handbook, a common example is used: data representing the results from six
surveys completed by fictional participants of a fictional training program. A copy of the
completed surveys can be found in Appendix I. The reader may find it helpful to review the
surveys before continuing with the rest of the handbook. In fact, it might be beneficial to pull
out the six surveys and refer to them periodically while reviewing the handbook.
The individual conducting the data analysis is referred to in this handbook as the “analyst.” This
person may be a program staff member, volunteer, board member or other stakeholder willing to
accomplish this task. There is no job description for this analyst. He or she needs only to have a
basic understanding of Microsoft Excel, know how to perform calculations using the contents of
multiple cells, and be familiar with formulas. Reminders about using Excel are found in text
boxes throughout the handbook.
Good luck!
Excel for Data Analysis was written by National Research Center, Inc.
3005 30th Street, Boulder, Colorado 80301
Phone: 303-444-7863 Fax: 303-444-1145 www.n-r-c.com
Unique IDs
Before beginning the data entry, it is advisable to put a unique identifier on each survey or data
form. This will allow the analyst to keep track of his/her progress, and will also make it easier to
track down and correct any data entry errors. This “identifier” is not one that actually
associates or identifies the survey with a particular person; rather, it is only to make it easier to
find a specific survey at a later date. The surveys do not need to be in any particular order; just
begin at the top of the stack with 1, and number consecutively.
Each survey will then be entered into one row: the first survey in row 2 (ID #1), the second
survey in row 3, and so on.
Question #1 (shown below) from the example survey represents a single-response, closed-ended
question.
When entering and analyzing data, it is easiest to work with numbers. To do this, a number is
assigned to each possible response option:
“1 to 2” = 1,
“3 to 4” = 2,
“5” = 3, and
“6 or more” = 4.
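The coding scheme above can also be written down as a lookup table. The sketch below uses Python rather than Excel, purely as an illustration; the names `Q1_CODES` and `encode_q1` are hypothetical, and leaving a blank answer uncoded is an assumption rather than a rule stated in this handbook.

```python
# Numeric codes for question #1, as assigned in the text above.
Q1_CODES = {
    "1 to 2": 1,
    "3 to 4": 2,
    "5": 3,
    "6 or more": 4,
}

def encode_q1(response):
    """Return the numeric code for a checked response, or None if unrecognized."""
    # Assumption: a blank or unrecognized answer is left uncoded (None).
    return Q1_CODES.get(response)

print(encode_q1("3 to 4"))  # prints 2
```

Keeping the codes in one place, as the codebook does, reduces the chance that two surveys are entered with different numbering.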
2) How did you hear about this training? (Please check all that apply.)
Neighborhood newsletter
Bulletin boards in community buildings
Flyers
Your child’s school
Word of mouth
Other
There are two ways the data could be entered from a question of this type. In the first method, a
number is assigned to each response, similar to a single-response question. However, more than
one column is assigned to the question. The number of columns assigned should be as many as
the highest number of answers the analyst believes that the respondent may give; if necessary,
assign as many columns as there are possible responses (in case a respondent checks every box).
In the example at left, 3 columns were assigned to question #2, and the answers entered as shown.
6) Do you have any other comments you would like to make about this training?
________________________________________________________________________
________________________________________________________________________
Depending on the type of open-ended question asked, the analyst may or may not wish to enter
these responses into the dataset at the same time as the other questions are entered. These
questions could be entered later into an appendix for a report, or they could be read and assigned
“codes”; that is, like answers could be grouped into categories. Each category or code could be
assigned a number, and these codes entered into the dataset in a manner similar to the examples
shown above.
For this fictional survey, the answers to Question #6 were deemed short enough to enter verbatim
into the dataset, as shown in the example below:
The answers were entered into the dataset as written in by respondents, as shown below, but then
codes were assigned: 1=Latino/a; 2=Asian; 3=White/Caucasian.
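As a rough illustration of assigning codes to written-in answers, the Python sketch below looks up each verbatim answer in a table built from the three codes listed above. The names `ETHNICITY_CODES` and `code_answer` are hypothetical, and ignoring capitalization and stray spaces is an assumption of this sketch; answers that do not match a category are returned as `None` so the analyst can review them by hand.

```python
# Codes from the text: 1=Latino/a; 2=Asian; 3=White/Caucasian.
ETHNICITY_CODES = {"latino/a": 1, "asian": 2, "white/caucasian": 3}

def code_answer(verbatim):
    """Map a written-in answer to its code; None means 'review by hand'."""
    # Assumption: capitalization and surrounding spaces are ignored.
    return ETHNICITY_CODES.get(verbatim.strip().lower())

print([code_answer(a) for a in ["Asian", " white/caucasian", "Latino/a"]])
# prints [2, 3, 1]
```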
Appendix II shows the codebook for the fictional survey used as an example in this
handbook. The ID is in column A (shown with a circle around it) and question #1 is in column B.
Question #2, using the first version of multiple-response data entry, is in columns C through E,
while question #2 using the second version of multiple-response data entry is in columns F
through K (in this example, the “others” were ignored). The three parts of question #3 are in
columns L, M and N, and so on. This codebook also shows the numeric equivalents assigned to
each question response.
It is a good idea to hang on to this codebook. It will serve as a customized guide in data entry,
and in the analysis of the data once the dataset has been created. The example below shows the
entered data for the surveys shown in Appendix I.
(Note: the columns for the open-ended questions were shrunk to allow all the columns to show.)
Reminder: Formulas
Formulas are used to perform calculations within a spreadsheet. To insert a formula, as opposed to a
number or text, type an equals sign (“=”) in the cell where the calculation is to be performed, and then
type in the rest of the formula. A formula can perform mathematical calculations or execute a wide variety
of functions (see below for more on functions). To add or subtract, use the plus (+) or minus (-) symbol.
To multiply, use an asterisk (*) and to divide use a slash (/). Use parentheses as necessary to indicate
the desired order of operations.
For example, if the analyst wanted to know how many seconds there were in 3 hours, he or she could
type in the formula: =3*60*60. The result displayed in the cell would be 10,800.
There might have been a cell somewhere on the page that had a value of “3” to indicate three hours; for
the sake of an example, this cell is T21. To know how many seconds that represented, use the same
formula as above, but exchange the “3” for the cell reference: =T21*60*60. If the number of hours in cell
T21 changed, the result of the formula would also change.
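For readers who find it helpful to see the same idea outside a spreadsheet, here is a minimal Python sketch of the cell-reference formula. The function name `hours_to_seconds` is hypothetical; its argument plays the role of cell T21.

```python
SECONDS_PER_MINUTE = 60
MINUTES_PER_HOUR = 60

def hours_to_seconds(hours):
    # Mirrors the formula =T21*60*60, with `hours` standing in for cell T21.
    return hours * MINUTES_PER_HOUR * SECONDS_PER_MINUTE

print(hours_to_seconds(3))  # prints 10800
```

Just as with the cell reference, changing the input changes the result without rewriting the calculation.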
Calculating an Average
Calculating the average of a range of cells is a fairly simple procedure within Excel, and
appropriate for certain types of data. For example, in the fictional survey for our training
program, one of the questions asks respondents to report their annual household income. The
average annual income of participants could be calculated and reported.
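In Excel, an average is calculated with the AVERAGE function, which ignores empty cells. The Python sketch below shows the same calculation under that behavior; the income figures are made up for illustration and are not taken from the example surveys.

```python
from statistics import mean

# Hypothetical incomes; None marks a respondent who left the question blank.
incomes = [32000, 45000, None, 28000, 51000, 39000]

# Like Excel's AVERAGE, skip blanks rather than counting them as zero.
valid = [x for x in incomes if x is not None]
average_income = mean(valid)

print(average_income)
```

Treating a blank as zero would pull the average down, so blanks are dropped before averaging.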
The first step is to count how many respondents gave each response. There is a function within
Excel that will help automate this step: “COUNTIF.” To use this function, specify two items:
- What range of cells contains the answers to the question of interest, and
- Which particular answer should be counted (“the criterion”).
The function is set up as: =COUNTIF(range of cells, criterion). To know how many people
attended the training program just one or two times, the analyst would want to count how many
times “1” (the numeric assignment for question #1 to the response “1 to 2”) was entered as the
answer to question #1. The data for question #1 are in column B, and specifically in rows 2
through 7. The formula to enter to find out how many respondents said they attended one or two
sessions would be:
=COUNTIF(B2:B7,1)
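The logic behind COUNTIF can be sketched in a few lines of Python. This is only an illustration of what the function does for a simple "equals" criterion; the function name `countif` and the answers listed are hypothetical stand-ins for cells B2 through B7.

```python
def countif(cells, criterion):
    """Count the cells equal to the criterion, like =COUNTIF(range, criterion)."""
    return sum(1 for value in cells if value == criterion)

# Hypothetical contents of B2:B7 (coded answers to question #1).
q1_answers = [1, 3, 2, 1, 4, 2]

print(countif(q1_answers, 1))  # prints 2
```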
The results can be seen in the table below in cell B13. The formula is shown to the right in cell C13.
To get a count of the number of responses to each of the other possible answers, use the same
formula, but change the criterion each time. (See the formulas in cells C14, C15, and C16.)
To determine the proportion of people giving that answer, the contents of cell B16 would need to
be divided by cell B11. As shown below, those results are displayed in cell B22. The formulas
for calculating the proportion giving each answer to question #1 are also shown.
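The count-then-divide step can be summarized in a short Python sketch. The answers below are hypothetical, not the data from Appendix I, and the variable names are chosen only for this illustration.

```python
from collections import Counter

# Hypothetical coded answers to question #1; None would mark a blank answer.
q1_answers = [1, 3, 2, 1, 4, 2]

valid = [a for a in q1_answers if a is not None]
counts = Counter(valid)          # number of respondents giving each code
total = len(valid)               # number of valid answers (the denominator)
proportions = {code: counts[code] / total for code in (1, 2, 3, 4)}

for code, share in proportions.items():
    print(f"code {code}: {share:.0%}")
```

Because each respondent gives only one answer to a single-response question, these proportions sum to 100%.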
The analyst may then wish to also calculate the proportion of youth served who are 15-17 years old. If
the contents of cell C2 were copied to cell C3, the formula would look like this: =B3/B6. This is because
in Excel the cell references in this formula are “relative” references; that is, Excel has assumed that
because in cell C2 the calculated number was derived by dividing the number in the same row and one
column to the left by the number three rows below and one column to the left, the same thing should
happen in the cell to which the formula is copied. However, cell B6 is blank, so the formula in cell C3
would divide by zero and display an error (#DIV/0!). This can be fixed by changing the formula after it
has been copied, so that the denominator refers to B5. But if the formula is then copied to cell C4, the
denominator would again have to be manually changed to refer to the cell that
contains the total number of youth served. If this manual change were not made, the formulas in
column C would look like the formulas in column D in the table below.
If, however, an “absolute” reference was used to refer to the row that contains the total number of youth
served, when the formula was copied, the denominator would always refer to row 5. The dollar sign ($) is
used to indicate an absolute reference. In this example, it is only used for the row designation, not for the
column designation. It can be used for both the row and column designation, or only one or the other.
Excel defaults to assuming that all cell references are relative, unless the change is made manually.
Knowing how to use relative and absolute references can greatly speed up creation of spreadsheets in
Excel.
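To make the difference concrete, the Python sketch below imitates what happens to row numbers when a formula is copied down a column. It is a simplified model and not how Excel is implemented: it handles only plain cell references and copying downward, and the function name `copy_down` is hypothetical. The formula =B2/B$5 is inferred from the description above rather than quoted from the handbook.

```python
import re

# Matches a cell reference such as B2, $B2, B$5 or $B$5.
CELL_REF = re.compile(r"(\$?)([A-Z]+)(\$?)(\d+)")

def copy_down(formula, rows):
    """Return the formula as it would read after being copied down `rows` rows."""
    def shift(match):
        col_abs, col, row_abs, row = match.groups()
        if not row_abs:                 # relative row: moves with the copy
            row = str(int(row) + rows)
        return f"{col_abs}{col}{row_abs}{row}"
    return CELL_REF.sub(shift, formula)

print(copy_down("=B2/B5", 1))    # prints =B3/B6
print(copy_down("=B2/B$5", 1))   # prints =B3/B$5
```

With the dollar sign in front of the row number, the denominator stays on row 5 no matter how far down the formula is copied.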
If the data have been entered using the first approach described, where a numeric
assignment is made for each possible response, but more than one column is designated for entry
of the results (as in columns D, E and F in the table below), then the counts and proportions can
be calculated in a manner quite similar to that of a single-response question. The change would
be in the definition of the range of cells to include in the count. Instead of covering only one
column, it would cover multiple columns. In this example, the number of people who said they
heard of the program through the neighborhood newsletter would be determined using the
formula:
=COUNTIF(D2:F7,1)
Calculating the percent of respondents who heard of the program through the neighborhood
newsletter would also be changed slightly. Instead of dividing the number of respondents giving
a specific answer by the sum of the cells F13 through F17 (which would be the total number of
responses, not respondents answering the question), the denominator is the total number of
respondents answering the question.
To determine this, the number of valid answers entered in column D needs to be counted.
This can be done using the COUNT function. This formula is not shown in the table below, but
would be entered in cell D11 as follows:
=COUNT(D2:D7)
This function counts the cells in the specified range that contain numbers. In this
case, every respondent gave at least one answer, so the total is 6, the same as the number of
returned surveys. The same formula (with the range changed to the corresponding column) was used in
cells E11 and F11. The numbers displayed there indicate the number of people who gave two
or more answers (4 people, see cell E11) or three answers (1 person, see cell F11).
It should be noted when reporting the percentages for a multiple-response question that the
percentages can add to more than 100%, as respondents can give more than one answer.
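The difference between counting responses and counting respondents can be seen in the Python sketch below. The rows are illustrative values made up to be consistent with the totals mentioned above (6, 4 and 1 entries in columns D, E and F); the actual survey entries may differ.

```python
# Illustrative rows for columns D, E and F (one tuple per respondent).
# None marks an unused column. The actual survey data may differ.
rows = [
    (1, 3, 5),
    (1, 4, None),
    (3, 4, None),
    (2, 3, None),
    (4, None, None),
    (5, None, None),
]

# Like =COUNTIF(D2:F7,1): count code 1 across all three columns.
newsletter = sum(1 for row in rows for value in row if value == 1)

# Respondents with at least one answer: entries in the first column (D).
respondents = sum(1 for row in rows if row[0] is not None)

# Total responses (all non-blank cells), which is NOT the right denominator.
responses = sum(1 for row in rows for value in row if value is not None)

print(newsletter, respondents, responses)   # prints 2 6 11
print(f"{newsletter / respondents:.0%}")    # prints 33%
```

Dividing by the 11 responses instead of the 6 respondents would understate the share of people who heard about the training through the newsletter.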
First, to get the total number of respondents who gave an answer, column H needs to be
appropriately analyzed. In this instance, a “1” was entered if a respondent gave no answer to the
question, and a “2” was entered if a respondent gave at least one answer. The formula in
cell H11 (not shown in the table below) was =COUNTIF($H$2:$H$7,2), to count the number of
valid answers to question #2. This formula was copied to cells I11, J11, K11, L11 and M11.
To determine the number of people who indicated each potential source of familiarity with the
training, the number of “1” responses in each column was counted, using the COUNTIF
function. The formula for cell M13 (the number of respondents indicating they heard of the
program by word of mouth) is shown in cell N13. A similar formula was used for each of the
other responses.
Next, to determine the proportion of respondents each of those counts represented, the counts
were divided by the number of valid responses to question #2. As shown in cell M19, 33% of
respondents reported they had heard of the training by word of mouth. The formula used to
make that calculation is shown in cell N19. A similar formula was used for each of the other
responses.
Again, it should be noted when reporting the percentages for a multiple-response question that the
percentages can add to more than 100%, as respondents can give more than one answer.
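The second approach can be sketched in Python as follows. The values are illustrative, chosen so that the result matches the 33% reported above for word of mouth; coding an unchecked box as 0 is an assumption of this sketch, not something stated in the handbook.

```python
# Illustrative data for the second entry method; the actual values may differ.
# answered: 2 = gave at least one answer, 1 = gave none (as described above).
answered = [2, 2, 2, 2, 2, 2]
# word_of_mouth: 1 = box checked; 0 for unchecked is an assumption of this sketch.
word_of_mouth = [1, 0, 0, 0, 0, 1]

valid = answered.count(2)          # like =COUNTIF($H$2:$H$7,2)
checked = word_of_mouth.count(1)   # COUNTIF over the word-of-mouth column
share = checked / valid

print(f"{share:.0%}")  # prints 33%
```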
One idea is to create an “annotated instrument,” that is, a blank questionnaire into which the
results have been typed.¹ Most evaluation forms or surveys have been created using word processing
software such as Word or WordPerfect, and thus are well-suited to this approach. A new file
should be created from the electronic version of the survey. The check boxes can then be
replaced with the proportion of respondents giving each answer. For example:
Staff can write a cover memo or report to accompany the annotated instrument that explains the
methods used to obtain the data and interprets the results.
¹ The term “annotated instrument” is one created by and used by staff at National Research Center, Inc. It is NOT a
commonly used evaluation term, but one that we think is descriptive.
A useful first step before creating a pivot table is to name the range of cells that will be used for
the analyses. This range of cells should include the first row with the variable names.
In general, when the named range of cells will be used for creating pivot tables, it is a good idea to name
the range “Database.” This is the default name used by Excel in the pivot table wizard. The “Define
Name” dialogue box above shows that the name “Database” has been typed in. The field labeled “Refers
to:” shows that Database will refer to the cells starting at A1 and going to W7 in the worksheet labeled
“Data Entry.” These are the cells that contain the data entered for the fictional survey.
Once a range of cells has been defined, pivot tables can be created from those data. It is easiest
to create the pivot tables on another worksheet within the workbook.
To create a pivot table that displays the frequency of training attendances, the button q1 (“How
many of the training sessions did you attend?”) would be dragged into the row area, so that the
values in q1 will be listed vertically as rows. A field is also needed for the data section. It does
not really matter what button is dragged into the data section, as it will be used simply as a
counter. However, it should be a field that has no missing data; the ID field is ideal for this
situation. As shown above, the ID field was dragged into the data area. Usually by default the
field in the data area will be shown as a “Count.”
If a different summary is desired, double-click the button, and a dialogue box displaying various
options will be displayed.
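What this pivot table layout computes can be approximated in Python as a frequency count. The records below are hypothetical, and the sketch only mirrors the idea of listing each q1 value as a row and counting IDs; it does not reproduce Excel's pivot table features.

```python
from collections import Counter

# Hypothetical records: (ID, q1 code). IDs have no missing values,
# which is why the ID field makes a good counter.
records = [(1, 1), (2, 3), (3, 2), (4, 1), (5, 4), (6, 2)]

# Row area = q1; data area = count of ID.
frequency = Counter(q1 for _id, q1 in records)

for code in sorted(frequency):
    print(code, frequency[code])
```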
The example to the right shows the PivotTable layout and resulting table to perform a
crosstabulation of the results to question #5 “How would you rate the overall quality of this
training?” by the gender of the respondent. (Of course, crosstabulations are recommended with
larger datasets than that created for these examples, with a sufficient number of cases
within each subgroup examined.)
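A crosstabulation simply counts how many records fall into each combination of two fields. The Python sketch below shows that idea with made-up records; the gender labels and rating codes are hypothetical and do not reflect the coding used in the example surveys.

```python
from collections import Counter

# Hypothetical records: (gender, q5 rating code). Values are made up.
records = [("F", 4), ("M", 3), ("F", 3), ("F", 4), ("M", 2), ("M", 3)]

# Each (gender, rating) pair becomes one cell of the crosstabulation.
cells = Counter(records)
genders = sorted({g for g, _ in records})
ratings = sorted({r for _, r in records})

print("rating", *genders)
for rating in ratings:
    print(rating, *(cells[(gender, rating)] for gender in genders))
```

With only six records most cells hold zero, one or two cases, which is why larger datasets are recommended for this kind of comparison.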
2) How did you hear about this training? (Please check all that apply.)
33% Neighborhood newsletter
17% Bulletin boards in community buildings
50% Flyers
50% Your child’s school
33% Word of mouth
0% Other
4) Rate the extent to which you agree or disagree with each of the following statements.
                                                                      Strongly                      Strongly
                                                                      Disagree  Disagree  Agree     Agree
I would strongly recommend this training for my friend .............. 0%        20%       60%       20%
This training will help improve the quality of life for my family ... 0%        17%       50%       33%
6) Do you have any other comments you would like to make about this training?
• I think we spent too much time reviewing the background information.
• I had a lot of fun. I thought Angela was great.
• This was great! I will definitely apply what I learned at work and at home!