Descriptive Statistics Using Microsoft Excel
These tutorials briefly explain the use and interpretation of standard statistical analysis techniques. The
examples include how-to instructions for Excel. Although there are different versions of Excel in use, these should
work about the same for most recent versions. They also assume that you have installed the Excel Analysis ToolPak,
which is free and comes with Excel. (Go to Tools/Add-Ins if it is not already installed in your version of Excel.)
For this example, we will look at the data set called EXAMPLE.XLS. The first few records are shown here:
1. In Excel, select Tools/Data Analysis/Descriptive Statistics. (If the Data Analysis option is
not on your Tools menu, you must first install it using Tools/Add-Ins.)
2. Select the input range for the AGE variable. In this case it is $B$2:$B$51.
3. Be sure to select the check boxes Summary Statistics and Confidence level for mean (95% is
okay). The output created is shown here:
Column1
Mean 10.46
Standard Error 0.343107052
Median 11
Mode 12
Standard Deviation 2.42613323
Sample Variance 5.886122449
Kurtosis -0.261061479
Skewness -0.511921947
Range 11
Minimum 4
Maximum 15
Sum 523
Count 50
Confidence Level(95.0%) 0.689499422
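As a cross-check outside Excel, most of the quantities in this output can be recomputed with a short script. The following is a minimal Python sketch using only the standard library. The describe function and the ages list are hypothetical illustrations (they are NOT the EXAMPLE.XLS data), and Kurtosis, Skewness and the confidence level are left out for brevity.

```python
import math
import statistics


def describe(values):
    """Recompute the main quantities of Excel's Descriptive Statistics output."""
    n = len(values)
    sd = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    return {
        "Mean": statistics.mean(values),
        "Standard Error": sd / math.sqrt(n),  # standard error of the mean
        "Median": statistics.median(values),
        "Mode": statistics.mode(values),
        "Standard Deviation": sd,
        "Sample Variance": statistics.variance(values),
        "Range": max(values) - min(values),
        "Minimum": min(values),
        "Maximum": max(values),
        "Sum": sum(values),
        "Count": n,
    }


# Hypothetical ages for illustration only, not the EXAMPLE.XLS records.
ages = [4, 8, 9, 10, 11, 11, 12, 12, 12, 15]
summary = describe(ages)
for label, value in summary.items():
    print(f"{label} {value}")
```

The same relationship holds in the Excel output above: the Standard Error is the Standard Deviation divided by the square root of the Count, 2.42613323 / SQRT(50) = 0.3431.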
Information you should notice includes:
1. Search for outliers: Look at the Minimum and Maximum values to see if these values fall
within your expected range for these data. If a value is unexpectedly small or large, you
should examine your original data to see if it was miscoded. If there are corrections that need
to be made, make them before continuing. If you have values that are unexpectedly large or
small, but are actual values, it may indicate that your data are not normally distributed. In that
case you may consider using nonparametric procedures in further analyses. It may also
indicate that the mean is not the best value to report to describe the central tendency of this
data set.
2. Symmetry: Skewness and Kurtosis are two more measures that help you assess normality. The
Skewness measure indicates the level of non-symmetry. If the distribution of the data is
symmetric then skewness will be close to 0 (zero). The further from 0, the more skewed the
data. A negative value indicates a skew to the left. How do you tell if the skewness is large
enough to cause concern? Excel doesn't give you this value, but a rough measure of the standard
error of skewness can be calculated as =SQRT(6/N) or =SQRT(6/50) which is 0.346. If the
absolute value of the skewness is more than twice this amount, then it indicates that the
distribution of the data is non-symmetric. In this case 0.346 * 2 = 0.69. The skewness reported
by Excel is -0.512, so the data can be assumed to be fairly symmetric (although somewhat
marginally so). However, this does NOT indicate that the data are normally distributed.
3. Kurtosis is a measure of the peakedness of the data. Again, for normally distributed data the
kurtosis reported by Excel is 0 (zero). As with skewness, if the value of kurtosis is too big or
too small, there is concern about the normality of the distribution. In this case, a rough formula
for the standard error for kurtosis is =SQRT(24/N) = 0.693. Twice this amount is 1.39. Since the
value of kurtosis (-0.26) falls within two standard errors of 0, the data may be considered to
meet the criteria for normality by this measure. These measures of skewness and kurtosis are one
method of examining the distribution of the data. However, they are not definitive in
concluding normality. You should also examine a graph (histogram) of the data and consider
performing other tests for normality such as the Shapiro-Wilk or Kolmogorov-Smirnov test
(not provided by Excel).
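The two-standard-error rule of thumb for skewness and kurtosis can be checked with a few lines of code. This is a minimal Python sketch using the values from the Excel output above; the helper name within_two_standard_errors is a hypothetical choice, and SQRT(6/N) and SQRT(24/N) are the rough approximations given in the text, not exact standard errors.

```python
import math


def within_two_standard_errors(statistic, standard_error):
    """Rule of thumb: the value is unremarkable if it lies within 2 standard errors of 0."""
    return abs(statistic) <= 2 * standard_error


n = 50                    # Count from the Excel output
skewness = -0.511921947   # Skewness from the Excel output
kurtosis = -0.261061479   # Kurtosis from the Excel output

se_skewness = math.sqrt(6 / n)    # rough approximation, same as =SQRT(6/N)
se_kurtosis = math.sqrt(24 / n)   # rough approximation, same as =SQRT(24/N)

skew_ok = within_two_standard_errors(skewness, se_skewness)
kurt_ok = within_two_standard_errors(kurtosis, se_kurtosis)

print(f"Skewness {skewness:.3f}, limit {2 * se_skewness:.2f}, within limit: {skew_ok}")
print(f"Kurtosis {kurtosis:.3f}, limit {2 * se_kurtosis:.2f}, within limit: {kurt_ok}")
```

Passing both checks only means these two measures raise no flag; it does not establish that the data are normally distributed.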
4. Estimate of central tendency: For normally distributed data the mean (arithmetic average)
is the typical value to use in a report or journal article. However, this point estimate is of
limited value without some estimate of the variability of the data. Therefore you should at least
report three values: the mean, the standard error of the mean, and the sample size. In this
case, those values are 10.5, 0.34 and 50. (Typically report means to one more decimal place
than what was measured and the standard error to two decimal places beyond the unit
measure.) Many journals also prefer that you report confidence limits on the mean. These are
calculated (estimated) as MEAN + or - t(.05,N-1) * SEM. The product t(.05,N-1) * SEM is reported
by Excel as the Confidence Level(95.0%) = 0.6895. Thus, the lower limit for the 95% CI is 10.46 -
0.6895 = 9.77 and the upper limit is 10.46 + 0.6895 = 11.15. The median is another measure
of central tendency and is usually reported when the data are not normally distributed. The
mode, or the most frequent value, is a third measure of central tendency. In the case of the
median or mode, the range is often given as a measure of variability, although a better
measure is the interquartile range (not reported by Excel).
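The confidence limit arithmetic can be reproduced directly from the Excel output. The sketch below is a minimal Python illustration with hypothetical variable names. It also divides the Confidence Level(95.0%) figure by the Standard Error to recover the t multiplier implied by the output, which comes out close to 2.01, the usual two-sided 95% t value for 49 degrees of freedom.

```python
# Values copied from the Excel Descriptive Statistics output.
mean = 10.46
standard_error = 0.343107052
confidence_level_95 = 0.689499422   # half-width of the interval: t * SEM

lower_limit = mean - confidence_level_95
upper_limit = mean + confidence_level_95

# The t multiplier implied by the output (half-width divided by SEM).
implied_t = confidence_level_95 / standard_error

print(f"95% CI: {lower_limit:.2f} to {upper_limit:.2f}")
print(f"Implied t multiplier: {implied_t:.4f}")
```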
The coefficient of variation (CV), the standard deviation divided by the mean, can be used when
comparing two samples that have different means and standard deviations. When the mean is
close to 0, the CV value becomes of little use.
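As an illustration, the CV for the AGE variable can be computed from the Excel output. This is a minimal Python sketch that assumes the usual definition, CV = standard deviation / mean, which the text here does not spell out; the function name coefficient_of_variation is hypothetical.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the CV as a fraction, assuming CV = standard deviation / mean."""
    if mean == 0:
        # The ratio is undefined at 0 and unstable for means close to 0.
        raise ValueError("CV is undefined when the mean is 0")
    return std_dev / mean


# Values copied from the Excel Descriptive Statistics output.
cv = coefficient_of_variation(2.42613323, 10.46)
print(f"CV = {cv * 100:.1f}%")
```

Because the CV is a ratio, it has no units, which is what makes it comparable across samples measured on different scales.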
7. Visualizing your data: It is always a good idea to examine your data visually to understand
what's going on. To produce a histogram of the AGE variable, select Tools/Data
Analysis/Histogram. Select the range for the AGE variable, $B$2:$B$51, and check the
option Chart output. This produces the following table and chart:
Bin Frequency
4 1
5.571429 0
7.142857 6
8.714286 4
10.28571 11
11.85714 8
13.42857 17
More 3
This is not the best looking histogram because Excel selects its own bin sizes (class intervals)
on the X-axis, which in this case are less than desirable. You can improve on the plot by
creating a list of bin sizes such as in the table below:
Bin Sizes
5.00
7.00
9.00
11.00
13.00
15.00
Redo the histogram, this time indicating the bin sizes as the "Bin Range." This produces
the following histogram:
This plot is much cleaner looking and suitable for reporting. It shows that the data are
visually not symmetric with more values appearing to the right of the mean (10.46) than we
would expect in normally distributed data.
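The frequency table behind such a histogram can be rebuilt by hand. The sketch below is a minimal Python illustration of how the Histogram tool tallies values: each bin value is treated as an upper limit, a value is counted in the first bin whose limit it does not exceed, and anything above the last limit goes to "More". The function name excel_style_histogram and the ages list are hypothetical; the list is NOT the EXAMPLE.XLS data.

```python
from bisect import bisect_left


def excel_style_histogram(values, bin_limits):
    """Tally values into bins defined by inclusive upper limits, plus a 'More' bin."""
    limits = sorted(bin_limits)
    counts = [0] * (len(limits) + 1)   # final slot collects the "More" values
    for value in values:
        # Index of the first limit that is >= value; len(limits) means "More".
        counts[bisect_left(limits, value)] += 1
    labels = [f"{limit:.2f}" for limit in limits] + ["More"]
    return list(zip(labels, counts))


# Hypothetical ages for illustration only, not the EXAMPLE.XLS records.
ages = [4, 8, 9, 10, 11, 11, 12, 12, 12, 15]
table = excel_style_histogram(ages, [5, 7, 9, 11, 13, 15])
for label, count in table:
    print(label, count)
```

Choosing the last bin limit equal to the maximum of the data (15 here) keeps the "More" row empty.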
This tutorial shows ways for you to examine a variable for normality and symmetry, and to
visually inspect the distribution of your data. These are important beginning exercises you
should perform on your data before using them in any other analysis.